Summary
Join PayPay as an SRE and contribute to the stability and scalability of our rapidly growing systems. You will analyze current technologies, develop monitoring tools, and ensure system reliability. Responsibilities include troubleshooting production issues, implementing solutions to improve system performance, and collaborating with a cross-functional team. The ideal candidate possesses 5+ years of software development experience, expertise in Kubernetes and AWS, and strong communication skills. We offer a hybrid workstyle, flexible hours, comprehensive benefits including social insurance, 401K, and paid time off, and opportunities for professional growth within a dynamic FinTech environment.
Requirements
- Experience troubleshooting and tuning high-performance microservice architectures running on Kubernetes and AWS in highly available production environments
- 5+ years of experience in software development in Python, Java, Go, etc., with strong fundamentals in data structures, algorithms, problem solving, and complexity analysis
- Curious and proactive in finding and addressing performance bottlenecks and problem areas in scalability and resilience
- Experience with observability tools and gathering data
- Knowledge of databases such as RDS, NoSQL stores, and distributed databases like TiDB
- Excellent communication skills and a collaborative, get-things-done attitude
- Enjoys taking on a challenge and driving it to conclusion
Responsibilities
- Analyze current technologies used in the company and develop monitoring and notification tools to improve observability and visibility
- Ensure system stability by pre-emptively verifying failure scenarios and implementing solutions to reduce MTTR
- Develop solutions to improve system performance with a focus on high availability, scalability and resilience
- Integrate telemetry and alerting platforms to track and improve reliability of systems
- Implement industry best practices for system development, configuration management and system deployment
- Ensure seamless flow of information between teams by documenting knowledge gained
- Stay up to date on modern technologies and trends, and advocate for their inclusion in products where they add value
- Participate in incident management including troubleshooting production issues, driving root cause analysis (RCA) and actively sharing lessons learned to improve system reliability and internal knowledge
Preferred Qualifications
- Container image management and optimization
- Experience in large distributed system architecture and capacity planning
- Understanding of IaC and automation tools such as Terraform and CloudFormation
- Background in SRE/DevOps concepts and implementation
- Experience in managing monitoring tools like CloudWatch, VictoriaMetrics, Prometheus and reporting with Snowflake and Sigma
- In-depth knowledge of web technologies such as CloudFront and Nginx
- Experience in designing, implementing or maintaining disaster recovery strategies and multi-region architecture to ensure high availability, resilience, and business continuity across critical systems
- Language ability in Japanese is a plus
Benefits
- Social Insurance (health insurance, employee pension, employment insurance and compensation insurance)
- 401K
- Translation/Interpretation support
- Visa sponsorship + Relocation support
- Hybrid Workstyle (flexible working style including Remote and office)
- Super Flex Time (No Core Time)
- Days off: every Saturday and Sunday, national holidays (in Japan), New Year's break, and company-designated special days
- Annual leave (up to 14 days in the first year, granted proportionally according to the month of employment; can be used from the date of hire)
- Personal leave (5 days each year, granted proportionally according to the month of employment)
- PayPay's own special paid leave system, which can be used to attend to illnesses, injuries, hospital visits, etc., of the employee, family members, pets, etc
- Annual salary paid in 12 installments (monthly)
- Salary reviewed once a year
- Special Incentive once a year *Based on company performance and individual contribution and evaluation
- Late overtime allowance