Site Reliability Engineer

PayPay Corporation

📍Remote - Worldwide

Job highlights

Summary

Join PayPay as a Senior SRE and lead a team focused on maintaining a robust observability pipeline across EKS clusters. You will be responsible for the architecture, implementation, and optimization of our platform, collaborating with other engineering teams and mentoring junior engineers. This role requires 5+ years of experience as an SRE or Tech Lead, extensive AWS and EKS experience, and proficiency in programming languages like Python, Go, or Rust. The position offers a flexible work-from-anywhere arrangement in Japan, comprehensive benefits including health insurance, 401k, and paid time off, and a competitive salary with performance-based incentives.

Requirements

  • 5+ years of experience as a Site Reliability Engineer or Tech Lead
  • 5+ years of experience with AWS and EKS
  • Several years of experience designing, implementing, and operating large-scale observability systems with VictoriaMetrics
  • Senior Infrastructure Engineer-level understanding of cloud architecture, particularly on AWS, including AWS/EKS network infrastructure
  • Proficiency in one or more programming languages such as Python, Go, or Rust
  • Strong problem-solving and troubleshooting skills to quickly identify and resolve complex issues
  • Passion for continuously improving observability practices and driving innovation

Responsibilities

  • Lead the engineering team in the architecture, implementation, and optimization of our platform built on VictoriaMetrics, OpenTelemetry, Quickwit, and ClickHouse
  • Develop and maintain a deep understanding of network and cloud infrastructure to enable effective troubleshooting and incident response
  • Collaborate closely with other engineering teams, providing guidance on reliability, performance, and efficiency
  • Automate incident response to proactively address issues before they impact customers
  • Mentor engineers on the team and help develop their technical skills
  • Drive continuous improvement of observability tools and practices to enhance visibility and reliability across the organization
  • Explain complex technical concepts clearly to stakeholders
  • Establish a culture of reliability throughout the organization
  • Manage growth sustainably
  • Deliver customer and engineer satisfaction

Preferred Qualifications

  • Some experience with OpenTelemetry, ClickHouse, and Quickwit is a definite plus
  • Experience in large distributed system architecture and capacity planning
  • Bilingual proficiency in English and Japanese is nice to have, but not required

Benefits

  • Social Insurance (health insurance, employee pension, employment insurance and compensation insurance)
  • 401K
  • Translation/Interpretation support
  • Visa sponsorship and relocation support
  • Work From Anywhere at Anytime
  • Super Flex Time (No Core Time)
  • Every Sat/Sun/National holidays (In Japan)/New Year's break/Company-designated Special days
  • Annual leave (up to 14 days in the first year, granted proportionally according to the month of employment. Can be used from the date of hire)
  • Personal leave (5 days each year, granted proportionally according to the month of employment)
  • PayPay's own special paid leave system, which can be used to attend to illnesses, injuries, hospital visits, etc., of the employee, family members, pets, etc.
  • Annual salary paid in 12 monthly installments
  • Salary based on skills, experience, and abilities
  • Salary reviewed once a year
  • Special incentive once a year (based on company performance and on individual contribution and evaluation)
  • Late overtime allowance, Work from anywhere allowance (JPY100,000)
  • Payroll payment can be switched to the digital salary payment “PayPay Paycheck” for an amount you set
