Senior Site Reliability Engineer

Algolia

📍 Remote - France

Summary

Join Algolia's Platform as a Service (PaaS) team as a Senior Site Reliability Engineer (IC4). Enhance scalable infrastructure focusing on CI/CD, Observability, and application hosting. Bridge the gap between junior and senior staff, ensuring reliability, scalability, and performance of Algolia’s Search Products. Build and optimize systems, mentor junior engineers, and collaborate across teams. Transition legacy systems to a modern Kubernetes-based architecture and contribute to long-term infrastructure strategies. This senior role requires strong programming skills, experience with CI/CD pipelines, observability frameworks, Kubernetes, and distributed systems. Mentorship experience and excellent communication skills are also essential.

Requirements

  • Proficient in Golang and Python with a solid understanding of software craftsmanship
  • Hands-on experience in building and maintaining CI/CD pipelines using tools like GitHub Actions, CircleCI, or alternatives
  • Familiarity with best practices for ensuring build and deployment reliability
  • Experience designing and implementing monitoring, alerting, and observability frameworks that provide actionable insights
  • Strong troubleshooting skills in production environments
  • Proven experience in managing and optimizing Kubernetes-based architectures and working with public cloud providers such as GCP, AWS, or Microsoft Azure
  • Experience in designing, building, and operating distributed systems at scale, with a focus on reliability, availability, and performance
  • Experience mentoring junior engineers and helping them grow
  • Ability to collaborate with cross-functional teams and contribute to strategic initiatives
  • Ability to independently solve complex technical problems with minimal supervision while collaborating effectively with other team members
  • Strong ability to communicate complex technical issues to both technical and non-technical audiences
  • Ability to organize and prioritize multiple projects

Responsibilities

  • Contribute to the design, optimization, and maintenance of the CI/CD pipelines to improve the speed, reliability, and efficiency of the development lifecycle
  • Assist in driving standardization across various services hosted on the platform
  • Lead efforts to improve the observability of critical systems, working closely with cross-functional teams to ensure actionable monitoring and alerting frameworks are in place
  • Help troubleshoot complex issues and optimize system reliability
  • Contribute to the development and operation of our Kubernetes-based architecture
  • Ensure systems are resilient, scalable, and optimized for performance
  • Actively participate in enhancing cloud-based solutions for API management and microservices
  • Collaborate with team members to ensure system scalability, operability, and performance
  • Lead initiatives to optimize resource utilization, focusing on cost efficiency while maintaining high system availability
  • Mentor mid-level engineers (IC3) by providing guidance on technical challenges and SRE best practices
  • Support team growth by fostering knowledge-sharing sessions and helping establish processes that drive operational excellence
  • Work closely with product, software, and other SRE teams to ensure that platform goals align with broader business objectives
  • Drive initiatives aimed at enhancing platform stability, security, and scalability

Preferred Qualifications

  • Knowledge of Ruby is a plus

Benefits

  • Flexible workplace model
  • Option to work either fully remote or hybrid-remote
