Summary
Join Algolia's Infrastructure as a Service (IaaS) team as a Senior Site Reliability Engineer. You will play a key role in designing, implementing, and maintaining highly available, scalable, and fault-tolerant systems. This position focuses on Kubernetes and cloud services leadership, advanced system management, and control plane advancement. You will collaborate with cross-functional teams, mentor peers, and lead technical projects. The ideal candidate possesses extensive programming expertise, deep Kubernetes and Linux experience, and public cloud mastery. Algolia offers a flexible workplace model with a high-trust environment, allowing for autonomy in choosing work location and time.
Requirements
- Proficient knowledge in programming languages such as Python, Ruby, or Golang, with a focus on developing maintainable, high-quality software
- Proven track record in managing and optimizing large-scale Kubernetes clusters and Linux server fleets, ensuring operational excellence
- In-depth understanding of the complexities and challenges of distributed systems, with experience designing and operating scalable, fault-tolerant architectures
- Advanced knowledge of public cloud providers such as Microsoft Azure, AWS, or GCP, including architectural best practices and cost optimization strategies
- Demonstrated ability to tackle complex technical challenges independently and lead resolution efforts across teams
- Strong ability to mentor junior and mid-level engineers, fostering technical growth and collaboration within the team
- Adept at articulating complex technical concepts to diverse audiences and aligning stakeholders around shared goals
Responsibilities
- Oversee and enhance Kubernetes-based architectures and cloud services, ensuring fault tolerance, optimal resource utilization, and seamless scalability
- Lead the improvement of infrastructure code and automation, managing a fleet of thousands of servers to maintain safety, efficiency, and reliability
- Architect and extend the control plane into a comprehensive platform that empowers teams to develop performant and scalable products, with reliability baked into the foundation
- Collaborate with and mentor team members while solving complex technical challenges, fostering a culture of ownership and shared accountability across teams
- Define and implement engineering processes and best practices to ensure the delivery of high-quality, reliable, and scalable systems
- Identify and resolve systemic issues, implement innovative solutions, and drive continuous improvement in infrastructure and operations
Benefits
- Algolia’s flexible workplace model is designed to empower all Algolians to fulfill our mission to power search and discovery with ease. We place an emphasis on an individual’s impact, contribution, and output, over their physical location
- Algolia is a high-trust environment and our team members have the autonomy to choose where they want to work and when
Disclaimer: Please check that the job is real before you apply. Applying might take you to another website that we don't own. Please be aware that any actions taken during the application process are solely your responsibility, and we bear no responsibility for any outcomes.