Summary

Join Algolia's Platform as a Service (PaaS) team as a Senior Site Reliability Engineer (IC4). Enhance scalable infrastructure with a focus on CI/CD, observability, and application hosting, and bridge the gap between junior and senior staff while ensuring the reliability, scalability, and performance of Algolia's search products. Build and optimize systems, mentor junior engineers, and collaborate across teams. Transition legacy systems to a modern Kubernetes-based architecture and contribute to long-term infrastructure strategy. This senior role requires strong programming skills and experience with CI/CD pipelines, observability frameworks, Kubernetes, and distributed systems. Mentorship experience and excellent communication skills are also essential.

Requirements

Proficient in Golang and Python with a solid understanding of software craftsmanship
Hands-on experience building and maintaining CI/CD pipelines with tools such as GitHub Actions, CircleCI, or equivalents
Familiarity with best practices for ensuring build and deployment reliability
Experience designing and implementing monitoring, alerting, and observability frameworks that provide actionable insights
Strong troubleshooting skills in production environments
Proven experience in managing and optimizing Kubernetes-based architectures and working with public cloud providers such as GCP, AWS, or Microsoft Azure
Experience in designing, building, and operating distributed systems at scale, with a focus on reliability, availability, and performance
Experience mentoring junior engineers and helping them grow
Ability to collaborate with cross-functional teams and contribute to strategic initiatives
Ability to independently solve complex technical problems with minimal supervision while collaborating effectively with other team members
Strong ability to communicate complex technical issues to both technical and non-technical audiences
Ability to organize and prioritize multiple projects

Responsibilities

Contribute to the design, optimization, and maintenance of the CI/CD pipelines to improve the speed, reliability, and efficiency of the development lifecycle
Assist in driving standardization across various services hosted on the platform
Lead efforts to improve the observability of critical systems, working closely with cross-functional teams to ensure actionable monitoring and alerting frameworks are in place
Help troubleshoot complex issues and optimize system reliability
Contribute to the development and operation of our Kubernetes-based architecture
Ensure systems are resilient, scalable, and optimized for performance
Actively participate in enhancing cloud-based solutions for API management and microservices
Collaborate with team members to ensure system scalability, operability, and performance
Lead initiatives to optimize resource utilization, focusing on cost efficiency while maintaining high system availability
Mentor mid-level engineers (IC3) by providing guidance on technical challenges and SRE best practices
Support team growth by fostering knowledge-sharing sessions and helping establish processes that drive operational excellence
Work closely with product, software, and other SRE teams to ensure that platform goals align with broader business objectives
Drive initiatives aimed at enhancing platform stability, security, and scalability

Preferred Qualifications

Knowledge of Ruby is a plus

Benefits

Flexible workplace model
Option to work fully remote or hybrid-remote

Senior Site Reliability Engineer

Algolia
