Site Reliability Engineering Manager

Algolia

📍Remote - France

Summary

Join Algolia as a Site Reliability Engineering Manager and lead the Fleet team, responsible for the provisioning and global reliability of Search Products at scale. You will manage a team of experienced engineers, ensuring the availability and cost optimization of Search Products. Your responsibilities include collaborating with senior leadership to define technical direction, building relationships with stakeholders, providing leadership and mentorship to your team, establishing engineering processes, defining and maintaining SLAs and KPIs, resolving complex technical issues, designing and implementing monitoring systems, automating processes, managing the team's budget, and documenting projects and processes.

Requirements

  • 4+ years of engineering management experience
  • You are fluent in Agile methodology and can lead a project from idea to production
  • You are an excellent communicator, collaborating with Product Managers, Technical Program Managers, and the Individual Contributors on your team
  • You are comfortable managing a large team spanning all seniority levels, and supporting Individual Contributors in their growth and development
  • You know how to deploy an application from laptop to production, are able to fully automate it, and you are comfortable with Production requirements (Observability, Alerting, ...)
  • You are knowledgeable in DevOps principles and CI/CD pipelines
  • You are knowledgeable in Configuration Management and Infrastructure as Code such as Chef and Terraform
  • You are knowledgeable in at least one programming language (Python, Golang, Ruby, ...) and are familiar with software craftsmanship
  • Full professional English proficiency
  • Ability to make decisions and take ownership of them

Responsibilities

  • Operating and scaling the entire Search fleet, ensuring global performance and reliability
  • Reducing and maintaining the level of incidents through actionable KPIs and well-defined SLOs, while coaching and delegating Tier 3 support responsibilities
  • Running and continuously improving our in-house Edge Load Balancer
  • Building, operating, and enhancing a robust backup and restore system to ensure compliance with our SLAs
  • FinOps responsibilities, including monitoring infrastructure costs at scale and identifying optimization opportunities
  • Collaborating with senior leadership to define the overall technical direction and strategy for the organization, and ensuring that the SRE team's goals and initiatives are aligned with this strategy
  • Building and maintaining strong relationships with stakeholders across the organization, as you represent the SRE organization in cross-functional meetings
  • You will also stay close to product and design teams to ensure that the user experience is always top of mind
  • You are expected to provide leadership, guidance, and mentorship to your team members, helping them to develop their technical skills and knowledge of best practices in site reliability engineering
  • You will continuously evaluate and improve the performance of the SRE team, and you will identify and implement initiatives to drive operational excellence and improve overall service reliability
  • Establishing and enforcing engineering processes and best practices that ensure high-quality, reliable, and scalable systems, as well as working with other teams to promote the adoption of these processes and practices across the organization
  • You will be responsible for defining and maintaining service level agreements (SLAs) and key performance indicators (KPIs) for your team's services, and you will work with other teams to ensure that these SLAs and KPIs are being met
  • Leading cross-functional efforts to resolve complex technical issues and mitigate operational risks across multiple teams and domains
  • Along with your team, you will help design and implement monitoring, alerting, and metrics systems to ensure the availability, performance, and reliability of your team's services, and you will continuously refine and improve these systems
  • Collaborating with other technical teams to identify opportunities to automate processes, as well as designing and implementing automated tools and systems to support these processes
  • As manager, you will also manage the budget for your team, ensuring that resources are being used efficiently
  • Finally, you will be responsible for documenting your team's projects and processes, and ensuring that this documentation is up to date and accessible to all stakeholders

Benefits

  • Algolia’s flexible workplace model is designed to empower all Algolians to fulfill our mission to power search and discovery with ease
  • We place an emphasis on an individual’s impact, contribution, and output, over their physical location
  • Algolia is a high-trust environment and many of our team members have the autonomy to choose where they want to work and when
  • While we have a global presence with physical offices in Paris, NYC, London, Sydney and Bucharest, we also offer many of our team members the option to work remotely either as fully remote or hybrid-remote employees
  • Please note that positions listed as "Remote" are only available for remote work within the specified country
  • Positions listed within a specific city are only available in that location; depending on the nature of the role, they may be available with either a hybrid-remote or in-office schedule
