Manager, Site Reliability Engineering at Algolia

Summary

Join Algolia as a Site Reliability Engineering Manager and lead the Fleet team, ensuring the global reliability of Search Products. You will manage a team of experienced engineers, focusing on optimizing Search Product availability and costs. Responsibilities include collaborating with senior leadership to define technical direction, building relationships with stakeholders, mentoring team members, and establishing engineering processes. You will define and maintain SLAs and KPIs, resolve complex technical issues, and design monitoring systems. The role also involves managing the team's budget and documenting projects and processes. Algolia offers a flexible workplace strategy, allowing for remote or hybrid work options.

Requirements

4+ years of engineering management experience
You are fluent in Agile methodology and can lead a project from the idea to Production
You are an excellent communicator, collaborating with Product managers, Technical Program Managers, and Individual Contributors to your team
You are comfortable managing a large team regrouping all seniority levels, and accompanying Individual Contributors in their growth and development
You know how to deploy an application from laptop to production, are able to fully automate it, and you are comfortable with Production requirements (Observability, Alerting, ...)
You are knowledgeable in DevOps principles and CI/CD pipelines
You are knowledgeable in Configuration Management and Infrastructure as Code such as Chef and Terraform
You are knowledgeable in at least one programming language (Python, Golang, Ruby.) and are familiar with software craftsmanship
Full professional English proficiency
Ability to make decisions and take ownership for them

Responsibilities

Operating and scaling the entire Search fleet , ensuring global performance and reliability
Reducing and maintaining the level of incidents through actionable KPIs and well-defined SLOs, while coaching and delegating Tier 3 support responsibilities
Running and continuously improving our in-house Edge Load Balancer
Building, operating, and enhancing a robust backup and restore system to ensure compliance with our SLAs
FinOps responsibilities , including monitoring infrastructure costs at scale and identifying optimization opportunities
Collaborating with senior leadership to define the overall technical direction and strategy for the organization , and ensure that the SRE team's goals and initiatives are aligned with this strategy
As well as building and maintaining strong relationships with stakeholders across the organization , as you represent the SRE organization in cross-functional meetings
You will also stay close to product and design teams to ensure that the user experience is always top of mind
You are expected to provide leadership, guidance and mentorship to your team members , helping them to develop their technical skills and knowledge of best practices in site reliability engineering
You will continuously evaluate and improve the performance of the SRE team, and you will identify and implement initiatives to drive operational excellence and improve overall service reliability
Establishing and enforcing engineering processes and best practices that ensure high-quality, reliable, and scalable systems , as well as working with other teams to promote the adoption of these processes and practices across the organization
You will be responsible for defining and maintaining service level agreements (SLAs) and key performance indicators (KPIs) for your team's services, and you will work with other teams to ensure that these SLAs and KPIs are being met
As well as leading cross-functional efforts to resolve complex technical issues and mitigate operational risks across multiple teams and domains
Along with your team you will help design and implement monitoring, alerting, and metrics systems to ensure the availability, performance, and reliability of your team's services, and you continuously refine and improve these systems
Collaborating with other technical teams to identify opportunities to automate processes, as well as designing and implementing automated tools and systems to support these processes
As manager, you will also manage the budget for your team , ensuring that resources are being used efficiently
Finally, you will be responsible for documenting your team's projects and processes , and ensuring that this documentation is up-to-date and accessible to all stakeholders

Benefits

Algolia’s flexible workplace model is designed to empower all Algolians to fulfill our mission to power search and discovery with ease
We place an emphasis on an individual’s impact, contribution, and output, over their physical location
Algolia is a high-trust environment and many of our team members have the autonomy to choose where they want to work and when
While we have a global presence with physical offices in Paris, NYC, London, Sydney and Bucharest, we also offer many of our team members the option to work remotely either as fully remote or hybrid-remote employees
Please note that positions listed as "Remote" are only available for remote work within the specified country
Positions listed within a specific city are only available in that location - depending on the nature of the role it may be available with either a hybrid-remote or in-office schedule

Manager, Site Reliability Engineering

Algolia

Summary

Requirements

Responsibilities

Benefits

Remote

DevOps

Manager

Share this job:

Similar Remote Jobs

Remote

DevOps

Manager

Remote

DevOps

Manager

DC SCORES

Remote

DevOps

Manager

Remote

DevOps

Manager

Remote

DevOps

Manager

Remote

DevOps

Manager

Remote

DevOps

Manager

Canonical

Remote

DevOps

Manager

Remote

DevOps

Senior