Lead Site Reliability Engineer

Natera
Summary
Join Natera as a Lead Site Reliability Engineer and lead a team of SREs in ensuring the stability, scalability, and performance of our production bioinformatics applications and infrastructure. You will be responsible for team management, implementing monitoring and incident response processes, managing the release process, providing production support, and collaborating with various teams. The role requires a Bachelor's degree, 5+ years of experience in SRE, and strong technical skills in cloud platforms, container orchestration, and IaC tools. Experience with bioinformatics applications is highly desirable. Natera offers competitive benefits including comprehensive medical, dental, vision, life and disability plans, free testing for employees and their families, fertility care benefits, pregnancy and baby bonding leave, 401k benefits, and more.
Requirements
- Bachelor's degree in Computer Science, Bioinformatics, Engineering, or a related field; advanced degree preferred
- Minimum of 5 years of experience in site reliability engineering, systems engineering, or a related role, with at least 2 years in a leadership or managerial capacity
- Strong understanding of SRE principles and best practices, including monitoring, incident management, release management, and performance optimization
- Proficiency with cloud platforms (e.g., AWS, Azure, GCP) and containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with infrastructure as code (IaC) tools (e.g., Terraform, Ansible) and CI/CD pipelines
- Proven ability to lead and manage technical teams, with strong skills in mentoring, coaching, and performance management
- Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment
- Strong interpersonal and communication skills, with the ability to collaborate effectively with both technical and non-technical stakeholders
Responsibilities
- Lead and mentor a team of SREs, fostering a culture of collaboration, innovation, and continuous improvement
- Define clear goals and performance metrics for the team, and oversee the execution of their responsibilities
- Conduct regular one-on-ones, provide constructive feedback, and facilitate professional development opportunities for team members
- Implement and manage monitoring, alerting, and incident response processes to ensure the reliability and uptime of bioinformatics systems
- Drive the resolution of operational issues, perform root cause analysis, and implement preventive measures to mitigate recurrence
- Manage the end-to-end release process for bioinformatics applications, including planning, coordination, and deployment
- Collaborate with development teams to ensure timely and successful releases, minimizing disruptions and ensuring alignment with release schedules
- Develop and enforce best practices for release management, including version control, release notes, and rollback procedures
- Provide ongoing support for production systems, including handling incidents, performing routine maintenance, and addressing user-reported issues
- Implement and manage procedures for system health checks, backups, and disaster recovery
- Ensure that production environments are monitored and that any issues are promptly identified and resolved
- Work closely with bioinformatics scientists, data engineers, and software developers to understand their needs and optimize system performance
- Collaborate with other engineering and IT teams to integrate bioinformatics applications with broader enterprise operational tracking systems and tools
- Participate in cross-functional projects to enhance overall system architecture and deployment strategies
- Develop and enforce best practices for deployment, configuration management, and system maintenance
- Lead efforts in capacity planning, performance tuning, and infrastructure scaling to accommodate evolving research demands
- Maintain documentation and standard operating procedures for all SRE-related activities
- Stay abreast of emerging technologies and trends in site reliability engineering and bioinformatics
- Evaluate and recommend new tools, technologies, and processes to enhance system reliability and operational efficiency
Preferred Qualifications
- Experience working with bioinformatics applications or in a production environment related to research or clinical data analysis is highly desirable
- Familiarity with bioinformatics tools and data workflows is a plus
- A proactive and innovative mindset, with a passion for improving system reliability and efficiency
- Strong analytical skills with the ability to perform detailed root cause analysis and drive resolution
- Commitment to fostering a culture of continuous improvement and learning within the team
Benefits
- Comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents
- Free testing for Natera employees and their immediate families, in addition to fertility care benefits
- Pregnancy and baby bonding leave
- 401k benefits
- Commuter benefits
- Generous employee referral program