Director of Site Reliability Engineering at ThousandEyes

Summary

Join Cisco ThousandEyes as Director of Site Reliability Engineering, Network Assurance Data Platform, and play a critical role in shaping and executing our cloud, big data, and ML/AI infrastructure strategy. You will lead teams of talented engineers and collaborate closely with cross-functional teams to design, build, and maintain our infrastructure, cloud platforms, and security practices.

Requirements

You have a deep understanding of distributed systems design and cloud technology, including the components, dependencies, and code that define infrastructure
You possess a deep understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
Extensive hands-on experience building cloud, big data, and/or ML/AI infrastructure (e.g., EMR, Airflow, Comet ML, AWS SageMaker, Spark)
Extensive hands-on experience operating mission-critical services in production environments that require high availability and reliability
Proven ability to think strategically and align technical initiatives with business objectives
You can provide a strong technical vision for your teams and ensure consistent delivery of objectives
You have experience formulating a team's technical strategy and roadmap, and you have partnered effectively with several other teams to execute shared goals
You understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters
Proven site reliability engineering management experience leading multiple teams

Responsibilities

Lead and inspire a talented team of site reliability engineers, fostering a culture of innovation, collaboration, and excellence in the development and operation of infrastructure platforms
Drive the strategic vision for the development, implementation, and management of cloud, data, and ML/AI platforms
Collaborate closely with cross-functional teams, including development, product management, and security, to define and implement reliable, secure, and scalable infrastructure platforms
Provide oversight and direction in the development and operation of cloud platforms, ensuring high-quality, scalable, and reliable solutions that meet customer needs
Drive excellence in operations and security processes
Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the site reliability engineering group

Benefits

Quality medical, dental and vision insurance
A 401(k) plan with a Cisco matching contribution
Short- and long-term disability coverage
Basic life insurance
Numerous wellbeing offerings
Up to twelve paid holidays per calendar year, including one floating holiday, plus a day off for your birthday
Up to 20 days of Paid Time Off (PTO) each year
Paid time away to deal with critical or emergency issues without tapping into your PTO
Additional paid time to volunteer and give back to the community
Employee Stock Purchase Program

Remote

DevOps

Director
