Remote Director of Site Reliability Engineering

ThousandEyes

📍 Remote - India, Portugal

Job highlights

Summary

Join Cisco ThousandEyes as Director of Site Reliability Engineering, Network Assurance Data Platform, and play a critical role in shaping and executing our cloud, big data, and ML/AI infrastructure strategy. You will lead teams of talented engineers and collaborate closely with cross-functional teams to design, build, and maintain our infrastructure, cloud platforms, and security practices.

Requirements

  • You have a deep understanding of distributed systems design and cloud technology, including the components, dependencies, and code that define infrastructure
  • You possess a deep understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
  • Extensive hands-on experience building cloud, big data, and/or ML/AI infrastructure (e.g. EMR, Airflow, Comet ML, AWS SageMaker, Spark, etc.)
  • Extensive hands-on experience operating mission-critical services in production environments which are required to have high availability and reliability
  • Proven ability to think strategically and align technical initiatives with business objectives
  • Can provide a strong technical vision for your teams and ensure consistent delivery of objectives
  • Have experience formulating a team's technical strategy and roadmap; you've collaborated and partnered effectively with several other teams to execute shared goals
  • Understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters
  • Proven site reliability engineering management experience leading multiple teams

Responsibilities

  • Lead and inspire a talented team of site reliability engineers, fostering a culture of innovation, collaboration, and excellence in development and operation of infrastructure platforms
  • Drive the strategic vision for the development, implementation, and management of cloud, data, and ML/AI platforms
  • Collaborate closely with cross-functional teams, including development, product management, and security to define and implement reliable, secure, and scalable infrastructure platforms
  • Provide oversight and direction in the development and operation of cloud platforms, ensuring high-quality, scalable, and reliable solutions that meet customer needs
  • Drive operational excellence across operations and security processes
  • Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the site reliability engineering group

Benefits

  • Quality medical, dental and vision insurance
  • A 401(k) plan with a Cisco matching contribution
  • Short- and long-term disability coverage
  • Basic life insurance
  • Numerous wellbeing offerings
  • Up to twelve paid holidays per calendar year, including one floating holiday, plus a day off for your birthday
  • Up to 20 days of Paid Time Off (PTO) each year
  • Paid time away to deal with critical or emergency issues without tapping into your PTO
  • Additional paid time to volunteer and give back to the community
  • Employee Stock Purchase Program

Disclaimer: Please check that the job is real before you apply. Applying might take you to another website that we don't own. Please be aware that any actions taken during the application process are solely your responsibility, and we bear no responsibility for any outcomes.

Please let ThousandEyes know you found this job on JobsCollider. Thanks! 🙏