Remote Senior Site Reliability Engineering Manager

Sumo Logic

πŸ“Remote - India

Job highlights

Summary

Join Sumo Logic's team of innovators as a Senior Engineering Manager, SRE. Lead a global team in maintaining excellent uptime numbers for our services, reducing operational workload for engineers, and promoting blameless post-mortem culture.

Requirements

  • B.S. in Computer Science or a related discipline (M.S. or Ph.D. is a plus)
  • 8+ years of industry experience with a proven track record of ownership, delivery, and operational excellence
  • 3+ years in a management role
  • Experience being responsible for key SLOs of a cloud-based SaaS: availability, uptime, performance, and security
  • Experience in multi-threaded programming and distributed systems
  • Object-oriented programming experience, for example in Java, Scala, Golang
  • Experience with high volumes of data using the latest technologies such as Kafka, Kubernetes and Docker
  • Agile software development experience (test-driven development, iterative and incremental development). Experience in big data and/or 24x7 commercial service is highly desirable
  • Hands-on experience with public cloud Infrastructure-as-a-Service and Platform-as-a-Service offerings such as Amazon Web Services and Google Cloud Platform

Responsibilities

  • Drive the program that maintains excellent uptime numbers for our services
  • Manage error budgets and associated policies for key product SLOs
  • Promote blameless post-mortem culture combined with developer operational accountability
  • Continuously reduce operational workload for engineers by means of infrastructure improvements and automation
  • Carry out projects that actively reduce our AWS spend
  • Manage AWS resource reservations for our whole infrastructure
  • Observe our current spend on cloud resources and improve our cost monitoring ecosystem
  • Help product teams develop secure applications for the Sumo Logic platform
  • Integrate and implement solutions improving Sumo Logic’s security posture
  • Lead security reviews and penetration tests at design and implementation stages
  • Partner with the Security Operations Center (SOC) and Compliance team on our security and compliance posture, vulnerability management, and threat modeling of our tech stack
  • Educate product teams on secure development best practices and Quality Engineering teams on continuous improvement of security testing
  • Lead and grow a global team of SREs adept at building extremely high-volume, fault-tolerant, efficient, and scalable backend systems
  • Partner with our technical leadership team to review technical choices on an ongoing basis, anticipating increased scale and evolving technology to meet the demands of a growing business. Leverage technical skills to analyze and improve the efficiency, scalability, and reliability of our backend systems
