Senior Site Reliability Engineer II


Sumo Logic

📍 Remote - India

Summary

Join Sumo Logic as a Senior Site Reliability Engineer II, focusing on cost optimization. You will own the availability of Sumo's planet-scale products, working with the global SRE and FinOps teams to monitor, alert on, and optimize cost efficiency. Collaborate with cross-functional teams to maximize the value of cloud infrastructure spending. Develop data models for cost analysis and profitability forecasting, and lead cost-saving projects. This role requires strong technical expertise and leadership in a fast-paced environment.

Requirements

  • Understand and apply modern approaches to cloud-native software financial operations (FinOps)
  • Experience monitoring, alerting, and forecasting cloud spend
  • Cloud native application development and operations experience leveraging best practices
  • Strong debugging and troubleshooting skills across the entire technology stack
  • Deep understanding of AWS Networking, Compute, Storage, and managed services
  • Competency with modern CI/CD tooling like Kubernetes, Terraform, Ansible & Jenkins
  • Experience with full life cycle support of services, from creation to production support
  • Versed in Infrastructure as Code practices using technologies like Terraform or CloudFormation
  • Ability to author production-ready code in at least one of the following: Java, Scala, or Go
  • Experience with Linux systems and comfort working on the command line
  • Experience with agile frameworks, such as Scrum and Kanban, and with operating within them to continually deliver value
  • Flexible and willing to step into new roles and responsibilities
  • Willingness to learn and use Sumo Logic products for solving reliability and security issues
  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or another scientific or technical discipline
  • 6-8 years of industry experience

Responsibilities

  • Support the engineering teams within your product area by maintaining and executing a reliability roadmap of opportunities to improve reliability, maintainability, security, efficiency, and velocity, and help realize the opportunities that drive efficiency and reduce costs
  • Lead initiatives to implement cost-saving projects, leveraging data insights and technical expertise
  • Develop and maintain data models and reporting for Sumo Logic cost and profitability, such as unit economics or cost per customer
  • Ensure engineering teams have near-real-time visibility into their cloud spend and can be held accountable for it, by providing them with cost dashboards, alerts and insights
  • Collaborate with cross-functional teams to identify opportunities for cost optimization and efficiency improvements
  • Engage in capacity planning by designing a resource reservation strategy and participating in commitment purchases (CSP, RI, etc.)
  • Drive continuous improvement and operational excellence for our FinOps tools
  • Participate in on-call rotations to understand operations workload so you can continually work to improve the on-call experience and reduce operational workload for running microservices and related components
  • Complete projects to optimize and tune on-call experience for your engineering teams
  • Write code and automation to reduce operational workload, increase efficiency, improve security posture, eliminate toil, and enable Sumo’s developers to deliver features more rapidly
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability, velocity, and cost efficiency
  • Facilitate blame-free root cause analysis meetings for incidents to learn and drive improvement
  • Participate in and continually improve our global IRC (incident response coordination) for all products
  • Drive root cause identification and issue resolution with the teams
  • Work in a fast-paced, iterative environment

Preferred Qualifications

  • Experience using Sumo Logic products or other observability products for reliability and security
  • Experience with planet-scale product development
  • Expert-level proficiency running and operating SaaS products on AWS
  • 1+ years of experience with cloud financial management or FinOps framework
  • Expert-level experience in one or more of: Java, Go, Scala, or Python
  • Expert-level experience in one or more of: Terraform, Jenkins, Kubernetes
  • Extensive experience running and tuning JVM workloads at scale

This job is filled or no longer available.