Staff Site Reliability Engineer

SentinelOne

💵 $148k-$204k
📍 Remote - United States

Job highlights

Summary

Join SentinelOne, a leading cybersecurity company, as a Staff Site Reliability Engineer (SRE). You will play a crucial role in ensuring the stability, reliability, and scalability of our products and services. This position involves managing Kubernetes, creating Infrastructure as Code (IaC), leading troubleshooting during incidents, and collaborating with engineering teams to improve system reliability. You will design and implement comprehensive monitoring and alerting, analyze systems to identify areas for improvement, and develop automation strategies. The ideal candidate has extensive SRE experience, particularly in large-scale SaaS or cloud environments, and a strong understanding of distributed systems. U.S. citizenship is required due to Federal Government contract requirements.

Requirements

  • 7+ years of experience in Site Reliability Engineering, preferably with a large-scale SaaS product or large cloud-based distributed system
  • 5+ years of production experience with orchestration systems like Kubernetes, Nomad, or Mesos
  • Experience with a programming or scripting language, such as Python, Golang, Java, or Ruby
  • Familiarity with running Java and JavaScript applications, including build and deploy
  • AWS experience, and familiarity with other platforms like GCP
  • Experience using Infrastructure as Code (IaC) to set up cloud-native services
  • Familiarity with CI and practical delivery using tools such as Jenkins, GitHub Actions (GHA), or ArgoCD; familiarity with deployment strategies such as blue-green, rolling, and canary deploys, and with best practices around deployment automation
  • Curiosity, the ability to learn quickly, and strong communication skills
  • U.S. Citizenship

Responsibilities

  • Support the stability, reliability, and scalability of SentinelOne’s distributed systems through various tasks performed by the Site Reliability Engineering organization including managing Kubernetes, creating IaC, and leading troubleshooting during incident response
  • Identify areas for improvement, such as performance issues and availability concerns, and perform technical and architectural reviews in partnership with fellow engineering teams to improve the overall reliability of SentinelOne systems
  • Design and implement comprehensive monitoring and alerting, and apply concepts such as SLIs/SLOs and critical user journeys, to provide deeper insight into the performance and availability of SentinelOne’s systems
  • Analyze systems, identify toil, and develop and implement strategies such as automation to streamline and optimize SRE’s support of critical systems

Preferred Qualifications

  • 2+ years of experience in a FedRAMP environment
  • Ability to work in a diverse and distributed team
  • Self-starter attitude, with passion for new technologies and empathy for legacy systems
  • Ability to learn quickly and navigate unfamiliar programming languages, systems, and processes

Benefits

  • Medical, Vision, Dental
  • 401(k)
  • Commuter
  • Health and Dependent FSA
  • Unlimited PTO
  • Industry-leading gender-neutral parental leave
  • Paid company holidays
  • Paid sick time
  • Employee stock purchase program
  • Disability and life insurance
  • Employee assistance program
  • Gym membership reimbursement
  • Cell phone reimbursement
  • Numerous company-sponsored events including regular happy hours and team-building events
