Principal Engineer, Site Reliability Engineering, Observability
SentinelOne
$204k-$281k
Remote - United States
Please let SentinelOne know you found this job on JobsCollider. Thanks!
Job highlights
Summary
Join our Site Reliability Engineering (SRE) Team at SentinelOne as an experienced Principal Engineer to architect and lead the implementation of advanced observability, automated triage, and self-healing capabilities within our microservices-based SaaS environment.
Requirements
- Extensive SRE Experience: Proven experience in architecting and implementing SRE solutions at scale within a microservices or distributed systems environment
- 15+ years of progressive professional experience, with 5+ years of recent experience supporting enterprise SaaS environments (or equivalent combination of education, experience, and certifications)
- Technical Expertise: Deep knowledge of incident management, alert correlation, automated triage, self-healing strategies, and SLO frameworks. Strong understanding of observability platforms, including monitoring, logging, and tracing solutions
- Programming & Scripting: Proficient in one or more programming languages (e.g., Python, Go, Java) with experience in automation and scripting for incident management workflows
- Machine Learning & Data Analysis: Experience with machine learning, anomaly detection, or data analytics techniques for real-time alert correlation and triage systems
- Cloud Infrastructure: Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes), with experience in infrastructure-as-code (e.g., Terraform)
- Problem-Solving & Decision-Making: Ability to make critical architectural decisions with a focus on business impact, reliability, and system performance
Responsibilities
- Design and guide the implementation of end-to-end alert correlation, auto-triage, and auto-remediation frameworks that meet the needs of a microservices-based SaaS architecture
- Ensure solutions align with business priorities and customer impact goals
- Define, implement, and monitor Service Level Objectives (SLOs) in collaboration with product and engineering teams
- Establish reliability standards that meet business and customer expectations, driving accountability and transparency around service performance
- Partner with software engineers, SREs, and data scientists to implement and refine monitoring, alerting, alert correlation, auto-remediation, and SLO solutions
- Lead initiatives to promote best practices and knowledge sharing across all of SentinelOne engineering
- Mentor engineers and contribute to a culture of reliability engineering excellence through thought leadership and guidance on advanced SRE principles and practices
Benefits
- Medical, Vision, Dental
- 401(k)
- Commuter
- Health and Dependent FSA
- Unlimited PTO
- Industry-leading gender-neutral parental leave
- Paid Company Holidays
- Paid Sick Time
- Employee stock purchase program
- Disability and life insurance
- Employee assistance program
- Gym membership reimbursement
- Cell phone reimbursement