Summary

Join SentinelOne as a Staff Infrastructure Engineer and become a pivotal technical leader in our Observability team. You will design, implement, and optimize solutions for high-volume data collection, storage, and analysis, ensuring the reliability and efficiency of our global platform. Leverage your expertise in Grafana, Prometheus, Thanos, and OTEL to drive operational efficiency and automation. Architect and implement scalable systems, serve as a subject matter expert, and lead incident resolution. Collaborate with cross-functional teams and take end-to-end ownership of critical features. This role requires 7+ years of experience in IT and profound hands-on experience in architecting and optimizing observability solutions. SentinelOne offers a competitive benefits package including stock and bonuses, flexible time off, insurance and health benefits, and work perks.

Requirements

A distinguished track record of 7+ years in IT or a related technical field, demonstrating sustained growth and impact
Profound, hands-on experience in architecting and optimizing comprehensive observability solutions
Demonstrated mastery in cutting-edge infrastructure design and robust cloud architecture
Extensive, proven experience in ensuring the extreme reliability of high-scale SaaS products
Deep expertise with foundational observability technologies, including Grafana, Prometheus, Thanos/Mimir/Cortex, OTEL, or comparable advanced platforms
A strong command of container orchestration systems like Kubernetes, proficiently utilizing tools such as Helm, Kustomize, and similar
Substantial multi-cloud experience, with deep expertise in at least one major platform (AWS, GCP)
A solid grasp of modern CI/CD principles and advanced deployment automation tools (e.g., GitHub Actions)
Proficiency in various sophisticated deployment strategies such as blue-green, rolling deployments, and canary releases
Exceptional programming proficiency in a mainstream language, with deep expertise in GoLang being highly desirable (or a strong willingness to master GoLang if proficient in another major programming language)
Comprehensive understanding and hands-on experience with Infrastructure as Code (IaC) tools like Terraform and Ansible

Responsibilities

Drive exemplary operational efficiency for critical observability services (Grafana, Prometheus, Thanos, OTEL), meticulously balancing unwavering reliability with shrewd cost-effectiveness. This includes expertly optimizing cloud resource utilization and strategically aligning workloads with optimal machine types across our multi-cloud environment
Champion automation to drastically reduce operational toil and minimize pager burden, freeing up engineering cycles for innovation
Cultivate robust operational visibility by rigorously implementing Infrastructure as Code (IaC), embedding comprehensive observability, and championing industry best practices
Architect and implement robust, scalable systems and platforms that directly empower SentinelOne engineers to deliver features with unparalleled safety, speed, and reliability
Serve as a subject matter expert (SME) and actively administer core observability tools, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OTEL collectors/pipelines
Operate and innovate across diverse, large-scale environments, spanning Kubernetes clusters (EKS, GKE) and core cloud platforms (AWS, GCP)
Lead swift and effective resolution of highly complex technical incidents and issues, ensuring continuous system integrity and peak performance
Elevate team quality by meticulously reviewing technical designs and code, providing insightful, constructive feedback that fosters growth and upholds SentinelOne's high standards
Drive impactful cross-functional collaboration, strategically engaging with diverse teams to define system requirements and ensure our platform robustly meets the evolving needs of all stakeholders
Take end-to-end ownership of critical features, from initial requirements refinement through to flawless production deployment and ongoing operational excellence
Participate in on-call rotations, providing expert-level support to ensure the continuous reliability and readiness of our production systems

Preferred Qualifications

Familiarity with the unique complexities of on-premises and air-gapped Kubernetes deployments

Benefits

Stock & Bonuses: Grant of Restricted Stock Units with a 4-year vesting plan, annual performance-based bonuses, and an employee stock purchase plan
Time Off & Well-being: Flexible Time Off on top of the standard 5 weeks of vacation, flexible paid sick days, fully paid Short Term Sick/Nursing Leave, 16-week parental leave, grandparent leave, and additional company holidays
Insurance & Health: Pension Insurance Contribution, Premium life insurance, Private medical care (for you and +1), and a Global Employee Assistance Program
Work Perks: Monthly meal and well-being allowance, high-end MacBook/Windows laptop, work-from-home support, and in-office refreshments
Growth & Community: LinkedIn Learning, internal mentoring, educational support, generous referral bonuses, and optional company events (sports, BBQs, charity)

Staff Infrastructure Engineer - Developer Platform

SentinelOne

Remote

DevOps

Mid-level
