Staff Infrastructure Engineer - Developer Platform

SentinelOne
Summary
Join SentinelOne as a Staff Infrastructure Engineer and become a pivotal technical leader in our Observability team. You will design, implement, and optimize solutions for high-volume data collection, storage, and analysis, ensuring the reliability and performance of our global platform. Leverage your expertise in Grafana, Prometheus, Thanos, and OTEL to drive operational efficiency and champion automation. You will collaborate with diverse teams, lead incident resolution, and mentor others. This role requires a distinguished track record in IT and profound hands-on experience in architecting observability solutions. SentinelOne offers flexible working hours, remote work options, and a competitive benefits package.
Requirements
- A distinguished track record of 7+ years in IT or a related technical field, demonstrating sustained growth and impact
- Profound, hands-on experience in architecting and optimizing comprehensive observability solutions
- Demonstrated mastery in cutting-edge infrastructure design and robust cloud architecture
- Extensive, proven experience in ensuring the extreme reliability of high-scale SaaS products
- Deep expertise with foundational observability technologies, including Grafana, Prometheus, Thanos/Mimir/Cortex, OTEL, or comparable advanced platforms
- A strong command of container orchestration systems like Kubernetes, including proficient use of tools such as Helm and Kustomize
- Substantial multi-cloud experience, with deep expertise in at least one major platform (AWS, GCP)
- A solid grasp of modern CI/CD principles and advanced deployment automation tools (e.g., GitHub Actions)
- Proficiency in various sophisticated deployment strategies such as blue-green, rolling deployments, and canary releases
- Exceptional programming proficiency in a mainstream language, with deep expertise in GoLang being highly desirable (or a strong willingness to master GoLang if proficient in another major programming language)
- Comprehensive understanding and hands-on experience with Infrastructure as Code (IaC) tools like Terraform and Ansible
Responsibilities
- Drive exemplary operational efficiency for critical observability services (Grafana, Prometheus, Thanos, OTEL), meticulously balancing unwavering reliability with shrewd cost-effectiveness. This includes expertly optimizing cloud resource utilization and strategically aligning workloads with optimal machine types across our multi-cloud environment
- Champion automation to drastically reduce operational toil and minimize pager burden, freeing up engineering cycles for innovation
- Cultivate robust operational visibility by rigorously implementing Infrastructure as Code (IaC), embedding comprehensive observability, and championing industry best practices
- Architect and implement robust, scalable systems and platforms that directly empower SentinelOne engineers to deliver features with unparalleled safety, speed, and reliability
- Serve as a subject matter expert (SME) and actively administer core observability tools, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OTEL collectors/pipelines
- Operate and innovate across diverse, large-scale environments, spanning Kubernetes clusters (EKS, GKE) and core cloud platforms (AWS, GCP)
- Lead swift and effective resolution of highly complex technical incidents and issues, ensuring continuous system integrity and peak performance
- Elevate team quality by meticulously reviewing technical designs and code, providing insightful, constructive feedback that fosters growth and upholds SentinelOne's high standards
- Drive impactful cross-functional collaboration, strategically engaging with diverse teams to define system requirements and ensure our platform robustly meets the evolving needs of all stakeholders
- Take end-to-end ownership of critical features, from initial requirements refinement through to flawless production deployment and ongoing operational excellence
- Participate in on-call rotations, providing expert-level support to ensure the continuous reliability and readiness of our production systems
Preferred Qualifications
- Familiarity with the unique complexities of on-premises and air-gapped Kubernetes deployments
Benefits
- Flexible working hours and the option to work remotely from anywhere in Slovakia
- Access to major co-working spaces for those who prefer an office environment
- In Czechia, you can also work from our modern offices in Prague or Brno
- Salary starting from 5000 EUR/month
- Annual bonus based on company performance, paid in two instalments
- The final base salary may be adjusted based on the individual skills and experience of the selected candidate
- Stock & Bonuses: Grant of Restricted Stock Units with a 4-year vesting plan, annual performance-based bonuses, and an employee stock purchase plan
- Time Off & Well-being: Flexible Time Off, on top of the standard 5 weeks vacation, flexible paid sick days, fully paid Short Term Sick/Nursing Leave, 16-week parental leave, grandparent leave, and additional company holidays
- Insurance & Wellbeing: Pension Insurance Contribution, Premium life insurance, and a Global Employee Assistance Program
- Work Perks: Monthly meal and well-being allowance, high-end MacBook/Windows laptop, work-from-home support, and in-office refreshments
- Growth & Community: LinkedIn Learning, internal mentoring, educational support, generous referral bonuses, and optional company events (sports, BBQs, charity)