Staff Infrastructure Engineer - Developer Platform

SentinelOne

📍 Remote - Czech Republic

Summary

Join SentinelOne as a Staff Infrastructure Engineer and become a pivotal technical leader in our Observability team. You will design, implement, and optimize solutions for high-volume data collection, storage, and analysis, ensuring the reliability and efficiency of our global platform. Leverage your expertise in Grafana, Prometheus, Thanos, and OTEL to drive operational efficiency and automation. Architect and implement scalable systems, serve as a subject matter expert, and lead incident resolution. Collaborate with cross-functional teams and take end-to-end ownership of critical features. This role requires 7+ years of experience in IT and profound hands-on experience in architecting and optimizing observability solutions. SentinelOne offers a competitive benefits package including stock and bonuses, flexible time off, insurance and health benefits, and work perks.

Requirements

  • A distinguished track record of 7+ years in IT or a related technical field, demonstrating sustained growth and impact
  • Profound, hands-on experience in architecting and optimizing comprehensive observability solutions
  • Demonstrated mastery in cutting-edge infrastructure design and robust cloud architecture
  • Extensive, proven experience in ensuring the extreme reliability of high-scale SaaS products
  • Deep expertise with foundational observability technologies, including Grafana, Prometheus, Thanos/Mimir/Cortex, OTEL, or comparable advanced platforms
  • A strong command of container orchestration systems like Kubernetes, proficiently utilizing tools such as Helm, Kustomize, and similar
  • Substantial multi-cloud experience, possessing deep expertise in at least one major platform (AWS, GCP)
  • A solid grasp of modern CI/CD principles and advanced deployment automation tools (e.g., GitHub Actions)
  • Proficiency in various sophisticated deployment strategies such as blue-green, rolling deployments, and canary releases
  • Exceptional programming proficiency in a mainstream language, with deep expertise in GoLang being highly desirable (or a strong willingness to master GoLang if proficient in another major programming language)
  • Comprehensive understanding and hands-on experience with Infrastructure as Code (IaC) tools like Terraform and Ansible

Responsibilities

  • Drive exemplary operational efficiency for critical observability services (Grafana, Prometheus, Thanos, OTEL), meticulously balancing unwavering reliability with shrewd cost-effectiveness. This includes expertly optimizing cloud resource utilization and strategically aligning workloads with optimal machine types across our multi-cloud environment
  • Champion automation to drastically reduce operational toil and minimize pager burden, freeing up engineering cycles for innovation
  • Cultivate robust operational visibility by rigorously implementing Infrastructure as Code (IaC), embedding comprehensive observability, and championing industry best practices
  • Architect and implement robust, scalable systems and platforms that directly empower SentinelOne engineers to deliver features with unparalleled safety, speed, and reliability
  • Serve as a subject matter expert (SME) and actively administer core observability tools, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OTEL collectors/pipelines
  • Operate and innovate across diverse, large-scale environments, spanning Kubernetes clusters (EKS, GKE) and core cloud platforms (AWS, GCP)
  • Lead swift and effective resolution of highly complex technical incidents and issues, ensuring continuous system integrity and peak performance
  • Elevate team quality by meticulously reviewing technical designs and code, providing insightful, constructive feedback that fosters growth and upholds SentinelOne's high standards
  • Drive impactful cross-functional collaboration, strategically engaging with diverse teams to define system requirements and ensure our platform robustly meets the evolving needs of all stakeholders
  • Take end-to-end ownership of critical features, from initial requirements refinement through to flawless production deployment and ongoing operational excellence
  • Participate in on-call rotations, providing expert-level support to ensure the continuous reliability and readiness of our production systems

Preferred Qualifications

  • Familiarity with the unique complexities of on-premises and air-gapped Kubernetes deployments

Benefits

  • Stock & Bonuses: Grant of Restricted Stock Units with a 4-year vesting plan, annual performance-based bonuses, and an employee stock purchase plan
  • Time Off & Well-being: Flexible Time Off on top of the standard 5 weeks of vacation, flexible paid sick days, fully paid Short Term Sick/Nursing Leave, 16-week parental leave, grandparent leave, and additional company holidays
  • Insurance & Health: Pension Insurance Contribution, Premium life insurance, Private medical care (for you and +1), and a Global Employee Assistance Program
  • Work Perks: Monthly meal and well-being allowance, high-end MacBook/Windows laptop, work-from-home support, and in-office refreshments
  • Growth & Community: LinkedIn Learning, internal mentoring, educational support, generous referral bonuses, and optional company events (sports, BBQs, charity)
