Staff AI Infrastructure Engineer at SentinelOne

Summary

Join SentinelOne as a Staff AI Infrastructure Engineer and build, automate, and manage AI infrastructure at scale. You will design and maintain systems for deploying AI models across cloud environments, automate infrastructure using Helm, ArgoCD, and Terraform, manage Kubernetes clusters, and implement CI/CD pipelines. You will also collaborate with engineering and product teams, monitor infrastructure health, and drive best practices. This role requires a degree in a related field or equivalent experience, 7+ years of relevant experience, and proficiency with infrastructure-as-code tools and Kubernetes. Exceptional candidates will have SRE experience and expertise in monitoring and logging tools. SentinelOne offers a competitive salary, benefits, and a collaborative work environment.

Requirements

A degree in Computer Science, Information Technology, or related field, or equivalent practical experience
7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications
Deep proficiency with infrastructure-as-code tools like Helm, Terraform, and ArgoCD
Extensive hands-on experience with Kubernetes for deploying containerized workloads
Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI)
Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins)
Familiarity with compliance frameworks, particularly FedRAMP, and security best practices
Strong scripting and automation skills using Python, Bash, or similar languages
Excellent problem-solving skills, creativity, and self-driven motivation

Responsibilities

Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably
Automate infrastructure deployment and management using Helm, ArgoCD, and Terraform
Manage and optimize Kubernetes clusters to support high-performance AI workloads
Implement and manage CI/CD pipelines utilizing GitHub Actions and Jenkins
Ensure infrastructure compliance with security standards including FedRAMP and related guidelines
Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements
Monitor infrastructure health and performance, implementing optimizations proactively
Drive infrastructure best practices and mentor team members to foster technical excellence

Preferred Qualifications

Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts
Expertise in monitoring and logging tools (Prometheus, Grafana, Datadog, Jaeger)
Knowledge of networking concepts and security best practices within cloud infrastructure
Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP)

Benefits

Medical, Vision, Dental
401(k)
Commuter benefits
Health and Dependent FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid Company Holidays
Paid Sick Time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events

Remote

DevOps

Mid-level
