Summary

Join Turvo's SRE team as a hands-on Senior Site Reliability Engineer and make a significant impact. Collaborate with a global team to ensure customer satisfaction and exceed expectations. This remote role, based in Dallas, TX, requires experience in SRE, Production Support, and Elastic Kubernetes Service (EKS). You will manage application availability, performance, and capacity planning, establish standardized practices, and drive SLI/SLO measurement. The ideal candidate will have a strong technical background in cloud infrastructure, distributed systems, and Kubernetes. Turvo offers competitive salaries, bonuses, and a comprehensive benefits package.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or a similar discipline
10+ years of experience in SRE, DevOps, and/or Information Technology
Must have previous role(s) in SRE/production support in a large-scale environment
Strong technical knowledge of cloud infrastructure, distributed systems, and reliability practices
Strong hands-on experience with Kubernetes (EKS) in production environments
Proficiency with AWS infrastructure and services (EC2, S3, RDS, IAM)
Hands-on experience with tools such as ELK (Elasticsearch, Logstash, Kibana), Grafana, CloudWatch, Jenkins, and Jira
Proficiency in one scripting or programming language (Java, Python)
Significant experience with relational databases (MySQL) and NoSQL databases (preferably MongoDB)
Solid experience with Docker and Infrastructure-as-Code tools like Terraform or CloudFormation
Strong troubleshooting and problem-solving skills, with the ability to make swift, informed judgment calls
Strong written and verbal communication skills with demonstrated ability to communicate effectively with all levels of an organization
Must be eager to continuously improve customer experience by collaborating with engineering leads, product, and customer success teams
Passionate and collaborative team player with a strong work ethic and focus on achieving shared goals
Security background and understanding of SaaS platform security

Responsibilities

Manage the complete application availability, performance, efficiency, and capacity planning lifecycle while ensuring round-the-clock monitoring for a highly scalable and reliable platform
Establish standardized practices for monitoring, incident response, blameless postmortems, releases, and other maintenance activities
Create, prioritize, communicate, and execute a roadmap for the site reliability function to align with organizational goals
Drive and manage the measurement of SLI/SLO, ensuring the team meets established goals for availability and SLA
Manage and resolve cross-team performance issues, from identifying the root cause to determining and implementing improvements
Collaborate with engineering leads to influence and prioritize resiliency and reliability efforts through code, monitoring feedback, and process enhancements

Benefits

Great health, dental, and vision benefits
Competitive salaries and bonuses
401k with employer match
Learning & development opportunities
Paid parental leave
Focus on work-life balance
Monthly wellness day

Senior Site Reliability Engineer

Turvo
