Senior Site Reliability Engineer at Roadie

Summary

Join Roadie, a UPS company, as a Senior Site Reliability Engineer and contribute to the optimization and reliability of our platform. You will build systems, maintain Kubernetes clusters, deploy monitoring solutions, and collaborate with cross-functional teams. This role requires extensive experience in SRE, DevOps, Kubernetes, and AWS. We offer competitive compensation, comprehensive health insurance, 401k matching, tuition assistance, a flexible work schedule with unlimited PTO, and more. The ideal candidate is a skilled problem-solver with a strong understanding of site reliability practices and a willingness to learn.

Requirements

5+ years in various SRE roles
5+ years in various DevOps/System Engineering roles
5+ years of experience building and managing production Kubernetes infrastructure
6+ years of experience with popular scripting languages (Python, Ruby, Bash, etc.)
Experience with infrastructure-as-code tools such as Terraform or Crossplane
Experience with CI/CD tools (CircleCI, etc.)
Experience with GitOps tools (ArgoCD)
Experience using a broad range of AWS technologies (RDS, Elasticsearch, VPC, EKS, S3, CloudFront, MSK, ElastiCache, CloudWatch, etc.)
Experience developing and maintaining YAML templating systems (Helm charts, Kustomize, etc.)
Must be able to work independently, be self-motivated, and handle multiple priorities
Comfortable working in a fast-paced agile environment
Finally, a willingness to admit what you don’t know and to quickly learn what you need to

Responsibilities

Build systems that optimize the uptime and reliability of our platform, and support the management and optimization of our software delivery pipeline, observability, and infrastructure operations
Maintain, support, and engineer production and non-production Kubernetes clusters (EKS) as well as Elasticsearch, MSK, RDS, and ElastiCache (Redis) clusters
Deploy and maintain monitoring and logging solutions based on Prometheus, Loki, Thanos, Grafana, OpenTelemetry and New Relic
Collaborate with cross-functional teams to identify and address potential bottlenecks, optimize resource utilization, and proactively prevent system failures
Define and manage SLOs, SLIs, and error budgets
Develop processes, tools and automation to reduce toil across engineering teams
Plan and forecast service capacity and demand, assess cost optimization, and tune systems and software
Debug production and non-production issues
Take part in a 24/7 on-call rotation

Benefits

Competitive compensation packages
100% covered health insurance premiums for yourself
401k with company match
Tuition and student loan repayment assistance (that’s right - Roadie will contribute directly to your existing student loans!)
Flexible work schedule with unlimited PTO
Monthly 3-day weekends
Monthly WFH stipend
Paid sabbatical leave: tenured team members are given time to rest, relax, and explore
The technology you need to get the job done

Remote

DevOps

Senior
