Summary

Join Runwise, a fast-paced climate-tech startup, as a Senior Site Reliability Engineer (Sr. SRE). You will maintain the stability and performance of our services, ensuring reliability, scalability, and fault tolerance, and collaborate with engineers to build and maintain tools that improve system reliability and efficiency. Responsibilities include designing and maintaining scalable infrastructure, automating workflows, building monitoring systems, collaborating with development teams, participating in on-call rotations, defining SLOs/SLIs, conducting capacity planning, and advocating for engineering best practices. The role requires 5+ years of experience in SRE, DevOps, or infrastructure roles, proven success managing production systems in cloud environments, experience with infrastructure-as-code tools, strong scripting skills, and familiarity with CI/CD practices. Runwise offers a competitive salary, comprehensive benefits, and a hybrid work environment.

Requirements

5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure-focused roles
Proven success managing production systems in cloud environments like AWS, with a strong understanding of scalability and fault tolerance
Experience using infrastructure-as-code tools like AWS CloudFormation and Ansible to manage and automate deployments
Strong scripting or development skills in Python, Go, and Bash for building tools and automating workflows
Hands-on experience with observability and alerting systems like Prometheus, Grafana, or CloudWatch
Deep familiarity with CI/CD practices and tools, especially GitHub Actions, and a track record of improving build and release automation
Comfort participating in on-call rotations and managing incident response, including postmortems and service recovery
Ability to collaborate effectively across remote, distributed teams, with strong asynchronous communication and documentation habits
A proactive mindset with a focus on continuous improvement, resilience, and customer impact
Excitement about working in a fast-paced climate-tech company making a measurable environmental difference

Responsibilities

Design and maintain scalable infrastructure in AWS cloud and distributed on-prem systems
Automate infrastructure provisioning, deployment pipelines, and operational workflows using tools like Terraform, Ansible, or Helm
Build and improve monitoring, alerting, and observability systems (e.g., Cloud Health, Grafana)
Collaborate with development teams to improve service reliability, performance, and scalability
Participate in the on-call rotation and manage incident response, including root cause analysis and postmortems
Define and track service-level objectives (SLOs) and service-level indicators (SLIs)
Conduct capacity planning, chaos testing, and disaster recovery exercises
Advocate for engineering best practices across CI/CD, security, and fault tolerance

Preferred Qualifications

Additional experience with distributed IoC systems is a huge plus

Benefits

Medical, dental, and vision insurance
HSA & FSA options
Paid Parental Leave
Access to Talkspace & Health Advocate
Flexible PTO
Commuter Benefits
401(k)
Company-paid life insurance
Voluntary supplemental life insurance
Free in-office lunch on Wednesdays
Hybrid work environment
Summer Fridays
Monthly L&D Series
Employee Resource Groups (e.g. DEIB Committee, Run Club)

Senior Site Reliability Engineer

Runwise


Remote

DevOps

Senior
