Summary

Join Superhuman, a company building the productivity platform of the future, as a Senior Site Reliability Engineer (SRE) / DevOps Engineer. This dual role combines SRE responsibilities (60%), which ensure system availability and performance, with DevOps practices (40%), which focus on automation and CI/CD. You will collaborate with software engineers, design scalable systems, monitor service health, and implement disaster recovery plans. The ideal candidate has 6+ years of experience in SRE or DevOps, strong cloud platform proficiency, and expertise in containers, monitoring, Infrastructure as Code, and CI/CD tooling. Superhuman offers a competitive salary ($160,000 - $185,000) and comprehensive benefits, including health insurance, 401(k) matching, generous PTO, and professional development opportunities. We are open to candidates in the US, Canada, or Latin America.

Requirements

6+ years of experience in SRE, DevOps, or systems engineering roles
Proven experience managing high-availability, mission-critical systems
Strong proficiency with cloud platforms (GCP, AWS, or Azure)
Hands-on experience with containers and orchestration tools (Docker, Kubernetes)
Expertise in monitoring, logging, and alerting tools (e.g., Metabase, Datadog, Prometheus, Grafana)
Proficiency in scripting/programming languages (Python, Go, Bash, etc.)
Knowledge of database management systems (SQL/NoSQL)
Strong knowledge of networking, security, and distributed systems
Experience with Infrastructure as Code (Terraform, Ansible, Chef, or Puppet)
Familiarity with version control systems (Git) and CI/CD pipelines (Jenkins, GitLab CI, etc.)
Strong communication skills and ability to work collaboratively across teams
Problem-solving mindset with a focus on root cause analysis
Proactive, self-driven, and able to handle high-pressure environments

Responsibilities

Collaborate with software engineers to design scalable, fault-tolerant systems and services. Help integrate AI solutions into existing architectures so that AI models, frameworks, and tools work efficiently within the broader system without causing disruptions
Proactively monitor service health, availability, and performance using monitoring tools such as Metabase, Datadog, Prometheus, and Grafana
Establish SLAs, SLOs, and SLIs for key services and ensure alignment with business goals
Respond to and troubleshoot production issues, ensuring quick resolution and minimal downtime
Conduct post-incident reviews to ensure continuous learning and improvement
Perform capacity planning and scaling activities to ensure system resilience during traffic spikes or unexpected failures
Automate repetitive tasks to enhance efficiency (e.g., provisioning, monitoring, and alerting)
Implement self-healing mechanisms to reduce manual intervention
Continuously analyze system performance, identify bottlenecks, and work with teams to optimize applications and infrastructure
Design and implement disaster recovery plans and high availability strategies
Test failover mechanisms and backups regularly
Collaborate with our security team to ensure infrastructure adheres to best practices and compliance requirements
Implement and manage security monitoring, patching, and auditing for critical services
Build, maintain, and enhance CI/CD pipelines using tools like Jenkins, GitLab CI, CircleCI, or similar
Ensure smooth and efficient deployment processes, enabling fast and reliable delivery of code changes to production
Manage and automate infrastructure provisioning and configuration using tools like Terraform
Work on containerization solutions using Docker and orchestration with Kubernetes
Work closely with development teams to ensure best practices in deployment and release processes
Champion DevOps culture by mentoring and guiding other engineers in the use of tools and best practices

Benefits

Medical, dental, and vision insurance: 100% coverage for you and 75% coverage for all your dependents
Voluntary insurance: short-term disability, long-term disability, and life insurance
401(k) plan (we match 75 cents per dollar, up to 4% of your salary)
Free access to Northstar, a financial wellness platform that provides financial advisors and personal finance tools
Generous and flexible Paid Time Off (PTO) policy, with team members taking an average of 20 days per year
13 additional company holidays, plus your own Care Days, Flexible Holidays, and a company-wide Winter Break
Generous parental, caregiver, healthcare, and compassionate leave policies
$3000 per year towards your professional development
Free access to Calm and Aaptiv
Allyship education program to help build your best self
Custom MacBook Pro
$1000 budget for workstation setup
$260/month for your lunches, groceries, or whatever nutrition you need to stay fueled up!
Flexible spending accounts for commuter costs, dependent care, and healthcare expenses

Senior Site Reliability Engineer

Superhuman

Remote

DevOps

Senior
