Senior Site Reliability Engineer

Superhuman

💵 $160k-$185k
📍 Remote - United States, Canada

Summary

Join Superhuman, a company building the productivity platform of the future, as a Senior Site Reliability Engineer (SRE) / DevOps Engineer. This dual role splits your time between SRE responsibilities (60%), ensuring system availability and performance, and DevOps practices (40%), focusing on automation and CI/CD. You will collaborate with software engineers, design scalable systems, monitor service health, and implement disaster recovery plans. The ideal candidate has 6+ years of experience in SRE or DevOps, strong proficiency with cloud platforms, and expertise in containers, monitoring, Infrastructure as Code, and CI/CD tooling. Superhuman offers a competitive salary ($160,000 - $185,000) and comprehensive benefits, including health insurance, 401(k) matching, generous PTO, and professional development opportunities. We are open to candidates in the US, Canada, or Latin America.

Requirements

  • 6+ years of experience in SRE, DevOps, or systems engineering roles
  • Proven experience managing high-availability, mission-critical systems
  • Strong proficiency with cloud platforms (GCP, AWS, or Azure)
  • Hands-on experience with containers and orchestration tools (Docker, Kubernetes)
  • Expertise in monitoring, logging, and alerting tools (e.g., Metabase, Datadog, Prometheus, Grafana)
  • Proficiency in scripting/programming languages (Python, Go, Bash, etc.)
  • Knowledge of database management systems (SQL/NoSQL)
  • Strong knowledge of networking, security, and distributed systems
  • Experience with Infrastructure as Code (Terraform, Ansible, Chef, or Puppet)
  • Familiarity with version control systems (Git) and CI/CD pipelines (Jenkins, GitLab CI, etc.)
  • Strong communication skills and ability to work collaboratively across teams
  • Problem-solving mindset with a focus on root cause analysis
  • Proactive, self-driven, and able to handle high-pressure environments

Responsibilities

  • Collaborate with software engineers to design scalable, fault-tolerant systems and services. Help integrate AI solutions smoothly into existing architectures, ensuring that AI models, frameworks, and tools work efficiently within the broader system without causing disruptions
  • Proactively monitor service health, availability, and performance using tools such as Metabase, Datadog, Prometheus, and Grafana
  • Establish SLAs, SLOs, and SLIs for key services and ensure alignment with business goals
  • Respond to and troubleshoot production issues, ensuring quick resolution and minimal downtime
  • Conduct post-incident reviews to ensure continuous learning and improvement
  • Perform capacity planning and scaling activities to ensure system resilience during traffic spikes or unexpected failures
  • Automate repetitive tasks to enhance efficiency (e.g., provisioning, monitoring, and alerting)
  • Implement self-healing mechanisms to reduce manual intervention
  • Continuously analyze system performance, identify bottlenecks, and work with teams to optimize applications and infrastructure
  • Design and implement disaster recovery plans and high availability strategies
  • Test failover mechanisms and backups regularly
  • Collaborate with our security team to ensure infrastructure adheres to best practices and compliance requirements
  • Implement and manage security monitoring, patching, and auditing for critical services
  • Build, maintain, and enhance CI/CD pipelines using tools like Jenkins, GitLab CI, CircleCI, or similar
  • Ensure smooth and efficient deployment processes, enabling fast and reliable delivery of code changes to production
  • Manage and automate infrastructure provisioning and configuration using tools like Terraform
  • Work on containerization solutions using Docker and orchestration with Kubernetes
  • Work closely with development teams to ensure best practices in deployment and release processes
  • Champion DevOps culture by mentoring and guiding other engineers in the use of tools and best practices

Benefits

  • Medical, dental, and vision insurance: 100% coverage for you and 75% coverage for all your dependents
  • Voluntary insurance: short-term disability, long-term disability, and life insurance
  • 401(k) plan (we match 75 cents per dollar, up to 4% of your salary)
  • Free access to Northstar, a financial wellness platform that provides financial advisors + personal finance tools
  • Enjoy our generous and flexible Paid Time Off (PTO) policy, with our amazing team members taking an average of 20 days per year
  • 13 additional company holidays, plus your own Care Days, Flexible Holidays, and a company-wide Winter Break
  • Generous parental, caregiver, healthcare, and compassionate leave policies
  • $3,000 per year toward your professional development
  • Free access to Calm and Aaptiv
  • Allyship education program to help build your best self
  • Custom MacBook Pro
  • $1,000 budget for workstation setup
  • $260/month for your lunches, groceries, or whatever nutrition you need to stay fueled up!
  • Flexible spending accounts for commuter costs, dependent care, and healthcare expenses
