Senior Site Reliability Engineer
Supermetrics
Remote - United States
Summary
Join our Infrastructure team in Canada as a Senior Site Reliability Engineer to ensure our platform is scalable, reliable, and easy to use. In this role, you'll raise the team's bar in Kubernetes expertise, operate the platform, write Terraform configuration, maintain tooling, develop Helm charts, respond to incidents, support pre-sales teams, review architecture changes, troubleshoot technical issues, and participate in on-call rotations.
Requirements
- 4+ years of experience in Site Reliability Engineering, Platform Engineering, or related roles
- Strong understanding of containers and experience operating Kubernetes clusters at scale
- Proficient in database concepts with hands-on experience in both relational and NoSQL databases
- In-depth knowledge of Linux systems and Terraform
- In-depth experience and understanding of AWS and/or GCP
- Solid understanding of modern observability practices and tools
- Automation mindset with the ability to automate repetitive tasks using scripting languages such as Python or Bash
- Team player spirit
- Willing to take part in on-call rotations outside business hours
- Good communication skills, particularly in writing (both documentation and well-written PRs)
- Strong problem-solving skills with a passion for the tools, technologies and problems in this space
Responsibilities
- Raise the team's bar in Kubernetes expertise
- Operate the platform that enables our SaaS products to be used by thousands of businesses from around the world, defining SLAs and SLOs and driving the automation that will ensure we meet them
- Write Terraform configuration and modules that bootstrap Kubernetes clusters, or review PRs from other team members, making sure our modules are truly reusable and well-defined, and improve how we test and release them
- Write (in Golang, for example), maintain, and improve our tooling, making it easier for engineering teams to use the platform
- Develop and maintain Helm charts for internal deployments and third-party software
- Respond to incidents in our production environment
- Support our pre-sales team and help them answer potential customers' questions about our architecture and how we guarantee data security and consistency and ensure uptime
- Review an architecture change involving a new database and take part in meetings discussing the pros and cons of that approach
- Rewrite a GitHub Action to improve how we deploy to Kubernetes using GitOps
- Troubleshoot and resolve technical issues as they arise
- Participate in our on-call rotations to provide support, respond to incidents, or handle internal users' questions
Benefits
- Competitive compensation package, including equity
- Excellent work equipment and home office allowance for those working in our fully remote locations
- Health care benefits and leisure time insurance
- Annual personal learning budget of €1,000
- Sports and wellbeing allowance
This job is filled or no longer available