Senior Site Reliability Engineer

Movable Ink

📍 Remote - Canada

Summary

Join Movable Ink as a Senior Site Reliability Engineer and be 100% hands-on with infrastructure and software development. You'll work on a multi-region, active-active content serving platform handling billions of daily requests. Responsibilities include improving infrastructure tooling and automation, building and maintaining core applications and observability platforms, monitoring systems, and collaborating with other teams. This role requires significant experience in SRE or software engineering, building scalable services, and managing large-scale observability platforms. Experience with AWS, Kubernetes, Terraform, Chef, and various programming languages is essential. The company encourages applications even if you don't meet every qualification, emphasizing diversity and inclusion.

Requirements

  • Experience in Site Reliability or Software Engineering, building and maintaining scalable, resilient services
  • Experience building the tooling and automation to manage those services, and investigating system and application metrics to diagnose and resolve performance issues
  • 4+ years of experience as an SRE or Software Engineer, with a focus on cloud platforms. We use AWS
  • Experience building and operating large-scale observability platforms. We use Prometheus, Thanos, Loki and Tempo
  • Experience and willingness to operate in an on-call environment, evaluating and improving monitoring and alerting systems, and developing runbooks to investigate and debug issues. Every member of the SRE team does a week-long on-call rotation every 5 to 6 weeks
  • Strong experience with infrastructure-as-code tools. We use Terraform and Chef
  • Strong experience with operating Kubernetes and running workloads on it. We use EKS
  • Familiarity with one or more high-level programming languages and a willingness to learn. We use NodeJS, Golang, Ruby, Python, Bash and Shell scripting
  • Linux experience (Ubuntu/Debian)

Responsibilities

  • Improve the tooling and automation of our infrastructure to minimize manual work, increase performance, and decrease the frequency and severity of incidents
  • Build, maintain, and support core applications
  • Build and operate our core internal observability platform
  • Monitor our systems for capacity and performance, and troubleshoot issues as they arise
  • Partner with the rest of the SRE team and our service engineering teams to ensure smooth, continued delivery of our service to clients
