Site Reliability Engineer at Evertz

Summary

Join our growing team at evertz.io as a highly motivated and passionate Site Reliability Engineer, contributing to services used by major players in the broadcast and media industry. You will work with talented teams to harden our multi-tenant SaaS platform, hosted on AWS, using best-in-class observability tools. Your responsibilities include debugging incidents, implementing platform improvements, automating processes, and building tools that keep the platform reliable. We offer flexible working hours and excellent benefits, along with the opportunity to experiment with new technologies. The role requires at least three years of experience managing production infrastructure and working with AWS services, as well as proficiency in at least one programming language.

Requirements

At least 3 years of hands-on experience managing critical, high-availability production infrastructure, demonstrating success in maintaining reliability and maximizing application uptime
Proficient in at least one programming language (such as Python, Java, or Rust), with experience designing and building production-quality automation, tools, or software libraries
At least 3 years working with monitoring, log aggregation, and observability platforms such as Datadog, CloudWatch, Honeycomb, Splunk, or New Relic, using data-driven insights to proactively identify and resolve issues
Excellent analytical skills with the ability to understand end-to-end use cases, map system flows, debug complex issues, and anticipate potential failure points
Proven track record of translating SLOs and SLIs into actionable improvements; reliability, monitoring, and observability are more than just words to you
At least 3 years of experience with cloud technologies, particularly AWS services and tools such as CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, S3, the AWS CLI, and Boto3
Solid foundation in Linux systems administration, networking, and security
Familiarity with using and configuring CI/CD pipelines with tools such as Jenkins and AWS CodePipeline

Responsibilities

Work with our talented teams to help harden our multi-tenant SaaS platform
Use best-in-class observability tooling to debug incidents, and identify and implement improvements that keep the platform reliable
Eliminate toil by automating processes and building the tools needed to do so

Preferred Qualifications

Experience architecting and deploying serverless applications in cloud environments
Experience with infrastructure-as-code tools like Terraform or CloudFormation, enabling reproducible and scalable environments
Previous participation in production on-call rotations, with direct involvement in incident management and post-incident reviews
Demonstrated expertise in performance optimization for core AWS services, including Lambda, DynamoDB, API Gateway, SQS, EventBridge, and EC2
Experience supporting and improving systems with frequent, high-velocity deployment cycles
Familiarity with security compliance frameworks (e.g., OWASP, ISO, CSA, PCI), and hands-on experience conducting threat assessments and implementing remediation plans
Background in security practices, including penetration testing, threat modeling, and the use of both open-source and commercial security tools
Experience developing and implementing advanced deployment strategies for web application infrastructure, such as canary releases, A/B testing, and blue/green or red/black deployments
Hands-on experience with chaos engineering: intentionally testing systems under extreme conditions to improve reliability and fault tolerance
Track record of championing system reliability, continuous improvement, and operational excellence throughout an organization

Benefits

Flexible working hours
Great benefits


Remote | DevOps | Mid-level
