Site Reliability Engineer at StarCompliance

Summary

Join StarCompliance as a Site Reliability Engineer (SRE) and play a pivotal role in modernizing our platform. You will lead the evolution from legacy systems to modern, scalable microservices, focusing on application-level observability, autoscaling, and progressive delivery. You will collaborate with cross-functional teams to design, build, and implement next-generation SRE practices and tools, championing reliability by design along the way. This foundational role in a company-wide modernization initiative offers the opportunity to make a significant impact on our platform's reliability and scalability as we grow to support thousands of customers and millions of end users.

Requirements

5+ years in SRE, DevOps, or Production Engineering roles, ideally within a SaaS or cloud-native environment
Deep experience with cloud platforms (preferably Azure or AWS) and Infrastructure-as-Code tools (e.g., Terraform)
Proficiency with observability tools such as New Relic, Datadog, Prometheus, or similar
Strong understanding of software deployment strategies, CI/CD pipelines, and release engineering
Ability to code in at least one modern scripting or systems language (e.g., Python, PowerShell, Go, Bash)
Experience operating multi-tenant environments with an emphasis on security, performance, and cost optimization
Excellent communicator who thrives in cross-functional settings and can influence engineering culture around reliability

Responsibilities

Champion Reliability by Design: Collaborate with architects and engineers to build resilient, fault-tolerant systems across our evolving cloud-native stack
Observability Overhaul: Lead the charge on full-stack observability, leveraging modern APM tooling, meaningful SLOs/SLIs, and actionable alerts
Scaling Systems: Develop and implement autoscaling strategies, load testing plans, and capacity forecasting for multi-tenant environments
Progressive Delivery: Help implement and automate deployment strategies such as canary releases, feature flags, and blue/green rollouts
Incident Response: Create and refine on-call processes, incident response playbooks, and blameless post-mortem routines
Monitoring & Tooling: Own and evolve our monitoring infrastructure, integrating metrics, logs, and traces into a cohesive ecosystem
Developer Empowerment: Build reusable templates, dashboards, and platform tooling to empower dev teams to “shift left” on reliability
Cross-functional Collaboration: Work hand-in-hand with Infrastructure, Architecture, Support, and Engineering teams to drive shared accountability for uptime and performance

Preferred Qualifications

Hands-on experience with Azure DevOps is strongly preferred, as our CI/CD and project workflows are fully built around it
Experience in regulated industries (e.g., financial services, healthcare)
Background with service mesh architectures, distributed tracing, and gRPC/GraphQL
Familiarity with incident management platforms (e.g., PagerDuty, OpsGenie)
Contributions to open-source SRE tooling or frameworks

Remote

DevOps

Mid-level
