Principal Site Reliability Engineer

SecurityScorecard

💵 $220k-$290k
📍 Remote - Worldwide

Summary

Join SecurityScorecard as a Principal Site Reliability Engineer and take a strategic leadership role in improving the reliability, scalability, and speed of our engineering platform. You will lead the design and evolution of our Kubernetes-based infrastructure and CI/CD systems to support large-scale, high-availability services, and collaborate with engineering leaders to define and implement platform-wide initiatives that enable rapid, secure, and repeatable deployments. You will also architect and optimize CI/CD pipelines, establish best practices for GitOps and progressive delivery, drive the adoption of automated testing, and guide teams in designing for observability and alerting. Along the way, you will partner with security and development teams to ensure infrastructure meets security standards, mentor and influence senior engineers to improve platform reliability, and foster a culture of reliability and operational excellence.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles, with 2+ years in a technical leadership or principal capacity
  • Deep expertise with Kubernetes internals (controllers, networking, autoscaling, operators, etc.) and production-grade clusters on cloud providers (EKS, GKE, or AKS)
  • Proven experience designing and scaling CI/CD systems using tools such as GitHub Actions, Argo CD, Tekton, Spinnaker, or similar
  • Strong proficiency in Terraform and modern IaC practices
  • Advanced knowledge of automated testing strategies, including performance, load, and failure testing
  • Proficient in one or more programming/scripting languages (Python, Go, Bash, etc.)
  • Deep experience with monitoring and observability stacks such as Prometheus, Grafana, OpenTelemetry, and Datadog
  • Strong communicator with the ability to align technical initiatives to business objectives and influence across engineering teams

Responsibilities

  • Lead the design and evolution of Kubernetes-based infrastructure to support multi-tenant, high-scale applications with strong isolation, resilience, and security
  • Architect and optimize CI/CD pipelines to support fast and reliable build, test, and deploy cycles across a polyglot environment
  • Establish and evangelize best practices for GitOps, canary deployments, rollback strategies, and progressive delivery
  • Define and implement scalable Infrastructure as Code (IaC) patterns using tools such as Terraform, Helm, and Crossplane
  • Drive the adoption of automated testing throughout the delivery lifecycle—unit, integration, load, and chaos testing—to ensure high confidence in production changes
  • Guide teams in designing for observability, SLOs, and alerting, ensuring actionable signals and minimizing alert fatigue
  • Partner with security, compliance, and development teams to ensure infrastructure and delivery systems meet modern security and governance standards
  • Lead incident response retrospectives and foster a blameless culture of continuous improvement
  • Mentor and influence senior engineers across multiple teams, helping to up-level platform reliability capabilities organization-wide

Preferred Qualifications

  • Experience implementing multi-cluster or multi-region Kubernetes strategies
  • Exposure to chaos engineering and building resilient distributed systems
  • Familiarity with compliance frameworks (SOC 2, HIPAA, etc.) as they relate to infrastructure and deployment
  • Contributions to open-source Kubernetes tooling or SRE frameworks
  • Familiarity with JVM- or Node-based application stacks

Benefits

  • Competitive salary
  • Stock options
  • Health benefits
  • Unlimited PTO
  • Parental leave
  • Tuition reimbursement
