Staff Site Reliability Engineer

SecurityScorecard
Summary
Join SecurityScorecard as a Staff Site Reliability Engineer and become a key technical leader in designing, implementing, and optimizing our Kubernetes-based infrastructure and CI/CD systems. You will work closely with engineering teams to enhance delivery speed, ensure production reliability, and integrate best practices for automation, observability, and resilience. This role demands strong technical expertise and collaborative skills to guide large-scale infrastructure and platform initiatives.
You will design, build, and scale Kubernetes infrastructure, optimize CI/CD pipelines, collaborate on progressive delivery strategies, improve Infrastructure as Code practices, build automated testing strategies, enhance system observability, contribute to incident response, and mentor other engineers. The position requires significant experience in production Kubernetes environments and CI/CD pipelines, along with expertise in Infrastructure as Code and testing automation.
SecurityScorecard offers a competitive salary, stock options, health benefits, unlimited PTO, parental leave, and tuition reimbursements.
Requirements
- 6+ years of experience in SRE, DevOps, or Infrastructure roles, including significant experience in production Kubernetes environments
- Proven success building and maintaining CI/CD pipelines using tools such as GitHub Actions, Jenkins, GitLab CI, or Spinnaker
- Strong hands-on experience with Kubernetes internals (networking, scaling, RBAC, etc.) and cloud-managed services like EKS, GKE, or AKS
- Expertise with Infrastructure as Code (Terraform, Helm, Pulumi) and GitOps workflows
- Solid experience with test automation tools and integrating testing into the CI/CD lifecycle
- Proficient in scripting or programming languages such as Python, Bash, or Go
- Knowledge of monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, OpenTelemetry)
- Strong communication and collaboration skills to work effectively with product engineering, security, and platform teams
Responsibilities
- Design, build, and scale Kubernetes infrastructure to support secure, multi-tenant, high-availability applications
- Lead efforts to optimize and maintain CI/CD pipelines, improving reliability, speed, and rollback safety for production deployments
- Collaborate with developers to implement progressive delivery strategies, including blue/green and canary deployments
- Improve Infrastructure as Code practices with tools like Terraform, Helm, and Argo CD, and help define reusable patterns for the broader org
- Build and enforce automated testing strategies (unit, integration, performance) within the CI/CD lifecycle
- Partner with development and platform teams to improve system observability, define SLOs, and establish meaningful alerts and dashboards
- Actively contribute to incident response efforts and postmortems, with a focus on root cause analysis and sustainable remediation
- Mentor engineers across teams, sharing deep knowledge of Kubernetes, CI/CD, and cloud infrastructure
Preferred Qualifications
- Experience operating multi-region or multi-cluster Kubernetes environments
- Exposure to chaos engineering, resilience testing, or traffic shaping strategies
- Familiarity with security scanning, compliance automation, or infrastructure policy-as-code
- Contributions to open-source Kubernetes tools or CI/CD platforms
- Familiarity with JVM and Node.js-based services
Benefits
- Health benefits
- Unlimited PTO
- Parental leave
- Tuition reimbursements