Senior Site Reliability Engineer

Censys
Summary
Join Censys as a Senior Site Reliability Engineer (SRE) and help build and maintain the tools that support our development teams and production applications. You will focus on developer efficiency and experience, improving the SDLC and day-to-day workflows through application code support and automation. Responsibilities include building and maintaining tooling for Kubernetes and GCP, working with development teams to deploy services, ensuring smooth production operations, and creating a self-service platform. You will participate in on-call rotations and collaborate with the rest of the SRE and infrastructure team. This role requires 5+ years of experience in an SRE or similar role and expertise in Kubernetes, containerization, cloud environments, and infrastructure-as-code. The position offers a competitive salary, bonus eligibility, equity, and comprehensive benefits.
Requirements
- 5+ years of experience in an SRE role or similar
- Experience deploying, managing, and debugging applications in a Kubernetes environment. We rely heavily on Helm and Crossplane to deploy our applications
- Experience building, securing, and managing container images
- Experience working with cloud-based environments and interacting with cloud services such as Cloud SQL databases, Pub/Sub, Memorystore, and others
- Familiarity with infrastructure-as-code tools, such as Terraform, Crossplane, or similar
- Experience with tools and solutions used to monitor the four golden signals (latency, traffic, errors, and saturation), including Prometheus, Grafana, and OpenTelemetry
- Familiarity with a monorepo, trunk-based development model with monolithic build tooling and CI/CD, with a strong desire to achieve Continuous Deployment. Familiarity with CI/CD systems such as GitHub Actions, ArgoCD, or similar
- Ability to communicate with and support developers empathetically in their day-to-day roles, seeking ways to automate and promote self-service so that developers can move with higher velocity and confidence through the entire SDLC
Responsibilities
- Build and maintain tooling to support our applications in Kubernetes and in the Google Cloud Platform
- Work with development teams to help them build, ship, and deploy services and applications with ease and confidence, and promote service resilience and reliability
- Help ensure smooth operation of our production environments, and work with developers to debug complex issues as they arise. This includes instrumenting our applications to capture and monitor the four golden signals
- Help create a self-service platform by working with the rest of the SRE and infrastructure team to increase developer velocity, including service catalogs, repository tooling, and documentation. We believe in the self-service model and treat the development team as our internal customers, which means listening to feedback, seeking out improvements, and iterating quickly to continually provide value
- Participate in a shared on-call rotation. We believe in end-to-end service ownership, so both development teams and SRE participate in on-call. Our SRE team is responsible for maintaining and being on call for our infrastructure environments and ensuring primary site uptime
Preferred Qualifications
- Experience building and supporting a gRPC microservice architecture
- Familiarity with a Kubernetes service mesh, such as Istio or similar, is highly desirable. A service mesh supports observability, multi-cluster routing, and network efficiency in our microservice architecture
- Ability to work with application code to help introduce best practices, golden path standardization, shared libraries, etc. The majority of our applications are written in Go. Python and Scala are present to a lesser degree
- Familiarity with Application Security tooling, such as dependency scanning, static analysis, and other linting tooling to help shift security left in the SDLC and CI process, and bridge engineering practices with our Security Operations team
- Familiarity and comfort with Linux-based environments
Benefits
- 401k match
- Health
- Vision
- Dental
