Software Engineer - Site Reliability

Pantheon Platform
Summary
Join Pantheon's SRE team as a Software Engineer and contribute to a globally scaled platform powering hundreds of thousands of websites. You will work on advanced implementations of the WordPress and Drupal content management systems using Google Cloud Platform offerings, manage a large-scale orchestration platform, and collaborate with the wider engineering team. Responsibilities include administering and maintaining configuration management with Kubernetes and other tools, owning production systems, and ensuring platform stability and reliability. The ideal candidate has experience with Python, Go, or another object-oriented programming language, with Kubernetes, Terraform, and Linux administration, and with large-scale, high-traffic platforms. Pantheon offers competitive compensation, benefits including full medical coverage and paid time off, and a supportive work environment.
Requirements
- Understanding of and work experience developing with Python, Go, or another object-oriented programming language
- Understanding and working knowledge of Kubernetes, Terraform, CI/CD pipelines, and release engineering practices
- Understanding of Linux operating system administration
- Work-related experience with large-scale, high-traffic platforms
- Work-related experience designing scalable and robust production services
- Clear communication skills and the ability to represent your contributions and ideas with clarity while remaining open and giving space to the contributions and ideas of others
- Experience participating in system design consulting, platform management, and capacity planning
- Experience developing and maturing sustainable systems and services through automation and uplifts
- Ability to balance feature development speed and reliability using well-defined service-level objectives
- Experience supporting live-site operations and participating in on-call rotations
- Experience building and operating complex observability tooling such as Grafana Cloud and Prometheus
Responsibilities
- Work on advanced, globally scaled implementations of the WordPress and Drupal content management systems using the latest Google Cloud Platform offerings
- Work on a large-scale orchestration platform serving millions of containers, using lower-level Linux systems such as systemd and cgroups directly
- Administer, develop, and maintain standardization and configuration state management with Kubernetes, Chef, Terraform, GCP tooling, Vault, and similar tools
- Collaborate closely with the wider engineering team to deliver platform improvements and provide subject-matter expertise for other technical initiatives
- Own your team’s production systems, measure and track their health with SLOs, and assist our dedicated support team in resolving production issues
- Continuously improve our standard of engineering excellence by implementing best practices for coding, testing, deploying, and communication
- Support Pantheon as a member of the on-call engineering rotation, contributing to the infrastructure stability, reliability, and performance that drive Pantheon's success
- Support and meet with Pantheon customers, as needed, to ensure their success as well as ours
Preferred Qualifications
- Working knowledge of Cassandra, MySQL, Redis
- Working knowledge of React, Node.js, Python, Go
- Working knowledge of Docker, Chef, CircleCI, Vault
- Working knowledge of WordPress, Drupal
- Coding experience beyond simple scripts
- CKA, CKAD, CKS, or other CNCF certifications
- Experience supporting and developing open source tooling on public clouds such as GCP, AWS, or Azure
Benefits
- Industry-competitive compensation and equity plan
- Paid Time Off (PTO), Paid Sick Leave (PSL) and 11 Paid Company Holidays
- Full medical coverage (extended health care, dental, vision)
- Top-of-the-line equipment
- Monthly allowance for wellness and reading, plus access to LinkedIn Learning for continued development
- Team-based and company-wide events and activities that inspire, educate, and cultivate