Site Reliability Consultant

Pythian
Summary
Join Pythian as a Site Reliability Consultant and become a technology leader and trusted advisor to our customers. You will focus on infrastructure design, CI/CD pipeline automation, and building intelligent monitoring systems across various technologies, drawing on your expertise in Git, artifact repositories, and Kubernetes. You will operate and maintain platforms, implement CI/CD pipelines, design observability solutions, and participate in on-call rotations. You will also collaborate with clients, provide technical direction, create documentation, and mentor colleagues. This remote position offers a competitive salary and benefits package, including flexible work arrangements and professional development opportunities.
Requirements
- Must have strong experience with container orchestration (Kubernetes, Docker) in cloud (AWS EKS) or on-prem distributions
- Familiarity with related ecosystem tools (Helm, Operators, GitOps, etc.)
- Hands-on experience using AWS (VPC, EC2, EKS, IAM, S3, etc.), including provisioning with IaC tools like Terraform (or AWS CloudFormation)
- Experience setting up GitLab or similar platforms (GitHub, Bitbucket) for CI/CD pipelines, managing runners, and integrating code scanning
- Familiarity with artifact repository solutions (e.g., JFrog Artifactory), including repository creation, access controls, and automation of artifact flows
- Track record of infrastructure automation using Terraform, Ansible, Puppet, or Chef to reduce manual intervention and ensure repeatable deployments
- Strong scripting skills (Bash, Python, Go, etc.) to automate system tasks and streamline operational workflows
- Experience with modern monitoring stacks (Prometheus, Dynatrace, Grafana, ELK/EFK) for analyzing logs, metrics, and traces
- Proven ability to design alerts, dashboards, and runbooks that enable rapid first-contact resolution
- Solid understanding of Linux-based systems, performance tuning, and troubleshooting
- Knowledge of network fundamentals (TCP/IP, load balancers, DNS, NTP, etc.) and the ability to diagnose connectivity or performance issues in complex distributed environments
- Familiarity with container security best practices (RBAC, TLS, vulnerability scanning) and how to apply them at scale
- Adept at communicating technical concepts to both engineering and non-technical stakeholders
- Ability to mentor junior team members, champion DevOps culture, and contribute to an inclusive, knowledge-sharing environment
- Bachelor’s Degree in Computer Science, Information Systems, or equivalent experience
- Several years of progressive DevOps or SRE experience managing large-scale systems in a production environment
Responsibilities
- Administer and optimize platforms such as GitLab (CI/CD pipelines, runners) and artifact repository solutions (e.g., JFrog Artifactory)
- Maintain and troubleshoot Kubernetes clusters, either in the cloud (AWS EKS) or on-prem distributions, with a focus on availability, performance, and security
- Champion “infrastructure as code” using tools like Terraform (or CloudFormation), building repeatable processes for provisioning and updating clusters, repos, and associated services
- Implement or improve CI/CD pipelines to reduce manual toil and ensure quick, reliable deployments across multiple environments
- Design and configure observability solutions (e.g., Prometheus, Dynatrace, Grafana) to proactively detect and address issues in container orchestration environments, code repositories, and artifact repositories
- Participate in an on-call rotation, troubleshooting incidents at all tiers (from first-contact resolution to escalation) and driving continuous improvement based on Root Cause Analysis
- Collaborate with clients to shape infrastructure strategies around container orchestration, secure CI/CD, and DevSecOps best practices
- Provide leadership and technical direction on automating repetitive administrative tasks, enforcing security policies (RBAC, TLS, container scanning), and adopting GitOps workflows
- Create and maintain design documents, runbooks, and operational playbooks for container platforms, CI/CD pipelines, and code management services
- Mentor fellow consultants and client stakeholders on Kubernetes, infrastructure automation, and advanced CI/CD usage to enhance knowledge across the organization
- Plan and coordinate maintenance activities, ensuring minimal downtime and clear communication with stakeholders
- Provide ITIL-oriented support (Incident, Change, Problem Management), and champion continuous improvement of operational processes and service reliability
Preferred Qualifications
- AWS certifications (Solutions Architect, DevOps Engineer) are a plus
- Understanding of compliance frameworks (HIPAA, PCI, etc.) and data privacy constraints is a plus
- Experience or strong interest in leveraging AI-based services or scripts for operational efficiency and faster issue resolution is highly desirable
Benefits
- Competitive total rewards and salary package
- Work remotely from your home with flexibility; there’s no daily travel requirement to an office!
- Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
- We give you all the equipment you need to work from home, including a laptop with your choice of OS and an annual budget to personalize your work environment!
- You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more)
- You will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity