Summary

Join Pythian as a Site Reliability Consultant and become a technology leader and trusted advisor to our customers. You will focus on infrastructure design and modernization, CI/CD pipeline automation, and building intelligent monitoring and observability systems. This remote position requires expertise in Kubernetes, AWS, CI/CD, and DevOps automation. You will mentor teammates, collaborate with clients, and participate in on-call rotations. Pythian offers a flexible work environment, opportunities for professional development, and various benefits to support your well-being.

Requirements

Must have strong experience with container orchestration (Kubernetes, Docker) in cloud (AWS EKS) or on-prem distributions
Familiarity with related ecosystem tools (Helm, Operators, GitOps, etc.)
Hands-on experience using AWS (VPC, EC2, EKS, IAM, S3, etc.), including provisioning with IaC tools like Terraform (or AWS CloudFormation)
Experience setting up GitLab or similar platforms (GitHub, Bitbucket) for CI/CD pipelines, managing runners, and integrating code scanning
Familiarity with artifact repository solutions (e.g., JFrog Artifactory), including repository creation, access controls, and automation of artifact flows
Track record of infrastructure automation using Terraform, Ansible, Puppet, or Chef to reduce manual intervention and ensure repeatable deployments
Strong scripting skills (Bash, Python, Go, etc.) to automate system tasks and streamline operational workflows
Experience with modern monitoring stacks (Prometheus, Dynatrace, Grafana, ELK/EFK) for analyzing logs, metrics, and traces
Proven ability to design alerts, dashboards, and runbooks that enable rapid first-contact resolution
Solid understanding of Linux-based systems, performance tuning, and troubleshooting
Network fundamentals (TCP/IP, load balancers, DNS, NTP, etc.) and ability to diagnose connectivity or performance issues in complex distributed environments
Familiarity with container security best practices (RBAC, TLS, vulnerability scanning) and how to apply them at scale
Adept at communicating technical concepts to both engineering and non-technical stakeholders
Ability to mentor junior team members, champion DevOps culture, and contribute to an inclusive, knowledge-sharing environment
Bachelor’s Degree in Computer Science, Information Systems, or equivalent experience
Several years of progressive DevOps or SRE experience managing large-scale systems in a production environment

Responsibilities

Administer and optimize platforms such as GitLab (CI/CD pipelines, runners) and artifact repository solutions (e.g., JFrog Artifactory)
Maintain and troubleshoot Kubernetes clusters—either in the cloud (AWS EKS) or on-prem distributions—with a focus on availability, performance, and security
Champion “infrastructure as code” using tools like Terraform (or CloudFormation), building repeatable processes for provisioning and updating clusters, repos, and associated services
Implement or improve CI/CD pipelines to reduce manual toil and ensure quick, reliable deployments across multiple environments
Design and configure observability solutions (e.g., Prometheus, Dynatrace, Grafana) to proactively detect and address issues in container orchestration environments, code repositories, and artifact repositories
Participate in an on-call rotation, troubleshooting incidents at all tiers (from first-contact resolution to escalation) and driving continuous improvement based on Root Cause Analysis
Collaborate with clients to shape infrastructure strategies around container orchestration, secure CI/CD, and DevSecOps best practices
Provide leadership and technical direction on automating repetitive administrative tasks, enforcing security policies (RBAC, TLS, container scanning), and adopting GitOps workflows
Create and maintain design documents, runbooks, and operational playbooks for container platforms, CI/CD pipelines, and code management services
Mentor fellow consultants and client stakeholders on Kubernetes, infrastructure automation, and advanced CI/CD usage to enhance knowledge across the organization
Plan and coordinate maintenance activities, ensuring minimal downtime and clear communication with stakeholders
Provide ITIL-oriented support (Incident, Change, Problem Management), and champion continuous improvement of operational processes and service reliability

Preferred Qualifications

AWS certifications (Solutions Architect, DevOps Engineer) are a plus
Understanding of compliance frameworks (HIPAA, PCI, etc.) and data privacy constraints is a plus
Experience or strong interest in leveraging AI-based services or scripts for operational efficiency and faster issue resolution is highly desirable

Benefits

Work flexibly and remotely from your home; there’s no daily travel requirement to an office! All you need is a stable internet connection
Pythian cares about continuous learning and provides opportunities to earn certifications (AWS, Kubernetes, Terraform) and expand your skill set across multiple platforms, frameworks, and industries
We give you all the equipment you need to work from home including a laptop with your choice of OS, and an annual budget to personalize your work environment!
You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more)
You will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity

Site Reliability Consultant

Pythian


Remote

DevOps

Mid-level
