Summary

Join Pythian as a Site Reliability Consultant and become a technology leader and trusted advisor to our customers. You will focus on infrastructure design, CI/CD pipeline automation, and building intelligent monitoring systems across various technologies, where your expertise in Git, artifact repositories, and Kubernetes will be crucial. You will operate and maintain platforms, implement CI/CD pipelines, design observability solutions, and participate in on-call rotations. You will also collaborate with clients, provide technical direction, create documentation, and mentor colleagues. This remote position offers a competitive salary and benefits package, including flexible work arrangements and professional development opportunities.

Requirements

Strong experience with container orchestration (Kubernetes, Docker) in the cloud (AWS EKS) or on on-prem distributions
Familiarity with related ecosystem tools (Helm, Operators, GitOps, etc.)
Hands-on experience using AWS (VPC, EC2, EKS, IAM, S3, etc.), including provisioning with IaC tools like Terraform (or AWS CloudFormation)
Experience setting up GitLab or similar platforms (GitHub, Bitbucket) for CI/CD pipelines, managing runners, and integrating code scanning
Familiarity with artifact repository solutions (e.g., JFrog Artifactory), including repository creation, access controls, and automation of artifact flows
Track record of infrastructure automation using Terraform, Ansible, Puppet, or Chef to reduce manual intervention and ensure repeatable deployments
Strong scripting skills (Bash, Python, Go, etc.) to automate system tasks and streamline operational workflows
Experience with modern monitoring stacks (Prometheus, Dynatrace, Grafana, ELK/EFK) for analyzing logs, metrics, and traces
Proven ability to design alerts, dashboards, and runbooks that enable rapid first-contact resolution
Solid understanding of Linux-based systems, performance tuning, and troubleshooting
Knowledge of network fundamentals (TCP/IP, load balancers, DNS, NTP, etc.) and the ability to diagnose connectivity or performance issues in complex distributed environments
Familiarity with container security best practices (RBAC, TLS, vulnerability scanning) and how to apply them at scale
Adept at communicating technical concepts to both engineering and non-technical stakeholders
Ability to mentor junior team members, champion DevOps culture, and contribute to an inclusive, knowledge-sharing environment
Bachelor’s Degree in Computer Science, Information Systems, or equivalent experience
Several years of progressive DevOps or SRE experience managing large-scale systems in a production environment

Responsibilities

Administer and optimize platforms such as GitLab (CI/CD pipelines, runners) and artifact repository solutions (e.g., JFrog Artifactory)
Maintain and troubleshoot Kubernetes clusters, whether in the cloud (AWS EKS) or on on-prem distributions, with a focus on availability, performance, and security
Champion “infrastructure as code” using tools like Terraform (or CloudFormation), building repeatable processes for provisioning and updating clusters, repos, and associated services
Implement or improve CI/CD pipelines to reduce manual toil and ensure quick, reliable deployments across multiple environments
Design and configure observability solutions (e.g., Prometheus, Dynatrace, Grafana) to proactively detect and address issues in container orchestration environments, code repositories, and artifact repositories
Participate in an on-call rotation, troubleshooting incidents at all tiers (from first-contact resolution to escalation) and driving continuous improvement based on Root Cause Analysis
Collaborate with clients to shape infrastructure strategies around container orchestration, secure CI/CD, and DevSecOps best practices
Provide leadership and technical direction on automating repetitive administrative tasks, enforcing security policies (RBAC, TLS, container scanning), and adopting GitOps workflows
Create and maintain design documents, runbooks, and operational playbooks for container platforms, CI/CD pipelines, and code management services
Mentor fellow consultants and client stakeholders on Kubernetes, infrastructure automation, and advanced CI/CD usage to enhance knowledge across the organization
Plan and coordinate maintenance activities, ensuring minimal downtime and clear communication with stakeholders
Provide ITIL-oriented support (Incident, Change, Problem Management), and champion continuous improvement of operational processes and service reliability

Preferred Qualifications

AWS certifications (Solutions Architect, DevOps Engineer) are a plus
Understanding of compliance frameworks (HIPAA, PCI, etc.) and data privacy constraints is a plus
Experience or strong interest in leveraging AI-based services or scripts for operational efficiency and faster issue resolution is highly desirable

Benefits

Competitive total rewards and salary package
Work flexibly and remotely from your home; there’s no daily travel requirement to an office!
Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
We give you all the equipment you need to work from home including a laptop with your choice of OS, and an annual budget to personalize your work environment!
You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more)
You will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity

Site Reliability Consultant

Pythian

Remote

DevOps

Mid-level
