Site Reliability Consultant at Pythian

Summary

Join Pythian's next-generation Site Reliability Engineering team as a Site Reliability Consultant! Work remotely from Costa Rica in the PST time zone. You will operate, maintain, and administer solutions for customer infrastructure, focusing on efficiency, availability, and visibility. Responsibilities include planning maintenance, creating documentation, providing root cause analysis reports, and identifying opportunities for improvement. You will collaborate with teammates, act as a technology leader for clients, and participate in an on-call rotation. This role requires experience with Google and AWS clouds, scripting and automation, microservices architecture, and DevOps tools. Pythian offers a competitive total rewards package, flexible remote work, opportunities for professional development, and a focus on employee well-being.

Requirements

Experience working with Google and AWS clouds (including infrastructure-as-code deployment with CloudFormation, Terraform, OpsWorks, etc.)
Scripting and automation of administrative tasks using Python and Scala is a must
Solid understanding of microservices architecture and container technologies (Kubernetes is a must; Docker, LXC, etc.)
Clear understanding of software development lifecycles and best practices from an infrastructure point of view (PRs, merge, rebase, etc.)
Understanding of the end-to-end operations of a ‘Business System’ vs. its individual components
Comprehensive systems hardware and network troubleshooting experience
Common Linux distribution platform installation, configuration, performance tuning, and cloud migration
TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP, etc.)
Operation and administration of virtual infrastructure, including experience with at least one hypervisor (VMware, Hyper-V, KVM, etc.)
Ability to describe IaaS, PaaS, SaaS, pros and cons of each, use cases for virtualization and cloud
Administration of web servers and supporting technologies, including network load balancers
Experience with the design, development, and deployment of Puppet
System and application error investigation, troubleshooting of access/availability issues including deep multi-system root cause analysis
Experience managing networking devices, such as switches and firewalls from a variety of vendors
Solid understanding of DevOps tools, processes, and culture
Ability to pick up new technologies quickly
Ability to provide accurate work scheduling and task estimations for work delivery

Responsibilities

Operate, maintain, and administer solutions contributing to customer infrastructure's operational efficiency, availability, and visibility
Plan maintenance activity, design documentation, and standard procedures
Provide Root Cause Analysis reports for outages/incidents (ITIL - Problem Management)
Observe and provide feedback on the current state of the client’s infrastructure, and identify opportunities to improve resiliency, reduce incident occurrence, and automate repetitive administrative and operational tasks
Contribute to, improve, and maintain team documentation about client systems and infrastructure, procedures, policies, and schedules
Gather and document information about client environments through audit activities, and analyze the information to identify opportunities for improvement and application of best practices
Work collaboratively with teammates to contribute to the continuous improvement of our working culture
Act as a technology leader for clients, as well as drive client discussions on technology road maps
Participate in an on-call rotation in an escalation capacity

Benefits

Competitive total rewards package
Work flexibly and remotely from your home; there’s no daily travel requirement to an office!
All you need is a stable internet connection
Collaborate with some of the best and brightest in the industry!
Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
We give you all the equipment you need to work from home including a laptop with your choice of OS, and an annual budget to personalize your work environment!
You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more)
Additionally, you will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity

Remote

DevOps

Mid-level
