Pythian is hiring a
Linux Site Reliability Consultant

Pythian

💵 ~$82k-$120k
📍 Remote - Mexico

Summary

Pythian is hiring a Linux Site Reliability Consultant. The role involves operating, maintaining, and administering customer infrastructure; providing root cause analysis reports; identifying opportunities to improve resiliency; collaborating with teammates; acting as a technology leader for clients; and participating in an on-call rotation. The position is remote.

Requirements

  • Experience working with Google Cloud and AWS (including infrastructure-as-code deployment with CloudFormation, Terraform, OpsWorks, etc.)
  • Scripting and automation of administrative tasks using Python and Scala is a must
  • Solid understanding of microservices architecture and container technologies (Kubernetes is a must; Docker, LXC, etc.)
  • Clear understanding of software development lifecycles and best practices from an infrastructure point of view (PRs, merge, rebase, etc.)
  • Understanding the end-to-end operations of a 'Business System' vs. its components
  • Comprehensive systems hardware and network troubleshooting experience
  • Common Linux distribution platform installation, configuration, performance tuning, and cloud migration
  • TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP, etc.)
  • Operation and administration of virtual infrastructure, including experience with at least one hypervisor (VMware, Hyper-V, KVM, etc.)
  • Ability to describe IaaS, PaaS, SaaS, pros and cons of each, use cases for virtualization and cloud
  • Administration of web servers and supporting technologies, including network load balancers
  • Experience with the design, development, and deployment of Puppet
  • System and application error investigation, troubleshooting of access/availability issues including deep multi-system root cause analysis
  • Experience managing networking devices, such as switches and firewalls from a variety of vendors
  • Solid understanding of DevOps tools, processes, and culture
  • Ability to pick up new technologies quickly
  • Ability to provide accurate work scheduling and task estimations for work delivery

Responsibilities

  • Operate, maintain, and administer solutions contributing to customer infrastructure's operational efficiency, availability, and visibility
  • Plan maintenance activity, design documentation, and standard procedures
  • Provide Root Cause Analysis reports for outages/incidents (ITIL - Problem Management)
  • Observe and provide feedback on the current state of the client's infrastructure, and identify opportunities to improve resiliency, reduce incident occurrence, and automate repetitive administrative and operational tasks
  • Contribute to, improve, and maintain team documentation about client systems and infrastructure, procedures, policies, and schedules
  • Gather and document information about client environments through audit activities, and analyze the information to identify opportunities for improvement and application of best practices
  • Work collaboratively with teammates to contribute to the continuous improvement of our working culture
  • Act as a technology leader for clients, as well as drive client discussions on technology road maps
  • Participate in an on-call rotation in an escalation capacity
