Senior Linux Systems Administrator

ServiceNow

πŸ“Remote - India

Summary

Join ServiceNow as a Site Reliability Engineer and provide relief and sustainable resolution to infrastructure issues. Leverage your software development, systems engineering, and networking expertise to proactively prevent recurring problems. Collaborate with partner teams to enhance infrastructure reliability and performance through improved system design. Contribute to configuration management and infrastructure as code using Puppet. Develop tools using Ansible, Python, bash, and JavaScript to automate manual tasks and improve the customer maintenance experience. Drive enhancements and bug fixes for large-scale automation projects. Design and implement procedures for maintenance tasks where automation is insufficient. Participate in escalations and root cause analysis of global infrastructure issues. Troubleshoot database backups, restores, and migrations. Support various infrastructure services, including machine learning, big data clusters, messaging systems, and more.

Requirements

  • A strong background in Linux systems administration (CentOS/Red Hat) and engineering, with an understanding of the components of cloud infrastructure, including hardware platforms, OS, applications, databases (MariaDB), networks, and web and application servers (Apache/Tomcat)
  • 5+ years of experience in Site Reliability Engineering/DevOps/System Administration; experience managing large-scale server infrastructure in a cloud computing or MSP setting is highly desirable
  • Solid experience with Linux (Red Hat and/or CentOS)
  • Working-level knowledge of at least one of: Python, bash, JavaScript
  • Strong experience with service troubleshooting in a production environment covering web front-end/application, systems, databases, and networks
  • Previous direct exposure to administering fundamental internet services (DNS, Mail, Apache/Tomcat) with a good understanding of the LAMP stack

Responsibilities

  • Provide relief and sustainable resolution to issues within our infrastructure
  • Use your experience in software development, systems engineering, and networking to proactively prevent recurring issues
  • Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design
  • Contribute to Configuration Management and Infrastructure as Code for global private cloud (Puppet)
  • Develop tools in Ansible, Python, bash, and JavaScript to replace manual work and improve customer maintenance experience
  • Drive enhancements and bug fixes for large-scale automation projects such as patching and provisioning
  • Design and implement procedures to accomplish maintenance tasks where automation and tooling cannot; drive resolution of root causes with internal team members
  • Prepare new ServiceNow products and services for production readiness with design review, feedback to engineering teams, training, and testing
  • Use broad knowledge and experience of systems administration and networking principles to proactively prevent and address incidents while constantly improving documentation
  • Participate in escalations and Root Cause Analysis of issues in the global ServiceNow infrastructure
  • Troubleshoot database backup and restore failures as well as perform database migrations
  • Support operation of a wide variety of infrastructure services including Machine Learning and Prediction, Cloudera Big Data clusters, Kafka and RabbitMQ messaging, database encryption, email infrastructure at scale, DNS, Puppet, Elasticsearch, F5 BIG-IP, and more

Preferred Qualifications

  • Familiarity with administering MySQL, Oracle, MariaDB, or similar technologies; proficiency preferred
  • Familiarity with networking technologies such as routing, switching, and load balancing (VPN exposure is a huge plus)
  • Experience with systems and network performance and availability monitoring and analysis as well as configuration management platforms (Nagios/Icinga, Puppet, Ansible, Splunk) is desirable
  • Understanding of ITIL v3 framework and how it applies to incidents, problems an
