Staff Production Infrastructure Engineer

ServiceNow
Summary
Join ServiceNow as a DevOps Engineer and develop internal tools and utilities using Python or Golang. Manage and maintain mission-critical Tier-0 services, ensuring high availability and reliability. Oversee the operation of a large-scale hybrid cloud infrastructure. Collaborate with a team using an "everything as code" approach. Write test plans and automate tests for declarative infrastructure. Build orchestration workflows leveraging ServiceNow's capabilities. Research and implement new tools and technologies. Lead and support migration projects, including VM migrations from VMware to OpenStack. This role requires experience with AI integration, bare metal Linux servers, Red Hat OpenStack, VM migrations, automation processes, and core Linux systems services.
Requirements
- Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry
- Extensive experience operating bare metal Linux servers at scale in a production environment, using configuration management tools such as Puppet or Ansible
- Hands-on experience with Red Hat OpenStack, including deployment, management, and troubleshooting
- Strong expertise in migrating VMs from VMware to OpenStack or other cloud environments
- Proven ability to maintain and improve automation processes, with experience in scripting and developing solutions in Python or Golang
- A deep understanding of Linux servers and core systems services, including DNS, DHCP, LDAP, networking, storage, and software packaging
- Ability to document and communicate systems architecture effectively
- Comfort and proficiency in designing, writing, and debugging code in a collaborative team environment, with experience writing test specifications and an understanding of test automation fundamentals
- Experience with a variety of open-source tools used in the operation of production computing environments
- Problem-solving skills and the ability to dig deeply into the internals of DevOps tools and systems technologies
Responsibilities
- Develop internal tools and utilities using Python or Golang to support and enhance our systems, while utilizing Puppet and Ansible to build and maintain a configuration management pipeline that is safe, fast, and observable
- Manage and maintain mission-critical Tier-0 services, including DNS, DHCP, and LDAP, ensuring high availability and reliability
- Oversee the operation of a large-scale hybrid cloud infrastructure consisting of bare metal and VMs
- Collaborate with a team that adopts an "everything as code" approach, writing, reviewing, and deploying configurations
- Write test plans and automate tests for declarative infrastructure across multiple cloud environments
- Build orchestration workflows leveraging the capabilities of the ServiceNow platform
- Research and implement new open-source and commercial tools, technologies, and methodologies
- Lead and support migration projects, including VM migrations from VMware to OpenStack or other cloud platforms