Summary

Join OpenAI's Applied Engineering team as a Site Reliability Engineer and help bring our technology to public sector customers. You will design, build, and maintain scalable infrastructure, both on-premises and in the cloud. This role requires on-site work with customers, troubleshooting issues, and collaborating with internal teams. You will automate tasks, standardize infrastructure, and ensure the reliability of our systems. The position is based in Washington, D.C. or San Francisco, CA, and requires travel to customer sites. This is a unique opportunity to see the direct impact of your work and contribute to the responsible deployment of AI.

Requirements

Hold an active US security clearance
5+ years of experience operating infrastructure and systems at scale

Responsibilities

Design and build performant, reliable, and scalable infrastructure, both on-premises and in the cloud, for our public sector customers
Administer the systems from the hardware up to Kubernetes, ensuring our teams have a standardized infrastructure on which to deploy OpenAI’s technology
Own the reliability of these systems by being on-site with the customer, utilizing observability tooling, and directly troubleshooting issues that arise as the first line of support
Partner with teams across engineering and security to ensure the product supports the unique needs of the infrastructure and use-cases
Automate routine tasks and standardize our infrastructure offerings to allow our team to scale as we continue to grow
Partner with teams across the business, including engineering, security, and compliance, to enable our products to work within the unique constraints of new environments

Preferred Qualifications

Worked out of secure environments, closely collaborating with both on-site clients and remote colleagues
Hands-on experience with containers (Docker) and orchestration platforms (Kubernetes)
Scripting experience with Python or equivalents for automating routine tasks
Own problems end-to-end and are willing to pick up whatever knowledge you're missing to get the job done, so that both your team and our customers succeed
Strong troubleshooting skills across the entire stack (infrastructure, systems, and applications)
Thrive in dynamic environments and can navigate ambiguity with ease

Site Reliability Engineer

OpenAI

Remote

DevOps

Mid-level
