Software Engineer, Analytics Platform
OpenAI
Summary
Join OpenAI's Research Platform Analytics team as a pragmatic and passionate engineer focused on enhancing the data experience for engineers and scientists. You will build and maintain large-scale data processing pipelines, develop a general-purpose data platform for petabyte-scale datasets, and ensure the scalability and reliability of our infrastructure. This role involves hands-on infrastructure work, including deploying and troubleshooting core services. The position is based in San Francisco or remote within the US, with a hybrid work model. You'll collaborate with various teams to deliver impactful data tooling and systems, contributing to OpenAI's mission of accelerating research towards AGI.
Requirements
- Proficient in Python and backend development, with experience working in large codebases (monorepos)
- Experience building and operating large-scale stream and batch processing pipelines (Kafka, Spark, Flink, Presto/Trino)
- Hands-on experience with Kubernetes, Terraform, and deploying/troubleshooting production systems
- Experience working on access control, provenance, auditing, and large-scale data movement
- Passion for building systems that provide key insights, especially in ML training workflows
- Comfortable in a fast-moving environment, making trade-offs to deliver impact quickly
Responsibilities
- Build and maintain large-scale stream and batch processing pipelines (Kafka, Spark, Flink, Trino/Presto)
- Develop a general-purpose data processing platform for handling massive datasets
- Scale applications for ML research, ensuring smooth operation as workloads grow
- Ensure the security, integrity, and compliance of data according to industry and company standards
- Ensure our analytics and data platforms can scale reliably to the next several orders of magnitude
- Accelerate company productivity by empowering your fellow engineers, researchers, and teammates with excellent data tooling and systems, providing a best-in-class experience
- Bring new features and capabilities to the world by partnering with product engineers, trust & safety, and other teams to build the technical foundations
- Like all other teams, we are responsible for the reliability of the systems we build. This includes an on-call rotation to respond to critical incidents as needed
Preferred Qualifications
- Understanding of data transformations in ML training and inference workflows is a plus
Benefits
- This role is based in San Francisco, CA or open to being remote within the US
- We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees