Senior AI Runtime Engineer

Modular
Summary
Join Modular, a company revolutionizing AI infrastructure, as a Senior AI Runtime Engineer. You will design and develop runtime optimizations that improve CPU and GPU efficiency, port the Modular runtime stack to new hardware platforms, and work with the compiler, kernels, serving, and models teams to achieve state-of-the-art end-to-end performance. The role also involves engaging with customers to understand their performance requirements and collaborating with the tooling and infrastructure teams on systems for automated performance analysis and benchmarking. The ideal candidate has 5+ years of experience with high-performance computing systems, experience in C++, and experience with CPU or GPU runtime optimizations. Modular offers competitive compensation, including stock options, and world-class benefits such as premier insurance plans, 401(k) matching, and flexible paid time off. The position can be based in Los Altos, CA, or remote from the US or Canada.
Requirements
- 5+ years of experience working on high-performance computing systems
- Experience in C++ programming and complex software systems
- Experience with CPU or GPU runtime optimizations and performance analysis on CPUs, GPUs, or AI accelerators
- Proficiency with one or more profiling tools (CPU or GPU)
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
Responsibilities
- Design and develop runtime and cross-stack optimizations to improve CPU and GPU efficiency, addressing issues such as CPU overhead, caching, and data locality across multiple devices
- Port the Modular runtime stack to new hardware platforms and develop an API to streamline this process
- Collaborate with the compiler, kernels, serving, and models teams to design core technologies that achieve state-of-the-art end-to-end performance on various CPU and GPU hardware
- Collaborate with the customer success team and engage with customers to understand their performance requirements and use cases
- Collaborate with tooling and infrastructure teams to design systems for automated performance analysis and benchmarking
Preferred Qualifications
- Experience with ML graph optimizations, parallel/distributed programming, heterogeneous ML computation, and/or code generation
- Exposure to MLIR, LLVM, and/or the Mojo programming language
- Advanced degree in Computer Science or a related area is a plus
Benefits
- Premier insurance plans
- Up to 5% 401(k) matching
- Flexible paid time off