GenAI Optimization Engineer

Modular
Summary
Join Modular, a company revolutionizing AI infrastructure, and become part of the team building MAX, a modular platform that simplifies AI development and deployment. The E2E Optimizations team implements state-of-the-art optimizations and research for auto-regressive text generation, image generation, and more. You will design, implement, and tune performance features, lead cross-functional projects, collaborate with subject matter experts, and contribute to the MAX tech stack. This role requires in-depth Python knowledge, 3+ years of experience in ML/DL/Generative AI, and experience with performance optimizations for Generative AI. The position offers competitive compensation, including stock options, and benefits such as premier insurance plans, 401(k) matching, and flexible paid time off. Remote work options are available for candidates based in the US and Canada.
Requirements
- In-depth knowledge of the Python programming language
- 3+ years of working experience in Machine Learning, Deep Learning, or Generative AI
- Experience implementing framework-level performance optimizations for Generative AI use cases
- Experience profiling and reducing latency in GenAI applications
- Deep interest in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
Responsibilities
- Design, scope, implement, and tune performance features for Generative AI use cases in the MAX framework
- Plan and lead cross-functional projects spanning multiple teams and domains
- Collaborate with subject matter experts within Modular to enable features across different parts of the stack
- Contribute to the MAX tech stack across multiple languages, including Mojo, Python, and C++
- Monitor the latest research and identify opportunities to apply it in the MAX framework
Preferred Qualifications
- Experience using machine learning frameworks such as PyTorch or TensorFlow
- Experience with CUDA, ROCm, or other accelerator programming and optimization
- Experience with LLVM/MLIR/Compilers
- Experience working with distributed/parallel programming models and an understanding of parallel hardware
Benefits
- Premier insurance plans
- Up to 5% 401(k) matching
- Flexible paid time off

