GenAI Optimization Engineer

closed

Modular

📍 Remote - United States, Canada

Summary

Join Modular, a company revolutionizing AI infrastructure, and become part of the team building MAX, a modular platform that simplifies AI development and deployment. The E2E Optimizations team implements cutting-edge optimizations and research for auto-regressive text generation, image generation, and more. You will design, implement, and tune performance features, lead cross-functional projects, collaborate with subject matter experts, and contribute to the MAX tech stack. This role requires in-depth Python knowledge, 3+ years of experience in ML/DL/Generative AI, and experience with performance optimizations for Generative AI. The position offers competitive compensation, including stock options, and world-class benefits such as premier insurance plans, 401k matching, and flexible paid time off. Remote work is available to candidates based in the US and Canada.

Requirements

  • In-depth knowledge of the Python programming language
  • 3+ years of working experience in Machine Learning, Deep Learning, or Generative AI
  • Experience implementing framework-level performance optimizations for Generative AI use cases
  • Experience profiling and reducing latency in GenAI applications
  • Deep interest in machine learning technologies and use cases
  • Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture

Responsibilities

  • Design, scope, implement, and tune performance features for Generative AI use cases in the MAX framework
  • Plan and lead cross-functional projects spanning multiple teams and domains
  • Collaborate with subject matter experts within Modular to enable features across different parts of the stack
  • Contribute to the MAX tech stack across multiple languages, including Mojo, Python, and C++
  • Monitor the latest research channels and identify potential opportunities for the MAX framework

Preferred Qualifications

  • Experience using Machine Learning frameworks like PyTorch, TensorFlow, etc.
  • CUDA/ROCm/Accelerator Programming and Optimization experience
  • Experience with LLVM/MLIR/Compilers
  • Experience working with distributed/parallel programming models and an understanding of parallel hardware

Benefits

  • Premier insurance plans
  • Up to 5% 401k matching
  • Flexible paid time off

This job is filled or no longer available.