Kernel Engineer
Modular
Summary
Join Modular, a company revolutionizing AI infrastructure, and become an AI Kernel Engineer. You will design and optimize high-performance ML kernels, utilize low-level programming (C/C++/Assembly), collaborate with cross-functional teams, and contribute to the development of future accelerators. This role requires in-depth C++ and low-level architectural performance knowledge, 4+ years of experience with complex code and systems, and experience with performance modeling and analysis. The position offers competitive compensation, including stock options, and world-class benefits such as premier insurance plans, 401k matching, and flexible paid time off. Remote work options are available for US and Canada-based candidates. Modular fosters a collaborative and supportive work environment.
Requirements
- In-depth knowledge of C++ and low-level (micro)architectural performance is required
- 4+ years of experience working on complex code and systems
- Experience with performance modeling and performance data analysis
- Understanding of parallelization techniques for ML/HPC acceleration
- Deep interest in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
Responsibilities
- Design and optimize high-performance ML numeric and data manipulation kernels/operators
- Use low-level C/C++/Assembly programming to achieve state-of-the-art performance. Your work may also entail introducing novel compiler and tools support
- Work with compiler, framework, runtime and performance teams to deliver end-to-end performance that fully utilizes today's complex server and mobile systems
- Collaborate with architects and hardware engineers to co-design future accelerators, including ISAs for new hardware features and the evolution of existing ISAs
- Collaborate with machine learning researchers to guide system development for future ML trends
Preferred Qualifications
- Some knowledge of compiler fundamentals is valuable, as is familiarity with kernel authoring paradigms (e.g., OpenMP, CUDA, Halide, Rise/Lift, or others)
- Experience with performance profilers, performance data analysis tools, visualization tools, and debugging, or experience working with embedded systems
- Experience working with distributed/parallel programming models and an understanding of parallel hardware
- Experience with embedded programming and developing firmware for accelerators
- Experience with HPC programming and accelerator languages such as CUDA, OpenCL, or SYCL
Benefits
- Premier insurance plans
- Up to 5% 401k matching
- Flexible paid time off
- Competitive compensation
- Stock options
- Team-building events
- Remote work options are available for US and Canada-based candidates