Compiler Software Engineer, Staff

d-Matrix

πŸ“Remote - Canada

Summary

Join d-Matrix, a company focused on generative AI, as a Staff Compiler Software Engineer. Work remotely, or hybrid from Toronto. You will design, optimize, and lower high-level machine learning representations for d-Matrix's machine learning compiler. You will contribute to high-level IR transformations, dialect and IR design, and the lowering of ML models from frameworks such as PyTorch, TensorFlow, and ONNX; optimize performance for compute graphs; and develop model partitioning techniques. You will collaborate with experienced compiler engineers, ML framework developers, hardware architects, and performance engineers.

Requirements

  • Bachelor's degree in computer science or a related field with 6+ years of relevant industry experience (or MS with 5+ years of experience or PhD with 3+ years of experience)
  • Strong proficiency in modern C++ (C++14/17/20) and compiler development
  • Experience with modern compiler infrastructures such as LLVM, MLIR, or equivalent frameworks
  • Experience with machine learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Solid understanding of graph-level optimizations and IR transformations in ML compilers
  • Experience with model partitioning strategies such as GSPMD, sharding, and distributed execution

Responsibilities

  • Develop the front end of our machine learning compiler
  • Design, optimize, and lower high-level machine learning representations to intermediate representations suitable for further compilation
  • Contribute to high-level IR transformations (e.g., graph optimization, operator fusion, canonicalization)
  • Design dialects and IRs for machine learning frameworks
  • Lower and transform ML models from frameworks such as PyTorch, TensorFlow, and ONNX to compiler IRs such as MLIR and LLVM
  • Optimize performance for compute graphs, including operator specialization, fusion, and memory layout transformations
  • Develop model partitioning techniques, including graph-based parallelism strategies (e.g., pipelined model parallelism, tensor parallelism, and data parallelism)
  • Automatically partition large models across multiple devices using techniques like GSPMD (Generalized SPMD Partitioning)
  • Apply placement-aware optimizations to minimize communication overhead and improve execution efficiency on distributed hardware
  • Work closely with ML framework developers, hardware architects, and performance engineers to ensure efficient model execution

Preferred Qualifications

  • Algorithm design experience, from conceptualization to implementation
  • Experience with open-source ML compiler projects, such as Torch-MLIR, IREE, XLA, or TVM
  • Experience with automatic differentiation, shape inference, and type propagation in ML compilers
  • Experience optimizing distributed execution of large models on accelerators (e.g., GPUs, TPUs, custom AI hardware)
  • Passion for working in a fast-paced and dynamic startup environment
