Senior Machine Learning Engineer

IntelliPro

💵 $220k-$300k
📍 Remote - United States

Summary

Join a lean, expert team building next-gen AI from the ground up. As a Senior Machine Learning Engineer, you will optimize the performance of state-of-the-art foundation models across a range of hardware environments. You will design and maintain abstractions that scale model performance efficiently, profile and optimize memory usage and latency in PyTorch, and benchmark model and system performance. You will also collaborate with hardware and systems partners to identify bottlenecks and improve performance. Your work will directly influence the scalability, cost, performance, and accessibility of our models. This is a remote, full-time position offering a competitive salary and benefits package.

Requirements

  • Deep experience profiling and optimizing PyTorch code for performance (memory, latency, throughput)
  • Familiarity with tools like torch.compile, PyTorch/XLA, the PyTorch profiler, and memory or trace viewers
  • Experience building performance-portable abstractions and optimizing ML pipelines for a variety of hardware/software stacks
  • Strong understanding of transformer models and modern attention mechanisms
  • Hands-on work with parallel inference strategies (tensor parallelism, pipeline parallelism, etc.)

Responsibilities

  • Design and maintain abstractions that scale model performance efficiently across heterogeneous hardware platforms, not just CUDA/NVIDIA
  • Profile and optimize memory usage, latency, and throughput in PyTorch; build or integrate low-level solutions (e.g., Triton kernels) as needed
  • Benchmark our model and system performance to guide product decisions around cost, throughput, and deployment tradeoffs
  • Collaborate with hardware and systems partners to uncover bottlenecks and push for performance improvements in future iterations
  • Work hand-in-hand with research and engineering teams to ensure systems are planned and built with efficiency in mind from the start

Preferred Qualifications

  • Proficiency with Triton or CUDA, especially writing custom kernels and fusions for hot code paths
  • Experience writing high-performance parallel C++, particularly in a machine learning context (e.g., data loading, inference)
  • Previous work building efficient ML demos or inference environments (Gradio, Docker, etc.)
  • Experience deploying models on non-NVIDIA hardware platforms

Benefits

  • Base Salary: $220,000 – $300,000 / year (based on experience & location)
  • Equity: Generous stock options
  • Full health coverage
  • Flexible PTO
  • Home office support
