Summary

Join a lean, expert team building next-generation AI from the ground up. As a Senior Machine Learning Engineer, you will optimize the performance of state-of-the-art foundation models across a range of hardware environments. You will design and maintain abstractions that scale model performance efficiently, profile and optimize memory usage and latency in PyTorch, and benchmark model and system performance. You will also work closely with hardware and systems partners to identify bottlenecks and improve performance. Your work will directly influence the scalability, cost, performance, and accessibility of our models. This is a remote, full-time position offering a competitive salary and benefits package.

Requirements

Deep experience profiling and optimizing PyTorch code for performance (memory, latency, throughput)
Familiarity with tools like torch.compile, PyTorch/XLA, the PyTorch profiler, and memory or trace viewers
Experience building performance-portable abstractions and optimizing ML pipelines for a variety of hardware/software stacks
Strong understanding of transformer models and modern attention mechanisms
Hands-on work with parallel inference strategies (tensor parallelism, pipeline parallelism, etc.)

Responsibilities

Design and maintain abstractions that scale model performance efficiently across heterogeneous hardware platforms, not just CUDA/NVIDIA
Profile and optimize memory usage, latency, and throughput in PyTorch; build or integrate low-level solutions (e.g., Triton kernels) as needed
Benchmark our model and system performance to guide product decisions around cost, throughput, and deployment tradeoffs
Collaborate with hardware and systems partners to uncover bottlenecks and push for performance improvements in future iterations
Work hand-in-hand with research and engineering teams to ensure systems are planned and built with efficiency in mind from the start

Preferred Qualifications

Proficiency with Triton or CUDA, especially writing custom kernels and fusions for hot code paths
Experience writing high-performance parallel C++, particularly in a machine learning context (e.g., data loading, inference)
Previous work building efficient ML demos or inference environments (Gradio, Docker, etc.)
Experience deploying models on non-NVIDIA hardware platforms

Benefits

Base Salary: $220,000 – $300,000 / year (based on experience & location)
Equity: Generous stock options
Full health coverage
Flexible PTO
Home office support

Senior Machine Learning Engineer

IntelliPro

Remote

Software Development

Senior
