Research Scientist, Large-Scale Learning


Together AI

📍Remote - United States, Netherlands

Summary

Join Together AI's Model Shaping team as a Research Scientist in Large-Scale Learning and contribute to increasing the efficiency of training foundation models. You will analyze state-of-the-art neural network training techniques, propose and implement new approaches, and present your findings at leading ML/ML Systems conferences. Collaborate with Machine Learning Engineers to integrate improvements into Together's platform. This role demands autonomous research design, implementation, and validation, along with strong communication skills. The ideal candidate will have a proven publication record and a passion for applying research to real-world impact.

Requirements

  • Can autonomously design, implement, and validate your research ideas
  • Write high-quality, efficient code in Python and PyTorch
  • Have first-author publications at leading ML or ML Systems conferences (ICLR, ICML, NeurIPS, MLSys)
  • Communicate clearly, both when discussing research plans with other scientists and when explaining them to a broader audience
  • Follow the latest advances in relevant subfields of AI
  • Are passionate about seeing your research create real-world impact through Together AI's services and willing to work hands-on with production systems to achieve it

Responsibilities

  • Define and drive the research agenda around efficiency and performance of foundation model training
  • Study recent results from the broader AI research community, analyzing their relevance to the team’s research directions and ongoing projects
  • Conduct experiments to empirically validate your hypotheses and compare the outcomes with relevant related work
  • Share your findings both internally and externally (e.g., at top-tier conferences on ML and ML Systems)
  • Partner with Machine Learning Engineers to integrate advanced methods into Together’s Model Shaping platform

Preferred Qualifications

  • Algorithmic modifications of large neural network training (e.g., novel optimization algorithms or model adaptation techniques)
  • Distributed optimization (including federated learning, communication-efficient optimization, and decentralized training)
  • ML systems optimizations for distributed training, memory efficiency, or compute efficiency
  • Writing optimized NVIDIA GPU kernels or communication collectives using NVIDIA’s networking stack (e.g., NCCL or NVSHMEM)
  • Running large-scale experiments on GPU clusters

Benefits

  • Health insurance
  • Startup equity
  • Flexible remote work arrangements

