Machine Learning Engineer

SandboxAQ

💵 $167k-$234k
📍 Remote - United States, Canada

Summary

Join SandboxAQ's AI Simulation team as a Machine Learning Engineer to develop AI systems for drug and materials discovery. You will work with a team to architect and train AI systems, leveraging expertise in Large Language Models and multi-modal data processing. Responsibilities include developing scalable ML software, designing ML algorithms for NGS sequencing pipelines, and applying reasoning techniques for extracting insights from various data sources. You will also curate data, research novel bioinformatics approaches, and communicate findings effectively. The role requires a Ph.D. in a related field and 3-5 years of relevant experience. Preferred qualifications include familiarity with Generative AI and a strong publication record.

Requirements

  • Ph.D. in Computer Science, Computational Biology, High-Performance Computing, or a related field
  • 3–5 years of hands-on experience, preferably in the private sector, working on one or more of the following:
      • Large Language Models and GenAI techniques
      • NGS sequencing pipelines
      • Graph neural networks
  • Experience in processing and curating multi-modal data—including large-scale omics, clinical datasets, and scientific literature
  • Proficiency in running analyses and training machine learning or deep learning models in high-performance computing (HPC) environments, particularly those using GPUs
  • Strong collaboration mindset, with the ability to identify problems and communicate technical concepts clearly to both technical and non-technical stakeholders
  • Demonstrated ability to dive deep into technically complex problems and a track record of driving initiatives through to completion

Responsibilities

  • Develop robust, scalable ML software for predictive and generative modeling tasks related to genomics data (e.g., Interactome, Cell & Tissue modeling)
  • Design and implement ML algorithms to enhance NGS sequencing pipelines
  • Apply reasoning techniques (including LLMs, Graph Neural Networks, and Gen AI models) to extract insights from simulation, omics data, and literature that advance drug discovery
  • Identify, ingest, and curate relevant data sources. Own data quality control, validation, and integration workflows
  • Research and prototype novel bioinformatics and deep learning approaches to interpret human genetic variants, gene regulation mechanisms, and disease pathways using diverse multimodal data (e.g., multi-omics, single-cell data, proteomics, genomics, biomedical imaging)
  • Communicate complex ideas effectively across audiences, including internal collaborators, external stakeholders, and clients—tailoring technical depth as needed
  • Contribute to the scientific community through patent filings, peer-reviewed publications, white papers, and conference presentations

Preferred Qualifications

  • Familiarity with advanced AI concepts, including:
      • Generative AI (LLMs, Biological Foundation Models, Diffusion & Optimal Transport techniques)
      • ML-based advancements in NGS sequencing pipelines
      • Biomedical Imaging
  • Good grasp of molecular biology concepts, particularly the central dogma (DNA, RNA, protein), and related high-throughput technologies such as RNA-seq, epigenomics, single-cell and spatial omics
  • Working knowledge of graph databases and graph data structures
  • Strong publication record in peer-reviewed venues (e.g., NeurIPS, ICLR, ICML, CVPR, ECCV, ICCV)
  • Willingness to travel up to 25% for conferences, customer engagements, team offsites, or internal meetings

Benefits

  • Medical/dental/vision
  • Family planning/fertility
  • PTO (summer and winter breaks)
  • Financial wellness resources
  • 401(k) plans
