Staff Machine Learning Engineer

Tempus Labs, Inc.

💵 $170k-$230k
📍 Remote - United States

Summary

Join Tempus as a Staff Machine Learning Engineer and play a pivotal role in designing, building, and optimizing the data infrastructure for our advanced generative AI models. You will manage the lifecycle of vast datasets, from ingestion and processing to integration and retrieval, enabling our AI to learn from real-world evidence. This critical role involves architecting sophisticated data processing workflows, developing efficient data ingestion strategies, and optimizing data storage for large-scale training. You will collaborate with various teams, establish robust monitoring systems, and actively manage data processing costs. This position requires a strong academic background and extensive experience in large-scale data pipelines and infrastructure. The role offers a competitive salary and a full range of benefits.

Requirements

  • Master's degree in Computer Science, Artificial Intelligence, Software Engineering, or a related field, with a strong academic background focused on AI data engineering
  • Proven track record (8+ years of industry experience) in designing, building, and operating large-scale data pipelines and infrastructure in a production environment
  • Strong experience working with massive, heterogeneous datasets (TBs+) and modern distributed data processing tools and frameworks such as Apache Spark, Ray, or Dask
  • Strong, hands-on experience with tools and libraries specifically designed for large-scale ML data handling, such as Hugging Face Datasets, MosaicML Streaming, or similar frameworks (e.g., WebDataset, Petastorm). Experience with MLOps tools and platforms (e.g., MLflow, Kubeflow, SageMaker Pipelines)
  • Understanding of the data challenges specific to training large models (Foundation Models, LLMs, Multimodal Models)
  • Proficiency in programming languages like Python and experience with modern distributed data processing tools and frameworks
  • Proven ability to bring thought leadership to the product and engineering teams, influencing technical direction and data strategy
  • Experience mentoring junior engineers and collaborating effectively with cross-functional teams (Research Scientists, ML Engineers, Platform Engineers, Product Managers, Clinicians)
  • Excellent communication skills, capable of explaining complex technical concepts to diverse audiences
  • Strong bias-to-action and ability to thrive in a fast-paced, dynamic research and development environment
  • A pragmatic approach focused on delivering rapid, iterative, and measurable progress towards impactful goals

Responsibilities

  • Architect and build sophisticated data processing workflows responsible for ingesting, processing, and preparing multimodal training data that seamlessly integrate with large-scale distributed ML training frameworks and infrastructure (GPU clusters)
  • Develop strategies for efficient, compliant data ingestion from diverse sources, including internal databases, third-party APIs, public biomedical datasets, and Tempus's proprietary data ecosystem
  • Utilize, optimize, and contribute to frameworks specialized for large-scale ML data loading and streaming (e.g., MosaicML Streaming, Ray Data, HF Datasets)
  • Collaborate closely with infrastructure and platform teams to leverage and optimize cloud-native services (primarily GCP) for performance, cost-efficiency, and security
  • Engineer efficient connectors and data loaders for accessing and processing information from diverse knowledge sources, such as knowledge graphs, internal structured databases, biomedical literature repositories (e.g., PubMed), and curated ontologies
  • Optimize data storage for efficient large-scale training and knowledge access
  • Orchestrate, monitor, and troubleshoot complex data workflows using tools such as Airflow and Kubeflow Pipelines
  • Establish robust monitoring, logging, and alerting systems for data pipeline health, data drift detection, and data quality assurance, providing feedback loops for continuous improvement
  • Analyze and optimize data I/O performance bottlenecks, considering storage systems, network bandwidth, and compute resources
  • Actively manage and seek optimizations for the costs associated with storing and processing massive datasets in the cloud

Preferred Qualifications

  • Advanced degree (PhD) in Computer Science, Engineering, Bioinformatics, or a related field
  • Contributions to relevant open-source projects
  • Direct experience working with clinical or biological data (EHR, genomics, medical imaging)

Benefits

  • Medical and other benefits
  • Incentive compensation
  • Restricted stock units
