Staff Machine Learning Engineer

Tempus Labs, Inc.

💵 $170k-$230k
📍 Remote - United States

Summary

Join Tempus as a Staff Machine Learning Engineer and play a pivotal role in designing, building, and optimizing the data infrastructure for our advanced generative AI models. You will manage the lifecycle of vast datasets, from ingestion and processing to integration and retrieval, enabling our AI to learn from real-world evidence. This critical role involves architecting sophisticated data processing workflows, developing efficient data ingestion strategies, and optimizing data storage for large-scale training. You will collaborate with various teams, establish robust monitoring systems, and actively manage data processing costs. This position requires a strong academic background and extensive experience in large-scale data pipelines and infrastructure. The role offers a competitive salary and a full range of benefits.

Requirements

  • Master's degree in Computer Science, Artificial Intelligence, Software Engineering, or a related field, with a strong academic background focused on AI data engineering
  • Proven track record (8+ years of industry experience) in designing, building, and operating large-scale data pipelines and infrastructure in a production environment
  • Strong experience working with massive, heterogeneous datasets (TBs+) and modern distributed data processing tools and frameworks such as Apache Spark, Ray, or Dask
  • Strong, hands-on experience with tools and libraries specifically designed for large-scale ML data handling, such as Hugging Face Datasets, MosaicML Streaming, or similar frameworks (e.g., WebDataset, Petastorm). Experience with MLOps tools and platforms (e.g., MLflow, Kubeflow, SageMaker Pipelines)
  • Understanding of the data challenges specific to training large models (Foundation Models, LLMs, Multimodal Models)
  • Proficiency in programming languages like Python and experience with modern distributed data processing tools and frameworks
  • Proven ability to bring thought leadership to the product and engineering teams, influencing technical direction and data strategy
  • Experience mentoring junior engineers and collaborating effectively with cross-functional teams (Research Scientists, ML Engineers, Platform Engineers, Product Managers, Clinicians)
  • Excellent communication skills, capable of explaining complex technical concepts to diverse audiences
  • Strong bias-to-action and ability to thrive in a fast-paced, dynamic research and development environment
  • A pragmatic approach focused on delivering rapid, iterative, and measurable progress towards impactful goals

Responsibilities

  • Architect and build sophisticated data processing workflows responsible for ingesting, processing, and preparing multimodal training data that seamlessly integrate with large-scale distributed ML training frameworks and infrastructure (GPU clusters)
  • Develop strategies for efficient, compliant data ingestion from diverse sources, including internal databases, third-party APIs, public biomedical datasets, and Tempus's proprietary data ecosystem
  • Utilize, optimize, and contribute to frameworks specialized for large-scale ML data loading and streaming (e.g., MosaicML Streaming, Ray Data, HF Datasets)
  • Collaborate closely with infrastructure and platform teams to leverage and optimize cloud-native services (primarily GCP) for performance, cost-efficiency, and security
  • Engineer efficient connectors and data loaders for accessing and processing information from diverse knowledge sources, such as knowledge graphs, internal structured databases, biomedical literature repositories (e.g., PubMed), and curated ontologies
  • Optimize data storage for efficient large-scale training and knowledge access
  • Orchestrate, monitor, and troubleshoot complex data workflows using tools such as Airflow and Kubeflow Pipelines
  • Establish robust monitoring, logging, and alerting systems for data pipeline health, data drift detection, and data quality assurance, providing feedback loops for continuous improvement
  • Analyze and optimize data I/O performance bottlenecks, considering storage systems, network bandwidth, and compute resources
  • Actively manage and seek optimizations for the costs associated with storing and processing massive datasets in the cloud

Preferred Qualifications

  • Advanced degree (PhD) in Computer Science, Engineering, Bioinformatics, or a related field
  • Contributions to relevant open-source projects
  • Direct experience working with clinical or biological data (EHR, genomics, medical imaging)

Benefits

  • Medical and other benefits
  • Incentive compensation
  • Restricted stock units
