Senior/Staff Scientist, Cheminformatics, Data Science

Deep Origin
Summary
Join Deep Origin as a Senior or Staff Scientist and contribute to a groundbreaking ARPA-H funded initiative focused on developing scalable computational models for predicting protein-ligand binding. You will collaborate with a multidisciplinary team to lead ligand- and structure-based modeling efforts, curate data, and develop high-performing predictive models that accelerate molecular design. This role requires a Ph.D. in a relevant field and 2+ years of industry experience in drug discovery or pharmaceutical data science. Strong proficiency in Python and experience with machine learning model development are essential. The position offers a remote-friendly environment, competitive compensation, flexible schedules, and a collaborative culture. Deep Origin builds cloud-native tools for scientists working on the frontiers of biology, chemistry, and computation, offering a unique opportunity to impact therapeutic discovery.
Requirements
- Ph.D. in cheminformatics, computational chemistry, chemical biology, or a related field
- 2+ years of industry experience in drug discovery or pharmaceutical data science
- Deep familiarity with protein-ligand interactions, protein structure handling, and ligand representations
- Strong proficiency in Python, with hands-on experience using RDKit, scikit-learn, PyTorch, or similar frameworks
- A strong foundation in ML model development, validation, and interpretability
Responsibilities
- Design, curate, and process diverse chemical and biological datasets (ChEMBL, BindingDB, RCSB PDB, UniProt, etc.)
- Build and evaluate ligand-based ML models (e.g., QSAR) using methods such as gradient-boosted trees, neural networks, and graph neural networks (GNNs)
- Incorporate structural features of proteins and ligands into affinity prediction workflows
- Collaborate closely with medicinal chemists, biologists, and data scientists to support data-driven discovery decisions
- Plan and organize work to meet key deadlines and milestones; coordinate with collaborators to ensure integration with broader project efforts
- Communicate effectively within Deep Origin and with external partners, regularly sharing updates on progress, blockers, and decisions
- Clearly communicate model performance, limitations, and insights to both technical and non-technical teams
Preferred Qualifications
- Experience working with DNA-encoded library (DEL) datasets, including data quality control and preprocessing
- Exposure to protein structure analysis, scoring functions, or ligand docking workflows (e.g., AutoDock Vina)
- Familiarity with cloud platforms (AWS, GCP) and workflow orchestration tools
Benefits
- A remote-friendly environment (with hubs in the US and Armenia)
- Competitive compensation and equity packages
- Flexible schedules
- A culture that encourages curiosity, autonomy, and creative problem-solving