Clinical Data Engineer

Vibrent Health

πŸ“Remote - United States

Summary

Join Vibrent Health as a Clinical Data Engineer and bridge the gap between research methodology and engineering implementation. You will transform data from various sources into research-ready datasets, empowering researchers to derive meaningful insights. This role involves working with large datasets, ensuring data quality and compliance, and collaborating with cross-functional teams. You will design, implement, and maintain data pipelines and infrastructure. Success requires a strong understanding of clinical data collection and extraction processes, along with project management and technical expertise. Vibrent is a remote-first organization, offering a competitive compensation package and opportunities for professional development. The ideal candidate will have extensive experience in data engineering within the healthcare or clinical research domain and direct collaboration with researchers.

Requirements

  • Bachelor's or Master's degree in Computer Science, Data Engineering, Biomedical Informatics, or a related field
  • 2+ years of recent experience directly collaborating with researchers, data scientists, clinical data managers, and research project managers to develop solutions to support research goals
  • 5+ years of experience in data engineering, preferably in healthcare, clinical research, or life sciences
  • Experience building data pipelines for heterogeneous data ingestion and integration across multiple sources, including directly collected data
  • Proven track record of handling health/clinical datasets and supporting research analysis
  • Experience creating ELT and ETL processes to ingest data into data warehouses and data lakes
  • Experience visualizing large datasets with BI tools and other data visualization methods
  • Experience working with genomic data, imaging data, and wearable device data
  • Experience with data modeling, database design, and data governance
  • Experience deploying data pipelines in the cloud
  • Experience with unstructured data processing and transformation
  • Experience developing and maintaining data pipelines that handle large volumes of data efficiently
  • Knowledge of research processes and terminology in biological or medical fields, with the ability to communicate effectively with and support researchers in these domains with technological and methodological expertise
  • Strong understanding of end-to-end processes for data collection, extraction, and analysis as required by end users in research
  • Strong ability to develop technical specifications based on communication from stakeholders
  • Knowledge of statistical analysis techniques and tools used in medical research
  • Expert level proficiency with Python/R; experienced in creating custom functions
  • Strong SQL and database design skills (PostgreSQL, MySQL, SQL Server, NoSQL databases)
  • Experience using GitLab and GitHub
  • Proficiency in data processing frameworks such as Dagster, DBT, or Meltano
  • Strong proficiency with cloud platforms (AWS, Azure, or GCP) and Snowflake for setting up and working with data warehouses and data lakes; Snowflake experience is required
  • Solid understanding of database concepts; knowledge of XML, JSON, and APIs
  • Knowledge of healthcare data standards (FHIR, HL7, CDISC, OMOP) and clinical terminologies (LOINC, SNOMED, ICD)
  • Familiarity with compliance frameworks such as HIPAA, GDPR, or GxP

Responsibilities

  • Design, maintain, and optimize data models to create robust, research-ready datasets
  • Build the infrastructure required for optimal extraction, transformation, and loading of data using cloud technologies such as AWS and Azure
  • Design, develop, and optimize data pipelines to ingest, process, and store clinical and medical research data from various sources (e.g., EHRs, clinical trials, wearable devices, genomic data)
  • Ensure efficient ETL/ELT processes for transforming raw data into structured, analysis-ready datasets
  • Build and refine ETL processes using SQL and Python to transform raw health data into structured formats suitable for analysis
  • Develop and maintain data warehouses, databases, and data lakes tailored to research needs
  • Collaborate with engineering teams to ensure data pipelines are reliable, scalable, and performant
  • Provide technical leadership on various aspects of clinical data flow, including assisting with the definition, build, and validation of application programming interfaces (APIs), data streams, and data staging to various systems for data extraction and integration
  • Collaborate with clinical researchers, data scientists, and IT teams to understand data requirements and develop solutions that support their research goals
  • Coordinate with downstream users to ensure that outputs meet requirements of end users
  • Translate research questions into optimized queries, aggregations, and summaries that facilitate quick, accurate analysis
  • Provide technical support to research teams by enabling efficient data access and analysis
  • Work with regulatory and compliance teams to ensure adherence to industry regulations (e.g., HIPAA, GDPR, FDA 21 CFR Part 11)
  • Work closely with engineers, product managers, and researchers to integrate research needs into the product roadmap
  • Participate in code reviews, agile sprints, and continuous improvement initiatives
  • Develop custom dashboards, reports, and interactive visualizations (e.g., Tableau, Looker, or Python-based libraries) to empower stakeholders with real-time access to quality metrics and research findings
  • Develop and maintain data visualization best practices and standards to ensure consistency and quality across all reporting
  • Conduct regular evaluations of existing visualizations to ensure continued accuracy and identify areas for improvement
  • Implement data validation, cleaning, and monitoring processes to ensure data integrity and accuracy
  • Manage and maintain pipelines and troubleshoot data issues in the data lake or warehouse
  • Establish and enforce data governance policies, including metadata management and data lineage tracking
  • Ensure proper documentation of data workflows, schemas, and transformations
  • Create and maintain comprehensive data dictionaries, metadata standards, and codebooks to enhance data transparency and reproducibility
  • Conduct periodic data quality checks and audits to ensure compliance with research standards and regulatory requirements
  • Implement security best practices to safeguard sensitive patient data in compliance with industry regulations
  • Conduct periodic audits and risk assessments to identify potential vulnerabilities and data integrity issues
  • Utilize cloud-based platforms (AWS, Azure, GCP) to build scalable and reliable data infrastructure
  • Leverage programming languages and frameworks (Python, SQL, R, Spark) to develop efficient data solutions
  • Employ machine learning pipelines and AI-driven techniques to support advanced research initiatives

Preferred Qualifications

  • Ph.D. in a relevant field (e.g., Epidemiology, Public Health, Health Informatics, Biostatistics, Data Science)
  • Certification in cloud platforms or healthcare data management (e.g., AWS Certified Data Analytics, Certified Health Data Analyst - CHDA)
  • Familiarity with FAIR data principles (Findability, Accessibility, Interoperability, and Reusability)

Benefits

  • Competitive compensation package
  • Above-average 401(k) match
  • Benefits that help you prioritize self and family care
  • Support for your further education and career development
  • Remote work
