Clinical Data Engineer

Vibrent Health

πŸ“Remote - United States

Summary

Join Vibrent Health as a Clinical Data Engineer and bridge the gap between research methodology and engineering implementation. You will transform data from various sources into research-ready datasets, empowering researchers to derive meaningful insights. This role involves working with large datasets, ensuring data quality and compliance, and collaborating with cross-functional teams. You will design, implement, and maintain data pipelines and infrastructure. Success requires a strong understanding of clinical data collection and extraction processes, along with project management and technical expertise. Vibrent is a remote-first organization, offering a competitive compensation package and opportunities for professional development. The ideal candidate will have extensive experience in data engineering within the healthcare or clinical research domain and direct collaboration with researchers.

Requirements

  • Bachelor's or Master's degree in Computer Science, Data Engineering, Biomedical Informatics, or a related field
  • 2+ years of recent experience directly collaborating with researchers, data scientists, clinical data managers, and research project managers to develop solutions to support research goals
  • 5+ years of experience in data engineering, preferably in healthcare, clinical research, or life sciences
  • Experience building data pipelines for heterogeneous data ingestion and integration across multiple sources, including directly collected data
  • Proven track record of handling health/clinical datasets and supporting research analysis
  • Experience creating ELT and ETL processes to ingest data into data warehouses and data lakes
  • Experience visualizing large datasets with BI tools and other data visualization methods
  • Experience working with genomic data, imaging data, and wearable device data
  • Experience with data modeling, database design, and data governance
  • Experience deploying data pipelines in the cloud
  • Experience with unstructured data processing and transformation
  • Experience developing and maintaining data pipelines that handle large volumes of data efficiently
  • Knowledge of research processes and terminology in biological or medical fields, with the ability to communicate effectively with and support researchers in these domains with technological and methodological expertise
  • Strong understanding of end-to-end processes for data collection, extraction, and analysis as required by end users in research
  • Strong ability to develop technical specifications based on communication from stakeholders
  • Knowledge of statistical analysis techniques and tools used in medical research
  • Expert level proficiency with Python/R; experienced in creating custom functions
  • Strong SQL and database design skills (PostgreSQL, MySQL, SQL Server, NoSQL databases)
  • Experience using GitLab and GitHub
  • Proficiency in data processing frameworks such as Dagster, DBT, or Meltano
  • Strong proficiency with cloud platforms (AWS, Azure, or GCP) and Snowflake for setting up and working with data warehouses and data lakes; Snowflake experience is required
  • Solid understanding of database concepts; knowledge of XML, JSON, and APIs
  • Knowledge of healthcare data standards (FHIR, HL7, CDISC, OMOP) and clinical terminologies (LOINC, SNOMED, ICD)
  • Familiarity with compliance frameworks such as HIPAA, GDPR, or GxP

Responsibilities

  • Design, maintain, and optimize data models to create robust, research-ready datasets
  • Build the infrastructure required for optimal extraction, transformation, and loading of data using cloud technologies such as AWS and Azure
  • Design, develop, and optimize data pipelines to ingest, process, and store clinical and medical research data from various sources (e.g., EHRs, clinical trials, wearable devices, genomic data)
  • Ensure efficient ETL/ELT processes for transforming raw data into structured, analysis-ready datasets
  • Build and refine ETL processes using SQL and Python to transform raw health data into structured formats suitable for analysis
  • Develop and maintain data warehouses, databases, and data lakes tailored to research needs
  • Collaborate with engineering teams to ensure data pipelines are reliable, scalable, and performant
  • Provide technical leadership on various aspects of clinical data flow, including assisting with the definition, build, and validation of application programming interfaces (APIs), data streams, and data staging to various systems for data extraction and integration
  • Collaborate with clinical researchers, data scientists, and IT teams to understand data requirements and develop solutions that support their research goals
  • Coordinate with downstream users to ensure that outputs meet requirements of end users
  • Translate research questions into optimized queries, aggregations, and summaries that facilitate quick, accurate analysis
  • Provide technical support to research teams by enabling efficient data access and analysis
  • Work with regulatory and compliance teams to ensure adherence to industry regulations (e.g., HIPAA, GDPR, FDA 21 CFR Part 11)
  • Work closely with engineers, product managers, and researchers to integrate research needs into the product roadmap
  • Participate in code reviews, agile sprints, and continuous improvement initiatives
  • Develop custom dashboards, reports, and interactive visualizations (e.g., Tableau, Looker, or Python-based libraries) to empower stakeholders with real-time access to quality metrics and research findings
  • Develop and maintain data visualization best practices and standards to ensure consistency and quality across all reporting
  • Conduct regular evaluations of existing visualizations to ensure continued accuracy and identify areas for improvement
  • Implement data validation, cleaning, and monitoring processes to ensure data integrity and accuracy
  • Manage and maintain pipelines and troubleshoot data issues in the data lake or warehouse
  • Establish and enforce data governance policies, including metadata management and data lineage tracking
  • Ensure proper documentation of data workflows, schemas, and transformations
  • Create and maintain comprehensive data dictionaries, metadata standards, and codebooks to enhance data transparency and reproducibility
  • Conduct periodic data quality checks and audits to ensure compliance with research standards and regulatory requirements
  • Implement security best practices to safeguard sensitive patient data in compliance with industry regulations
  • Conduct periodic audits and risk assessments to identify potential vulnerabilities and data integrity issues
  • Utilize cloud-based platforms (AWS, Azure, GCP) to build scalable and reliable data infrastructure
  • Leverage programming languages and frameworks (Python, SQL, R, Spark) to develop efficient data solutions
  • Employ machine learning pipelines and AI-driven techniques to support advanced research initiatives

Preferred Qualifications

  • Ph.D. in a relevant field (e.g., Epidemiology, Public Health, Health Informatics, Biostatistics, Data Science)
  • Certification in cloud platforms or healthcare data management (e.g., AWS Certified Data Analytics, Certified Health Data Analyst - CHDA)
  • Familiarity with FAIR data principles (Findability, Accessibility, Interoperability, and Reusability)

Benefits

  • Competitive compensation package
  • Above-average 401(k) match
  • Benefits that help you prioritize self and family care
  • Support for your further education and career development
  • Remote work
