Summary
Join the CDC Foundation as a Data Engineer to design, build, and maintain data infrastructure for a public health organization. This role will enable data integration and preparation pipelines for downstream analytics on behalf of the Office of Epidemiology and Population Health.
Requirements
- Bachelor's degree in computer science or information systems, or equivalent experience
- Demonstrated ability in complex data management and data preparation, including but not limited to data storage, data standardization, and data operations, for data warehousing efforts
- Experience working with data integration frameworks
- Experience working with cloud services and infrastructure (Microsoft Azure Databricks preferred)
- Experience designing, writing, and delivering code (e.g., Java, Python, R) in a team environment, using source code control, unit testing, and other software engineering practices
- Ability to thrive in a project-based, team environment
Responsibilities
- Utilize software engineering methods and tools on a common data analytic platform to integrate, process and prepare multiple sources of data for downstream public health surveillance analyses
- Collaborate with the Data Analyst and Epidemiologists to understand data requirements, and develop and maintain data pipelines that automate data transformation tasks
- Perform data linkages between public health surveillance data and geospatial data assets
- Document data transformation processes and maintain comprehensive records for reproducibility
- Test data and applications to validate data accuracy and quality
- Track projects from conceptualization to completion, including helping to create project roadmaps, project plans and requirements documentation
- Create and manage the systems and pipelines that enable efficient and reliable flow of data, including ingestion, processing, and storage
- Collect data from various sources, transform and clean it to ensure accuracy and consistency, and load it into storage systems or data warehouses
- Optimize data pipelines, infrastructure, and workflows for performance and scalability
- Monitor data pipelines and systems for performance issues, errors, and anomalies, and implement solutions to address them
- Implement security measures to protect sensitive information
- Collaborate with data scientists, analysts, and other partners to understand their data needs and requirements, and to ensure that the data infrastructure supports the organization's goals and objectives
- Collaborate with cross-functional teams to understand data requirements and design scalable solutions that meet business needs
- Implement and maintain ETL processes to ensure the accuracy, completeness, and consistency of data
- Design and manage data storage systems, including relational databases, NoSQL databases, and data warehouses
- Stay current on industry trends, best practices, and emerging technologies in data engineering, and incorporate them into the organization's data infrastructure
- Provide technical guidance to other staff
- Communicate effectively with partners at all levels of the organization to gather requirements, provide updates, and present findings
Preferred Qualifications
- Spatial data experience (e.g., GeoPandas or ArcGIS)