Data Engineer

CDC Foundation
Summary
Join the CDC Foundation as a Data Engineer and play a crucial role in advancing its mission by designing, building, and maintaining data infrastructure for a public health organization. Working remotely within the Boston Public Health Commission, you will deliver the architecture for data generation, storage, processing, and analysis. Collaborate with various teams to implement solutions meeting public health agency needs. This grant-funded, limited-term position (ending June 30, 2026) offers a fully remote work arrangement for U.S.-based candidates, with a salary range of $103,500-$143,500 plus benefits. The position involves creating and managing data systems and pipelines, optimizing data workflows, implementing security measures, and collaborating with stakeholders. You will be hired by the CDC Foundation and assigned to the Boston Public Health Commission.
Requirements
- Bachelor's degree in Computer Science, Information Technology, Data Science, or a related field
- Minimum of five (5) years of experience in building Data Warehouse and/or Data Lake implementations in a product-centric environment
- Proficiency in programming languages commonly used in data engineering, such as Python, Java, Scala, and SQL. Candidates should be able to implement data automations within existing frameworks rather than writing one-off scripts
- Experience with big data technologies and frameworks like Hadoop, Spark, Kafka, and Flink
- Strong understanding of database systems, including relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra)
- Experience with engineering best practices such as source control, automated testing, continuous integration and deployment, and peer review
- Knowledge of data warehousing concepts and tools
- Experience with cloud computing platforms; Microsoft Azure experience is a plus
- Expertise in data modeling, both in ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes, and data integration techniques
- Experience building Data Warehouses, Data Lakes, or other data platforms on Microsoft Azure using Azure Data Factory, SQL Server, Azure Blob Storage (Data Lake Gen2), and the Power BI visualization tool
- Familiarity with agile development methodologies, software design patterns, and best practices
- Strong analytical thinking and problem-solving abilities
- Excellent verbal and written communication skills, including the ability to convey technical concepts to non-technical partners effectively
- Flexibility to adapt to evolving project requirements and priorities
- Outstanding interpersonal and teamwork skills, and the ability to develop productive working relationships with colleagues and partners
- Experience working in a virtual environment with remote partners and teams
- Experience working in Agile/Scrum environments
- Proficiency in Microsoft Office
Responsibilities
- Create and manage the systems and pipelines that enable efficient and reliable flow of data, including ingestion, processing, and storage
- Collect data from various sources, transforming and cleaning it to ensure accuracy and consistency. Load data into storage systems, data lakes or data warehouses
- Optimize data pipelines, infrastructure, and workflows for performance and scalability
- Design, create, test, deploy and maintain data pipelines that deliver curated, value-added data assets such as data lakes and other purpose-built data stores. Ensure data pipelines are optimized, highly reliable, and contain low technical debt
- Monitor data pipelines and systems for performance issues, errors, and anomalies, and implement solutions to address them
- Implement security measures to protect sensitive information
- Collaborate with data scientists, analysts, and other partners to understand their data needs and requirements, and to ensure that the data infrastructure supports the organization's goals and objectives
- Collaborate with cross-functional teams to understand data requirements and design scalable solutions that meet business needs
- Implement and maintain ETL and ELT processes to ensure the accuracy, completeness, and consistency of data
- Design and manage data storage systems, including relational databases, NoSQL databases, and data warehouses
- Stay current with industry trends, best practices, and emerging technologies in data engineering, and incorporate them into the organization's data infrastructure
- Provide technical guidance to other staff
- Communicate effectively with partners at all levels of the organization to gather requirements, provide updates, and present findings
Preferred Qualifications
- Knowledge of SAS and R is desirable
- Hands-on experience with Power BI, Tableau, or other BI tools for data visualization
- Familiarity with Delta Lake, Medallion Architecture, or Lakehouse architecture
- Experience with Terraform, Bicep, or ARM templates for Azure infrastructure as code
- Microsoft certifications such as:
  - Azure Data Engineer Associate (DP-203)
  - Azure Fundamentals (AZ-900)
Benefits
- Salary Range: $103,500-$143,500, plus benefits
- Location: Remote, must be based in the United States
- Work Schedule: 8am - 5pm Eastern Standard Time