Remote Data Engineer

CDC Foundation

💵 $103k-$143k
📍 Remote - United States

Job highlights

Summary

Join the CDC Foundation in advancing public health by designing and implementing modern data infrastructure for the Northwest Portland Area Indian Health Board (NPAIHB) Data Hub project. As a Data Engineer, you will play a crucial role in creating scalable solutions that align with epidemiological needs, ensuring accurate and reliable data release, and collaborating closely with cross-functional teams to understand current and future data requirements.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Data Science, or a related field
  • Minimum of five (5) years of related informatics experience, preferably with three (3) years of experience in a lead data engineer position
  • Demonstrated expertise in building SQL relational databases and transitioning non-relational data into a structured relational format, ensuring seamless integration and optimized performance
  • Proficiency in SQL programming and other languages commonly used in data engineering, such as Python, Java, or Scala. Candidates should be able to implement data automations within existing frameworks rather than write one-off scripts
  • Experience transforming and preparing data into formats suitable for data visualization software, ensuring it is structured for optimal use in dashboards and other visual outputs
  • Strong understanding of database systems, including relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra), with PostgreSQL preferred
  • Experience with engineering best practices such as source control, automated testing, continuous integration and deployment, and peer review, and the ability to serve as a subject matter expert on these topics
  • Knowledge of data warehousing concepts and tools
  • Experience with cloud computing platforms, with a preference for AWS experience
  • Expertise in data modeling, ETL (Extract, Transform, Load) processes, and data integration techniques
  • Familiarity with agile development methodologies, software design patterns, and best practices
  • Strong analytical thinking and problem-solving abilities
  • Excellent verbal and written communication skills, including the ability to convey technical concepts to non-technical partners effectively
  • Ability to travel occasionally for in-person meetings (travel costs will be covered by NPAIHB)

Responsibilities

  • Design a data hub roadmap to streamline secure and reliable data management, including ingestion, processing, and storage through enhancements or implementation of new systems and pipelines
  • Load data into storage systems or data warehouses, transforming, cleaning, and organizing it with dimensional modeling techniques to ensure accuracy, consistency, and efficient querying
  • Transform and structure data to ensure it is optimized for use in data visualization software, enabling accurate and effective visual representations of epidemiological data
  • Collaborate closely with the project epidemiologist to ensure they gain a comprehensive understanding of the data pipeline architecture and data engineering methods to support long-term maintenance and sustainability of the system
  • Collaborate closely with the project epidemiologist to understand data requirements and ensure that data infrastructure and workflows align with epidemiological needs
  • Ensure thorough and clear documentation of database architecture and workflows to promote sustainability, consistency, and ease of maintenance
  • Define business rules around data governance for the Data Hub. Apply rigorous data quality checks and validation processes to guarantee the accuracy and reliability of the data released, emphasizing the importance of delivering correct and trustworthy data to support public health initiatives
  • Optimize data pipelines, infrastructure, and workflows for performance and scalability
  • Monitor data pipelines and systems for performance issues, errors, and anomalies, and implement solutions to address them
  • Analyze and interpret datasets to identify data management needs and advise on data management strategy
  • Implement security measures to protect sensitive information
  • Collaborate with epidemiologists, analysts, and other partners to understand current and future data needs and requirements, and to ensure that the data infrastructure supports the organization's goals and objectives
  • Implement and maintain ETL processes to ensure the accuracy, completeness, and consistency of data
  • Design and manage data storage systems, including the migration of SAS datasets to a PostgreSQL relational database
