Remote Data Ingestion Engineer

Encora

📍 Remote - Mexico

Job description

Important Information:

  • Years of Experience: 3+ years in data engineering, ingestion pipelines, and ETL/ELT processes.
  • Job Mode: Full-time.
  • Work Mode: Remote.

Job Summary:

The Data Ingestion Engineer will be responsible for managing data ingestion processes, overseeing platform operations, and optimizing ETL/ELT pipelines on the Databricks platform. The role involves ensuring compliance with data quality, security, and PII management standards while utilizing system integration tools and CI/CD methodologies. This position also focuses on documenting workflows, managing metadata, and supporting the overall data pipeline infrastructure.

Responsibilities and Duties:

  • Ingest data from various source systems, adapting ingestion strategies as needed.
  • Manage and oversee ETL/ELT pipelines on the Databricks platform.
  • Optimize data pipelines for scalability, speed, and performance.
  • Document ingestion, integration flows, and pipelines for clarity and future reference.
  • Schedule and automate ingestion jobs using Apache Airflow.
  • Manage metadata and master data within the technical data catalog.
  • Ensure adherence to security and compliance guidelines, including PII management, during data ingestion.
  • Maintain pipeline infrastructure and implement automated monitoring strategies.
  • Follow SDLC best practices to ensure quality and consistency.

Qualifications and Skills:

  • Bachelor’s degree in Computer Science, Engineering, Statistics, or a related field.
  • 3+ years of experience in data engineering, ingestion pipelines, and ETL/ELT processes.
  • Strong knowledge of data lake ingestion processes and best practices.
  • Expertise in CI/CD pipelines and version control systems like Git.
  • Proficient in agile methodologies.

Role-specific Requirements:

  • Hands-on experience with Spark/Scala, SQL, and Python/PySpark.
  • Proficiency in working with Databricks and Unity Catalog.
  • Experience with ETL/ELT development, monitoring, and pipeline orchestration tools such as Apache Airflow.
  • Knowledge of ingestion tools such as Dell Boomi.
  • Strong understanding of data quality guidelines and best practices for managing large data sets.

Technologies:

  • Databricks, Spark/Scala, SQL, Python/PySpark.
  • Apache Airflow, Unity Catalog, Dell Boomi.
  • CI/CD tools and version control systems such as Git.

Skillset Competencies:

  • Data pipeline management and optimization.
  • ETL/ELT pipeline development and monitoring.
  • System integration and metadata management.
  • Compliance with security and PII guidelines.
  • Automated monitoring and CI/CD implementation.

About Encora:

Encora is the preferred digital engineering and modernization partner of some of the world’s leading enterprises and digital-native companies. With over 9,000 experts in 47+ offices and innovation labs worldwide, Encora’s technology practices include Product Engineering & Development, Cloud Services, Quality Engineering, DevSecOps, Data & Analytics, Digital Experience, Cybersecurity, and AI & LLM Engineering.

At Encora, we hire professionals based solely on their skills and qualifications, and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.
