Summary

Join Encora as an ML Data Engineer and be responsible for designing, developing, and maintaining high-quality software solutions. You will collaborate with cross-functional teams, lead technical projects, mentor junior engineers, and improve software development practices. This remote position, based in Peru, Colombia, Costa Rica, or Bolivia, requires extensive experience in software development and a focus on building scalable applications. You will work with Databricks, Apache Spark, and MLflow, among other technologies. The role involves feature engineering, data integration, pipeline development, and ensuring data governance and compliance.

Requirements

Hold a Bachelor’s degree in computer science, software engineering, or a related field
Possess extensive experience in software development with a focus on designing and building scalable applications
Have professional/advanced English skills
Have 7+ years in data engineering, including at least 4 years focusing on ML feature engineering, ETL pipeline development, and data preparation for ML
Have proven experience managing pipelines on Databricks using Apache Spark, with a strong understanding of the medallion architecture
Be familiar with ML lifecycle management, including experience with MLflow
Have advanced skills in Apache Spark (PySpark) for big data processing and analytics
Be proficient in Python for data manipulation and SQL for query optimization
Have experience building pipelines for real-time and batch model serving in production environments, and knowledge of CI/CD practices for ETL/ELT pipeline development
Possess expertise in metadata and master data management within technical data catalogues
Understand data security and compliance, especially with sensitive data like PII

Responsibilities

Develop and maintain feature engineering pipelines using Databricks to support ML models effectively
Integrate diverse data sources (e.g., clickstreams, user behaviour, demographic data) to create user behaviour features/profiles for complex ML tasks
Design and implement ETL/ELT pipelines aligned with the bronze, silver, and gold layers of the medallion architecture
Build data pipelines to support ML model training, calibration, and deployment, leveraging MLflow for experiment tracking and performance monitoring
Design low-latency, production-ready data pipelines to support real-time and batch model inference
Apply CI/CD principles for seamless pipeline deployment
Ensure pipelines comply with security and regulatory standards, particularly for handling PII, and maintain metadata and master data across the data catalogue
Work closely with ML scientists, ML engineers, and other stakeholders to align data transformation with business objectives

Benefits

Remote work
