ML Data Engineer
Encora
Remote - Bolivia, Colombia
Summary
Join Encora as an ML Data Engineer, responsible for designing, developing, and maintaining high-quality software solutions. You will collaborate with cross-functional teams, lead technical projects, mentor junior engineers, and improve software development practices. This remote position is open to candidates based in Peru, Colombia, Costa Rica, or Bolivia. It requires extensive software development experience with a focus on building scalable applications. You will work with Databricks, Apache Spark, and MLflow, among other technologies. The role covers feature engineering, data integration, pipeline development, and data governance and compliance.
Requirements
- Hold a Bachelor's degree in computer science, software engineering, or a related field
- Possess extensive experience in software development with a focus on designing and building scalable applications
- Have professional/advanced English skills
- Have 7+ years of experience in data engineering, including at least 4 years focused on ML feature engineering, ETL pipeline development, and data preparation for ML
- Have proven experience managing pipelines on Databricks using Apache Spark, with a strong understanding of the medallion architecture
- Be familiar with ML lifecycle management, including hands-on experience with MLflow
- Have advanced skills in Apache Spark (PySpark) for big data processing and analytics
- Be proficient in Python for data manipulation and SQL for query optimization
- Have experience building pipelines for real-time and batch model serving in production environments, and knowledge of CI/CD practices for ETL/ELT pipeline development
- Possess expertise in metadata and master data management within technical data catalogues
- Understand data security and compliance, especially when handling sensitive data such as personally identifiable information (PII)
Responsibilities
- Develop and maintain feature engineering pipelines using Databricks to support ML models effectively
- Integrate diverse data sources (e.g., clickstreams, user behaviour, demographic data) to create user behaviour features/profiles for complex ML tasks
- Design and implement ETL/ELT pipelines aligned with the bronze, silver, and gold layers of the medallion architecture
- Build data pipelines to support ML model training, calibration, and deployment, leveraging MLflow for experiment tracking and performance monitoring
- Design low-latency, production-ready data pipelines to support real-time and batch model inference
- Apply CI/CD principles for seamless pipeline deployment
- Ensure pipelines comply with security and regulatory standards, particularly for handling PII, and maintain metadata and master data across the data catalogue
- Work closely with ML scientists, ML engineers, and other stakeholders to align data transformations with business objectives
Benefits
Remote work