Principal Data Engineer

Movable Ink

πŸ“Remote - Canada

Summary

Join Movable Ink as a Principal Data Engineer and help shape our Data Warehouse and Hybrid Data Lake infrastructure. In this pivotal role you will own the infrastructure and code for data pipelines that handle data at scale, collaborating with teams across the company to enable data-driven decisions. You will design, implement, and optimize ingestion pipelines while ensuring data accuracy and integrity, mentor junior team members, and ensure compliance with regulatory requirements. The ideal candidate brings extensive experience in data engineering, cloud-based data warehouses, and a range of data pipeline technologies.

Requirements

  • 12+ years of professional experience in data engineering, software engineering, database administration, business intelligence, or related fields, with 8+ years as a Data Engineer focused on cloud-based Data Warehouses (Redshift, Snowflake, Firebolt, BigQuery)
  • Deep experience working with multi-petabyte, mission-critical databases, optimizing for high availability, performance, and reliability, informed by a strong understanding of database internals
  • Expert proficiency with Python and SQL, and significant experience building robust data pipelines with these languages
  • Expert proficiency in deploying and managing data pipeline orchestration frameworks such as Apache Airflow or Prefect (see the orchestration sketch after this list)
  • Significant experience with Infrastructure-as-Code (Terraform) and automating cloud infrastructure management
  • Significant experience with stream processing technologies such as Apache Flink, Apache Kafka, or Apache Pulsar
  • Significant experience in building telemetry, monitoring, and alerting solutions for large-scale data pipelines
  • Significant experience in implementing Hybrid Data Lake / Data Warehouse architectures, with a focus on Apache Iceberg or similar technologies
  • Significant experience in designing and implementing solutions that comply with regulatory requirements such as GDPR and CCPA
  • Experience in Agile/Scrum environments, working with technical managers and product owners to break down high-level requirements into actionable work
  • Excellent communication skills, with the ability to effectively collaborate across technical and business teams
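
To make the orchestration expectation concrete, here is a minimal sketch of a daily extract-then-load DAG, assuming Apache Airflow 2.4+; the DAG id, task names, and schedule are illustrative placeholders, not details of this role's actual pipelines:

    # Minimal Airflow DAG sketch: a daily extract-then-load pipeline.
    # All names (daily_event_ingestion, extract_events, load_events) are
    # hypothetical and used only for illustration.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_events(**context):
        # Placeholder extract step: in practice this would pull the day's
        # batch from a source system API or object store.
        print(f"extracting events for {context['ds']}")


    def load_events(**context):
        # Placeholder load step: in practice this would COPY the extracted
        # batch into a cloud warehouse such as Redshift or Snowflake.
        print(f"loading events for {context['ds']}")


    with DAG(
        dag_id="daily_event_ingestion",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = PythonOperator(task_id="extract_events", python_callable=extract_events)
        load = PythonOperator(task_id="load_events", python_callable=load_events)

        extract >> load  # extract must finish before load starts

Retries with a delay and catchup=False are the kind of reliability defaults this role's pipeline work builds on.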

Responsibilities

  • Partner with internal operations teams to identify, collect, and integrate data from various business systems, ensuring comprehensive and accurate data capture
  • Design, implement, and maintain robust batch and real-time data pipelines, leveraging tools like Apache Airflow, Apache Flink, and Terraform for IaC
  • Build and optimize Hybrid Data Lake / Data Warehouse infrastructure with solutions like Apache Iceberg for scalable and cost-effective storage (see the Iceberg sketch after this list)
  • Ensure data pipelines adhere to best practices and are optimized for performance, scalability, and reliability
  • Conduct thorough testing of data pipelines to validate data accuracy and integrity
  • Monitor data pipelines, implement telemetry and alerting (see the telemetry sketch after this list), troubleshoot any issues that arise, and proactively improve system reliability
  • Establish and track SLAs for data processing and delivery, ensuring timely and reliable access to data for all users
  • Mentor less experienced team members and establish patterns and practices that increase the quality, accuracy, and efficiency of the solutions the team produces
  • Design and implement Change Data Capture (CDC) solutions to support real-time data replication and point-in-time data queries
  • Work with other teams to ensure secure data access and compliance with regulatory requirements (e.g., GDPR, CCPA, etc.)
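
As a sketch of the Iceberg-backed lakehouse work above, the snippet below creates a partitioned Iceberg table and runs a point-in-time query via snapshot time travel. It assumes Spark 3.3+ with the Iceberg runtime jar on the classpath; the catalog, bucket, and table names are hypothetical:

    # Hypothetical PySpark session wired to an Iceberg catalog named "lake".
    # Requires the iceberg-spark-runtime jar on the Spark classpath.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("iceberg_sketch")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.lake.type", "hadoop")
        .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
        .getOrCreate()
    )

    # Create a day-partitioned Iceberg table (hidden partitioning transform).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS lake.analytics.events (
            event_id BIGINT,
            user_id BIGINT,
            event_type STRING,
            occurred_at TIMESTAMP
        )
        USING iceberg
        PARTITIONED BY (days(occurred_at))
    """)

    # Point-in-time query against a historical snapshot (Spark 3.3+ syntax);
    # this is the mechanism behind CDC-style point-in-time reads.
    spark.sql(
        "SELECT COUNT(*) FROM lake.analytics.events "
        "TIMESTAMP AS OF '2024-06-01 00:00:00'"
    ).show()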
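
For the telemetry and SLA-tracking responsibilities, here is a minimal freshness-metric sketch, assuming the prometheus_client library and a Prometheus Pushgateway reachable at pushgateway:9091; the metric and job names are illustrative:

    # Push a data-freshness gauge so an alert can fire when lag breaches SLA.
    # The Pushgateway address, metric name, and job name are hypothetical.
    from datetime import datetime, timezone

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    freshness = Gauge(
        "pipeline_data_freshness_seconds",
        "Seconds since the newest record landed in the target table",
        registry=registry,
    )


    def report_freshness(latest_loaded_at: datetime) -> None:
        # Lag between now and the newest loaded record; an alerting rule on
        # this gauge can page the team when the SLA threshold is crossed.
        lag = (datetime.now(timezone.utc) - latest_loaded_at).total_seconds()
        freshness.set(lag)
        push_to_gateway("pushgateway:9091", job="daily_event_ingestion",
                        registry=registry)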
