Senior Data Engineer

Wave HQ
Summary
Join Wave as a Senior Data Engineer and build the tools and infrastructure that support the Data Products and Insights & Innovation teams, and the broader business. You will collaborate with teams across the company to develop data solutions, scale data infrastructure, and advance Wave's data-centric transformation. The role calls for proven experience in complex product environments and strong communication skills. You will design, build, and deploy components of a modern data stack, including data ingestion, a data lake, and a variety of pipelines, while also maintaining legacy systems, optimizing data workflows, and resolving incidents to keep data systems highly available and reliable. This position offers the chance to grow and thrive by contributing to high-impact projects that power insights and innovation.
Requirements
- Bring 6+ years of experience in building data pipelines and managing a secure, modern data stack
- Experience with CDC streaming ingestion using tools such as Meltano for data ingestion workflows that support AI/ML workloads, and with a curated data warehouse in Redshift, Snowflake, or Databricks
- At least 3 years of experience working with AWS cloud infrastructure, including Kafka (MSK), Spark / AWS Glue, and infrastructure as code (IaC) using Terraform
- Ability to write and review high-quality, maintainable code that enhances the reliability and scalability of our data platform
- Extensive hands-on experience with Python, SQL, and dbt, and comfort leveraging third-party frameworks to accelerate development
- Prior experience building data lakes on S3 using Apache Iceberg with Parquet, Avro, JSON, and CSV file formats
- Experience with Airflow or similar orchestration systems to build and manage multi-stage workflows that automate data processing pipelines (a minimal sketch follows this list)
- Familiarity with data governance practices, including data quality, lineage, and privacy, as well as experience using cataloging tools to enhance discoverability and compliance
- Experience developing and deploying data pipeline solutions using CI/CD best practices to ensure reliability and scalability
- Working knowledge of tools such as Stitch and Segment CDP for integrating diverse data sources into a cohesive ecosystem
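
To make the orchestration expectation above concrete, here is a minimal sketch of the kind of multi-stage Airflow DAG this role works with. It is illustrative only: the DAG id, schedule, Meltano plugin names, and dbt paths are assumptions, not Wave's actual pipeline, and it assumes Airflow 2.x.

```python
# Illustrative sketch: chain a Meltano ingestion step into a dbt build.
# DAG id, schedule, plugin names, and paths are placeholder assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cdc_ingest_and_transform",  # hypothetical name
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Extract/load step: run a Meltano pipeline (tap and target are placeholders)
    ingest = BashOperator(
        task_id="meltano_ingest",
        bash_command="meltano run tap-postgres target-s3",
    )

    # Transform step: build dbt models in the warehouse
    transform = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )

    ingest >> transform
```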
Responsibilities
- Design, build, and deploy components of a modern data stack, including CDC ingestion (using Meltano or similar tools), a centralized Iceberg data lake, and a variety of batch, incremental, and stream-based pipelines (see the sketch after this list)
- Help build and manage a fault-tolerant data platform that scales economically while balancing innovation with operational stability: maintain legacy Python ELT scripts while accelerating the transition to dbt models in Redshift, Snowflake, or Databricks
- Collaborate within a cross-functional team in planning and rolling out data infrastructure and processing pipelines that serve workloads across analytics, machine learning, and GenAI services
- Partner with teams across Wave and help them succeed by ensuring their data, analytics, and AI insights are delivered reliably
- Thrive in ambiguous conditions by independently identifying opportunities to optimize pipelines and improve data workflows under tight deadlines
- Respond to alerts and proactively implement monitoring solutions to minimize future incidents, ensuring high availability and reliability of data systems
- Provide technical assistance, listening to stakeholders and communicating clearly to address their concerns
- Assess existing systems, optimize data accessibility, and provide innovative solutions to help internal teams surface actionable insights that enhance external customer satisfaction
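
As a hedged illustration of the centralized Iceberg data lake responsibility above, the PySpark sketch below writes a small batch to an S3-backed Iceberg table registered in the AWS Glue catalog. The bucket, database, and table names are assumptions, and the Iceberg Spark runtime and AWS bundle jars must be available on the Spark classpath.

```python
# Illustrative sketch only: write a batch to an S3-backed Iceberg table via the
# AWS Glue catalog. Bucket, database, and table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg_batch_write")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-data-lake/warehouse")
    .getOrCreate()
)

# A tiny example DataFrame standing in for a CDC batch.
events = spark.createDataFrame(
    [(1, "invoice_created"), (2, "invoice_paid")],
    ["event_id", "event_type"],
)

# Create or replace the Iceberg table with this batch
# (use .append() instead for incremental loads).
events.writeTo("glue.analytics.events").createOrReplace()
```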
Preferred Qualifications
- Knowledge and practical experience with Athena, Redshift, or SageMaker Feature Store to support analytical and machine learning workflows
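
As a small, hedged example of the Athena piece of this qualification, the sketch below runs an ad hoc query with the awswrangler library (AWS SDK for pandas); the database, table, and results bucket are illustrative assumptions.

```python
# Illustrative only: ad hoc Athena query via awswrangler.
# Database, table, and S3 output location are placeholder assumptions.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type",
    database="analytics",                      # hypothetical Glue database
    s3_output="s3://example-athena-results/",  # query results bucket
)
print(df.head())
```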