Senior Data Engineer
People Data Labs
💵 $190k-$220k
📍 Remote - United States
Summary
Join People Data Labs (PDL), a leading provider of people and company data, as a Senior Data Engineer. PDL is seeking a highly skilled and experienced individual to build and maintain infrastructure for data ingestion, transformation, and loading; develop entity resolution frameworks; and create CI/CD pipelines. The ideal candidate will have a strong background in software development, Python, Apache Spark, SQL, and data pipeline orchestration. This role offers a high level of autonomy, the opportunity to make direct contributions, and the chance to be part of a team discovering the next frontier of data-as-a-service (DaaS).
Requirements
- 5-7+ years of industry experience with clear examples of strategic technical problem-solving and implementation
- Strong software development fundamentals
- Experience with Python
- Expertise with Apache Spark (Java, Scala, and/or Python-based)
- Experience with SQL
- Experience building scalable data processing systems (e.g., cleaning, transformation) from the ground up
- Experience using developer-oriented data pipeline and workflow orchestration tools (e.g., Airflow (preferred), dbt, Dagster, or similar)
- Knowledge of modern data design and storage patterns (e.g., incremental updating, partitioning and segmentation, rebuilds and backfills)
- Experience working in Databricks (including delta live tables, data lakehouse patterns, etc.)
- Experience with cloud computing services (AWS (preferred), GCP, Azure or similar)
- Experience with data warehousing (e.g., Databricks, Snowflake, Redshift, BigQuery, or similar)
- Understanding of modern data storage formats and tools (e.g., parquet, ORC, Avro, Delta Lake)
- Balance high ownership and autonomy with a strong ability to collaborate
- Work effectively in a remote setting (proactive about managing blockers, reaching out with questions, and participating in team activities)
- Demonstrate strong written communication skills on Slack/Chat and in documents
- Experience writing data design docs (pipeline design, dataflow, schema design)
- Scope and break down projects, and communicate progress and blockers effectively with your manager, team, and stakeholders
Responsibilities
- Build infrastructure for ingesting, transforming, and loading an exponentially increasing volume of data from a variety of sources using Spark, SQL, AWS, and Databricks
- Build an organic entity resolution framework capable of correctly merging hundreds of billions of individual entities into a set of clean, consumable datasets
- Develop CI/CD pipelines and anomaly detection systems that continuously improve the quality of the data pushed into production
- Devise solutions to largely undefined data engineering and data science problems
Preferred Qualifications
- Degree in a quantitative discipline such as computer science, mathematics, statistics, or engineering
- Experience working with entity data (entity resolution / record linkage)
- Experience working with data acquisition / data integration
- Expertise with Python and the Python data stack (e.g., numpy, pandas)
- Experience with streaming platforms (e.g., Kafka)
- Experience evaluating data quality and maintaining consistently high data standards across new feature releases (e.g., consistency, accuracy, validity, completeness)
Benefits
- Stock
- Competitive Salaries
- Unlimited paid time off
- Medical, dental, & vision insurance
- Health, fitness, and office stipends
- The permanent ability to work wherever and however you want
This job is filled or no longer available