Senior Data Engineer

closed

People Data Labs

💵 $190k-$220k
📍 Remote - United States

Summary

Join People Data Labs (PDL), a leading provider of people and company data, as a Senior Data Engineer. PDL is seeking a highly skilled, experienced engineer to build and maintain infrastructure for data ingestion, transformation, and loading; develop entity resolution frameworks; and create CI/CD pipelines. The ideal candidate has a strong background in software development, Python, Apache Spark, SQL, and data pipeline orchestration. This role offers a high level of autonomy, opportunities for direct contribution, and the chance to be part of a team discovering the next frontier of data-as-a-service (DaaS).

Requirements

  • 5-7+ years of industry experience with clear examples of strategic technical problem-solving and implementation
  • Strong software development fundamentals
  • Experience with Python
  • Expertise with Apache Spark (Java, Scala, and/or Python-based)
  • Experience with SQL
  • Experience building scalable data processing systems (e.g., cleaning, transformation) from the ground up
  • Experience using developer-oriented data pipeline and workflow orchestration tools (e.g., Airflow (preferred), dbt, Dagster, or similar)
  • Knowledge of modern data design and storage patterns (e.g., incremental updating, partitioning and segmentation, rebuilds and backfills)
  • Experience working in Databricks (including Delta Live Tables, data lakehouse patterns, etc.)
  • Experience with cloud computing services (AWS (preferred), GCP, Azure or similar)
  • Experience with data warehousing (e.g., Databricks, Snowflake, Redshift, BigQuery, or similar)
  • Understanding of modern data storage formats and tools (e.g., Parquet, ORC, Avro, Delta Lake)
  • Balance high ownership and autonomy with a strong ability to collaborate
  • Work effectively in a remote setting (proactively manage blockers, reach out and ask questions, and participate in team activities)
  • Demonstrate strong written communication skills on Slack/Chat and in documents
  • Exhibit experience writing data design docs (pipeline design, dataflow, schema design)
  • Scope and break down projects, and communicate progress and blockers effectively with your manager, team, and stakeholders

Responsibilities

  • Build infrastructure for ingesting, transforming, and loading an exponentially growing volume of data from a variety of sources using Spark, SQL, AWS, and Databricks
  • Build an organic entity resolution framework capable of correctly merging hundreds of billions of individual entities into a set of clean, consumable datasets
  • Develop CI/CD pipelines and anomaly detection systems that continuously improve the quality of data pushed into production
  • Dream up solutions to largely undefined data engineering and data science problems

Preferred Qualifications

  • Degree in a quantitative discipline such as computer science, mathematics, statistics, or engineering
  • Experience working with entity data (entity resolution / record linkage)
  • Experience working with data acquisition / data integration
  • Expertise with Python and the Python data stack (e.g., numpy, pandas)
  • Experience with streaming platforms (e.g., Kafka)
  • Experience evaluating data quality and maintaining consistently high data standards across new feature releases (e.g., consistency, accuracy, validity, completeness)

Benefits

  • Stock
  • Competitive Salaries
  • Unlimited paid time off
  • Medical, dental, & vision insurance
  • Health, fitness, and office stipends
  • The permanent ability to work wherever and however you want