Senior Data Engineer

Wikimedia Foundation

πŸ’΅ $113k-$175k
πŸ“Remote - Worldwide

Summary

Join the Wikimedia Foundation's Data Platform team as a Senior Data Engineer and shape how Wikimedia's vast data ecosystem serves internal teams and the global community. You will play a key role in unifying data systems, delivering scalable solutions, and supporting the open knowledge movement. This position requires 5+ years of data engineering experience, expertise in tools such as Airflow, Kafka, Spark, and Hive, and advanced proficiency in Python and Java/Scala. You will design and build data pipelines, monitor data quality, support data governance, and contribute to the development of the shared data platform. The Wikimedia Foundation offers a competitive salary and benefits package and is a remote-first organization.

Requirements

  • 5+ years of data engineering experience, with a significant portion focused on on-premises systems (e.g., Hadoop, HDFS)
  • Practical knowledge of engineering best practices with a strong emphasis on system robustness and maintainability
  • Hands-on experience in troubleshooting systems and pipelines for performance and scaling
  • A consistent employment history (e.g., an average tenure of 2+ years per company, ideally including some longer engagements)
  • Expertise in tools like Airflow, Kafka, Spark, and Hive
  • Advanced proficiency in Python and Java/Scala, with deep knowledge of one language and its ecosystem
  • Advanced working knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, Cassandra Query Language (CQL), Spark SQL, Presto)
  • Strong communication and collaboration skills to interact effectively within and across teams
  • Ability to produce clear, well-documented technical designs and articulate ideas to both technical and non-technical stakeholders

Responsibilities

  • Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka
  • Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly
  • Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines
  • Data Platform Development: Contribute to the design and improvement of the shared data platform, enabling critical use cases such as product analytics, bot detection, and image classification
  • Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance

Preferred Qualifications

  • Exposure to architectural/system design or technical leadership tasks
  • Experience in data governance, data lineage, and data quality initiatives
  • Familiarity with additional technologies such as Flink, Iceberg, Druid, Presto, Cassandra, Kubernetes, and Docker
  • Expertise in AI development tooling and AI applications in data engineering and analytics
  • Familiarity with stream processing frameworks like Spark Streaming or Flink

Benefits

  • The anticipated annual pay range for applicants based in the United States is US$113,082 to US$175,725. The offered pay is determined by multiple individualized factors, including the cost of living in the applicant's location
  • For applicants located outside of the US, the pay range will be adjusted to the country of hire

This job is filled or no longer available.