Senior Data Engineer

Wikimedia Foundation

💵 $113k-$175k
📍 Remote - Worldwide

Summary

Join the Wikimedia Foundation's Data Platform team as a Senior Data Engineer and shape the future of how Wikimedia's vast data ecosystem serves internal teams and the global community. You will play a key role in unifying data systems, delivering scalable solutions, and supporting the open knowledge movement. This position requires 5+ years of data engineering experience, expertise in tools like Airflow, Kafka, Spark, and Hive, and advanced proficiency in Python and Java/Scala. You will design and build data pipelines, monitor data quality, support data governance, and contribute to the development of the shared data platform. The Wikimedia Foundation offers a competitive salary and benefits package, and is a remote-first organization.

Requirements

  • 5+ years of data engineering experience, with a significant portion focused on on-premises systems (e.g., Hadoop, HDFS)
  • Practical knowledge of engineering best practices with a strong emphasis on system robustness and maintainability
  • Hands-on experience in troubleshooting systems and pipelines for performance and scaling
  • Demonstrated consistency of tenure with past employers (e.g., an average of 2+ years per company, ideally including longer engagements)
  • Expertise in tools like Airflow, Kafka, Spark, and Hive
  • Advanced proficiency in Python and Java/Scala, with deep knowledge of one language and its ecosystem
  • Advanced working knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, Cassandra CQL, Spark SQL, Presto)
  • Strong communication and collaboration skills to interact effectively within and across teams
  • Ability to produce clear, well-documented technical designs and articulate ideas to both technical and non-technical stakeholders

Responsibilities

  • Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka
  • Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly
  • Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines
  • Data Platform Development: Contribute to the design and improvement of the shared data platform, enabling critical use cases such as product analytics, bot detection, and image classification
  • Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance

Preferred Qualifications

  • Exposure to architectural/system design or technical leadership tasks
  • Experience in data governance, data lineage, and data quality initiatives
  • Familiarity with additional technologies such as Flink, Iceberg, Druid, Presto, Cassandra, Kubernetes, and Docker
  • Expertise in AI development tooling and AI applications in data engineering and analytics
  • Familiarity with stream processing frameworks like Spark Streaming or Flink

Benefits

  • The anticipated annual pay range of this position for applicants based within the United States is US$113,082 to US$175,725. The offered pay is determined by multiple individualized factors, including cost of living in the location
  • For applicants located outside of the US, the pay range will be adjusted to the country of hire
