Senior Data Engineer at Wikimedia Foundation

Summary

Join the Wikimedia Foundation's Data Platform team as a Senior Data Engineer and shape the future of how Wikimedia's vast data ecosystem serves internal teams and the global community. You will play a key role in unifying data systems, delivering scalable solutions, and supporting the open knowledge movement. This position requires 5+ years of data engineering experience, expertise in tools like Airflow, Kafka, Spark, and Hive, and advanced proficiency in Python and Java/Scala. You will design and build data pipelines, monitor data quality, support data governance, and contribute to the development of the shared data platform. The Wikimedia Foundation offers a competitive salary and benefits package, and is a remote-first organization.

Requirements

5+ years of data engineering experience, with a significant portion focused on on-premises systems (e.g., Hadoop, HDFS)
Practical knowledge of engineering best practices with a strong emphasis on system robustness and maintainability
Hands-on experience in troubleshooting systems and pipelines for performance and scaling
Demonstrated consistency of tenure at previous employers (e.g., an average of 2+ years per role, ideally including longer engagements)
Expertise in tools like Airflow, Kafka, Spark, and Hive
Advanced proficiency in Python and Java/Scala, with deep knowledge of one language and its ecosystem
Advanced working knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, Cassandra CQL, Spark SQL, Presto)
Strong communication and collaboration skills to interact effectively within and across teams
Ability to produce clear, well-documented technical designs and articulate ideas to both technical and non-technical stakeholders

Responsibilities

Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka
Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly
Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines
Data Platform Development: Contribute to the design and improvement of the shared data platform, enabling critical use cases such as product analytics, bot detection, and image classification
Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance

Preferred Qualifications

Exposure to architectural/system design or technical leadership tasks
Experience in data governance, data lineage, and data quality initiatives
Familiarity with additional technologies such as Flink, Iceberg, Druid, Presto, Cassandra, Kubernetes, and Docker
Expertise in AI development tooling and AI applications in data engineering and analytics
Familiarity with stream processing frameworks like Spark Streaming or Flink

Benefits

The anticipated annual pay range for applicants based within the United States is US$113,082 to US$175,725. The offered pay is determined by multiple individualized factors, including cost of living in the applicant's location
For applicants located outside of the US, the pay range will be adjusted to the country of hire
