Senior Data Engineer at Wikimedia Foundation

Summary

Join the Wikimedia Foundation's Data Platform team as a Senior Data Engineer and shape the future of how Wikimedia's vast data ecosystem serves internal teams and the global community. You will help unify data systems, deliver scalable solutions, and support the open knowledge movement. This role requires 5+ years of data engineering experience, expertise in tools like Airflow, Kafka, Spark, and Hive, and advanced proficiency in Python and Java/Scala. You will design and build data pipelines, monitor data quality, support data governance, and contribute to the shared data platform. The Wikimedia Foundation is a remote-first organization offering competitive salaries and benefits.

Requirements

5+ years of data engineering experience, with a significant portion focused on on-premises systems (e.g., Hadoop, HDFS)
Practical knowledge of engineering best practices with a strong emphasis on system robustness and maintainability
Hands-on experience in troubleshooting systems and pipelines for performance and scaling
Demonstrated consistency of tenure with previous employers (e.g., an average of 2+ years per role, ideally including longer engagements)
Expertise in tools like Airflow, Kafka, Spark, and Hive
Advanced proficiency in Python and Java/Scala, with deep knowledge of one language and its ecosystem
Advanced working knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, Cassandra CQL, Spark SQL, Presto)
Strong communication and collaboration skills to interact effectively within and across teams
Ability to produce clear, well-documented technical designs and articulate ideas to both technical and non-technical stakeholders

Responsibilities

Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka
Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly
Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines
Data Platform Development: Contribute to the design and improvement of the shared data platform, enabling critical use cases such as product analytics, bot detection, and image classification
Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance

Preferred Qualifications

Exposure to architectural/system design or technical leadership tasks
Experience in data governance, data lineage, and data quality initiatives
Familiarity with additional technologies such as Flink, Iceberg, Druid, Presto, Cassandra, Kubernetes, and Docker
Expertise in AI development tooling and AI applications in data engineering and analytics
Familiarity with stream processing frameworks like Spark Streaming or Flink

Benefits

Salaries at the Wikimedia Foundation are set in a way that is competitive, equitable, and consistent with our values and culture
The anticipated annual pay range for this position for applicants based within the United States is US$113,082 to US$175,725. The offered pay is determined by multiple individualized factors, including cost of living in the applicant's location
For applicants located outside of the US, the pay range will be adjusted to the country of hire
We neither ask for nor take into consideration the salary history of applicants

