France, Spain
Senior Data Engineer
closed
Wikimedia Foundation
$113k-$175k
Remote - Worldwide
Summary
Join the Wikimedia Foundation's Data Platform team as a Senior Data Engineer and shape the future of how Wikimedia's vast data ecosystem serves internal teams and the global community. You will play a key role in unifying data systems, delivering scalable solutions, and supporting the open knowledge movement. This position requires 5+ years of data engineering experience, expertise in tools like Airflow, Kafka, Spark, and Hive, and advanced proficiency in Python and Java/Scala. You will design and build data pipelines, monitor data quality, support data governance, and contribute to the development of the shared data platform. The Wikimedia Foundation offers a competitive salary and benefits package, and is a remote-first organization.
Requirements
- 5+ years of data engineering experience, with a significant portion focused on on-premise systems (e.g., Hadoop, HDFS)
- Practical knowledge of engineering best practices with a strong emphasis on system robustness and maintainability
- Hands-on experience in troubleshooting systems and pipelines for performance and scaling
- Demonstrated consistency in tenure at previous companies (e.g., an average of 2+ years per role, ideally including longer engagements)
- Expertise in tools like Airflow, Kafka, Spark, and Hive
- Advanced proficiency in Python and Java/Scala, with deep knowledge of one language and its ecosystem
- Advanced working knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, Cassandra CQL, Spark SQL, Presto)
- Strong communication and collaboration skills to interact effectively within and across teams
- Ability to produce clear, well-documented technical designs and articulate ideas to both technical and non-technical stakeholders
Responsibilities
- Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka
- Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly
- Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines
- Data Platform Development: Contribute to the design and improvement of the shared data platform, enabling critical use cases such as product analytics, bot detection, and image classification
- Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance
Preferred Qualifications
- Exposure to architectural/system design or technical leadership tasks
- Experience in data governance, data lineage, and data quality initiatives
- Familiarity with additional technologies such as Flink, Iceberg, Druid, Presto, Cassandra, Kubernetes, and Docker
- Expertise in AI development tooling and AI applications in data engineering and analytics
- Familiarity with stream processing frameworks like Spark Streaming or Flink
Benefits
- The anticipated annual pay range for this position for applicants based within the United States is US$113,082 to US$175,725. The offered pay is determined by multiple individualized factors, including cost of living in the applicant's location
- For applicants located outside of the US, the pay range will be adjusted to the country of hire
This job is filled or no longer available