Data Engineering Intern

Sayari

💵 $41k-$52k
📍 Remote - United States

Summary

Join Sayari's Data Engineering team as an intern and help build and maintain the data pipelines behind Sayari Graph, our flagship product. You will work with product and software engineering teams to collect global data, maintain existing ETL pipelines, and develop new ones. The role involves technologies such as TypeScript, Kubernetes, Postgres, Cassandra, Elasticsearch, Memgraph, and Spark. This is a remote, paid internship requiring 20-30 hours per week. You will have the opportunity to contribute to our open-source projects, collaborate with a supportive team, and gain hands-on data engineering experience on a large-scale data platform.

Requirements

  • Experience with Python and/or a JVM language (e.g., Scala)
  • Experience working collaboratively with git

Responsibilities

  • Write and deploy crawling scripts to collect source data from the web
  • Write and run data transformers in Scala Spark to standardize bulk data sets
  • Write and run modules in Python to parse entity references and relationships from source data
  • Diagnose and fix bugs reported by internal and external users
  • Analyze and report on internal datasets to answer questions and inform feature work
  • Work collaboratively on and across a team of engineers using basic agile principles
  • Give and receive feedback through code reviews
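To give a flavor of the parsing work described above, here is a minimal Python sketch of a module that extracts entity references and relationships from a source record. The record format and field names (`company_name`, `officers`, `role`) are hypothetical illustrations, not Sayari's actual schema:

```python
def parse_record(record: dict) -> tuple[list[dict], list[dict]]:
    """Extract entity references and relationships from one source record.

    The input shape here is an invented example of bulk registry data;
    real source datasets vary and each gets its own parser.
    """
    entities: list[dict] = []
    relationships: list[dict] = []

    # Treat the registered company as one entity reference.
    company = record.get("company_name", "").strip()
    if company:
        entities.append({"type": "company", "name": company})

    # Each listed officer becomes a person entity plus a relationship
    # back to the company (e.g. "director", "shareholder").
    for officer in record.get("officers", []):
        name = (officer.get("name") or "").strip()
        if not name:
            continue  # skip blank or missing names rather than emit junk
        entities.append({"type": "person", "name": name})
        relationships.append({
            "source": name,
            "target": company,
            "kind": officer.get("role", "officer"),
        })

    return entities, relationships
```

In practice a parser like this would run over millions of records, so the real work is in normalization and edge cases rather than the happy path shown here.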

Preferred Qualifications

  • Experience with Apache Spark and Apache Airflow
  • Experience working on a cloud platform like GCP, AWS, or Azure
  • Understanding of or interest in knowledge graphs

Benefits

  • $20 - $25 an hour
  • Remote
