Big Data Engineer


MGID

๐Ÿ“Remote - Worldwide

Summary

Join MGID, a leading native advertising company, as a Big Data Engineer. You will collaborate with data scientists and other stakeholders to develop and optimize PySpark applications for processing large datasets. Your responsibilities include designing data pipelines, building dashboards, and ensuring data accuracy. This role requires proven experience with PySpark, distributed computing, and large datasets. MGID offers a results-driven culture, support, and flexibility to help you thrive.

Requirements

  • Proven experience in developing and optimizing PySpark applications
  • Strong knowledge of distributed computing principles and concepts
  • Practical experience working with large datasets using technologies such as Hadoop, Spark, ClickHouse
  • Proficiency in programming languages such as Python, SQL
  • Experience with Linux/Unix command-line interface
  • Familiarity with data visualization and dashboarding tools
  • Strong communication skills and ability to work effectively in a remote team environment
  • Excellent problem-solving skills and attention to detail

Responsibilities

  • Collaborate with Data Scientists, Data Analysts, and other stakeholders to understand data needs and develop solutions
  • Design, develop, and optimize PySpark applications for processing and analyzing large sets of structured and unstructured data
  • Monitor and evaluate data to ensure accuracy and integrity, troubleshoot and debug PySpark code
  • Build and maintain data pipelines for ingesting, processing, and storing data, optimizing for performance and scalability
  • Develop and maintain data visualization dashboards and reports to enable insights and decision-making
  • Create and maintain tools and libraries for efficient data processing
  • Stay up-to-date with industry trends and new technologies to continuously improve data processing capabilities

Preferred Qualifications

  • Bachelor's or Master's degree in Computer Science or a related field
  • Practical experience with ClickHouse
  • Practical experience with stream processing and messaging systems such as Kafka
  • Practical experience with NoSQL databases (for example, MongoDB), especially Aerospike
  • Knowledge of the AdTech domain: understanding of online advertising and RTB (real-time bidding)
  • Familiarity with containerization technologies such as Docker and Kubernetes, and with cloud computing platforms
  • Familiarity with data governance and security best practices
  • Knowledge of machine learning concepts and frameworks

Benefits

  • Support
  • Connection
  • Flexibility
