Senior Data Engineer

Mercedes-Benz.io
Summary
Join Mercedes-Benz.io's OneDNA team as a Senior Data Engineer and contribute to building a cutting-edge data platform for KPI reporting, data analysis, and data science. You will design, build, and optimize data pipelines, develop data models, and leverage Azure cloud services. Collaboration with data analysts, data scientists, and stakeholders is crucial. The role requires strong software engineering skills, proficiency in PySpark and SQL, and expertise in Azure cloud services. Mercedes-Benz.io offers a flexible work environment, with options for office-based work in Lisbon or Braga, or remote work from anywhere in Portugal. The company fosters a collaborative and open-minded culture with various benefits.
Requirements
- 4+ years of hands-on software engineering experience, developing and maintaining data pipelines
- Strong experience with software engineering best practices (OOP, TDD, CI/CD, version control)
- Proficiency with PySpark and SQL, including the ability to optimize queries and understand Spark internals
- Proficient in Python for data processing and automation tasks
- Strong expertise with Azure cloud services, including Databricks, Data Factory, DevOps, and Data Lake Storage
- Solid experience in dimensional data modelling and data warehousing principles
- Knowledge of Kubernetes and Docker for containerized applications
- Strong problem-solving and analytical thinking
- Excellent communication skills to interact with stakeholders and team members effectively
- Ability to work collaboratively in a fast-paced, team-oriented environment
Responsibilities
- Develop & Maintain Data Pipelines: Design, build, and optimize scalable and reliable data pipelines for cleaning, integrating, and transforming large datasets
- Infrastructure Development: Create and maintain data processing infrastructure to ensure high-quality data supply for digital analysts and data scientists
- Cloud Integration: Leverage Azure cloud services, including Data Factory, Databricks, and Data Lake Storage, to deploy robust data solutions
- Data Modelling: Develop dimensional data models and maintain data warehousing solutions to support analytics and reporting needs
- Performance Optimization: Write efficient, maintainable, and well-tested Python and SQL code, ensuring optimal performance for large-scale datasets
- Spark Expertise: Work with Apache Spark to write and optimize PySpark code and SQL queries, understanding Spark internals for maximum efficiency
- Collaboration: Partner with data analysts, data scientists, and stakeholders to understand data requirements, ensure quality, and support data-driven decision-making
- Governance & Quality: Ensure data governance, monitor data quality, and implement best practices for data management
- Deployment & Monitoring: Support deployment and monitoring of machine learning pipelines and data solutions in production environments
Preferred Qualifications
- Experience with Google Analytics data is a plus
Benefits
- Health insurance for you and your family
- Life insurance
- Proactive self-development through international training and conferences
- Language training courses
- Wellbeing actions (massages, nutrition sessions, happy hours, and more)
- Brand Connection Perks
- iPhone, MacBook Pro or Dell (your choice) and noise-cancelling headphones