Data Scientist

Hakkoda
Summary
Join Hakkoda, an IBM Company, as a Data Scientist specializing in Machine Learning, PySpark, and Databricks. You will build, optimize, and deploy large-scale machine learning models for long-range demand and sales forecasting within an automotive OEM's next-generation Intelligent Forecast Application. The role uses distributed computing frameworks, particularly PySpark on the Databricks platform, to support strategic decision-making across the automotive value chain. The ideal candidate has hands-on experience developing and deploying ML forecasting models, especially for long-range predictions, in a production environment using PySpark and Databricks. Strong technical skills in machine learning, big data processing, and time series forecasting are essential, as is the ability to work effectively in a team. This position offers the opportunity to contribute directly to a rapidly growing company and make a significant impact.
Requirements
- Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Applied Mathematics, or a closely related quantitative field
- 2 to 5 years of hands-on experience in a Data Scientist or Machine Learning Engineer role
- Proven experience developing and deploying machine learning models in a production environment
- Demonstrated experience in long-range demand and sales forecasting
- Significant hands-on experience with PySpark for large-scale data processing and machine learning
- Extensive practical experience working with the Databricks platform, including notebooks, jobs, and ML capabilities
- Expert proficiency in PySpark
- Expert proficiency in the Databricks platform
- Strong proficiency in Python and SQL
- Experience with machine learning libraries compatible with PySpark (e.g., MLlib, or integrating other libraries)
- Experience with advanced time series forecasting techniques and their implementation
- Experience with distributed computing concepts and optimization techniques relevant to PySpark
- Hands-on experience with a major cloud provider (Azure, AWS, or GCP) in the context of using Databricks
- Familiarity with MLOps concepts and tools used in a Databricks environment
- Experience with data visualization tools
- Strong analytical skills and a deep understanding of machine learning algorithms and their application to forecasting
- Ability to troubleshoot and solve complex technical problems related to big data and machine learning workflows
Responsibilities
- Design, develop, and implement scalable and accurate machine learning models specifically for long-range demand and sales forecasting challenges
- Apply advanced time series analysis techniques and integrate them with machine learning models, using PySpark for data processing and model training on large datasets within the Databricks environment
- Implement probabilistic forecasting methods using PySpark to capture uncertainty in long-range predictions
- Develop robust solutions for hierarchical and grouped long-range forecasting on distributed data
- Build and optimize large-scale data pipelines for ingesting, cleaning, transforming, and engineering features relevant to long-range forecasting from diverse, complex automotive datasets using PySpark on Databricks
- Develop and implement robust code for model training, inference, and deployment of long-range forecasting models directly within the Databricks platform
- Apply MLOps principles compatible with Databricks workflows for model versioning, monitoring, retraining, and managing the lifecycle of long-range ML forecasting models in production
- Collaborate with Data Engineering and IT Operations to ensure seamless deployment and operational efficiency of the forecasting application on Databricks
- Evaluate long-range forecasting model performance using relevant metrics (e.g., MAE, RMSE, MAPE), favoring metrics suited to longer horizons, and optimize models and data processing pipelines for improved accuracy and efficiency within the PySpark/Databricks ecosystem
- Work effectively as part of a technical team, collaborating with other data scientists, data engineers, and software developers to integrate ML long-range forecasting solutions into the broader forecasting application built on Databricks
- Communicate technical details and forecasting results effectively within the technical team
Preferred Qualifications
- Experience with specific long-range forecasting methodologies and libraries used in a distributed environment
- Experience with real-time or streaming data processing using PySpark for near-term forecasting components that might complement long-range models
- Familiarity with automotive data types relevant to long-range forecasting (e.g., economic indicators affecting car sales, long-term market trends)
- Experience with distributed version control systems (e.g., Git)
- Knowledge of agile development methodologies
- Ability to work effectively as part of a technical team
- Clear and concise communication of technical details and forecasting results
- Ability to tackle complex technical challenges and find efficient solutions
- Eagerness to learn and adapt to new technologies and methodologies within the PySpark/Databricks ecosystem and advancements in long-range forecasting
- Ability to understand business needs related to long-term planning
Benefits
- Medical, Dental, Vision
- Life Insurance
- Paid parental leave
- Flexible PTO Options
- Company Bonus Program
- Work from home benefits
- Technical training and certifications
- Robust learning and development opportunities
- Trip to Costa Rica

