Data Engineering Intern

CALSTART

💵 $33k-$43k
📍 Remote - United States

Summary

Join CALSTART's mission-driven team as a part-time intern and contribute to building a data lake and advanced data pipelines for clean transportation insights. This 6-month internship (25 hours/week) focuses on designing a scalable data lake architecture using AWS, developing ETL pipelines, implementing data quality assurance processes, and creating interactive dashboards using tools like Power BI or Tableau. You will collaborate with the data engineering team, ensuring proper documentation of your work. The internship offers hands-on experience in cloud-based data engineering and data science, supporting CALSTART's efforts to accelerate clean transportation solutions. Compensation is $16-$21/hour. CALSTART values transparency and strives to provide as much information regarding compensation as possible.

Requirements

  • Proficiency in Python for data processing and analysis
  • Experience with data science workflows and tools
  • Knowledge of ETL processes and pipeline development
  • Familiarity with AWS services for cloud-based data infrastructure
  • Strong communication skills, both written and verbal
  • Collaborative team player with a proactive mindset

Responsibilities

  • Collaborate with the data engineering team to design and build a scalable data lake architecture on AWS, using services such as Amazon S3, RDS, or EC2
  • Assist in building end-to-end ETL (Extract, Transform, Load) pipelines that pull data from various sources, process it, and store it in the data lake in an organized and efficient manner
  • Implement data cleansing, transformation, and validation processes to ensure data accuracy, completeness, and consistency before storing it in the data lake
  • Develop interactive dashboards and visualizations using tools like Power BI, Tableau, or open-source alternatives to present insights related to clean transportation, such as vehicle performance, infrastructure coverage, and funding distribution
  • Ensure proper documentation of the data lake architecture, pipeline processes, and visualization tools for knowledge transfer and future improvements

Preferred Qualifications

  • Bachelor's or Master's degree in Math, Data Science, Statistics, Engineering, Computer Science, or a related field
  • Experience with data science or analytics
  • Proficiency in SQL and experience with relational databases such as MySQL, PostgreSQL, or Microsoft SQL Server
  • Some experience with ETL pipelines is a plus
  • AWS experience

Benefits

  • 100% company-paid comprehensive health benefits: Medical, Dental, Vision, Short-Term Disability, Long-Term Disability, and Life Insurance
  • Retirement plan with generous company contributions
  • FSA for Health and Dependent Care
  • 3 weeks of vacation time in the first year of employment
  • 11 paid company holidays
  • Paid sick time
  • Paid family leave
