Data Scientist

Everstream Analytics

πŸ“Remote

Summary

Join our Natural Language Processing (NLP) and Generative AI Data Science team as a Data Science Intern. You will gain hands-on experience collecting and working with real-world, publicly available data from online sources such as news outlets and company websites. The role involves writing Python scripts, analyzing how websites are structured, and building clean, usable datasets for high-impact AI models. You will work on real data projects that support our NLP and generative AI initiatives, collaborate with a talented team, and build a portfolio of practical work, all with the flexibility of a fully remote environment. Along the way, you will use modern industry tools and techniques and strengthen your Python and web scraping skills.

Requirements

  • Be enrolled in a degree program in Computer Science, Data Science, Information Technology, or a related field
  • Demonstrate familiarity with Python and libraries such as BeautifulSoup, Scrapy, or Selenium for data collection tasks
  • Show understanding of HTML, CSS, and JavaScript to navigate and parse web content effectively
  • Possess basic knowledge of data storage formats and databases (e.g., CSV, JSON, SQL)
  • Possess strong problem-solving skills and attention to detail
  • Demonstrate excellent communication skills, both written and verbal
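To give a sense of the parsing these requirements refer to, here is a minimal sketch that pulls headline text and links out of an HTML snippet. It uses Python's standard-library html.parser as a dependency-free stand-in for BeautifulSoup or Scrapy selectors; the markup pattern, the HeadlineParser class, and the sample page are hypothetical, not taken from any real site.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HeadlineParser(HTMLParser):
    """Collect (text, href) pairs from <h2 class="headline"><a href="..."> markup.

    The markup pattern is a hypothetical example, not a real site's structure.
    """

    def __init__(self) -> None:
        super().__init__()
        self.in_headline = False
        self.current_href: Optional[str] = None
        self.current_text: List[str] = []
        self.results: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            self.in_headline = True
        elif tag == "a" and self.in_headline:
            self.current_href = attr_map.get("href")

    def handle_data(self, data):
        if self.in_headline:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_headline:
            # Collapse runs of whitespace left over from the page layout.
            text = " ".join("".join(self.current_text).split())
            self.results.append((text, self.current_href))
            self.in_headline = False
            self.current_href = None
            self.current_text = []


def extract_headlines(html: str) -> List[Tuple[str, Optional[str]]]:
    """Return (headline text, link) pairs found in the given HTML."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE_PAGE = """
<html><body>
  <h2 class="headline"><a href="/news/1">Port  delays
    ease</a></h2>
  <p>Not a headline</p>
  <h2 class="headline featured"><a href="/news/2">Supplier update</a></h2>
</body></html>
"""

print(extract_headlines(SAMPLE_PAGE))
# [('Port delays ease', '/news/1'), ('Supplier update', '/news/2')]
```

With BeautifulSoup installed, the same selection could be written as a CSS selector such as h2.headline a; the standard-library version is shown only so the example runs without extra packages.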

Responsibilities

  • Develop and maintain scripts to automate the collection of publicly available data from online sources, ensuring compliance with each website's terms of service and robots.txt directives
  • Clean, validate, and organize collected data to ensure accuracy and usability for downstream tasks
  • Store extracted data in structured formats such as CSV, JSON, or databases, ensuring efficient retrieval and analysis
  • Work closely with data scientists and analysts to understand data requirements and ensure legal compliance
  • Document data collection processes, data dictionaries, and any challenges encountered to facilitate knowledge sharing and future maintenance
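The collection, cleaning, and storage steps above can be sketched roughly as follows, assuming records have already been fetched. The robots.txt content, the user agent string, the url/title fields, and the articles table are all hypothetical. Permission checks use the standard-library urllib.robotparser, and the cleaned records are serialized to CSV, JSON, and an SQLite table.

```python
import csv
import io
import json
import sqlite3
from urllib import robotparser

USER_AGENT = "example-research-bot"  # hypothetical user agent string

# Hypothetical robots.txt; a real crawler would fetch it with
# RobotFileParser.set_url("https://<site>/robots.txt") followed by read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
""".splitlines()


def allowed_urls(urls, robots_lines, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return [u for u in urls if rp.can_fetch(user_agent, u)]


def clean_records(records):
    """Normalize whitespace, drop incomplete records, and de-duplicate by URL."""
    seen = set()
    cleaned = []
    for rec in records:
        url = (rec.get("url") or "").strip()
        title = " ".join((rec.get("title") or "").split())
        if not url or not title or url in seen:
            continue  # skip records missing a field, or already seen
        seen.add(url)
        cleaned.append({"url": url, "title": title})
    return cleaned


def to_csv(records):
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Serialize records as a JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_sqlite(records, conn):
    """Upsert records into an 'articles' table on an open SQLite connection."""
    with conn:  # commits on success, rolls back on error (does not close)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles "
            "(url TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO articles (url, title) VALUES (:url, :title)",
            records,
        )
```

In a real pipeline the robots.txt check would run before any request is sent, and the cleaning step would run on whatever was fetched from the permitted URLs.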

Preferred Qualifications

  • Demonstrate familiarity with AI-powered data collection tools (e.g., Firecrawl)
  • Demonstrate familiarity with web concepts such as sitemaps, robots.txt, and RSS feeds
  • Demonstrate experience with data visualization tools or libraries (e.g., Matplotlib, Seaborn)
  • Demonstrate familiarity with version control systems like Git
  • Show understanding of ethical considerations and legal guidelines related to data collection
  • Demonstrate ability to work independently and manage time effectively in a remote or hybrid work environment
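Sitemaps and RSS feeds are often easier to collect from than rendered pages because they already list URLs and article metadata in XML. A minimal sketch using the standard-library xml.etree.ElementTree follows; the sample documents are invented, and real feeds may use formats or namespaces (for example Atom) that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def rss_items(xml_text):
    """Return (title, link, pubDate) tuples from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append((
            (item.findtext("title") or "").strip(),
            (item.findtext("link") or "").strip(),
            (item.findtext("pubDate") or "").strip(),
        ))
    return items


# Invented sample documents for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/news/1</loc></url>
  <url><loc>https://example.com/news/2</loc></url>
</urlset>"""

SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>Port delays ease</title><link>https://example.com/news/1</link>
    <pubDate>Mon, 06 Jan 2025 09:00:00 GMT</pubDate></item>
</channel></rss>"""

print(sitemap_urls(SAMPLE_SITEMAP))
print(rss_items(SAMPLE_RSS))
```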

Benefits

Fully remote work environment
