Data Scientist

Everstream Analytics

📍Remote - United States

Summary

Join Everstream Analytics' Natural Language Processing (NLP) and Generative AI Data Science team as a Data Science Intern. This internship offers hands-on experience collecting and working with real-world data from online sources. You will develop and maintain scripts for data collection, clean and organize data, store it in structured formats, collaborate with data scientists, and document processes. The ideal candidate is a driven student pursuing a related degree with Python, web scraping, and data handling skills. This fully remote internship provides valuable experience with modern tools and techniques, collaboration opportunities, and portfolio-building projects.

Requirements

  • Pursuing a degree in Computer Science, Data Science, Information Technology, or a related field
  • Familiarity with Python and libraries such as BeautifulSoup, Scrapy, or Selenium for data collection tasks
  • Understanding of HTML, CSS, and JavaScript to navigate and parse web content effectively
  • Basic knowledge of data storage formats and databases (e.g., CSV, JSON, SQL)
  • Strong problem-solving skills and attention to detail
  • Excellent communication skills, both written and verbal

Responsibilities

  • Develop and maintain scripts to automate the collection of publicly available data from online sources, ensuring compliance with each website's terms of service and robots.txt directives
  • Clean, validate, and organize collected data to ensure accuracy and usability for downstream tasks
  • Store extracted data in structured formats such as CSV, JSON, or databases, ensuring efficient retrieval and analysis
  • Work closely with data scientists and analysts to understand data requirements and ensure legal compliance
  • Document data collection processes, data dictionaries, and any challenges encountered to facilitate knowledge sharing and future maintenance

Preferred Qualifications

  • Familiarity with AI-powered data collection tools (e.g., Firecrawl)
  • Familiarity with web concepts such as sitemaps, robots.txt, and RSS feeds
  • Experience with data visualization tools or libraries (e.g., Matplotlib, Seaborn)
  • Familiarity with version control systems like Git
  • Understanding of ethical considerations and legal guidelines related to data collection
  • Ability to work independently and manage time effectively in a remote or hybrid work environment

Benefits

  • This isn’t just another internship — it’s a chance to work on real data projects that directly support our NLP and generative AI initiatives
  • You’ll gain hands-on experience with modern tools and techniques used in industry, collaborate with a talented and supportive team of data professionals, and build a portfolio that goes well beyond classroom assignments
  • Whether you're passionate about ethical data practices, fascinated by AI, or eager to level up your Python and web scraping skills, this role offers meaningful exposure and flexibility — all within a fully remote work environment designed with students in mind
