Data Scientist - Gen AI / QA at Zyte

Summary

Join Zyte's Data QA team as a Data Scientist and leverage your AI expertise to ensure the quality and usability of web-scraped data. You will work with large datasets, employing Python and the PyData stack to analyze, manipulate, and visualize data, uncovering quality issues. Collaborate with developers and stakeholders, presenting findings and conclusions effectively. This role requires proficiency in Python, experience with AI-based data quality verification, and strong communication skills. You'll contribute to enhancing data quality for Zyte's enterprise clients, using cutting-edge technologies and working remotely with a global team.

Requirements

Highly proficient in Python and the PyData stack, with a minimum of 3 years of experience (please provide code samples in your application, ideally pertaining to data analysis or Generative AI, via a link to GitHub or another publicly accessible service)
BS degree in Computer Science, Engineering, Mathematics, Statistics or equivalent
Up to speed on the latest advances in Generative AI, particularly as they pertain to process automation, web scraping/parsing, and data quality verification
Comfortable with Prompt Engineering and token/cost optimization
Familiar with abstraction layers (MCP, Marvin, LangChain, etc.)
Experience coding against the APIs of at least one of the Google, OpenAI, or Anthropic models
Experience visualising data quality and data quality issues
Ability to work with very large datasets (into the millions of records)
Strong knowledge of software QA methodologies, tools, and processes
Excellent level of written and spoken English; confident communicator; able to communicate on both technical and non-technical levels with various stakeholders on all matters of QA
Outstanding attention to detail

Responsibilities

Understand customer web scraping and data requirements; map these requirements to custom AI-based data quality validation techniques, with a focus on achieving pre-established degrees of data quality and uncovering data quality issues
Draw conclusions about data quality by producing descriptive and evidence-based statistics, summaries, and visualisations
Supplement existing manual QA and schema validation techniques with AI-based data quality verification
Collaborate with developers to further troubleshoot and pinpoint solutions
Present findings and conclusions to stakeholders at various levels (other members of the QA department, developers, project managers, account managers, customers)
Write high-quality, well-structured code that is maintainable and extensible
Manage code using GitHub, Bitbucket, and other version control tools as applicable

Preferred Qualifications

Prior experience in a Data QA role (where the focus was on verifying data quality, rather than testing application functionality)
Familiarity with Jupyter and JupyterLab
Experience building your own dashboards
Experience with Spark, BigQuery, and other big data technologies
Previous remote working experience

Benefits

Become part of a self-motivated, progressive, multi-cultural team
Have the freedom and flexibility to work from where you do your best work
Attend conferences and meet with team members from across the globe
Work with cutting-edge open source technologies and tools

Remote

Data

Mid-level
