Research Engineer - Data

Leonardo.Ai

📍 Remote - United Kingdom

Summary

Join Leonardo.Ai, a Canva company, and become a Research Engineer – Data, architecting and managing petascale data pipelines for world-class AI models. You will collaborate with researchers to create and curate large, multi-modal datasets, including synthetic data. Your expertise in distributed systems and data processing will be crucial. Responsibilities include data acquisition and curation, developing high-performance data pipelines, generating synthetic data, conducting experiments, ensuring data security and compliance, and contributing to open-source projects. The role offers a flexible work environment and opportunities for professional growth within a diverse and inclusive culture.

Requirements

  • Have hands-on experience with images, videos, 3D geometry (mesh/solid modeling), and/or text data
  • Have well-rounded expertise in Python and PyTorch
  • Demonstrate proficiency in setting up large-scale, robust data pipelines, using frameworks like Spark, Ray, or Metaflow
  • Be comfortable with model versioning and experiment tracking
  • Have a good understanding of parallel and distributed computing
  • Be experienced with setting up evaluation methods
  • Have experience with AWS, Azure, or other cloud platforms
  • Be proficient in both relational (MySQL, PostgreSQL) and NoSQL (MongoDB, Cassandra) databases, plus vector data stores

Responsibilities

  • Lead the ingestion, unification, and organization of large, unstructured data sources (e.g., text, images, 3D geometry, code snippets) into scalable, high-quality datasets suitable for machine learning research and production
  • Develop and optimize distributed systems for data processing, including filtering, indexing, and retrieval, leveraging frameworks like Ray, Metaflow, Spark, or Hadoop
  • Build and orchestrate pipelines to generate synthetic data at scale, advancing research on cost-efficient inference and training strategies
  • Design and conduct experiments on dataset quality, scalability, and performance
  • Collaborate with legal and safety teams to ensure all data usage respects privacy, security, and ethical standards
  • Contribute to internal and external libraries or frameworks, sharing insights and breakthroughs with the wider AI community through publications or technical blogs

Preferred Qualifications

  • Have a passion for synthetic data generation using inference from pretrained models, 3D rendering engines, and/or other software

Benefits

  • Flexible Work Environment: We understand the importance of work-life balance. Thrive personally and professionally with the option to work remotely or in our vibrant offices
  • Empowering Growth: We invest in your development with continuous learning opportunities and clear pathways for career advancement tailored to your goals
