Research Engineer - Data at Leonardo.Ai

Summary

Join Leonardo.Ai, a Canva company, and become a Research Engineer – Data, architecting and managing petascale data pipelines for world-class AI models. You will collaborate with researchers to create and curate large, multi-modal datasets, including synthetic data. Your expertise in distributed systems and data processing will be crucial. Responsibilities include data acquisition and curation, developing high-performance data pipelines, generating synthetic data, conducting experiments, ensuring data security and compliance, and contributing to open-source projects. The role offers a flexible work environment and opportunities for professional growth within a diverse and inclusive culture.

Requirements

Have hands-on experience with images, videos, 3D geometry (mesh/solid modeling), and/or text data
Have well-rounded expertise in Python and PyTorch
Demonstrate proficiency in setting up large-scale, robust data pipelines, using frameworks like Spark, Ray, or Metaflow
Be comfortable with model versioning and experiment tracking
Have a good understanding of parallel and distributed computing
Be experienced with setting up evaluation methods
Have experience with AWS, Azure, or other cloud platforms
Be proficient in both relational (MySQL, PostgreSQL) and NoSQL (MongoDB, Cassandra) databases, plus vector data stores

Responsibilities

Lead the ingestion, unification, and organization of large, unstructured data sources (e.g., text, images, 3D geometry, code snippets) into scalable, high-quality datasets suitable for machine learning research and production
Develop and optimize distributed systems for data processing, including filtering, indexing, and retrieval, leveraging frameworks like Ray, Metaflow, Spark, or Hadoop
Build and orchestrate pipelines to generate synthetic data at scale, advancing research on cost-efficient inference and training strategies
Design and conduct experiments on dataset quality, scalability, and performance
Collaborate with legal and safety teams to ensure all data usage respects privacy, security, and ethical standards
Contribute to internal and external libraries or frameworks, sharing insights and breakthroughs with the wider AI community through publications or technical blogs

Preferred Qualifications

Have a passion for synthetic data generation using inference from pretrained models, 3D rendering engines, and/or other software

Benefits

Flexible Work Environment: We understand the importance of work-life balance. Thrive personally and professionally with the option to work remotely or in our vibrant offices
Empowering Growth: We invest in your development with continuous learning opportunities and clear pathways for career advancement tailored to your goals

Remote

Data

Mid-level
