Senior Machine Learning Infra Engineer
Waabi
$158k-$269k
Remote - Canada, United States
Please let Waabi know you found this job on JobsCollider. Thanks!
Job highlights
Summary
Join Waabi, a leading AI company revolutionizing self-driving technology, and contribute to building the next generation of safe and scalable autonomous vehicles. Work alongside a team of expert engineers and researchers using an AI-first approach. Collaborate with cross-functional teams to design and implement robust cloud infrastructure for training and simulation workloads. Develop and implement cloud strategies, ensuring optimal performance and reliability. Waabi offers a competitive salary, comprehensive benefits, and a dynamic work environment. We are seeking highly skilled and passionate individuals to join our growing team and make a significant impact on the future of transportation.
Requirements
- BS, MS, or PhD in Computer Science or a similar technical field, or equivalent practical experience
- 5+ years of relevant industry experience
- Experience reading and developing production-quality software
- Deep understanding of cloud compute and data storage for distributed training and inference workloads
- Familiarity with the Python, Go, Rust, or C++ ecosystems
- Experience working with public cloud platforms (AWS preferred)
- Experience with infrastructure as code systems (Terraform preferred)
- Experience in job scheduling and resource allocation
- Experience with containers and container orchestration (e.g., Docker, ECS, Kubernetes)
- Experience and high level of comfort working with Linux systems
- Experience with building platform services that enable other teams to do their best work
- Open-minded and collaborative team player with the willingness to help others
- Passionate about self-driving technologies, solving hard problems, and creating innovative solutions
- Experience working in an Agile/Scrum environment
Responsibilities
- Work alongside a team of multidisciplinary Engineers and Research Scientists using an AI-first approach to enable safe self-driving at scale
- Collaborate with cross-functional teams across the company to understand their growing needs and pain points in cloud usage
- Propose cloud strategies for compute and data usage in training and simulation workloads
- Design and implement scalable, resilient cloud infrastructure optimized for long-term reliability and adaptability
- Devise and promote best practices for cloud usage in training and simulation environments, and oversee cloud strategy and usage across the whole company
Preferred Qualifications
- Experience with on-premises servers, network equipment, and scale-out storage systems
- Experience with CI/CD pipelines and release management
- Experience with common ML tools, workflows, and frameworks (e.g., Kubeflow or MLflow)
- Understanding of system performance tuning at the software, hardware, and network levels
- Good understanding of GPUs and accelerators in ML training and inference use cases
Benefits
- Competitive compensation and equity awards
- Health and Wellness benefits encompassing Medical, Dental and Vision coverage (for full-time employees only)
- Unlimited Vacation
- Flexible hours and Work from Home support
- Daily drinks, snacks and catered meals (when in office)
- Regularly scheduled team-building activities and social events, held on-site, off-site, and virtually