Principal ML Infrastructure Engineer

Upwork
Summary
Join Upwork's Machine Learning Infrastructure & Data team as a Principal ML Infrastructure Engineer and play a pivotal role in designing, developing, and maintaining robust and scalable ML infrastructure. You will collaborate with cross-functional teams, design and implement distributed systems, develop and maintain ML frameworks, architect highly available systems, collaborate with researchers on novel research, stay current with advancements in ML infrastructure, and mentor teammates. The ideal candidate possesses senior-level experience in ML infrastructure engineering, a proven track record of delivering impactful solutions, strong communication and teamwork skills, and a commitment to continuous learning. Upwork offers a remote-first work environment and a comprehensive benefits package.
Requirements
- Senior- or leadership-level experience in ML infrastructure engineering, ideally at an innovative technology company
- Proven Impact: Show us your track record of delivering impactful solutions
- Innovative Thinker: Bring creativity and fresh ideas to the table
- Technical Proficiency: Solid foundation in software engineering and ML concepts
- Collaborative Mindset: Strong communication and teamwork skills are a must
- Continuous Learner: Stay updated with the latest advancements in the field of AI
Responsibilities
- Own technical workstreams from start to finish, contribute to the team's product roadmap, and be responsible for major technical decisions and tradeoffs
- Participate effectively in the team's planning, code reviews, and design discussions
- Consider the effects of projects across multiple teams and proactively manage conflicts
- Work together with partner teams to achieve cross-departmental goals and satisfy broad requirements
- Design, implement, and optimize distributed systems and infrastructure components to support large-scale machine learning workflows, including data ingestion, feature engineering, model training, and serving
- Develop and maintain frameworks, libraries, and tools that streamline the end-to-end machine learning lifecycle, covering data preparation, model training, evaluation, deployment, and monitoring
- Architect and implement highly available, fault-tolerant, and secure systems that meet the performance and scalability requirements of production machine learning workloads
- Collaborate and publish with machine learning researchers and data scientists on novel research, and translate that research into scalable and efficient software solutions
- Stay current with the latest advancements in machine learning infrastructure, distributed computing, and cloud technologies, and integrate them into our platform to drive innovation
- Mentor teammates, conduct code reviews, and uphold engineering best practices to ensure the delivery of high-quality software solutions
Benefits
- Comprehensive medical insurance coverage for both you and your family
- Unlimited paid time off
- A 401(k) plan with matching contributions
- 12 weeks of paid parental leave
- An Employee Stock Purchase Plan
