Lambda is hiring an HPC Operations Engineer in the United States
![Logo of Lambda](https://cdn.jobscollider.com/logo/lambda-554e.webp)
HPC Operations Engineer (closed)
🏢 Lambda
💵 $120k-$160k
📍United States
📅 Posted on Jun 10, 2024
Summary
This is a remote position at Lambda for an HPC/AI cluster deployment and configuration specialist. The role involves deploying and configuring large-scale HPC clusters, troubleshooting issues, giving clear updates to project leads, keeping up with the latest HPC/AI technologies, and contributing to Standard Operating Procedures.
Requirements
- Have a good understanding of HPC/AI architecture, operating systems, firmware, software, and networking
- Have 3+ years of experience in deploying and configuring HPC clusters for AI workloads
- Have an innate attention to detail
- Be familiar with Bright Cluster Manager or similar cluster management tools
Responsibilities
- Remotely deploy and configure large-scale HPC clusters for AI workloads (up to many thousands of nodes)
- Remotely install and configure operating systems, firmware, software, and networking on HPC clusters both manually and using automation tools
- Troubleshoot and resolve HPC cluster issues, working closely with on-site physical deployment teams
- Provide context and details to the automation team to further automate the deployment process
- Provide clear, detailed requirements back to the HPC design team on gaps and improvement areas, specifically simplification, stability, and operational efficiency
- Contribute to the creation and maintenance of Standard Operating Procedures
- Provide regular and well-communicated updates to project leads throughout each deployment
Preferred Qualifications
- Experience with machine learning and deep learning frameworks (PyTorch, TensorFlow) and benchmarking tools (DeepSpeed, MLPerf)
- Experience with containerization technologies (Docker, Kubernetes)
- Experience working with the technologies that underpin Lambda's cloud business (GPU acceleration, virtualization, and cloud computing)
Benefits
- Generous cash & equity compensation
- Investors include Gradient Ventures, Google’s AI-focused venture fund
- Experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- A wildly talented team of 200, and growing fast
- Health, dental, and vision coverage for you and your dependents
- Commuter and work-from-home stipends
- 401(k) plan with 2% company match
- Flexible Paid Time Off Plan that we all actually use
This job has been filled or is no longer available
Similar Jobs
- 1 month ago · 💰 $170k-$230k · 📍 United States
- 1 month ago · 💰 $86k · 📍 Iceland
- 2 weeks ago · 💰 $169k-$243k · 📍 Worldwide
- 2 weeks ago · 💰 $174k-$259k · 📍 United Kingdom
- 2 weeks ago · 💰 $160k-$210k · 📍 United States
- 2 weeks ago · 💰 ~$103k-$207k · 📍 United States
- 1 month ago · 💰 $169k-$243k · 📍 United States, Canada
- 2 weeks ago · 💰 $60k-$120k · 📍 Taiwan, China
- 1 month ago · 💰 $180k-$250k · 📍 United States, Canada
- 1 month ago · 💰 ~$141k-$210k · 📍 Worldwide