Worldwide
Linux System Engineer
closed
Phantom AI
Remote - United States
Summary
This is an AI/ML cluster infrastructure support role at Phantom AI, a company specializing in cost-effective L2/L3 solutions for the automotive industry. The position covers systems automation, configuration management, cluster health monitoring, and debugging application performance issues. The role can be remote or in-office, and the company provides equal employment opportunities.
Requirements
- Bachelor's degree in computer science, electrical engineering, or a related field
- Strong understanding of Linux fundamentals and performance optimizations (Ubuntu)
- Advanced experience configuring and managing SLURM, including setting it up from scratch
- Demonstrable knowledge of TCP/IP, Linux operating system internals, filesystems, disk/storage technologies and storage protocols
- Experience in collaborating with network and data center teams for large scale cluster builds
- Experience with configuration management software, systems monitoring and alerting (Prometheus, Grafana, Telegraf, Splunk, etc.), and/or administering HPC workload managers (SLURM)
- Experience with high-throughput low-latency networks, GPU-based computing systems, and/or high performance storage systems
- Experience with SLURM and storage management of distributed parallel file systems is a plus
- 3+ years of additional equivalent experience or evidence of exceptional ability related to the position
Responsibilities
- Support the GPU-based AI/ML cluster infrastructure, focusing on systems automation, configuration management, and deployment at scale
- Improve our cluster health monitoring and auto-recovery pipeline
- Work with users on debugging application performance issues
- Work with hardware and storage vendors to tune and optimize our servers, TrueNAS storage, and network
- Automate and deploy the GPU cluster with Ansible
- Performance tuning and OS provisioning on Linux systems
- Manage HPC clusters, workloads and applications
Benefits
- This is a contract position
- Office snacks & reimbursable meals* when in-office
This job is filled or no longer available