Phantom AI is hiring a Linux System Engineer in the United States

Linux System Engineer
🏢 Phantom AI
💵 ~$120k-$160k
📍United States
📅 Posted on Jun 11, 2024

Summary

This is an AI/ML cluster infrastructure support role at Phantom AI, a company specializing in cost-effective L2/L3 solutions for the automotive industry. The work covers systems automation, configuration management, health monitoring, and debugging application performance issues. The role can be remote or in-office, and the company provides equal employment opportunities.

Requirements

  • Bachelor’s degree in computer science, electrical engineering or related field
  • Strong understanding of Linux fundamentals and performance optimizations (Ubuntu)
  • Advanced experience configuring and managing SLURM, including building a setup from scratch
  • Demonstrable knowledge of TCP/IP, Linux operating system internals, filesystems, disk/storage technologies and storage protocols
  • Experience in collaborating with network and data center teams for large scale cluster builds
  • Experience with configuration management software, systems monitoring and alerting tools (Prometheus, Grafana, Telegraf, Splunk, etc.), and/or administering HPC workload managers (SLURM)
  • Experience with high-throughput low-latency networks, GPU-based computing systems, and/or high performance storage systems
  • Experience with SLURM and with managing storage on distributed parallel file systems is a plus
  • 3+ years of additional equivalent experience or evidence of exceptional ability related to the position

Responsibilities

  • Support the GPU-based AI/ML cluster infrastructure, focusing on systems automation, configuration management, and deployment at scale
  • Improve our cluster health monitoring and auto-recovery pipeline
  • Work with users on debugging application performance issues
  • Work with hardware and storage vendors to tune and optimize our servers, TrueNAS storage, and network
  • Automate and deploy the GPU cluster with Ansible
  • Performance tuning and OS provisioning on Linux systems
  • Manage HPC clusters, workloads and applications

Benefits

  • This is a contract position
  • Office snacks & reimbursable meals* when in-office