Phantom AI is hiring a Linux System Engineer in the United States

🏢 Phantom AI
💵 ~$120k-$160k
📍United States
📅 Posted on Jun 11, 2024

Summary

This is an AI/ML cluster infrastructure support role at Phantom AI, a company specializing in cost-effective L2/L3 driver-assistance solutions for the automotive industry. The position covers systems automation, configuration management, cluster health monitoring, debugging of application performance issues, and related work. The role can be remote or in-office, and the company is an equal opportunity employer.

Requirements

  • Bachelor’s degree in computer science, electrical engineering, or a related field
  • Strong understanding of Linux fundamentals and performance optimization (Ubuntu)
  • Advanced experience configuring Slurm, including setting up a deployment from scratch
  • Demonstrable knowledge of TCP/IP, Linux operating system internals, filesystems, disk/storage technologies and storage protocols
  • Experience collaborating with network and data center teams on large-scale cluster builds
  • Experience with configuration management software, systems monitoring and alerting (Prometheus, Grafana, Telegraf, Splunk, etc.), and/or administering HPC workload managers (Slurm)
  • Experience with high-throughput, low-latency networks, GPU-based computing systems, and/or high-performance storage systems
  • Experience with Slurm and with storage management of distributed parallel file systems is a plus
  • 3+ years of relevant experience, or equivalent experience or evidence of exceptional ability related to the position

Responsibilities

  • Support the GPU-based AI/ML cluster infrastructure, focusing on systems automation, configuration management, and deployment at scale
  • Improve our cluster health monitoring and auto-recovery pipeline
  • Work with users to debug application performance issues
  • Work with hardware and storage vendors to tune and optimize our servers, TrueNAS storage, and network
  • Automate and deploy the GPU cluster with Ansible
  • Tune performance and provision operating systems on Linux systems
  • Manage HPC clusters, workloads, and applications

Benefits

  • This is a contract position
  • Office snacks & reimbursable meals* when in-office
This job has been filled or is no longer available.
