Infrastructure Engineer

Deepgram

πŸ“Remote - Worldwide

Summary

Join Deepgram, a leading voice AI platform company, as an experienced Infrastructure Engineer. You will design, implement, and maintain our large-scale distributed systems infrastructure, focusing on network architecture, storage solutions, and compute platforms for AI/ML workloads. This role requires expertise in network engineering, storage systems, container orchestration, and cluster scheduling. You will build and optimize cost-effective data center infrastructure and manage large-scale deployments using Kubernetes and Slurm. The ideal candidate has 5+ years of experience in infrastructure engineering and a strong background in network engineering and large-scale storage systems. Deepgram offers a collaborative, customer-focused work environment.

Requirements

  • 5+ years of experience in infrastructure engineering or similar roles
  • Strong background in network engineering and design for reliability
  • Experience with large-scale storage systems (distributed file systems, caching solutions)
  • Proven track record of managing data center infrastructure
  • Expertise in container orchestration and cluster scheduling platforms (Kubernetes, Slurm)
  • Experience with GPU infrastructure management and optimization
  • Strong automation and scripting skills

Responsibilities

  • Design and implement reliable, high-performance network architectures for distributed systems
  • Architect and maintain large-scale storage solutions, including backup systems, distributed caching, and object storage
  • Build and optimize cost-effective data center infrastructure
  • Develop and maintain GPU compute clusters for AI inference workloads
  • Manage large-scale deployments using orchestration and scheduling platforms such as Kubernetes and Slurm
  • Implement monitoring, alerting, and automation solutions for infrastructure management

Preferred Qualifications

  • Experience with software-defined networking
  • Knowledge of cost optimization for cloud and on-premises infrastructure
  • Familiarity with AI/ML workloads and their infrastructure requirements
  • Experience with multi-region infrastructure deployment
  • Background in performance optimization for distributed systems
  • Certification in relevant cloud platforms (AWS, GCP, Azure)
