Staff Network Engineer

Voltage Park
Summary
Join Voltage Park as a Staff Network Engineer and help build and operate the backbone of our high-performance AI infrastructure. You will design, deploy, and support large-scale network systems connecting GPU clusters, storage, and compute environments across data centers, collaborating with Principal Engineers and cross-functional teams to deliver automation-driven, low-latency networking for AI and HPC workloads. Day to day, you will implement and maintain high-throughput, low-latency networks; deploy and troubleshoot network systems; operate and optimize layer 2/3 network services; develop and maintain network automation; monitor network health; participate in incident response; and maintain configuration standards. You will also contribute to architectural decisions and vendor evaluations.
Requirements
- 5-8+ years of hands-on experience in large-scale network engineering, data center networks, or service provider infrastructure
- Strong knowledge of IP networking, BGP, OSPF, EVPN/VXLAN, and L2/L3 design principles
- Experience configuring and operating Arista, Juniper, or Cisco platforms in production environments
- Proficiency in scripting or automation (e.g., Python, Bash, Ansible)
- Solid troubleshooting skills and experience with real-time diagnostics and packet analysis
- Familiarity with monitoring and telemetry tools (e.g., Prometheus, Grafana, sFlow, InfluxDB)
Responsibilities
- Implement and maintain high-throughput, low-latency networks supporting AI Factory workloads and distributed training infrastructure
- Work hands-on to deploy, configure, and troubleshoot routing, switching, optics, and interconnect systems across data centers
- Operate and optimize layer 2/3 network services: BGP, EVPN/VXLAN, OSPF, MPLS, QoS, and ACLs
- Work with InfiniBand networking systems and NVIDIA Unified Fabric Manager (UFM)
- Develop and maintain network automation (e.g., Ansible, Python, Terraform) for provisioning, compliance, and operational workflows
- Monitor network health and performance using telemetry tools and help scale observability platforms
- Participate in the incident response rotation and perform root cause analysis on service-impacting events
- Maintain configuration standards, documentation, and change management in line with infrastructure governance processes
- Collaborate with the Principal Network Engineer on architectural decisions and vendor evaluations
Preferred Qualifications
- Experience in AI, HPC, or GPU-based infrastructure
- Exposure to carrier-grade architectures, DCI, and optical transport systems
- Exposure to NVIDIA InfiniBand networking systems and components
- Understanding of network segmentation, security policies, and zero-trust principles
- Comfort working in 24/7 operational environments and on-call rotations