Lead Infrastructure Engineer

Pallon

📍 Remote - Germany

Summary

Join Pallon, a spin-off from ETH Zurich, as a seasoned infrastructure engineer and take full ownership of our infrastructure, from our high-performance GPU cluster to our cloud systems. You will lead critical decisions around architecture, performance, and scale while also solving real-world issues. You will work closely with our platform and computer vision teams to ensure their tools run fast, reliably, and securely. This hands-on role offers the autonomy to shape how our infrastructure comes together: you will design and build a custom GPU cluster, manage and scale infrastructure, keep systems running smoothly and securely, and make strategic decisions on automation. The ideal candidate has 5+ years of experience owning infrastructure end-to-end, ideally in startups, and is comfortable at every layer, from bare-metal servers to cloud-native tools.

Requirements

  • You’ve spent 5+ years owning infrastructure end-to-end, ideally in startup environments
  • You’re comfortable at every layer — from bare-metal servers and NVMe drives to container orchestration and cloud-native tools
  • You have strong Linux fundamentals, and you know your way around networking, storage, and distributed systems
  • You can code well enough to automate, debug, and build tooling across a variety of languages
  • You communicate clearly and collaborate well — especially with engineers who aren’t infra specialists
  • You thrive with autonomy and can manage your own priorities effectively
  • You’re curious and fast-learning, especially when tackling new tools or challenges
  • You have a university degree in Computer Science or a related field

Responsibilities

  • Design and build a custom GPU cluster for deep learning workloads
  • Decide how we manage and scale our infrastructure — both on-prem and in the cloud
  • Keep systems running smoothly and securely — from data pipelines to distributed training jobs
  • Troubleshoot weird kernel errors, configure systemd units, or debug Kubernetes evictions
  • Make calls on when to script, when to automate, and when to just fix the thing

Preferred Qualifications

  • Experience with machine learning infrastructure or HPC clusters
  • Familiarity with data engineering workflows and ETL pipelines

Benefits

  • Contribute to a positive impact on society and the environment
  • Develop a novel product that changes a whole industry
  • Be part of a motivated, smart, fun, and supportive team of software engineers and AI researchers
  • Own a part of Pallon and have a part in our success with our Employee Stock Option Plan (ESOP)
  • Work from home or enjoy access to our beautiful office space located in Zürich
