Platform Engineer / SRE

Portainer.io
Summary
Join Portainer's global Platform Engineering team and play a critical role in ensuring the reliability, scalability, and efficiency of large-scale, self-managed Kubernetes environments across customer data centers. You will collaborate with customer platform teams to operate and improve their Kubernetes estate, enhance observability and automation, and extend platform capabilities. This high-impact role demands deep infrastructure knowledge, cloud-native expertise, and a DevOps/SRE mindset to support mission-critical systems globally. We seek an experienced engineer with a proven track record of operating and engineering Kubernetes at scale, solving complex infrastructure problems, and owning production systems end-to-end. The ideal candidate will have practical expertise in building, breaking, and hardening Kubernetes platforms in demanding environments. Portainer offers a competitive salary and the ability to work from anywhere in the world.
Requirements
- 6+ years of hands-on experience in Platform Engineering, DevOps, or SRE roles
- 3+ years operating large-scale on-prem or self-managed Kubernetes clusters in production
- Deep understanding of Kubernetes control-plane components (API server, etcd, controller-manager, scheduler)
- Experience with Portainer or other Kubernetes platform management tools (e.g., Rancher, Lens, OpenShift)
- Proficiency in service mesh technologies such as Istio and Envoy
- Advanced skills in Infrastructure as Code (Terraform, Helm, Kustomize) and GitOps workflows
- Solid knowledge of CNI plugins (e.g., Cilium, Calico), ingress controllers, and CSI drivers
- Scripting and automation using Python, Ansible, Terraform, or Bash
- Familiarity with observability tooling (Prometheus, Grafana, Loki, VictoriaMetrics, Mimir, etc.)
- Strong grasp of reliability engineering principles: SLOs, SLIs, chaos testing, and scaling patterns
Responsibilities
- Operate and manage self-hosted Kubernetes clusters at scale (5,000+ nodes per region) across multiple sites
- Serve as a subject-matter expert on Kubernetes internals, delivering proactive support, performance tuning, and architectural recommendations
- Enable and extend platform tooling using Portainer, integrating it with identity, observability, and lifecycle management systems
- Design and automate Day-2 operational workflows including node lifecycle, network overlays, and storage provisioning
- Lead technical engagements such as architecture reviews, operational readiness assessments, and incident postmortems
- Build and maintain IaC pipelines and GitOps patterns using tools like Terraform, ArgoCD, and Flux
- Troubleshoot and resolve advanced infrastructure issues related to scheduling, networking, DNS, ingress, and runtime isolation
- Contribute to internal reusable tooling, engineering standards, and automation frameworks
- Collaborate with customer stakeholders and internal technical teams across time zones as part of a 24/7 high-availability model
Preferred Qualifications
- Demonstrable experience in Go is a strong advantage, particularly in building custom Kubernetes operators or contributing upstream (e.g., submitting PRs to Kubernetes core or CNCF projects)
Benefits
- A highly competitive salary
- The ability to work anywhere in the world
