Core & ML Ops Team Lead

Zyte

📍Remote - Poland

Summary

Join Zyte, a globally distributed team building powerful, easy-to-use tools for web data extraction, as an experienced Team Lead to manage the Core & MLOps Squad. This hands-on technical leadership role demands expertise in MLOps, systems programming, and orchestration. You will lead a cross-functional team in designing and maintaining the scalable infrastructure powering Zyte. Responsibilities include designing the core platform, owning the model platform, building a 'Golden Path' for streamlined development, and ensuring MLOps excellence. Team management involves roadmap planning, delivery, mentoring, and fostering high engineering standards. The role requires collaboration with other teams and a commitment to platform thinking.

Requirements

  • 5+ years of experience building distributed systems; 3+ years in MLOps/ML platform engineering (or equivalent impact)
  • Knowledge of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling
  • Deep understanding of Kubernetes (bonus: Mesos)
  • Proficiency developing high-performance services in Java, Rust, Go, or C++ (bonus: familiarity with the Vert.x and Netty frameworks); strong Python skills
  • Experience with GPU infrastructure (scheduling, containerization, optimization)
  • Track record of designing and operating model platforms (registry, training, serving, monitoring) in production
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions

Responsibilities

  • Design and evolve the core platform (Kubernetes, Mesos, GPU scheduling/autoscaling, distributed compute)
  • Own the model platform: registry, experiment tracking, training orchestration, evaluation, serving, and monitoring
  • Build the Golden Path: reference repos, a scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), high-performance clients, circuit breakers, and other production‑ready defaults
  • Operate a secure, multi‑tenant model registry and training platform with standardized experiment/evaluation harnesses
  • Provide turnkey serving patterns (online + batch), drift/quality monitoring, and rollback playbooks
  • Integrate public/open‑source AI capabilities as managed platform services with cost and data‑governance guardrails
  • Run the squad: roadmap/prioritization, delivery, mentoring, and high engineering standards
  • Partner with product engineering (Zyte API, Scrapy Cloud), Prod Ops, and Security on adoption and rollout plans
  • Mentor the team and foster a platform-thinking mindset

Preferred Qualifications

  • Streaming & workflows: Kafka plus Argo/Temporal/Airflow or equivalents
  • eBPF‑based observability, perf tooling, or io_uring experience
  • Cost optimization for ML/AI; multi‑tenant quotas and fairness
  • Hands‑on experience authoring Golden Paths (service chassis/templates, CI/CD blueprints, CLI scaffolds)
  • SRE practices (SLIs/SLOs, incident management)

Benefits

Have the freedom and flexibility to work from wherever you do your best work, as we are a completely remote company.
