Senior Platform Engineer

MUBI

πŸ“Remote - United Kingdom

Summary

Join MUBI's remote-first Infrastructure team as a Senior Platform Engineer and contribute to building and maintaining a highly scalable, distributed platform. You will work with a global team, leveraging Kubernetes (EKS), AWS, and a custom-built CDN. Responsibilities include designing, implementing, and maintaining EKS clusters, managing AWS services, automating provisioning with Terraform, improving CI/CD pipelines, ensuring high system availability through enhanced monitoring, and improving security and reliability. The role requires significant experience with AWS, Kubernetes, Infrastructure-as-Code, CI/CD pipelines, monitoring tools, and networking fundamentals. MUBI offers a fully remote setup and a commitment to a diverse and inclusive workplace.

Requirements

  • 3+ years in platform, infrastructure, or SRE roles
  • Deep experience with AWS and Kubernetes (EKS)
  • Infrastructure-as-Code (Terraform, AWS CDK, Pulumi)
  • CI/CD pipelines (Jenkins, ArgoCD) & GitOps practices
  • Monitoring & observability tools (Prometheus, Grafana, ELK, Datadog)
  • Networking fundamentals (TCP/IP, DNS, Load Balancing, VPCs, security policies)
  • Strong Linux administration skills
  • Good scripting & automation skills (Ruby, Bash)

Responsibilities

  • Design, implement, and maintain EKS clusters, handling upgrades, security, and monitoring
  • Manage AWS services (EC2, S3, RDS, VPC, Route53, CloudFront, CloudWatch, SNS, SQS, DynamoDB) with a focus on cost optimization and scalability
  • Automate provisioning with Terraform
  • Improve and maintain our Jenkins & ArgoCD pipelines for Kubernetes-based deployments
  • Work with Helm, Kustomize, and GitOps practices to standardize deployments
  • Ensure high system availability by enhancing monitoring with Prometheus, Grafana, Datadog, and ELK (Elasticsearch, Logstash, Kibana)
  • Implement monitoring-as-code, log collection, and alerting for infrastructure and applications
  • Improve cluster networking, ingress traffic management, and security policies (RBAC, network policies, PodSecurityPolicies, vulnerability scanning)
  • Enhance disaster recovery strategies, backups, and high-availability configurations
  • Work closely with developers to troubleshoot performance issues, optimize cloud costs, and automate processes
  • Contribute to a culture of reliability, making complex infrastructure easy to use for engineers

Preferred Qualifications

  • Experience with autoscaling tools (Karpenter, HPA)
  • Multi-region AWS experience
  • Database operations knowledge (MariaDB, PostgreSQL, query performance, backups)
  • Experience with distributed systems and event-driven architectures (RabbitMQ, Kafka, EventBridge, WebSockets, gRPC)
  • Knowledge of CDN operations, caching, and performance optimization

Benefits

  • Fully remote setup
