Senior Platform Engineer

MUBI
Summary
Join MUBI's remote-first Infrastructure team as a Senior Platform Engineer and contribute to building and maintaining a highly scalable, distributed platform. You will work with a global team, leveraging Kubernetes (EKS), AWS, and a custom-built CDN. Responsibilities include designing, implementing, and maintaining EKS clusters, managing AWS services, automating provisioning with Terraform, improving CI/CD pipelines, ensuring high system availability through enhanced monitoring, and improving security and reliability. The role requires significant experience with AWS, Kubernetes, Infrastructure-as-Code, CI/CD pipelines, monitoring tools, and networking fundamentals. MUBI offers a fully remote setup and a commitment to a diverse and inclusive workplace.
Requirements
- 3+ years in platform, infrastructure, or SRE roles
- Deep experience with AWS and Kubernetes (EKS)
- Infrastructure-as-Code (Terraform, AWS CDK, Pulumi)
- CI/CD pipelines (Jenkins, ArgoCD) & GitOps practices
- Monitoring & observability tools (Prometheus, Grafana, ELK, Datadog)
- Networking fundamentals (TCP/IP, DNS, Load Balancing, VPCs, security policies)
- Strong Linux administration skills
- Good scripting & automation skills (Ruby, Bash)
Responsibilities
- Design, implement, and maintain EKS clusters, handling upgrades, security, and monitoring
- Manage AWS services (EC2, S3, RDS, VPC, Route53, CloudFront, CloudWatch, SNS, SQS, DynamoDB) with a focus on cost optimization and scalability
- Automate provisioning with Terraform
- Improve and maintain our Jenkins & ArgoCD pipelines for Kubernetes-based deployments
- Work with Helm, Kustomize, and GitOps practices to standardize deployments
- Ensure high system availability by enhancing monitoring with Prometheus, Grafana, Datadog, and ELK (Elasticsearch, Logstash, Kibana)
- Implement monitoring-as-code, log collection, and alerting for infrastructure and applications
- Improve cluster networking, ingress traffic management, and security policies (RBAC, network policies, PodSecurityPolicies, vulnerability scanning)
- Enhance disaster recovery strategies, backups, and high-availability configurations
- Work closely with developers to troubleshoot performance issues, optimize cloud costs, and automate processes
- Contribute to a culture of reliability, making complex infrastructure easy to use for engineers
Preferred Qualifications
- Experience with autoscaling tools (Karpenter, HPA)
- Multi-region AWS experience
- Database operations knowledge (MariaDB, PostgreSQL, query performance, backups)
- Experience with distributed systems and event-driven architectures (RabbitMQ, Kafka, EventBridge, WebSockets, gRPC)
- Knowledge of CDN operations, caching, and performance optimization
Benefits
- Fully remote setup
