Summary

Join our team as a Kubernetes On-Premise Operations Engineer and manage our on-premise Kubernetes infrastructure. The role focuses on day-to-day operations, proactive monitoring, and troubleshooting to ensure high availability and system stability, working alongside Level 3 Engineers to keep production running smoothly. You will support applications that serve multiple countries, including Mi Tigo, Tigo Sports, Apigee, and KannelGateway. Your responsibilities span Kubernetes cluster management, incident management, networking and ingress management, storage and database support, observability and monitoring, automation and configuration management, production deployments, and OS and security management. The position requires strong troubleshooting skills and hands-on experience with the tools listed under Requirements. This is a remote position open only to candidates in Bolivia.

Requirements

5+ years in Operations, SRE, or DevOps roles
3+ years managing on-premise Kubernetes clusters
Strong troubleshooting skills in Kubernetes, networking, and databases (MongoDB, MySQL, PostgreSQL)
Proficient in monitoring tools: Prometheus, Grafana, Loki
Familiar with operational processes, incident management, and runbooks
Experience with Helm, Ansible, and optionally Terraform
Prior experience with production on-call support and incident resolution
Competent in performing production deployments under change management practices
Experience managing Ubuntu systems

Responsibilities

Manage and maintain our on-premise Kubernetes infrastructure
Perform day-to-day operations, proactive monitoring, and troubleshooting
Ensure high availability and system stability
Collaborate with Level 3 Engineers to maintain seamless production operations
Kubernetes Cluster Management
Apply patches and updates
Monitor and troubleshoot performance issues
Incident Management & On-Call Support
Participate in on-call rotation
Respond to incidents, perform root cause analysis (RCA), and document resolutions
Networking & Ingress Management
Operate and troubleshoot Cilium, Nginx Ingress Controller, and Traefik
Storage & Databases
Support and maintain NFS, MongoDB, MySQL, and PostgreSQL, ensuring performance and data integrity
Observability & Monitoring
Manage Prometheus, Grafana, and Loki for proactive alerting and system logging
Automation & Configuration Management
Use Helm, Ansible, and CI/CD pipelines to apply and manage infrastructure configurations
Production Deployments
Execute, monitor, and manage production deployments with proper rollback strategies
OS & Security Management
Maintain Ubuntu-based systems, ensuring they are patched, secure, and performant

Operations Engineer

Believe Solutions

Remote

DevOps

Mid-level
