Remote Senior Site Reliability Engineer at HiveMQ

Summary

HiveMQ is seeking an experienced Site Reliability Engineer for its Cloud Operations team. The role involves ensuring the availability, reliability, and scalability of the HiveMQ Cloud platform, managing cloud infrastructure with tools such as Kubernetes, Terraform, and Helm, and contributing to the overall platform vision of HiveMQ.

Requirements

Experience operating cloud products and services (SaaS, IaaS, or PaaS) at scale in a highly automated cloud environment
Proven experience building and operating production-quality applications in the cloud with cloud-native technologies such as Kubernetes, Docker, Terraform, Helm, CI/CD, and other IaC tools
The ability to methodically diagnose system, networking, and application issues while on call
Experience operating with at least one of the three major cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
Strong experience with metrics and monitoring solutions such as Grafana, Prometheus, Loki, Mimir, or similar
High standards for building platform and infrastructure setups: automation, modular and reusable infrastructure as code, GitOps, and test- and CI/CD-driven workflows
The ability to solve problems independently and a drive toward execution
A systematic but pragmatic approach, a strong sense of ownership, and pride in the work you accomplish as a team
A good understanding of how agile platform engineering with Kanban works in a self-organized team
Excellent English communication skills and the ability to work in a collaborative team environment

Responsibilities

Ensure the HiveMQ Cloud platform is always highly available, reliable, and scalable
Run AWS, GCP, and Azure global infrastructure with Helm, Terraform, Kubernetes, and other industry-standard tools
Employ modern software delivery methods such as infrastructure as code, distributed containerized service deployments, and self-healing, fully managed SaaS services to automate the deployment and maintenance of customer-facing products and internal systems
Plan, implement, and maintain infrastructure to meet current or estimated demand while ensuring efficient use of cloud resources and related costs
Work on application monitoring, infrastructure change management, platform incident management and response, and post-incident reviews
Help debug production issues across services and levels of the stack and improve our products and services
Operate tools that power our observability, monitoring, and on-call systems
Help define Service Level Objectives and the means to measure them, alert on them, and automate remediation

Benefits

Be on call
