HiveMQ is hiring a
Senior Site Reliability Engineer


HiveMQ

💵 ~$150k-$222k
📍 Remote - Germany

Summary

HiveMQ is seeking an experienced Site Reliability Engineer for its Cloud Operations team. The role involves ensuring the availability, reliability, and scalability of the HiveMQ Cloud platform, managing cloud infrastructure with various tools, and contributing to HiveMQ's overall platform vision.

Requirements

  • Experience operating cloud (SaaS, IaaS, or PaaS) products and services at scale in a highly automated cloud environment
  • Proven experience building and operating production-quality applications in the cloud with cloud-native technologies such as Kubernetes, Docker, Terraform, Helm, CI/CD, and other IaC tools
  • The ability to methodically diagnose systems, networking and application issues in on-call operation
  • Experience operating with at least one of the three major cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • Strong experience with metrics and monitoring solutions such as Grafana, Prometheus, Loki, Mimir, or similar
  • High standards for building platform and infrastructure setups: automated, composed of modular, reusable infrastructure as code, and driven by GitOps, testing, and CI/CD
  • The ability to solve problems independently and a drive towards execution
  • A systematic but pragmatic approach, paired with a high sense of ownership and pride in the work you accomplish as a team
  • A good understanding of how agile platform engineering with Kanban works in a self-organized team
  • Excellent English communication skills and the ability to work in a collaborative team environment

Responsibilities

  • Ensure the HiveMQ Cloud platform is always highly available, reliable, and scalable
  • Run AWS, GCP, and Azure global infrastructure with Helm, Terraform, Kubernetes, and other industry-standard tools
  • Employ modern software delivery methods such as infrastructure as code, distributed containerized service deployments, and self-healing, fully managed SaaS services to automate the deployment and maintenance of customer-facing products and internal systems
  • Plan, implement, and maintain infrastructure to meet current or estimated demand while ensuring efficient use of cloud resources and related costs
  • Work on application monitoring, infrastructure change management, platform incident management, response, and post-incident reviews
  • Help debug production issues across services and levels of the stack and improve our products and services
  • Operate tools that power our observability, monitoring, and on-call systems
  • Help define Service Level Objectives and the means to measure them, alert on them, and automate remediation

Benefits

Be on call

This job is filled or no longer available
