Senior Manager, Site Reliability Engineering

SEON

๐Ÿ“Remote - Hungary

Summary

Join SEON's Site Reliability Engineering (SRE) team as a highly experienced and motivated SRE Manager leading a team of Site Reliability Engineers. You will play a crucial role in keeping our products and services reliable and efficient while coordinating with cross-functional teams across multiple geographical regions. The role is flexible: it can be based in Budapest on a hybrid schedule or held remotely anywhere in the European Union, with occasional travel. You will lead and grow a high-performing SRE team, own incident management, drive the implementation of SLAs and SLOs, champion automation, collaborate with engineering teams, and oversee system monitoring. You will also manage on-call rotations, drive continuous improvement, ensure compliance, mentor team members, and communicate effectively with stakeholders.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
  • Proven success in leading high-performing SRE or DevOps teams in a large-scale, fast-paced environment
  • Extensive experience running high-availability web services at a large scale, with comprehensive knowledge of cloud-native architectures and advanced networking concepts
  • Strategic vision to balance immediate operational needs with long-term reliability and scalability objectives
  • Outstanding communication and interpersonal skills, with the ability to build strong relationships with team members and stakeholders
  • Strong technical background with hands-on experience in cloud computing, system architecture, automation, and monitoring
  • Excellent problem-solving skills with a focus on root cause analysis and proactive improvements
  • Exceptional organizational skills, with the ability to manage multiple priorities and projects simultaneously
  • Experience with tools and technologies such as AWS, Kubernetes, Terraform, Prometheus, Grafana, Jenkins, or similar

Responsibilities

  • Lead and grow a high-performing SRE team responsible for the reliability, performance, and scalability of production systems
  • Own the incident management process, postmortems, and root cause analysis to improve system resilience
  • Drive implementation of SLAs, SLOs, and error budgets across services to align operational goals with business objectives
  • Champion the use of automation to reduce manual work and improve deployment and recovery times
  • Collaborate with software engineering and platform engineering teams to ensure systems are designed for reliability and operational efficiency
  • Oversee system monitoring, alerting, and observability efforts using tools like Prometheus, Grafana, Datadog, or similar
  • Manage on-call rotations and ensure that documentation, runbooks, and playbooks are properly maintained
  • Identify and drive continuous improvement in system architecture, capacity planning, and deployment strategies
  • Ensure compliance with security, privacy, and regulatory requirements within the infrastructure
  • Provide mentorship, performance reviews, and career development opportunities for SRE team members
  • Communicate effectively with stakeholders at all levels, providing updates on team performance, project status, and incident resolutions
  • Advocate for the SRE team within the broader organization, representing its needs and concerns

Preferred Qualifications

  • Cloud architect certification from one of the major public cloud providers (AWS, GCP, or Azure)
  • Good knowledge of the security controls required for SOC 2 and ISO certifications

Benefits

This role offers flexibility: it can be based in Budapest on a hybrid schedule or anywhere in the European Union with a remote setup, including occasional travel to our other offices.