Australia
Head Of Site Reliability Engineering
closed
Swirlds Inc
Remote - Worldwide
Summary
Hashgraph is seeking a Head of Site Reliability Engineering (SRE) to lead the design, implementation, and management of a greenfield distributed infrastructure project. This is an exciting opportunity to build a system from the ground up, driving innovation and shaping the future of our platform.
Requirements
- 10+ years of experience in Site Reliability Engineering (SRE) or infrastructure engineering, with at least 5 years in leadership roles
- Proven experience in designing, deploying, and managing large-scale distributed systems, preferably in a cloud environment (AWS, GCP, Azure)
- Strong expertise in automation tools (Terraform, Ansible, etc.) and scripting languages (Python, Bash, etc.)
- Strong experience with containerization and orchestration technologies such as Docker and Kubernetes
- Deep understanding of network infrastructure, load balancing, firewalls, VPNs, and security best practices
- Proven track record of meeting or exceeding SLAs for system uptime and performance
- Experience building and leading teams across multiple regions and time zones
- Familiarity with managing infrastructure in a highly regulated or security-sensitive environment
- Strong understanding of CI/CD pipelines and incident management platforms (PagerDuty, Opsgenie)
- Strong understanding of the LGTM observability stack (Loki, Grafana, Tempo, Mimir)
- Excellent leadership, communication, and project management skills
Responsibilities
- Leading the design, deployment, and management of infrastructure, ensuring high availability, reliability, and scalability
- Building, mentoring, and leading a globally distributed SRE team across multiple time zones (APAC, LATAM, etc.) with a follow-the-sun on-call support model
- Developing and managing SLAs for availability, performance, and uptime while driving operational excellence and automation
- Creating and implementing strategies for continuous delivery, monitoring, and incident response to ensure minimal downtime and rapid recovery
- Partnering with engineering teams to design scalable and fault-tolerant architecture and processes
- Overseeing security best practices, including vulnerability management, monitoring, and compliance with industry standards
- Developing tools and processes for automation of infrastructure, monitoring, alerting, and incident management
- Managing budgets, vendors, and third-party tools related to infrastructure, ensuring cost-effectiveness and efficiency
- Ensuring comprehensive documentation and training for all infrastructure, deployment, and operational processes
Preferred Qualifications
- Experience in managing infrastructure for decentralized or highly distributed systems
- Familiarity with observability and tracing tools (e.g., OpenTelemetry, Jaeger)
- Knowledge of multi-cloud architectures and hybrid cloud setups
- Certifications in AWS, GCP, or Azure (Solutions Architect, SRE Engineer, etc.)
- Experience with security frameworks like SOC 2, ISO 27001, and best practices for compliance in regulated environments
- Experience working in agile or SRE-focused organizations with a focus on continuous improvement and operational excellence