Summary

Join IOG's Midnight Tribe as the Head of Site Reliability Engineering (SRE) and lead the infrastructure and reliability strategy for the Midnight Network, a blockchain platform focused on data protection. In this senior leadership role, you will own the reliability, scalability, and performance of the platform while building and leading a high-performing SRE team. You will be instrumental in laying the foundations of our infrastructure, designing globally scalable systems, and ensuring high availability within a blockchain architecture. This hands-on role demands technical depth, architectural vision, operational rigor, and strong people-leadership skills. You will drive initiatives to enhance service reliability, oversee the entire service lifecycle, and collaborate with engineering and testing teams to build robust production systems and ensure sustainable incident response.

Requirements

Bachelor's degree in Computer Science, Information Technology, or a related field
At least 8 years of experience in a reliability engineering, DevOps, or infrastructure-focused role
Proven track record of leading and managing a high-performing SRE team
Experience writing code in Python, Rust/C++, or JavaScript
Proven experience in build and release engineering, Linux operations, and automation
A systematic approach to problem-solving, coupled with effective communication skills and a strong sense of drive
You work well both independently and as part of a team
You are kind and respectful of others’ opinions, and you are open and act with integrity in academic or technical discussions
Proven experience in capacity planning, performance monitoring, and optimization to ensure systems can handle current and future loads efficiently
Systems engineering experience with application servers, web servers, and containers
Demonstrated ability to analyze incidents, identify root causes, and implement preventive measures to reduce the likelihood of recurring issues
Strong understanding of cloud architecture, including the major cloud providers (AWS, GCP, etc.)
Experience working with Docker containers and related orchestration technologies (such as Kubernetes or ECS)
Knowledge of SRE principles (observability, SLOs, SLIs, logging, etc.)
Understanding of the networking and security considerations involved in architecting our deployment environments
Fluency in Git-based workflows and good commit discipline
Experience in providing mentorship and coaching to team members

Responsibilities

Lead, coach, mentor, and develop the SRE team, sharing expertise and best practices
Demonstrate leadership in driving initiatives that enhance service reliability, scalability, and overall performance
Lead the entire lifecycle of services, including inception, design, deployment, operation, and refinement
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
Oversee the maintenance of live services by continuously measuring and monitoring factors like availability, latency, and overall system health
Assist our teams in creating software that is both simple and flexible to configure and deploy
Lead sustainable incident response practices, ensuring timely resolution with a focus on minimizing impact
Collaborate with software engineering and testing teams to establish and maintain infrastructure for automated regression suites and performance testing
Sustainably scale systems through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity
Conduct blameless postmortems to analyze incidents, identify root causes, and implement preventive measures

Benefits

Remote work
Laptop reimbursement
New starter package to buy hardware essentials (headphones, monitor, etc.)
Learning & Development opportunities
Competitive PTO

Site Reliability Engineering

Input Output

Remote

DevOps

Director