Site Reliability Engineer Lead

Input Output
Summary
Join IOG, a blockchain technology company, as a Site Reliability Engineer Lead. You will lead a team, ensuring high-quality, stable environments for customers. Responsibilities include managing build and deployment cycles, supporting multi-tier applications, building automation tools, and improving monitoring systems. You will collaborate with agile teams and foster a DevOps culture. The role requires a Bachelor's degree or equivalent experience, 5+ years in SRE/DevOps, and 2+ years in a leadership role. Strong Linux, networking, and programming skills are essential. IOG offers remote work, laptop reimbursement, learning opportunities, competitive PTO, medical/dental/vision benefits, 401k, and a health savings account.
Requirements
- Bachelor’s Degree or higher in Computer Science, Software Engineering, or related technical field, or equivalent practical experience
- 5+ years of professional experience in SRE, DevOps, Platform Engineering, or Infrastructure roles
- 2+ years in a technical leadership or senior engineering capacity
- Proven track record of building and operating highly available, distributed, fault-tolerant systems
- Strong foundation in Linux system internals, networking (TCP/IP, DNS, HTTP), and systems programming
- Experience leading incident response, writing post-mortems, and driving reliability improvements
- Experience working with Agile, Kanban, or similar development methodologies
- You work well both independently and as part of a team
- You value cooperation and collaboration above all, and are not afraid to ask for clarification or help when needed
- You are kind and respectful of others’ opinions, and you are open and act with integrity when engaging in academic or technical discussions
- Strong scripting and programming skills: Bash, Python, Go, or Rust preferred
- Extensive experience with Git: branching strategies, GitOps workflows, code review best practices
- Experience with CI/CD systems, such as GitHub Actions, GitLab CI, Jenkins, Buildkite, or equivalent
- Cloud platform proficiency: AWS, GCP, or Azure, including compute, storage, networking, and IAM
- Containerization and orchestration: deep experience with Docker, Kubernetes (k8s), and Helm
- Infrastructure as Code (IaC): Terraform, Pulumi, or similar tools
- Configuration management: Ansible, Chef, or SaltStack (with preference for declarative approaches)
- Monitoring, logging, and observability: Prometheus, Grafana, Loki, OpenTelemetry, Datadog, or similar
- Security best practices: secrets management (Vault, SOPS), least privilege, security incident handling
- Incident Management and Root Cause Analysis (RCA): strong ownership in production reliability
- Automated testing and validation: unit testing, integration testing, chaos engineering exposure
- Experience managing large-scale Linux-based systems: operational excellence in Ubuntu, Debian, or NixOS environments
- Advocate of DevOps/SRE culture: focus on reducing toil, Service Level Objectives (SLOs), error budgets
- Strong communication skills: written and verbal, capable of collaborating across distributed teams
Responsibilities
- Managing build and deployment cycles across all development environments
- Supporting the build, deployment, and configuration management for multi-tier applications
- Participating in the building of tools and processes to support the infrastructure
- Improving and maintaining tooling and scripts for automation purposes
- Developing tooling for internal and external users to monitor and maintain production systems
- Supporting our teams to write software that is simple and flexible to configure and deploy
- Collaborating with agile teams to establish and maintain infrastructure for automated regression suites and performance testing
- Building capabilities to allow development teams to be self-sufficient
- Motivating, developing, and progressing your fellow team members
- Communicating openly with all members of your team, addressing issues head-on, and not shying away from difficult conversations
- Making it your top priority to empower your team to deliver the best results by organizing clear processes and coordinating team efforts
Preferred Qualifications
- Demonstrated experience contributing to open-source projects is highly desirable
Benefits
- Remote work
- Laptop reimbursement
- New starter package to buy hardware essentials (headphones, monitor, etc.)
- Learning & Development opportunities
- Competitive PTO and Sick Leave plan
- Medical, Dental, and Vision benefits coverage for the employee and dependents
- 401k
- Health Savings Account
- Life Insurance