Remote Site Reliability Engineer Technical Lead at Nethermind

Summary

Join a team of builders and researchers on a mission to empower enterprises and developers worldwide to access and build on decentralized systems. We're seeking an experienced Site Reliability Engineer to lead and mentor our SRE team.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps
Expert knowledge of cloud platforms (AWS, GCP)
Expert knowledge of Kubernetes
Proven experience in designing and implementing scalable, efficient, resilient systems
Deep understanding of Linux/Unix systems and networking protocols
Strong programming skills in Python or Go
Strong background in monitoring, observability, and logging systems (e.g., Grafana, Prometheus, Loki)
Expertise in CI/CD tools (e.g., GitHub Actions, ArgoCD)
Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts to various audiences
Experience in producing technical documentation, runbooks, presentations, and post-mortem reports
Experience in, and passion for, mentoring and upskilling team members

Responsibilities

Lead the implementation and refinement of SRE practices across the organization, including SLOs, error budgets, and blameless postmortems
Design and implement automation to eliminate toil and improve system reliability and efficiency
Lead initiatives and architect scalable hybrid cloud solutions for Web3 infrastructure
Manage error budgets and make data-driven decisions about when to prioritize reliability vs. new features
Drive SRE practices to ensure high availability, performance, and reliability under varying load conditions
Collaborate closely with the Platform Engineering team to build reliability into services from the ground up
Collaborate closely with Nethermind’s Infrastructure Leadership department to align SRE strategies with overall technical vision
Drive the adoption of observability best practices and implement comprehensive monitoring systems
Develop and maintain service level indicators (SLIs) and service level objectives (SLOs), working with product owners to define appropriate reliability targets
Mentor team members in SRE practices and foster a culture of continuous learning
Lead capacity planning efforts, using quantitative analysis to predict and address future scaling challenges
Contribute to long-term technical roadmaps, balancing reliability concerns with product innovation

Preferred Qualifications

Experience leading technical teams
Contributions to open-source projects or thought leadership in SRE
Familiarity with MLOps and big data technologies
Knowledge of blockchain technology and infrastructure
Experience with chaos engineering principles and tools
Familiarity with traffic management and CDN technologies
Systems or backend engineering background

Remote

DevOps

Senior
