Site Reliability Engineer at Input Output

Summary

Join IOHK, a blockchain technology company, as a Site Reliability Engineer (SRE) and play a crucial role in ensuring the reliability and performance of our open-source production systems. You will design, develop, and maintain tools and software using Python, Bash, Terraform, or Nix to improve service availability and scalability. This role involves collaborating with development teams, analyzing system performance, and participating in on-call rotations. You will need proficiency in Python, Bash, Terraform, and Nix, along with extensive AWS experience and knowledge of Kubernetes and PostgreSQL. Excellent communication and troubleshooting skills are essential. IOHK offers remote work, laptop reimbursement, a new starter package, learning and development opportunities, and competitive PTO.

Requirements

Proficiency in Python, Bash, Terraform, and Nix for DevOps services
Extensive experience with AWS, specifically with services like EKS and RDS
Familiarity with container orchestration (e.g., Kubernetes) is essential
Hands-on experience with PostgreSQL and its deployment on RDS
Knowledge of monitoring tools (e.g., Prometheus, Grafana, Loki)
Solid troubleshooting and performance tuning capabilities
Exceptional communication skills and team collaboration ethic
Experience with CI/CD (e.g., GitHub Actions, Hydra, Earthly)
Strong analytical and troubleshooting skills
Excellent communication skills to collaborate with development teams, operations, and other stakeholders
Ability to quickly learn new technologies and adapt to changing environments
High attention to detail to ensure system reliability and performance

Responsibilities

Design, write, and deliver tools and software, primarily using Python, Bash, Terraform, or Nix, to improve the availability, scalability, and efficiency of our services
Engage in and refine the whole lifecycle of services, from inception and design, through deployment, operation, and continuous improvement
Practice sustainable incident response and promote blameless postmortems
Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind
Analyze system performance and reliability, offering recommendations for enhancement
Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services
Participate in on-call rotations, responding to and mitigating service interruptions and technical challenges

Benefits

Remote work
Laptop reimbursement
New starter package to buy hardware essentials (headphones, monitor, etc.)
Learning & Development opportunities
Competitive PTO

Remote

DevOps

Mid-level
