Site Reliability Engineer at RT2

Summary

Join RT² as a Site Reliability Engineer and play a key role in maintaining and improving the reliability of our systems. You will enhance system stability, optimize performance, and automate deployment processes. This position requires experience with infrastructure tools such as Terraform, Bicep, and Ansible across on-premises and cloud (Azure) environments. You will work with various teams to identify and mitigate potential system failures, participate in capacity planning, and create automation to improve deployment speed and incident response. The ideal candidate is a proactive problem-solver who is passionate about infrastructure and continuous improvement. This is an exciting opportunity to make a meaningful impact at a rapidly growing company.

Requirements

Experience with server operating systems such as Windows, Unix, and Linux
Experience with monitoring tools such as the ELK stack, Grafana, and Azure Application Insights
Experience with Git or other distributed source control systems
Bachelor’s degree (or equivalent) in computer science or a related discipline
Experience with infrastructure tools such as Terraform, Bicep, and Ansible
Experience with both on-premises and cloud environments, preferably Azure
Experience with Hyper-V and VMware
Experience with CI/CD pipelines such as Azure Pipelines, GitHub Actions, and Octopus Deploy
Experience with scripting languages such as PowerShell, Python, and Bash
Proactive approach to identifying problems, performance bottlenecks, and areas for improvement
Experience with observability tools such as Grafana, UptimeRobot, ELK, and PagerDuty
Experience working with Agile methodologies

Responsibilities

Help maintain and enhance production monitoring and notifications
Improve reliability and quality of production systems
Measure and help optimize system performance
Work with delivery and other teams to identify potential points of failure, then help improve systems to mitigate them
Participate in capacity planning
Create automation to improve deployment speed, testing, and response to operational issues
Work to meet service level objectives
Help build runbooks and other supporting tools to improve incident response
Monitor production systems and help manage incident response
Participate in postmortems and document outages, recovery steps, and future mitigation strategies
Work on both on-premises (data center) and cloud-based infrastructure (Azure)

Benefits

Remote, flexible working options
Competitive compensation
Generous short-term and long-term incentive (STI and LTI) provisions
Health, Dental and Vision Insurance
Paid Annual Leave
Paid Sick Leave
401(k), and more

Remote

DevOps

Mid-level
