Summary

Join RT², a company offering flexible, cutting-edge retail management solutions, as a Site Reliability Engineer. In this key role, you will maintain and improve the reliability of our systems by enhancing system stability, optimizing performance, and automating deployment processes. Experience with infrastructure tools such as Terraform, Bicep, and Ansible across both on-premises and cloud (Azure) environments is essential. This position offers the chance to make a significant impact in a dynamic, team-oriented environment, working with a range of technologies and methodologies to keep systems reliable and efficient. Competitive compensation and benefits are offered.

Requirements

Experience working with server operating systems such as Windows, Unix, and Linux
Experience with monitoring tools such as the ELK stack, Grafana, and Azure Application Insights
Experience with Git or other distributed source control systems
Bachelor’s degree (or equivalent) in computer science or related discipline
Experience with infrastructure tools such as Terraform, Bicep, and Ansible
Experience with both on-premises infrastructure and cloud providers, preferably Azure
Experience with Hyper-V and VMware
Experience with CI/CD pipelines such as Azure Pipelines, GitHub Actions, and Octopus Deploy
Experience with scripting languages such as PowerShell, Python, and Bash
Proactive approach to identifying problems, performance bottlenecks, and areas for improvement
Experience with observability and alerting tools such as Grafana, UptimeRobot, ELK, and PagerDuty
Experience working with Agile methodologies

Responsibilities

Help maintain and enhance production monitoring and notifications
Improve reliability and quality of production systems
Measure and help optimize system performance
Work with delivery and other teams to identify potential points of failure, then improve systems to mitigate them
Participate in capacity planning
Create automation to speed up deployment and testing and to improve response to operational issues
Work to meet service level objectives (SLOs)
Help build runbooks and other supporting tools to improve incident response
Monitor production systems and help manage incident response
Participate in post-mortems, documenting outages, recovery steps, and future mitigation strategies
Work on both on-premises (data center) and cloud-based infrastructure (Azure)

Benefits

Remote, flexible working options
Competitive compensation
Generous short-term incentive (STI) and long-term incentive (LTI) provisions
Health, Dental and Vision Insurance
Paid Annual Leave
Paid Sick Leave
401(k), and more

Site Reliability Engineer

RT2

Remote

DevOps

Mid-level