Remote Site Reliability Engineer at StarCompliance

Summary

The Site Reliability Engineer will maintain and improve the platform's reliability, availability, and performance, using Azure as the core cloud platform. Key responsibilities include analyzing reliability challenges, working with cross-functional teams, identifying and addressing toil, conducting post-mortems, and driving the reliability and supportability of cloud services.

Requirements

4+ years of experience in reliability engineering
2+ years of recent experience with Azure systems
Advanced knowledge of the New Relic ecosystem
Working knowledge of monitoring and APM tools such as Azure App Insights, Grafana, and Selenium
Knowledge of networking and of troubleshooting latency, connectivity, and performance issues
Experience with infrastructure as code (IaC) using Terraform and configuration as code (CaC) using Ansible
Familiarity with one or more databases: SQL Server, MongoDB, or PostgreSQL
Hands-on experience with SRE practices, including writing and running chaos engineering experiments
Proficient in Linux and Windows administration, troubleshooting, and support
Experience with Azure DevOps
Excellent debugging skills across a variety of integrated platforms

Responsibilities

Analyze reliability challenges and develop automated solutions for incident resolution
Work with development teams to improve the operational features of applications for faster MTTD and MTTR and for auto-recovery
Lead the establishment of SLIs, SLOs, error budgets, and policies, and work with the respective engineers to instrument and visualize services so that peer engineers and developers gain greater insight into operational performance (observability)
Identify, track, and address toil
Conduct post-mortems
Identify and implement continuous improvements across all facets of production operations
Offer advanced technical support for cross-product issues and incidents
Leverage SRE tooling to develop, implement, and deliver on the SRE mission
Conduct chaos testing
Identify, define, and implement new tools and technologies to improve the quality and efficiency of distributed platforms
Drive the reliability and supportability of cloud services, including change management, triage of customer escalations, remediation plans, playbooks, and automation
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity
Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement

Preferred Qualifications

Experience with C# and .NET, and with PowerShell, Python, or Golang
Experience with containerization
Experience with high availability and distributed systems

Benefits


StarCompliance is hiring a Site Reliability Engineer, Remote - United States

Site Reliability Engineer closed
