Senior Cloud Site Reliability Engineer at NICE

Summary

Join NiCE's expanding Site Reliability team and contribute to the automation, support, and maintenance of our applications. As a key member, you will lead investigations into outages, performance, and cost issues, spearhead automation initiatives, and provide technical leadership. Collaborate with DevOps and engineering teams to establish and enforce SLOs, SLAs, and error budgets. Develop and configure monitoring dashboards and alerts, optimize system performance, and ensure security. This role requires significant experience in Site Reliability Engineering, programming, and cloud technologies.

Requirements

Must have 6+ years of experience in Site Reliability Engineering
Excellent technical, analytical and troubleshooting skills
Experience with and in-depth knowledge of databases and data handling (MS-SQL, Elasticsearch, YAML, JSON, XML)
Significant experience in programming or advanced scripting (Python, PowerShell, C#, etc.)
Experience with infrastructure/configuration as code and version control (ARM, Bicep, Git)
Strong experience managing monitoring, alerting and dashboarding platforms (Azure Monitor, Prometheus, Grafana, Elasticsearch)
Demonstrable experience supporting live cloud services and platforms
Expert in developing queries for dashboards and alerting for microservices
Expertise in developing custom metrics for microservices
Experience collaborating with DevOps and engineering teams to establish and enforce SLOs, SLAs, and error budgets
Production experience with Kubernetes and containerization
Exposure to commercial cloud providers (ideally Azure; others considered)
Efficient, effective, and respectful communication skills both with customers and within internal departments
Good listener, able to identify and validate assumptions
Able to use effective questioning to confirm understanding of a customer problem and then provide help to solve it
Methodical troubleshooting, technical skill and attention to detail used in diagnosing problems and reproducing issues in a local environment
Multi-tasking and time-management to prioritise and switch between varied tasks

Responsibilities

Work as part of a team of SREs who act as the ‘gatekeepers’ of production, actively manage the work backlog, and develop reliability improvements
Lead root-cause investigations into outages, performance, and cost issues
Lead initiatives to automate low-value tasks, balanced against project delivery demands
Provide technical leadership to the wider Cloud Operations and Support teams, along with oversight of the products and services they support
Collaborate with DevOps and engineering teams to establish and enforce SLOs, SLAs, and error budgets
Develop and configure monitoring dashboards and alerts in tools like Grafana and Azure Monitor
Install and configure the observability platform, including tools like Grafana, Prometheus, Azure Monitor, and OpenTelemetry
Develop and deploy Bicep modules for monitoring infrastructure
Optimize system performance, cost, and security through regular reviews and tuning

Preferred Qualifications

Be flexible with working hours when needed to address critical or urgent matters
Be able to provide on-call services from time to time as needed
Exposure to Azure DevOps pipelines is desirable (CI/CD)
Exposure to test frameworks is desirable (NUnit, Jasmine, Selenium)
Strong experience in infrastructure as code, including design and implementation strategies

Benefits

Enjoy NiCE-FLEX!
At NiCE, we work according to the NiCE-FLEX hybrid model, which enables maximum flexibility: 2 days working from the office and 3 days of remote work each week.

Remote

DevOps

Senior
