Remote Senior Site Reliability Engineer at Planet DDS

Summary

Join Planet DDS as a Site Reliability Engineer (SRE) to develop and implement strategies for observability and reliability in Microsoft Azure. The ideal candidate will have 6+ years of experience with cloud services, strong collaboration and communication skills, and a passion for solving problems with technology.

Requirements

6+ years of experience operating and troubleshooting Azure App Services, Azure Functions, Azure Logic Apps, Azure SQL, Azure Storage, Application Insights, Azure Redis, VNets, and Azure Application Gateway
6+ years of experience applying reliability concepts to ensure high performance and high service availability, with the ability to define, implement, and improve business performance SLOs
6+ years of experience with observability across multiple domains (APM, infrastructure, synthetics, logs, etc.) in cloud and on-premises environments using Datadog, Azure Monitor, and Application Insights. New Relic and Grafana are nice to have
6+ years of experience with production operations, including 24x7 on-call support, escalation/paging with OpsGenie, incident management, RCA (Root Cause Analysis), and retrospective analysis
2+ years of experience leading an SRE team
Experience with infrastructure management across multiple cloud and on-premises environments using tools such as Terraform, Bicep, PowerShell, and Ansible
Security is part of everything we do, so the role requires knowledge of fundamental cloud security (e.g., identity and access management, firewalls)
Strong collaboration and communication skills in a hybrid environment using Microsoft Teams, email, and calendar
Bachelor’s degree in a relevant major, or equivalent years of experience

Responsibilities

Develop architecture, strategy, and implementations to enable or enhance the Observability and Reliability of applications and services running on IaaS and PaaS in Microsoft Azure
Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) focused on improving business workflow performance and availability
Build technical and business dashboards, metrics, and actionable alerting
Create processes and automation that increase uptime and availability, reduce toil, and improve all phases of incident and problem management
24x7 support: perform deep dives into systemic and latent reliability issues, and handle incident and problem management
Participate in all aspects of incident management, including awareness, communication, remediation, and retrospective/root cause analysis
Identify and implement process improvements that reduce MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve)
Support operations and engineering teams on Azure. Experience with AWS and GCP is nice to have
Support applications written in .NET, .NET Core, MVC, and JavaScript
Train and mentor peers and less experienced engineers
Work in production environments with on-call rotations
Advocacy: Train and mentor engineering teams on modern observability practices and techniques
Define and socialize SRE culture, best practices, and architectural and security standards
Assess and raise risks across the organization
Partnership: work with internal engineering, architecture, and operations teams to ensure alignment, and with external teams to support their work and ensure compliance with our standards
Optimize and manage: multi-product observability platforms supporting cloud and on-premises infrastructure, services, and applications, including observability cost optimization
Measuring and monitoring availability, latency, and overall system health across multiple product lines
