Summary

Join Gorilla Logic as a Senior Site Reliability Engineer (SRE) and lead the design, implementation, and governance of monitoring frameworks using Dynatrace. You will enable product and operations teams through scalable observability tooling, enforce best practices, and lead the migration of more than 200 applications' observability stacks from on-premises environments to SaaS Dynatrace, using IaC tools like Terraform. The role requires close collaboration with internal teams, both as a subject matter expert in monitoring strategy and execution and as the monitoring liaison for assigned teams. Responsibilities include developing dashboards and alerting frameworks and ensuring accurate SLOs and SLIs. The ideal candidate will advocate for observability best practices and ensure alignment with organizational goals.

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent experience
5+ years of experience in site reliability engineering or observability-focused roles
Proven hands-on experience with Dynatrace administration, including user/role management, data sources, configuration, alerting, and dashboarding
Demonstrated expertise in application observability migration, especially from on-premises environments to SaaS platforms (preferably Dynatrace)
Expertise in observability concepts, including SLIs, SLOs, error budgets, and alert tuning
Proficiency with Terraform and other IaC tools for managing observability infrastructure
Experience managing large-scale observability environments, including configuration governance and multi-team enablement
Familiarity with Agile development methodologies and cross-functional collaboration
Strong communication and mentoring skills; ability to train engineers in observability principles
Detail-oriented with a proactive mindset for process improvement and monitoring automation

Responsibilities

Serve as the primary technical lead for the design and delivery of monitoring stacks tailored to team-specific requirements
Meet with Product Owners and Operations/Security teams to understand observability needs and translate them into reusable monitoring patterns
Develop initial dashboards and alerting frameworks, enabling teams to customize and maintain them going forward
Provide governance by ensuring engineering teams define and maintain accurate SLOs and SLIs
Act as the monitoring liaison for assigned teams, promoting a self-service observability culture
Lead the migration of over 200 applications' observability stacks from on-premises to SaaS Dynatrace, with a goal of completion within six months
Use Infrastructure as Code (IaC) tools like Terraform to automate deployment of dashboards, alerts, and metrics configurations
Import and manage Terraform state from existing on-premises Dynatrace setups and re-implement them in the SaaS environment
Advocate for observability best practices across teams and enforce consistency in implementation
Ensure alignment with organizational SLAs, incident response practices, and performance optimization goals

Preferred Qualifications

Experience with Kubernetes, Docker, or other container orchestration platforms
Familiarity with additional monitoring or logging tools like Datadog, Sumo Logic, Prometheus, or Grafana
Experience with CI/CD platforms and deployment automation tools
Networking and security knowledge related to observability and telemetry
Experience in scripting languages such as Python, Bash, or Go

Senior Site Reliability Engineer (SRE)

Gorilla Logic

Remote

DevOps

Senior
