Senior Site Reliability Engineer (SRE)

Gorilla Logic

πŸ“Remote - Colombia, Costa Rica

Summary

Join Gorilla Logic as a Senior Site Reliability Engineer (SRE) and lead the design, implementation, and governance of monitoring frameworks built on Dynatrace. You will enable product and operations teams with scalable observability tooling, enforce best practices, and drive the migration of more than 200 applications' observability stacks from on-premises environments to SaaS Dynatrace, using IaC tools such as Terraform. As the subject matter expert on monitoring strategy and execution, you will collaborate with internal teams, develop dashboards and alerting frameworks, ensure accurate SLOs and SLIs, and act as the monitoring liaison for assigned teams. The ideal candidate advocates for observability best practices and keeps monitoring aligned with organizational goals.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience
  • 5+ years of experience in site reliability engineering or observability-focused roles
  • Proven hands-on experience with Dynatrace administration, including user/role management, data sources, configuration, alerting, and dashboarding
  • Demonstrated expertise in application observability migration, especially from on-premises environments to SaaS platforms (preferably Dynatrace)
  • Expertise in observability concepts, including SLIs, SLOs, error budgets, and alert tuning
  • Proficiency with Terraform and other IaC tools for managing observability infrastructure
  • Experience managing large-scale observability environments, including configuration governance and multi-team enablement
  • Familiarity with Agile development methodologies and cross-functional collaboration
  • Strong communication and mentoring skills; ability to train engineers in observability principles
  • Detail-oriented with a proactive mindset for process improvement and monitoring automation

Responsibilities

  • Serve as the primary technical lead for the design and delivery of monitoring stacks tailored to team-specific requirements
  • Meet with Product Owners and Operations/Security teams to understand observability needs and translate them into reusable monitoring patterns
  • Develop initial dashboards and alerting frameworks, enabling teams to customize and maintain them going forward
  • Provide governance by ensuring engineering teams define and maintain accurate SLOs and SLIs
  • Act as the monitoring liaison for assigned teams, promoting a self-service observability culture
  • Lead the migration of over 200 applications' observability stacks from on-premises to SaaS Dynatrace, with a goal of completion within six months
  • Use Infrastructure as Code (IaC) tools like Terraform to automate deployment of dashboards, alerts, and metrics configurations
  • Import and manage Terraform state from existing Dynatrace on-premises setups and re-implement those setups in the SaaS environment
  • Advocate for observability best practices across teams and enforce consistency in implementation
  • Ensure alignment with organizational SLAs, incident response practices, and performance optimization goals

Preferred Qualifications

  • Experience with Kubernetes, Docker, or other container orchestration platforms
  • Familiarity with additional monitoring or logging tools such as Datadog, Sumo Logic, Prometheus, or Grafana
  • Experience with CI/CD platforms and deployment automation tools
  • Networking and security knowledge related to observability and telemetry
  • Experience in scripting languages such as Python, Bash, or Go
