Summary
Join Grafana Labs' Platform Monitoring Squad as a remote Software Engineer in Canada (excluding Quebec residents). This role focuses on managing cloud resources and observability, including managing costs, creating dashboards, and improving system reliability. You will work with technologies like Prometheus, Alertmanager, and Crossplane, contributing to critical dashboards and tooling for Grafana, Mimir, Loki, and Tempo. Responsibilities include designing and implementing solutions for cloud resource observability, cost alerts, and autoscaling. The ideal candidate has experience with platform engineering and observability systems, and with supporting both internal and external users.
Requirements
- Experience working in a Platform group delivering services to internal users and customers
- Experience with, or interest in, implementing, integrating, and maintaining observability systems and processes
Responsibilities
- Design observability (o11y) for cloud service provider (CSP) resources
- Create cost alerts
- Help improve cloud-cost margins
- Improve the reliability of autoscaling tools
- Investigate unallocated CSP resources
- Manage business-critical dashboards
Preferred Qualifications
- Experience with Terraform
- CLI tooling experience
- Familiarity with Kubernetes administration
- Worked on the correlation of application performance to an application technology stack
- Microservices
- Telemetry querying
- Visualization and reporting
- OpenTelemetry
- Experience with Go would be ideal
Benefits
- Equity
- Bonus (if applicable)