Observability Engineer

Brillio

💵 $135k-$145k
📍 Remote - United States

Summary

Join Brillio, a rapidly growing digital technology services provider, as an Observability Engineer. This remote position requires 7+ years of experience. You will design and develop robust observability solutions, implement monitoring and alerting systems, collaborate with software engineers, optimize tooling, create insightful reports, and resolve incidents. The ideal candidate possesses a deep understanding of observability principles and tools, proficiency in programming languages, strong cloud computing knowledge, and excellent problem-solving and communication skills. Brillio offers a competitive hourly rate of $65-$70.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
  • 2-3 years' experience as an Observability Engineer or a similar role in a production environment
  • Deep understanding of observability principles, methodologies, and tools such as Prometheus, Grafana, Jaeger, and the ELK stack
  • Proficiency in programming/scripting languages like Java, Python, Go, or similar for automation and tooling development
  • Strong knowledge of cloud computing platforms (AWS preferred) and container orchestration systems (e.g., Kubernetes)
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems
  • Strong communication skills and the ability to collaborate effectively with cross-functional teams

Responsibilities

  • Design and develop robust observability solutions to monitor, analyze, and troubleshoot distributed systems
  • Familiarity with OpenTelemetry (OTEL) standards and tools
  • Previous experience working with application teams to implement “self-healing”, i.e., alerting that triggers automated remediation
  • Implement and configure monitoring, logging, tracing, and alerting systems to ensure comprehensive coverage of our infrastructure and applications
  • Collaborate with software engineers to instrument code for telemetry data collection and analysis
  • Optimize observability tooling and processes to improve system reliability, performance, and scalability
  • Create dashboards, reports, and visualizations to provide actionable insights into system health and performance
  • Investigate and resolve incidents by analyzing telemetry data and identifying root causes
  • Stay current with industry trends and best practices in observability and recommend improvements to our observability strategy and infrastructure

Benefits

$65 - $70 an hour
