Remote L3 Cloud DevOps Engineer/Site Reliability Engineer (SRE)

NTD Software

💵 $80k-$120k
📍 Remote - Worldwide

Job highlights

Summary

Join our team as an experienced L3 Cloud DevOps Engineer with a strong focus on Site Reliability Engineering (SRE). You will create and enhance monitoring and alerting tools using Grafana, Prometheus, and Datadog.

Requirements

  • Extensive hands-on experience with Python scripting
  • Strong expertise in Site Reliability Engineering (SRE) practices
  • Proficiency in Grafana, including dashboard creation and modification
  • In-depth knowledge of Prometheus and Datadog tools for monitoring and alerting
  • Experience with user and system monitoring, along with the ability to create and enhance dashboards and runbooks
  • DevOps experience is a secondary but desirable skill set
  • Relevant certifications or courses in Python, SRE, Grafana, and Prometheus are a plus

Responsibilities

  • Proactively build and enhance Grafana dashboards to improve monitoring capabilities
  • Collaborate with cross-functional teams to ensure effective monitoring and alerting
  • Manage and respond to alerts, remediating service issues promptly and implementing fixes for them
  • Conduct user and system monitoring to identify and address potential problems
  • Develop and maintain runbooks to support operational efficiency and incident response
  • Use Python scripting to automate and improve DevOps and SRE processes

Please let NTD Software know you found this job on JobsCollider. Thanks! 🙏