Remote Senior Cloud SRE

NICE

📍 Remote - India

Job highlights

Summary

Join an ever-growing, market-disrupting, global company where the teams, comprised of the best of the best, work in a fast-paced, collaborative, and creative environment! As the market leader, every day at NICE is a chance to learn and grow, and there are endless internal career opportunities across multiple roles, disciplines, domains, and locations. If you are passionate, innovative, and excited to constantly raise the bar, you may just be our next NICEr!

Requirements

  • Experience with using monitoring tools in a production environment
  • 5+ years of production cloud operations experience
  • 5+ years of expertise in the Linux command line
  • 5+ years of using Terraform in AWS for automation. Hands-on with automation and actively seeking out opportunities to automate manual processes
  • 5+ years of strong, hands-on experience building production services in AWS
  • 4+ years of experience with scripting using Python and Bash
  • Ability to participate in on-call rotation
  • Considerable knowledge of IT equipment and diagnostic tools
  • Considerable knowledge of principles and techniques of systems analysis, design, development and programming
  • Considerable knowledge of principles of information systems
  • Considerable knowledge of capabilities of computer technology
  • Knowledge of methods and procedures used to conduct detailed analysis and design of computer systems
  • Knowledge of practices and issues of systems' security and disaster recovery
  • Knowledge of computer operating systems

Responsibilities

  • Ability to design, implement and improve monitoring built on Grafana, Prometheus, Loki, Promtail and node exporter
  • Log parsing and management
  • Configuration of alerting, push notifications to VictorOps (now Splunk On-Call) and email notifications
  • Architect, design and implement Icinga 2 monitoring and alerting
  • Ability to monitor system metrics and parse logs
  • Ability to automate tasks using bash and / or Python scripting
  • Predictive monitoring of systems and applications
  • Familiarity with JVM internals and the use of JMX and REST for monitoring
  • Familiarity with AWS infrastructure
  • Deep understanding of Java applications, TLS, Apache
  • Automated checks of performance of system metrics in Grafana
  • Automated checks of performance of Web Applications
  • Problem-solving and troubleshooting, including performing root cause analysis to design preventative activities
  • Crafting and maintaining dashboards and reports, pulling together monitoring data across multiple platforms within the same tool as well as across multiple tools
  • Assisting with writing scripts and queries that can provide environment self-healing capabilities
