Senior Site Reliability Engineer at Nextiva

Summary

Join Nextiva as a Senior Site Reliability Engineer (SRE) in Bangalore and redefine customer experiences. You will support and scale Kafka and Elasticsearch infrastructure, core systems powering our SaaS platform. This role demands automation expertise, AI-driven observability, and quick adoption of new technologies. You will proactively build resilient systems, own systems end-to-end, and write clean automation within a fast-paced, innovative team. You will also mentor junior engineers and lead large-scale reliability projects. The position requires a Bachelor's degree, 6+ years of relevant experience, and strong Linux and cloud platform skills.

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
Fluent English communication skills (spoken and written)
6+ years of experience in software development, automation, or infrastructure engineering
Deep experience with Kafka and/or Elasticsearch in production environments
Strong Linux systems expertise and 6+ years managing Linux-based environments
Hands-on experience with cloud platforms - GCP and/or AWS required
Proficient in scripting languages such as Python and Bash
Automation-first mindset - deep experience with Ansible, Terraform, Jenkins
Expert-level understanding of Git and GitHub workflows for CI/CD and infrastructure-as-code
Proficient with container tools (Docker) and orchestrators (Kubernetes)
Strong understanding of SRE principles - SLAs/SLOs, alerting, observability, and incident management
Experience with SQL, caching systems (e.g., Redis), and troubleshooting distributed systems
Quick learner with a strong curiosity for new tools, frameworks, and AI/ML use cases in operations

Responsibilities

Triage, troubleshoot, and resolve complex production issues involving Kafka and Elasticsearch
Design and build automated monitoring, alerting, and logging systems - leveraging AI/ML techniques where possible
Write tools and infrastructure software to support self-healing, auto-scaling, and incident prevention
Automate system administration tasks - from patching and upgrades to config and deployment workflows
Use and manage GitHub extensively for infrastructure-as-code, release management, and collaboration
Partner with development, QA, and performance teams to ensure middleware systems are production-ready
Participate in the on-call rotation and continuously improve incident response and resolution playbooks
Mentor junior engineers and contribute to a culture of automation, learning, and accountability
Lead large-scale reliability and observability projects in collaboration with global teams

Preferred Qualifications

Observability Tools: Datadog, Splunk, Kibana, Opsgenie
Programming: Java/Spring, JavaScript/React
Middleware: RabbitMQ, Tomcat
Experience with AI/ML-based anomaly detection, AIOps platforms, and LLM integrations for infrastructure
Azure cloud experience (nice to have)

Benefits

Health 🍏 - Comprehensive medical coverage, including dental care
Insurance 💼 - Life and disability insurance
Work-Life Balance ⚖️ - PTO and paid sick time as per CBA, paid parental leave
Financial Security 💰 - Private pension plan available
Wellness 🤸 - Employee Assistance Program and comprehensive wellness initiatives
Growth 🌱 - Access to ongoing learning and development opportunities and career advancement


Remote

DevOps

Senior
