Lead Site Reliability Engineer

Sprinto

πŸ“Remote - India

Summary

Join Sprinto as a Lead Site Reliability Engineer and take ownership of the observability pipeline, CI/CD pipeline development, and full infrastructure management. Ensure high availability, scalability, and reliable product delivery by collaborating with application engineers to develop necessary tooling for efficient operations. Establish and maintain on-call protocols and incident response processes. This role requires expertise in IaC tools, APM tools, application capacity planning, and incident response. Strong problem-solving and communication skills are essential. Familiarity with Sprinto's tech stack (Node.js, React, Apollo GraphQL, PostgreSQL, and AWS) is a plus. Sprinto offers a remote-first policy, flexible hours, group medical insurance, accident cover, a company-sponsored device, and education reimbursement.

Requirements

  • Proficiency with Infrastructure as Code (IaC) tools such as Terraform and Ansible
  • Skilled in using Application Performance Monitoring tools, setting up on-call practices, identifying bottlenecks across the stack, and collaborating with teams to address these issues effectively
  • Proven experience in application capacity planning, owning incident response workflows, conducting Root Cause Analyses (RCAs), and maintaining runbooks
  • Strong problem-solving abilities and excellent communication skills, both spoken and written

Responsibilities

  • Take ownership of the observability pipeline to ensure high availability and optimal performance of applications
  • Design, build, and maintain Continuous Integration/Continuous Deployment (CI/CD) pipelines to facilitate smooth and reliable product delivery
  • Own the complete infrastructure stack of the product, contributing to scalability and enhancements of the overall offering
  • Work closely with application engineers to develop and refine tooling necessary for efficient operations management
  • Establish and maintain on-call protocols and incident response processes to ensure timely resolution of issues and maintain service reliability

Preferred Qualifications

Familiarity with our current tech stack is a plus, as it will enable you to start contributing sooner. Our tech stack includes Node.js, React, Apollo GraphQL, PostgreSQL, and AWS.

Benefits

  • Remote First Policy
  • 5-Day Work Week With Flexible Hours
  • Group Medical Insurance (Parents, Spouse, Children)
  • Group Accident Cover
  • Company Sponsored Device
  • Education Reimbursement Policy
