Senior Site Reliability Engineer

AlphaSense

πŸ“Remote - India

Summary

Join AlphaSense as a Senior SRE to tackle complex reliability challenges at scale. You will support developers and the AlphaSense platform by building platform capabilities and tooling, and by proposing and experimenting with solutions. The role involves adapting open-source solutions to meet our needs and shaping the SRE culture. You will elevate product reliability, collaborate with engineering teams, participate in on-call rotations, and diagnose and resolve production issues. This remote position, based in India, requires covering 8 hours of the US working day, with some flexibility on start time. The team is composed of talented individuals across Product, User Experience & Engineering.

Requirements

  • Strong experience in Linux, Kubernetes, Helm
  • Knowledge of cloud-native architectures and modern web application behavior
  • Experience with Python (or a similar language) and AWS or GCP
  • Be adept at diagnosing complex issues and bottlenecks in operating systems, networks, and distributed systems
  • Understand complex, layered tech stacks from infrastructure to frontend. Identify failure points and implement preventative measures to protect software and systems
  • Thrive under pressure, demonstrating strong ownership and accountability
  • Quickly synthesize complex information for effective decision-making
  • Maintain composure and clear judgment under stress: ready to act decisively when needed, while avoiding overreaction
  • Exhibit excellent communication skills, fostering effective collaboration and problem-solving during incident response and day-to-day operations
  • Have experience in system monitoring and an understanding of SLO, MTTI, MTTR, and MTTF
  • Actively monitor projects driven by other teams to uncover dependencies and understand their impact
  • Have the ability to decompose reliability problems or business scenarios into multi-component solutions

Responsibilities

  • Support developers and the AlphaSense platform, working on platforming and tooling, proposing and experimenting with solutions
  • Adapt open-source solutions to meet our needs and shape the SRE culture
  • Elevate product reliability to the level of precision associated with Swiss watch brands, targeting 99.99% uptime
  • Engage with our engineering teams and contribute to the improvement of their software applications through first-class observability
  • Participate in an on-call rotation, promptly addressing AlphaSense availability incidents and supporting application engineers during incidents
  • Diagnose and resolve production issues spanning multiple services and technology stacks
  • Analyze complex systems to pinpoint and address root causes of problems efficiently
  • Assist with daily production system operations, including incident response, troubleshooting, and maintenance
  • Help ensure smooth system operation, contribute to automation, and address operational challenges

Preferred Qualifications

  • Experience working with on-call Incident Response solutions (PagerDuty, FireHydrant)
  • Experience with modern monitoring systems (such as the Grafana LGTM stack)
  • Experience running systems at scale

Benefits

  • Remote work from India
  • The role requires covering a full 8 hours of the United States working day (7 PM - 3 AM IST), with flexibility to start 1-2 hours later
