StarTree is hiring a
Senior Staff - Site Reliability Engineer

closed
StarTree

💵 ~$150k-$222k
📍 Remote - India

Summary

StarTree is seeking a seasoned Site Reliability Engineer (SRE) to manage, tune, and debug large-scale distributed systems, focusing on Apache Pinot and SQL DBs. The role involves collaborating with customers, executing disaster recovery strategies, and influencing the roadmap of other teams.

Requirements

  • 12+ years of experience as an engineer (SRE, SDET, or development)
  • Experience managing highly available, production-facing distributed systems and in-depth knowledge of Java are a plus
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Experience with Kubernetes and container orchestration
  • Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
  • Knowledge of standard methodologies related to security, performance, and disaster recovery
  • Strong troubleshooting and critical thinking skills

Responsibilities

  • Leverage various monitoring and alerting services to solve intricate programming problems at scale
  • Manage and tune multiple critical customer-facing Apache Pinot clusters
  • Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
  • Build rapport and work closely with customers to mitigate and resolve incidents
  • Execute disaster recovery strategies with minimal downtime
  • Collaborate with other engineers to understand and troubleshoot systems, and use the experience gained to influence the roadmap of other teams
This job is filled or no longer available