Senior Site Reliability Engineer

Owner.com

💵 $170k-$210k
📍 Remote - Worldwide

Summary

Join Owner.com, a rapidly growing restaurant-commerce platform, as a Senior SRE/DevOps Engineer. You will play a crucial role in ensuring the reliability and scalability of our systems, working on site-reliability engineering and DevOps enablement. This position involves designing for uptime, performance, and resiliency, as well as building tools, CI/CD pipelines, and automation. You will collaborate with various engineering teams and contribute to incident response and post-mortems. Your work will directly impact thousands of restaurants and millions of diners daily. The role offers a competitive salary, comprehensive benefits, and a remote-first work environment.

Requirements

  • 5+ years running production workloads on AWS (or GCP/Azure) with infrastructure-as-code (Terraform/CDK/CloudFormation)
  • Hands-on experience operating container orchestration (ECS, EKS, Kubernetes, Nomad, etc.) and designing blue/green or canary rollouts
  • Depth in at least two of our core datastores (Postgres, MongoDB, Kafka) including backup/restore, upgrades, and performance tuning
  • Fluency with CI/CD pipelines (we use Buildkite + GitHub Actions) and a knack for automating everything with shell, Python, or TypeScript
  • Proven track record setting up monitoring/alerting in Datadog, Prometheus, or similar, with clear SLO/SLA ownership
  • Strong grasp of Linux networking, load balancing (Cloudflare/ELB), and CDN/edge-security concepts
  • Excellent incident-management and root-cause analysis skills; able to write crisp RCAs and follow through on action items
  • Passion for customer-centric thinking, rapid iteration, and continuous learning

Responsibilities

  • Design for reliability: Set SLOs/SLIs, build self-healing architectures, and drive incident-prevention projects that keep p95 latency for our APIs and real-time ordering flows under 100 ms
  • Own observability: Level up dashboards, alerts, and distributed tracing so teams can detect issues before customers do
  • Automate deployments: Evolve our Buildkite pipelines and Terraform modules to give engineers <10-minute, one-click rollouts (and clean rollbacks)
  • Champion security & compliance: Harden infra with least-privilege IAM, threat-model topology changes, and guide SOC 2 / PCI efforts
  • Partition & scale datastores: Tune Postgres for multi-TB workloads, maintain Mongo sharding, and shepherd Kafka topic management as event volume climbs
  • Lead incident response: Rotate with the on-call SREs, run blameless post-mortems, and convert findings into durable fixes
  • Mentor & collaborate: Pair with product engineers on capacity reviews, guide junior devs on Docker best practices, and evangelize “you build it, you run it.”

Preferred Qualifications

  • Experience with NestJS or other Node.js backends at scale
  • Prior work in PCI-DSS or SOC 2 environments
  • Familiarity with GitOps workflows (Argo CD, Flux)
  • Exposure to mobile CI (React Native pipelines), LaunchDarkly/feature flags, or chaos engineering

Benefits

  • The estimated base salary range for this role is $170K - $210K, plus a generous pre-IPO equity package
  • 100% remote across the U.S. or Canada (option to drop into our SF office)
  • Comprehensive health, dental, and vision coverage
  • Home-office stipend, top-tier laptop, and any tools you need to excel
  • Twice-annual team off-sites
