Senior Site Reliability Engineer

IntelliPro

💵 $107k-$180k
📍 Remote - United States

Summary

Join a high-impact infrastructure team at a fast-growing global technology leader as a Senior Site Reliability Engineer. This role focuses on scaling reliable, high-performance systems in a cloud-native environment, working on large-scale, mission-critical applications used by millions. You will ensure 24/7 uptime, operate and maintain core systems, architect monitoring solutions, collaborate with engineering teams, develop automation tools, and troubleshoot infrastructure bottlenecks. The ideal candidate will have a Bachelor's degree in a related field, 5+ years of relevant experience, and deep expertise in Linux, distributed systems, and cloud architecture. This position offers a hybrid or remote work setup (in select states) and a competitive compensation and benefits package.

Requirements

  • Bachelor’s degree in Computer Science, Information Systems, or a related technical field
  • 5+ years of experience supporting mission-critical, real-time, high-traffic systems in a cloud-based or hybrid production environment
  • Deep expertise in Linux, distributed systems, cloud architecture, and containerized workloads (Docker, Kubernetes, etc.)
  • Skilled in system-level debugging and end-to-end performance optimization
  • Strong programming/scripting ability in Python, Go, or similar languages
  • Experience managing open-source (OSS) components such as Kafka, Elasticsearch, and Redis
  • Proven ability to reduce incident rates and drive down MTTR through process improvements and tooling
  • Excellent communication skills and experience working across distributed teams

Responsibilities

  • Ensure 24/7 uptime by participating in a rotating on-call schedule and managing production incidents across distributed environments
  • Operate and maintain core systems such as Elasticsearch, Kafka, RabbitMQ, and Redis, with a focus on reliability and performance
  • Architect monitoring solutions, define SLOs/SLIs, and implement scalable observability tools (e.g., Grafana, Prometheus, Zabbix)
  • Collaborate with engineering teams to optimize capacity, auto-scaling, and system utilization
  • Develop and maintain automation tools and workflows to support a culture of minimal manual intervention
  • Troubleshoot infrastructure bottlenecks and improve full-stack performance across services
  • Own the design and execution of new infrastructure patterns to support continued scale and speed
  • Maintain clear technical documentation including runbooks, incident response procedures, and architectural diagrams

Preferred Qualifications

  • Experience with big data infrastructure (e.g., Hadoop, Spark, Hive, HBase)
  • Background in data infrastructure, DBRE, or DBA responsibilities at scale
  • Familiarity with service mesh technologies and zero-trust architectures

Benefits

  • Full medical, dental, and vision insurance
  • HSA with company contributions + FSA options
  • 401(k) plan with discretionary company match and financial advising
  • Company-paid life, AD&D, short-term & long-term disability insurance
  • Paid holidays, generous PTO, and floating days
  • Employee discounts and perks
  • Weekly catered lunches, stocked snacks, and beverages
  • Gym access & dog-friendly office (select locations)
  • Swag, holiday parties, and internal community events
  • Base Salary: $107,600 – $180,200/year
  • Additional compensation: annual bonus and equity (RSUs)
