Summary

Join a high-impact infrastructure team at a fast-growing global technology leader as a Senior Site Reliability Engineer. This role focuses on scaling reliable, high-performance systems in a cloud-native environment, working on large-scale, mission-critical applications used by millions. You will ensure 24/7 uptime, operate and maintain core systems, architect monitoring solutions, collaborate with engineering teams, develop automation tools, and troubleshoot infrastructure bottlenecks. The ideal candidate will have a Bachelor's degree in a related field, 5+ years of relevant experience, and deep expertise in Linux, distributed systems, and cloud architecture. This position offers a hybrid or remote work setup (in select states) and a competitive compensation and benefits package.

Requirements

Bachelor’s degree in Computer Science, Information Systems, or a related technical field
5+ years of experience supporting mission-critical, real-time, high-traffic systems in a cloud-based or hybrid production environment
Deep expertise in Linux, distributed systems, cloud architecture, and containerized workloads (Docker, Kubernetes, etc.)
Skilled in system-level debugging and end-to-end performance optimization
Strong programming/scripting ability in Python, Go, or similar
Experience managing OSS components such as Kafka, Elasticsearch, Redis, and more
Proven ability to reduce incident rates and drive down MTTR through process improvements and tooling
Excellent communication skills and experience working across distributed teams

Responsibilities

Ensure 24/7 uptime by participating in a rotating on-call schedule and managing production incidents across distributed environments
Operate and maintain core systems such as Elasticsearch, Kafka, RabbitMQ, and Redis, with a focus on reliability and performance
Architect monitoring solutions, define SLOs/SLIs, and implement scalable observability tools (e.g., Grafana, Prometheus, Zabbix)
Collaborate with engineering teams to optimize capacity, auto-scaling, and system utilization
Develop and maintain automation tools and workflows to support a culture of minimal manual intervention
Troubleshoot infrastructure bottlenecks and improve full-stack performance across services
Own the design and execution of new infrastructure patterns to support continued scale and speed
Maintain clear technical documentation including runbooks, incident response procedures, and architectural diagrams

Preferred Qualifications

Experience with big data infrastructure (e.g., Hadoop, Spark, Hive, HBase)
Background in data infrastructure, DBRE, or DBA responsibilities at scale
Familiarity with service mesh technologies and zero-trust architectures

Benefits

Full medical, dental, and vision insurance
HSA with company contributions + FSA options
401(k) plan with discretionary company match and financial advising
Company-paid life, AD&D, short-term & long-term disability insurance
Paid holidays, generous PTO, and floating days
Employee discounts and perks
Weekly catered lunches, stocked snacks, and beverages
Gym access & dog-friendly office (select locations)
Swag, holiday parties, and internal community events
Base Salary: $107,600 – $180,200/year
Compensation: Includes annual bonus + equity (RSU)

Senior Site Reliability Engineer

IntelliPro

Remote

DevOps

Senior
