Senior Site Reliability Engineer at Spreedly

Summary

Join Spreedly as a Senior Site Reliability Engineer and help ensure the reliability, observability, and scalability of our globally distributed payments platform. You will lead efforts to stabilize and optimize our infrastructure, build platform services, and champion best practices. You will draw on your expertise in software development, infrastructure, and operations to keep our applications and systems reliable, scalable, and efficient, working across the entire application stack with a diverse range of tools and technologies to support our mission-critical systems. This role requires strong experience designing and operating highly available, scalable cloud architectures. You will also mentor team members and foster a culture of learning and collaboration.

Requirements

Hands-on experience with Datadog, OpenTelemetry, Sentry, Sumo Logic, or similar monitoring and observability platforms, with a focus on actionable metrics and alerts
Strong proficiency in a modern programming language, with a proven ability to write clean, maintainable, and efficient code
Extensive experience with AWS services, including EC2 (Ubuntu Linux), S3, and RDS
In-depth knowledge of databases, including relational databases (e.g., CockroachDB, PostgreSQL) and key-value stores (e.g., Riak), with experience in performance optimization and query tuning
Excellent problem-solving skills with experience diagnosing complex system issues in production environments
Proven ability to work cross-functionally with product teams and with application, infrastructure, and security engineering teams
Strong understanding of DevOps practices, including CI/CD pipelines, configuration management, and infrastructure-as-code
Strong written and verbal communication skills, with the ability to explain complex technical concepts to non-technical stakeholders

Responsibilities

Ensure the reliability, availability, and performance of the production systems behind Spreedly’s globally distributed payments platform, which processes $4B monthly, through monitoring, automation, and continuous improvement
Collaborate with development teams to improve the reliability and performance of Ruby on Rails and Elixir applications
Implement and maintain robust observability solutions using Datadog and OpenTelemetry, enabling proactive identification, alerting, and resolution of issues
Lead incident response efforts, including participating in a shared on-call rotation to maintain 24/7 system reliability, performing root cause analysis, resolving incidents, and implementing measures to prevent recurrence
Develop and maintain automation tools to reduce manual intervention, streamline operations, and enhance developer productivity
Monitor, analyze, and optimize the performance of relational databases, identifying and resolving bottlenecks to maintain data integrity and efficiency
Lead by example, instilling modern SRE best practices and fostering a culture of reliability and performance within the engineering organization
Provide technical guidance and mentorship to team members, fostering a culture of learning and collaboration

Preferred Qualifications

Experience with Ruby, Rails, and Elixir is preferred
Experience with Kafka is a plus
Advanced knowledge of Docker and container orchestration best practices is a plus

Benefits

Competitive salary + Equity
Outstanding Medical and Dental benefits, including 100% employer-paid options
Company-paid Life and Disability insurance
Optional vision and supplemental insurance options, and various Flexible Spending Accounts (FSA)
Open Paid Time Off policy + 12 weeks of paid leave for new parents
Matching 401(k) plan (5% up to $5,000 yearly)
Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement
LinkedIn Learning subscription
Access to company-paid professional coaching service
Visits to HQ in Durham, North Carolina for remote employees

Remote

DevOps

Senior