Summary
Join Stash's Infrastructure team as a Staff Site Reliability Engineer to contribute to the stability and operational excellence of our platforms. You will lead initiatives to enhance system reliability, automate processes, and implement scalable infrastructure solutions. This role demands expertise in cloud infrastructure (AWS), Kubernetes, and CI/CD processes. You will collaborate with engineering teams and make high-impact architectural decisions. Stash offers a comprehensive compensation package, remote-first work policy, flexible PTO, and various other benefits.
Requirements
- 8+ years of experience in site reliability engineering or a similar role
- Strong expertise in Kubernetes (K8s) and Amazon EKS
- Advanced skills in AWS, including setup, management, and optimization
- Proficiency in infrastructure as code, particularly Terraform and Terraform Cloud
- Solid programming skills in Python and/or Go
- Experience with system monitoring tools like Datadog and familiarity with logging and archiving practices
- Extensive experience with GitHub Actions for CI/CD pipelines
- Proven track record in designing and managing microservice architectures using Docker and containers
- Practical experience with Kafka
- Deep understanding of SLOs, SLIs, and SLAs, and their application in maintaining system reliability
- Experience working in PCI and other regulated environments
Responsibilities
- Design, develop, and maintain scalable and resilient cloud infrastructure using AWS
- Implement and oversee monitoring systems to ensure optimal performance and rapid response to issues
- Automate deployment pipelines and manage CI/CD processes using tools like GitHub Actions
- Make high-impact architectural decisions to improve system efficiency and reduce downtime
- Collaborate with engineering teams to innovate and enhance deployment and operational capabilities
- Develop and manage microservices architectures using Docker and containerization technologies
Preferred Qualifications
- Experience with FireHydrant
- Previous role(s) in a fast-paced startup environment
- Knowledge of managing and maintaining SLOs, SLIs, and SLAs beyond foundational levels
Benefits
- Comprehensive total rewards package, comprising compensation (salary and equity) and health care benefits
- Complimentary Stash+ subscription
- Remote-first work policy: live and work where you feel most productive, whether that is at home or in an office
- Flexible PTO
- Annual learning and development reimbursement benefit
- Work-from-home equipment stipends; home internet subsidy
- Paid parental leave (primary and secondary), with offerings for birth-giving and non-birth-giving parents
- Enhanced health and wellness benefits through One Medical, Gympass, and Maven Health