Remote Site Reliability Engineer
Braze
Remote - Canada
Job highlights
Summary
Join our team of Site Reliability Engineers at Braze, where you'll collaborate with engineering teams to improve infrastructure, automation, and tooling. As an SRE, you'll ensure site uptime, architect products for scalability, and develop internal platform infrastructure.
Requirements
- 3+ years of experience as a Software, DevOps, or Site Reliability Engineer
- You think about systems - interfaces, boundaries, edge cases, failure modes, behaviors, specific implementations
- Have an urge to collaborate, document, and deliver quickly
- Collaborate across global remote teams, often working asynchronously
- Document everything so you don't need to learn the same thing (or plan the same work) twice
- Deliver fast to delight our customers, even internal ones
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
- Have a desire to solve everyday challenges facing software engineers and automate their toil away
- Have an excellent ability to manage multiple tasks and expectations at once
- Know your way around Linux and Unix Shell
- Have strong programming skills - Ruby and/or Go preferred
- Have experience with Docker, Kubernetes, Terraform, or similar IaC technologies
- Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies
Responsibilities
- Partner with Braze's engineering teams on: Architecting products to effectively utilize infrastructure platforms in a scalable, reliable manner
- Debugging reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms
- Making sure monitoring and alerting trigger on symptoms rather than on outages
- Ensure that Braze meets our strict enterprise-grade SLAs with customers
- Develop Braze's internal platform infrastructure: Create Infrastructure as code using Chef, Terraform, and Kubernetes
- Develop deployment pipelines for applications in multiple languages using Docker, Kubernetes, etc.
- Provide centralized/common tooling, services, and automation frameworks that are critical for scaling operations, capacity management, reducing operational pain, and improving the day-to-day workflow of Braze's engineering teams
- Manage incidents: Be on a PagerDuty rotation to respond to availability incidents and provide support for other engineers
- Use your on-call shift to prevent incidents from ever happening
- Run retrospectives on everything that happens to turn lessons into system improvements/changes, automation, etc.
Benefits
- Competitive compensation that may include equity
- Retirement and Employee Stock Purchase Plans
- Flexible paid time off
- Comprehensive benefit plans covering medical, dental, vision, life, and disability
- Family services that include fertility benefits and equal paid parental leave
- Professional development supported by formal career pathing, learning platforms, and tuition reimbursement
- Community engagement opportunities throughout the year, including an annual company-wide Volunteer Week
- Employee Resource Groups that provide supportive communities within Braze
- Collaborative, transparent, and fun culture recognized as a Great Place to Work®
This job is filled or no longer available