Site Reliability Engineer at Braze

Summary

Join our team of Site Reliability Engineers at Braze, where you'll collaborate with engineering teams to improve infrastructure, automation, and tooling. As a SRE, you'll ensure site uptime, architect products for scalability, and develop internal platform infrastructure.

Requirements

3+ years of experience as a Software, DevOps, or Site Reliability Engineer
You think about systems - interfaces, boundaries, edge cases, failure modes, behaviors, specific implementations
Have an urge to collaborate, document, and deliver quickly
Collaborating across the global remote teams, often working asynchronously
Document everything so you don't need to learn the same thing (or plan the same work) twice
Delivering fast to delight our customers– even internal ones
Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
Have a desire to solve everyday challenges facing software engineers and automate their toil away
Have an excellent ability to manage multiple tasks and expectations at once
Know your way around Linux and Unix Shell
Have strong programming skills - Ruby and/or Go preferred
Have experience with Docker, Kubernetes, Terraform, or similar IaC technologies
Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies

Responsibilities

Partner with Braze’s engineering teams on: Architecting products to effectively utilize infrastructure platforms in a scalable, reliable manner
Debugging reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms
Make monitoring and alerting alerts on symptoms and not on outages
Ensure that Braze meets our strict enterprise-grade SLAs with customers
Develop Braze’s internal platform infrastructure: Create Infrastructure as code using Chef, Terraform, and Kubernetes
Develop deployment pipelines for applications in multiple languages using Docker, Kubernetes, etc
Provide centralized/common tooling, services, and automation frameworks that are critical for scaling operations, capacity management, reducing operational pain, and improving the day-to-day workflow of Braze’s engineering teams
Manage incidents: Be on a PagerDuty rotation to respond to availability incidents and provide support for other engineers
Use your on-call shift to prevent incidents from ever happening
Retrospect everything that happens to turn lessons into system improvements/changes, automation, etc

Benefits

Competitive compensation that may include equity
Retirement and Employee Stock Purchase Plans
Flexible paid time off
Comprehensive benefit plans covering medical, dental, vision, life, and disability
Family services that include fertility benefits and equal paid parental leave
Professional development supported by formal career pathing, learning platforms, and tuition reimbursement
Community engagement opportunities throughout the year, including an annual company wide Volunteer Week
Employee Resource Groups that provide supportive communities within Braze
Collaborative, transparent, and fun culture recognized as a Great Place to Work®

Site Reliability Engineer

Braze

Summary

Requirements

Responsibilities

Benefits

Remote

DevOps

Mid-level

Similar Remote Jobs

Remote

DevOps

Senior

Remote

Software Development

Senior

Remote

Software Development

Senior

Remote

Software Development

Senior

Tailor

Remote

Software Development

Mid-level

Remote

DevOps

Senior

Kraken Digital Asset Exchange

Remote

DevOps

Mid-level

Kraken Digital Asset Exchange

Remote

DevOps

Mid-level

GoDaddy

Remote

DevOps

Mid-level

Remote

DevOps

Senior