Braze is hiring a Site Reliability Engineer

Braze

💵 ~$180k-$252k
📍Remote - Canada

Summary

Join our team of Site Reliability Engineers at Braze, where you'll collaborate with engineering teams to improve infrastructure, automation, and tooling. As an SRE, you'll ensure site uptime, architect products for scalability, and develop internal platform infrastructure.

Requirements

  • 3+ years of experience as a Software, DevOps, or Site Reliability Engineer
  • Think about systems: interfaces, boundaries, edge cases, failure modes, behaviors, and specific implementations
  • Have an urge to collaborate, document, and deliver quickly:
    • Collaborate across global remote teams, often working asynchronously
    • Document everything so you don't need to learn the same thing (or plan the same work) twice
    • Deliver fast to delight our customers, even internal ones
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
  • Have a desire to solve everyday challenges facing software engineers and automate their toil away
  • Have an excellent ability to manage multiple tasks and expectations at once
  • Know your way around Linux and the Unix shell
  • Have strong programming skills (Ruby and/or Go preferred)
  • Have experience with Docker, Kubernetes, Terraform, or similar container and infrastructure-as-code (IaC) technologies
  • Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies

Responsibilities

  • Partner with Braze’s engineering teams on:
    • Architecting products to use infrastructure platforms in a scalable, reliable manner
    • Debugging reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms
  • Make monitoring and alerting fire on symptoms and not on outages
  • Ensure that Braze meets our strict enterprise-grade SLAs with customers
  • Develop Braze’s internal platform infrastructure:
    • Create infrastructure as code using Chef, Terraform, and Kubernetes
    • Develop deployment pipelines for applications in multiple languages using Docker, Kubernetes, etc.
    • Provide centralized/common tooling, services, and automation frameworks that are critical for scaling operations, capacity management, reducing operational pain, and improving the day-to-day workflow of Braze’s engineering teams
  • Manage incidents:
    • Be on a PagerDuty rotation to respond to availability incidents and provide support for other engineers
    • Use your on-call shift to prevent incidents from ever happening
    • Run retrospectives on everything that happens to turn lessons into system improvements/changes, automation, etc.

Benefits

  • Competitive compensation that may include equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive benefit plans covering medical, dental, vision, life, and disability
  • Family services that include fertility benefits and equal paid parental leave
  • Professional development supported by formal career pathing, learning platforms, and tuition reimbursement
  • Community engagement opportunities throughout the year, including an annual company-wide Volunteer Week
  • Employee Resource Groups that provide supportive communities within Braze
  • Collaborative, transparent, and fun culture recognized as a Great Place to Work®
