Senior Platform Infrastructure Engineer
Braze
$154k-$275k
Remote - United States
Please let Braze know you found this job on JobsCollider. Thanks!
Job highlights
Summary
Join Braze as a Platform Infrastructure Engineer specializing in MongoDB and be responsible for managing, maintaining, and evolving the infrastructure required to run MongoDB at scale on Kubernetes. You will ensure MongoDB operates as a reliable, performant, and developer-friendly service. This role requires designing and managing MongoDB infrastructure, ensuring reliability and performance, responding to incidents, collaborating with other teams, and innovating through automation. You will work with a passionate team to support Braze's massive data operations and contribute to a culture of operational excellence. Braze offers competitive compensation, comprehensive benefits, and opportunities for professional development.
Requirements
- 5+ years managing database platforms in production environments, with 2+ years specifically focused on MongoDB
- Hands-on expertise with Kubernetes, including deploying and managing stateful workloads
- Proven experience automating database operations using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Pulumi
- Deep understanding of MongoDB internals, including sharding, replica sets, and query optimization
- Proficiency in Kubernetes concepts like StatefulSets, Operators, and Persistent Volume Claims
- Strong knowledge of cloud services (AWS/GCP/Azure) and their storage integrations for MongoDB workloads
- Familiarity with monitoring and logging tools tailored to databases (e.g., Prometheus, MongoDB Atlas monitoring)
- Dedicated to building robust, scalable, and self-serviceable MongoDB systems that empower developers and reduce operational complexity
- Committed to collaboration, documentation, and knowledge-sharing across remote, global teams
- Proactive in seeking out ways to improve MongoDB performance, reliability, and automation
- Focused on delivering value quickly to internal stakeholders while maintaining operational excellence
Responsibilities
- Design and Manage MongoDB Infrastructure
  - Build, optimize, and manage MongoDB clusters on Kubernetes, ensuring they meet scalability, availability, and performance requirements
  - Develop automation frameworks for provisioning, upgrading, scaling, and maintaining MongoDB clusters
  - Design architectures that support seamless MongoDB operations, including multi-region deployments, sharded clusters, and replica sets
  - Create and optimize storage configurations, resource allocation, and networking to ensure the highest MongoDB performance
- Ensure MongoDB Reliability & Performance
  - Implement high-availability strategies for MongoDB, including automated failovers, backups, and disaster recovery
  - Collaborate with database engineers, Platform Software Engineers, and product teams to define and achieve Service Level Objectives (SLOs) for MongoDB performance and reliability
  - Continuously monitor MongoDB systems using tools like Prometheus, Grafana, and database-specific observability solutions to identify and address performance bottlenecks proactively
- Incident Response & Resilience
  - Be part of a PagerDuty rotation to respond to MongoDB-related incidents, minimizing downtime and impact on the business
  - Conduct root cause analyses for MongoDB failures and implement preventive measures to improve resilience
  - Develop and maintain playbooks for incident response and recovery, ensuring the team is equipped to handle any MongoDB-related challenges
- Collaboration & Knowledge Sharing
  - Partner with other teams to integrate MongoDB as a self-service platform, reducing the need for manual intervention
  - Share expertise through documentation, training, and mentoring to empower the broader engineering organization
  - Contribute to a culture of operational excellence by creating robust standards and best practices for running MongoDB on Kubernetes
- Innovate & Automate
  - Continuously evaluate emerging tools and technologies for managing MongoDB and Kubernetes, integrating them where appropriate
  - Develop and implement self-healing mechanisms to minimize operational overhead and improve MongoDB uptime
  - Automate manual processes, including scaling, maintenance, upgrades, and cluster health checks, to improve efficiency and reduce human error
Benefits
- Competitive compensation that may include equity
- Retirement and Employee Stock Purchase Plans
- Flexible paid time off
- Comprehensive benefit plans covering medical, dental, vision, life, and disability
- Family services that include fertility benefits and equal paid parental leave
- Professional development supported by formal career pathing, learning platforms, and tuition reimbursement
- Community engagement opportunities throughout the year, including an annual company-wide Volunteer Week
- Employee Resource Groups that provide supportive communities within Braze
- Collaborative, transparent, and fun culture recognized as a Great Place to Work®