Summary
Join MongoDB's SRE Observability team as a remote engineer in North America and contribute to building and maintaining the observability stack for all engineering teams. You will define standards and vision for the platform, design and build core observability services, design and implement monitoring across global cloud providers, and ensure service reliability and fault tolerance. The role involves identifying key metrics, participating in on-call rotations, and improving observability capabilities. You will collaborate with other teams to implement best practices in instrumentation and monitoring. This is a highly collaborative role offering significant ownership of mission-critical infrastructure.
Requirements
- Experience running mission critical services at scale
- Experience with observability of large scale distributed systems
- An understanding of information security issues
- Firm grasp of at least one modern programming language, beyond basic scripting
- Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)
- Bachelor's degree in Computer Science or equivalent experience
Responsibilities
- Define standards and vision for the mission-critical observability platform leveraged by all parts of the engineering organization
- Design, architect, build, and deliver core pieces of our observability services in collaboration with other stakeholders
- Design, implement, and troubleshoot monitoring for services that span the globe, across several cloud providers
- Build for reliability, making services and infrastructure available, resilient, fault tolerant and self-healing
- Identify and configure key metrics to detect incidents and quantify service health, availability and performance
- Participate in a week-long on-call rotation and blameless post-mortem process
- Improve our observability capabilities, optimizing for cost, ease of use, and maintainability
Preferred Qualifications
- Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud Platform, Microsoft Azure)
- Experience working in a Kubernetes-based environment
Benefits
- Generous compensation package
- Opportunities to learn on the job (time to upskill in new technologies)
- High level of independence in your day to day work
- Flexible paid time off
- 20 weeks fully-paid gender-neutral parental leave
- Fertility and adoption assistance
- 401(k) plan
- Mental health counseling
- Access to transgender-inclusive health insurance coverage
- Health benefits offerings