Summary

Join Careem's infra monitoring team as a passionate automation, tooling, and frameworks expert. You will contribute to building and improving Careem's infra/app monitoring system, enabling projects to enhance system visibility and alerting capabilities. Key responsibilities involve developing a distributed monitoring system, designing scalable solutions, mentoring colleagues, collaborating with engineers and product owners, building and shipping new features, and maintaining various systems. The role requires extensive experience with monitoring systems, programming languages such as Java, Python, Bash, or Go, Kubernetes, cloud infrastructure, and infrastructure automation. Careem offers a unique work environment with flexible work arrangements, healthcare benefits, and fitness reimbursements.

Requirements

5+ years of experience with monitoring systems such as Prometheus, New Relic, and AppDynamics
Experience developing and debugging in at least one of these languages: Java, Python, Bash, or Go
Expert knowledge of Kubernetes
Experience with Cloud Infrastructure (AWS preferred)
Experience with infrastructure automation (Infrastructure as Code)
Experience architecting, designing, developing, operating, and troubleshooting highly available systems at scale
Experience in building and owning tools for medium to large engineering teams
Experience building systems, dashboards, and metrics to facilitate a data-driven approach to problem resolution
Strong Unix or Linux background, including topics around network stack and scripting
Obsession with keeping costs low while building solutions

Responsibilities

Develop our distributed monitoring system to meet the challenging functional, scalability and reliability requirements for our fast-growing business
Design and architect solutions with a focus on scalability, testability, and maintainability
Coach and mentor colleagues on an energetic, growing team
Facilitate collaboration with other engineers, product owners, and designers to solve interesting and challenging problems across our platform
Build and ship new features and systems, with an emphasis on code quality, maintainability, readability, and testing
Develop, maintain, and extend a variety of systems, including open-source, ready-made, and in-house applications
Focus on quality and know what it means to ship high-quality code

Preferred Qualifications

Experience in multi-tiered distributed systems
Proficiency in configuring, managing, and optimizing the Prometheus and Thanos stack for effective monitoring
CI/CD experience is a plus
Experience with the ELK stack and/or log management
Experience with cloud-centric application development and deployment (AWS preferred)

Benefits

Work and learn from great minds by joining a community of inspiring colleagues
Put your passion to work in a purposeful organisation dedicated to creating impact in a region with a lot of untapped potential
Explore new opportunities to learn and grow every day
Work 4 days a week in the office and 1 day from home, work remotely from any country in the world for 30 days a year, and take unlimited vacation days. (If you are in an individual contributor role in tech, you will have 2 office days a week and 3 days working from home.)
Access to healthcare benefits and fitness reimbursements for health activities including gym, health club, and training classes

Senior Site Reliability Engineer I

Careem

Remote

DevOps

Senior
