Senior Site Reliability Engineer I

Careem

📍 Remote - Pakistan

Job highlights

Summary

Join Careem's infra monitoring team as a passionate automation, tooling, and frameworks expert. You will contribute to building and improving Careem's infra/app monitoring system, enabling projects to enhance system visibility and alert capabilities. Key responsibilities involve developing a distributed monitoring system, designing scalable solutions, mentoring colleagues, collaborating with engineers and product owners, building and shipping new features, and maintaining various systems. The role requires extensive experience with monitoring systems, OOP languages, Kubernetes, cloud infrastructure, and infrastructure automation. Careem offers a unique work environment with flexible work arrangements, healthcare benefits, and fitness reimbursements.

Requirements

  • 5+ years of experience with monitoring systems such as Prometheus, New Relic, and AppDynamics
  • Experience developing and debugging in one of these OOP languages: Java, Python, Bash, or Go
  • Expert knowledge of Kubernetes
  • Experience with Cloud Infrastructure (AWS preferred)
  • Experience with infrastructure automation (Infrastructure as Code)
  • Experience architecting, designing, developing, operating, and troubleshooting highly available systems at scale
  • Experience in building and owning tools for medium to large engineering teams
  • Experience building systems, dashboards, and metrics to facilitate a data-driven approach to problem resolution
  • Strong Unix or Linux background, including topics around network stack and scripting
  • Obsession with keeping costs low while building solutions

Responsibilities

  • Develop our distributed monitoring system to meet the challenging functional, scalability and reliability requirements for our fast-growing business
  • Design/Architect solutions with a focus on scalability, testability, and maintainability
  • Coach and mentor colleagues on an energetic, growing team
  • Facilitate collaboration with other engineers, product owners, and designers to solve interesting and challenging problems across our platform
  • Build and ship new features and systems, with an emphasis on code quality, maintainability, readability, and testing
  • Develop, maintain, and extend a variety of systems, including open-source, ready-made, and in-house applications
  • Focus on quality and know what it means to ship high-quality code

Preferred Qualifications

  • Experience in multi-tiered distributed systems
  • Proficient in configuring, managing, and optimizing Prometheus and Thanos stack for effective monitoring
  • CI/CD experience is a plus
  • Experience with the ELK stack and/or log management
  • Experience with cloud-centric application development and deployment (AWS preferred)

Benefits

  • Work and learn from great minds by joining a community of inspiring colleagues
  • Put your passion to work in a purposeful organisation dedicated to creating impact in a region with a lot of untapped potential
  • Explore new opportunities to learn and grow every day
  • Work 4 days a week in office & 1 day from home, and remotely from any country in the world for 30 days a year with unlimited vacation days per year. (If you are in an individual contributor role in tech, you will have 2 office days a week and 3 to work from home.)
  • Access to healthcare benefits and fitness reimbursements for health activities including gym, health club, and training classes

Please let Careem know you found this job on JobsCollider. Thanks! 🙏