Summary

Join Moniepoint, a rapidly growing financial services platform in Africa, as a Site Reliability Engineer (SRE). You will ensure the smooth and efficient operation of our systems, engineering solutions to enhance visibility, automate tasks, and boost system resilience. This role involves on-call responsibilities for detecting and resolving issues, acting as Incident Commander during major incidents, and conducting root cause analyses. You will also develop automation, maintain monitoring dashboards, participate in feature development, define SLIs/SLOs, and resolve escalated customer complaints. The ideal candidate balances real-time responsibilities with strategic engineering work for sustainable service reliability. Moniepoint offers a supportive culture, learning opportunities, and competitive compensation.

Requirements

Minimum of 3 years of experience supporting enterprise applications in an SRE or similar role
Knowledge of distributed systems, microservices architecture and software design patterns
Experience with cloud platforms such as AWS, GCP, or Azure
Strong knowledge of Kubernetes and container orchestration tools
Experience using application performance monitoring tools, OpenTelemetry, and observability platforms such as New Relic, Datadog, ELK, or SigNoz
Excellent problem-solving and troubleshooting skills as an on-call engineer, with the ability to resolve complex infrastructure and application issues
Proficient in setting up and maintaining monitoring dashboards and alerts using Grafana and Prometheus
Working knowledge of a scripting/programming language (e.g., Python, Bash)
Proficiency in SQL databases (e.g., MySQL), including writing complex SQL queries against large datasets, and hands-on experience in database administration

Responsibilities

Participate in on-call rotations as the primary technical lead for detecting, triaging, and resolving service degradation, outages, or reliability issues across all environments
Act as the Incident Commander during major incidents: initiate war room or bridge calls, coordinate cross-functional teams, provide timely and clear status updates to all stakeholders, and lead and document blameless Root Cause Analyses (RCAs) that drive long-term fixes
Develop automation to eliminate manual, repetitive operational tasks (toil) across both applications and infrastructure, improving efficiency and system resilience
Create and maintain dashboards and alerts to monitor application and infrastructure health
Participate in feature development discussions to ensure services are built with observability from the ground up
Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in collaboration with Product and Engineering teams
Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior

Benefits

Attractive salary
Pension
Health insurance
Annual bonus

Senior Site Reliability Engineer

Moniepoint

Remote

DevOps

Senior
