Summary
Join Zepz's Site Reliability Engineering team and help ensure the stability, resilience, and scale of our services. You will use code to solve problems related to configuration, infrastructure, tooling, and automation. Collaborate with feature teams to ensure services are correctly monitored and changes are implemented safely and securely. Lead troubleshooting of complex incidents and problems, and maintain end-to-end service visibility for our customers. Help the team meet its strategic goals of maximizing developer velocity while maintaining product reliability and high-quality customer experience. Grow professionally by reviewing others' work and seeking feedback on your own.
Requirements
- A skilled Engineer. At least 5 years in an SRE, DevOps or Engineering role, with a keen interest in solving problems using automation
- Understand SRE and DevOps methodologies. You understand the build and deployment cycle of an application, and how to operate a resilient system
- A focus on observability. Observability is key to operating a truly reliable and scalable system. We are looking for engineers who can "Monitor Everything & Measure Everything", driving a culture of observability. Experience with Grafana, Loki and Prometheus
- Holistic view on application delivery. You understand how many systems (monitoring, logging, alerting, and scaling) work together to build a robust platform that can respond to varying demands from both external sources (traffic) and internal sources (feature team delivery) in a safe and controlled manner. You have experience supporting or developing applications written in Java, Python or Node.js
- Systematic problem-solving approach. You should understand how to analyze and troubleshoot large-scale distributed systems
- Happy in the Clouds. Our Cloud Native platform is hosted on AWS. You'll be comfortable working with a system that supports users from around the world, at scale
- Bias for action. You see a problem, you fix a problem. You get buy-in for your solutions and keep tickets moving. We're always looking for ways to ship at pace
- Growth mindset. A willingness to use your skills and experience to mentor less-experienced engineers. A desire to learn from others and make yourself better every day
- Agile outlook. You need to be excited about working in a fast-changing environment. Products, tools, frameworks and processes change; we evolve and take the best bits with us. The teams drive the evolution
- Disciplined and self-managed. You need to own your role and be disciplined about adhering to protocols and processes. As a senior engineer, you will always ensure you are bringing value to the team and driving tasks to completion without being actively managed
Responsibilities
- Use code to solve problems. Configuration, infrastructure, tooling, and automation: everything must be solved by writing high-quality code that performs and scales
- Apply best practices and standards for observability, monitoring, alerting, capacity planning, availability, performance/latency, change, and troubleshooting across all our Tech services
- Work closely with feature teams to ensure that services are correctly monitored, change is delivered in a safe and secure way, resilience is built into our product, and our standards and best practices are adopted
- Lead or be involved in the troubleshooting of complex incidents and problems
- Maintain visibility of the end-to-end service to our customers and ensure their journey is stable and consistent across all microservices and third-party dependencies, using the observability tooling you will have implemented with the Engineering teams
- Help the team meet its strategic goals: maintain the highest level of observability, maximize developer velocity while keeping our product reliable, and ensure that we can deliver the highest quality experience to our customers
- Growing together. You'll review others' work and happily seek feedback on yours to ensure we build a better codebase and sharpen each other's skills
Preferred Qualifications
- Experience working in the FinTech space
- Experience working in a distributed team across different geographies and time zones
Benefits
- Unlimited Annual Leave: Feel free to make the most of your time off and maintain a healthy work-life balance!
- Private Medical Cover: You can opt-in to a Private Medical Insurance scheme. This provides you with access to thorough medical coverage, so you can feel confident in your health and well-being
- Retirement: We offer pension schemes to help you plan for and secure your future
- Life Assurance: Life assurance is available to give you peace of mind and protect your loved ones in case of the unexpected
- Parental Leave: We offer competitive parental leave schemes to ensure you are spending as much quality time with your new bundle of joy as possible
- We are also remote-first as an organisation, offering flexibility for you to work where you need to be most productive