Summary

Join Feedzai's Platform Engineering Performance & Reliability team and help optimize existing systems, build infrastructure, and automate operations. You will tackle the challenges of scale in Feedzai's fraud detection mission, collaborating with talented platform engineers on complexity analysis and large-scale system design. The role involves developing the automation, tooling, and platforms that support Feedzai's cloud service. You will provide recommendations on capacity allocation, work with product teams to improve system performance and reliability, and participate in incident response and root cause investigation. This position requires experience with distributed systems, cloud services, and programming languages such as Go or Python. Feedzai offers a fast-paced, collaborative environment with opportunities for continuous learning.

Requirements

A bachelor's degree in Computer Science, Information Systems, or the equivalent combination of education, experience, and training
Programming skills (Go, Python or similar languages)
2+ years of experience with data structures, algorithms, programming, and asynchronous and multithreaded designs
2+ years of experience building scalable, distributed cloud services
2+ years of experience operating production environments
1+ years of experience in cross-team collaboration in a supporting role
Self-driven & motivated, with a strong work ethic and a passion for problem solving
Systematic problem-solving approach, coupled with effective verbal and written communication skills
Experience being on call

Responsibilities

Provide recommendations on capacity allocation, considering cost, resilience, and performance
Work with product teams to support best practices and drive improvements in system performance and reliability before and after systems go live
Develop in Go, Python, or similar languages
Automate all aspects of cloud infrastructure and incident response
Develop playbooks related to actionable alerts
Participate in incident response, root cause investigation and resolution
Maintain and develop our infrastructure as code (IaC) to manage and operate end-to-end lifecycle operations (monitoring, alerting, security, cost optimization, configuration, backup, etc.) in production environments
Use your experience and problem-solving skills to help prevent and investigate production issues

Preferred Qualifications

Experience with monitoring and observability stacks such as Grafana and Prometheus
Kubernetes, cloud, and HashiCorp experience is valued
Knowledge of or experience with AWS or GCP

Site Reliability Engineer

Feedzai

Remote

DevOps

Mid-level
