Summary

Join dLocal, a global payments company, as a Site Reliability Engineer (SRE). You will design and implement highly resilient, scalable, and reliable systems for mission-critical applications used by major clients. This role involves developing quality gates, automating processes, influencing architectural decisions, and collaborating with teams across the business. You will work with monitoring tools, CI/CD pipelines, and security best practices. dLocal offers a flexible, remote-first culture with travel, health, and learning benefits.

Requirements

Over 3 years of experience as an SRE or in a very similar role
Experience with monitoring tools such as New Relic, Datadog, Nagios
Experience with tools such as Jira, PagerDuty and Confluence, and with integrating them into automated workflows through their APIs
Experience with CI/CD tools such as GitHub Actions, Jenkins, Spinnaker, ArgoCD or similar
Knowledge of security best practices and infosec tooling (you will be writing systems that monitor for breaches and security weaknesses)
Strong communication skills
Problem-solving skills
Detail-oriented person
Highly analytical person
Ability to collaborate across multi-functional teams

Responsibilities

Develop quality gates based on production-level service level objectives (SLOs) to detect issues earlier in the development cycle
Automate build testing and validation using service-level indicators (SLIs) and SLOs
Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development
Design processes, playbooks and checklists for other engineers to follow during and after incidents
Write postmortems and perform technical after-action reviews to understand root causes and propose system improvements that reduce overall fault rates
Work with members of almost every team across the business to understand their monitoring, alerting and SLO / SLA requirements, and design systems and processes that ensure we meet or exceed these requirements
Automate the provisioning of monitoring tools and rules with tools like Terraform and Ansible / Chef
Design baseline requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately
Monitor both the technical health and the security health of dLocal infrastructure and systems
Optimize the signal-to-noise ratio of alerting so that we receive only alerts that are actionable

Preferred Qualifications

Cloud experience (AWS) is highly advantageous, as most systems will integrate with AWS at some level
IaC experience with a tool like Terraform is highly advantageous
CaC experience with a tool like Ansible, Chef or Salt is highly advantageous
Database knowledge is highly advantageous (both how databases perform and SQL syntax)

Benefits

Flexible, remote-first dynamic culture with travel, health, and learning benefits

Site Reliability Engineering Leader

dLocal

Remote

DevOps

Senior
