Staff Site Reliability Engineer at Nearsure

Summary

Join Nearsure's close-knit LATAM remote team and enjoy a supportive work environment with competitive USD salaries, 100% remote work flexibility, paid time off, national holidays, sick leave, a refundable annual credit, team-building activities, and a birthday day off. As a Staff Site Reliability Engineer, you will own and optimize OpenTelemetry pipelines, build self-service automation tools, design incident response processes, and collaborate with various teams to ensure reliable infrastructure and actionable alerting. You will leverage IaC, design base-level requirements for services, and take ownership of client infrastructure reliability. Nearsure values autonomy, open communication, and diversity, offering a supportive People Care team for employee well-being.

Requirements

Bachelor's Degree in Computer Science, Engineering, or a related field
8+ years of experience as a Site Reliability Engineer or in a very similar role, with a focus on observability
5+ years of experience working with cloud platforms (AWS)
5+ years of experience working with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar)
4+ years of experience working with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog, including experience managing observability pipelines at scale in high-throughput environments
4+ years of experience working with Kubernetes, including its core components, deployment methodologies, and monitoring best practices
Strong communication skills with team members and stakeholders (technical and nontechnical communication)
Strong scripting abilities (Python, Go, or similar) for automating observability tasks
Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows
Advanced English level, as you will work with US clients. Effective communication in English is essential to deliver the best solutions to our clients and expand your horizons

Responsibilities

Design, implement, and maintain observability pipelines across the three main signals—logs, metrics, and traces—ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability
Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry
Design the processes, playbooks, checklists, and automations that you and other engineers follow during an incident
Interact with members from almost all teams across the business to understand their monitoring, alerting, and SLO / SLA requirements and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development
Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and observability configurations across OpenTelemetry (OTel) pipelines
Design base-level requirements for new and existing services to ensure that all client infrastructure and code are monitored consistently and accurately at a basic level
Take full ownership of client infrastructure reliability, ensuring adherence to key availability and security KPIs

Benefits

Competitive USD salary
100% remote work
Paid time off
National holidays observed
Sick leave
Refundable Annual Credit
Team-building activities
Birthday day off

Remote

DevOps

Mid-level
