Staff Engineer - DevOps Site Reliability

Nagarro

📍 Remote - Colombia

Summary

Join our Digital Product Engineering company as an experienced L3 Site Reliability Engineer focusing on business-critical SaaS applications. You will provide full-stack L3 support, automate SRE tooling, and work under business pressure. Strong communication skills are essential for collaborating with L1 and L2 support, engineering, product managers, leadership, and end-users. Experience with incident and problem management and with multitenant applications is required. The role demands expertise in Kubernetes and AWS (particularly EKS), along with Python and React/Next.js. We offer a dynamic and non-hierarchical work culture.

Requirements

  • Possess experience with EKS, GitHub Actions, Python (Strong), Kubernetes (Expert), and Prometheus
  • Demonstrate a solid understanding of networking concepts (TCP/IP, DNS, routing, VPCs, subnets, firewalls, load balancing, and TLS/SSL)
  • Have experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control
  • Have experience with AWS, particularly EKS, serverless services, queues, and various databases
  • Have experience with Python and React/Next.js

Responsibilities

  • Provide full-stack L3 support across infrastructure, backend, and front-end before escalating to the engineering business unit
  • Automate SRE tools to provide proactive L3 support, aligned with our tech monitoring strategy
  • Work under business pressure for business-critical applications
  • Communicate effectively with L1 and L2 support, engineering, product managers, leadership, and end-users during troubleshooting
  • Manage incidents and problems
  • Work with multitenant applications
  • Use monitoring and logging tools (Grafana, Prometheus, Loki, or ELK) to track resource utilization, analyze application performance, and identify potential issues

Preferred Qualifications

  • Have previous experience building a user-facing GenAI/LLM software application
  • Understand security best practices in cloud environments, including AWS Managed Services (RDS, Batch, Lambda, Fargate, Step Functions, SQS/SNS, etc.)
  • Have experience with FastAPI and Next.js
  • Have experience with WebSockets, Server-Sent Events, and Pub/Sub (RabbitMQ, Kafka, etc.)
  • Understand cloud security concepts (IAM, access control)
  • Have experience with Terraform
