Senior Site Reliability Engineer 3

PagerDuty

📍 Remote - Portugal

Summary

Join PagerDuty's Release Engineering team in Portugal as a Senior Site Reliability Engineer 3. You will lead platform engineering initiatives, make key architectural decisions for CI/CD and Kubernetes, mentor junior team members, and build scalable infrastructure solutions. Responsibilities include designing and implementing complex platform solutions, improving developer experience and platform reliability, and leading post-incident reviews. You will collaborate with global engineering teams and champion observability best practices. This role requires a strong background in Site Reliability Engineering, Kubernetes, CI/CD, and cloud-native infrastructure. The position offers a flexible hybrid work model with one day per month in the Lisbon office.

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
  • Deep expertise in Kubernetes administration and architecture
  • Strong track record of leading CI/CD and platform engineering initiatives
  • Demonstrated experience leading technical projects and mentoring engineers
  • Advanced experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
  • Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk)
  • Advanced experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
  • Proficiency in at least one programming language (e.g. Python, Ruby, Go)

Responsibilities

  • Lead the design and implementation of complex platform engineering solutions
  • Drive architectural decisions for our CI/CD infrastructure and Kubernetes platform
  • Mentor junior team members and provide technical leadership in platform engineering practices
  • Develop and implement strategic initiatives to improve developer experience and platform reliability
  • Design and implement scalable solutions for infrastructure automation using Terraform and other IaC tools
  • Lead post-incident reviews and drive systematic improvements to prevent recurring issues
  • Collaborate with other engineering teams globally to define and implement platform standards
  • Champion observability and monitoring best practices across the organization
  • Participate in a 24/7 on-call rotation

Preferred Qualifications

  • Experience with GitOps practices and tools like Argo CD
  • Experience building and maintaining platform engineering solutions at scale
  • Experience implementing and managing observability solutions
  • Experience with cost optimization and capacity planning
  • Knowledge of emerging trends in platform engineering and DevOps practices
  • Strong technical writing skills for documentation and knowledge sharing
  • Experience with developer portals and internal platform products

Benefits

  • Competitive salary
  • Comprehensive benefits package from day one
  • Flexible work arrangements
  • Company equity
  • ESPP (Employee Stock Purchase Program)
  • Retirement or pension plan
  • Generous paid vacation time
  • Paid holidays and sick leave
  • Dutonian Wellness Days & HibernationDuty - company-wide paid days off in addition to PTO
  • Paid parental leave: 22 weeks for the pregnant parent, 12 weeks for the non-pregnant parent (some countries have longer leave standards, and we comply with local laws)
  • Paid volunteer time off: 20 hours per year
  • Company-wide hack weeks
  • Mental wellness programs
