Senior Site Reliability Engineer 3

PagerDuty
Summary
Join PagerDuty's Release Engineering team in Portugal as a Senior Site Reliability Engineer 3. You will lead platform engineering initiatives, make key architectural decisions for CI/CD and Kubernetes, mentor junior team members, and build scalable infrastructure solutions. Responsibilities include designing and implementing complex platform solutions, improving developer experience and platform reliability, and leading post-incident reviews. You will collaborate with global engineering teams and champion observability best practices. This role requires a strong background in Site Reliability Engineering, Kubernetes, CI/CD, and cloud-native infrastructure. The position offers a flexible hybrid work model with one day per month in the Lisbon office.
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles
- Deep expertise in Kubernetes administration and architecture
- Strong track record of leading CI/CD and platform engineering initiatives
- Demonstrated experience leading technical projects and mentoring engineers
- Advanced experience working on cloud-native infrastructure (e.g. AWS, GCP, Azure)
- Experience with monitoring, observability, and logging platforms (e.g. Datadog, New Relic, Sumo Logic, Splunk)
- Advanced experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
- Proficiency in at least one programming language (e.g. Python, Ruby, Go)
Responsibilities
- Lead the design and implementation of complex platform engineering solutions
- Drive architectural decisions for our CI/CD infrastructure and Kubernetes platform
- Mentor junior team members and provide technical leadership in platform engineering practices
- Develop and implement strategic initiatives to improve developer experience and platform reliability
- Design and implement scalable solutions for infrastructure automation using Terraform and other IaC tools
- Lead post-incident reviews and drive systematic improvements to prevent recurring issues
- Collaborate with other engineering teams globally to define and implement platform standards
- Champion observability and monitoring best practices across the organization
- Participate in a 24/7 on-call rotation
Preferred Qualifications
- Experience with GitOps practices and tools like ArgoCD
- Experience building and maintaining platform engineering solutions at scale
- Experience implementing and managing observability solutions
- Experience with cost optimization and capacity planning
- Knowledge of emerging trends in platform engineering and DevOps practices
- Strong technical writing skills for documentation and knowledge sharing
- Experience with developer portals and internal platform products
Benefits
- Competitive salary
- Comprehensive benefits package from day one
- Flexible work arrangements
- Company equity
- ESPP (Employee Stock Purchase Program)
- Retirement or pension plan
- Generous paid vacation time
- Paid holidays and sick leave
- Dutonian Wellness Days & HibernationDuty - company-wide paid days off in addition to PTO
- Paid parental leave: 22 weeks for the pregnant parent, 12 weeks for the non-pregnant parent (some countries have longer leave standards, and we comply with local laws)
- Paid volunteer time off: 20 hours per year
- Company-wide hack weeks
- Mental wellness programs