Senior Site Reliability Engineer

GetYourGuide
Summary
Join GetYourGuide's Site Reliability Engineering team as a fully remote engineer, contributing to the development, automation, and enhancement of cloud and container-based infrastructure. You will work with Kubernetes, AWS, and Istio, ensuring high availability and scalability of production systems. Responsibilities include building and scaling cloud infrastructure, developing custom controllers, leveraging Istio and Envoy, driving initiatives for better system design, participating in on-call rotations, and championing operational culture. The ideal candidate possesses strong coding skills (Go preferred), Kubernetes experience, Linux system knowledge, and excellent communication skills. GetYourGuide offers a comprehensive benefits package including a personal growth budget, remote work options, flexible arrangements, team events, transportation and fitness budgets, GetYourGuide activity discounts, and health and wellness benefits.
Requirements
- Daily availability from 13:00 to 17:00 Central European Time (Berlin/Zurich) for collaboration with the team
- Experience with Kubernetes and running containers at scale
- A good, low-level understanding of the Linux operating system
- Strong coding skills in at least one programming language. Our most used language is Go
- Good understanding of production systems, networking and container technology
- Working knowledge of public cloud environments such as AWS
- Positive, proactive team player who is passionate about their craft and cares about helping the team deliver
- Strong written and verbal communication skills in English, with the ability to clearly explain technical concepts to others
- Genuine interest in monitoring and understanding the state of systems
- Problem solver with operations skills who can quickly diagnose and pinpoint issues in a production environment
Responsibilities
- Build and scale our cloud-based infrastructure including managing our Kubernetes clusters and AWS environment
- Ensure the high availability, autoscaling and failure recovery capabilities of production and pre-production systems
- Develop custom controllers to automate the management of clusters
- Leverage Istio and Envoy to manage service communication and provide network observability
- Actively drive initiatives towards better system design and implementation of new technologies
- Participate in infrastructure on-call rotations
- Champion our operations culture and help the engineering organization deliver highly available services for our customers
Benefits
- Annual personal growth budget and mentorship programs for continuous learning and development
- Work from anywhere in the world for 40 days per year
- Flexible working arrangements to support work-life balance
- Opportunities to collaborate and socialize with team members through quarterly team events and yearly company-wide events
- Monthly transportation and fitness budget
- Discounts for you, your friends, and family on GetYourGuide activities
- Language reimbursement program
- Health and wellness benefits