Senior Site Reliability Engineer
Apaleo
Remote - Worldwide
Job highlights
Summary
Join Apaleo, a promising international hospitality start-up in Munich, as a Senior Site Reliability Engineer! You'll be responsible for running the production environment, optimizing system performance, improving monitoring, and building software to manage platform infrastructure. This role requires over 5 years of experience as an SRE with expertise in distributed systems and specific technologies like Kafka, PostgreSQL, and AWS services. Apaleo offers a key role in a diverse, international team, competitive compensation, flexible work arrangements, and various team events and benefits.
Requirements
- Over 5 years of experience as a Site Reliability Engineer, with a proven track record of tackling complex challenges and working with robust SaaS/PaaS products
- Regular use of at least one programming language, ideally with familiarity with C# and .NET
- A proactive approach to identifying problems, performance bottlenecks, and areas for improvement
- Experience in running distributed systems built on top of Kafka, PostgreSQL, AWS S3, AWS SQS, AWS SNS
- Experience operating highly available distributed systems at scale, as well as building and deploying software in a SaaS environment
- Experience with the technologies we use: Terraform, Docker (containerization), ECS, DataDog, AWS
- Ability to analyze and troubleshoot complex issues related to cloud infrastructure
- Excellent communication skills with the ability to work independently and in a team
Responsibilities
- Run the production environment by monitoring availability and taking a holistic view of system health
- Participate in the on-call rotations
- Measure and optimise system performance, and improve the monitoring system so that Apaleo can run in a cloud environment at scale
- Improve reliability, quality, and time-to-market of our suite of software solutions by driving best practices for monitoring, alerting, and incident management company-wide
- Build software and systems to manage platform infrastructure and provide primary operational support to the infrastructure that underpins Apaleo's SaaS product
- Identify and respond to production and customer issues and incidents, and collaborate with the team to resolve them
- Balance feature development speed and reliability with well-defined service-level objectives
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding
Benefits
- A key role in one of the most promising international start-ups in the hospitality industry, located in the heart of Munich
- A diverse team of motivated and international experts from various disciplines and backgrounds
- Fair compensation with a transparent peer review, career progression plan and personal development program
- Flexible and free choice of work location, with support for remote work
- Team events: dinners, meet-ups, Oktoberfest
- 30 vacation days per year
- Free public transportation inside the city of Munich
This job is filled or no longer available