Senior Site Reliability Engineer

closed

Apaleo

πŸ“Remote - Worldwide

Job highlights

Summary

Join Apaleo, a promising international hospitality start-up in Munich, as a Senior Site Reliability Engineer! You'll be responsible for running the production environment, optimizing system performance, improving monitoring, and building software to manage platform infrastructure. This role requires over 5 years of experience as an SRE with expertise in distributed systems and specific technologies like Kafka, PostgreSQL, and AWS services. Apaleo offers a key role in a diverse, international team, competitive compensation, flexible work arrangements, and various team events and benefits.

Requirements

  • Over 5 years of experience as a Site Reliability Engineer, with a proven track record of tackling complex challenges and working with robust SaaS/PaaS products
  • Regular use of at least one programming language, ideally with familiarity with C# and .NET
  • A proactive approach to identifying problems, performance bottlenecks, and areas for improvement
  • Experience running distributed systems built on Kafka, PostgreSQL, and AWS S3, SQS, and SNS
  • Experience operating highly available distributed systems at scale, as well as building and deploying software in a SaaS environment
  • Experience with the technologies we use: Terraform, Docker (containerization), ECS, Datadog, AWS
  • Ability to analyze and troubleshoot complex issues related to cloud infrastructure
  • Excellent communication skills with the ability to work independently and in a team

Responsibilities

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Participate in the on-call rotations
  • Measure and optimise system performance, and improve the monitoring system so that we can run Apaleo in a cloud environment at scale
  • Improve reliability, quality, and time-to-market of our suite of software solutions by driving best practices for monitoring, alerting, and incident management company-wide
  • Build software and systems to manage platform infrastructure and provide primary operational support to the infrastructure that underpins Apaleo's SaaS product
  • Identify and respond to production and customer issues and incidents, and collaborate with the team to resolve them
  • Balance feature development speed and reliability with well-defined service-level objectives
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding

Benefits

  • A key role in one of the most promising international start-ups in the hospitality industry, located in the heart of Munich
  • A diverse team of motivated and international experts from various disciplines and backgrounds
  • Fair compensation with a transparent peer review, career progression plan and personal development program
  • Free and flexible choice of work location, with support for remote work
  • Team events: dinners, meet-ups, Oktoberfest
  • 30 vacation days per year
  • Free public transportation inside the city of Munich

This job is filled or no longer available
