Site Reliability Engineer

Apaleo

πŸ“Remote - Worldwide

Job highlights

Summary

Join Apaleo, a promising international hospitality start-up, as a Site Reliability Engineer! You'll run our production environment, monitor system health, and participate in on-call rotations. Responsibilities include optimizing system performance, improving monitoring, and building software to manage platform infrastructure. You'll collaborate to resolve production issues and balance feature development with reliability. Apaleo offers a key role in a diverse, international team, competitive compensation, flexible work location, and various team events and benefits.

Requirements

  • Over 5 years of experience as a Site Reliability Engineer, with a proven track record of tackling complex challenges and working with robust SaaS/PaaS products
  • You regularly use at least one programming language and are ideally familiar with C# and .NET
  • A proactive approach to identifying problems, performance bottlenecks, and areas for improvement
  • Experience running distributed systems built on top of Kafka, PostgreSQL, AWS S3, AWS SQS, and AWS SNS
  • Experience operating highly available distributed systems at scale, as well as building and deploying software in a SaaS environment
  • Experience with the technologies we use: Terraform, Docker (containerization), ECS, Datadog, AWS
  • Ability to analyze and troubleshoot complex issues related to cloud infrastructure
  • Excellent communication skills with the ability to work independently and in a team

Responsibilities

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Participate in on-call rotations
  • Measure and optimize system performance, and improve the monitoring system to advance our ability to run Apaleo in a cloud environment at scale
  • Improve reliability, quality, and time-to-market of our suite of software solutions by driving best practices for monitoring, alerting, and incident management company-wide
  • Build software and systems to manage platform infrastructure and provide primary operational support to the infrastructure that underpins Apaleo's SaaS product
  • Identify and respond to production and customer issues and incidents, and collaborate with the team to resolve them
  • Balance feature development speed and reliability with well-defined service-level objectives
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding

Benefits

  • A key role in one of the most promising international start-ups in the hospitality industry, located in the heart of Munich
  • A diverse team of motivated and international experts from various disciplines and backgrounds
  • Fair compensation with a transparent peer review, career progression plan and personal development program
  • Flexible, free choice of work location, with support for remote work
  • Team events: dinners, meet-ups, Oktoberfest
  • 30 vacation days per year
  • Free public transportation inside the city of Munich
