Senior Site Reliability Engineer

Kontakt.io

📍 Remote - Poland

Summary

Join Kontakt.io as a Senior Site Reliability Engineer (SRE) and ensure the scalability, availability, and security of our cloud-based, AI-driven healthcare platform. You will work with teams across the company to build highly resilient, automated systems that shape how healthcare organizations use real-time data. Expertise in cloud infrastructure, automation, monitoring, and performance optimization is crucial, as is a passion for highly available systems. Help us build the future of smart care operations! We offer a competitive salary, stock options, flexible work options, and benefits.

Requirements

  • 3+ years of experience as an SRE
  • Strong expertise in Kubernetes, Docker, and container orchestration
  • Experience managing cloud-native environments (AWS)
  • Experience with event-driven architectures, Kafka, or real-time data streaming
  • Knowledge of machine learning infrastructure
  • Previous experience in healthcare, compliance (HIPAA), and highly regulated environments
  • Proficiency in Infrastructure as Code (IaC) using Terraform
  • Deep knowledge of networking, DNS, load balancing, and security best practices
  • Experience with CI/CD pipelines (Jenkins, CI, or ArgoCD)
  • Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, OpenTelemetry)
  • Strong programming skills in Python, Golang, or Bash for automation

Responsibilities

  • Design and maintain highly available, fault-tolerant, and scalable cloud infrastructure
  • Implement SLOs, SLIs, and SLAs to track system reliability and optimize uptime
  • Participate in 24/7 on-call rotation
  • Oversee production platform deployments
  • Monitor latency, traffic, errors, and system health using modern observability tools
  • Conduct root cause analysis (RCA) and post-mortems to continuously improve system resilience
  • Automate infrastructure provisioning using Terraform, Ansible, or Pulumi
  • Implement CI/CD pipelines to ensure seamless and safe deployments
  • Enable self-healing mechanisms using Kubernetes operators, auto-scaling, and fault detection
  • Ensure compliance with HIPAA, GDPR, and other healthcare data regulations
  • Define and execute disaster recovery (DR) and business continuity plans
  • Manage and optimize AWS environments for cost-efficiency and performance
  • Deploy and manage observability tools and build real-time alerting and response frameworks
  • Establish best practices for logging, debugging, and performance monitoring
  • Improve incident response automation through runbooks, AI-based anomaly detection, and predictive analytics

Benefits

  • Work on a mission-driven platform that improves healthcare operations and patient outcomes
  • B2B contract or an employment agreement
  • Competitive salary and stock option plan
  • Collaborate with top engineers, data scientists, and AI experts
  • Flexible remote or hybrid work options (office in Krakow)
  • Collaborative and self-organized environment
  • Private medical care and a cafeteria benefits system
