Staff Site Reliability Engineer

neptune.ai

📍Remote - Europe

Job highlights

Summary

Join our fully remote team as an experienced Staff Site Reliability Engineer. You will play a key role in designing, optimizing, and maintaining our infrastructure, ensuring the scalability, resilience, and performance of Neptune solutions. This position requires a deep understanding of distributed systems and performance optimization. You will collaborate with various teams and contribute to automation, security, and incident management. We offer a flexible remote work environment, employee stock options, paid time off, and opportunities for ownership and impact.

Requirements

  • 6+ years in SRE, DevOps, or related roles
  • Strong experience managing and optimizing Kubernetes clusters for robust, scalable, and efficient infrastructure
  • Proven expertise in designing and implementing automation solutions for infrastructure and application deployment, with experience in Terraform, Helm, and GitLab CI/CD
  • Strong programming skills in Shell and Python
  • Extensive experience with Linux system administration and network management
  • Expertise in managing distributed computing systems and near real-time data streaming platforms
  • Fluency in English, with solid communication skills for interacting with global customers

Responsibilities

  • Own the site reliability process and systems through all stages, from design and implementation to deployment and continuous maintenance
  • Ensure the scalability, resilience, and performance of Neptune solutions across global SaaS and client-hosted environments, including platforms such as GCP, Azure, AWS, and on-premise systems
  • Design and implement automation workflows to streamline deployments, upgrades, and incident response, reducing manual tasks and enhancing operational efficiency and consistency
  • Ensure infrastructure and processes meet security and industry standards, protecting sensitive data
  • Partner with development, product, customer success, and client teams to align on requirements and deliver robust, scalable, and reliable solutions
  • Document architecture, operational procedures, and troubleshooting guides to enable knowledge sharing, repeatability, and continuous improvement
  • Participate in on-call rotations, effectively addressing and resolving production incidents to maintain system uptime and performance

Preferred Qualifications

  • Experience in security best practices, compliance standards (e.g., SOC 2), and infrastructure hardening
  • Experience with multi-cloud architecture and cloud-native technologies
  • Experience in high-traffic, petabyte-scale data environments
  • Experience with ClickHouse and Kafka deployments

Benefits

  • 100% remote work with flexible working hours; coworking offices are available in Warsaw, Wrocław, Poznań, and Kraków
  • Participate in the Employee Stock Option Plan and be part of our growth journey
  • 20 paid days off (service-free days) per year
  • Space to take action, bring your ideas to life, and make a real impact
