Staff Site Reliability Engineer at neptune.ai

Summary

Join our fully remote team as an experienced Staff Site Reliability Engineer. You will play a key role in designing, optimizing, and maintaining our infrastructure, ensuring the scalability, resilience, and performance of Neptune solutions. This position requires a deep understanding of distributed systems and performance optimization. You will collaborate with development, product, customer success, and client teams and contribute to automation, security, and incident management. We offer a flexible remote work environment, employee stock options, paid time off, and opportunities for ownership and impact.

Requirements

6+ years in SRE, DevOps, or related roles
Strong experience managing and optimizing Kubernetes clusters for robust, scalable, and efficient infrastructure
Proven expertise in designing and implementing automation solutions for infrastructure and application deployment, with experience in Terraform, Helm, and GitLab CI/CD
Strong programming skills in Shell and Python
Extensive experience with Linux system administration and network management
Expertise in managing distributed computing systems and near real-time data streaming platforms
Fluency in English, with solid communication skills for interacting with global customers

Responsibilities

Own the site reliability process and systems through all stages, from design and implementation to deployment and continuous maintenance
Ensure the scalability, resilience, and performance of Neptune solutions across global SaaS and client-hosted environments, including platforms such as GCP, Azure, AWS, and on-premise systems
Design and implement automation workflows to streamline deployments, upgrades, and incident response, reducing manual tasks and enhancing operational efficiency and consistency
Ensure infrastructure and processes meet security and industry standards, protecting sensitive data
Partner with development, product, customer success, and client teams to align on requirements and deliver robust, scalable, and reliable solutions
Document architecture, operational procedures, and troubleshooting guides to enable knowledge sharing, repeatability, and continuous improvement
Participate in on-call rotations, effectively addressing and resolving production incidents to maintain system uptime and performance

Preferred Qualifications

Experience in security best practices, compliance standards (e.g., SOC 2), and infrastructure hardening
Experience with multi-cloud architecture and cloud-native technologies
Experience in high-traffic, petabyte-scale data environments
Experience with ClickHouse and Kafka deployments

Benefits

100% remote work with flexible working hours; coworking offices are available in Warsaw, Wrocław, Poznań, and Kraków
Participate in the Employee Stock Option Plan and be part of our growth journey
20 paid service-free days per year
Space to take action, bring your ideas to life, and make a real impact

Remote

DevOps

Mid-level
