Summary

Join Xebia, a global leader in digital solutions, and become a key member of our Site Reliability Engineering (SRE) team. You will recruit, develop, and mentor the SRE team, setting goals and tracking achievements. Responsibilities include defining and implementing SRE best practices, delivering Terraform-based automation in Google Cloud, designing secure IAM roles, and collaborating with development and security teams. This role requires extensive experience in software development, distributed systems, and cloud computing, along with strong problem-solving and leadership skills. The ideal candidate will possess deep technical expertise in GCP and experience with IaC tooling and monitoring tools. Xebia offers a dynamic work environment focused on innovation and employee development.

Requirements

8 years of experience with data structures or algorithms
5 years of experience with software development in one or more programming languages
3 years of experience managing people or teams, leading projects, and designing, analyzing, and troubleshooting distributed systems
Excellent problem-solving and analytical skills
Strong understanding of software development lifecycle (SDLC) and DevOps principles
Deep technical expertise in cloud computing platforms (GCP preferred)
Proficiency with Infrastructure as Code (IaC) tooling, such as Terraform
Proven experience with monitoring tools (Prometheus, Datadog, New Relic)
Experience with automation frameworks (Ansible, Puppet, Chef)
Fluent in English (B2-C2)
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience
Must work from within the European Union and hold a valid work permit

Responsibilities

Recruiting, developing, and mentoring the SRE team, including setting goals and tracking their achievement
Supporting engineers' skill development through coaching and clear expectation setting
Defining and implementing SRE best practices, standards and processes, including Service Level Objectives (SLOs), to ensure service reliability and performance
Delivering Terraform-based automation in Google Cloud, including project creation, user management, and service enablement, while optimizing cloud costs
Designing secure IAM roles, permissions, and monitoring systems to enhance security, user experience, and proactive issue detection
Collaborating with development and security teams to ensure reliability, system security, and compliance, while proactively addressing potential issues
Prioritizing a customer-focused approach, delivering exceptional user experiences for infrastructure services with clear and effective communication
Analyzing system metrics to identify performance bottlenecks and opportunities for improvement, and implementing capacity planning strategies for resilience under high load
Continuously monitoring and optimizing system performance

Preferred Qualifications

Google Cloud, Azure or Kubernetes certifications

Site Reliability Engineering Manager

Xebia Poland

Remote

DevOps

Manager
