Summary

Join Pango Group as a Site Reliability Engineer (SRE) and play a crucial role in maintaining the reliability, availability, and performance of our systems and applications. You will collaborate with development and operations teams, implement best practices, automate processes, and ensure scalable infrastructure. Key responsibilities include system monitoring and incident response, automation and infrastructure as code implementation, performance optimization, collaboration with development teams, documentation and reporting, disaster recovery and backup planning, and adherence to security best practices. This role offers the opportunity to solve real customer problems, see your impact daily, and accelerate your career in a fast-paced, growth-oriented environment. Pango Group is committed to inclusivity and providing a welcoming work environment.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience
Proven experience in a Site Reliability Engineering, DevOps, or related role
Strong knowledge of cloud services (AWS, Azure, Google Cloud), containers (Docker), and container orchestration (Kubernetes)
Proficiency in scripting languages (Python, Bash, etc.) and experience with CI/CD tools (Jenkins, GitLab CI/CD, etc.) and infrastructure as code tools (Terraform, Ansible)
3+ years of experience with production monitoring using Prometheus, ELK, Grafana, and OpsGenie/PagerDuty
3+ years of experience in Linux system administration (preferably Ubuntu)
Solid understanding of networking, security, system architecture, and data center operations in a fast-paced, 24x7, production environment
Strong understanding of networking concepts, protocols (TCP/IP, BGP, OSPF), and technologies (LAN, WAN, VPN) with proficiency in network monitoring tools and software

Responsibilities

Develop and implement monitoring tools to ensure system health
Respond to incidents, troubleshoot issues, and provide timely resolutions
Design and implement automation solutions to manage infrastructure and application deployment using tools like Terraform, Ansible, or similar technologies
Analyze system performance and capacity; implement improvements to enhance system reliability and efficiency
Work closely with development teams to improve system design and deployment practices
Advocate for reliability improvements in the software development lifecycle
Maintain thorough documentation of system architecture, processes, and incident response procedures
Provide regular reports on system performance and reliability metrics
Design and implement disaster recovery plans and ensure effective data backup solutions are in place
Collaborate with security teams to ensure best practices are followed to protect systems and data

Benefits

Solve real customer problems
See your impact
Accelerate your career
Work with other talented people at a company where people matter

SRE Engineer

Pango Group

Remote

DevOps

Mid-level
