Senior Site Reliability Engineer (SRE)
Pango Group
Summary
Join Pango Group as a Senior Site Reliability Engineer and play a crucial role in maintaining the reliability, availability, and performance of our systems and applications. You will collaborate with development and operations teams, implement best practices, automate processes, and ensure scalable infrastructure. Day-to-day responsibilities include system monitoring and incident response, automation and infrastructure as code implementation, performance optimization, collaboration with development teams, documentation and reporting, disaster recovery planning, and ensuring security best practices. This role requires proven experience in SRE, DevOps, or a related field, strong cloud and container orchestration knowledge, proficiency in scripting and CI/CD tools, and a solid understanding of networking and security. Pango Group offers opportunities to solve real customer problems, see your impact, and accelerate your career in a fast-paced, growth-oriented environment.
Requirements
- Proven experience in a Site Reliability Engineering, DevOps, or related role
- Strong knowledge of cloud services (AWS, Azure, Google Cloud) and container orchestration (Kubernetes, Docker)
- Proficiency in scripting languages (Python, Bash, etc.) and experience with CI/CD tools (Jenkins, GitLab CI/CD, etc.) and infrastructure as code tools (Terraform, Ansible)
- 5+ years of proven experience with production monitoring using Prometheus, ELK, Grafana, and OpsGenie/PagerDuty
- 5+ years of experience in Linux system administration (preferably Ubuntu)
- Solid understanding of networking, security, system architecture, and data center operations in a fast-paced, 24x7, production environment
- Strong understanding of networking concepts, protocols (TCP/IP, BGP, OSPF), and technologies (LAN, WAN, VPN) with proficiency in network monitoring tools and software
Responsibilities
- Develop and implement monitoring tools to ensure system health
- Respond to incidents, troubleshoot issues, and provide timely resolutions
- Design and implement automation solutions to manage infrastructure and application deployment using tools like Terraform, Ansible, or similar technologies
- Analyze system performance and capacity; implement improvements to enhance system reliability and efficiency
- Work closely with development teams to improve system design and deployment practices
- Advocate for reliability improvements in the software development lifecycle
- Maintain thorough documentation of system architecture, processes, and incident response procedures
- Provide regular reports on system performance and reliability metrics
- Design and implement disaster recovery plans and ensure effective data backup solutions are in place
- Collaborate with security teams to ensure best practices are followed to protect systems and data
Benefits
- Solve real customer problems. Pango Group’s point solutions allow consumers to address their immediate cyber protection needs. Our mandate is to continuously anticipate our customers’ evolving digital security needs to create best-in-class solutions aimed at keeping them safe
- See your impact. We are a scrappy, nimble organization where individual contributions are needed and valued. You will see your impact every day
- Accelerate your career. As we expand, you will have the opportunity to learn new technologies, products, and markets in a fast-paced, growth-oriented environment
- Most importantly, you’ll get to work with other talented people at a company where people matter. If you want to put your fingerprint on an organization and leapfrog your growth, this is the place for you