Senior SRE Engineer

Pango Group

💵 $133k-$180k
📍 Remote - United States

Job highlights

Summary

Join Pango Group as a Senior Site Reliability Engineer and play a crucial role in ensuring the reliability, availability, and performance of our services and infrastructure. Collaborate with development and operations teams to implement best practices, drive infrastructure strategy, and enhance service delivery. You will design, implement, and manage scalable infrastructure, develop monitoring tools, lead incident management, create automation scripts, and optimize system performance. This role requires strong proficiency in cloud platforms, Linux/Unix systems, networking concepts, and scripting languages. Pango Group offers a generous compensation package including competitive pay, health and wellness benefits, retirement savings plans, and parental leave.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 5+ years of experience in site reliability engineering, systems engineering, or DevOps
  • Strong proficiency in cloud platforms (AWS, Azure, GCP)
  • Strong knowledge of Linux/Unix systems
  • Strong understanding of networking concepts, protocols (TCP/IP, BGP, OSPF), and technologies (LAN, WAN, VPN)
  • Familiarity with network security practices and firewalls (e.g., Palo Alto, Fortinet)
  • Strong proficiency in network monitoring tools and software
  • Experience with containerization (Docker, Kubernetes) and orchestration tools
  • Proficiency in scripting languages (Python, Bash, Go, etc.) and infrastructure as code tools (Terraform, Ansible)
  • Proficiency with data analytics tools such as BigQuery and Databricks
  • Experience with CI/CD tools (Jenkins, GitLab CI/CD, etc.)
  • Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack)
  • Strong analytical and troubleshooting skills with a proactive approach to identifying and resolving issues
  • Solid understanding of networking, security, database management, and data center operations in a fast-paced, 24x7, production environment
  • Excellent communication skills, both written and verbal, with the ability to collaborate effectively across teams

Responsibilities

  • Design, implement, and manage scalable and reliable infrastructure using cloud services and on-premises solutions
  • Develop and maintain monitoring tools to proactively identify issues
  • Lead incident management efforts, conducting root cause analysis and implementing preventative measures
  • Create automation scripts and tools to streamline operations, improve deployment processes, and reduce manual intervention
  • Work closely with software engineering teams to ensure that new features are reliable, scalable, and maintainable
  • Analyze system performance and capacity, and recommend enhancements and optimizations to improve efficiency
  • Maintain comprehensive documentation for systems, processes, and incident management procedures
  • Provide guidance and mentorship to junior team members, fostering a culture of continuous learning and improvement
  • Work with vendors to troubleshoot and resolve issues; negotiate contracts, pricing, and terms to secure advantageous agreements; and monitor vendor performance to ensure compliance with contracts and service level agreements

Benefits

  • Competitive pay
  • Generous health and wellness benefits
  • Retirement savings plans
  • Parental leave
