Hypori is hiring a
Senior Site Reliability Engineer

closed

Hypori

💵 ~$150k-$222k
📍 Remote - United States

Summary

Hypori Inc. is seeking a Senior Site Reliability Engineer to build a secure multi-cloud platform for its Hypori Halo product. The engineer will own the design and construction of process automation, collaborate with various teams, and improve reliability through incident investigations and postmortems.

Requirements

  • BS degree in Computer Science or related fields, with at least 10 years of related work experience
  • 6+ years operating cloud infrastructure environments in AWS and Azure using Infrastructure as Code principles
  • 6+ years of Engineering, SRE, and DevOps experience in an agile environment
  • 6+ years supporting a 24x7 mission-critical SaaS environment
  • Experience in Python, Java, Go, or another language
  • Experience in using observability tools such as Datadog, Grafana, or New Relic
  • Experience in integrating monitoring and alerting systems using Webhooks and APIs
  • Expert in Terraform, Puppet, and Git
  • Expert in Linux system operations, debugging, networking, software development, and cloud concepts
  • Experience with release automation, system administration, and configuration management
  • Strong experience with containerization technology and Kubernetes
  • Expertise in security, monitoring, and performance aspects of cloud-native applications
  • Experience in SRE principles such as SLIs, SLAs, SLOs, resilience, scaling, and performance
  • Professional experience with GitOps, Jenkins, or other workflow tools
  • Ability to debug, optimize code, and automate routine tasks
  • Experience in designing and building failover and recovery automation
  • Excellent verbal, written, and interpersonal communication skills
  • Outstanding problem-solving and decision-making skills
  • Must be a self-starter with drive, a high level of initiative, and self-direction, and a problem solver able to develop solutions to complex issues
  • Must be adept at working in a matrix position where results must be achieved across various departments without line authority, and comfortable working with all levels of the organization

Responsibilities

  • Propose, design, develop, and ship platform software to increase product reliability and efficiency
  • Guide reliability practice across the SSDLC through architecture reviews, code reviews, capacity/scaling planning, and test automation
  • Maintain service and platform health through monitoring and follow-the-sun incident response
  • Run infrastructure with Terraform, CI/CD, K8s, and other appropriate cloud tools
  • Improve reliability by leading incident investigations and postmortems, documenting the findings, and using code and automation to create repeatable actions to prevent problem recurrence
  • Improve operational processes continuously (release, deployment, patches, etc.) to make them as reliable as possible
  • Support engineering efforts to implement new cloud-based projects
  • Automate deployment and maintenance tasks using infrastructure as code and DevOps principles
  • Communicate technical designs and issues along with proposed solutions
  • Mentor junior SRE engineers

Benefits

  • Medical, dental, and vision insurance
  • Parental leave
  • Life and disability packages
  • 401(k) plan with employer-matching contributions that vest starting from your first day of employment
This job is filled or no longer available
