Remote Senior Site Reliability Engineer

Hypori

๐Ÿ“Remote - United States

Job highlights

Summary

Join Hypori Inc., a leading provider of SaaS cybersecurity solutions, in building its secure multi-cloud platform. As a Senior Site Reliability Engineer, you will own the design and construction of the process automation and tooling that support a world-class cloud platform.

Requirements

  • BS degree in Computer Science or a related field, with at least 10 years of related work experience
  • 6+ years of operating within cloud infrastructure environments in AWS and Azure, using Infrastructure as Code principles
  • 6+ years of Engineering, SRE, and DevOps experience in an agile environment
  • 6+ years supporting a 24x7 mission-critical SaaS environment
  • Experience in Python, Java, Go, or another language
  • Experience in using observability tools such as Datadog, Grafana, or New Relic
  • Experience in integrating monitoring and alerting systems using Webhooks and APIs
  • Expert in Terraform, Puppet, and Git
  • Expert in Linux system operations, debugging, networking, software development, and cloud concepts
  • Experience with release automation, system administration, and configuration management
  • Strong experience with containerization technology and Kubernetes
  • Expertise in security, monitoring, and performance aspects of cloud-native applications
  • Experience in SRE principles such as SLIs, SLOs, resilience, scaling, and performance
  • Professional experience with GitOps, Jenkins, or other workflow tools
  • Ability to debug, optimize code, and automate routine tasks
  • Experience in designing and building failover and recovery automation
  • Excellent verbal, written, and interpersonal communication skills
  • Outstanding problem-solving and decision-making skills
  • Must be a self-starter with drive, a high level of initiative, and self-direction, and a problem solver able to develop solutions to complex issues
  • Must be adept at working in a matrix position where results must be achieved across various departments without line authority, and comfortable working with all levels of the organization

Responsibilities

  • Propose, design, develop, and ship platform software to increase product reliability and efficiency
  • Guide reliability practice throughout the SSDLC via architecture reviews, code reviews, capacity/scaling planning, and test automation
  • Maintain service and platform health through monitoring and follow-the-sun incident response
  • Run infrastructure with Terraform, CI/CD, K8s, and other appropriate cloud tools
  • Improve reliability by leading incident investigations and postmortems, documenting the findings, and using code and automation to create repeatable actions to prevent problem recurrence
  • Improve operational processes continuously (release, deployment, patches, etc.) to make them as reliable as possible
  • Support engineering efforts to implement new cloud-based projects
  • Automate deployment and maintenance tasks using infrastructure as code and DevOps principles
  • Communicate technical designs and issues along with proposed solutions
  • Mentor junior SREs

Benefits

  • Medical, dental, and vision insurance
  • Parental leave
  • Life and disability packages
  • 401(k) plan with employer-matching contributions that vest starting from your first day of employment
