Site Reliability Engineer

Keeper Security, Inc.

๐Ÿ“Remote - United States

Job highlights

Summary

Join Keeper Security as a Site Reliability Engineer and contribute to the reliability and performance of our globally trusted cybersecurity software. This 100% remote position (with hybrid options in select locations) offers the chance to work with a modern tech stack and gain valuable skills in a fast-growing company. You will design, implement, and manage infrastructure for continuous integration and delivery, ensuring high availability and performance of production systems. You will also collaborate with development and security teams, troubleshoot production issues, and promote a culture of reliability. Keeper offers a competitive benefits package.

Requirements

  • 5+ years of experience as a Site Reliability Engineer, DevOps Engineer, or a similar role with a focus on infrastructure and automation
  • Proficiency with cloud platforms (e.g., AWS, Azure, Google Cloud) and infrastructure management tools like Terraform, Kubernetes, or Docker
  • Experience with CI/CD tools such as Jenkins, GitHub Actions, or GitLab CI for automating build, test, and deployment pipelines
  • Strong understanding of monitoring and observability tools (e.g., Prometheus, Grafana, New Relic) to ensure system reliability
  • Solid experience with Linux, macOS, and Windows systems, with knowledge of scripting languages like Python, Bash, or Go
  • In-depth knowledge of networking concepts, security best practices, and incident management
  • Ability to communicate complex technical issues and solutions clearly and effectively within a cross-functional team
  • A proactive problem-solver with a collaborative mindset and the ability to work independently in a fast-paced environment

Responsibilities

  • Design, implement, and manage the infrastructure and tools required for continuous integration, continuous delivery (CI/CD), and reliable software deployment
  • Ensure high availability and performance of production systems, monitoring critical services to prevent and resolve outages
  • Manage infrastructure automation, leveraging tools like Terraform, Ansible, or Kubernetes to scale systems efficiently and securely
  • Support security audits and compliance efforts (e.g., SOC 2, ISO 27001, FedRAMP) to ensure our infrastructure meets the necessary regulatory and security standards
  • Collaborate with development teams to optimize build and release pipelines, incorporating automated testing, code coverage, and performance monitoring
  • Troubleshoot issues related to system performance, software reliability, and capacity management to ensure the stability of production systems
  • Stay up to date with industry trends in cloud infrastructure, automation, and DevOps practices to continuously improve system design, monitoring, and security
  • Contribute to creating and maintaining monitoring and alerting systems to proactively identify and address reliability issues
  • Promote a culture of reliability by working with teams to define and achieve reliability goals (e.g., uptime, latency, incident response times)

Preferred Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field

Benefits

  • Medical, Dental & Vision (inclusive of domestic partnerships)
  • Employer-Paid Life Insurance & Employee/Spouse/Child Supplemental Life
  • Voluntary Short/Long Term Disability Insurance
  • 401k (Roth/Traditional)
  • A generous PTO plan that celebrates your commitment and seniority (including paid Bereavement/Jury Duty, etc.)
  • Above-market annual bonuses
