Summary
Join KnowBe4's team as a Site Reliability Engineer (SRE) and play a pivotal role in deploying and maintaining the infrastructure for our globally used security awareness training platform. You will be responsible for designing and implementing CI/CD pipelines, managing AWS/Azure services, and ensuring high availability and security of our infrastructure. You will also collaborate with development teams, troubleshoot system performance issues, and participate in incident response. This role requires expertise in infrastructure-as-code, cloud services, and automation. KnowBe4 offers a fantastic benefits package including company-wide bonuses, referral bonuses, tuition reimbursement, and more.
Requirements
- Bachelor's degree in Computer Science, Information Technology, or a related field
- 5+ years of equivalent work experience in SRE, DevOps, or infrastructure management may substitute for formal education
- CI/CD Workflows: Expertise in designing and maintaining automated pipelines for continuous delivery
- AWS or Azure Cloud Expertise: Strong knowledge of AWS/Azure services
- Infrastructure-as-Code: Proficiency in Terraform, Ansible, or similar tools
- Monitoring and Observability: Experience with Prometheus, Grafana, Datadog, or other observability platforms
- Automation and Scripting: Proficiency in Python, Bash, or other scripting languages to automate tasks
- Incident Management: Ability to lead incident response efforts and conduct root cause analysis
- Collaboration and Communication: Strong interpersonal skills to work effectively across teams and with stakeholders
Responsibilities
- Manage and maintain environments to ensure high availability and security
- Design and implement CI/CD pipelines to automate software delivery
- Monitor and troubleshoot system performance issues, using observability tools like Prometheus, Grafana, or Datadog
- Collaborate with development teams to align infrastructure efforts with project needs and timelines
- Build and maintain infrastructure as code (IaC) solutions using tools like Terraform
- Manage AWS/Azure services such as ECS/Container Apps and S3/Blob Storage
- Participate in incident response, conducting root cause analysis and post-incident reviews
- Automate manual tasks to improve operational efficiency and reduce technical debt
Benefits
- Company-wide bonuses based on monthly sales targets
- Employee referral bonuses
- Adoption assistance
- Tuition reimbursement
- Certification reimbursement
- Certification completion bonuses
- A relaxed dress code