Site Reliability Engineer

Everbridge
Summary
Join the Everbridge Federal Platform team and play a critical role in ensuring the service quality and availability of Everbridge's solutions. You will design, deploy, and manage services at scale, champion SRE best practices, and leverage cutting-edge technology. The platforms you support are vital for delivering timely information to protect people and businesses. This position involves working in a DoD IL4 cloud environment, maintaining AWS infrastructure, and collaborating with Agile teams. You will build upon operational availability, security, scalability, and reliability, while also participating in on-call rotations to resolve production issues. This role requires a strong background in AWS, Kubernetes, and DevOps/SRE principles.
Requirements
- 2+ years of technical AWS experience, managing and owning systems in a production environment
- 1+ years of Kubernetes experience (EKS, AKS, GKE, Self-managed)
- 2+ years of Terraform or similar IaC experience
- 2+ years of experience with MongoDB or ElasticSearch/ELK administration
- 2+ years of experience with application development or writing automation in Java
- Experience with the following tooling: GitLab CI/CD, Packer, Docker, EKS, Kubernetes, Spinnaker, Helm, Argo, Jenkins
- Experience with telemetry tools such as Datadog, Sumo Logic, Grafana, Prometheus
- Experience with configuration management tools such as Salt, Ansible, AWS user_data
- Experience with a DevOps/SRE production environment
- Experience with Agile practices
- UNIX/Linux experience
- Experience working on DoD programs
- Currently hold a Secret Clearance or be a US citizen with the ability to obtain a Secret Clearance
- Must have or be able to obtain and maintain DoD 8140 "Intermediate" level or higher certification (formerly DoD 8570 IAM Level II)
Responsibilities
- Keep people safe and businesses running
- Be an integral member of the team implementing our platform in a DoD IL4 cloud environment
- Maintain infrastructure from conception to completion within AWS, including services such as VPCs, EC2, Transit Gateways, IAM roles and policies, Route53, S3, SGs, and NACLs
- Build upon the operational availability, security, scalability, efficiency, monitoring, instrumentation, and overall service reliability of Everbridge's solutions
- Collaborate across Agile teams with Architects, Developers, Quality, Data, Security, and other engineers on designing and implementing highly reliable solutions
- Research and implement SRE best practices through automation, cross-functional collaboration, and data-driven decisions to reinforce the integrity and reliability of our systems
- Participate in an on-call rotation to resolve production escalations
Benefits
- Healthcare
- Dental
- Parental planning
- Mental health benefits
- Disability income benefits
- Life and AD&D insurance
- A 401(k) plan and match
- Paid time off
- Fitness reimbursements