Infrastructure/DevOps Engineer

Abnormal Security
Summary
Join Abnormal Security's IT team as an Infrastructure/DevOps Engineer to build and maintain reliable, scalable, and secure infrastructure for AI software engineers. You will work with IT, security, and AI/ML engineering teams to support experimentation, deployment, and monitoring of advanced AI tools and solutions. This fully remote role (US and Canada) suits someone passionate about systems engineering, AI enablement, and solving complex operational challenges. You will architect and manage infrastructure, implement containerization and orchestration, develop CI/CD systems, and partner with security and compliance teams. The ideal candidate thrives in collaborative environments and values automation, self-service tools, and building reliable systems. The role also requires strong communication skills and a customer-first mindset.
Requirements
- 4+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
- Proficiency with cloud providers (AWS preferred), Kubernetes, and Docker
- Experience with infrastructure as code tools (Terraform, Ansible, or Pulumi)
- Strong scripting skills in Python, Bash, or similar
- Familiarity with CI/CD systems such as GitHub Actions, Jenkins, or CircleCI
- Understanding of networking, security, and identity management in cloud environments
- Experience supporting ML workloads and GPU-based infrastructure
- Ability to troubleshoot complex system issues in a distributed environment
- Comfortable working cross-functionally and communicating with both technical and non-technical stakeholders
Responsibilities
- Architect and manage infrastructure that supports AI/ML pipelines, tools, and data platforms
- Implement and maintain containerization (e.g., Docker) and orchestration (e.g., Kubernetes) environments
- Develop CI/CD systems that integrate with ML workflows and ensure reproducible AI experiments
- Collaborate with security and compliance teams to ensure infrastructure meets data protection standards
- Automate provisioning and deployment using IaC tools like Terraform or Pulumi
- Monitor and troubleshoot infrastructure issues with tools like Prometheus, Grafana, and the ELK stack
- Partner with AI and software engineers to optimize platform performance and resource utilization
- Maintain clear, accessible documentation to scale platform knowledge across the org
Preferred Qualifications
- Familiarity with MLOps tools like MLflow, Kubeflow, or SageMaker
- Experience with AI platform infrastructure (e.g., model serving, feature stores)
- Knowledge of logging and monitoring frameworks (e.g., Fluentd, Loki)
- Background in supporting data platforms like Snowflake, Databricks, or Hadoop
- AWS certification
- Experience working in high-growth startups or tech companies
Benefits
- Bonus
- Restricted stock units (RSUs)