Infrastructure/DevOps Engineer

Abnormal Security
Summary
Join Abnormal Security's IT team as an Infrastructure/DevOps Engineer supporting AI platforms. You will build and maintain reliable, scalable, and secure infrastructure for AI software engineers, working with IT, security, and AI/ML engineering teams so that foundational systems support AI tools and solutions. The work includes architecting and managing infrastructure, implementing containerization and orchestration, developing CI/CD systems, and partnering on security and compliance. This fully remote role (US/Canada) suits someone passionate about systems engineering, AI enablement, and solving complex operational challenges. The ideal candidate thrives in collaborative environments and prioritizes automation, self-service tools, and reliable systems.
Requirements
- 4+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
- Proficiency with cloud providers (AWS preferred), Kubernetes, and Docker
- Experience with infrastructure as code tools (Terraform, Ansible, or Pulumi)
- Strong scripting skills in Python, Bash, or similar
- Familiarity with CI/CD systems such as GitHub Actions, Jenkins, or CircleCI
- Understanding of networking, security, and identity management in cloud environments
- Experience supporting ML workloads and GPU-based infrastructure
- Ability to troubleshoot complex system issues in a distributed environment
- Comfortable working cross-functionally and communicating with both technical and non-technical stakeholders
Responsibilities
- Architect and manage infrastructure that supports AI/ML pipelines, tools, and data platforms
- Implement and maintain containerization (e.g., Docker) and orchestration (e.g., Kubernetes) environments
- Develop CI/CD systems that integrate with ML workflows and ensure reproducible AI experiments
- Collaborate with security and compliance teams to ensure infrastructure meets data protection standards
- Automate provisioning and deployment using IaC tools like Terraform or Pulumi
- Monitor and troubleshoot infrastructure issues with tools like Prometheus, Grafana, and the ELK stack
- Partner with AI and software engineers to optimize platform performance and resource utilization
- Maintain clear, accessible documentation to scale platform knowledge across the org
Preferred Qualifications
- Familiarity with MLOps tools like MLflow, Kubeflow, or SageMaker
- Experience with AI platform infrastructure (e.g., model serving, feature stores)
- Knowledge of logging and monitoring frameworks (e.g., Fluentd, Loki)
- Background in supporting data platforms like Snowflake, Databricks, or Hadoop
- AWS certification
- Experience working in high-growth startups or tech companies
Benefits
- Bonus
- Restricted stock units (RSUs)
- Benefits