Infrastructure/DevOps Engineer

Abnormal Security

💵 $114k-$135k
📍 Remote - United States

Summary

Join Abnormal Security's IT team as an Infrastructure/DevOps Engineer to build and maintain reliable, scalable, and secure infrastructure for AI software engineers. You will work with IT, security, and AI/ML engineering teams to support experimentation, deployment, and monitoring of advanced AI tools and solutions. This fully remote role (US and Canada) suits someone passionate about systems engineering, AI enablement, and solving complex operational challenges. You will architect and manage infrastructure, implement containerization and orchestration, develop CI/CD systems, and collaborate on security and compliance. The ideal candidate thrives in collaborative environments and values automation, self-service tools, and building reliable systems. This role requires strong communication skills and a customer-first mindset.

Requirements

  • 4+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
  • Proficiency with cloud providers (AWS preferred), Kubernetes, and Docker
  • Experience with infrastructure as code tools (Terraform, Ansible, or Pulumi)
  • Strong scripting skills in Python, Bash, or similar
  • Familiarity with CI/CD systems such as GitHub Actions, Jenkins, or CircleCI
  • Understanding of networking, security, and identity management in cloud environments
  • Experience supporting ML workloads and GPU-based infrastructure
  • Ability to troubleshoot complex system issues in a distributed environment
  • Comfortable working across functional teams and communicating with technical and non-technical stakeholders

Responsibilities

  • Architect and manage infrastructure that supports AI/ML pipelines, tools, and data platforms
  • Implement and maintain containerization (e.g., Docker) and orchestration (e.g., Kubernetes) environments
  • Develop CI/CD systems that integrate with ML workflows and ensure reproducible AI experiments
  • Collaborate with security and compliance teams to ensure infrastructure meets data protection standards
  • Automate provisioning and deployment using IaC tools like Terraform or Pulumi
  • Monitor and troubleshoot infrastructure issues with tools like Prometheus, Grafana, and the ELK stack
  • Partner with AI and software engineers to optimize platform performance and resource utilization
  • Maintain clear, accessible documentation to scale platform knowledge across the org

Preferred Qualifications

  • Familiarity with MLOps tools like MLflow, Kubeflow, or SageMaker
  • Experience with AI platform infrastructure (e.g., model serving, feature stores)
  • Knowledge of logging and monitoring frameworks (e.g., Fluentd, Loki)
  • Background in supporting data platforms like Snowflake, Databricks, or Hadoop
  • AWS certification
  • Experience working in high-growth startups or tech companies

Benefits

  • Bonus
  • Restricted stock units (RSUs)
