OctoML is hiring a
Staff Cloud Infrastructure Engineer

OctoML

💵 $185k-$220k
📍 Remote - Worldwide

Summary

OctoAI is seeking a Staff Infrastructure Engineer to lead the engineering efforts behind its Inference Serving platform. The role involves architecting cloud infrastructure solutions, developing CI/CD pipelines, managing incidents, implementing monitoring tools, collaborating with software engineers, automating routine tasks, ensuring security and compliance, and creating documentation.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
  • 7+ years of experience in cloud infrastructure engineering
  • Extensive experience with cloud platforms such as AWS and Google Cloud
  • Deep understanding of Kubernetes cluster management, debugging, and troubleshooting
  • Excellent communication and collaboration skills
  • Background in software engineering with a solid understanding of SDLC
  • Strong understanding of Linux systems, including Bash/shell scripting and debugging
  • Proficiency in Go, Python, and/or JavaScript/TypeScript for building tools and automation
  • Expertise with CI/CD systems such as GitHub Actions, GitLab CI, or similar
  • Experience with infrastructure as code tools like Terraform, CloudFormation, or Ansible
  • Familiarity with observability tools such as Prometheus, Grafana, Datadog, or equivalent
  • Proficient in Git and version control practices
  • Strong problem-solving and debugging skills
  • Proven effectiveness in leading cross-functional projects
  • Track record of building patterns and solutions leveraged across an organization

Responsibilities

  • Architect and implement cloud infrastructure solutions
  • Develop and maintain automated CI/CD pipelines
  • Lead and manage incident response efforts
  • Implement and manage monitoring tools and performance tuning practices
  • Work closely with software engineers, product, and other stakeholders
  • Develop and maintain tools to automate routine tasks
  • Ensure all infrastructure meets security and compliance requirements
  • Create and maintain comprehensive documentation for infrastructure designs, processes, and incident response actions

Benefits

  • Comprehensive Healthcare: Fully covered premiums for employees and their dependents, including Medical, Dental, Vision, Life Insurance, and Disability Insurance
  • Competitive Compensation: A mix of salary, bonuses, and meaningful stock options
  • Financial Benefits: Flexible Spending Accounts for healthcare and dependent care, as well as a Health Savings Account for those opting for a high deductible plan
  • Future Planning: 401(k) options
  • Flexible Work Options: Remote and teleworking capabilities
  • Work-Life Balance: Flexible work hours, generous time off policies, company-sanctioned downtime twice a year, and company-paid holidays
  • Parental Benefits: Comprehensive parental leave plans for all new parents
  • Volunteer Time Off (VTO): Four days a year to give back and make a difference in communities
  • Additional Leaves: Including disability, paid family medical leave, and paid military leave

Please let OctoML know you found this job on JobsCollider. Thanks! 🙏