Senior AI Infrastructure Engineer

TetraScience

πŸ“Remote - United States

Summary

Join TetraScience, a leader in the Scientific Data and AI Cloud, as a Senior AI Infrastructure Engineer. In this role you will design, build, and scale AI and data infrastructure, with a focus on cloud-based MLOps pipelines. You will collaborate with AI engineers, data engineers, and platform teams to maintain and improve the performance, reliability, and cost-efficiency of AI models in production; contribute to the design and evolution of the AI platform; and integrate AI models and LLMs into production systems. This role requires extensive experience with AI/ML infrastructure and strong coding skills.

Requirements

  • 7+ years of professional experience in software engineering and infrastructure engineering
  • Extensive experience building and maintaining AI/ML infrastructure in production, including model deployment and lifecycle management
  • Strong knowledge of AWS and infrastructure-as-code frameworks, ideally with CDK
  • Expert-level coding skills in TypeScript and Python, building robust APIs and backend services
  • Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows
  • Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads
  • Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members
  • Strong collaboration skills and the ability to partner effectively with cross-functional teams

Responsibilities

  • Design, implement, and maintain cloud-native infrastructure to support AI and data workloads, with a focus on AI and data platforms such as Databricks and AWS Bedrock
  • Build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics
  • Develop infrastructure-as-code using tools such as CloudFormation and AWS CDK to ensure repeatable and secure deployments
  • Collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production
  • Drive best practices for observability, including monitoring, alerting, and logging for AI platforms
  • Contribute to the design and evolution of our AI platform to support new ML frameworks, workflows, and data types
  • Stay current with new tools and technologies to recommend improvements to architecture and operations
  • Integrate AI models and large language models (LLMs) into production systems, enabling use cases built on architectures such as retrieval-augmented generation (RAG)

Preferred Qualifications

  • Familiarity with emerging LLM frameworks such as DSPy for advanced prompt orchestration and programmatic LLM pipelines
  • Understanding of LLM cost monitoring, latency optimization, and usage analytics in production environments
  • Knowledge of vector databases / embeddings stores (e.g., OpenSearch) to support semantic search and RAG

Benefits

  • 100% employer-paid benefits for all eligible employees and immediate family members
  • Unlimited paid time off (PTO)
  • 401(k)
  • Flexible working arrangements - Remote work
  • Company-paid Life Insurance and LTD/STD coverage
  • A culture of continuous improvement where you can grow your career and get coaching
