Senior AI Inference Engineer

Tether.to
Summary
Join Tether and be part of a global team building the next generation of AI models. The company is committed to making advanced AI technologies more accessible, building both solutions for large-scale applications and smaller, efficient models for edge devices. This role involves deploying machine learning models to edge devices using various frameworks, collaborating with researchers to transition models from research to production, and integrating AI features into existing products. Tether offers a fully remote work environment, uniting talent globally, and has a history of rapid growth and efficient operations. The ideal candidate will have excellent programming skills, experience with specific AI/ML frameworks and platforms, and a strong background in AI research and development.
Requirements
- Excellent programming skills in Python, and a solid understanding of C/C++
- Experience with platforms such as Llama.cpp, ONNX, TVM, MLC LLM, and IREE (MLIR), which facilitate the deployment of models to specific GPU architectures
- Experience in NLP, transformers, fine-tuning, and computer vision, and with TensorFlow, PyTorch, JAX, and the CUDA Toolkit
- Demonstrated ability to rapidly assimilate new technologies and techniques
- A degree in Computer Science, AI, Machine Learning, or a related field, complemented by a solid track record in AI R&D
Responsibilities
- Deploy machine learning models to edge devices using frameworks such as Llama.cpp, ONNX, TVM, MLC LLM, and IREE (MLIR)
- Collaborate closely with researchers to assist in coding, training and transitioning models from research to production environments
- Integrate AI features into existing products, enriching them with the latest advancements in machine learning
Preferred Qualifications
- Experience working with LLMs, fine-tuning, RAG, and transformers is a plus