AI Research Engineer

Tether.to
Summary
Join Tether's AI model team and drive innovation in architecture development for cutting-edge models across a range of scales. You will enhance model intelligence, improve efficiency, and introduce new capabilities that advance the field. Leveraging your deep expertise in LLM architectures and pre-training optimization, you will explore and implement novel techniques and algorithms. Your mission is to push the limits of AI performance through data curation, stronger baselines, and the resolution of pre-training bottlenecks. Tether offers a global, remote work environment where you can collaborate with bright minds and make your mark in the fintech space. The company is an industry leader, known for its innovative products and commitment to transparency.
Requirements
- A degree in Computer Science or related field
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (including publications at top-tier, A* conferences)
- Hands-on experience contributing to large-scale LLM training runs on distributed clusters equipped with thousands of NVIDIA GPUs, ensuring scalability and measurable improvements in model performance
- Familiarity and practical experience with large-scale distributed training frameworks, libraries, and tools
- Deep knowledge of state-of-the-art transformer and non-transformer architectural modifications aimed at enhancing intelligence, efficiency, and scalability
- Strong expertise in PyTorch and Hugging Face libraries with practical experience in model development, continual pretraining, and deployment
Responsibilities
- Conduct pre-training of AI models on distributed clusters equipped with thousands of NVIDIA GPUs
- Design, prototype, and scale innovative architectures to enhance model intelligence
- Independently and collaboratively execute experiments, analyze results, and refine methodologies for optimal performance
- Investigate, debug, and improve both model efficiency and computational performance
- Contribute to the advancement of training systems to ensure seamless scalability and efficiency on target platforms