ML Engineer
Vosyn
Summary
Join Vosyn, a pre-seed AI startup, as a Senior ML Engineer and subject-matter expert (SME) in audio processing, with a focus on speech-to-speech (S2S) technology. In this fully remote contract position (10-15 hours/week), you will provide strategic guidance and technical leadership on key components of our S2S pipeline. You will collaborate with the VosynCore team to solve complex challenges in text-to-speech (TTS) model development and optimization, mentor team members, and apply your deep knowledge of TTS model architectures and waveform generation methods. Your expertise will be crucial in driving project progress and ensuring our S2S pipeline meets industry standards. Vosyn offers a remote-first culture with flexible working arrangements.
Requirements
- 5+ years of proven experience in machine learning development focused on audio generation and TTS systems
- Extensive expertise in audio signal processing, particularly for human voices
- Deep experience with TTS models, including waveform generation and spectrogram-based methods
- Proven expertise in tuning TTS models for duration control and speech characteristics
- Strong proficiency in Python and machine learning frameworks such as PyTorch
- Experience with advanced deep learning models like WaveNet and transformer-based architectures
- Demonstrated experience in distributed training and deployment of ML models on cloud platforms
- Strong understanding of evaluation metrics for TTS systems
- Proven ability to provide technical leadership and actionable guidance
- Excellent communication and mentoring skills
Responsibilities
- Provide expert-level advice and mentorship on the architecture, training, and production of text-to-speech (TTS) models
- Guide the implementation of robust testing methodologies for TTS models, using industry-standard methods such as Mean Opinion Score (MOS) testing
- Share expertise in distributed training, monitoring, and deployment of large-scale ML models on cloud platforms
- Lead latency optimization initiatives in real-time systems for high-quality speech-to-speech conversion
- Provide guidance on tuning TTS models for precise control over speech characteristics
- Share in-depth knowledge of various TTS model architectures and waveform generation methods
- Mentor the team on implementing advanced deep learning models for audio processing
- Guide the development of transformer-based architectures for complex TTS models
Preferred Qualifications
- Experience with real-time audio processing systems
- Background in speech synthesis research
- Knowledge of handling multiple languages and accents in TTS
- Experience with ML model optimization techniques
- Publications or patents in related fields
Benefits
- Remote-first culture with flexible working arrangements