Staff ML Engineer - Voice AI

Toast
Summary
Join Toast as a Staff Machine Learning Engineer and bring your Voice AI expertise to our platform. You will work with engineers, data scientists, and product managers to turn Voice AI solutions into tangible business impact across product lines including phone ordering, the Toast Local app, drive-thrus, kiosks, and menu recommendations. Your role involves designing APIs and inference services for voice interactions, optimizing models for latency, memory, and inference cost, and prototyping new ideas in generative speech and multilingual processing. You will also mentor junior engineers and contribute to a culture of technical excellence. This position offers a competitive salary and benefits package and follows a hybrid work model.
Requirements
- Bachelor’s or Master’s degree in Computer Science, AI, Machine Learning, or a related field
- 7+ years of ML software development experience with hands-on experience in voice AI, speech processing, or conversational AI systems
- A proven track record of shipping Agentic Voice AI solutions in production at scale
- Deep expertise in developing and evaluating automatic speech recognition (ASR), text-to-speech (TTS), and speech-to-speech (S2S) models, as well as voice agent systems
- Demonstrated expertise with voice AI toolkits such as Whisper, ESPnet, Kokoro TTS, Hugging Face Transformers, and the OpenAI Realtime API
- Strong background in machine learning and signal processing
- Proficiency in Python, Java/Kotlin, and SQL, plus experience with ML frameworks such as PyTorch or TensorFlow
- Experience with software engineering best practices and tools, including object-oriented programming, test-driven development, CI/CD, Git, shell scripting, and task orchestration (e.g., Airflow)
- Experience with microservice-based architecture, preferably with AWS tooling (SageMaker, DynamoDB, Athena, etc.)
- Strong communication skills, with a track record of technical leadership and cross-functional collaboration
Responsibilities
- Apply Voice AI expertise to help further define and improve the capabilities of Toast’s product platform
- Design APIs and inference services to support voice interactions across Toast platforms and devices
- Work closely with product teams to translate use cases into natural, efficient, and emotionally resonant voice interactions
- Lead model optimization efforts for latency, memory, and inference cost, including edge and mobile deployment where applicable
- Collaborate with data scientists and ML engineers to prototype and productionize new ideas in generative speech, multilingual processing, and agentic behavior
- Mentor junior engineers and help foster a culture of technical excellence
Preferred Qualifications
- Experience with real-time speech systems, including streaming ASR or low-latency TTS/S2S
- Familiarity with open-source speech toolkits: Kokoro TTS, ESPnet, Fairseq, OpenAI Whisper, or equivalent
- Experience building interactive, embodied, or voice-based agents using LLMs or hybrid architectures
- Background in deploying models to edge/mobile environments or with hardware acceleration
Benefits
- Competitive compensation and benefits programs
- Cash compensation (overtime, bonus/commissions, if eligible)
- Benefits
- Equity (if eligible)
- Hybrid work model