
Kyutai TTS
Kyutai TTS is a text-to-speech model optimized for real-time applications. It offers ultra-low-latency, high-accuracy speech synthesis, accepts streaming text input, and can generate long audio, which suits scenarios that need real-time voice interaction such as voice assistants and live subtitle generation. Its distinguishing feature is delayed streams modeling (DSM), which lets the model begin generating audio before the full text has arrived and underpins its real-time performance.
Kyutai TTS Features
- Highly Accurate Speech Synthesis: Kyutai TTS reports a word error rate (WER) of 2.82% in English and 3.29% in French, lower than that of the other models it is compared with, so the spoken output closely matches the input text.
- High-Fidelity Voice Cloning: The model scores 77.1% for English and 78.7% for French on speaker similarity, and the generated speech closely reproduces the timbre and style of the reference audio.
- Ultra-Low-Latency Real-Time Processing: The latency from receiving the first text token to producing the first audio is only 220 milliseconds, and it stays at 350 milliseconds even when 32 requests are processed concurrently, which keeps real-time applications responsive.
- Text Streaming Processing: Kyutai TTS accepts streaming text input, so it can synthesize text as a large language model generates it instead of waiting for the full text, which shortens the overall response time.
- Long Audio Generation Support: Kyutai TTS can generate audio of any length, which removes the length limits that traditional models impose on long-form generation.
- Production-Ready Server: Kyutai TTS comes with a robust Rust server that supports streaming access over WebSockets, plus a Dockerfile for easy deployment.
- Word-Level Timestamped Output: The output includes precise per-word timestamps, which can be used to generate real-time subtitles or to handle user interruptions.
- Multi-Language Support: English and French are currently supported, and more languages will be supported in the future.
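
To make the word-level timestamp feature more concrete, here is a minimal Python sketch of how per-word timings could be turned into subtitle cues and used to handle an interruption. It does not use the Kyutai API: the `Word` record, the helper names, and the sample timings are all hypothetical, and the real server's output format may differ. The sketch only assumes that each word arrives with a start and end time in seconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical record for one synthesized word (not the Kyutai wire format)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def format_srt_time(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_cues(words, max_words=7, max_gap=0.6):
    """Group words into cues.

    A new cue starts when the current one already holds max_words words,
    or when the pause since the previous word exceeds max_gap seconds.
    """
    cues, current = [], []
    for word in words:
        too_long = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def cues_to_srt(cues) -> str:
    """Serialize cues as SRT text, numbering them from 1."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0].start)
        end = format_srt_time(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def spoken_before(words, interrupt_at: float) -> str:
    """Return the text fully spoken before the user interrupted at interrupt_at seconds."""
    return " ".join(w.text for w in words if w.end <= interrupt_at)


if __name__ == "__main__":
    # Made-up timings for illustration only.
    sample = [
        Word("Hello", 0.0, 0.4),
        Word("world", 0.45, 0.9),
        Word("Again", 2.0, 2.5),
    ]
    print(cues_to_srt(words_to_cues(sample)))
    print(spoken_before(sample, interrupt_at=1.0))
```

Splitting cues on pauses as well as on word count keeps subtitles aligned with natural phrasing. The `spoken_before` helper shows why timestamps matter for interruptions: a voice assistant can record only the words the user actually heard before cutting in.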
Official website: https://kyutai.org/next/tts