{"id":15017,"date":"2024-07-08T08:56:28","date_gmt":"2024-07-08T00:56:28","guid":{"rendered":"https:\/\/www.1ai.net\/?p=15017"},"modified":"2024-07-08T08:56:28","modified_gmt":"2024-07-08T00:56:28","slug":"%e9%98%bf%e9%87%8c%e9%80%9a%e4%b9%89%e9%9f%b3%e9%a2%91%e7%94%9f%e6%88%90%e5%a4%a7%e6%a8%a1%e5%9e%8b-funaudiollm-%e5%bc%80%e6%ba%90-%e6%94%af%e6%8c%81%e6%83%85%e7%bb%aa%e8%af%ad%e9%9f%b3%e5%af%b9","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/15017.html","title":{"rendered":"Alibaba Tongyi&#039;s audio generation model FunAudioLLM is open source and supports scenarios such as emotional voice dialogue and audiobooks"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e9%80%9a%e4%b9%89\" title=\"[Sees articles with [Ariton] labels]\" target=\"_blank\" >Ali Tongyi<\/a>Laboratory recently<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a>Named<a href=\"https:\/\/www.1ai.net\/en\/tag\/funaudiollm\" title=\"_Other Organiser\" target=\"_blank\" >FunAudioLLM<\/a>of<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%9f%b3%e9%a2%91%e7%94%9f%e6%88%90%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with tags on [Big Audio Generation Model]]\" target=\"_blank\" >Large Model for Audio Generation<\/a>The project aims to improve the natural voice interaction experience between humans and large language models (LLMs). The project consists of two core models: SenseVoice and CosyVoice.<\/p>\n<p>CosyVoice focuses on natural speech generation, with multi-language support, timbre and emotion control functions, and excels in multi-language speech generation, zero-sample speech generation, cross-language sound synthesis and command execution. 
Trained on 150,000 hours of data, it supports five languages (Chinese, English, Japanese, Cantonese, and Korean), can quickly clone a speaker's timbre, and offers fine-grained control over emotion and prosody.<\/p>\n<p>SenseVoice is dedicated to high-precision multilingual speech recognition, emotion recognition, and audio event detection. Trained on 400,000 hours of data, it supports more than 50 languages, and its recognition accuracy surpasses the Whisper model, with improvements of more than 50% for Chinese and Cantonese in particular. SenseVoice also delivers fast inference.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-15018\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/6385602521590366116166213.jpg\" alt=\"\" width=\"1000\" height=\"260\" \/><\/p>\n<p>FunAudioLLM supports a variety of human-computer interaction scenarios, such as multi-language translation, emotional voice conversations, and interactive podcasts and audiobooks. By combining SenseVoice, LLMs, and CosyVoice, it enables seamless speech-to-speech translation, emotional voice chat applications, and interactive podcast radio stations.<\/p>\n<p>Technically, CosyVoice is built on quantized speech coding and supports natural, fluent speech generation, while SenseVoice provides comprehensive speech processing capabilities, including automatic speech recognition, language identification, emotion recognition, and audio event detection.<\/p>\n<p>The open-source models have been released on ModelScope and Hugging Face, and the training, inference, and fine-tuning code is available on GitHub. 
Both the CosyVoice and SenseVoice models offer online demos on ModelScope, allowing users to try these voice technologies directly.<\/p>\n<p><strong>Project address:<\/strong> https:\/\/github.com\/FunAudioLLM<\/p>","protected":false},"excerpt":{"rendered":"<p>Ali Tongyi Lab recently open-sourced an audio generation large model project called FunAudioLLM, which aims to enhance natural speech interaction between humans and large language models (LLMs). The project consists of two core models: SenseVoice and CosyVoice. CosyVoice focuses on natural speech generation with multi-language support and timbre and emotion control, and excels at multilingual speech generation, zero-shot voice generation, cross-lingual voice synthesis, and instruction following. Trained on 150,000 hours of data, it supports five languages (Chinese, English, Japanese, Cantonese, and Korean), can quickly clone timbres, and offers fine-grained control over emotion and prosody. 
SenseVoice, on the other hand, is dedicated to high-precision multilingual speech recognition, emotion recognition, and audio event detection.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3392,219,3390,3391],"collection":[],"class_list":["post-15017","post","type-post","status-publish","format-standard","hentry","category-news","tag-funaudiollm","tag-219","tag-3390","tag-3391"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/15017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=15017"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/15017\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=15017"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=15017"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=15017"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=15017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}