{"id":15851,"date":"2024-07-18T09:06:10","date_gmt":"2024-07-18T01:06:10","guid":{"rendered":"https:\/\/www.1ai.net\/?p=15851"},"modified":"2024-07-18T09:06:21","modified_gmt":"2024-07-18T01:06:21","slug":"qwen2-audio%ef%bc%9a%e5%8d%83%e9%97%ae%e7%b3%bb%e5%88%97%e7%9a%84%e9%9f%b3%e9%a2%91%e5%a4%9a%e6%a8%a1%e6%80%81%e6%a8%a1%e5%9e%8b-%e6%97%a0%e9%9c%80%e6%96%87%e5%ad%97%e5%8d%b3%e5%8f%af%e8%af%ad","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/15851.html","title":{"rendered":"Qwen2-Audio: The audio multimodal model of the Qianwen series enables voice interaction without text"},"content":{"rendered":"<p data-pm-slice=\"0 0 []\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e4%ba%91\" title=\"View articles tagged Alibaba Cloud\" target=\"_blank\" >Alibaba Cloud<\/a> has released a large-scale audio-language model called Qwen2-Audio. The model can accept a variety of audio signal inputs and can perform audio analysis or directly answer voice commands, greatly improving the <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e9%9f%b3%e4%ba%a4%e4%ba%92\" title=\"View articles tagged Voice Interaction\" target=\"_blank\" >Voice Interaction<\/a> experience.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-15854\" title=\"get-541\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-541.jpg\" alt=\"get-541\" width=\"1209\" height=\"720\" \/><\/div>\n<p data-track=\"35\">In this release, Qwen2-Audio offers two distinct audio interaction modes: audio chat and audio analysis. Users can communicate without typing any text. 
<a href=\"https:\/\/www.1ai.net\/en\/tag\/qwen2-audio\" title=\"View articles tagged Qwen2-Audio\" target=\"_blank\" >Qwen2-Audio<\/a> supports voice conversations, and users can also provide audio and text instructions for analysis during the interaction, making the experience more convenient.<\/p>\n<p data-track=\"36\">Qwen2-Audio can intelligently understand the content of audio and respond appropriately to voice commands. For example, given an audio segment that simultaneously contains ambient sounds, multi-speaker conversation, and a voice command, Qwen2-Audio can directly understand the command and provide an interpretation of and response to the audio.<\/p>\n<p data-track=\"37\">In addition, direct preference optimization (DPO) was used to improve the model&#039;s factuality and adherence to expected behavior. According to the AIR-Bench evaluation results, Qwen2-Audio outperforms previous state-of-the-art models such as Gemini-1.5-pro on tests focused on audio-centric instruction-following capabilities. Qwen2-Audio is open source and aims to advance the multimodal language model community.<\/p>\n<p data-track=\"38\">Reportedly, the Qwen2-Audio series will comprise two models, Qwen2-Audio and Qwen-Audio-Chat, providing users with a richer audio interaction experience.<\/p>\n<p data-track=\"39\">The researchers conducted a comprehensive evaluation of the Qwen2-Audio model, assessing its performance on a variety of tasks without any task-specific fine-tuning. 
On English automatic speech recognition (ASR), Qwen2-Audio achieved higher performance than previous multi-task learning models.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-15853\" title=\"get-540\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-540.jpg\" alt=\"get-540\" width=\"859\" height=\"759\" \/><\/div>\n<p data-track=\"40\">As for Qwen2-Audio&#039;s chat capabilities, the researchers measured its performance on the AIR-Bench chat benchmark (Yang et al., 2024): Qwen2-Audio demonstrated state-of-the-art (SOTA) instruction-following capabilities across the speech, sound, music, and mixed-audio subsets. It shows substantial improvements over Qwen-Audio and significantly outperforms other large audio-language models (LALMs).<\/p>","protected":false},"excerpt":{"rendered":"<p>Alibaba Cloud recently released a large-scale audio-language model called Qwen2-Audio, which can accept multiple audio signal inputs and can analyze audio or directly answer voice commands, greatly enhancing the voice interaction experience. In this release, Qwen2-Audio offers two distinct audio interaction modes: audio chat and audio analysis. Users can communicate with Qwen2-Audio without typing text, and the model also analyzes audio and text instructions provided during the interaction for a more convenient user experience. Qwen2-Audio intelligently understands the content of audio and responds appropriately to voice commands. 
For example, in audio segments that simultaneously contain sound, multi-speaker dialog and voice commands, Qwen2-Audio<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3569,999,334],"collection":[],"class_list":["post-15851","post","type-post","status-publish","format-standard","hentry","category-news","tag-qwen2-audio","tag-999","tag-334"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/15851","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=15851"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/15851\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=15851"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=15851"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=15851"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=15851"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}