{"id":17712,"date":"2024-08-11T08:45:19","date_gmt":"2024-08-11T00:45:19","guid":{"rendered":"https:\/\/www.1ai.net\/?p=17712"},"modified":"2024-08-11T08:45:19","modified_gmt":"2024-08-11T00:45:19","slug":"%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4%e5%8f%91%e5%b8%83%e6%96%b0%e8%af%ad%e9%9f%b3%e6%a8%a1%e5%9e%8b-qwen2-audio%ef%bc%8c%e5%ae%9e%e5%8a%9b%e8%b6%85%e8%b6%8a-openai-whisper","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/17712.html","title":{"rendered":"Alibaba releases new voice model Qwen2-Audio, surpassing OpenAI Whisper"},"content":{"rendered":"<p data-pm-slice=\"0 0 []\">Recently, <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4\" title=\"[Sees articles with [Aribaba] label]\" target=\"_blank\" >Alibaba<\/a> launched Qwen2-Audio, a new open-source <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e9%9f%b3%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [speech modeling]]\" target=\"_blank\" >speech model<\/a> built on its earlier Qwen-Audio. The model performs well in speech recognition, translation, and audio analysis, and brings significant improvements in both functionality and performance over its predecessor. Qwen2-Audio is released in two versions: a base model and an instruction-tuned model. Users can ask the model questions about an audio clip by voice and have it recognize and analyze the content.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-17713\" title=\"get-283\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/get-283.jpg\" alt=\"get-283\" width=\"844\" height=\"804\" \/><\/div>\n<p data-track=\"10\">For example, given a recording of a woman speaking, Qwen2-Audio can estimate her age or analyze her emotions; given a noisy recording, the model can break down the various sound components in it. 
Qwen2-Audio supports multiple languages, including Chinese, Cantonese, French, English, and Japanese, which greatly facilitates the development of sentiment analysis and translation applications.<\/p>\n<p data-track=\"11\">Product page: https:\/\/top.aibase.com\/tool\/qwen2-audio<\/p>\n<p data-track=\"12\">Compared with the first-generation Qwen-Audio, Qwen2-Audio has been comprehensively optimized in both architecture and performance. In the pre-training stage, the new model replaces the complex hierarchical tags used previously with natural language prompts. This change makes it easier for the model to understand and respond to a wide range of tasks, and significantly improves its generalization ability.<\/p>\n<p data-track=\"13\">Qwen2-Audio&#039;s instruction-following ability has also been greatly improved, so it interprets user instructions more accurately. For example, when a user asks it to &quot;analyze the emotional tendency in this audio&quot;, Qwen2-Audio can accurately identify the emotions expressed in the recording. In addition, the model offers two interaction modes, voice chat and audio analysis, making voice interaction more natural for users. In audio analysis mode, Qwen2-Audio can analyze many kinds of audio in depth and return detailed, accurate results.<\/p>\n<p data-track=\"14\">To align the model&#039;s output with human preferences, Qwen2-Audio is further trained with supervised fine-tuning and direct preference optimization (DPO). As a result, its responses in conversation are more natural and accurate.<\/p>\n<p data-track=\"15\">In performance testing, Qwen2-Audio scored well on multiple mainstream benchmarks; in speech recognition and speech translation accuracy in particular, it surpassed OpenAI&#039;s Whisper-large-v3. 
The new model has drawn wide attention in the industry and points to further progress in voice technology.<\/p>","protected":false},"excerpt":{"rendered":"<p>Alibaba recently launched Qwen2-Audio, a new open-source speech model built on Qwen-Audio. It performs well in speech recognition, translation, and audio analysis, and brings significant improvements in functionality and performance. Qwen2-Audio is released as a base model and an instruction-tuned model, and users can ask it questions about an audio clip by voice and have it recognize and analyze the content. For example, given a recording of a woman speaking, Qwen2-Audio can estimate her age or analyze her mood; given a noisy recording, the model can break down the various sound components in it. Qwen2-Audio supports multiple languages, including Chinese, Cantonese, French, English, and Japanese, which greatly facilitates the development of sentiment analysis and translation 
applications.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1875,390],"collection":[],"class_list":["post-17712","post","type-post","status-publish","format-standard","hentry","category-news","tag-1875","tag-390"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/17712","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=17712"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/17712\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=17712"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=17712"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=17712"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=17712"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}