{"id":51817,"date":"2026-04-03T13:03:15","date_gmt":"2026-04-03T05:03:15","guid":{"rendered":"https:\/\/www.1ai.net\/?p=51817"},"modified":"2026-04-03T13:03:15","modified_gmt":"2026-04-03T05:03:15","slug":"%e7%be%8e%e5%9b%a2%e5%8f%91%e5%b8%83%e9%9f%b3%e9%a2%91%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8b-longcat-audiodit","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/51817.html","title":{"rendered":"Meituan releases audio generation model LongCat-AudioDiT"},"content":{"rendered":"<p>April 3 news: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e7%be%8e%e5%9b%a2\" title=\"[View articles tagged Meituan]\" target=\"_blank\" >Meituan<\/a> yesterday released the <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%9f%b3%e9%a2%91%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged audio generation model]\" target=\"_blank\" >audio generation model<\/a> LongCat-AudioDiT and simultaneously open-sourced its 1B and 3.5B versions.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-51818\" title=\"995e5b1aj00tcwjci001ed000u000km\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2026\/04\/995e5b1aj00tcwjci001ed000u000kcm.jpg\" alt=\"995e5b1aj00tcwjci001ed000u000km\" width=\"1080\" height=\"732\" \/><\/p>\n<p>Reportedly, LongCat-AudioDiT models audio directly in the waveform latent space and needs only a waveform variational autoencoder (Wav-VAE) and a diffusion Transformer (DiT), eliminating at the root the error accumulation of multi-stage cascaded pipelines.<\/p>\n<p><strong>Training-inference alignment<\/strong>: at every inference step, the latent variables of the prompt region are forcibly reset to their ground-truth values, resolving the long-standing problem of timbre drift.<\/p>\n<p><strong>Adaptive Projected Guidance (APG)<\/strong>: replaces conventional classifier-free guidance (CFG) by decomposing the guidance signal into orthogonal and parallel components, keeping the useful part and suppressing the harmful one, which avoids \"saturation\" of the spectrum while improving timbre similarity.<\/p>\n<p>On the Seed benchmark, LongCat-AudioDiT-3.5B reached a speaker similarity (SIM) of 0.818 on the Chinese test set (Seed-ZH) and 0.797 on the Chinese hard test set (Seed-Hard), surpassing models such as Seed-TTS, CosyVoice 3.5 and MiniMax-Speech and achieving current state-of-the-art (SOTA) performance.<\/p>\n<p>GitHub: https:\/\/github.com\/meituan-longcat\/LongCat-AudioDiT<\/p>\n<p>Hugging Face: https:\/\/huggingface.co\/meituan-longcat\/LongCat-AudioDiT<\/p>","protected":false},"excerpt":{"rendered":"<p>April 3 news: Meituan LongCat yesterday released the audio generation model LongCat-AudioDiT and simultaneously open-sourced its 1B and 3.5B versions. Reportedly, LongCat-AudioDiT models audio directly in the waveform latent space and needs only a waveform variational autoencoder (Wav-VAE) and a diffusion Transformer (DiT), eliminating at the root the error accumulation of multi-stage cascaded pipelines. Training-inference alignment: at every inference step, the latent variables of the prompt region are forcibly reset to their ground-truth values, resolving the long-standing problem of timbre drift. 
Adaptive Projected Guidance (APG): replaces conventional classifier-free guidance (CFG), decomposing the guidance signal into orthogonal and parallel components<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[4871,3681],"collection":[],"class_list":["post-51817","post","type-post","status-publish","format-standard","hentry","category-news","tag-4871","tag-3681"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/51817","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=51817"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/51817\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=51817"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=51817"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=51817"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=51817"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}