Meituan releases audio generation model LongCat-AudioDiT

April 3 news: Meituan yesterday released its audio generation model LongCat-AudioDiT and simultaneously open-sourced the 1B and 3.5B versions.


According to the announcement, LongCat-AudioDiT models audio directly in the waveform latent space. It needs only a waveform variational autoencoder (Wav-VAE) and a diffusion Transformer (DiT), which removes at the root the error accumulation of multi-stage cascaded pipelines.

Training-inference alignment: at every inference step, the latent variables of the prompt region are forcibly reset to their ground-truth values, which resolves the long-standing problem of timbre drift.
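The reset described above can be illustrated with a minimal toy sketch. It is not Meituan's implementation: the function name `sample_with_prompt_reset`, the Euler sampler, the linear noise-to-data path used to compute the "ground-truth" value at an intermediate step, and the stand-in `velocity_fn` are all assumptions made for illustration.

```python
import random

def sample_with_prompt_reset(prompt_latent, total_len, velocity_fn,
                             num_steps=8, seed=0):
    """Toy Euler sampler over a 1-D latent sequence.

    The first len(prompt_latent) positions form the prompt region.
    Before every step they are overwritten with the value they would
    have on the true path, so the model never sees a drifted prompt.
    """
    rng = random.Random(seed)
    n_prompt = len(prompt_latent)
    noise = [rng.gauss(0.0, 1.0) for _ in range(total_len)]
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        # ASSUMPTION: linear path x_t = (1 - t) * noise + t * data.
        for j in range(n_prompt):
            x[j] = (1.0 - t) * noise[j] + t * prompt_latent[j]
        v = velocity_fn(x, t)  # hypothetical stand-in for the DiT
        x = [xj + dt * vj for xj, vj in zip(x, v)]
    # Final reset at t = 1: the prompt region equals the real latent.
    x[:n_prompt] = prompt_latent
    return x

# Usage with a dummy "model" that pushes every position upward.
out = sample_with_prompt_reset([1.0, 2.0], 5, lambda x, t: [0.5] * len(x))
```

Without the reset, any error the dummy model makes in the prompt region would accumulate across steps. With it, only the generated region depends on the model's predictions.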

Adaptive projection guidance (APG) replaces traditional classifier-free guidance (CFG): it decomposes the guidance signal into orthogonal and parallel components, keeps the beneficial part and suppresses the harmful part, and avoids spectral "oversaturation" while improving timbre similarity.
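To make the decomposition concrete, here is a small sketch of the idea on plain Python lists. It follows the commonly published form of projected guidance, in which the component parallel to the conditional prediction is the one that gets down-weighted. The article does not say which component the model suppresses, so that choice, the function names, and the parameters `scale` and `eta` are assumptions.

```python
def cfg_guidance(cond, uncond, scale=4.0):
    """Standard classifier-free guidance, shown for comparison."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def apg_guidance(cond, uncond, scale=4.0, eta=0.0):
    """Projection-based guidance on flat vectors.

    The guidance signal (cond - uncond) is split into a part parallel
    to `cond` and a part orthogonal to it. ASSUMPTION: the parallel
    part is the one scaled down by `eta` (eta=1 recovers plain CFG).
    """
    diff = [c - u for c, u in zip(cond, uncond)]
    norm_sq = sum(c * c for c in cond)
    if norm_sq == 0.0:
        parallel = [0.0] * len(cond)
    else:
        coef = sum(d * c for d, c in zip(diff, cond)) / norm_sq
        parallel = [coef * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]
    update = [o + eta * p for o, p in zip(orthogonal, parallel)]
    return [c + (scale - 1.0) * u for c, u in zip(cond, update)]

cond, uncond = [1.0, 0.0], [0.0, 1.0]
print(cfg_guidance(cond, uncond))  # [4.0, -3.0]
print(apg_guidance(cond, uncond))  # [1.0, -3.0]
```

In this toy case plain CFG inflates the first coordinate from 1.0 to 4.0, while the projected variant leaves the magnitude along `cond` unchanged and applies only the orthogonal correction.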

On the Seed benchmark, LongCat-AudioDiT-3.5B reaches a speaker similarity (SIM) of 0.818 on the Chinese test set (Seed-ZH) and 0.797 on the Chinese hard-case set (Seed-Hard), surpassing models such as Seed-TTS, CosyVoice 3.5 and MiniMax-Speech and achieving current SOTA performance.
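The article does not explain how SIM is computed. Speaker similarity is commonly reported as the cosine similarity between speaker embeddings of the reference audio and the generated audio, and the sketch below assumes that definition. The embedding extractor is out of scope, so the vectors are hand-written placeholders rather than real embeddings.

```python
import math

def speaker_similarity(ref_embedding, gen_embedding):
    """Cosine similarity between two speaker-embedding vectors.

    ASSUMPTION: SIM is a cosine similarity; the embedding model used
    by the benchmark is not named in the article.
    """
    dot = sum(a * b for a, b in zip(ref_embedding, gen_embedding))
    norm_ref = math.sqrt(sum(a * a for a in ref_embedding))
    norm_gen = math.sqrt(sum(b * b for b in gen_embedding))
    if norm_ref == 0.0 or norm_gen == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_ref * norm_gen)

# Placeholder vectors, not real speaker embeddings.
score = speaker_similarity([0.2, 0.9, 0.1], [0.25, 0.8, 0.3])
```

Under this definition a score of 1.0 means the two embeddings point in the same direction, so values such as 0.818 indicate a generated voice close to the reference speaker.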

GitHub: https://github.com/meituan-longcat/LongCat-AudioDiT

Hugging Face: https://huggingface.co/meituan-longcat/LongCat-AudioDiT
