Tencent Hunyuan open-sources end-to-end AI model HunyuanVideo-Foley: video + text = "cinematic" sound effects

August 28 news: Tencent Hunyuan announced at noon today that it has open-sourced HunyuanVideo-Foley, an end-to-end video sound-effect generation model. Users simply provide a video and a text description, and the model matches the video with cinematic-quality sound effects.

According to the official introduction, HunyuanVideo-Foley breaks the limitation that AI-generated video can only be "seen" but not "heard". The sound generation tool can be widely used in short-video creation, film production, advertising, game development, and other scenarios.

TEXT DESCRIPTION: Engine revving loudly and accelerating.

TEXT DESCRIPTION: Rustling and crunching of leaves and twigs under the fox kit's paws.

The Hunyuan team developed a comprehensive data processing pipeline that automatically labels and filters collected audio and video data, and constructed a high-quality TV2A (text-video-to-audio) dataset of about 100,000 hours. The dataset provides strong support for model training and gives the model strong generalization capability: under a variety of complex video conditions, it can generate high-quality audio, including sound effects and background music, that is synchronized with the picture and semantically aligned with it. The generated audio can be combined with silent video, which greatly improves the realism and immersion of the video.

1AI attaches the relevant links below:

  • Experience portal: https://hunyuan.tencent.com/video/zh?tabIndex=0
  • Project website: https://szczesnys.github.io/hunyuanvideo-foley/
  • Code: https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley
  • Technical report: https://arxiv.org/abs/2508.16930
  • Hugging Face: https://huggingface.co/tencent/HunyuanVideo-Foley