{"id":22622,"date":"2024-11-06T01:04:06","date_gmt":"2024-11-05T17:04:06","guid":{"rendered":"https:\/\/www.1ai.net\/?p=22622"},"modified":"2024-11-05T21:08:25","modified_gmt":"2024-11-05T13:08:25","slug":"%e8%85%be%e8%ae%af%e6%8e%a8%e5%87%ba-hunyuan-large-%e5%a4%a7%e6%a8%a1%e5%9e%8b%ef%bc%9a389b-%e6%80%bb%e5%8f%82%e6%95%b0%ef%bc%8c%e4%b8%9a%e7%95%8c%e5%b7%b2%e5%bc%80%e6%ba%90%e5%9f%ba%e4%ba%8e-transfor","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/22622.html","title":{"rendered":"Tencent Launches Hunyuan-Large Large Model: 389B Total Parameters, Industry's Largest Transformer-Based MoE Model Open-Sourced"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%85%be%e8%ae%af\" title=\"[View articles tagged with [Tencent]]\" target=\"_blank\" >Tencent<\/a>Announcing the launch of Hunyuan-Large <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large models]]\" target=\"_blank\" >Large Model<\/a>Officially, it's<strong>The industry has now<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a>The Transformer-based Maximum MoE Model of<\/strong>The program has 389 billion total parameters (389B) and 52 billion active parameters (52B).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-22623\" title=\"0473510dj00smhb4g006md000v900jjp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/11\/0473510dj00smhb4g006md000v900jjp.jpg\" alt=\"0473510dj00smhb4g006md000v900jjp\" width=\"1125\" height=\"703\" \/><\/p>\n<p>Tencent has open-sourced Hunyuan-A52B-Pretrain, Hunyuan-A52B-Instruct, and Hunyuan-A52B-Instruct-FP8 at Hugging Face, and has released a technical report and a training and reasoning operation manual detailing the model capabilities and the operation of training and reasoning.<\/p>\n<p>Among the modeling technology advantages are the following:<\/p>\n<ul>\n<li><strong>High-quality synthesized data<\/strong>: By augmenting training with synthetic data, Hunyuan-Large is able to learn richer representations, handle long contextual inputs, and better generalize to unseen data.<\/li>\n<li><strong>KV Cache Compression<\/strong>: Adoption of Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies significantly reduces the memory footprint and computational overhead of the KV cache, and improves inference throughput<\/li>\n<li><strong>Expert-specific learning rate scaling<\/strong>: Setting different learning rates for different experts ensures that each sub-model learns effectively from the data and contributes to the overall performance<\/li>\n<li><strong>long context processing capability<\/strong>The pre-trained model supports up to 256K text sequences and the Instruct model supports 128K text sequences, which significantly improves the processing power of long context tasks.<\/li>\n<li><strong>Extensive benchmarking<\/strong>Extensive experiments in multiple languages and tasks have proven the effectiveness and safety of Hunyuan-Large in real-world applications.<\/li>\n<\/ul>\n<p data-vmark=\"461d\">The relevant links are as follows:<\/p>\n<ul class=\"medium-size list-paddingleft-2\">\n<li>\n<p data-vmark=\"c8a2\"><strong>paper<\/strong>:<a href=\"https:\/\/arxiv.org\/pdf\/2411.02265\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/arxiv.org\/pdf\/2411.02265<\/span><\/a><\/p>\n<\/li>\n<li>\n<p 
data-vmark=\"2925\"><strong>Github<\/strong>:<a href=\"https:\/\/github.com\/Tencent\/Tencent-Hunyuan-Large\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/github.com\/Tencent\/Tencent-Hunyuan-Large<\/span><\/a><\/p>\n<\/li>\n<li>\n<p data-vmark=\"9218\"><strong>Huggingface<\/strong>:<a href=\"https:\/\/huggingface.co\/tencent\/Tencent-Hunyuan-Large\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/huggingface.co\/tencent\/Tencent-Hunyuan-Large<\/span><\/a><\/p>\n<\/li>\n<li>\n<p data-vmark=\"6859\"><strong>Tencent Cloud<\/strong>:<a href=\"https:\/\/cloud.tencent.com\/product\/hunyuan\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/cloud.tencent.com\/product\/hunyuan<\/span><\/a><\/p>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Tencent announced the launch of Hunyuan-Large, the largest Transformer-based MoE model open-sourced in the industry, with 389 billion total parameters (389B) and 52 billion activation parameters (52B). Tencent open-sourced Hunyuan-A52B-Pretrain, Hunyuan-A52B-Instruct, and Hunyuan-A52B-Instruct-FP8 at Hugging Face, and released a technical report and a training and inference operation manual, which introduces in detail the model's capabilities and the operation of training and inference. Among the model technology advantages are as follows: High-quality synthetic data: by synthesizing the number of<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[216,219,323],"collection":[],"class_list":["post-22622","post","type-post","status-publish","format-standard","hentry","category-news","tag-216","tag-219","tag-323"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/22622","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=22622"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/22622\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=22622"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=22622"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=22622"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=22622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}