{"id":9252,"date":"2024-04-28T09:27:36","date_gmt":"2024-04-28T01:27:36","guid":{"rendered":"https:\/\/www.1ai.net\/?p=9252"},"modified":"2024-04-28T09:27:36","modified_gmt":"2024-04-28T01:27:36","slug":"%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4%e5%bc%80%e6%ba%90-1100-%e4%ba%bf%e5%8f%82%e6%95%b0-qwen1-5-110b-%e6%a8%a1%e5%9e%8b%ef%bc%8c%e4%b8%8e-meta-llama3-70b-%e7%9b%b8%e5%aa%b2%e7%be%8e","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/9252.html","title":{"rendered":"Alibaba open-sources 110 billion parameter Qwen1.5-110B model, comparable to Meta Llama3-70B"},"content":{"rendered":"<p data-vmark=\"3384\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4\" title=\"View articles tagged with [Alibaba]\" target=\"_blank\" >Alibaba<\/a> recently announced that it has <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"View articles tagged with [open source]\" target=\"_blank\" >open-sourced<\/a> Qwen1.5-110B, the first 100-billion-parameter <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged with [model]\" target=\"_blank\" >model<\/a> in the Qwen1.5 series. The model is comparable to Meta-Llama3-70B in base capability evaluations and performs well in Chat evaluations, including MT-Bench and AlpacaEval 2.0.<\/p>\n<p data-vmark=\"fd6c\">Main points:<\/p>\n<p data-vmark=\"cc09\">According to reports, Qwen1.5-110B is similar to the other Qwen1.5 models and uses the same Transformer decoder architecture. It includes grouped query attention (GQA), which makes model inference more efficient. <span class=\"accentTextColor\">The model supports a context length of 32K tokens.<\/span> It remains multilingual, supporting English, Chinese, French, Spanish, German, Russian, Japanese, Korean, Vietnamese, Arabic, and other languages.<\/p>\n<p data-vmark=\"56a9\">Alibaba compared the Qwen1.5-110B model with the recent SOTA language models Meta-Llama3-70B and Mixtral-8x22B.
The results are as follows:<\/p>\n<p data-vmark=\"23dd\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9253\" title=\"3871f8d7-08f7-468b-99ad-0c45ad65394e\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/3871f8d7-08f7-468b-99ad-0c45ad65394e.png\" alt=\"3871f8d7-08f7-468b-99ad-0c45ad65394e\" width=\"607\" height=\"531\" \/><\/p>\n<p data-vmark=\"2108\">The above results show that the new 110B model is at least comparable to the Llama-3-70B model in terms of base capabilities. Alibaba did not make major changes to the pre-training method for this model, so they believe that the performance improvement over the 72B model <span class=\"accentTextColor\">mainly comes from the increased model size.<\/span><\/p>\n<p data-vmark=\"534e\">Alibaba also conducted a Chat evaluation on MT-Bench and AlpacaEval 2.0. The results are as follows:<\/p>\n<p data-vmark=\"acb2\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9254\" title=\"96f8e1f6-45ff-4098-9990-9b0f30f8de2e\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/96f8e1f6-45ff-4098-9990-9b0f30f8de2e.png\" alt=\"96f8e1f6-45ff-4098-9990-9b0f30f8de2e\" width=\"445\" height=\"257\" \/><\/p>\n<p data-vmark=\"9111\">Alibaba said that in benchmark evaluations of the two Chat models, compared to the previously released 72B model, <span class=\"accentTextColor\">110B performs significantly better.<\/span> The consistent improvement in evaluation results suggests that a more powerful and larger base language model can lead to a better Chat model, even without drastically changing the post-training approach.<\/p>\n<p data-vmark=\"f429\">Finally, Alibaba said that Qwen1.5-110B is the largest model in the Qwen1.5 series. <span class=\"accentTextColor\">It is also the first model in the series with more than 100 billion parameters.<\/span> It performs well against the recently released SOTA model Llama-3-70B and significantly outperforms the 72B
model.<\/p>","protected":false},"excerpt":{"rendered":"<p>Alibaba has announced that it has open sourced Qwen1.5-110B, the first of the Qwen1.5 family of 100 billion parameter models, which is comparable to Meta-Llama3-70B in basic competency evaluations, and performs well in Chat evaluations, including MT-Bench and AlpacaEval 2.0. The main points: According to the report, Qwen1.5-110B is similar to other Qwen1.5 models and uses the same Transformer decoder architecture. It includes Grouped Query Attention (GQA) to be more efficient in model inference. The model supports a context length of 32K tokens, and it is still multilingual, supporting English, Chinese, French, and Spanish,<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[219,1489,390],"collection":[],"class_list":["post-9252","post","type-post","status-publish","format-standard","hentry","category-news","tag-219","tag-1489","tag-390"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/9252","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=9252"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/9252\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=9252"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=9252"},{"taxonomy":"post_tag","embeddabl
e":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=9252"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=9252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}