{"id":21329,"date":"2024-10-13T09:48:40","date_gmt":"2024-10-13T01:48:40","guid":{"rendered":"https:\/\/www.1ai.net\/?p=21329"},"modified":"2024-10-13T09:48:40","modified_gmt":"2024-10-13T01:48:40","slug":"%e4%b8%ad%e5%9b%bd%e7%a7%bb%e5%8a%a8%e3%80%81%e7%94%b5%e5%ad%90%e6%a0%87%e5%87%86%e9%99%a2%e5%8f%8a-16-%e5%ae%b6%e9%87%8d%e7%82%b9%e5%a4%ae%e4%bc%81%e5%8f%91%e5%b8%83%e3%80%8a%e9%80%9a%e7%94%a8","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/21329.html","title":{"rendered":"China Mobile, Electronic Standards Institute and 16 Key Central Enterprises Publish Common Large Model Evaluation Standard"},"content":{"rendered":"<p>According to <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e4%b8%ad%e5%9b%bd%e7%a7%bb%e5%8a%a8\" title=\"[See articles with [China Moves] labels]\" target=\"_blank\" >China Mobile<\/a>, during the 2024 China Mobile Global Partner Conference, China Mobile joined forces with the Electronic Standards Institute and 16 key <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%ae%e4%bc%81\" title=\"[Sees articles with labels]\" target=\"_blank\" >central enterprises<\/a> to jointly build a large model evaluation system and released the <strong><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%80%9a%e7%94%a8%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles that contain labels of the Universal Large Model]\" target=\"_blank\" >General Large Model<\/a> Evaluation Standard<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-21330\" title=\"7a699ceej00sl9ubw00hjd000o000g0m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/10\/7a699ceej00sl9ubw00hjd000o000g0m.jpg\" alt=\"7a699ceej00sl9ubw00hjd000o000g0m\" width=\"864\" height=\"576\" \/><\/p>\n<p>According to reports, the standard is an important result of building the large model evaluation system and, for the industry's <strong>selection of high-quality large models<\/strong>, provides an important reference 
point. The first phase will focus on the general domain and four key industry sectors, with work on <strong>evaluation standard development, evaluation base construction and evaluation pilot applications<\/strong>, among other areas.<\/p>\n<p>The general large model evaluation standard is built on a \"2-4-6\" framework, as follows:<\/p>\n<ul>\n<li>\"2\": two types of evaluation perspectives. Guided by the actual usage needs of key industries and aligned with the national standard's requirements for model capabilities, evaluation tasks are divided into two perspectives: comprehension and generation.<\/li>\n<li>\"4\": four categories of evaluation elements. Drawn from the full evaluation lifecycle, <strong>evaluation tools, evaluation data, evaluation methods and evaluation metrics<\/strong> are the four key elements that ensure the evaluation can actually be carried out.<\/li>\n<li>\"6\": six evaluation dimensions. Taking into account the core capabilities required when applying large models, the standard sets out six dimensions: <strong>functionality, accuracy, reliability, security, interactivity and application<\/strong>.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>According to China Mobile, during the 2024 China Mobile Global Partner Conference, China Mobile, together with the Electronic Standards Institute and 16 key central enterprises, jointly began building a large model evaluation system and released the General Large Model Evaluation Standard. According to reports, the standard is an important achievement in building the large model evaluation system and gives the industry an important reference for selecting high-quality large models. The first phase will focus on the general field and four key industry areas, with work on evaluation standard development, evaluation base construction, evaluation pilot applications and other aspects. 
The evaluation standards for generic large models are based on the \"2-4-6\" framework as follows: \"2\": two types of evaluation perspectives, guided by the actual use needs of key industries, aligned with the national standard on model capability requirements, and dividing the evaluation tasks into understanding and generation.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[2072,4599,745],"collection":[],"class_list":["post-21329","post","type-post","status-publish","format-standard","hentry","category-news","tag-2072","tag-4599","tag-745"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/21329","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=21329"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/21329\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=21329"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=21329"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=21329"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=21329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}