{"id":47107,"date":"2025-12-09T11:30:22","date_gmt":"2025-12-09T03:30:22","guid":{"rendered":"https:\/\/www.1ai.net\/?p=47107"},"modified":"2025-12-09T11:30:22","modified_gmt":"2025-12-09T03:30:22","slug":"%e6%99%ba%e8%b0%b1-glm-4-6v-%e7%b3%bb%e5%88%97%e5%a4%9a%e6%a8%a1%e6%80%81-ai-%e5%a4%a7%e6%a8%a1%e5%9e%8b%e5%8f%91%e5%b8%83%e5%b9%b6%e5%bc%80%e6%ba%90%ef%bc%8capi-%e9%99%8d%e4%bb%b7-50","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/47107.html","title":{"rendered":"Zhipu GLM-4.6V Series Multimodal AI Large Models Released and Open-Sourced, API Price Cut 50%"},"content":{"rendered":"<p>News of December 9: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%99%ba%e8%b0%b1\" title=\"[View articles tagged with [Zhipu]]\" target=\"_blank\" >Zhipu<\/a> AI announced and <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a>\u00a0the <strong>GLM-4.6V series <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [multimodal large model]]\" target=\"_blank\" >multimodal large models<\/a><\/strong>, including:<\/p>\n<ul>\n<li>GLM-4.6V (106B-A12B): base version for cloud and high-performance cluster scenarios<\/li>\n<li>GLM-4.6V-Flash (9B): lightweight version for local deployment and low-latency applications<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-47108\" title=\"2326545dj00t6zgd80009ad000u3p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/12\/2326545dj00t6zgd8009ad000u000d3p.jpg\" alt=\"2326545dj00t6zgd80009ad000u3p\" width=\"1080\" height=\"471\" \/><\/p>\n<p>As an important iteration of the GLM series in the multimodal direction, <strong>GLM-4.6V raises the training context window to 128K tokens and reaches SOTA visual-understanding accuracy at the same parameter scale<\/strong>, and for the first time in the model 
structure, integrates the Function Call (tool call) capability into a visual model, linking \"visual perception\" to \"executable action\" and providing a unified technical base for multimodal Agents in real business scenarios.<\/p>\n<p>Compared with GLM-4.5V, the GLM-4.6V series adds performance optimizations alongside a\u00a0<strong>50% price reduction<\/strong>: API input costs as low as 1 yuan per million tokens, with output at 3 yuan per million tokens.<\/p>\n<p>At the same time, <strong>GLM-4.6V-Flash is free and open to use<\/strong>.<\/p>\n<p>As of today, GLM-4.6V has been incorporated into the GLM Coding Plan, with dedicated MCP tools built for eight major user scenarios, allowing the model to select the most suitable interface on its own.<\/p>\n<p>According to Zhipu, traditional tool calls are mostly text-based and require multiple intermediate conversions, causing information loss and engineering complexity when facing multimodal content such as images, videos, and complex documents. <strong>GLM-4.6V was designed from the start around \"images as parameters, results as context\"<\/strong>, building native multimodal tool-calling capability:<\/p>\n<ul>\n<li>Multimodal input: images, screenshots, document pages, etc. can be passed directly as tool parameters without first being converted into text descriptions, simplifying the pipeline and reducing loss along the chain.<\/li>\n<li>Multimodal output: the model can visually parse results returned by tools, such as statistical charts, rendered pages, and retrieved product images, and incorporate them into the subsequent reasoning chain.<\/li>\n<\/ul>\n<p>This native support for tool use based on visual input completes the closed loop from perception to understanding to execution. 
This allows GLM-4.6V to handle more complex visual tasks, such as mixed text-and-image content, product identification with price suggestions, and assistive Agent scenarios.<\/p>\n<p>GLM-4.6V has been validated on 30+ mainstream multimodal benchmarks, including MMBench, MathVista, and OCRBench, showing <strong>significant improvement over the previous generation<\/strong>. At the same parameter scale, it delivers\u00a0<strong>SOTA<\/strong>\u00a0performance in key capabilities such as multimodal interaction, logical reasoning, and context handling. Among them, the 9B GLM-4.6V-Flash outperforms Qwen3-VL-8B overall, while the 106B-parameter (12B activated) GLM-4.6V rivals Qwen3-VL-235B, a model with more than twice its parameter count.<\/p>\n<p>Zhipu has open-sourced the GLM-4.6V model weights, inference code, and example projects at the following addresses:<\/p>\n<ul>\n<li>GitHub: https:\/\/github.com\/zai-org\/GLM-V<\/li>\n<li>Hugging Face: https:\/\/huggingface.co\/collections\/zai-org\/glm-46v<\/li>\n<li>ModelScope Community: https:\/\/modelscope.cn\/collections\/GLM-46V-37fabc27818446<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>News of December 9: Zhipu AI announced and open-sourced the GLM-4.6V series multimodal large models, including GLM-4.6V (106B-A12B), a base version for cloud and high-performance cluster scenarios, and GLM-4.6V-Flash (9B), a lightweight version for local deployment and low-latency applications. 
GLM-4.6V, an important iteration of the GLM series in the multimodal direction, raises the training context window to 128K tokens, reaches SOTA visual-understanding accuracy at the same parameter scale, and for the first time integrates the Function Call (tool call) capability into a visual model's structure, going from visual perception to executable action.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[602,219,2680],"collection":[],"class_list":["post-47107","post","type-post","status-publish","format-standard","hentry","category-news","tag-602","tag-219","tag-2680"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/47107","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=47107"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/47107\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=47107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=47107"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=47107"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=47107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}