Zhipu GLM-4.6V Series Multimodal Large Models Released and Open-Sourced; API Prices Cut 50%

On December 9, Zhipu AI announced and open-sourced the GLM-4.6V series of multimodal large models, including:

  • GLM-4.6V (106B-A12B): the base version, aimed at cloud and high-performance cluster scenarios.
  • GLM-4.6V-Flash (9B): a lightweight version for local deployment and low-latency applications.


As an important step for the GLM series in the multimodal direction, GLM-4.6V scales the training context window to 128K tokens while holding visual understanding accuracy at the same parameter scale, and for the first time integrates Function Call (tool call) capability directly into the visual model's architecture. By linking "visual perception" to "action," it provides a unified technical base for multimodal agents in real business scenarios.

Compared with GLM-4.5V, the GLM-4.6V series not only improves performance but also cuts prices by 50%: API input costs as low as 1 yuan per million tokens, with output at 3 yuan per million tokens.

At the same time, GLM-4.6V-Flash is free and open to use.

As of the same date, GLM-4.6V has been incorporated into the GLM Coding Plan, with dedicated MCP tools developed for eight major user scenarios; the model can select the most suitable interface on its own.

According to Zhipu AI, traditional tool calls are mostly text-based, requiring multiple intermediate conversions when facing multimodal content such as images, videos, and complex documents, which leads to information loss and engineering complexity. GLM-4.6V was designed from the start around the idea that "images are parameters, results are context," building native multimodal tool-calling capability:

  • Multimodal input: images, screenshots, document pages, and similar content can be passed directly as tool parameters, without first being converted into text descriptions, reducing information loss along the chain.
  • Multimodal output: results returned by tools, such as statistical charts, rendered pages, and retrieved product images, can be visually understood by the model again and incorporated into the subsequent reasoning chain.

These native capabilities let tools with visual input and output complete the closed loop from perception to understanding to action. This allows GLM-4.6V to handle more complex visual tasks such as mixed text-and-image generation, product identification with price comparison, and related agent scenarios.
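To make the "images are parameters, results are context" idea concrete, the sketch below builds an OpenAI-compatible chat-completions payload in which an image is passed directly as message content alongside a declared tool the model may call. This is an illustrative assumption, not official Zhipu sample code: the endpoint details, the `search_product` tool, and the exact request schema are hypothetical; only the model name comes from the announcement.

```python
import json


def build_request(image_url: str, question: str) -> dict:
    """Assemble a hypothetical OpenAI-compatible request where the image
    travels as part of the user message (the "images are parameters" idea)
    and a tool is declared so the model can act on what it sees."""
    return {
        "model": "glm-4.6v",  # model name from the release; API details are assumed
        "messages": [
            {
                "role": "user",
                "content": [
                    # The image itself is a first-class input, not a text description.
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool for the product-identification scenario.
                    "name": "search_product",
                    "description": "Look up a product recognized in the image "
                                   "and return listings with prices.",
                    "parameters": {
                        "type": "object",
                        "properties": {"query": {"type": "string"}},
                        "required": ["query"],
                    },
                },
            }
        ],
    }


req = build_request(
    "https://example.com/sneaker.jpg",
    "Identify this product and compare prices across stores.",
)
print(json.dumps(req, ensure_ascii=False, indent=2))
```

In a full agent loop, the tool's result (for example, retrieved product images) would be appended back into `messages` as new visual content, which the model then reasons over again, closing the perception-to-action loop described above.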

GLM-4.6V has been validated on more than 30 mainstream multimodal benchmarks, including MMBench, MathVista, and OCRBench, showing significant improvement over the previous generation. At the same parameter scale, the model achieves SOTA performance in key capabilities such as multimodal interaction, logical reasoning, and long context. The 9B GLM-4.6V-Flash outperforms Qwen3-VL-8B overall, while GLM-4.6V, with 106B parameters and 12B activated, rivals Qwen3-VL-235B, a model more than twice its size.

Zhipu has open-sourced the GLM-4.6V model weights, inference code, and example projects at the following addresses:

  • GitHub: https://github.com/zai-org/GLM-V
  • Hugging Face: https://huggingface.co/collections/zai-org/glm-46v
  • ModelScope: https://modelscope.cn/collections/GLM-46V-37fabc27818446