Multimodal Agent That Autonomously Operates Computers Gets an Upgrade: Zhipu GLM-PC Opens to the Public

On January 23, Beijing Zhipu Huazhang Technology Co., Ltd. announced that its Zhipu GLM-PC is now open to the public, describing the release as a "further upgrade of the multimodal agent that autonomously operates computers".


GLM-PC is reportedly built on Zhipu's multimodal large model CogAgent and is billed as the world's first publicly available, ready-to-use computer agent. GLM-PC v1.0 was released for beta testing on November 29, 2024. The version now open to the public adds a new "Deep Thinking" mode with dedicated logical-reasoning and code-generation capabilities, along with support for Windows.

According to Zhipu's official information, 1AI has learned that GLM-PC offers the following capabilities:

Code Generation and Logic Execution

  • Planning: Comprehensively analyzes the goal and the available resources, generates an execution roadmap, and automatically breaks large tasks into manageable subtasks to form a clear execution path.

  • Loop Execution: After planning, the agent launches its code-generation module and runs a logical loop that advances the task step by step until it is complete. This loop keeps execution precise and highly automated, closing the loop from input to output without human intervention.

  • Long-thinking ability: Supports real-time adjustment, reflection, and self-correction to continuously refine the solution. For example, when the process is interrupted by external factors, it can reconstruct the logical path; when information is missing, it can proactively ask the user questions to refine the task execution plan.
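Zhipu has not published how GLM-PC implements these steps, so the following is only a generic sketch of the plan / execute / re-plan / ask-the-user loop the bullets describe, not Zhipu's code. The `planner`, `executor`, and `ask_user` callables, the `Subtask` type, and the `max_steps` budget are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Subtask:
    name: str
    needs: Optional[str] = None  # information required before this step can run
    done: bool = False


@dataclass
class Agent:
    # Hypothetical: turns a goal into an ordered list of subtasks (planning).
    planner: Callable[[str], list]
    # Hypothetical: runs one subtask; True on success, False if interrupted.
    executor: Callable[[Subtask, dict], bool]
    # Hypothetical: asks the user a question and returns the answer.
    ask_user: Callable[[str], str]
    known: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 20) -> bool:
        plan = self.planner(goal)  # break the goal into subtasks
        for _ in range(max_steps):  # step budget so the loop cannot spin forever
            pending = [t for t in plan if not t.done]
            if not pending:
                return True  # every subtask finished: loop closed
            task = pending[0]
            if task.needs and task.needs not in self.known:
                # Missing information: ask the user instead of guessing.
                self.known[task.needs] = self.ask_user(f"Please provide {task.needs}")
                self.log.append(f"asked:{task.needs}")
            if self.executor(task, self.known):
                task.done = True
                self.log.append(f"done:{task.name}")
            else:
                # Interrupted: re-plan the remaining work, keep finished subtasks.
                self.log.append(f"replan:{task.name}")
                finished = {t.name for t in plan if t.done}
                fresh = [t for t in self.planner(goal) if t.name not in finished]
                plan = [t for t in plan if t.done] + fresh
        return False


# Usage: the "submit" step is interrupted once, then succeeds after re-planning.
attempts = {"submit": 0}


def planner(goal: str) -> list:
    return [Subtask("open_app"), Subtask("fill_form", needs="email"), Subtask("submit")]


def executor(task: Subtask, known: dict) -> bool:
    if task.name == "submit":
        attempts["submit"] += 1
        return attempts["submit"] > 1  # first attempt simulates an interruption
    return True


agent = Agent(planner, executor, ask_user=lambda q: "user@example.com")
ok = agent.run("register an account")
print(ok, agent.log)
```

The step budget is a design choice of this sketch: a real agent needs some bound on retries, but how GLM-PC bounds its loop is not stated in the source.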

Graphics and GUI Cognition

  • GUI image understanding: Accurately recognizes graphical interface elements (e.g., buttons, icons, and layouts) and understands their functions and interaction logic.

  • User behavior cognition: By learning the user interface and drawing on the history of past operations, it recommends relevant actions for the current interface.

  • Image semantic parsing: Performs in-depth semantic analysis of complex images, extracting key information such as text, identifiers, and the trends and metrics shown in data-visualization charts.

  • Multimodal information fusion: Fuses image and text information into a combined perceptual result. For example, recognizing both the positions of buttons and their text labels in a user interface helps the "left brain" make precise operation plans.
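To make the button-plus-label example concrete, here is a minimal sketch of one common way to fuse the two signals: attach each recognized text span to the detected UI element whose bounding box contains the text's center, then look up a click point by label. This is an illustrative assumption, not GLM-PC's method; the `Box`/`Element` types and the `fuse`/`locate` helpers are hypothetical, and the detector and OCR outputs are hard-coded stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self) -> tuple:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def area(self) -> float:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    def contains(self, point: tuple) -> bool:
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass
class Element:  # hypothetical output of a visual element detector
    kind: str  # e.g. "button", "icon"
    box: Box
    label: str = ""


def fuse(elements: list, texts: list) -> list:
    """Attach each (text, box) span to the element that contains its center."""
    for text, tbox in texts:
        hits = [e for e in elements if e.box.contains(tbox.center())]
        if hits:
            target = min(hits, key=lambda e: e.box.area())  # most specific element
            target.label = (target.label + " " + text).strip()
    return elements


def locate(elements: list, kind: str, label: str) -> Optional[tuple]:
    """Return a click point for the element of `kind` whose label matches."""
    for e in elements:
        if e.kind == kind and e.label.lower() == label.lower():
            return e.box.center()
    return None


# Usage with stand-in detector and OCR results.
elements = [Element("button", Box(10, 10, 110, 40)), Element("button", Box(130, 10, 230, 40))]
texts = [("Cancel", Box(40, 18, 80, 32)), ("Submit", Box(160, 18, 200, 32))]
fused = fuse(elements, texts)
print(locate(fused, "button", "Submit"))  # (180.0, 25.0)
```

Choosing the smallest enclosing box is one simple heuristic for nested elements (a label inside a button inside a panel); a production system would likely use learned grounding rather than geometry alone.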
