January 23: Beijing Zhipu Huazhang Technology Co., Ltd. announced in a post today that its Zhipu GLM-PC is now open for public trial, billing it as "a further upgrade to the multimodal agent that operates computers autonomously".

According to the announcement, GLM-PC is built on Zhipu's multimodal large model CogAgent and is the world's first publicly available, ready-to-use computer agent. GLM-PC v1.0 was released on November 29, 2024, and is currently in open beta. It now offers a new "Deep Thinking" mode, dedicated features for logical reasoning and code generation, and support for Windows.
1AI learned from Zhipu's official materials that GLM-PC has the following capabilities:
Code Generation and Logic Execution
Planning: Analyzes the goal and the available resources, generates an execution roadmap, and automatically breaks large tasks down into manageable sub-tasks with a clear execution path.
Loop Execution: After the planning phase, it launches the code generation module and runs a logical loop that advances the task step by step to completion. This looping mechanism keeps execution precise and highly automated, closing the loop from input to output without human intervention.
Long Thinking: Supports real-time adjustment, reflection, and self-correction to keep improving the solution. Specifically, when the process is interrupted by external factors, it can reconstruct the logical path; when information is missing, it can proactively ask the user questions and refine the task execution plan.
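The plan, loop, and ask-when-stuck behavior described above can be illustrated with a small sketch. This is not GLM-PC's implementation, which the announcement does not detail; `SubTask`, `run_plan`, `ask_user`, and `max_retries` are hypothetical names invented here purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical structure: one planned sub-task of a larger goal.
@dataclass
class SubTask:
    name: str
    run: Callable[[dict], bool]    # returns True when the sub-task succeeds
    needs: Optional[str] = None    # context key the sub-task requires


def run_plan(subtasks, context, ask_user, max_retries=2):
    """Run sub-tasks in order; ask for missing info and retry on failure."""
    log = []
    for task in subtasks:
        # Missing information: ask the user instead of guessing.
        if task.needs and task.needs not in context:
            context[task.needs] = ask_user(task.needs)
            log.append(f"asked:{task.name and task.needs}")
        # Loop execution: retry until the step succeeds or retries run out.
        for _ in range(1 + max_retries):
            if task.run(context):
                log.append(f"done:{task.name}")
                break
            log.append(f"retry:{task.name}")
        else:
            log.append(f"failed:{task.name}")
            break  # later steps depend on this one, so stop here
    return log


def flaky_write(ctx):
    # Simulated step that fails once before succeeding.
    ctx["tries"] = ctx.get("tries", 0) + 1
    return ctx["tries"] >= 2


tasks = [
    SubTask("open_editor", lambda ctx: True),
    SubTask("write_report", flaky_write, needs="title"),
]
log = run_plan(tasks, {}, ask_user=lambda key: "Q1 summary")
print(log)
```

In this toy run the second sub-task first triggers a question to the user (the title is missing), then fails once and succeeds on the retry, which mirrors the "reconstruct and continue" behavior the announcement describes only at a high level.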
Graphics and GUI Cognition
GUI Image Understanding: Accurately recognizes graphical interface elements (e.g., buttons, icons, and layouts) and understands their functions and interaction logic.
User Behavior Cognition: Combines what it has learned about the user interface with the history of past operations to recommend intelligent next actions for the current interface.
Image Semantic Parsing: Performs in-depth semantic analysis of complex images to extract key information such as text, identifiers, and the trends and metrics shown in data visualization charts.
Multimodal Information Fusion: Fuses image and text information into a comprehensive perceptual result. For example, recognizing both button positions and text labels in the user interface helps the "left brain" (the planning and code-execution side described above) make precise operation plans.
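To make the fusion example concrete, here is a minimal sketch that pairs detected interface elements with recognized text labels by geometry. It assumes a detector and a text recognizer have already produced bounding boxes; `Box`, `fuse`, and the sample coordinates are hypothetical and do not reflect how GLM-PC or CogAgent work internally.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    def contains(self, px, py):
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def fuse(elements, texts):
    """Attach to each element the first text whose center lies inside it.

    elements: list of (kind, Box) from a hypothetical element detector
    texts:    list of (string, Box) from a hypothetical text recognizer
    """
    fused = []
    for kind, ebox in elements:
        label = next(
            (s for s, tbox in texts if ebox.contains(*tbox.center())),
            None,  # no text falls inside this element
        )
        fused.append({"kind": kind, "label": label, "click_at": ebox.center()})
    return fused


elements = [("button", Box(10, 10, 80, 30)), ("icon", Box(200, 10, 24, 24))]
texts = [("Save", Box(30, 18, 40, 14))]
fused = fuse(elements, texts)
print(fused)
```

The fused records carry both what an element is called and where to click it, which is the kind of combined result a planner could act on.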