ByteDance Seed Open-Sources UI-TARS-1.5: A Multimodal Agent Built on a Vision-Language Model

April 18: 1AI learned from the Doubao large model team that UI-TARS-1.5 was officially released and open-sourced yesterday. It is an open-source multimodal agent built on a vision-language model, and it can efficiently carry out a wide range of tasks in virtual worlds.


The relevant links are below:

  • GitHub: https://github.com/bytedance/UI-TARS
  • Website: https://seed-tars.com/
  • arXiv: https://arxiv.org/abs/2501.12326

UI-TARS-1.5 builds on UI-TARS, the native agent approach ByteDance proposed earlier. It uses reinforcement learning to further strengthen the model's higher-order reasoning, enabling the model to "think before it acts."

This release also reflects the team's new vision of using games as a vehicle for strengthening the reasoning capabilities of foundation models. Compared with domains such as math and programming, games rely more on intuitive, common-sense reasoning and less on specialized knowledge, which often makes them ideal test scenarios for assessing and improving the general capabilities of future models.

According to the team, UI-TARS is a native GUI agent that can operate real computer and mobile phone systems and can also control a browser to complete complex interactive tasks. UI-TARS-1.5's accurate GUI operation rests on the team's technical exploration along four dimensions:

  • Enhanced visual perception: Trained on large-scale interface screenshot data, the model understands the semantics and context of interface elements and produces accurate descriptions of them.
  • System 2 reasoning mechanism: The model generates a "thought" before each action, supporting multi-step planning and decision-making for complex tasks.
  • Unified action modeling: A standardized cross-platform action space, combined with learning from real trajectories, improves the controllability and accuracy of actions.
  • Self-evolving training paradigm: Through automated collection of interaction trajectories and reflective training, the model keeps improving from its mistakes and adapts to complex environmental changes.
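
To make the "thought before action" and "unified action space" ideas above more concrete, here is a minimal Python sketch. It is purely illustrative and is not taken from the UI-TARS code base: the `Thought: ... Action: name(key=value)` text format, the `Action` and `Step` classes, and the `parse_step` and `describe` helpers are all assumptions made for this example. It only shows how one abstract action could be parsed once and then rendered for different platforms.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    """One platform-independent GUI action (hypothetical schema)."""
    name: str                                  # e.g. "click" or "type"
    args: dict = field(default_factory=dict)   # e.g. {"x": 120, "y": 48}


@dataclass
class Step:
    thought: str    # reasoning text produced before the action
    action: Action  # the single action chosen for this step


# Assumed output format: "Thought: <text>\nAction: name(key=value, ...)"
STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<name>\w+)\((?P<args>.*)\)\s*$",
    re.DOTALL,
)
# Argument values are either single-quoted strings or integers.
ARG_RE = re.compile(r"(\w+)\s*=\s*('([^']*)'|-?\d+)")


def parse_step(text: str) -> Step:
    """Split one model response into its thought and its structured action."""
    m = STEP_RE.match(text.strip())
    if m is None:
        raise ValueError("expected 'Thought: ... Action: name(...)'")
    args = {}
    for key, raw, quoted in ARG_RE.findall(m.group("args")):
        args[key] = quoted if raw.startswith("'") else int(raw)
    return Step(m.group("thought"), Action(m.group("name"), args))


def describe(action: Action, platform: str) -> str:
    """Render the same abstract action for a given platform (dry run only)."""
    verbs = {"desktop": "mouse click", "mobile": "tap"}
    if action.name == "click":
        return f"{platform}: {verbs[platform]} at ({action.args['x']}, {action.args['y']})"
    if action.name == "type":
        return f"{platform}: enter text {action.args['text']!r}"
    raise ValueError(f"unsupported action: {action.name}")


if __name__ == "__main__":
    reply = "Thought: The Save button is at the top right.\nAction: click(x=120, y=48)"
    step = parse_step(reply)
    print(step.thought)
    print(describe(step.action, "desktop"))
    print(describe(step.action, "mobile"))
```

In this sketch the same `click` action becomes a mouse click on a desktop and a tap on a phone, which is the kind of controllability a shared action space is meant to provide. A real agent would execute the action and feed a new screenshot back to the model instead of printing a description.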