{"id":33459,"date":"2025-04-18T10:59:20","date_gmt":"2025-04-18T02:59:20","guid":{"rendered":"https:\/\/www.1ai.net\/?p=33459"},"modified":"2025-04-18T10:59:20","modified_gmt":"2025-04-18T02:59:20","slug":"%e5%ad%97%e8%8a%82-seed-%e5%bc%80%e6%ba%90-ui-tars-1-5%ef%bc%9a%e5%9f%ba%e4%ba%8e%e8%a7%86%e8%a7%89-%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b%e6%9e%84%e5%bb%ba%e7%9a%84%e5%a4%9a%e6%a8%a1%e6%80%81%e6%99%ba","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/33459.html","title":{"rendered":"Byte Seed Open-Sources UI-TARS-1.5: A Multimodal Agent Built on a Vision-Language Model"},"content":{"rendered":"<p>On April 18, 1AI learned from the Doubao large model team that UI-TARS-1.5 was officially released and <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a> yesterday. It is an open-source <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e6%99%ba%e8%83%bd%e4%bd%93\" title=\"[Sees articles with [Multimodal Intelligence] labels]\" target=\"_blank\" >multimodal agent<\/a> built on a vision-language model, capable of efficiently performing all kinds of tasks in the virtual world.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-33460\" title=\"d53c4c71j00suw89g002hd000u000g1p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/04\/d53c4c71j00suw89g002hd000u000g1p.jpg\" alt=\"d53c4c71j00suw89g002hd000u000g1p\" width=\"1080\" height=\"577\" \/><\/p>\n<p>The relevant links are below:<\/p>\n<ul>\n<li><strong>GitHub:<\/strong>https:\/\/github.com\/bytedance\/UI-TARS<\/li>\n<li><strong>Website:<\/strong>https:\/\/seed-tars.com\/<\/li>\n<li><strong>Arxiv:<\/strong>https:\/\/arxiv.org\/abs\/2501.12326<\/li>\n<\/ul>\n<p>UI-TARS-1.5 is based on <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%ad%97%e8%8a%82\" title=\"[See articles with [byte] labels]\" target=\"_blank\" >Byte<\/a>'s 
previously proposed native agent framework, UI-TARS, and further strengthens the model's higher-order reasoning capabilities through reinforcement learning, enabling it to <strong>think before it acts<\/strong>.<\/p>\n<p>This version of the model also reflects the team's new vision of using games as a vehicle for enhancing the reasoning capabilities of foundation models. Compared with domains such as math and programming, games rely more on intuitive, common-sense reasoning and less on specialized knowledge, making them ideal test scenarios for assessing and improving the general capabilities of future models.<\/p>\n<p>According to the introduction, UI-TARS is a native GUI agent capable of operating real computer and mobile phone systems, as well as controlling browsers to complete complex interactive tasks. UI-TARS-1.5 achieves precise GUI operation thanks to the team's technical exploration along four dimensions:<\/p>\n<ul>\n<li><strong>Enhanced visual perception:<\/strong> relying on large-scale interface screenshot data, the model understands the semantics and context of interface elements and forms accurate descriptions of them.<\/li>\n<li><strong>System 2 reasoning mechanism:<\/strong> the model generates \"thoughts\" before acting, supporting multi-step planning and decision-making for complex tasks.<\/li>\n<li><strong>Unified action modeling:<\/strong> a standardized cross-platform action space improves action controllability and execution accuracy through learning from real trajectories.<\/li>\n<li><strong>Self-evolving training paradigm:<\/strong> through automated collection of interaction trajectories and reflective training, the model continuously learns from its errors and adapts to complex environmental changes.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>On April 18, 1AI learned from the Doubao large model team that UI-TARS-1.5 was officially released and open-sourced yesterday. 
This is an open-source multimodal agent built on a vision-language model, capable of efficiently performing all kinds of tasks in the virtual world. The related links are as follows: GitHub: https:\/\/github.com\/bytedance\/UI-TARS Website: https:\/\/seed-tars.com\/ Arxiv: https:\/\/arxiv.org\/abs\/2501.12326 UI-TARS-1.5 builds on Byte's previously proposed native agent framework UI-TARS, with its reasoning further enhanced through reinforcement learning.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[6349,1532,219],"collection":[],"class_list":["post-33459","post","type-post","status-publish","format-standard","hentry","category-news","tag-6349","tag-1532","tag-219"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/33459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=33459"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/33459\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=33459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=33459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=33459"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=33459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}