{"id":3474,"date":"2024-02-01T09:47:46","date_gmt":"2024-02-01T01:47:46","guid":{"rendered":"https:\/\/www.1ai.net\/?p=3474"},"modified":"2024-02-01T09:47:46","modified_gmt":"2024-02-01T01:47:46","slug":"%e9%98%bf%e9%87%8c%e6%8e%a8%e8%87%aa%e4%b8%bb%e5%a4%9a%e6%a8%a1%e6%80%81ai%e4%bb%a3%e7%90%86mobileagent-%e5%8f%af%e6%a8%a1%e6%8b%9f%e4%ba%ba%e7%b1%bb%e6%93%8d%e4%bd%9c%e6%89%8b%e6%9c%ba","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/3474.html","title":{"rendered":"Alibaba launches autonomous multimodal AI agent MobileAgent that can simulate human operation of mobile phones"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/mobileagent\" title=\"_Other Organiser\" target=\"_blank\" >MobileAgent<\/a>Is<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4\" title=\"[Sees articles with [Aribaba] label]\" target=\"_blank\" >Alibaba<\/a>An independent<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81ai%e4%bb%a3%e7%90%86\" title=\"[SEES ARTICLES WITH [MULTIMODAL AI AGENT] LABELS]\" target=\"_blank\" >Multimodal AI Agents<\/a>, which can simulate human operation of mobile phones, is a pure visual solution that does not require any system code and completely understands and operates mobile phones by analyzing images.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3475\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/02\/6384237633796874065129959.jpg\" alt=\"\" width=\"887\" height=\"541\" \/><\/p>\n<p>Features:<\/p>\n<ul>\n<li>Reliance on pure vision solutions: MobileAgent understands and operates the phone by analyzing images without requiring any system code. 
This increases versatility and flexibility, enabling it to operate apps without access to underlying code or data permissions.<\/li>\n<li>Independent of XML and system metadata: it does not rely on XML files or system metadata, which improves versatility and flexibility.<\/li>\n<li>Multiple visual perception tools: it uses a variety of techniques to locate operation targets, including text, icons, and buttons.<\/li>\n<li>Plug and play: no training required; it can be used directly on different devices and applications.<\/li>\n<\/ul>\n<p>MobileAgent can automatically complete a variety of tasks, such as finding a hat on Alibaba and adding it to the shopping cart according to given conditions, searching for the singer Jay Chou in Amazon Music or playing music about &quot;agent&quot;, looking up today&#039;s Lakers game result or information about Taylor Swift in Chrome, sending an empty email or an email with specific content in Gmail, and liking or commenting on pet cat videos on TikTok. It can also combine multiple applications to complete complex tasks.<\/p>\n<p>In summary, MobileAgent's features include reliance on a pure-vision solution, independence from XML and system metadata, a variety of visual perception tools for locating operations, no need for exploration or training, and plug-and-play use.<\/p>\n<p>Its working principle combines visual perception tools, autonomous task planning and execution, self-reflection, and a structured prompt format. MobileAgent uses visual perception modules, text and icon localization, autonomous planning, and self-reflection to operate mobile applications. Its prompt format requires the agent to output three components at each step: observation, thought, and action.<\/p>","protected":false},"excerpt":{"rendered":"<p>MobileAgent is an autonomous multimodal AI agent developed by Alibaba that simulates human operation of a cell phone. 
It is a pure-vision solution that requires no system code and understands and operates a cell phone entirely by analyzing images. Features: Pure-vision approach: MobileAgent understands and operates the phone by analyzing images, without requiring any system code. This increases versatility and flexibility, enabling it to operate applications without access to underlying code or data permissions. Independent of XML and system metadata: it does not rely on XML files or system metadata, which improves versatility and flexibility. Multiple visual perception tools: it uses a variety of techniques to locate operation targets, including text, icons, and buttons. Plug and play: no training required<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1109,1110,390],"collection":[],"class_list":["post-3474","post","type-post","status-publish","format-standard","hentry","category-news","tag-mobileagent","tag-ai","tag-390"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3474","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=3474"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3474\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=3474"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=3474"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=3474"},{"taxonomy"
:"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=3474"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}