{"id":7567,"date":"2024-04-10T10:00:30","date_gmt":"2024-04-10T02:00:30","guid":{"rendered":"https:\/\/www.1ai.net\/?p=7567"},"modified":"2024-04-10T10:00:30","modified_gmt":"2024-04-10T02:00:30","slug":"%e8%8b%b9%e6%9e%9c%e6%96%b0ai%e6%a8%a1%e5%9e%8b%e7%a0%94%e7%a9%b6ferret-ui%ef%bc%9a%e6%88%96%e5%b0%86%e6%8f%90%e5%8d%87siri%ef%bc%8c%e8%af%bb%e6%87%82%e5%b1%8f%e5%b9%95%e5%86%85%e5%ae%b9","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/7567.html","title":{"rendered":"Apple&#039;s new AI model research Ferret-UI: may improve Siri and understand screen content"},"content":{"rendered":"<p>Although <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%8b%b9%e6%9e%9c\" title=\"View articles tagged with apple\" target=\"_blank\" >Apple<\/a> has not introduced any AI models since the start of the generative AI boom, the company has a number of AI projects in the works. Last week, Apple researchers shared a paper revealing a new language model the company is developing, and inside sources say Apple is working on two AI-powered robots.<\/p>\n<p>Now, the release of yet another research paper shows that Apple is just getting started. On Monday, Apple researchers published a paper describing Ferret-UI, a new multimodal large language model (MLLM) that understands mobile user interface (UI) screens.<\/p>\n<p>MLLMs differ from standard LLMs in that they handle not only text but also demonstrate a deep understanding of multimodal elements such as images and audio. In this case, Ferret-UI was trained to recognize the different elements of a user's home screen, such as application icons and small text. In the past, recognizing app screen elements has been challenging for MLLMs because these elements are so small and subtle. 
To overcome this problem, the paper notes, the researchers added \"arbitrary resolution\" to Ferret, allowing it to zoom in on details on the screen.<\/p>\n<p>Building on this, Apple's MLLM also has \"referring, grounding, and reasoning capabilities,\" which allow Ferret-UI to fully understand a UI screen and perform tasks based on its content, as shown below.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-7568\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/6384833853337142618886160.png\" alt=\"\" width=\"625\" height=\"514\" \/><\/p>\n<p>Apple researchers compared Ferret-UI with OpenAI's MLLM GPT-4V across public benchmarks, basic tasks, and advanced tasks. On the basic tasks, which include icon recognition, OCR, widget classification, icon finding, and widget finding on iPhone and Android, Ferret-UI outperformed GPT-4V on almost all of them. The only exception was the \"find text\" task on iPhone, where GPT-4V slightly outperformed the Ferret model.<\/p>\n<p>On the advanced tasks, GPT-4V came out slightly ahead, beating Ferret-UI 93.4% to 91.7% on inference dialogs. However, the researchers noted that Ferret-UI's performance is still \"noteworthy\" because it generates raw coordinates, rather than selecting from a set of predefined boxes as GPT-4V does.<\/p>\n<p>The paper does not mention how Apple plans to use the technology, or whether it will at all. Instead, the researchers state more broadly that Ferret-UI's advanced capabilities are expected to have a positive impact on UI-related applications. 
Thanks to the model's comprehensive understanding of the user's application screen and its ability to perform tasks based on that screen, Ferret-UI could be used to enhance Siri to carry out tasks for the user.<\/p>","protected":false},"excerpt":{"rendered":"<p>Although Apple has not introduced any AI models since the start of the generative AI boom, the company has recently started a number of AI projects. Last week, Apple researchers shared a paper revealing a new language model the company is developing, and inside sources say Apple is working on two AI-powered robots. Now, another research paper shows that Apple is just getting started. On Monday, Apple researchers published a paper presenting Ferret-UI, a new multimodal large language model (MLLM) that can understand mobile user interface (UI) screens. MLLMs differ from standard LLMs in that they handle not only text but also demonstrate a deep understanding of multimodal elements such as images and audio. In this case, Fe<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[167,345],"collection":[],"class_list":["post-7567","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-345"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/7567","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=7567"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/7567\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?pa
rent=7567"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=7567"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=7567"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=7567"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}