{"id":3013,"date":"2024-01-19T10:03:40","date_gmt":"2024-01-19T02:03:40","guid":{"rendered":"https:\/\/www.1ai.net\/?p=3013"},"modified":"2024-01-19T10:03:40","modified_gmt":"2024-01-19T02:03:40","slug":"%e8%8b%b9%e6%9e%9caim%e8%87%aa%e5%9b%9e%e5%bd%92%e8%a7%86%e8%a7%89%e6%a8%a1%e5%9e%8b%e9%aa%8c%e8%af%81%e6%80%a7%e8%83%bd%e4%b8%8e%e6%a8%a1%e5%9e%8b%e8%a7%84%e6%a8%a1%e6%9c%89%e5%85%b3","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/3013.html","title":{"rendered":"Apple AIM autoregressive vision model validation performance is related to model size"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%8b%b9%e6%9e%9c\" title=\"View articles tagged with [apple]\" target=\"_blank\" >apple<\/a> researchers modeled images with autoregressive image modeling (<a href=\"https:\/\/www.1ai.net\/en\/tag\/aim\" title=\"View articles tagged with [AIM]\" target=\"_blank\" >AIM<\/a>). AIM can effectively utilize large amounts of uncurated image data, and its training methodology and stability are similar to those of recent large language models (<a href=\"https:\/\/www.1ai.net\/en\/tag\/llm\" title=\"View articles tagged with [LLM]\" target=\"_blank\" >LLM<\/a>). This observation is consistent with previous findings on scaling large language models.<\/p>\n<p>Although the models used in the paper's experiments are limited in size, whether this scaling law holds at larger parameter scales remains to be explored. 
The pre-training objective follows the standard autoregressive formulation applied to sequences of image patches, and a series of experiments verifies that model capacity scales readily to billions of parameters while delivering strong performance on downstream tasks.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3014\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/01\/6384119425717309526096978.png\" alt=\"\" width=\"852\" height=\"389\" \/><\/p>\n<p>In addition, the researchers explored multiple aspects of training ViT models with autoregressive objectives and revisited previous work. Their experiments show that improvements in the pre-training objective translate directly into better downstream performance throughout training, and that both the loss value and downstream-task accuracy improve as model capacity increases, consistent with the trend observed in LLMs.<\/p>\n<p>Among AIM's design choices, beyond widening the model, the researchers deliberately adopted a simple design: multi-layer perceptron (MLP) blocks that process each patch independently. They also emphasize that the models studied are limited in scale, and that validating this law on models with larger parameter counts remains to be explored.<\/p>\n<p>The paper's experimental results demonstrate that visual models also follow the rule of \"the more parameters, the stronger the performance\": autoregressive training scales well for image models and can meet the training requirements of visual features. 
This offers a new research direction for future work on improving and optimizing image models.<\/p>","protected":false},"excerpt":{"rendered":"<p>Apple researchers have validated the \"more parameters, more performance\" rule for visual models with the Autoregressive Image Model (AIM), further demonstrating that the model improves as model size or the amount of pre-training data increases. AIM makes efficient use of large amounts of uncurated image data, and its training methodology and stability are similar to those of recent Large Language Models (LLMs). This observation is consistent with previous findings on scaling large language models. Although the models used in the paper's experiments are limited in scale, whether this pattern holds at larger parameter scales remains to be explored. The pre-training objective follows the standard autoregressive model applied to image patch sequences, and a series of experiments verified that model capacity can be easily scaled up to billions of 
parameters<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[958,473,345],"collection":[],"class_list":["post-3013","post","type-post","status-publish","format-standard","hentry","category-news","tag-aim","tag-llm","tag-345"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3013","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=3013"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3013\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=3013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=3013"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=3013"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=3013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}