{"id":5671,"date":"2024-03-17T09:24:31","date_gmt":"2024-03-17T01:24:31","guid":{"rendered":"https:\/\/www.1ai.net\/?p=5671"},"modified":"2024-03-17T09:24:31","modified_gmt":"2024-03-17T01:24:31","slug":"%e8%8b%b9%e6%9e%9c%e6%8e%a8%e5%87%ba-300-%e4%ba%bf%e5%8f%82%e6%95%b0-mm1-%e5%a4%9a%e6%a8%a1%e6%80%81-ai%e5%a4%a7%e6%a8%a1%e5%9e%8b%ef%bc%8c%e5%8f%af%e8%af%86%e5%88%ab%e5%9b%be%e5%83%8f%e6%8e%a8","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/5671.html","title":{"rendered":"Apple launches MM1 multimodal AI model with 30 billion parameters, capable of recognizing images and reasoning about natural language"},"content":{"rendered":"<p data-vmark=\"c895\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%8b%b9%e6%9e%9c\" title=\"View articles tagged with apple\" target=\"_blank\" >Apple<\/a>&#039;s research team recently published a paper on\u00a0<a title=\"ArXiv\" href=\"https:\/\/arxiv.org\/pdf\/2403.09611.pdf\" target=\"_blank\" rel=\"noopener\">ArXiv<\/a>\u00a0titled &quot;<a href=\"https:\/\/www.1ai.net\/en\/tag\/mm1\" title=\"View articles tagged with MM1\" target=\"_blank\" >MM1<\/a>: Methods, Analysis &amp; Insights from Multimodal LLM Pre-training&quot;, which introduces an &quot;MM1&quot; <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged with Multimodal Large Model\" target=\"_blank\" >multimodal large model<\/a>. The model comes in three parameter sizes of 3 billion, 7 billion, and 30 billion, and <span class=\"accentTextColor\">possesses image recognition and natural language reasoning capabilities<\/span>.<\/p>\n<p data-vmark=\"ee42\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-5672\" title=\"fa011969-7626-4299-bf0e-8b754d9dc074\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/03\/fa011969-7626-4299-bf0e-8b754d9dc074.png\" alt=\"fa011969-7626-4299-bf0e-8b754d9dc074\" width=\"722\" height=\"844\" \/><\/p>\n<p data-vmark=\"609e\"><span 
class=\"accentTextColor\">Apple&#039;s research team mainly uses the MM1 model as a testbed for experiments<\/span>: by controlling individual variables, the team identifies the key factors that affect model performance.<\/p>\n<p data-vmark=\"b749\">The research shows that<span class=\"accentTextColor\"> image resolution and the number of image tokens have a large impact on model performance, while the vision-language connector has comparatively little effect; different types of pre-training data affect model performance in different ways<\/span>.<\/p>\n<p data-vmark=\"b926\">According to the paper, the research team first ran small-scale ablation experiments on model architecture decisions and pre-training data. They then built the MM1 model using a Mixture-of-Experts architecture with Top-2 gating, which they report achieved the best pre-training metrics and remained competitive after supervised fine-tuning on a series of existing multimodal benchmarks.<\/p>\n<p data-vmark=\"6e02\">In the researchers&#039; tests, <span class=\"accentTextColor\">MM1-3B-Chat and MM1-7B-Chat are said to outperform most models of the same size on the market<\/span>, performing particularly well on VQAv2, TextVQA, ScienceQA, MMBench, MMMU, and MathVista, though their overall performance still trails Google&#039;s Gemini and OpenAI&#039;s GPT-4V.<\/p>\n<p data-vmark=\"790e\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-5673\" title=\"e57ba020-a8ea-466e-9cc6-36e92e64475b\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/03\/e57ba020-a8ea-466e-9cc6-36e92e64475b.png\" alt=\"e57ba020-a8ea-466e-9cc6-36e92e64475b\" width=\"1244\" height=\"934\" \/><\/p>","protected":false},"excerpt":{"rendered":"<p>Apple's research team recently published a paper on ArXiv titled \"MM1: Methods, Analysis &amp; Insights from Multimodal LLM Pre-training\", which introduces an 
\"MM1\" multimodal large model, available in three parameter scales of 3 billion, 7 billion, and 30 billion, with image recognition and natural language reasoning capabilities. The paper focuses on experiments that use the MM1 model to identify the key factors affecting model performance by controlling individual variables. The research shows that image resolution and the number of image tokens have a large impact on model performance, while the vision-language connector has a smaller effect.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1727,602,345],"collection":[],"class_list":["post-5671","post","type-post","status-publish","format-standard","hentry","category-news","tag-mm1","tag-602","tag-345"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/5671","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=5671"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/5671\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=5671"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=5671"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=5671"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=5671"}],"curies":[{"name":"wp","href":"https:\/\/api.w.
org\/{rel}","templated":true}]}}