{"id":10881,"date":"2024-05-22T09:16:58","date_gmt":"2024-05-22T01:16:58","guid":{"rendered":"https:\/\/www.1ai.net\/?p=10881"},"modified":"2024-05-22T09:16:58","modified_gmt":"2024-05-22T01:16:58","slug":"meta%e5%8f%91%e5%b8%83%e7%b1%bbgpt-4o%e5%a4%9a%e6%a8%a1%e6%80%81%e6%a8%a1%e5%9e%8bchameleon","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/10881.html","title":{"rendered":"Meta releases Chameleon, a GPT-4o-like multimodal model"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/meta\" title=\"[View articles tagged with [Meta]]\" target=\"_blank\" >Meta<\/a> recently released a <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [multimodal model]]\" target=\"_blank\" >multimodal model<\/a> called <a href=\"https:\/\/www.1ai.net\/en\/tag\/chameleon\" title=\"[View articles tagged with [Chameleon]]\" target=\"_blank\" >Chameleon<\/a>, which sets a new benchmark in the development of multimodal models. Chameleon is a family of early-fusion, token-based mixed-modal models that can understand and generate images and text in any order. It uses a unified Transformer architecture, is trained on mixed modalities of text, images, and code, and tokenizes images so that it can generate interleaved text and image sequences.<\/p>\n<p>The key innovation of the Chameleon model lies in its early-fusion approach: all processing pipelines are mapped into a common representation space from the beginning, allowing the model to process text and images seamlessly. It demonstrates a wide range of capabilities across a variety of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed-modal generation. 
On the image captioning task, Chameleon achieves state-of-the-art performance; on text tasks it surpasses Llama-2 and competes with models such as Mixtral 8x7B and Gemini-Pro.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-10882\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/05\/6385189732286813859108454.jpg\" alt=\"\" width=\"785\" height=\"586\" \/><\/p>\n<p>Paper address: https:\/\/arxiv.org\/pdf\/2405.09818<\/p>\n<p>Building the Chameleon model posed significant technical challenges, and the Meta research team introduced a series of architectural innovations and training techniques to address them. For example, they developed a new image tokenizer that encodes a 512\u00d7512 image into 1024 discrete tokens drawn from a codebook of size 8192. In addition, Chameleon uses a BPE tokenizer trained with the sentencepiece open-source library.<\/p>\n<p>In the pre-training phase, Chameleon uses mixed-modal data, including plain text, text-image pairs, and multimodal documents with text and images interleaved. Pre-training is divided into two stages: the first stage uses large-scale unsupervised learning, and the second stage mixes in higher-quality data.<\/p>\n<p>The Chameleon model surpassed Llama-2 across the board in benchmark evaluations, achieving significant results in common-sense reasoning, reading comprehension, math problems, and world knowledge. In human evaluation and safety testing, Chameleon-34B also far outperformed Gemini Pro and GPT-4V.<\/p>\n<p>Although Chameleon lacks the speech capabilities of GPT-4o, Meta&#039;s director of product management said that they are very proud to support this team and hope to bring something closer to GPT-4o to the open source community. 
This suggests that, in the near future, we may see an open-source counterpart to GPT-4o.<\/p>\n<p>The release of the Chameleon model demonstrates Meta&#039;s significant progress in the field of multimodal models. It not only advances the development of multimodal models, but also opens up new possibilities for future research and applications.<\/p>","protected":false},"excerpt":{"rendered":"<p>Meta recently released a multimodal model called Chameleon, which sets a new benchmark in the development of multimodal models. Chameleon is a family of early-fusion, token-based mixed-modal models capable of understanding and generating arbitrary sequences of images and text. It is trained with a unified Transformer architecture on mixed modalities of text, images, and code, and tokenizes images to generate interleaved text and image sequences. The Chameleon model is innovative in its early-fusion approach, where all processing flows are mapped to a common representation space from the beginning, allowing the model to process text and images seamlessly. 
It demonstrates a wide range of capabilities across a variety of tasks, including visual question answering, image captioning, text<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[2703,297,1096],"collection":[],"class_list":["post-10881","post","type-post","status-publish","format-standard","hentry","category-news","tag-chameleon","tag-meta","tag-1096"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/10881","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=10881"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/10881\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=10881"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=10881"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=10881"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=10881"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}