{"id":6599,"date":"2024-03-29T10:27:25","date_gmt":"2024-03-29T02:27:25","guid":{"rendered":"https:\/\/www.1ai.net\/?p=6599"},"modified":"2024-03-29T10:27:25","modified_gmt":"2024-03-29T02:27:25","slug":"ai21%e5%8f%91%e5%b8%83%e4%b8%96%e7%95%8c%e9%a6%96%e4%b8%aamamba%e7%9a%84%e7%94%9f%e4%ba%a7%e7%ba%a7%e6%a8%a1%e5%9e%8bjamba-%e6%94%af%e6%8c%81256k%e4%b8%8a%e4%b8%8b%e6%96%87%e9%95%bf%e5%ba%a6","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/6599.html","title":{"rendered":"AI21 releases Jamba, the world\u2019s first production-grade Mamba model, supporting 256K context length"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/ai21\" title=\"_OTHER ORGANISER\" target=\"_blank\" >AI21<\/a> has released the world&#039;s <span class=\"spamTxt\">first<\/span> production-grade Mamba model: <a href=\"https:\/\/www.1ai.net\/en\/tag\/jamba\" title=\"_Other Organiser\" target=\"_blank\" >Jamba<\/a>. The model uses a groundbreaking SSM-Transformer architecture with 52B parameters, of which 12B are active at generation time. Jamba combines Joint Attention and Mamba technology to support a 256K context length, and a single 80GB A100 can hold a context of up to 140K tokens. Compared with Mixtral 8x7B, long-context throughput is increased 3x.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-6600\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/03\/6384730205528475546732656.png\" alt=\"\" width=\"590\" height=\"341\" \/><\/p>\n<p>Model address: https:\/\/huggingface.co\/ai21labs\/Jamba-v0.1<\/p>\n<p>Jamba represents a major innovation in model design. It combines Mamba&#039;s structured state space (SSM) technology with the traditional Transformer architecture to make up for the inherent limitations of pure SSM models. 
Mamba is a structured state space model (SSM), a class of model that captures and processes how data changes over time, and it is particularly well suited to sequence data such as text or time series. A key advantage of SSMs is their ability to process long sequences efficiently, but they may be less powerful than other architectures at handling complex patterns and dependencies.<\/p>\n<p>The Transformer architecture is one of the most successful models in artificial intelligence in recent years, especially in <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%87%aa%e7%84%b6%e8%af%ad%e8%a8%80%e5%a4%84%e7%90%86\" title=\"[Sees articles with [natural language processing] labels]\" target=\"_blank\" >Natural Language Processing<\/a> (<a href=\"https:\/\/www.1ai.net\/en\/tag\/nlp\" title=\"_OTHER ORGANISER\" target=\"_blank\" >NLP<\/a>) tasks. It processes and understands language data very effectively and captures long-distance dependencies, but it runs into problems with computational efficiency and memory consumption on long sequences, because the cost of self-attention grows quadratically with sequence length.<\/p>\n<p>The Jamba model combines elements of Mamba&#039;s SSM technology with the Transformer architecture, aiming to leverage the strengths of both while overcoming their respective limitations. Through this combination, Jamba can not only process long sequences of data efficiently (Mamba&#039;s strength) but also maintain a high level of understanding of complex language patterns and dependencies (the Transformer&#039;s strength). 
This means that Jamba can stay efficient on tasks that require understanding large amounts of text and complex dependencies, without sacrificing performance or accuracy.<\/p>","protected":false},"excerpt":{"rendered":"<p>AI21 has released the world's first production-grade Mamba model: Jamba. This model utilizes the groundbreaking SSM-Transformer architecture with 52B parameters, 12B of which are active at the time of generation. Jamba combines Joint Attention and Mamba technology to support 256K context lengths. A single 80GB A100 can hold a context of up to 140K tokens. Compared to Mixtral 8x7B, the throughput of long contexts is improved by a factor of 3. Model address: https:\/\/huggingface.co\/ai21labs\/Jamba-v0.1 Jamba represents a major innovation in model design. It combines the Mamba structured state<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1948,1949,188,1950],"collection":[],"class_list":["post-6599","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai21","tag-jamba","tag-nlp","tag-1950"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/6599","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=6599"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/6599\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=6599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/
wp\/v2\/categories?post=6599"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=6599"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=6599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}