{"id":3596,"date":"2024-02-03T09:11:14","date_gmt":"2024-02-03T01:11:14","guid":{"rendered":"https:\/\/www.1ai.net\/?p=3596"},"modified":"2024-02-03T09:11:14","modified_gmt":"2024-02-03T01:11:14","slug":"ai2%e5%8f%91%e5%b8%83%e5%bc%80%e6%94%be%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8bolmo-%e5%8f%b7%e7%a7%b0%e5%a4%9a%e9%a1%b9%e6%80%a7%e8%83%bd%e5%aa%b2%e7%be%8ellama2","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/3596.html","title":{"rendered":"AI2 releases open language model OLMo, claiming performance comparable to Llama2 on many tasks"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/ai2\" title=\"View articles tagged AI2\" target=\"_blank\" >AI2<\/a> has released the open <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged language model\" target=\"_blank\" >Language Model<\/a> (<a href=\"https:\/\/www.1ai.net\/en\/tag\/olmo\" title=\"View articles tagged OLMo\" target=\"_blank\" >OLMo<\/a>) framework, designed to promote research and experimentation on large-scale language models. By providing training code, models, and evaluation code on Hugging Face and GitHub, AI2 aims to enable academics and researchers to jointly study the science of language models, explore the impact of new pre-training data subsets on downstream performance, and investigate new pre-training methods and stability.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3597\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/02\/6384248130440996737699919.png\" alt=\"\" width=\"651\" height=\"361\" \/><\/p>\n<p>The project&#039;s first batch of models includes four final 7B-scale variants corresponding to different architectures, optimizers, and training hardware, plus a 1B-scale model, all trained on at least 2T tokens. 
This is the first step in a long-term plan: AI2 intends to continue releasing larger models, instruction-tuned models, and more variants.<\/p>\n<p>Each model is provided with complete training data, including the code used to generate that data, as well as AI2&#039;s Dolma and WIMBD tools for analyzing pre-training data. In addition, complete model weights, training code, training logs, training metrics in the form of Weights &amp; Biases logs, and inference code are provided. More than 500 checkpoints from each model&#039;s training process are also available as revisions on Hugging Face.<\/p>\n<p>In creating a strong open model, AI2 learned from many other open and partially open models and used them as competitive benchmarks for OLMo. According to the project&#039;s technical report, the OLMo 7B model surpasses <a href=\"https:\/\/www.chinaz.com\/tags\/835619.shtml\" target=\"_blank\" rel=\"noopener\">Llama2<\/a> on generation tasks and reading comprehension benchmarks (such as TruthfulQA), but lags slightly behind on popular question-answering tasks such as MMLU or Big-bench Hard.<\/p>\n<p>For the 1B OLMo model, an analysis was performed using AI2\u2019s Paloma benchmark and the checkpoints available on GitHub to explore the relationship between the model\u2019s language-prediction performance and factors such as model size. 
AI2 emphasized that Paloma attempts to provide a more balanced representation of the many domains in which language models are used by sampling each domain evenly.<\/p>\n<p>The OLMo framework adopts many recent trends from the literature, including omitting bias terms (as in PaLM, for training stability), the SwiGLU activation function used by PaLM and Llama, Rotary Positional Embeddings (RoPE), and a modified version of GPT-NeoX-20B&#039;s BPE-based tokenizer that aims to reduce personally identifiable information.<\/p>\n<p>This release is just the beginning for OLMo and its framework; future work is planned across different scales, modalities, datasets, safety measures, and evaluations. AI2 encourages use of the OLMo models, provides simple installation steps and usage examples, and says it will release instruction-tuned models, complete training logs, and wandb reports in the future.<\/p>\n<p>Blog URL: https:\/\/blog.allenai.org\/olmo-open-language-model-87ccfc95f58<\/p>","protected":false},"excerpt":{"rendered":"<p>AI2's newly released Open Language Model (OLMo) framework aims to advance research and experimentation in large-scale language modeling. By making training code, models, and evaluation code available on Hugging Face and GitHub, AI2 is committed to enabling academics and researchers to work together on the science of language modeling, exploring the impact of new subsets of pre-training data on downstream performance, as well as investigating new pre-training methods and stability. The project's first models include four final variants at 7B scale, corresponding to different architectures, optimizers, and training hardware, and a 1B scale model, all trained on at least 2T tokens. This is the first step in a long-term program with plans to continue releasing larger scale models, instruction-tuned models, and more variants. 
Each<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1143,1146,1145,1144],"collection":[],"class_list":["post-3596","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai2","tag-llama2","tag-olmo","tag-1144"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3596","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=3596"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3596\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=3596"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=3596"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=3596"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=3596"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}