{"id":8481,"date":"2024-04-19T10:06:27","date_gmt":"2024-04-19T02:06:27","guid":{"rendered":"https:\/\/www.1ai.net\/?p=8481"},"modified":"2024-04-19T10:06:27","modified_gmt":"2024-04-19T02:06:27","slug":"%e5%a4%a7%e6%a8%a1%e5%9e%8b%e9%82%a3%e4%b9%88%e7%81%ab%ef%bc%8c%e6%95%99%e4%bd%a0%e4%b8%80%e9%94%ae%e7%8e%a9%e8%bd%ac%e5%bc%80%e6%ba%90llama3%e5%a4%a7%e6%a8%a1%e5%9e%8b","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/8481.html","title":{"rendered":"Large models are all the rage: get started with the open-source Llama 3 in one click"},"content":{"rendered":"<div class=\"pgc-img\" data-pm-slice=\"0 0 []\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8482\" title=\"get-599\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-599.jpg\" alt=\"get-599\" width=\"1080\" height=\"483\" \/><\/div>\n<p data-track=\"205\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/llama-3\" title=\"[See articles with [Llama 3] labels]\" target=\"_blank\" >Llama 3<\/a> was released today, providing pre-trained and instruction-fine-tuned language models at 8B and 70B parameters. These models will soon be available on mainstream platforms such as AWS, Google Cloud, and Microsoft Azure, with strong support from hardware vendors such as AMD and Intel.<\/p>\n<p data-track=\"207\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/llama\" title=\"_Other Organiser\" target=\"_blank\" >Llama<\/a> 3 direct link:<u>https:\/\/llama.meta.com\/llama3<\/u><\/p>\n<p data-track=\"208\">The Llama Chinese Community walks you through understanding and using Llama 3 from the following angles:<\/p>\n<p data-track=\"209\">1. Introduction to Llama 3: performance and technical analysis<\/p>\n<p data-track=\"210\">2. How to try out and download the Llama 3 models<\/p>\n<p data-track=\"211\">3. How to call Llama 3<\/p>\n<p data-track=\"212\">4. Discussing Llama 3 with industry experts<\/p>\n<p data-track=\"214\"><strong>1. 
Introduction to Llama 3<\/strong><\/p>\n<p data-track=\"217\"><strong>\u201cThe best open-source<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large models]]\" target=\"_blank\" >Large Model<\/a>\u201d<\/strong><\/p>\n<p data-track=\"219\">The new Llama 3 models, released in 8B and 70B parameter versions, are a major upgrade over Llama 2. Both the pre-trained and instruction-fine-tuned variants perform strongly at their respective scales, making them<strong>the best open-source models currently available<\/strong>. Post-training improvements significantly reduced the false-refusal rate, improved alignment, and increased the diversity of the model\u2019s responses.<\/p>\n<p data-track=\"221\"><strong>Llama 3 model details at a glance<\/strong><\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8484\" title=\"get-601\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-601.jpg\" alt=\"get-601\" width=\"1080\" height=\"505\" \/><\/div>\n<p data-track=\"223\"><strong>Performance<\/strong><\/p>\n<p data-track=\"225\">Llama 3 has<strong>made great progress in capabilities such as reasoning, code generation, and instruction following<\/strong>, and the model is easier to steer. Performance and user experience are both significantly improved.<\/p>\n<p data-track=\"227\">Llama 3 8B outperforms other open-source models such as Mistral\u2019s Mistral 7B and Google\u2019s Gemma 7B, both of which contain 7 billion parameters, in at least 9 benchmarks. 
Llama 3 8B performs well in the following benchmarks:<\/p>\n<blockquote class=\"pgc-blockquote-abstract\">\n<p data-track=\"228\">MMLU: a multi-task language understanding benchmark.<\/p>\n<p data-track=\"229\">ARC: grade-school science questions that test complex reasoning.<\/p>\n<p data-track=\"230\">DROP: reading comprehension requiring discrete (numerical) reasoning.<\/p>\n<p data-track=\"231\">GPQA: graduate-level questions in biology, physics, and chemistry.<\/p>\n<p data-track=\"232\">HumanEval: code generation.<\/p>\n<p data-track=\"233\">GSM-8K: grade-school math word problems.<\/p>\n<p data-track=\"234\">MATH: competition-level mathematics problems.<\/p>\n<p data-track=\"235\">AGIEval: a problem-solving test set drawn from human exams.<\/p>\n<p data-track=\"236\">BIG-Bench Hard: challenging reasoning tasks.<\/p>\n<\/blockquote>\n<p data-track=\"238\">Llama 3 70B outperforms Claude 3 Sonnet, the mid-tier model in the Claude 3 family, on five benchmarks: MMLU, GPQA, HumanEval, GSM-8K, and MATH. These results highlight the strong performance of the Llama 3 70B model across a wide range of application domains.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8485\" title=\"get-602\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-602.jpg\" alt=\"get-602\" width=\"1080\" height=\"608\" \/><\/div>\n<p data-track=\"239\">During development, the team optimized not only benchmark performance but also performance in real-world application scenarios.<\/p>\n<p data-track=\"241\"><strong>The team created a new high-quality human-evaluation set<\/strong> covering 1,800 prompts across 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, information extraction, inhabiting a character, open question answering, reasoning, rewriting, and summarization.<\/p>\n<p data-track=\"243\">The figure below shows the human evaluation results 
for Claude Sonnet, Mistral Medium, and GPT-3.5 on these categories and prompts.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8483\" title=\"get-600\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-600.jpg\" alt=\"get-600\" width=\"1080\" height=\"677\" \/><\/div>\n<p data-track=\"246\">In these tests,<strong>the Llama 3 8B model also outperformed the Llama 2 70B model.<\/strong><\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8486\" title=\"get-603\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-603.jpg\" alt=\"get-603\" width=\"1080\" height=\"525\" \/><\/div>\n<p data-track=\"249\"><strong>Technical Details<\/strong><\/p>\n<p data-track=\"251\">The development of Llama 3 emphasized innovation, scaling, and optimization in language-model design. The project revolves around four key elements: model architecture, pre-training data, scaling up pre-training, and instruction fine-tuning.<\/p>\n<h1 class=\"pgc-h-arrow-right\" spellcheck=\"false\" data-track=\"253\">Model Architecture<\/h1>\n<p data-track=\"254\">Llama 3 uses a relatively standard decoder-only Transformer architecture with key improvements over Llama 2. 
The model uses a tokenizer with a 128K-token vocabulary, which encodes language more efficiently and significantly improves performance.<\/p>\n<p data-track=\"256\">At both the 8B and 70B scales,<strong>Llama 3 adopts grouped-query attention (GQA)<\/strong> to improve inference efficiency. The models were trained on sequences of 8,192 tokens, with a mask ensuring that self-attention does not cross document boundaries.<\/p>\n<h1 class=\"pgc-h-arrow-right\" spellcheck=\"false\" data-track=\"258\">Training Data<\/h1>\n<p data-track=\"259\">Building an excellent language model starts with curating a large, high-quality training dataset. Llama 3 is pre-trained on more than 15T tokens, all from publicly available sources.<strong>The dataset is seven times the size of the Llama 2 training dataset and contains four times as much code<\/strong>.<\/p>\n<p data-track=\"261\">In addition,<strong>over 5% of the pre-training dataset consists of high-quality non-English data covering more than 30 languages<\/strong>.<\/p>\n<p data-track=\"263\">To ensure the model trains on<strong>the highest-quality data<\/strong>, the team developed a series of data-filtering pipelines, including heuristic filters, NSFW filters, semantic deduplication, and text-quality classifiers.<\/p>\n<h1 class=\"pgc-h-arrow-right\" spellcheck=\"false\" data-track=\"265\">Scaling up pre-training<\/h1>\n<p data-track=\"266\">During the development of Llama 3, a great deal of effort went into scaling up pre-training. By developing detailed scaling laws, the team was able to optimize the data mix and make optimal use of training compute. 
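<\/p>
<p>The log-linear trend reported below can be illustrated with a toy curve in which loss falls by a fixed amount each time the token count is multiplied by ten. This is an illustrative sketch with made-up coefficients, not Meta's measurements:<\/p>

```python
import math

# Toy log-linear scaling curve: loss decreases linearly in log10(tokens).
# The coefficients a and b are invented for illustration only.
def toy_loss(tokens, a=4.0, b=0.25):
    return a - b * math.log10(tokens)

for t in (1e12, 5e12, 15e12):  # 1T, 5T, and 15T training tokens
    print(f"{t:.0e} tokens -> toy loss {toy_loss(t):.3f}")
```

<p>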
Even after training on<strong>15T tokens<\/strong>, the 8B and 70B parameter models continued to improve in a log-linear fashion.<\/p>\n<p data-track=\"268\">In addition, by combining<strong>three types of parallelism<\/strong> (data parallelism, model parallelism, and pipeline parallelism), Llama 3 achieves efficient training on two custom clusters of 24K GPUs each.<\/p>\n<p data-track=\"270\">Together, these techniques make<strong>the training of Llama 3 about three times more efficient than that of Llama 2<\/strong>, delivering a better experience and stronger model performance.<\/p>\n<h1 class=\"pgc-h-arrow-right\" spellcheck=\"false\" data-track=\"272\">Instruction fine-tuning<\/h1>\n<p data-track=\"273\">To fully realize the potential of the pre-trained models in conversational use cases, the Llama 3 team used a combination of techniques including supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO).<\/p>\n<p data-track=\"274\"><strong>The quality of the prompts used in SFT and of the preference rankings used in PPO and DPO has a huge impact on the performance of the aligned model.<\/strong> By carefully curating the data and performing quality assurance on the annotations, Llama 3 achieves significant improvements on both reasoning and coding tasks.<\/p>\n<h1 class=\"pgc-h-arrow-right\" spellcheck=\"false\" data-track=\"277\">Compute consumption and carbon emissions<\/h1>\n<p data-track=\"278\">Llama 3 was pre-trained on H100-80GB GPUs (700W TDP).<strong>Training required 7.7 million GPU-hours<\/strong>, and total carbon emissions were 2,290 tonnes of carbon-dioxide equivalent (tCO2eq), all of which were offset through Meta\u2019s sustainability program.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8487\" title=\"get-604\" 
src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-604.jpg\" alt=\"get-604\" width=\"1080\" height=\"445\" \/><\/div>\n<p data-track=\"280\"><strong>2. Trying out and downloading the Llama 3 models<\/strong><\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8488\" title=\"get-605\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-605.jpg\" alt=\"get-605\" width=\"1080\" height=\"523\" \/><\/div>\n<p data-track=\"284\">Hugging Face demo link:<\/p>\n<p data-track=\"285\">https:\/\/huggingface.co\/chat\/<\/p>\n<p data-track=\"286\">Meta.ai demo link:<\/p>\n<p data-track=\"287\">https:\/\/www.meta.ai\/<\/p>\n<p data-track=\"289\">Model download application:<\/p>\n<p data-track=\"290\">https:\/\/llama.meta.com\/llama-downloads<\/p>\n<p data-track=\"292\">We recommend trying the model on Hugging Face first. The community\u2019s official website, https:\/\/llama.family, is also rolling out demo links and model downloads accessible from within China.<\/p>\n<p data-track=\"295\"><strong>3. 
How to call Llama 3<\/strong><\/p>\n<p data-track=\"297\">Llama 3 uses several special tokens:<\/p>\n<p data-track=\"298\"><strong>&lt;|begin_of_text|&gt;<\/strong>: Equivalent to a BOS token, marking the beginning of the prompt.<\/p>\n<p data-track=\"299\"><strong>&lt;|eot_id|&gt;<\/strong>: Marks the end of a message (end of turn).<\/p>\n<p data-track=\"300\"><strong>&lt;|start_header_id|&gt;{role}&lt;|end_header_id|&gt;<\/strong>: Identifies the role of a message, which can be &quot;system&quot;, &quot;user&quot;, or &quot;assistant&quot;.<\/p>\n<p data-track=\"302\"><strong>Basic model call<\/strong><\/p>\n<p data-track=\"303\">Calling the Llama 3 base model is straightforward: place the user text directly after the start token<strong>&lt;|begin_of_text|&gt;<\/strong>, and the model will continue the text from the {{ user_message }} content.<\/p>\n<pre><code>&lt;|begin_of_text|&gt;{{ user_message }}<\/code><\/pre>\n<p data-track=\"306\"><strong>Dialogue model call<\/strong><\/p>\n<p data-track=\"307\">For a single-turn conversation, the prompt is assembled from five parts: (1) &lt;|begin_of_text|&gt; marks the beginning of the prompt; (2) a header marks the role (for example, &quot;user&quot;); (3) the message content follows; (4) &lt;|eot_id|&gt; marks the end of the message; and (5) a final header announces the next role (for example, &quot;assistant&quot;). 
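<\/p>
<p>The five-part assembly described above can also be sketched in code. The following Python helper is an illustrative sketch, not code from the article; the function name and the list-of-dicts message format are our own assumptions, not an official API:<\/p>

```python
# Illustrative sketch: assemble a Llama 3 chat prompt from its special tokens.
# The helper name and the message format are assumptions, not an official API.

def build_llama3_prompt(messages):
    """messages: a list of {"role": ..., "content": ...} dicts, where the
    role is "system", "user", or "assistant"."""
    prompt = "<|begin_of_text|>"  # part 1: start of the prompt
    for m in messages:
        # part 2: role header; part 3: message content; part 4: end-of-turn token
        prompt += f"<|start_header_id|>{m['role']}<|end_header_id|> {m['content']}<|eot_id|>"
    # part 5: cue the model to answer as the assistant
    prompt += "<|start_header_id|>assistant<|end_header_id|>"
    return prompt

print(build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```

<p>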
The model then generates the reply, {{ assistant_message }}, after the prompt.<\/p>\n<pre><code>&lt;|begin_of_text|&gt;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt; {{ user_message }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt;<\/code><\/pre>\n<p data-track=\"310\">You can also add a system message to the prompt by placing {{ system_prompt }} after the system role header.<\/p>\n<pre><code>&lt;|begin_of_text|&gt;&lt;|start_header_id|&gt;system&lt;|end_header_id|&gt; {{ system_prompt }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt; {{ user_message }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt;<\/code><\/pre>\n<p data-track=\"313\">Multi-turn conversations work the same way: by concatenating alternating user and assistant messages, the model can carry on a multi-turn dialogue.<\/p>\n<pre><code>&lt;|begin_of_text|&gt;&lt;|start_header_id|&gt;system&lt;|end_header_id|&gt; {{ system_prompt }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt; {{ user_message_1 }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt; {{ model_answer_1 }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt; {{ user_message_2 }}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt;<\/code><\/pre>\n<p data-track=\"316\"><strong>4. 
Next steps<\/strong><\/p>\n<p data-track=\"318\"><strong>The Llama 3 400B model is still in training.<\/strong><\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8489\" title=\"get-606\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/get-606.jpg\" alt=\"get-606\" width=\"1080\" height=\"608\" \/><\/div>\n<p>&nbsp;<\/p>","protected":false},"excerpt":{"rendered":"<p>Llama 3 is released today, providing pre-trained and instruction fine-tuned language models with 8B and 70B parameters, and these models will soon be available on mainstream platforms such as AWS, Google Cloud, Microsoft Azure, etc., with strong support from hardware platforms such as AMD and Intel. Llama3 link direct: https:\/\/llama.meta.com\/llama3 Llama Chinese community will take you from the following aspects of in-depth understanding and use of Llama3: 1, Llama3 introduction, performance and technical analysis 2, how to experience and download the Llama3 model 3, Llama3 invocation method 4, and industry experts Talk about Llama3 together 
1.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[149,144],"tags":[184,1671,216],"collection":[],"class_list":["post-8481","post","type-post","status-publish","format-standard","hentry","category-jiaocheng","category-baike","tag-llama","tag-llama-3","tag-216"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/8481","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=8481"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/8481\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=8481"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=8481"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=8481"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=8481"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}