{"id":12888,"date":"2024-06-12T09:11:48","date_gmt":"2024-06-12T01:11:48","guid":{"rendered":"https:\/\/www.1ai.net\/?p=12888"},"modified":"2024-06-12T09:12:19","modified_gmt":"2024-06-12T01:12:19","slug":"%e4%bf%84%e7%bd%97%e6%96%af%e7%a7%91%e6%8a%80%e5%b7%a8%e5%a4%b4-yandex-%e5%ae%a3%e5%b8%83%e5%bc%80%e6%ba%90yafsdp%e5%a4%a7%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b%e8%ae%ad%e7%bb%83","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/12888.html","title":{"rendered":"Russian tech giant Yandex announces open source &quot;YaFSDP&quot; large language model training tool: greatly improves GPU utilization, and can achieve 26% acceleration for Llama 3"},"content":{"rendered":"<p data-vmark=\"1bba\">Russian tech giants <a href=\"https:\/\/www.1ai.net\/en\/tag\/yandex\" title=\"_Other Organiser\" target=\"_blank\" >Yandex<\/a> Launched a<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a>Large language model training tool\u2014\u2014<a href=\"https:\/\/github.com\/yandex\/YaFSDP?tab=readme-ov-file\" target=\"_blank\" rel=\"noopener\">YaFSDP<\/a>, claiming to increase speed by up to 26% compared to existing tools.<\/p>\n<p data-vmark=\"2aec\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-12890\" title=\"c988753f-0499-4b8e-9aab-f4a68ff7f4cc\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/06\/c988753f-0499-4b8e-9aab-f4a68ff7f4cc.png\" alt=\"c988753f-0499-4b8e-9aab-f4a68ff7f4cc\" width=\"656\" height=\"180\" \/><\/p>\n<p data-vmark=\"0460\">According to the introduction,<a href=\"https:\/\/www.1ai.net\/en\/tag\/yafsdp\" title=\"_Other Organiser\" target=\"_blank\" >YaFSDP<\/a> It outperforms the traditional FSDP method in terms of training speed, especially for large models. In terms of pre-training LLM, YaFSDP is 20% faster and performs better under high memory pressure conditions.<\/p>\n<p data-vmark=\"2ce8\">For example, YaFSDP can achieve an efficiency improvement of 21% for Llama 2 with 70 billion parameters, and 26% for Llama 3 with the same level of parameters. 
IT Home has attached the official benchmark data:

| Model | gpu-count | seq-len | num-ckpt-layers | speedup |
|---|---:|---:|---:|---:|
| Llama 2 7B | 64 | 2048 | 0 | 9.92% |
| Llama 2 7B | 64 | 4096 | 0 | 3.43% |
| Llama 2 7B | 64 | 8192 | 0 | 2.68% |
| Llama 2 7B | 128 | 2048 | 0 | 9.57% |
| Llama 2 7B | 128 | 4096 | 0 | 2.42% |
| Llama 2 7B | 128 | 8192 | 0 | 2.32% |
| Llama 2 13B | 128 | 2048 | 0 | 12.10% |
| Llama 2 13B | 128 | 4096 | 0 | 3.49% |
| Llama 2 34B | 128 | 2048 | 0 | 20.70% |
| Llama 2 34B | 256 | 2048 | 0 | 21.99% |
| Llama 2 34B | 256 | 4096 | 5 | 8.35% |
| Llama 2 70B | 256 | 2048 | 10 | 21.48% |
| Llama 2 70B | 256 | 4096 | 50 | 7.17% |
| Llama 3 8B | 64 | 2048 | 0 | 11.91% |
| Llama 3 8B | 64 | 4096 | 0 | 7.86% |
| Llama 3 70B | 256 | 2048 | 20 | 26.60% |
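The num-ckpt-layers column presumably counts the transformer layers that use activation checkpointing, i.e. layers whose activations are recomputed during the backward pass to save GPU memory. A minimal sketch of that idea in plain PyTorch, assuming `blocks` is a list of layer modules (the helper function below is hypothetical, not from YaFSDP):

```python
import torch
from torch.utils.checkpoint import checkpoint

def forward_blocks(blocks, x, num_ckpt_layers):
    """Run `x` through `blocks`, checkpointing the first `num_ckpt_layers`."""
    for i, block in enumerate(blocks):
        if i < num_ckpt_layers:
            # Checkpointed: activations are discarded in the forward pass and
            # recomputed in backward, trading extra compute for GPU memory.
            x = checkpoint(block, x, use_reentrant=False)
        else:
            x = block(x)
    return x
```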
Yandex says that by optimizing GPU usage, YaFSDP can save developers and companies substantial sums, potentially hundreds of thousands of dollars per month.

Mikhail Khruschev, a senior developer at Yandex and a member of the YaFSDP team, added: "We are currently actively testing various model architectures and parameter sizes to expand the versatility of YaFSDP."
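To put the savings claim in rough perspective, here is a back-of-envelope sketch; the cluster size and per-GPU-hour price are illustrative assumptions, not Yandex's figures:

```python
# Back-of-envelope estimate of monthly savings from a training speedup.
# All inputs are assumptions for illustration, not Yandex's numbers.
gpus = 1024              # assumed cluster size
hours_per_month = 730
usd_per_gpu_hour = 2.0   # assumed cloud price per A100-class GPU
speedup = 0.20           # ~20% faster training => ~20% fewer GPU-hours

monthly_cost = gpus * hours_per_month * usd_per_gpu_hour
savings = speedup * monthly_cost
print(f"baseline ~${monthly_cost:,.0f}/month; savings ~${savings:,.0f}/month")
# baseline ~$1,495,040/month; savings ~$299,008/month
```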