{"id":3147,"date":"2024-01-24T09:43:54","date_gmt":"2024-01-24T01:43:54","guid":{"rendered":"https:\/\/www.1ai.net\/?p=3147"},"modified":"2024-01-24T09:43:54","modified_gmt":"2024-01-24T01:43:54","slug":"%e5%8d%8e%e7%9b%9b%e9%a1%bf%e5%a4%a7%e5%ad%a6%e6%8e%a8%e9%ab%98%e6%95%88%e5%a4%a7%e6%a8%a1%e5%9e%8b%e8%b0%83%e4%bc%98%e6%96%b9%e6%b3%95%e4%bb%a3%e7%90%86%e8%b0%83%e4%bc%98","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/3147.html","title":{"rendered":"The University of Washington promotes efficient large model tuning method &quot;proxy tuning&quot;"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%8d%8e%e7%9b%9b%e9%a1%bf%e5%a4%a7%e5%ad%a6\" title=\"[Sees articles with tags]\" target=\"_blank\" >University of Washington<\/a>Introducing more efficient<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large models]]\" target=\"_blank\" >Large Model<\/a>A tuning method called &quot;proxy tuning&quot; is used to guide the predictions of the base model by comparing the predictions of a small tuned model with the predictions of an untuned model, thereby tuning the model without touching the model&#039;s internal weights.<\/p>\n<p>With the development of generative AI products such as ChatGPT, the parameters of the basic model continue to increase, so weight tuning requires a lot of time and computing power. To improve the tuning efficiency, this method can better retain the training knowledge during decoding while retaining the advantages of larger-scale pre-training. 
The researchers applied the method to the 13B and 70B base models of LLaMA-2, and the results showed that proxy tuning outperformed the directly tuned models.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3148\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/01\/6384168432099431959932520-1.png\" alt=\"\" width=\"572\" height=\"399\" \/><\/p>\n<p>Paper address: https:\/\/arxiv.org\/pdf\/2401.08565.pdf<\/p>\n<p>The method first prepares a small pre-trained language model M-, which shares its vocabulary with the base model M, and then tunes M- on the training data to obtain a tuned model M+.<\/p>\n<p>During decoding, the difference between the output prediction distributions of the tuned model M+ and the untuned model M- is computed. This difference is then applied to the base model M&#039;s predictions, shifting them toward the direction the tuned model would predict. In this respect the method works roughly in the opposite direction to &quot;distillation&quot; of large models, making it an innovative approach to tuning.<\/p>\n<p>Proxy tuning offers a more efficient way to tune large models while better preserving training knowledge during decoding, so the resulting models perform better. The method brings new insight to the development of the AI field and merits further in-depth research and application.<\/p>","protected":false},"excerpt":{"rendered":"<p>The University of Washington has introduced a more efficient approach to large model tuning called \"proxy tuning\", which guides the predictions of the base model by comparing the predictions of small tuned models with those of untuned models, allowing for model tuning without touching the internal weights of the model. 
With the development of generative AI products such as ChatGPT, the parameters of the base model keep increasing, so weight tuning takes a great deal of time and computing power. To improve tuning efficiency, the method better preserves training knowledge during decoding while retaining the advantages of larger-scale pre-training. The researchers applied the method to the 13B and 70B base models of LLaMA-2, and the results showed that proxy tuning outperforms direct tuning. Paper address:https:\/\/arxiv.org\/pdf\/240<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1004,216],"collection":[],"class_list":["post-3147","post","type-post","status-publish","format-standard","hentry","category-news","tag-1004","tag-216"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=3147"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3147\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=3147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=3147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=3147"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=3147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}
","templated":true}]}}