{"id":17010,"date":"2024-08-02T09:48:08","date_gmt":"2024-08-02T01:48:08","guid":{"rendered":"https:\/\/www.1ai.net\/?p=17010"},"modified":"2024-08-02T09:48:08","modified_gmt":"2024-08-02T01:48:08","slug":"ai%e8%a1%8c%e4%b8%9a%e9%9d%a2%e4%b8%b4%e6%95%b0%e6%8d%ae%e5%a2%99%e6%8c%91%e6%88%98%ef%bc%9a2028%e5%b9%b4%e9%ab%98%e8%b4%a8%e9%87%8f%e8%ae%ad%e7%bb%83%e6%95%b0%e6%8d%ae%e6%88%96%e5%b0%86%e8%80%97","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/17010.html","title":{"rendered":"AI industry faces 'data wall' challenge: high-quality training data may run out by 2028"},"content":{"rendered":"<p data-pm-slice=\"0 0 []\">Recently, the shortage of <a href=\"https:\/\/www.1ai.net\/en\/tag\/ai%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [AI Big Model]]\" target=\"_blank\" >AI large model<\/a> <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%ae%ad%e7%bb%83%e6%95%b0%e6%8d%ae\" title=\"[Sees articles with [training data] labels]\" target=\"_blank\" >training data<\/a> has again become a focus of media attention. A recent article in The Economist, \"AI will soon run out of most Internet data\", has generated extensive industry discussion. The article noted that the AI field is facing a \u201cdata wall\u201d challenge as high-quality data on the internet is depleted.<\/p>\n<p data-track=\"111\">The research company Epoch AI predicts that all high-quality text data on the internet will be exhausted by 2028, and that machine learning datasets could deplete all \u201cquality language data\u201d by 2026. 
This \u201cdata wall\u201d has become a major problem for AI and could slow the progress of model training.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-17011\" title=\"get-38\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/get-38.jpg\" alt=\"get-38\" width=\"653\" height=\"435\" \/><\/div>\n<p data-track=\"112\">Source note: The image was generated by AI and is licensed by Midjourney.<\/p>\n<p data-track=\"113\">Industry figures have long warned about this. In July 2023, Professor Stuart Russell of the University of California, Berkeley, warned that AI-driven chatbots such as ChatGPT might soon \u201cdeplete the text of the universe\u201d. However, views differ. In May 2024, Professor Fei-Fei Li of Stanford University said that a large amount of differentiated data is still waiting to be mined to build more customized models.<\/p>\n<p data-track=\"114\">The use of synthetic data is a potential solution to the data shortage. However, a recent paper published in Nature noted that training future generations of machine learning models on AI-generated datasets could lead to \u201cmodel collapse\u201d and a distorted understanding of reality. The research team recommended retaining some of the original data in the training data, using a variety of data sources, and studying better training algorithms.<\/p>\n<p data-track=\"115\">How to break through the \u201cdata wall\u201d and ensure a continued supply of high-quality training data has become an urgent issue for the AI industry. This requires not only technological innovation but also joint efforts by governments, enterprises and research institutions. 
As AI technology becomes increasingly integrated into all sectors, addressing the data shortage will have far-reaching implications for the continued healthy development of AI.<\/p>","protected":false},"excerpt":{"rendered":"<p>Recently, the shortage of training data for large AI models has again become a focus of media attention. A recent article in The Economist, \"AI will soon run out of most Internet data\", has generated extensive industry discussion. The article noted that the AI field is facing a \u201cdata wall\u201d challenge as high-quality data on the internet is depleted. The research company Epoch AI predicts that all high-quality text data on the internet will be exhausted by 2028, and that machine learning datasets could deplete all \u201cquality language data\u201d by 2026. This \u201cdata wall\u201d has become a major problem for AI and could slow the progress of model training. Source Note: Picture generated by AI, 
Figure<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[433,488],"collection":[],"class_list":["post-17010","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-488"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/17010","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=17010"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/17010\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=17010"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=17010"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=17010"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=17010"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}