{"id":25097,"date":"2024-12-13T17:19:25","date_gmt":"2024-12-13T09:19:25","guid":{"rendered":"https:\/\/www.1ai.net\/?p=25097"},"modified":"2024-12-13T17:19:25","modified_gmt":"2024-12-13T09:19:25","slug":"%e5%93%88%e4%bd%9b%e5%a4%a7%e5%ad%a6%e3%80%81%e8%b0%b7%e6%ad%8c%e5%8f%91%e5%b8%83-100-%e4%b8%87%e6%9c%ac%e5%85%ac%e5%85%b1%e9%a2%86%e5%9f%9f%e4%b9%a6%e7%b1%8d%ef%bc%8c%e4%b8%ba-ai%e8%ae%ad%e7%bb%83","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/25097.html","title":{"rendered":"Harvard, Google release 1 million public domain books to provide legitimate data for AI training"},"content":{"rendered":"<p>December 13, 2024 - TechCrunch reported on December 12 that <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%93%88%e4%bd%9b%e5%a4%a7%e5%ad%a6\" title=\"[Sees articles with labels]\" target=\"_blank\" >Harvard University<\/a> and <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%b0%b7%e6%ad%8c\" title=\"[View articles tagged with [Google]]\" target=\"_blank\" >Google<\/a> have announced the joint release of\u00a0<strong>1 million books in the public domain<\/strong> as an AI training <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%95%b0%e6%8d%ae%e9%9b%86\" title=\"[See articles with [data set] labels]\" target=\"_blank\" >dataset<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-25098\" title=\"a253d858j00sofdv900b3d000bz00i2p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/a253d858j00sofdv900b3d000bz00i2p.jpg\" alt=\"a253d858j00sofdv900b3d000bz00i2p\" width=\"431\" height=\"650\" \/><br \/>\nImage source: Pexels<\/p>\n<p>The data needed for AI training is costly, which puts it mainly within reach of well-funded tech companies. 
As a result, Harvard plans to release a dataset of about 1 million public domain books <strong>covering a wide range of genres, languages and authors<\/strong>, including classic authors such as Dickens, Dante and Shakespeare, whose works are no longer protected because their copyrights have expired over time.<\/p>\n<p>This new dataset is not yet public, and it is not clear exactly how and when it will be released, but it comes from Google's long-standing program, Google Books. As such, Google will be participating in the broader release of this \"valuable asset.\"<\/p>\n<p>According to 1AI, back in March of this year, Harvard University revealed its Institutional Data Initiative (IDI) and said that the program was designed to <strong>provide AI with \"trusted access to legitimate data\"<\/strong>. It was not until after the official launch that the program confirmed it is <strong>funded by Microsoft and OpenAI<\/strong>.<\/p>\n<p>Greg Leppert, IDI's executive director, says the goal of the dataset is \"<strong>leveling the playing field<\/strong>\" by opening up this huge dataset to a variety of organizations, including <strong>research institutes and AI startups<\/strong> as well as the University of California at Berkeley, to help them train large language models.<\/p>","protected":false},"excerpt":{"rendered":"<p>December 13, 2024 - Harvard University and Google announced the joint release of 1 million public domain books as an AI training dataset, TechCrunch reported on December 12. Image source: Pexels. The data required for AI training is costly, which puts it mainly within reach of well-funded tech companies. 
As a result, Harvard plans to release a dataset of about 1 million public domain books covering a wide range of genres, languages, and authors, including classic authors such as Dickens, Dante, and Shakespeare, whose works are no longer under copyright because their copyrights have expired over time. This new dataset is not yet publicly available, and it's not clear exactly how and when it will be released, but it comes from Google's long-standing project, Google Books (Goo<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[5218,3355,281],"collection":[],"class_list":["post-25097","post","type-post","status-publish","format-standard","hentry","category-news","tag-5218","tag-3355","tag-281"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/25097","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=25097"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/25097\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=25097"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=25097"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=25097"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=25097"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}