{"id":37592,"date":"2025-06-17T11:27:27","date_gmt":"2025-06-17T03:27:27","guid":{"rendered":"https:\/\/www.1ai.net\/?p=37592"},"modified":"2025-06-17T11:29:48","modified_gmt":"2025-06-17T03:29:48","slug":"%e5%93%88%e4%bd%9b%e5%a4%a7%e5%ad%a6%e5%bc%80%e6%ba%90-ai-%e8%ae%ad%e7%bb%83%e6%95%b0%e6%8d%ae%e9%9b%86institutional-books-1-0%ef%bc%8c%e6%b6%b5%e7%9b%96%e9%a6%86%e8%97%8f-98-3","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/37592.html","title":{"rendered":"Harvard open-sources AI training dataset 'Institutional Books 1.0', covering 983,000 books in its collection"},"content":{"rendered":"<p>With the support of Microsoft and OpenAI, the <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%93%88%e4%bd%9b%e5%a4%a7%e5%ad%a6\" title=\"[Sees articles with labels]\" target=\"_blank\" >Harvard University<\/a> Law School Library has officially <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a> its first open AI training <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%95%b0%e6%8d%ae%e9%9b%86\" title=\"[See articles with [data set] labels]\" target=\"_blank\" >dataset<\/a>, \"Institutional Books 1.0\". 
The dataset reportedly contains 983,000 books from the Harvard University collection, covering 245 languages and totaling 242 billion tokens. 1AI attaches the project address (https:\/\/huggingface.co\/datasets\/institutional\/institutional-books-1.0).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-37593\" title=\"1c12ca03j00sxzdk8001cd000v400eyp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/06\/1c12ca03j00sxzdk8001cd000v400eyp.jpg\" alt=\"1c12ca03j00sxzdk8001cd000v400eyp\" width=\"1120\" height=\"538\" \/><\/p>\n<p>According to the report, 40% of the books in the dataset are in English, most were published in the 19th and 20th centuries, and they are divided into 20 topics. In addition, <strong>the dataset provides complete metadata for each book, including \"author, year of publication, language, and original source\"<\/strong>.<\/p>\n<p>According to the Harvard Law School Library, the researchers will continue to expand the dataset, and members of the project team are already working with the Boston Public Library to digitize \"millions\" of historical newspapers to add to it.<\/p>\n<p>In the future, the Harvard Law School Library plans to develop a series of AI tools to make organizing and opening up its collections more efficient and to promote \"responsible data use practices\".<\/p>","protected":false},"excerpt":{"rendered":"<p>With the support of Microsoft and OpenAI, the Harvard Law School Library officially open-sourced its first open AI training dataset, \u201cInstitutional Books 1.0\u201d, last week. The dataset is said to contain 983,000 books from the Harvard University collection, covering 245 languages and totaling 242 billion tokens. 1AI attaches the project address (https:\/\/huggingface.co\/datasets\/institutional\/institutional-books-1.0). 
According to the report, 40% of the books in the dataset are in English, and most were published in the 19th and 20th centuries.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[5218,219,3355],"collection":[],"class_list":["post-37592","post","type-post","status-publish","format-standard","hentry","category-news","tag-5218","tag-219","tag-3355"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/37592","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=37592"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/37592\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=37592"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=37592"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=37592"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=37592"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}