Harvard University open-sources AI training dataset "Institutional Books 1.0", covering 983,000 books in its collection

With support from Microsoft and OpenAI, the Harvard Law School Library has officially open-sourced its first open AI training dataset, "Institutional Books 1.0". The dataset reportedly contains 983,000 books from the Harvard University collection, covers 245 languages, and totals 242 billion tokens. Project address: https://huggingface.co/datasets/institutional/institutional-books-1.0

According to the report, 40% of the books in the dataset are in English. It consists of books published in the 19th and 20th centuries, divided into 20 topics in total. The dataset also provides complete metadata for each book, including "author, year of publication, language, and original source".
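To put the reported figures in perspective, the short Python sketch below derives two rough estimates from them: the average number of tokens per book and the approximate number of English-language books. The function names are illustrative, and the calculation assumes the 40% figure is counted by volume rather than by token, which the report does not specify.

```python
# Figures quoted in the report; the derived values are rough estimates only.
TOTAL_BOOKS = 983_000
TOTAL_TOKENS = 242_000_000_000
ENGLISH_SHARE = 0.40  # assumption: share is measured by number of books


def average_tokens_per_book(total_tokens: int, total_books: int) -> float:
    """Mean token count per book across the whole dataset."""
    return total_tokens / total_books


def estimated_english_books(total_books: int, share: float) -> int:
    """Approximate number of English-language books, rounded to a whole book."""
    return round(total_books * share)


avg_tokens = average_tokens_per_book(TOTAL_TOKENS, TOTAL_BOOKS)
english_books = estimated_english_books(TOTAL_BOOKS, ENGLISH_SHARE)

print(f"About {avg_tokens:,.0f} tokens per book on average")
print(f"About {english_books:,} English-language books (estimate)")
```

On these assumptions, the average works out to roughly a quarter of a million tokens per book.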

According to the Harvard Law School Library, the researchers will continue to expand the data in the future, and members of the project team are already working with the Boston Public Library to digitize "millions" of historical newspapers to add to the dataset.

In the future, the Harvard Law School Library plans to develop a series of AI tools to improve the efficiency of organizing and opening up its collections and to promote "responsible data use practices".
