{"id":26565,"date":"2025-01-11T10:40:13","date_gmt":"2025-01-11T02:40:13","guid":{"rendered":"https:\/\/www.1ai.net\/?p=26565"},"modified":"2025-01-11T10:40:13","modified_gmt":"2025-01-11T02:40:13","slug":"%e4%b8%ad%e6%96%87%e4%ba%92%e8%81%94%e7%bd%91%e8%af%ad%e6%96%99-ai-%e8%b5%84%e6%ba%90%e5%b9%b3%e5%8f%b0%e5%8f%91%e5%b8%83%ef%bc%9a27-%e4%b8%aa%e6%95%b0%e6%8d%ae%e9%9b%86%e3%80%81%e6%80%bb%e9%87%8f-2-7","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/26565.html","title":{"rendered":"Chinese Internet Corpus AI Resource Platform Released: 27 Datasets, 2.7 TB in Total"},"content":{"rendered":"<p>January 11: The China Cyberspace Security Association issued a notice on January 9 releasing to the public the Chinese Internet <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e6%96%99\" title=\"[Sees articles with [language] labels]\" target=\"_blank\" >Corpus<\/a> Resource Platform, which supports a variety of tag categories such as industry sector, content modality, and data volume, making the corpora easy for users to download and use.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-26566\" title=\"71a7ec3cj00spwkp5004od000v900epp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/71a7ec3cj00spwkp5004od000v900epp.jpg\" alt=\"71a7ec3cj00spwkp5004od000v900epp\" width=\"1125\" height=\"529\" \/><\/p>\n<p>The Association said that, under the guidance of the Central Internet Information Office and together with the National Internet Emergency Response Center, it built on the earlier release of the Chinese Internet Basic Corpus 1.0 and relied on the corpus construction and sharing mechanism established by the Special Committee to gather a new batch of high-quality, credible data. The data then went through a series of rigorous processing steps, including source screening, content filtering, and data deduplication. <strong>The Association then compiled and released to the 
public the Chinese Internet Basic Corpus 2.0, with a size of 120 GB and 38 million data items.<\/strong><\/p>\n<p>Note: The platform currently hosts 27 corpus <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%95%b0%e6%8d%ae%e9%9b%86\" title=\"[See articles with [data set] labels]\" target=\"_blank\" >datasets<\/a> with a total data volume of about 2.7 TB, divided into three main categories:<\/p>\n<ul>\n<li>First, the Chinese Internet Basic Corpus built by the China Cyberspace Security Association together with the National Internet Emergency Response Center and others;<\/li>\n<li>Second, the Internet corpora shared by People's Daily, Beijing Zhiyuan Research Institute, and Shanghai Artificial Intelligence Laboratory;<\/li>\n<li>Third, the high-quality Chinese basic corpus samples contributed by the China Institute of Cyberspace Research, the National Version Library of China, the Encyclopedia of China Publishing House, and the Library of the Chinese Academy of Social Sciences.<\/li>\n<\/ul>\n<p>Users can visit the website of the China Cyberspace Security Association (https:\/\/www.cybersac.cn\/newhome), click the \"Chinese Internet Corpus Resource Platform\" link, and complete registration and authentication to download the corpora.<\/p>\n<p>The head of the Association's Special Committee on Artificial Intelligence Security Governance said that data is a key resource for the development of artificial intelligence, and that the Chinese Internet Basic Corpus 2.0 is another important result of cross-sector collaboration to build a high-quality Chinese corpus. The Special Committee will continue to strengthen the construction of the Chinese Internet Basic Corpus to provide strong support for technological innovation and industrial development in artificial intelligence.<\/p>","protected":false},"excerpt":{"rendered":"<p>January 11: The China 
Cyberspace Security Association issued a notice on January 9 releasing to the public the Chinese Internet Corpus Resource Platform, which supports a variety of tag categories such as industry sector, content modality, and data volume, making the corpora easy for users to download and use. The Association said that, under the guidance of the Central Internet Information Office and together with the National Internet Emergency Response Center, it built on the earlier release of the Chinese Internet Basic Corpus 1.0 and relied on the corpus construction and sharing mechanism established by the Special Committee to gather a new batch of high-quality, credible data. After a series of rigorous data processing steps, including source screening, content filtering, and data deduplication, it compiled and released to the public the Chinese Internet Basic Corpus 2.0, with a size of 120 GB and 38 million data items. Note: The total number of datasets currently on the platform is 
27<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[411,3355,5506],"collection":[],"class_list":["post-26565","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-3355","tag-5506"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/26565","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=26565"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/26565\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=26565"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=26565"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=26565"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=26565"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}