{"id":32095,"date":"2025-04-02T11:44:25","date_gmt":"2025-04-02T03:44:25","guid":{"rendered":"https:\/\/www.1ai.net\/?p=32095"},"modified":"2025-04-02T11:44:25","modified_gmt":"2025-04-02T03:44:25","slug":"deepseek-%e6%96%b0%e4%b8%93%e5%88%a9%e5%85%ac%e5%b8%83%ef%bc%9a%e5%87%8f%e5%b0%91%e6%95%b0%e6%8d%ae%e9%87%87%e9%9b%86%e6%97%b6%e7%bd%91%e7%bb%9c%e8%b5%84%e6%ba%90%e6%b6%88%e8%80%97","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/32095.html","title":{"rendered":"New DeepSeek Patent Announced: Reducing Network Resource Consumption During Data Collection"},"content":{"rendered":"<p>April 2: 1AI has learned from the patent publication and announcement website of the State Intellectual Property Office of China that a <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e4%b8%93%e5%88%a9\" title=\"[View articles tagged with [patent]]\" target=\"_blank\" >patent<\/a> for \"a method and system for broad data collection\", filed by <a href=\"https:\/\/www.1ai.net\/en\/tag\/deepseek\" title=\"[View articles tagged with [DeepSeek]]\" target=\"_blank\" >DeepSeek<\/a> affiliate Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., was published on April 1.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32096\" title=\"e9289fedj00su2np7006bd000k100t5p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/04\/e9289fedj00su2np7006bd000k100t5p.jpg\" alt=\"e9289fedj00su2np7006bd000k100t5p\" width=\"721\" height=\"1049\" \/><\/p>\n<p>The patent abstract states:<\/p>\n<blockquote>\n<ul>\n<li>The beneficial effects of the invention are: discovering as many web links as possible while reducing the traffic impact on the website; analyzing downloaded content to infer the quality of links that have not yet been downloaded, and allocating download quota on merit to reduce low-quality and duplicate page downloads, which improves data quality and download efficiency and reduces network resource consumption during data collection; and using a separate information backfill queue to ensure the atomicity and stability of modifications to the web page meta-information database.<\/li>\n<\/ul>\n<\/blockquote>\n<p>The background section of the patent states: In recent years, advances in artificial intelligence have driven great progress in natural language processing (NLP). Many large language models (LLMs) have been trained and applied in NLP, a field that studies theories and methods for effective natural-language communication between humans and computers.<\/p>\n<p>Training a large language model requires building <strong>high-quality, diverse datasets for large language models<\/strong>. This means crawling and processing web page data to obtain a large volume of high-quality text, which serves as model input for training.<\/p>\n<p>However, existing data collection techniques have many problems, such as <strong>failing to obtain complete links when crawling complex sites; being prone to over-downloading<\/strong>, which can crash the target website; and <strong>performing no analysis or inference of content quality<\/strong>, which leads to duplicate or low-quality downloads and reduces the efficiency of data collection.<\/p>\n<p>Therefore, when acquiring data from a large number of web pages, collecting Internet data quickly, accurately, safely, and efficiently becomes crucial.<\/p>","protected":false},"excerpt":{"rendered":"<p>April 2: 1AI has learned from the patent publication and announcement website of the State Intellectual Property Office of China that a patent for \"a method and system for broad data collection\" filed by Hangzhou 
DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. was published on April 1. According to the patent abstract, the beneficial effects of the invention are: discovering as many web links as possible while reducing the traffic impact on the website; analyzing downloaded content to infer the quality of links that have not yet been downloaded, and allocating download quota on merit to reduce low-quality and duplicate page downloads, which improves data quality and download efficiency and reduces network resource consumption during data collection; and using a separate information backfill queue to ensure the atomicity and stability of modifications to the web page meta-information database.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3606,1941],"collection":[],"class_list":["post-32095","post","type-post","status-publish","format-standard","hentry","category-news","tag-deepseek","tag-1941"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/32095","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=32095"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/32095\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=32095"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=32095"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=32095"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=32095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}