{"id":13893,"date":"2024-06-24T09:18:13","date_gmt":"2024-06-24T01:18:13","guid":{"rendered":"https:\/\/www.1ai.net\/?p=13893"},"modified":"2024-06-24T09:18:13","modified_gmt":"2024-06-24T01:18:13","slug":"%e6%b2%a1%e6%9c%89%e6%8e%88%e6%9d%83%e4%b9%9f%e6%b2%a1%e5%85%b3%e7%b3%bb%ef%bc%8c%e5%a4%9a%e5%ae%b6-ai-%e5%85%ac%e5%8f%b8%e7%bb%95%e8%bf%87%e7%bd%91%e7%bb%9c%e6%a0%87%e5%87%86%e6%8a%93%e5%8f%96","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/13893.html","title":{"rendered":"No authorization, no problem: several AI companies bypass web standards to crawl news publishers\u2019 website content"},"content":{"rendered":"<p data-vmark=\"dea1\">According to a Reuters report on Saturday, TollBit, a startup focused on \"content licensing,\" recently warned news <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%87%ba%e7%89%88%e5%95%86\" title=\"[See articles with [publishing] labels]\" target=\"_blank\" >publishers<\/a> that several artificial intelligence companies are <strong>circumventing<\/strong> the <strong>common web standard<\/strong> that publishers use to block content crawling, and are using the crawled content for <strong>training generative AI systems<\/strong>.<\/p>\n<p data-vmark=\"8d87\">The warning comes against the backdrop of a public dispute between AI search startup <a href=\"https:\/\/www.1ai.net\/en\/tag\/perplexity\" title=\"_Other Organiser\" target=\"_blank\" >Perplexity<\/a> and media outlet Forbes over the same web standard. 
A broader debate is currently under way between tech and media companies over <strong>the value of content in the age of generative AI<\/strong>.<\/p>\n<p data-vmark=\"79f1\">TollBit positions itself as a \"matchmaker\" between <strong>content-starved <a href=\"https:\/\/www.1ai.net\/en\/tag\/ai%e5%85%ac%e5%8f%b8\" title=\"[SEES ARTICLES WITH [AI] LABELS]\" target=\"_blank\" >AI companies<\/a><\/strong> and <strong>publishers willing to strike major licensing deals with them<\/strong>.<\/p>\n<p data-vmark=\"5434\">Forbes has accused Perplexity of <strong>plagiarizing its stories<\/strong> in AI-generated summaries, <strong>without citing<\/strong> sources and without permission from Forbes.<\/p>\n<p data-vmark=\"0774\">In addition, Wired magazine published an investigative story last week noting that Perplexity may <strong>have bypassed<\/strong> the \"Robots Exclusion Protocol\" (set by news publishers) or other measures that block web crawlers.<\/p>\n<p data-vmark=\"0670\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13894\" title=\"7cf751a3-7c72-4fef-819a-33f405a244b4.jpg@s_2w_820h_547\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/06\/7cf751a3-7c72-4fef-819a-33f405a244b4.jpg@s_2w_820h_547.jpg\" alt=\"7cf751a3-7c72-4fef-819a-33f405a244b4.jpg@s_2w_820h_547\" width=\"820\" height=\"547\" \/><\/p>\n<p>Image source: Pexels<\/p>\n<p data-vmark=\"fce7\">The News Media Alliance, a U.S. trade organization that says it <strong>represents<\/strong> <strong>more than 2,000 U.S. publishers<\/strong>, also expressed concern about this behavior: AI companies are turning a deaf ear to the \"no-crawl\" mechanisms that publishers have put in place, such as <span class=\"link-text-start-with-http\">robots.txt<\/span>. 
\"If we cannot stop mass crawling,\" said Danielle Coffey, president of the organization, \"we <strong>cannot<\/strong> profit from our valuable content, and we have no way to <strong>pay<\/strong> journalists.\"<\/p>\n<p data-vmark=\"f90d\">TollBit said that Perplexity is not the only violator of the \"no-crawl\" mechanism on publishers' websites. According to its analysis, \"a large number\" of AI platforms have bypassed this mechanism, which lets publishers set a crawling \"<strong>whitelist<\/strong>\" indicating which parts of their site can be crawled.<\/p>\n<p data-vmark=\"2de8\">\"This means that AI platforms from multiple sources (not just one company) are choosing to bypass the\u00a0<span class=\"link-text-start-with-http\">robots.txt<\/span>\u00a0protocol to retrieve content from the site,\" TollBit wrote, \"and the more publisher logs we acquire, the more often this pattern appears.\"<\/p>\n<p data-vmark=\"bb19\">A number of publishers, including The New York Times, have already <strong>sued AI companies<\/strong> over these infringements. Other publishers have signed licensing agreements with AI companies that are willing to pay for content, although the two sides often disagree on the value of the material. Many AI developers argue that by getting content for free they <strong>have violated no laws<\/strong>.<\/p>","protected":false},"excerpt":{"rendered":"<p>As Reuters reported on Saturday, TollBit, a startup focused on \"content licensing,\" recently warned news publishers that AI companies are circumventing common web standards used by publishers to prevent content from being crawled, and are using the crawled content to train generative AI systems. The news comes in the context of a public dispute between AI search startup Perplexity and media outlet Forbes over the same web standards. 
There is currently a broader debate between tech and media companies about the value of content in the era of generative AI. Tollbit has positioned itself as a \"matchmaker\" between content-starved AI companies and publishers willing to strike major licensing deals with them. Forbes has accused Pe<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[155,961,1386],"collection":[],"class_list":["post-13893","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-perplexity","tag-1386"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/13893","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=13893"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/13893\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=13893"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=13893"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=13893"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=13893"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}