{"id":1870,"date":"2023-12-12T09:29:55","date_gmt":"2023-12-12T01:29:55","guid":{"rendered":"https:\/\/www.1ai.net\/?p=1870"},"modified":"2023-12-12T09:29:55","modified_gmt":"2023-12-12T01:29:55","slug":"%e7%a0%94%e7%a9%b6%e4%ba%ba%e5%91%98%e8%af%b1%e5%af%bcai%e8%81%8a%e5%a4%a9%e6%9c%ba%e5%99%a8%e4%ba%ba%e6%b3%84%e9%9c%b2%e6%9c%89%e5%ae%b3%e5%86%85%e5%ae%b9%ef%bc%8c%e6%88%90%e5%8a%9f%e7%8e%87%e9%ab%98","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/1870.html","title":{"rendered":"Researchers trick AI chatbots into leaking harmful content with a success rate of 98%"},"content":{"rendered":"<p>Researchers at Purdue University in Indiana have devised a new method to successfully induce<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e5%9e%8b%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large-scale language model]]\" target=\"_blank\" >Large Language Models<\/a>\uff08<a href=\"https:\/\/www.1ai.net\/en\/tag\/llm\" title=\"[SEE ARTICLES WITH [LLM] LABELS]\" target=\"_blank\" >LLM<\/a>) generates harmful content, revealing the potential harm hidden in compliant answers.<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%81%8a%e5%a4%a9%e6%9c%ba%e5%99%a8%e4%ba%ba\" title=\"[View articles tagged with [chatbot]]\" target=\"_blank\" >Chatbots<\/a>During the conversation, the researchers found that by leveraging probability data and soft labels made public by the model maker, they could force the model to generate harmful content with a success rate of up to 98%.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1871\" title=\"202310250959077312_3-1\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2023\/12\/202310250959077312_3-1.jpg\" alt=\"202310250959077312_3-1\" width=\"1000\" height=\"666\" \/><\/p>\n<p>Source: The image is generated by AI, and the image is authorized by Midjourney<\/p>\n<p>Traditional jailbreaking methods usually require providing prompts 
to bypass security features, while this new method uses probabilistic data and soft labels to force the model to generate harmful content without complex prompts. The researchers call it LINT (short for LLM Inquiry), which induces the model to generate harmful content by asking harmful questions to the model and ranking the top few tags in the response.<\/p>\n<p>In the experiment, the researchers tested 7 open source LLMs and 3 commercial LLMs using a dataset of 50 toxic questions. The results showed that when the model was asked once, the success rate reached 92%; when the model was asked five times, the success rate was even higher, reaching 98%. Compared with other jailbreaking techniques, the performance of this method is significantly superior, and it is even suitable for models customized for specific tasks.<\/p>\n<p>The researchers also warned the AI community to be cautious when open-sourcing LLMs, as existing open-source models are vulnerable to this type of forced interrogation. They recommend<span class=\"spamTxt\">most<\/span>The solution is to ensure harmful content is removed rather than hidden in models. The results of this study remind us that ensuring the safety and trustworthiness of AI technology remains an important challenge.<\/p>","protected":false},"excerpt":{"rendered":"<p>Researchers at Purdue University in Indiana have devised a new method that successfully induces large language models (LLMs) to generate harmful content, revealing potential hazards hidden in compliant responses. During a conversation with a chatbot, the researchers found that by utilizing probabilistic data and soft tags made public by the modeler, they could force the model to generate harmful content with a success rate of 98%. 
Image source note: image generated by AI, licensed from Midjourney. While traditional jailbreaking methods usually require providing prompts to bypass security features, this new method uses probability data and soft labels to force the model to generate harmful content without the need for complex prompts. The researchers call it LINT (short for LLM Inquiry), and it does this by asking the model harmful questions and ranking the responses in the<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[473,371,275],"collection":[],"class_list":["post-1870","post","type-post","status-publish","format-standard","hentry","category-news","tag-llm","tag-371","tag-275"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/1870","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=1870"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/1870\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=1870"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=1870"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=1870"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=1870"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}