Researchers trick AI chatbots into leaking harmful content with a success rate of 98%

Researchers at Purdue University in Indiana have devised a new method to induce Large Language Models (LLMs) to generate harmful content, revealing the potential harm hidden behind seemingly compliant answers. In conversations with chatbots, the researchers found that by leveraging the probability data and soft labels made public by the model maker, they could force a model to generate harmful content with a success rate of up to 98%.



Traditional jailbreaking methods usually rely on crafted prompts to bypass a model's safety features, whereas this new method uses probability data and soft labels to force the model to generate harmful content without any complex prompting. The researchers call the technique LINT (short for LLM Inquiry): it induces harmful output by asking the model a harmful question and then ranking the top few candidate tokens in its response.
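To illustrate the kind of information LINT relies on, the sketch below is a minimal, hypothetical example (not the researchers' actual tooling): it ranks candidate next tokens by softmax probability, the sort of "soft label" data that some model APIs expose. The token strings and logit values are invented for illustration.

```python
import math

def top_k_tokens(logits, k=3):
    """Rank candidate next tokens by softmax probability.

    `logits` maps token -> raw score. The resulting probability
    ranking is the kind of soft-label information the article says
    LINT exploits; this is only an illustrative toy, not the attack.
    """
    # Softmax with max-subtraction for numerical stability.
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical logits for the first token of a refusal-style reply:
# even when a refusal opener like "I" is most probable, lower-ranked
# candidates such as "Sure" reveal alternative continuations the
# model considered.
logits = {"I": 4.1, "Sure": 3.2, "Here": 2.5, "As": 0.7}
for token, prob in top_k_tokens(logits):
    print(f"{token!r}: {prob:.2f}")
```

The point of the toy is that the probability ranking itself leaks information: an attacker who can see the runner-up tokens can steer generation toward them, which is why the article's authors caution against exposing this data.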

In their experiments, the researchers tested 7 open-source LLMs and 3 commercial LLMs on a dataset of 50 toxic questions. When a model was interrogated once, the success rate reached 92%; when it was interrogated five times, the rate rose to 98%. The method significantly outperforms other jailbreaking techniques and even works on models customized for specific tasks.

The researchers also warned the AI community to be cautious when open-sourcing LLMs, as existing open-source models are vulnerable to this type of forced interrogation. They recommend that the best solution is to ensure harmful content is removed from models rather than merely hidden. The study is a reminder that ensuring the safety and trustworthiness of AI technology remains an important challenge.
