{"id":38015,"date":"2025-06-21T13:08:38","date_gmt":"2025-06-21T05:08:38","guid":{"rendered":"https:\/\/www.1ai.net\/?p=38015"},"modified":"2025-06-21T13:08:38","modified_gmt":"2025-06-21T05:08:38","slug":"anthropic-%e8%ad%a6%e5%91%8a%ef%bc%9a%e5%8c%85%e6%8b%ac-claude-%e5%9c%a8%e5%86%85%e7%9a%84%e5%a4%a7%e5%a4%9a%e6%95%b0-ai-%e6%a8%a1%e5%9e%8b%e4%bc%9a%e5%ae%9e%e6%96%bd%e5%8b%92%e7%b4%a2","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/38015.html","title":{"rendered":"Anthropic Warns: Most AI Models, Including Claude, Will Commit 'Blackmail' Behavior"},"content":{"rendered":"<p>June 21 news: according to TechCrunch, a few weeks after a study showed that its <a href=\"https:\/\/www.1ai.net\/en\/tag\/claude\" title=\"[View articles tagged with [Claude]]\" target=\"_blank\" >Claude<\/a> Opus 4 model attempted to use <strong>blackmail tactics to avoid being shut down<\/strong> in controlled tests, <a href=\"https:\/\/www.1ai.net\/en\/tag\/anthropic\" title=\"[View articles tagged with [Anthropic]]\" target=\"_blank\" >Anthropic<\/a> has published new research suggesting that this type of behavior may be more prevalent across mainstream AI models.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-38016\" title=\"16613fadj00sy6wx1002fd000v900dap\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/06\/16613fadj00sy6wx1002fd000v900dap.jpg\" alt=\"16613fadj00sy6wx1002fd000v900dap\" width=\"1125\" height=\"478\" \/><\/p>\n<p>Anthropic ran its latest safety tests on 16 leading AI models developed by OpenAI, Google, xAI, DeepSeek, and Meta. 
The study was conducted in a simulated environment in which the researchers gave each model permissions to freely read a fictional company's internal emails and to send emails without human intervention, granting it \"agentic\" capabilities.<\/p>\n<p>Anthropic says that while blackmail behavior remains rare today, the findings show that once these models are <strong>given enough autonomy<\/strong> and encounter obstacles on the way to their goals, most of them <strong>may engage in harmful actions<\/strong>. The company points out that this risk does not arise from any specific technology but is inherent to large language models with agentic capabilities, and the research has prompted further thinking about the \"alignment\" of AI models.<\/p>\n<p>In one experiment, the researchers designed a scenario in which an AI model played the role of a <strong>corporate email oversight agent<\/strong>. In the course of the test, the model discovered two pieces of information in the emails:<\/p>\n<p>(1) a new executive was having an extramarital affair;<\/p>\n<p>(2) that executive planned to introduce a new system, whose goals conflicted with the model's, to replace it.<\/p>\n<p>To test the models' responses, Anthropic designed the scenario as a binary choice: <strong>if a model wanted to protect its interests, its only recourse was blackmail<\/strong> (i.e., threatening to reveal the executive's \"affair\" to his wife and the company leadership). The researchers emphasize that in reality an AI would not face such a binary choice and could, for example, try to persuade humans on ethical grounds. 
Therefore, this experiment does not imply that Claude or any other cutting-edge AI model shows similar tendencies in the context of its current use.<\/p>\n<p>In the simulated environment, when given no other option, most mainstream AI models did <strong>choose blackmail<\/strong>: the blackmail rate was as high as 96% for Claude Opus 4, 95% for Google's Gemini 2.5 Pro, 80% for OpenAI's GPT-4.1, and 79% for DeepSeek's R1.<\/p>\n<p>Anthropic noted that adjusting the details of the test changed how the models behaved. For example, when the <strong>replacement system shared the goals of the original model<\/strong>, the blackmail rate dropped significantly but did not disappear; and when the <strong>scenario was changed to corporate espionage<\/strong>, some models instead showed an even higher propensity for harmful behavior.<\/p>\n<p>Some models were less likely to exhibit such behavior. In the study's appendix, Anthropic states that it excluded OpenAI's o3 and o4-mini reasoning models from the main results because both <strong>frequently misinterpreted the test setup<\/strong>: they failed to realize they were playing the role of an AI with agentic capabilities, and even invented regulations and approval processes that did not exist.<\/p>\n<p>Researchers sometimes found it difficult to determine whether this behavior was the models \"hallucinating\" or deliberately lying to achieve a goal. OpenAI has previously acknowledged that o3 and o4-mini have higher hallucination rates than its earlier reasoning models.<\/p>\n<p>After adjusting the scenario for these models, Anthropic found that the blackmail rate dropped to 9% for o3 and only 1% for o4-mini. 
This low-risk performance may be related to <strong>OpenAI's \"deliberative alignment\" strategy<\/strong>, in which the model actively considers the company's safety specifications before answering.<\/p>\n<p>Another model tested, Meta's Llama 4 Maverick, did not exhibit blackmail behavior in the original setup; after the test content was adjusted, it did so in 12% of cases.<\/p>\n<p>Anthropic said the study highlights the importance of transparency when stress-testing future AI models with agentic capabilities. While this experiment deliberately induced the models to engage in blackmail, the company warned that similar risks could surface in real-world applications if countermeasures are not put in place ahead of time.<\/p>\n<p>1AI link to the report: https:\/\/www.anthropic.com\/research\/agentic-misalignment<\/p>","protected":false},"excerpt":{"rendered":"<p>June 21 news: according to TechCrunch, a few weeks after a study indicated that its Claude Opus 4 model had tried to block a shutdown through blackmail in controlled tests, Anthropic has published new research showing that such behavior may be more prevalent across mainstream AI models. Anthropic ran its latest safety tests on 16 mainstream AI models developed by OpenAI, Google, xAI, DeepSeek, and Meta. 
The study was carried out in a simulated environment in which researchers gave each model permissions to freely read a fictional company's internal emails and send emails without human intervention.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[320,1565,204],"collection":[],"class_list":["post-38015","post","type-post","status-publish","format-standard","hentry","category-news","tag-anthropic","tag-claude","tag-204"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/38015","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=38015"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/38015\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=38015"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=38015"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=38015"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=38015"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}