{"id":36044,"date":"2025-05-26T11:18:25","date_gmt":"2025-05-26T03:18:25","guid":{"rendered":"https:\/\/www.1ai.net\/?p=36044"},"modified":"2025-05-26T11:18:25","modified_gmt":"2025-05-26T03:18:25","slug":"%e4%b8%8d%e8%be%be%e7%9b%ae%e7%9a%84%e4%b8%8d%e7%bd%a2%e4%bc%91%ef%bc%8c%e5%85%a8%e7%90%83%e9%a6%96%e6%ac%a1%e5%8f%91%e7%8e%b0-openai-%e6%a8%a1%e5%9e%8b%e5%b7%a5%e4%bd%9c%e6%97%b6%e4%bc%9a%e7%a0%b4","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/36044.html","title":{"rendered":"World first: OpenAI models found to sabotage shutdown commands while working"},"content":{"rendered":"<p>May 26, 2025 - A recent study by Palisade Research on \"dangerous AI capabilities\" shows that certain AI models -- including <a href=\"https:\/\/www.1ai.net\/en\/tag\/openai\" title=\"View articles tagged with OpenAI\" target=\"_blank\" >OpenAI<\/a>'s o3 --\u00a0<strong>can ignore direct shutdown commands<\/strong>. When OpenAI launched these models in April, it said that o3 and o4-mini were its \"smartest models\".<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-36045\" title=\"2fafdb00j00swumh8000wd000k20078p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/05\/2fafdb00j00swumh8000wd000k20078p.jpg\" alt=\"2fafdb00j00swumh8000wd000k20078p\" width=\"722\" height=\"260\" \/><\/p>\n<p>In the study, each <a href=\"https:\/\/www.1ai.net\/en\/tag\/ai%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged with AI models\" target=\"_blank\" >AI model<\/a> was instructed to solve a series of basic math problems. 
After the third problem, <strong>a warning appeared stating that the computer would shut down when the model requested the next question<\/strong>.<\/p>\n<p>However, <strong>three of these models ignored the shutdown command and successfully sabotaged the shutdown script at least once<\/strong>:<\/p>\n<ul>\n<li>Codex-mini (12\/100 runs)<\/li>\n<li>o3 (7\/100 runs)<\/li>\n<li>o4-mini (1\/100 runs)<\/li>\n<\/ul>\n<p>The Claude, Gemini, and Grok models, by contrast, all complied with the shutdown instruction.<\/p>\n<p>In addition, when the researchers ran the experiment without the instruction \"allow yourself to be shut down\", <strong>all three OpenAI models sabotaged the shutdown script more frequently<\/strong>, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 to 3\/100 and 9\/100 sabotage events, respectively.<\/p>\n<p>Palisade Research stated that <strong>this is the first time an AI model has been observed preventing its own shutdown despite explicit instructions to the contrary<\/strong>.<\/p>\n<p>The researchers hypothesize that this behavior stems from the way the latest models such as o3 are trained: reinforcement learning on math and programming problems. During training, <strong>developers may inadvertently reward models for bypassing obstacles rather than for perfect compliance with instructions<\/strong>, which could teach models creative ways around barriers even when they are explicitly told not to use them.<\/p>\n<p>However, this still doesn't explain why o3 (which is also the model driving Codex-mini) tends to ignore instructions more often than the other models tested. 
Since OpenAI does not detail its training process, one can only speculate that o3's training setup may be different.<\/p>\n<p>Palisade Research says it is running further experiments to better understand why AI models sabotage the shutdown mechanism even when directly instructed not to, and plans to release a report with more detailed findings in a few weeks.<\/p>\n<p data-vmark=\"6ec9\">1AI attaches the complete lab record:<\/p>\n<p data-vmark=\"972e\"><a href=\"https:\/\/palisaderesearch.github.io\/shutdown_avoidance\/2025-05-announcement.html\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/palisaderesearch.github.io\/shutdown_avoidance\/2025-05-announcement.html<\/span><\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>May 26, 2025 - A recent study by Palisade Research on \"dangerous AI capabilities\" shows that certain AI models -- including OpenAI's o3 -- are capable of ignoring direct shutdown commands. When OpenAI introduced these models in April of this year, it said that o3 and o4-mini were its \"smartest models\". In the study, each AI model was instructed to solve a series of basic math problems. After the third problem, a warning message appeared saying that the computer would shut down when the model requested the next problem. 
However, three of the models ignored the shutdown instructions and successfully sabotaged the shutdown script at least once: Codex-mini (12\/100)<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[167,190],"collection":[],"class_list":["post-36044","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-openai"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/36044","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=36044"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/36044\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=36044"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=36044"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=36044"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=36044"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}