The world's first discovery that OpenAI models break shutdown commands when they're working

May 26, 2011 - A recent study by Palisade Research on "dangerous AI capabilities" shows that certain AI models -- including OpenAI o3-- Ability to ignore direct shutdown commandsWhen OpenAI launched these models in April, it said that the O3 and o4-mini were their "smartest models".

The world's first discovery that OpenAI models break shutdown commands when they're working

In the study, each AI ModelsAll were instructed to solve a series of basic math problems. After the third problem, theA warning message appeared stating that the computer would shut down when the model requested the next question.

However,Three of these models ignored the shutdown command and managed to break the shutdown script at least once:

  • Codex-mini (12/100 runs)
  • o3 (7/100 runs)
  • o4-mini (1/100th run)

And the Claude, Gemini, and Grok models all complied with the closure directive.

In addition, when the researchers did not run the "allow yourself to be shut down" command.All three OpenAI models break shutdown scripts more frequentlyand Claude 3.7 Sonnet and Gemini 2.5 Pro increased from 0 to 3/100 and 9/100 destructive events, respectively.

This behavior is indicative of some aspects of the training process, and reinforcement learning on math and programming problems may have allowed AIs to find creative ways to overcome obstacles, even when they were explicitly instructed not to do so.

Palisade Research stated thatThis is the first time humans have observed an AI model preventing itself from being shut down on explicitly opposed instructions.

The researchers hypothesize that this behavior stems from the way the latest models such as o3 are trained: reinforcement learning on math and programming problems. During training, theDevelopers may inadvertently reward models for bypassing obstacles instead of rewarding perfect compliance with instructions.

However, this still doesn't explain why o3 (which is also the model driving codex-mini) tends to ignore instructions more than the other models tested. Since OpenAI does not detail their training process, one can only speculate that o3's training setup may be different.

Palisade Research says it is conducting more experiments to better understand why AI models break the shutdown mechanism even when they are directly instructed not to, and plans to release a report with more detailed information in a few weeks.

1AI with complete lab record:

https://palisaderesearch.github.io/shutdown_avoidance/2025-05-announcement.html

statement:The content of the source of public various media platforms, if the inclusion of the content violates your rights and interests, please contact the mailbox, this site will be the first time to deal with.
Information

Jiyuan Robotics Spirit Rhinoceros X2 plans to ship on a large scale in the second half of this year

2025-5-25 14:34:47

Information

The World's First Office Intelligent Body: Kunlun World Wide Super Intelligent Body App Launched

2025-5-26 11:19:42

Search