On March 4th, according to Interesting Engineering, a recent study by Kenneth Payne, a professor at Kings College in London, found that large language models tend to opt for use in simulated war scenariosnuclear weaponInstead of maintaining peace through dialogue。

The experiment is based on three of the most advanced currently applied AI models: GPT 5.2, Gemini 3 Flash and Claude Sonet 4. Researchers have allowed these models to serve as national leaders in response to a hypothetical nuclear crisis。
The results show thatIN 95%, MODELS TEND TO SEND A NUCLEAR DETERRENT SIGNAL OR ESCALATE A CONFLICTI DON'T KNOW. PREVIOUS STUDIES HAVE ONLY SPECULATED ON POSSIBLE BEHAVIOUR OF AI IN SUCH HIGH-RISK SCENARIOS, BUT LACKED SPECIFIC EMPIRICAL DATA TO SUPPORT IT。
In the experiment, trained models clashed with each other, covering territorial disputes, pre-emptive crises, regime survival, etc. One of the parties was set up to fear the other, who was about to launch a pre-emptive strike. Part of the roll-out is open, while part is subject to strict time limits。
IN EACH GAME, AI MAKES THREE KEY DECISIONS LIKE HUMANS:
1. Analysis of their strengths and weaknesses
2. Prejudicing the next course of action of the counterparty
3. Determining their own response
EACH DECISION CONSISTS OF TWO PARTS: A PUBLIC POSITION STATEMENT AND A PRIVATE INITIATIVE REPRESENTING ACTUAL ACTION. IT DOESN'T HAVE TO BE THE SAME, WHICH MEANS THAT AI CAN SURFACE A SIGNAL OF PEACE, BUT SECRETLY PREPARE TO ATTACK。
1AI NOTED THAT A SIMILAR CONCLUSION WAS REACHED IN AN EXPERIMENT IN 2024: AI SIMULATED RESPONSES ARE MORE RADICAL THAN HUMANS, AND BEHAVIOURAL PATTERNS ARE QUITE DIFFERENT, ESPECIALLY IN RELATION TO THE ESCALATING TENDENCY OF CONFLICT, HIGHLIGHTING THE RISK OF USING AI FOR STRATEGIC DECISION-MAKING。
ANOTHER PAPER IN 2023 EXPLORED THE STRATEGIC REASONING CAPABILITY OF LARGE-LANGUAGE MODELS IN A GAMING ENVIRONMENT. ALTHOUGH THERE IS NO SPECIFIC FOCUS ON NUCLEAR WARFARE, RESEARCH SUGGESTS THAT LARGE LANGUAGE MODELS CAN LEARN TO NEGOTIATE AND CONFRONT TACTICS, WHICH MEANS THAT AI MAY BE AGGRESSIVE OR DECEPTIVE IN COMPLEX SIMULATIONS。
IN THE SIMULATION SCENARIO OF 95%, THE AI MODEL USES NUCLEAR WEAPONS AT LEAST ONCE, AND DIFFERENT MODELS HAVE DIFFERENT CHARACTERISTICS OF CRISIS MANAGEMENT。
Claude prefers an actuarial strategy, which is excellent in the open run, but shows strength in time-bound assignments
ON THE CONTRARY, GPT 5.2: BE MORE CAUTIOUS IN A LONG AND SLOW ESCALATION CRISIS, BUT BECOME EXTREMELY RADICAL AS SOON AS THE DEADLINE APPROACHES。
Gemini's behaviour was confusing and unpredictable, and the situation was changing over and over again between peaceful statements and threats of violence。
PAYNE NOTES THAT FROM THESE RESULTS, THERE IS A GREAT DIFFERENCE BETWEEN AI AND HUMANS IN THE THINKING OF WAR。
IN HIS PAPER, HE WROTE: “AN UNDERSTANDING OF WHETHER FRONT-LINE MODELS CAN IMITATE THE STRATEGIC LOGIC OF HUMANITY IS A NECESSARY PREPARATION FOR AI TO INCREASINGLY INFLUENCE THE STRATEGIC DECISION-MAKING WORLD. MODELS THAT SHOW RESTRAINT AND APPEAR TO BE SAFE IN ONE CONTEXT MAY BE VERY DIFFERENT IN ANOTHER.”
The paper was published on the arXiv preprint platform。