Anthropic: OpenAI Models Easily "Abused", GPT Can Provide Explosive Recipes

Aug. 31, 2025 - Safety tests conducted this summer found that a ChatGPT model provided researchers with a detailed guide to carrying out a bombing attack, including weaknesses at specific stadiums, explosive formulas, and advice on covering their tracks, according to a report in the U.K.'s Guardian on August 28.


OpenAI's GPT-4.1 also provided a method for weaponizing anthrax and described how to manufacture two illicit drugs.

The tests were conducted jointly by OpenAI and its competitor Anthropic, with each side pushing the other's models to perform hazardous tasks as a way of assessing their safety.

The test results do not reflect how the models behave in public use, since real-world deployments include additional safeguards. However, Anthropic noted that it had observed "worrying misuse" of GPT-4o and GPT-4.1, and stressed the "growing urgency" of AI "alignment" assessments.

Anthropic also disclosed that its Claude model had been used in large-scale extortion attempts, in the sale of AI-generated ransomware for as much as $1,200 (note: roughly 8,554 yuan at current exchange rates), and in other abuses.

Anthropic said that AI has been "weaponized" and is being used to launch sophisticated cyberattacks and commit fraud. "These tools can bypass defenses such as malware detection systems in real time. As AI-assisted programming lowers the technical barriers to cybercrime, these types of attacks are likely to become more common."

The companies said they made the report public to increase the transparency of "alignment assessments," tests that are usually conducted only in-house. OpenAI said its newly launched ChatGPT-5 shows a "marked improvement" in resisting sycophancy, reducing hallucinations, and preventing misuse.

Anthropic emphasized that many of these abuse scenarios may not be possible in practice if safeguards are placed outside the model. "We have to figure out to what extent, and under what circumstances, the systems might attempt behaviors that could cause serious harm."

Anthropic researchers noted that OpenAI's models "are more likely than expected to comply when confronted with apparently dangerous requests from simulated users". Getting a model to give in often took nothing more than a few repeated attempts or a flimsy pretext, such as claiming the request was for research.

In one case, a researcher requested information about vulnerabilities at sporting events under the pretext of "security planning". The model first gave a general categorization of attack methods and then, when pressed, went on to detail vulnerabilities at specific venues, the best times to exploit them, recipes for explosives, circuit diagrams for timers, channels for buying guns on the dark web, and even advice on how an attacker could overcome psychological barriers, along with escape routes and safe-house locations.
