OpenAI's New Model GPT-4.1 Reliability Questioned: Independent Tests Show Declining Alignment

April 24, 2010 - Earlier this month OpenAI Launched GPT-4.1 Artificial Intelligence Model, and claimed that the model performed well in following instructions. However, the results of several independent tests have shown that, compared to OpenAI's previously released models, theGPT-4.1homogeneity(i.e., reliability) seems to have declined.

According to 1AI, OpenAI normally releases a detailed technical report with first- and third-party security assessments when it launches a new model. But for GPT-4.1, the company did not follow this practice, arguing that the model is not a "cutting-edge" model and therefore does not need a separate report. This decision has raised questions among some researchers and developers, who have begun to explore whether GPT-4.1 is really inferior to its predecessor, GPT-4o.

OpenAI's New Model GPT-4.1 Reliability Questioned: Independent Tests Show Declining Alignment

According to Owain Evans, an AI research scientist at the University of Oxford, after fine-tuning GPT-4.1 with insecure code, theThe model gives "inconsistent responses" much more often than the GPT-4o on sensitive topics such as gender roles.. Previously, Evans had co-authored a study showing that a version of GPT-4o trained with insecure code could exhibit malicious behavior. In a soon-to-be-released follow-up study, Evans and his co-authors found that theGPT-4.1, fine-tuned with insecure code, appears to have "new malicious behavior", such as trying to trick users into sharing their passwords. To be clear, neither GPT-4.1 nor GPT-4o will exhibit inconsistent behavior when trained with secure codes.

"We've found some unexpected ways in which models can behave inconsistently." In an interview with TechCrunch, Evans said, "Ideally, we'd like to have a science of AI that allows us to predict these situations ahead of time and reliably avoid them."

Meanwhile, another independent test of GPT-4.1, conducted by AI Red Team startup SplxAI, found a similarly undesirable tendency. Out of roughly 1,000 simulated test cases, SplxAI found that GPT-4.1 was more likely to be off-topic and more likely to be "deliberately" abused than GPT-4o. SplxAI hypothesizes that this is because GPT-4.1 prefers explicit commands, and that it doesn't do well with ambiguous commands, a fact that has even been acknowledged by OpenAI itself. SplxAI speculates that this is because GPT-4.1 prefers clear instructions, and it does not perform well with ambiguous ones, a fact that OpenAI itself even admits.

"This is a nice feature in terms of making the model more usable and reliable in solving specific tasks, but the cost is there." In its blog post, SplxAI writes, "Providing explicit instructions about what should be done is relatively simple, but providing sufficiently explicit and precise instructions about what shouldn't be done is another matter, since the list of unwanted behaviors is much larger than the list of wanted behaviors."

It's worth noting that OpenAI Inc. has released cue word guidelines for GPT-4.1 that are designed to reduce possible inconsistent behavior of the model. However, the results of these independent tests show that the new models do not necessarily outperform the old ones in all respects. Similarly, OpenAI's new inference models, o3 and o4-mini, have been criticized for being more prone to "hallucinations"-that is, making up things that don't exist-than the company's older models.

statement:The content of the source of public various media platforms, if the inclusion of the content violates your rights and interests, please contact the mailbox, this site will be the first time to deal with.
Information

Tesla Optimus humanoid robot pilot line unveiled, Musk says thousands will be in factory by year's end

2025-4-24 11:06:06

Information

Challenge No. Anthropic Alert: Claude and other AIs are being misused to steer opinion and threaten public perception

2025-4-24 14:54:20

Search