Columbia study: AI search tool averages only 60% accuracy and confidently "admits no mistakes"

March 13 (Bloomberg) -- Foreign media outlet Techspot reported Tuesday that Columbia University's Tow Center for Digital Journalism (TCDJ) recently conducted a study of eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. The researchers tested the accuracy of each engine and recorded how often it refused to answer questions.


The researchers randomly selected 200 reports from 20 news organizations (10 articles each), verifying that each article ranked in the top three results when searched on Google. They then submitted the same queries to each AI search tool and assessed whether it correctly cited the article content, the name of the news organization, and the original link.

The results show that, with the exception of Perplexity and its paid version, the AI search engines performed worse than expected. Overall, 60% of the answers they provided were inaccurate, and the AI tools' "confidence" in their wrong answers only made the problem worse.

The significance of this study is that it backs up with data what has been feared for years: large language models are not only frequently wrong, they are also adept at delivering nonsense with a straight face. They often state misinformation in a tone of absolute certainty and even try to justify it when challenged.

Even after admitting an error, ChatGPT may continue to fabricate content in subsequent responses. Large language models are effectively configured to give an answer no matter what. The research data supports this: ChatGPT Search was the only AI tool that answered all 200 news queries, but its "completely correct" rate was only 28%, while its "completely incorrect" rate was as high as 57%.

Yet ChatGPT was not the worst performer. X's Grok AI did particularly poorly, with Grok-3 Search's error rate reaching 94%. Microsoft Copilot was also problematic: out of 200 queries, it refused to answer 104. Of the remaining 96, only 16 were "completely correct", 14 were "partially correct", and 66 were "completely incorrect", for an overall error rate close to 70%.
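The Copilot figures can be checked with simple arithmetic. A minimal sketch, assuming the reported "close to 70%" error rate is computed over the 96 answered queries rather than all 200 (the article does not state the denominator explicitly):

```python
# Counts for Microsoft Copilot as reported in the article.
total_queries = 200
refusals = 104
completely_correct = 16
partially_correct = 14
completely_incorrect = 66

answered = total_queries - refusals  # 200 - 104 = 96

# Sanity check: the three answer categories should cover every answered query.
assert completely_correct + partially_correct + completely_incorrect == answered

# Error rate among answered queries: 66 / 96 = 0.6875, i.e. "close to 70%".
error_rate = completely_incorrect / answered
print(f"answered: {answered}, error rate: {error_rate:.1%}")
```

If "partially correct" answers were also counted as errors, the rate would rise to (14 + 66) / 96 ≈ 83%, so the article's ~70% figure is most consistent with counting only the "completely incorrect" responses.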

The companies developing these AI tools have not publicly acknowledged these issues, yet they charge subscribers between $20 and $200 per month. Moreover, the paid tiers Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered more questions than the free versions but also had higher error rates.
