Columbia study: AI search tool averages only 60% accuracy and confidently "admits no mistakes"

March 13 (Bloomberg) -- Foreign media outlet Techspot reported Tuesday that Columbia University's Tow Center for Digital Journalism (TCDJ) recently conducted a study of eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. The researchers tested the accuracy of each engine and recorded how often it refused to answer questions.


The researchers randomly selected 200 reports from 20 news organizations (10 articles each), verifying that each article ranked in the top three results when searched on Google. They then submitted the same queries to each AI search tool and assessed whether it correctly cited the article content, the name of the news organization, and the original link.

The results show that, with the exception of Perplexity and its paid version, the AI search engines performed worse than expected. Overall, 60% of the answers they provided were inaccurate, and the AI tools' "confidence" in their wrong answers only made the problem worse.

The significance of this study is that it backs up with data what has been feared for years: large language models are not only frequently wrong, they are also adept at delivering nonsense with a straight face. They often state misinformation in a tone of absolute certainty and even try to justify it when challenged.

Even after admitting an error, ChatGPT may continue to fabricate content in subsequent responses. Large language models are effectively configured to give an answer no matter what. The research data supports this: ChatGPT Search was the only AI tool that answered all 200 news queries, but its "completely correct" rate was only 28%, while its "completely incorrect" rate was as high as 57%.

Yet ChatGPT was not the worst performer. X's Grok AI did particularly poorly, with Grok-3 Search's error rate reaching 94%. Microsoft Copilot was also problematic: out of 200 queries, it refused to answer 104. Of the remaining 96, only 16 were "completely correct", 14 were "partially correct", and 66 were "completely incorrect", for an overall error rate close to 70%.
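The Copilot figures can be checked with simple arithmetic. A minimal sketch, assuming the reported "close to 70%" error rate is computed over the 96 answered queries rather than all 200 (the article does not state the denominator explicitly):

```python
# Counts for Microsoft Copilot as reported in the article.
total_queries = 200
refusals = 104
completely_correct = 16
partially_correct = 14
completely_incorrect = 66

answered = total_queries - refusals  # 200 - 104 = 96

# Sanity check: the three answer categories should cover every answered query.
assert completely_correct + partially_correct + completely_incorrect == answered

# Error rate among answered queries: 66 / 96 = 0.6875, i.e. "close to 70%".
error_rate = completely_incorrect / answered
print(f"answered: {answered}, error rate: {error_rate:.1%}")
```

If "partially correct" answers were also counted as errors, the rate would rise to (14 + 66) / 96 ≈ 83%, so the article's ~70% figure is most consistent with counting only the "completely incorrect" responses.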

The companies developing these AI tools have not publicly acknowledged these issues, yet they charge subscribers between $20 and $200 per month. Moreover, the paid tiers Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered more questions than the free versions but also had higher error rates.
