No matter how powerful today's AI chatbots become, they share a widely criticized habit: giving users answers that sound convincing but do not match the facts. In plain terms, AI sometimes "talks nonsense," or even "spreads rumors," in its answers.

Image source: Pixabay
Keeping large AI models from behaving this way is no small technical challenge. However, according to the tech outlet Marktechpost, Google DeepMind and Stanford University appear to have found a workaround.
The researchers have introduced a tool built on a large language model, the Search-Augmented Factuality Evaluator (SAFE). The results of the study, along with the experimental code and dataset, have been published.
The system analyzes, processes, and evaluates a chatbot's response in four steps to verify its accuracy and truthfulness: it splits the answer into individual facts, revises each fact so it stands on its own, checks each fact's relevance to the original question, and compares each fact against Google Search results.
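To make that four-step flow concrete, here is a minimal Python sketch of how such a pipeline could be wired together. Every function and class name below is a hypothetical placeholder rather than the published implementation, and the language-model and Google Search calls are stubbed out so the control flow can run as written.

```python
# Minimal sketch of a SAFE-style factuality pipeline (placeholder names, stubbed calls).
from dataclasses import dataclass


@dataclass
class FactVerdict:
    fact: str
    relevant: bool
    supported: bool


def split_into_facts(answer: str) -> list[str]:
    # Placeholder: the real system prompts an LLM to break the answer
    # into individual factual claims; here we naively split on periods.
    return [s.strip() for s in answer.split(".") if s.strip()]


def make_self_contained(fact: str, answer: str) -> str:
    # Placeholder: revise the claim (resolve pronouns, add context)
    # so it can be checked on its own.
    return fact


def is_relevant(fact: str, question: str) -> bool:
    # Placeholder: an LLM judges whether the claim actually addresses
    # the original question.
    return True


def supported_by_search(fact: str) -> bool:
    # Placeholder: issue Google Search queries and let an LLM decide
    # whether the retrieved results support the claim.
    return True


def evaluate_response(question: str, answer: str) -> list[FactVerdict]:
    verdicts = []
    for raw_fact in split_into_facts(answer):
        fact = make_self_contained(raw_fact, answer)
        relevant = is_relevant(fact, question)
        supported = supported_by_search(fact) if relevant else False
        verdicts.append(FactVerdict(fact, relevant, supported))
    return verdicts


if __name__ == "__main__":
    verdicts = evaluate_response(
        "Who founded DeepMind?",
        "DeepMind was founded in 2010. It is headquartered in London.",
    )
    supported = sum(v.supported for v in verdicts if v.relevant)
    print(f"{supported}/{len(verdicts)} relevant facts supported by search")
```

The key design idea this sketch illustrates is that the response is judged claim by claim rather than as a whole, so a long answer can be scored on how many of its individual facts survive a search-backed check.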
To evaluate its performance, the researchers created a dataset called LongFact containing about 16,000 facts and tested the system on 13 large language models from the Claude, Gemini, GPT, and PaLM-2 families. In a focused analysis of 100 disputed facts, SAFE's judgments proved correct 76% of the time on further review. The framework also has an economic advantage: it costs more than 20 times less than manual annotation.
