In recent years, artificial intelligence (AI) has made significant progress in many areas, with large language models (LLMs) capable of generating human-level text and even exceeding human performance on some tasks. However, researchers have questioned LLMs' reasoning ability: they found that these models make mistakes on simple mathematical problems after just a few minor changes, suggesting that they may not be capable of true logical reasoning.

On Thursday, a group of researchers at Apple published a paper titled "Understanding the Limitations of Mathematical Reasoning in Large Language Models," revealing that LLMs are susceptible to interference when solving mathematical problems. IT House notes that the researchers tested the reasoning ability of LLMs by making small changes to math problems, such as adding irrelevant information. It turns out that the performance of these models drops dramatically in the face of such changes.
For example, the researchers posed a simple math question: "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. On Sunday, he picks twice as many kiwis as on Friday. How many kiwis did Oliver pick?" The LLM was able to calculate the answer correctly. However, when the researchers added an irrelevant detail, "On Sunday, he picked twice as many kiwis as on Friday, but five of them were a bit smaller than average," the LLM answered incorrectly. For example, GPT-o1-mini responded: "... on Sunday, 5 of these kiwis were smaller than average. We need to subtract them from the Sunday total: 88 - 5 = 83."
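The arithmetic itself is straightforward; a short Python check contrasting the correct reading with the faulty one reported for GPT-o1-mini makes the error concrete (the variable names below are ours, chosen for illustration):

```python
# Correct reading: the remark about smaller kiwis does not change the count.
friday = 44
saturday = 58
sunday = 2 * friday                       # twice Friday's pick: 88

correct_total = friday + saturday + sunday
print(correct_total)                      # 190 kiwis

# Faulty reading reported for GPT-o1-mini: subtracting the five
# smaller kiwis from Sunday's total as if they did not count.
faulty_sunday = sunday - 5                # 88 - 5 = 83
print(friday + saturday + faulty_sunday)  # 185, the wrong answer
```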
The above is just one simple example. The researchers modified hundreds of questions in this way, and almost all of the modifications resulted in a significant drop in the models' success rate.
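As a rough illustration of the kind of evaluation this implies, here is a minimal sketch in the spirit of the experiment described above; `query_model` is a hypothetical stand-in for whatever API is used to call an LLM, not code from the paper:

```python
# Sketch of a perturbation-style evaluation: run each problem in its
# original form and with an irrelevant clause added, then compare accuracy.
def query_model(prompt: str) -> int:
    """Hypothetical: send `prompt` to an LLM and parse an integer answer."""
    raise NotImplementedError("replace with a real model call")

def evaluate(problems: list[tuple[str, str, int]]) -> tuple[float, float]:
    """Each item is (original, perturbed_with_irrelevant_clause, answer).

    Returns accuracy on the originals and on the perturbed variants;
    the gap between the two measures sensitivity to irrelevant details.
    """
    base_ok = sum(query_model(orig) == ans for orig, _, ans in problems)
    pert_ok = sum(query_model(pert) == ans for _, pert, ans in problems)
    n = len(problems)
    return base_ok / n, pert_ok / n
```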
According to the researchers, this phenomenon suggests that LLMs do not really understand math problems; instead, they make predictions based on patterns in their training data. When genuine reasoning is required, such as deciding whether the smaller kiwis should still be counted, they produce strange and implausible results.
This finding has important implications for the development of AI. Although LLMs perform well in many areas, their reasoning ability still has clear limitations. In the future, researchers will need to explore how to improve LLMs' reasoning so that they can better understand and solve complex problems.