May 17, 2025 - According to a report today by the foreign outlet LiveScience, there are some tasks that humans accomplish easily but AI still cannot handle. For example, AI can write code, draw realistic images, generate text that sounds close to human, and even score well on some exams, yet it repeatedly stumbles over the most basic everyday matters of reading a clock and counting the days: it either misreads the hands or cannot work out the day of the week.

The researchers presented the findings at the International Conference on Learning Representations (ICLR) 2025; the paper has been posted on arXiv but has not yet been peer-reviewed.
Rohit Saxena, a researcher at the University of Edinburgh and an author of the paper, said, "Humans grasp the concepts of time and calendars from a young age, so AI's shortcomings in this area are a warning sign." He noted that if AI is to be applied to real-life, time-sensitive scenarios such as scheduling, automated processes, or assistive technology, this kind of basic competency gap must be addressed.
The research team fed several large language models with image-processing capabilities a set of custom-made clock and calendar images. The models tested included Meta's Llama 3.2-Vision, Anthropic's Claude-3.5 Sonnet, Google's Gemini 2.0, and OpenAI's GPT-4o. The tests showed that none of these models answered more than half of the questions correctly when asked to read the time on a clock or work out the day of the week for a given date.
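The paper does not publish its evaluation harness here, but a minimal sketch of this kind of test can be written against a public multimodal API. The snippet below uses the OpenAI Python client and GPT-4o (one of the models named above); the prompt wording, the helper name ask_time, and the image file are illustrative assumptions, not the authors' setup.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_time(image_path: str) -> str:
    """Send a clock-face image to GPT-4o and ask it to read the time."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What time is shown on this clock? Answer as HH:MM."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Compare the model's answer against the ground-truth label of the rendered image.
# print(ask_time("clock_0415.png"))  # ground truth: "04:15" (hypothetical file)
```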
Saxena said, "AI training has traditionally relied on large numbers of labeled examples, whereas reading a clock requires spatial reasoning. The model must not only detect whether the hands overlap, but also interpret angles and handle dials of various styles, such as Roman numerals or artistic designs. It's far more complicated than simply recognizing 'this is a clock'."
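The spatial reasoning Saxena describes ultimately reduces to a small geometric calculation: the minute hand sweeps 6° per minute and the hour hand 30° per hour (0.5° per minute). The sketch below is not from the paper; it simply shows what "understanding angles" means once the hand positions have been extracted, with an illustrative function name.

```python
def angles_to_time(hour_angle_deg: float, minute_angle_deg: float) -> str:
    """Convert clock-hand angles (measured clockwise from 12 o'clock) to HH:MM.

    The minute hand moves 360/60 = 6 degrees per minute; the hour hand moves
    360/12 = 30 degrees per hour, i.e. 0.5 degrees per minute.
    """
    minutes = round(minute_angle_deg / 6) % 60
    hours = int(hour_angle_deg // 30) % 12 or 12  # read the hour from the 30° sector the hand sits in
    return f"{hours:02d}:{minutes:02d}"

# At 4:15 the hour hand sits at 4*30 + 15*0.5 = 127.5° and the minute hand at 90°.
print(angles_to_time(hour_angle_deg=127.5, minute_angle_deg=90))  # -> "04:15"
```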
Calendar problems are also difficult for AI. On questions such as "What day of the week is the 153rd day of the year?", error rates remain high. The study found that the AI read the clock correctly only 38.7% of the time, and answered calendar questions even less accurately, at 26.3%.
Saxena explained, "Arithmetic is trivial for traditional computers, but not for large models. AI doesn't execute algorithms; it relies on patterns learned from training data to predict the answer." He noted that while AI can sometimes answer such questions correctly, its reasoning process is inconsistent and is not grounded in fixed rules, which is precisely the gap the study reveals.
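The contrast Saxena draws is easy to see with the calendar question quoted above: for a conventional program, "the 153rd day of the year" is a one-line deterministic computation. A minimal sketch using Python's standard library (the year 2025 is just an assumed example, not one taken from the study):

```python
from datetime import date, timedelta

def weekday_of_day_of_year(year: int, day_of_year: int) -> str:
    """Return the weekday name of the Nth day of a given year."""
    d = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return d.strftime("%A")

print(weekday_of_day_of_year(2025, 153))  # the 153rd day of 2025 is June 2, a Monday
```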
The study also revealed another problem: AI tends to perform worse when a certain type of phenomenon, such as leap years or complex calendar rules, is underrepresented in its training samples. As Saxena put it, "Even if the models understand the concept of a 'leap year,' that doesn't mean they can correctly apply this knowledge to specific visual judgments."
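For reference, the leap-year rule the models struggle to apply is itself a fixed, three-clause Gregorian rule, which Python's standard library also exposes as calendar.isleap; a minimal illustration:

```python
import calendar

def is_leap(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Sanity check against the standard library.
for y in (1900, 2000, 2024, 2025):
    assert is_leap(y) == calendar.isleap(y)
    print(y, is_leap(y))  # 1900 False, 2000 True, 2024 True, 2025 False
```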
1AI learned from the report that the study highlights two areas for improvement: first, training data should contain more representative examples; second, the way AI integrates logical reasoning with spatial perception should be revisited, especially for tasks that are rarely encountered.