AI vs. Peking University Chemistry: Top Model Only Matches the Average for Lower-Year Undergraduates

December 29, according to Xinhua: Peking University recently released the results of SUPERChem, a multimodal deep-reasoning evaluation in the field of chemistry, developed by a team from the university's Computer Center, School of Computer Science, and Yuanpei College.

Recently, the team used this "Peking University exam paper" as a yardstick in an attempt to measure the true boundaries of AI's scientific reasoning.

According to the report, the exam was taken not only by GPT, Gemini, DeepSeek, Qwen, and other models, but also by second-year students from Peking University's College of Chemistry and Molecular Engineering.

According to the report, the SUPERChem question bank consists of 500 questions deeply adapted from difficult exam problems and cutting-edge professional literature, rather than taken from public online question banks. It is designed to give AI a set of questions it has "never seen," which it must solve through genuine reasoning ability.

In this carefully designed exam, the human test-takers showed sophisticated scientific intuition. As a baseline, the undergraduates from Peking University's College of Chemistry and Molecular Engineering who took the test achieved an average accuracy of 40.3%.

AI, by contrast, performed as follows:

Even the top-ranked model tested performed only on par with the average lower-year undergraduate. According to the leaderboard, the best performer, GPT-5 (High), reached an accuracy of 39.6%, slightly below the human baseline.

Not only was the accuracy unremarkable, but in some respects the models' behavior puzzled the team:

The language of chemistry is graphical: molecular structures and reaction mechanism diagrams carry key information. For some models, however, accuracy fell rather than rose when image information was introduced. This suggests that current AI still faces a significant perception bottleneck in translating visual information into chemical meaning.

Even when a model picks the right answer, its solution process may not hold up. The team found that AI chains of reasoning tended to break down on high-level tasks such as product structure prediction, reaction mechanism identification, and structure-activity relationship analysis. Current top-tier models, despite their vast stores of knowledge, are still ill-equipped for hardcore chemistry problems that demand rigorous logic and deep understanding.

According to the report, the team released these results not to expose AI's shortcomings but to push it further. SUPERChem is like a signpost. It reminds us:

There is still a long way to go from a general-purpose chatbot to a professional scientific assistant that can understand structure-activity relationships and work through reaction mechanisms. It is the leap from "memorizing knowledge" to "understanding the physical world."
