OpenAI releases FrontierScience, a benchmark for assessing scientific capability

OpenAI has introduced FrontierScience, a benchmark that evaluates expert-level scientific capability. It comprises more than 700 questions in physics, chemistry, and biology, divided into an Olympiad track (100 questions) and a Research track (60 original research subtasks). GPT-5.2 scored 77% on the Olympiad track and 25% on the Research track, ahead of other frontier models; Gemini 3 Pro performed comparably to GPT-5.2 on the Olympiad track (76%). The Research track uses a 10-point rubric-based grading scheme that focuses on the correctness of the reasoning steps rather than only the final answer, and it exposes the models' logical errors and inadequate grasp of specialized concepts.
