GOOGLE AI HITS A NEW PEAK IN DECIPHERING HISTORICAL MANUSCRIPTS: AN ERROR RATE OF JUST 0.56%, RIVALING HUMAN EXPERTS

On November 16, it was reported that the technology media outlet wentlem published a blog post the previous day (November 15) stating that Google, through its AI Studio platform, is testing an unnamed AI model that approaches the level of human experts at deciphering hard-to-read historical manuscripts.


According to the post, historian Mark Humphries systematically tested the model's performance using a specially developed benchmark dataset. The results show that, across five difficult historical manuscripts, the model's overall character error rate was approximately 1.7%, with most errors involving punctuation and capitalization rather than the words themselves.

Humphries' assessment further states that if ambiguous punctuation and capitalization errors are excluded, the model's character error rate falls sharply to about 0.56%, roughly one error per 200 characters.
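The benchmark scripts behind these figures are not public, so the following is only a minimal sketch of how a character error rate (CER) of this kind is typically computed: edit distance between the model's transcription and a reference, divided by the reference length, with an option to ignore punctuation and case as in the 0.56% figure. All function names here are illustrative, not from the actual benchmark.

```python
import string

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str, ignore_punct_case: bool = False) -> float:
    """Character error rate: edit distance / reference length."""
    if ignore_punct_case:
        strip = lambda s: "".join(c for c in s.lower() if c not in string.punctuation)
        reference, hypothesis = strip(reference), strip(hypothesis)
    return levenshtein(reference, hypothesis) / len(reference)
```

On this definition, a transcription that differs from the reference only in commas and capital letters scores a nonzero raw CER but 0.0 once those classes of error are excluded, which is the distinction between the 1.7% and 0.56% figures.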

This striking level of accuracy is comparable to that of human professionals who transcribe historical documents. The test material covers a wide range of handwriting styles from the 18th and 19th centuries, including samples complicated by ink smudges, spelling errors, and grammatical inconsistencies, further underscoring the model's capabilities.

The model's most surprising capability goes beyond simple transcription to complex reasoning. While processing an 18th-century trader's diary, it encountered a record of a sugar purchase marked only with the number "145" and no unit of measure.

Google's AI model did not transcribe it literally as "145" but as "14 pounds 5 ounces." The researchers found that the AI inferred this by back-calculating from the total prices recorded in the ledger, combining the British currency of the period (pounds, shillings, pence) with the corresponding weight units.

While the initial results are encouraging, Humphries also highlighted the limitations of the current assessment: only about 10% of the benchmark data has been evaluated so far, because the model appears only sporadically through A/B tests, making systematic large-scale testing difficult.
