Grok 4's stunning score leaks: an all-time high of 45% on "Humanity's Last Exam"?

Grok 4 reportedly scored as high as 45% on the "Humanity's Last Exam" (HLE) benchmark, far ahead of Gemini 2.5 Pro and Claude 4 Opus, and the result has sparked wide discussion. Musk said Grok 4 builds its reasoning on "first principles": it thinks like a physicist, breaking problems down to their most basic axioms. Grok 4 may also strengthen its coding capabilities and could be split into two versions, Grok 4 and Grok 4 Code, which are expected to be released at any time after July 4th.
