The first OpenClaw-specific benchmark-testing furnace: Lightweight model is a full-scale anti-flagship

March 9th, yesterday, to evaluate the big language model OpenClaw Performance of the mandatebenchmarking PinchBench officially came out and tested 32 of the main mainstream models for a one-time, horizontal comparison of success, speed and cost。

The first OpenClaw-specific benchmark-testing furnace: Lightweight model is a full-scale anti-flagship

In the success dimension, Google's Gemini 3 Flash Preview ranked first with 95.1% success。

As a "light version" of the Gemini series, the performance went beyond its own flagship Gemini 3 Pro (91.71 TP3T) and overcame Claude Sonet 4.5 (92.71 TP3T) and GPT-4o (85.21 TP3T)。

The performance of the national production model was equally bright, with MiniMax M2.1 ranked second in the success rate of 93.6%, and Kimi K2.5 followed by 93.4%, with two national production models co-taking two of the top three seats worldwide。

Anthropic flagship model Claude Opus 4.6 The success rate was only 90.6%, ranked seventh, behind the multi-media end model。

In terms of speed, MiniMax M2.5 completes the entire test in 105.96 seconds, leading by 0.09 seconds by the second-named Gemini 2.0 Flash, to the speed champion。

By contrast, Claude Sonet 4 took 137.66 seconds, while Gemini 3 Pro was as high as 239.55 seconds, about twice as long as a champion。

On a cost dimension, GPT-5 Nano became the minimum price option for the field at $0.03 per assignment, with a success rate of 85.8%。

Gemini 2.5 Flash Lite follows with a success rate of USD 0.05, 83.2%. Claude Opus 4.6 completed the test at a cost of $5.89, nearly 200 times the GPT-5 Nano, but the success rate was lower than Mini Max M2.1 by more than 3 percentage points。

PinchBench's rating mechanisms include code running validation (automated inspection), quality assessment (with Claude Opus as judge) and a combination of three ways in which all topics and answers are available to GitHub. The full list can be found in pinchbench.com。

statement:The content of the source of public various media platforms, if the inclusion of the content violates your rights and interests, please contact the mailbox, this site will be the first time to deal with.
Information

ChatGPT "Adult Mode" double jump, OpenAI: Higher priority jobs are more important

2026-3-9 11:36:39

HeadlinesInformation

"AI Lobster Feeder" is behind the red

2026-3-9 11:38:52

Search