Yesterday, March 9, PinchBench, a benchmark for evaluating the OpenClaw performance of large language models, was officially released, testing 32 mainstream models in a single horizontal comparison of success rate, speed, and cost.

On the success-rate dimension, Google's Gemini 3 Flash Preview ranked first at 95.1%.
As the "light" version of the Gemini series, it outperformed its own flagship Gemini 3 Pro (91.7%) and beat Claude Sonnet 4.5 (92.7%) and GPT-4o (85.2%).
Chinese models performed just as strongly: MiniMax M2.1 ranked second with a 93.6% success rate, and Kimi K2.5 followed closely at 93.4%, giving Chinese models two of the top three spots worldwide.
Anthropic's flagship model Claude Opus 4.6 managed a success rate of only 90.6%, ranking seventh, behind several mid-range models.
On speed, MiniMax M2.5 completed the entire test in 105.96 seconds, edging out second-place Gemini 2.0 Flash by 0.09 seconds to take the speed crown.
By contrast, Claude Sonnet 4 took 137.66 seconds, while Gemini 3 Pro needed 239.55 seconds, more than twice the champion's time.
On cost, GPT-5 Nano was the cheapest option in the field at $0.03 per task, with a success rate of 85.8%.
Gemini 2.5 Flash Lite followed at $0.05, with a success rate of 83.2%. Claude Opus 4.6 completed the test at a cost of $5.89, nearly 200 times GPT-5 Nano's, yet its success rate trailed MiniMax M2.1's by 3 percentage points.
PinchBench's scoring mechanism combines three approaches, including code-execution validation (automated checks) and quality assessment (with Claude Opus as judge). All tasks and answers are available on GitHub, and the full leaderboard can be found at pinchbench.com.
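For readers curious how such a pipeline fits together, below is a minimal Python sketch of a scorer that combines an automated code-execution check with an LLM-judge score and records wall-clock time, the three dimensions the article reports. All function names, the dummy judge, and the equal weighting are illustrative assumptions, not PinchBench's published implementation.

```python
import os
import subprocess
import sys
import tempfile
import time


# Automated check: write the model's answer to a temp file, run it, and
# treat a clean exit as a pass. This mirrors the article's "code execution
# validation" step only in spirit; the real harness is not published here.
def run_code_check(code: str, timeout: float = 30.0) -> bool:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)


# Quality assessment: placeholder for the LLM-as-judge step. A real harness
# would send the task and answer to the judge model (Claude Opus, per the
# article) and parse a 0-1 score; a dummy heuristic keeps the sketch
# runnable end to end.
def judge_quality(task: str, answer: str) -> float:
    return 1.0 if answer.strip() else 0.0


# Combine both signals and record wall-clock time. The 50/50 weighting is
# an assumption for illustration, not PinchBench's formula.
def score_submission(task: str, code: str) -> dict:
    start = time.perf_counter()
    passed = run_code_check(code)
    elapsed = time.perf_counter() - start
    quality = judge_quality(task, code)
    return {
        "success": 0.5 * float(passed) + 0.5 * quality,
        "seconds": round(elapsed, 2),
    }


if __name__ == "__main__":
    print(score_submission("print a greeting", 'print("hello")'))
```

A real harness would replace judge_quality with an API call to the judge model and aggregate per-task results into the success, speed, and cost figures quoted above.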