{"id":50740,"date":"2026-03-09T11:37:50","date_gmt":"2026-03-09T03:37:50","guid":{"rendered":"https:\/\/www.1ai.net\/?p=50740"},"modified":"2026-03-09T11:37:50","modified_gmt":"2026-03-09T03:37:50","slug":"%e9%a6%96%e4%b8%aa-openclaw-%e4%b8%93%e9%a1%b9%e5%9f%ba%e5%87%86%e6%b5%8b%e8%af%95%e5%87%ba%e7%82%89%ef%bc%9a%e8%bd%bb%e9%87%8f%e6%a8%a1%e5%9e%8b%e5%85%a8%e9%9d%a2%e5%8f%8d%e8%b6%85%e6%97%97%e8%88%b0","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/50740.html","title":{"rendered":"First OpenClaw-specific benchmark released: lightweight models overtake flagships across the board"},"content":{"rendered":"<p>March 9 news: PinchBench, a <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%9f%ba%e5%87%86%e6%b5%8b%e8%af%95\" title=\"[See articles with the benchmark tag]\" target=\"_blank\" >benchmark<\/a> for evaluating large language model performance on <a href=\"https:\/\/www.1ai.net\/en\/tag\/openclaw\" title=\"[See articles with the OpenClaw tag]\" target=\"_blank\" >OpenClaw<\/a> tasks, has officially launched, testing 32 mainstream models in a single run and comparing them horizontally on success rate, speed, and cost.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-50741\" title=\"3b1cd797j00tbm4pw001vd000q00kim\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2026\/03\/3b1cd797j00tbm4pw001vd000qw00kim.jpg\" alt=\"3b1cd797j00tbm4pw001vd000q00kim\" width=\"968\" height=\"738\" \/><\/p>\n<p>In the success-rate dimension, Google's Gemini 3 Flash Preview ranked first with a 95.1% success rate.<\/p>\n<p>As the \"lightweight\" member of the Gemini series, it outperformed its own flagship Gemini 3 Pro (91.7%) and beat both Claude Sonnet 4.5 (92.7%) and GPT-4o (85.2%).<\/p>\n<p>Chinese models performed equally well: MiniMax M2.1 ranked second with a 93.6% success rate, followed closely by Kimi K2.5 at 93.4%, with the two domestic models 
taking two of the top three spots worldwide.<\/p>\n<p>Anthropic's flagship Claude Opus 4.6 achieved only a 90.6% success rate, ranking seventh, behind several lightweight models.<\/p>\n<p>In the speed dimension, MiniMax M2.5 completed the entire test in 105.96 seconds, edging out second-place Gemini 2.0 Flash by 0.09 seconds to take the speed crown.<\/p>\n<p>By contrast, Claude Sonnet 4 took 137.66 seconds, while Gemini 3 Pro needed 239.55 seconds, more than twice the champion's time.<\/p>\n<p>In the cost dimension, GPT-5 Nano was the cheapest option in the field at $0.03 per task, with an 85.8% success rate.<\/p>\n<p>Gemini 2.5 Flash Lite followed at $0.05 with an 83.2% success rate. Claude Opus 4.6 completed the test at a cost of $5.89, nearly 200 times that of GPT-5 Nano, yet its success rate trailed MiniMax M2.1 by more than 3 percentage points.<\/p>\n<p>PinchBench combines three scoring methods, including code-execution verification (automated checks) and quality assessment (with Claude Opus as judge); all tasks and answers are available on GitHub. The full leaderboard can be found at pinchbench.com.<\/p>","protected":false},"excerpt":{"rendered":"<p>March 9 news: PinchBench, a benchmark for evaluating large language model performance on OpenClaw tasks, has officially launched, testing 32 mainstream models in a single run and comparing them horizontally on three dimensions: success rate, speed, and cost. In the success-rate dimension, Google's Gemini 3 Flash Preview ranked first with a 95.1% success rate. As the \"lightweight\" member of the Gemini series, it outperformed its own flagship Gemini 3 Pro (91.7%) and beat both Claude Sonnet 4.5 (92.7%) and GPT-4o (85.2%). 
Chinese models performed equally<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[8229,5192],"collection":[],"class_list":["post-50740","post","type-post","status-publish","format-standard","hentry","category-news","tag-openclaw","tag-5192"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/50740","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=50740"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/50740\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=50740"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=50740"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=50740"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=50740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}