{"id":32579,"date":"2025-04-08T11:26:10","date_gmt":"2025-04-08T03:26:10","guid":{"rendered":"https:\/\/www.1ai.net\/?p=32579"},"modified":"2025-04-08T11:26:10","modified_gmt":"2025-04-08T03:26:10","slug":"deepseek-%e7%aa%81%e7%a0%b4-ai-%e8%ae%ad%e7%bb%83%e7%83%a7%e9%92%b1%e9%ad%94%e5%92%92%ef%bc%9a1-2-%e4%b8%87%e7%be%8e%e5%85%83-1-525-%e6%88%90%e6%9c%ac-mt-bench-%e8%b7%91%e5%88%86%e5%aa%b2%e7%be%8e-gpt","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/32579.html","title":{"rendered":"DeepSeek Breaks the AI Training Money-Burning Curse: $12K, 1\/525 the Cost, MT-Bench Score Rivals GPT-4o"},"content":{"rendered":"<p>April 8 news: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%b7%b1%e5%ba%a6%e6%b1%82%e7%b4%a2\" title=\"[See articles with [Deep Request] labels]\" target=\"_blank\" >DeepSeek<\/a> (<a href=\"https:\/\/www.1ai.net\/en\/tag\/deepseek\" title=\"[View articles tagged with [DeepSeek]]\" target=\"_blank\" >DeepSeek<\/a>) has joined forces with Tsinghua University to launch a new AI alignment technique, SPCT (Self-Principled Critique Tuning), which breaks with the traditional reliance on massive amounts of training data by <strong>dynamically optimizing output quality during the inference phase<\/strong>.<\/p>\n<p>According to a paper the research team published on April 4, the technique is built on a recursive \"principle synthesis - response generation - critique filtering - principle optimization\" architecture that <strong>enables models to dynamically correct their outputs as they reason<\/strong>.<\/p>\n<p>The SPCT approach has two phases. First, rejection fine-tuning serves as a cold start, allowing the GRM (generative reward model) to adapt to different input types and to produce principles and critiques in the correct format. 
Second, a rule-based online reinforcement learning phase uses rule-based outcome rewards to encourage the GRM to generate better principles and critiques, improving inference-time scalability.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32580\" title=\"c4fbfe6cj00sudqub00dvd000v900kbp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/04\/c4fbfe6cj00sudqub00dvd000v900kbp.jpg\" alt=\"c4fbfe6cj00sudqub00dvd000v900kbp\" width=\"1125\" height=\"731\" \/><\/p>\n<p>In tests, the 27-billion-parameter DeepSeek-GRM model matched the performance of a 671B-scale model by scaling inference-time compute to 32 samples per query. The hardware-aware design uses a mixture-of-experts (MoE) architecture and supports a 128K-token context window, with single-query latency of just 1.4 seconds.<\/p>\n<p>The report notes that SPCT sharply lowers the deployment threshold for high-performance models: the DeepSeek-GRM model, for example, cost about $12,000 to train (note: roughly RMB 87,871 at current exchange rates) and scored 8.35 on MT-Bench.<\/p>\n<table>\n<thead>\n<tr class=\"firstRow\">\n<th>Model<\/th>\n<th>Parameters<\/th>\n<th>MT-Bench<\/th>\n<th>Estimated training cost<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DeepSeek-GRM<\/td>\n<td>27B<\/td>\n<td>8.35<\/td>\n<td>$12,000<\/td>\n<\/tr>\n<tr>\n<td>Nemotron-4<\/td>\n<td>340B<\/td>\n<td>8.41<\/td>\n<td>$1.2 million<\/td>\n<\/tr>\n<tr>\n<td><a href=\"https:\/\/www.1ai.net\/en\/tag\/gpt-4o\" title=\"[View articles tagged with [GPT-4o]]\" target=\"_blank\" >GPT-4o<\/a><\/td>\n<td>1.8T<\/td>\n<td>8.72<\/td>\n<td>$6.3 million<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For comparison, the 340B Nemotron-4, which cost $1.2 million to train, scored 8.41 points. 
OpenAI's 1.8T-parameter GPT-4o scored 8.72 points, but at a cost as high as $6.3 million (RMB 46.132 million at current exchange rates), making DeepSeek-GRM's training cost roughly 1\/525 of it. Compared with DPO, the technique also cuts the need for human annotation by 90% and energy consumption by 73%, opening new possibilities for dynamic scenarios such as real-time robot control.<\/p>","protected":false},"excerpt":{"rendered":"<p>April 8: DeepSeek, in collaboration with Tsinghua University, launched a new AI alignment technology, SPCT (Self-Principled Critique Tuning), which breaks through the traditional mode of relying on massive training data and dynamically optimizes output quality during the reasoning stage. According to a paper published by the research team on April 4, the technique uses a recursive \"principle synthesis - response generation - critique filtering - principle optimization\" architecture to let the model dynamically correct its output during inference. The SPCT method has two stages. First, rejection fine-tuning is used as a cold-start phase to allow the GRM to adapt to different input types and generate principles and critiques in the correct format. 
The second is the rule-based online reinforcement learning phase, which uses rule-based outcome rewards to encourage the GRM to generate better<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3606,2582,5114],"collection":[],"class_list":["post-32579","post","type-post","status-publish","format-standard","hentry","category-news","tag-deepseek","tag-gpt-4o","tag-5114"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/32579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=32579"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/32579\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=32579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=32579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=32579"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=32579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}