MiniMax Introduces M1, the World's First Open-Source Large-Scale Hybrid-Architecture Reasoning Model: 456B Parameters, Outperforming DeepSeek-R1

June 17 news: MiniMax has announced that it will be releasing important updates for five consecutive days. Today's first release is the open-source reasoning model MiniMax-M1.


According to the official introduction, MiniMax-M1 is the world's first open-source large-scale hybrid-architecture reasoning model. MiniMax says M1 is best-in-class among open-source models in complex, productivity-oriented scenarios, surpasses domestic closed-source models, comes close to the leading overseas models, and offers the industry's best price-performance ratio.

The official blog also notes that, thanks to two key technological innovations, the MiniMax-M1 training run went "beyond expectations": the reinforcement learning phase took only 3 weeks on 512 H800 GPUs, at a compute rental cost of just US$534,700 (note: about 3,841,000 yuan at the current exchange rate), an order of magnitude lower than originally projected.

  • One significant advantage of M1 is that it supports the industry's longest context window of 1 million input tokens, matching the closed-source Google Gemini 2.5 Pro and 8 times that of DeepSeek R1, along with the industry's longest reasoning output of 80,000 tokens.
  • This is largely due to our novel hybrid architecture built around the lightning attention mechanism, which is significantly more efficient at processing long context inputs and at deep reasoning. For example, when reasoning deeply with 80,000 output tokens, M1 needs only about 30% of the compute of DeepSeek R1. This gives us a substantial compute-efficiency advantage in both training and inference. In addition, we propose CISPO, a faster reinforcement learning algorithm that improves RL efficiency by clipping the importance-sampling weights rather than the token updates of traditional methods. In our AIME experiments, it converged twice as fast as competing RL algorithms, including DAPO, recently proposed by ByteDance, and clearly outperformed the GRPO used earlier by DeepSeek. (Minimal sketches of both ideas appear after this list.)
  • Thanks to these two technological innovations, the reinforcement learning run turned out remarkably efficient, exceeding our expectations: the entire RL phase used only 512 H800 GPUs for three weeks, at a rental cost of just US$534,700, an order of magnitude less than originally projected. We evaluated M1 in detail on 17 of the industry's mainstream benchmark sets; the results follow:
  • We find that our model has significant advantages in complex, productivity-oriented scenarios such as software engineering, long-context understanding, and tool use.
  • MiniMax-M1-40k and MiniMax-M1-80k achieve strong results of 55.6% and 56.0% respectively on the SWE-bench Verified benchmark, slightly below DeepSeek-R1-0528's 57.6% but significantly ahead of other open-weight models.
  • Thanks to its million-token context window, the M1 series excels at long-context understanding tasks, outperforming not only all open-weight models but even OpenAI o3 and Claude 4 Opus, ranking second globally and trailing only Gemini 2.5 Pro by a narrow margin.
  • In the agentic tool-use scenario (TAU-bench), MiniMax-M1-40k likewise leads all open-weight models and beats Gemini 2.5 Pro.
  • Notably, MiniMax-M1-80k consistently outperforms MiniMax-M1-40k on most benchmarks, which validates the value of scaling test-time compute.

The detailed technical report and full model weights are available via our official Hugging Face and GitHub accounts. The open-source projects vLLM and Transformers both provide inference deployment support for M1 (a hedged serving sketch appears below), and we are also working with SGLang to advance further deployment support.

Because M1 uses compute relatively efficiently in both training and inference, we continue to offer unlimited free use on the MiniMax App and Web, and the API on our official website is priced lowest in the industry: for input lengths of 0-32k tokens, input costs $0.8/million tokens and output $8/million tokens; for 32k-128k, input $1.2/million tokens and output $16/million tokens; and for 128k-1M, input $2.4/million tokens and output $24/million tokens. The first two tiers are more cost-effective than DeepSeek-R1, and the third covers input lengths that the DeepSeek model does not support at all. (A small cost calculator based on these tiers also appears below.)

Beyond M1, we have more updates prepared for the next four consecutive workdays, so stay tuned.
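To make the architecture claim above concrete: lightning attention belongs to the linear-attention family, whose core trick is reordering the quadratic softmax-attention computation into an associative, linear-cost one. The sketch below shows plain non-causal linear attention in PyTorch; it is an illustration of the general idea only, omitting the causal masking, blockwise tiling, and interleaved softmax layers of the actual hybrid architecture.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Non-causal linear attention (the general idea behind lightning
    attention, not MiniMax's exact kernel).

    q, k, v: (batch, seq_len, dim). Softmax attention costs
    O(seq_len^2 * dim); reordering (q k^T) v as q (k^T v) with a
    positive feature map costs O(seq_len * dim^2), which is what makes
    million-token inputs tractable.
    """
    q = F.elu(q) + 1.0                        # positive feature map phi(q)
    k = F.elu(k) + 1.0                        # positive feature map phi(k)
    kv = torch.einsum('bnd,bne->bde', k, v)   # sum_n phi(k_n) v_n^T
    z = k.sum(dim=1)                          # normalizer: sum_n phi(k_n)
    num = torch.einsum('bnd,bde->bne', q, kv)
    den = torch.einsum('bnd,bd->bn', q, z).unsqueeze(-1)
    return num / (den + 1e-6)
```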
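The CISPO idea can likewise be sketched in a few lines. As publicly described, the key difference from PPO/GRPO-style objectives is what gets clipped: clipping the per-token objective zeroes the gradient of clipped tokens, whereas CISPO clips a detached importance-sampling weight so every token keeps contributing gradient. The function below is a hedged reconstruction from that description; the hyperparameter name and value are illustrative, not MiniMax's settings.

```python
import torch

def cispo_loss(new_logprobs, old_logprobs, advantages, eps_high=0.2):
    """Sketch of a CISPO-style policy-gradient loss.

    new_logprobs: log pi_theta(token) under the current policy (needs grad)
    old_logprobs: log pi_old(token) under the behaviour policy (no grad)
    advantages:   per-token advantage estimates (no grad)
    """
    # Importance-sampling ratio between current and behaviour policy.
    ratio = torch.exp(new_logprobs - old_logprobs.detach())
    # Clip and detach the IS *weight* itself; unlike PPO/GRPO clipping of
    # the objective, no token's gradient is zeroed out.
    is_weight = torch.clamp(ratio, max=1.0 + eps_high).detach()
    # REINFORCE-style term: gradient flows only through new_logprobs.
    return -(is_weight * advantages.detach() * new_logprobs).mean()
```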
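For serving, a minimal vLLM sketch might look like the following. The repository id is assumed from the official Hugging Face release mentioned above, and the parallelism setting is purely illustrative; a 456B-parameter model needs a suitably large multi-GPU node, so consult the official model card for supported configurations.

```python
from vllm import LLM, SamplingParams

# Illustrative only: repo id assumed from the official Hugging Face
# release; adjust tensor_parallel_size to your hardware.
llm = LLM(
    model="MiniMaxAI/MiniMax-M1-80k",
    trust_remote_code=True,
    tensor_parallel_size=8,
)

params = SamplingParams(temperature=1.0, max_tokens=8192)
outputs = llm.generate(["Explain why quicksort is O(n log n) on average."], params)
print(outputs[0].outputs[0].text)
```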
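Finally, a small cost calculator that restates the tiered API prices quoted above; the tier is chosen by input length, and rates are per million tokens.

```python
def m1_api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one MiniMax M1 API call from the tiered
    per-million-token prices quoted in the announcement."""
    if input_tokens <= 32_000:
        in_rate, out_rate = 0.8, 8.0
    elif input_tokens <= 128_000:
        in_rate, out_rate = 1.2, 16.0
    else:  # 128k-1M tier, not supported by DeepSeek-R1
        in_rate, out_rate = 2.4, 24.0
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 100k-token input with a 40k-token reasoning trace.
print(f"${m1_api_cost_usd(100_000, 40_000):.2f}")  # -> $0.76
```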