On November 7, Moonshot AI released Kimi K2 Thinking, its most powerful open-source thinking model to date.

According to Moonshot AI, the model is a new-generation Thinking Agent trained under its "model as agent" philosophy, and it can think while using tools. It reaches SOTA level on several benchmarks, including Humanity's Last Exam (HLE), the autonomous web-browsing benchmark BrowseComp, and the complex information-gathering and reasoning benchmark SEAL-0. It also shows overall improvements in agentic search, agentic coding, writing, and general reasoning.
Without human intervention, the model can autonomously carry out up to 300 consecutive rounds of tool calls while keeping its multi-turn reasoning stable, which helps users solve more complex problems.
1AI has compiled the Hugging Face and ModelScope deployment links below:
- Hugging Face: https://huggingface.co/moonshotai
- ModelScope: https://www.modelscope.cn/organization/moonshotai
Humanity's Last Exam (HLE) is a closed-ended academic benchmark covering more than 100 subjects. With tools allowed (search, Python, and web browsing), Kimi K2 Thinking achieved a SOTA score of 44.9% on it.
In one official example, Kimi K2 Thinking reached the answer after five rounds of search and reasoning, folding in the new information from each round to dig progressively deeper.
According to Moonshot AI, Kimi K2 Thinking also performs well in complex search and browsing scenarios. BrowseComp is a benchmark released by OpenAI to evaluate the browsing ability of AI agents. It was designed to measure an agent's persistence and creativity in an information-overloaded environment; in other words, whether the agent can dig to the bottom of a question the way a human researcher would. Humans average only 29.2% on this challenging task. Kimi K2 Thinking showed strong deep-digging ability on this benchmark, setting a new SOTA with a score of 60.2%.
Driven by long-horizon planning and autonomous search, Kimi K2 Thinking can run up to hundreds of rounds of a dynamic cycle of thinking, searching, and browsing web pages. Along the way it continuously proposes and refines hypotheses, verifies evidence, reasons, and constructs logically sound answers. This ability to keep searching and thinking lets Kimi K2 Thinking break vague, open-ended problems into clear, actionable subtasks.
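The think, search, browse cycle described above can be pictured as a bounded tool-calling loop. The Python sketch below is purely illustrative and is not Moonshot AI's implementation: `run_agent`, `Step`, `AgentTrace`, `toy_think`, and the stub `search`/`browse` tools are hypothetical names, and the 300-round cap simply mirrors the figure the article cites. In a real agent, `toy_think` would be replaced by a call to the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cap mirroring the "up to 300 rounds" of tool calls cited in the article.
MAX_TOOL_ROUNDS = 300

# Each observation records (tool name, tool argument, tool result).
Observation = Tuple[str, str, str]


@dataclass
class Step:
    """One decision by the (hypothetical) policy: call a tool, or give an answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


@dataclass
class AgentTrace:
    rounds: int = 0
    observations: List[Observation] = field(default_factory=list)
    answer: Optional[str] = None


def run_agent(
    think: Callable[[List[Observation]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_rounds: int = MAX_TOOL_ROUNDS,
) -> AgentTrace:
    """Alternate thinking and tool calls until an answer appears or the cap is hit."""
    trace = AgentTrace()
    while trace.rounds < max_rounds:
        step = think(trace.observations)  # "think": decide the next action
        if step.answer is not None:
            trace.answer = step.answer
            return trace
        result = tools[step.tool](step.argument)  # "search" or "browse"
        trace.observations.append((step.tool, step.argument, result))
        trace.rounds += 1
    return trace  # budget exhausted; answer stays None


# Toy policy (hypothetical): search once, open the top result, then answer.
def toy_think(observations: List[Observation]) -> Step:
    if not observations:
        return Step(tool="search", argument="company stock buyback announcement")
    if len(observations) == 1:
        return Step(tool="browse", argument=observations[-1][2])
    return Step(answer="answer based on: " + observations[-1][2])


# Stub tools standing in for real search and browsing backends.
toy_tools: Dict[str, Callable[[str], str]] = {
    "search": lambda query: "https://example.com/result",
    "browse": lambda url: "page text from " + url,
}

trace = run_agent(toy_think, toy_tools)
print(trace.rounds, trace.answer)
# prints: 2 answer based on: page text from https://example.com/result
```

Capping the number of rounds keeps a policy that never settles on an answer from calling tools indefinitely; when the cap is reached, the trace comes back with `answer` set to `None`, leaving the caller to decide how to handle it.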
In another official example, after two rounds of search and reflection, Kimi K2 Thinking first identified the company that made the speedboat from the known stock-buyback information, then located the buyback announcement on the U.S. Securities and Exchange Commission (SEC) website, arriving at the correct answer.
Kimi K2 Thinking's coding ability has also been strengthened, with further gains on benchmarks such as the multilingual software-engineering benchmark SWE-bench Multilingual, SWE-bench Verified, and the terminal-use benchmark Terminal-Bench.
Moonshot AI says Kimi K2 Thinking's general foundational capabilities have also been upgraded:
- Creative writing: Kimi K2 Thinking's writing has improved markedly. It can turn rough inspiration into clear, moving, purposeful narratives with both rhythm and depth. It handles subtle stylistic nuances and loose structures with ease and keeps a consistent style across long pieces. In creative writing, its imagery is more vivid and its emotional resonance stronger, combining precise expression with rich texture.
- Academic and research: Kimi K2 Thinking shows marked gains in analytical depth, factual accuracy, and logical structure in academic and professional domains. It parses complex instructions methodically and develops its reasoning clearly and rigorously. This makes it especially well suited to academic papers, technical summaries, and long reports that place high demands on completeness of information and quality of reasoning.
- Personal and emotional: When answering personal or emotional questions, Kimi K2 Thinking's responses are more empathetic and balanced. They are thoughtful and specific, offering nuanced perspectives and practical next steps. It helps users work through complex decisions with clarity and care, in a tone that feels genuine, relatable, and more human.