The first Alpha Arena: Alibaba's Qwen3-Max wins with a 22.32% return, GPT-5 loses over 62%

On November 5th it was reported that the US research institute Nof1 had recently launched a live-trading test: six top AI large language models (LLMs) were each given $10,000 in initial funds to trade in real markets.


The first Alpha Arena has officially concluded, with Alibaba's Qwen3-Max finishing in the lead and taking the championship with a 22.32% return.

Of the six top global models in the contest, Qwen3-Max, DeepSeek v3.1, GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Grok 4, all but Qwen and DeepSeek lost money; GPT-5 lost more than 62%.

Alpha Arena aims to test these models' quantitative-trading capabilities in a dynamic, competitive environment.

While the AI models could carry out the assigned tasks, the researchers point out that they showed significant variation in risk management, trading behavior, holding time, directional preference, and more.

The team stressed that the goal was not to "select the strongest model" but to push AI research from static, test-set benchmarking toward "real-world", "real-time" decision-making.

Experimental design

  • Each model received initial funds of $10,000 (note: roughly RMB 71,218 at the current exchange rate) to trade cryptocurrency contracts (including BTC, ETH, SOL, BNB, DOGE, XRP) on the Hyperliquid trading platform.
  • Models could only act on numerical market data (price, volume, technical indicators, etc.) and had no access to news or current events.
  • Each model's goal was to maximize PnL, with the Sharpe ratio reported as a risk-adjusted metric.
  • Trading actions were simplified to: buy (long), sell (short), hold, and close. All models used the same prompt, the same data interface, and no model-specific fine-tuning.
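The setup above, a fixed four-action space, a shared prompt over numerical data only, and free-text model replies mapped back onto actions, can be sketched as follows. This is a minimal illustration of that protocol; every name, field, and string format here is an assumption for the sketch, not Nof1's actual code or prompt.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    """The simplified action space described in the article."""
    BUY = "buy"      # open/increase a long position
    SELL = "sell"    # open/increase a short position
    HOLD = "hold"    # keep the current position
    CLOSE = "close"  # flatten the position

@dataclass
class MarketSnapshot:
    """One tick of the numerical inputs the models were limited to.
    Field names are illustrative assumptions."""
    symbol: str   # e.g. "BTC", "ETH", "SOL"
    price: float
    volume: float
    rsi: float    # example technical indicator

def build_prompt(history: list[MarketSnapshot]) -> str:
    """Render numerical market data only (no news feeds),
    matching the experiment's input restriction."""
    lines = [f"{s.symbol} price={s.price} vol={s.volume} rsi={s.rsi}"
             for s in history]
    return ("Recent market data:\n" + "\n".join(lines) +
            "\nRespond with one of: buy, sell, hold, close.")

def parse_action(model_reply: str) -> Action:
    """Map a free-text model reply onto the four allowed actions,
    defaulting to HOLD when the reply is unparseable (one plausible
    way to handle the 'action execution' failures the article notes)."""
    reply = model_reply.strip().lower()
    for action in Action:
        if action.value in reply:
            return action
    return Action.HOLD
```

A harness like this would call `build_prompt`, send the text to each model through the same interface, and route the parsed `Action` to the exchange, which is what lets all six models be compared under identical conditions.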

Preliminary results

The report indicates that although every model operated under the same framework, they differed significantly in trading style, risk appetite, holding time, and trading frequency. For example, some models favored going short while others almost never did; some held positions for long stretches and traded infrequently, while others traded constantly.

On sensitivity to data format, the team observed that changing the order of the data in the prompt from newest-first to oldest-first fixed errors caused by some models misreading the series.
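The ordering change described above is a one-line difference in how the prompt is rendered. A minimal sketch of the two orderings, with an explicit header so the direction of time is stated rather than implied; the function and its format are assumptions for illustration, not the experiment's actual prompt:

```python
def format_series(prices: list[float], newest_first: bool = False) -> str:
    """Render a price series for a prompt.

    `prices` is assumed to arrive in chronological (oldest-first) order.
    The header line makes the ordering explicit, since the article reports
    that some models misread the series when the direction was ambiguous.
    """
    ordered = list(reversed(prices)) if newest_first else prices
    header = ("Prices, newest first:" if newest_first
              else "Prices, oldest first:")
    return "\n".join([header] + [str(p) for p in ordered])
```

With the same three ticks, `format_series([1.0, 2.0, 3.0], newest_first=True)` puts `3.0` on the first data line, while the default puts `1.0` there; the team's fix amounted to switching from the former presentation to the latter.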

The study also notes the test's limitations: a small sample size, a short run time, no model performance history, and no capacity for cumulative learning. The team says next season will add more controls, more features, and greater statistical power.

Meaning and observation

The project seeks to answer a basic question: can a large language model operate as a zero-shot trading system in a genuine trading environment?

Through the experiment, Nof1 aims to push AI research toward "real, dynamic, risk-bearing benchmarks" rather than static datasets alone.

While the experiment does not settle which model is strongest, it reveals that even the most advanced LLMs still face multiple challenges in live trading, such as action execution, market-state understanding, and data-format sensitivity.
