{"id":45612,"date":"2025-11-05T11:36:46","date_gmt":"2025-11-05T03:36:46","guid":{"rendered":"https:\/\/www.1ai.net\/?p=45612"},"modified":"2025-11-05T11:36:46","modified_gmt":"2025-11-05T03:36:46","slug":"%e9%a6%96%e5%b1%8a-ai%e5%a4%a7%e6%a8%a1%e5%9e%8b%e7%9c%9f%e5%ae%9e%e6%8a%95%e8%b5%84%e6%af%94%e8%b5%9b-alpha-arena-%e8%90%bd%e5%b9%95%ef%bc%9a%e9%98%bf%e9%87%8c%e9%80%9a%e4%b9%89%e5%8d%83%e9%97%ae-qwe","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/45612.html","title":{"rendered":"The first Alpha Arena concludes: Alibaba Tongyi Qwen3-Max won a 22.32% return, GPT-5 lost over 62%"},"content":{"rendered":"<p>On November 5th, American research institute Nof1 launched a live trading test: it gave six top AI <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged [large language model]\" target=\"_blank\" >large language models<\/a> (<a href=\"https:\/\/www.1ai.net\/en\/tag\/llm\" title=\"View articles tagged [LLM]\" target=\"_blank\" >LLMs<\/a>) $10,000 each in initial funds, letting them trade in real markets.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-45613\" title=\"c9be668j00t58hzg00bjd000v90lbp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/11\/ca9be668j00t58hzg00bjd000v900lbp.jpg\" alt=\"c9be668j00t58hzg00bjd000v90lbp\" width=\"1125\" height=\"767\" \/><\/p>\n<p>The first <a href=\"https:\/\/www.1ai.net\/en\/tag\/alpha-arena\" title=\"View articles tagged [Alpha Arena]\" target=\"_blank\" >Alpha Arena<\/a> has officially ended, with Alibaba's Qwen3-Max finishing in the lead and taking the investment championship with a 22.32% return.<\/p>\n<p>Of the six top global models (Qwen3-Max, DeepSeek v3.1, GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5 and Grok 4), all except Qwen and DeepSeek ended with losses; GPT-5 lost more than 62%.<\/p>\n<p>Alpha Arena aims to test the capabilities of 
these models in \u201cquantitative trading\u201d in a dynamic, competitive environment.<\/p>\n<p>While the AI models can carry out the assigned tasks, the researchers point out that they show significant variation in risk management, trading behaviour, holding time, directional preference and so on.<\/p>\n<p>The team stressed that the goal was not to \u201cpick the strongest model\u201d but to push AI research from static, test-set benchmarking toward \u201creal-world\u201d, \u201creal-time\u201d decision-making.<\/p>\n<p><strong>Experimental design<\/strong><\/p>\n<ul>\n<li>Each model started with $10,000 in initial funds (note: roughly RMB 71,218 at the current exchange rate) to trade cryptocurrency contracts (including BTC, ETH, SOL, BNB, DOGE, XRP) on the Hyperliquid trading platform.<\/li>\n<li>Models could rely only on numerical market data (prices, volume, technical indicators, etc.) and were not allowed access to news or other current information.<\/li>\n<li>Each model's goal was to \u201cmaximize PnL\u201d, with the Sharpe ratio reported as a risk-adjusted indicator.<\/li>\n<li>Trades were simplified to: buying (going long), selling (going short), holding, and closing positions. All models used the same prompt, the same data interface, and no model-specific fine-tuning.<\/li>\n<\/ul>\n<p><strong>Preliminary results<\/strong><\/p>\n<p>The report indicates that although each model operated under the same framework, they differed markedly in trading style, risk preference, holding time and trading frequency. For example, some models leaned toward short positions while others almost never went short. 
Some models held positions for long periods and traded infrequently, while others traded frequently.<\/p>\n<p>On sensitivity to data format, the team observed that reversing the ordering of data in the prompt, from newest-first to oldest-first, fixed misreading errors in some of the models.<\/p>\n<p>The study also noted the limitations of the test: a limited sample size, a short running time, no record of historical model performance, and no capacity for cumulative learning. The team said the next season would introduce more controls, more features and greater statistical power.<\/p>\n<p><strong>Significance and observations<\/strong><\/p>\n<p>The project seeks to answer a basic question: \u201ccan a large language model trade in a genuine trading environment as a zero-shot system?\u201d<\/p>\n<p>Through the experiment, Nof1 aims to push AI research toward \u201creal, dynamic, risk-driven benchmarks\u201d rather than just static data sets.<\/p>\n<p>While the experiment has not settled \u201cwhich model is strongest\u201d, it has shown that even the most advanced LLMs still face multiple challenges in real trading, such as \u201caction execution\u201d, \u201cmarket-state understanding\u201d and \u201cdata-format sensitivity\u201d.<\/p>","protected":false},"excerpt":{"rendered":"<p>On November 5th, American research institute Nof1 launched a live trading test: it gave six top AI large language models (LLMs) $10,000 each in initial funds to let them trade in real markets. The first Alpha Arena officially closed, with Alibaba's Qwen3-Max finishing in the lead and taking the investment championship with a 22.32% return. 
Of the six top global models Qwen3-Max, DeepSeek v3.1, GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5 and Grok 4, all except Qwen and De<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[7827,473,706],"collection":[],"class_list":["post-45612","post","type-post","status-publish","format-standard","hentry","category-news","tag-alpha-arena","tag-llm","tag-706"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/45612","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=45612"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/45612\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=45612"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=45612"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=45612"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=45612"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}