Author: J.A.E
Recently, nof1, a lab focused on artificial intelligence research in financial markets, announced on Twitter the launch of a groundbreaking experiment: Alpha Arena, a live trading test of large models, which has drawn over 14 million views both inside and outside the industry.
The experiment runs on Hyperliquid, the leading perpetual futures DEX (Perp DEX), and marks the first time six mainstream large language models (LLMs) have been placed in a real, competitive trading environment. Each model is allocated $10,000 in real funds to trade perpetual contracts autonomously. At the time of writing, DeepSeek leads with a return of approximately 11%.
LLMs Conduct "Live Fire Exercises" in the Crypto Market for the First Time, DeepSeek Currently in First Place
The significance of Alpha Arena is that it moves beyond the limitations of traditional financial AI research. Earlier work was mostly confined to historical backtesting, where a model's trades have no real effect on market prices and the model learns only from static data. Alpha Arena, by contrast, creates a dynamic, zero-sum competitive environment in which LLMs must continuously adapt to changing prices and liquidity and make decisions in real time. This shift is what makes Alpha Arena the "first live fire exercise" for AI in the crypto market.
To ensure fairness in testing, nof1 provided all models with "the same prompts and data," meaning that the performance of the models will primarily be determined by their inherent reasoning architecture, the efficiency of tool calls to convert analysis into trading instructions, and their ability to autonomously manage risk.
As of now, DeepSeek remains at the top with a return rate exceeding 11%, followed closely by Claude with a return rate of about 10%, while Grok has dropped to third place with a return rate of approximately 2%, and the other models are in a state of loss.
On October 20, DeepSeek and Grok briefly led the rankings with returns of about 40%, but a subsequent market pullback dragged every model into a collective drawdown that sharply reduced their returns. This suggests that LLMs may still struggle to read changing market conditions.
Among the rest, Claude posted the largest swings in both gains and losses, reflecting the most aggressive trading strategy. Gemini executed the most trades (64) and has paid the highest fees so far, $600.42, trading at high frequency without attention to cost control. GPT-5 has lost a total of $4,051, and its account value curve has declined steadily, leaving it in last place.
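A fall from roughly +40% to roughly +11% is a 29-point drop in return, but the loss measured against peak account value is a different number. The sketch below computes drawdown from peak under two assumptions not stated in the article: that both figures are measured against the same $10,000 starting capital, and that DeepSeek's path (and Grok's, ending near +2%) went directly from the October 20 peak to the current level. The figures are the article's approximate ones, so the outputs are approximate too.

```python
def drawdown_from_peak(peak_return: float, current_return: float) -> float:
    """Fractional decline in account value from its peak.

    Returns are fractions of starting capital (0.40 means +40%).
    Assumes no deposits or withdrawals between the two readings.
    """
    peak_value = 1.0 + peak_return
    current_value = 1.0 + current_return
    return (peak_value - current_value) / peak_value


# Approximate figures from the article: ~40% at the Oct 20 peak.
deepseek_dd = drawdown_from_peak(0.40, 0.11)  # now ~11%
grok_dd = drawdown_from_peak(0.40, 0.02)      # now ~2%

print(f"DeepSeek drawdown from peak: {deepseek_dd:.1%}")  # 20.7%
print(f"Grok drawdown from peak: {grok_dd:.1%}")          # 27.1%
```

Measuring against the peak rather than the starting capital is the usual way to express how much of an accumulated gain a pullback erased.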
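To put the dollar figures above in the same terms as the return rates, the short sketch below divides them by each model's $10,000 allocation. It assumes returns are simple profit or loss over starting capital with no deposits or withdrawals; the variable names are illustrative only.

```python
STARTING_CAPITAL = 10_000.0  # each model's allocation, per the article


def return_rate(pnl: float, capital: float = STARTING_CAPITAL) -> float:
    """Simple return on starting capital, as a fraction."""
    return pnl / capital


gpt5_return = return_rate(-4_051.0)      # GPT-5's reported total loss
gemini_fee_drag = return_rate(-600.42)   # Gemini's fees alone, as a return
gemini_fee_per_trade = 600.42 / 64       # average fee over 64 trades

print(f"GPT-5 return: {gpt5_return:.2%}")                        # -40.51%
print(f"Gemini fee drag: {gemini_fee_drag:.2%}")                 # -6.00%
print(f"Gemini avg fee per trade: ${gemini_fee_per_trade:.2f}")  # $9.38
```

On these assumptions, Gemini's fees alone cost about 6% of its capital, more than half of the return DeepSeek had earned at the time of writing.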
Figure: Alpha Arena Initial Performance Comparison (October 21)
The figure shows a clear disconnect between models' scores on traditional LLM benchmarks and their net returns in live trading. On benchmarks such as Finance Reasoning or AIME (mathematics), GPT-5 and Grok-4 typically lead, demonstrating their ability to handle complex financial formulas and advanced mathematics.
However, financial markets are not just a test of static mathematical reasoning; they are a dynamic system shaped by real-time data, market sentiment, and shifting liquidity. In Alpha Arena's live competition, DeepSeek V3.1 performed exceptionally well. This suggests that the key to an LLM trading profitably lies less in static knowledge or high reasoning scores than in its ability to turn analysis into executed trades. DeepSeek V3.1 achieved high returns with relatively low trading volume and a relatively low win rate, which suggests that it may need only a few trades to capture key price moves more accurately while keeping transaction fees under control.
Gemini 2.5 Pro illustrates how high-frequency trading combined with insensitivity to fees can undermine an LLM's profitability. According to its trading records, Gemini's gross gains from trading actually exceeded its gross losses, but because it did not estimate or optimize transaction fees precisely, the fees wiped out those gains and left it with a net loss.
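The mechanism described for Gemini, gross gains exceeding gross losses yet a net loss after fees, can be shown with a few lines of arithmetic. The trades below are hypothetical numbers chosen for illustration, not Gemini's actual records, and the `Trade` type and `summarize` helper are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    gross_pnl: float  # profit or loss before fees, in USD (hypothetical)
    fee: float        # trading fee paid on this trade, in USD (hypothetical)


def summarize(trades: list[Trade]) -> dict[str, float]:
    """Split results into gross gains, gross losses, fees, and net PnL."""
    gains = sum(t.gross_pnl for t in trades if t.gross_pnl > 0)
    losses = -sum(t.gross_pnl for t in trades if t.gross_pnl < 0)
    fees = sum(t.fee for t in trades)
    return {"gains": gains, "losses": losses, "fees": fees,
            "net": gains - losses - fees}


# Hypothetical trades: winners outweigh losers, but every trade pays $10.
trades = [Trade(40.0, 10.0), Trade(-30.0, 10.0), Trade(35.0, 10.0),
          Trade(-25.0, 10.0), Trade(10.0, 10.0)]
s = summarize(trades)

print(f"gains {s['gains']:.2f}, losses {s['losses']:.2f}, "
      f"fees {s['fees']:.2f}, net {s['net']:.2f}")
# gains 85.00, losses 55.00, fees 50.00, net -20.00
```

Here the average gross edge is $6 per trade, below the $10 fee, so every additional trade makes the net result worse. Trading less often, or only when the expected edge clearly exceeds the fee, is the obvious fix.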
AI Trading Will Become Widespread; Strategy Homogenization May Trigger Systemic Risk
CZ commented on the experiment on X, saying that "AI + trading" is likely to become more common and to bring more trading volume.
Large-scale deployment of AI may reshape liquidity and price discovery in the crypto market. Algorithmic trading is already a core driving force of modern financial markets: AI-driven algorithms can execute trades in as little as 0.01 seconds, far faster than human reaction times of 0.1 to 0.3 seconds, which significantly improves market efficiency. Statistics show that global algorithmic trading volume in cryptocurrencies reached $94 trillion in 2023, with over 70% of that volume executed by bots.
As AI matures, its automated trading capabilities will grow more powerful. Beyond making markets more efficient, AI can reduce slippage by supplying liquidity across a wider range of assets and trading venues, which could improve the overall stability and resilience of the market.
However, autonomous, high-speed AI trading in the crypto market may also amplify systemic financial risk, and there is historical precedent. The 2010 Flash Crash in the Dow Jones Industrial Average showed that when many algorithmic trading systems with similar setups react to one another, a chain reaction can follow and lead to a market crash.
In the AI + Crypto scenario, strategy homogenization may magnify this risk. Some market observers have already pointed out that the account value curves of Grok-4 and DeepSeek are extremely similar. The zero-sum nature of Alpha Arena puts every participating LLM under intense pressure to adapt, and in a zero-sum game any strategy that takes a temporary lead may be detected and copied by competitors.
In the future, if large numbers of AI agents are built on a handful of top LLMs (such as DeepSeek V3.1 and Grok-4) and share similar training data and strategy logic, they will create what regulators call a "Horizontal Issue." Because the crypto market runs 24/7 with high leverage, this kind of strategy convergence may lead agents to detect and compete against one another. If a sharp market move or an unexpected input occurs, all of them may trigger sell orders at the same time, setting off a "selling spiral" that could be more severe than the one in 2010.
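The selling spiral described above can be illustrated with a toy stop-loss cascade. This is a deliberately simplified, hypothetical model, not a description of how Alpha Arena agents or real markets behave: each agent sells once when the cumulative price decline reaches its threshold, and every sale pushes the price down by a fixed impact. The thresholds, shock size, and impact are all made-up parameters.

```python
def simulate_cascade(thresholds: list[float], initial_shock: float,
                     impact_per_sale: float) -> tuple[float, int]:
    """Toy stop-loss cascade (hypothetical model).

    Each agent sells once when the cumulative price decline (a fraction)
    reaches its threshold; every sale deepens the decline by
    impact_per_sale. Returns (final decline, number of agents that sold).
    """
    decline = initial_shock
    sold = [False] * len(thresholds)
    changed = True
    while changed:  # keep scanning until no new agent is triggered
        changed = False
        for i, threshold in enumerate(thresholds):
            if not sold[i] and decline >= threshold:
                sold[i] = True
                decline += impact_per_sale
                changed = True
    return decline, sum(sold)


# Homogeneous agents: all ten share the same 3% stop-loss.
same = [0.03] * 10
# Diverse agents: stops spread from 3% to 21%.
diverse = [0.03 + 0.02 * i for i in range(10)]

print(simulate_cascade(same, initial_shock=0.03, impact_per_sale=0.005))
print(simulate_cascade(diverse, initial_shock=0.03, impact_per_sale=0.005))
```

With identical thresholds, one 3% shock triggers all ten agents and the decline deepens to about 8%. With dispersed thresholds, only one agent sells and the decline stops at about 3.5%. The same shock produces very different outcomes depending only on how similar the agents' rules are.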
At the same time, CZ's post also raised questions that many observers share. Conventional wisdom holds that trading achieves its best results only with a distinctive, proprietary strategy. Now that the strategies of all six LLMs can be observed publicly, will DeepSeek's strategy remain effective? How long can its profitability last? Would taking the opposite side of Gemini's and GPT-5's trades earn more than DeepSeek? Is Grok-4 learning from DeepSeek? Which model will perform best in extreme or one-sided markets? … Only time will answer these questions.
Although many questions remain open, nof1's Alpha Arena is an innovative experiment in bringing LLMs into the real crypto market. This "live fire exercise" vividly demonstrates AI's potential to reshape the crypto market, and Alpha Arena is just the beginning.
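Disclaimer: This article represents only the author's personal views and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image published on this page infringes your rights, please email proof of rights and proof of identity to support@aicoin.com, and the platform's staff will review it.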