6 major AIs are staging a trading battle. Will the cryptocurrency version of the "Turing Test" yield good results?

A good AI is one that can make money.

Written by: David, Deep Tide TechFlow

The good news is that after the epic crash on October 11, crypto trading has started to become active again.

The bad news is that it's AI trading.

As the new week begins, the market is becoming lively, and a project called nof1.ai has sparked a lot of discussion in the crypto community.

Everyone is watching the same thing: six large AI models trading crypto on Hyperliquid in real time, to see which one makes the most money.

Note that this is not a simulated trading environment. Claude, GPT-5, Gemini, Deepseek, Grok, and Tongyi Qianwen (Qwen) are each trading with $10,000 in real money on Hyperliquid. All addresses are public, and anyone can watch this "AI Trader Battle" in real time.

Interestingly, these six AIs are using exactly the same prompts and receiving the same market data. The only variable is their respective "thinking styles."

In just a few days since its launch on October 18, some AIs have already made over 20%, while others have lost nearly 40%.

In 1950, Turing proposed the famous Turing Test in an attempt to answer the question "Can machines think like humans?" Now, in the crypto space, six AIs are battling in the Alpha Arena to answer a more intriguing question:

If the smartest AIs trade in a real market, who will survive?

Perhaps in this crypto version of the "Turing Test," the account balance is the only judge.

A good AI is one that can make money, and Deepseek is currently in the lead.

Traditional AI evaluations, whether asking models to write code, solve math problems, or write articles, are essentially tests in a "static" environment.

The questions are fixed, the answers are predictable, and they may have even appeared in the training data.

But the crypto market is different.

Under conditions of extreme information asymmetry, prices change every second, and there are no standard answers, only profits and losses. More importantly, the crypto market is a typical zero-sum game; the money you make is the money someone else loses. The market will immediately and ruthlessly punish every wrong decision.

The Nof1 team, hosting this AI trading battle, wrote a line on their website:

Markets are the ultimate test of intelligence.

If the traditional Turing Test asks, "Can you keep humans from telling that you are a machine?", then Alpha Arena is really asking:

Can you make money in the crypto market? This is what crypto players actually expect from AI.

The Hyperliquid addresses of the six large AI models are public, so you can easily look up their positions and trading records.

The official nof1.ai website also visualizes each model's trading history, positions, profit and loss, and thought process for easy reference.

For readers who are completely unfamiliar, the specific trading rules for the AIs are:

Each AI starts with $10,000 in initial capital and can trade perpetual contracts for BTC, ETH, SOL, BNB, DOGE, and XRP, aiming to maximize returns while controlling risk. All AIs must independently decide when to open and close positions and how much leverage to use. Season 1 will run for several weeks depending on the situation, and Season 2 will have significant updates.

As of October 20, just three days after trading began, clear gaps have already opened up.

The current leader is Deepseek Chat V3.1, with funds of $12,533 (+25.33%). Close behind is Grok-4, with $12,147 (+21.47%), followed by Claude Sonnet 4.5 at $11,047 (+10.47%).

Qwen3 Max is roughly flat, with $10,263 (+2.63%). GPT-5 is lagging significantly, with a balance of $7,442 (-25.58%), and the worst performer is Gemini 2.5 Pro, with $6,062 (-39.38%).
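The percentages above follow directly from each balance and the $10,000 starting capital. As a minimal sketch of that arithmetic (the variable and function names are illustrative, and the balances are simply the figures quoted in this article), the leaderboard can be reproduced like this:

```python
# Balances quoted in the article as of October 20, in USD.
STARTING_CAPITAL = 10_000

balances = {
    "Deepseek Chat V3.1": 12_533,
    "Grok-4": 12_147,
    "Claude Sonnet 4.5": 11_047,
    "Qwen3 Max": 10_263,
    "GPT-5": 7_442,
    "Gemini 2.5 Pro": 6_062,
}


def pct_return(balance, start=STARTING_CAPITAL):
    """Return since the start of trading, in percent."""
    return (balance - start) / start * 100


# Sort models from best to worst return, rounded to two decimals.
leaderboard = sorted(
    ((name, round(pct_return(bal), 2)) for name, bal in balances.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, ret in leaderboard:
    print(f"{name:<20} {ret:+.2f}%")

# Gap between first and last place, in percentage points.
spread = round(leaderboard[0][1] - leaderboard[-1][1], 2)
print(f"Top-to-bottom gap: {spread} percentage points")
```

The gap is measured in percentage points rather than percent because it is a difference between two returns, not a return itself.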

The most surprising yet seemingly reasonable performance is from Deepseek.

It's surprising because this model is not as popular in the international AI community as GPT and Claude. It's reasonable because Deepseek is backed by the quantitative fund High-Flyer (Huanfang Quantitative).

This quantitative giant, managing over 100 billion RMB, started with algorithmic trading before venturing into AI. From quantitative trading to large AI models, and then to using AI for real crypto trading, Deepseek seems to have returned to its roots.

In contrast, GPT-5, the pride of OpenAI, has lost over 25%, and Google's Gemini has done even worse, losing nearly 40% across 44 trades.

In real trading scenarios, perhaps having strong language capabilities is not enough; understanding the market is even more important.

The same gun, different shooting styles

If you had tracked Alpha Arena from October 18, you would have noticed that the AIs started out quite similar and that the differences grew over time.

By the end of the first day, the best performer, Deepseek, had made only 4%, while the worst, Qwen3, had lost 5.26%. Most AIs hovered within ±2%, seemingly testing the market.

But by October 20, the situation had changed dramatically. Deepseek had soared to +25.33%, while Gemini had plummeted to -39.38%. In just three days, the gap between the top and bottom widened to about 65 percentage points.

Even more interesting is the difference in trading frequency.

Gemini completed 44 trades, averaging about 15 per day, like an anxious speculator. In contrast, Claude made only 3 trades, and Grok was even still holding positions it had not closed. This difference cannot be explained by the prompts, since all six models used the same set.

Looking at the profit and loss distribution, Deepseek's largest single loss was $348, against an overall profit of $2,533. Gemini's largest single gain was $329, while its largest single loss reached $750.

The AIs (all public large models, not fine-tuned) strike completely different balances between risk and reward.

Additionally, you can see the chat records and thought processes of different models in the Model Chat option on the website; these monologues are particularly interesting.

Just as human traders have different styles, the AIs seem to exhibit different personalities. Gemini's frequent trading and constant deliberation resemble a hyperactive trader, Claude's caution is akin to a conservative fund manager, and Deepseek is steady like a seasoned quant, discussing only positions without any emotional commentary.

These personality traits do not seem to have been designed; rather, they appear to have emerged naturally during training. When faced with uncertainty, different AIs tend to adopt different coping strategies.

All AIs see the same candlestick charts, the same trading volumes, and the same market depths. They even use the same prompts. So, what causes such significant differences?

The influence of training data may be key.

Deepseek's backer, High-Flyer (Huanfang Quantitative), has accumulated vast amounts of trading data and strategies over the years. Even if this data was not used directly for training, could it still have shaped the team's understanding of "what constitutes a good trading decision"?

In contrast, OpenAI and Google's training data may lean more towards academic papers and web texts, potentially lacking a grounded understanding of real trading.

At the same time, some traders speculate that Deepseek may have particularly optimized its time series prediction capabilities during training, while GPT-5 may be better at handling natural language. When faced with structured data like price charts, different architectures may perform differently.

Watching AI trade is also a business

While everyone is focused on the AIs' profits and losses, few have noticed the mysterious company behind the experiment.

nof1.ai, which launched this AI trading battle, is not well known. However, the accounts it follows on social media offer some clues.

The team behind nof1.ai does not seem to consist of typical crypto entrepreneurs but rather of academic AI researchers.

Jay A Zhang (the founder) has an interesting personal profile:

"Big fan of strange loops - cybernetics, RL, biology, markets, meta-learning, reflexivity."

Reflexivity is a core theory of Soros: the cognition of market participants affects the market, and changes in the market in turn affect participants' cognition. It feels fitting that someone who studies "reflexivity" is running an AI trading experiment in a live market.

The experiment lets everyone see how AI trades while also observing how "being observed" affects the market.

Another co-founder, Matthew Siper, is a PhD candidate in machine learning at New York University and also an AI research scientist. A project led by a still-enrolled PhD student seems more like a validation of academic research.

Among the other accounts followed by nof1, there are researchers from Google DeepMind and associate professors from New York University, specializing in AI and games.

From their actions and background, it is clear that Nof1 is not just trying to create a gimmick. The name of the platform, SharpeBench, is quite ambitious; the Sharpe ratio is the gold standard for measuring risk-adjusted returns. What they may truly want to create is a benchmarking platform for AI trading capabilities.
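For readers unfamiliar with the metric, the Sharpe ratio divides the average excess return by the volatility of those returns, so two accounts with the same average gain can score very differently. The sketch below is only an illustration under stated assumptions, not Nof1's methodology: the function name, the zero risk-free rate, the 365-period annualization (crypto trades every day), and both return series are hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=365):
    """Annualized Sharpe ratio from a list of per-period returns.

    risk_free_rate is expressed per period, like the returns themselves.
    """
    excess = [r - risk_free_rate for r in returns]
    volatility = stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical daily returns, NOT data from Alpha Arena.
# Both series average 1% per day, but one swings far more.
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
erratic = [0.080, -0.060, 0.090, -0.070, 0.010]

print(f"steady:  {sharpe_ratio(steady):.2f}")
print(f"erratic: {sharpe_ratio(erratic):.2f}")
```

The steady series scores far higher despite the identical average return, which is why a benchmark built on the Sharpe ratio would rank models differently from a leaderboard based on raw account balance.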

Some speculate that Nof1 has significant capital backing, while others suggest they might be laying the groundwork for future AI trading services.

If they launch a subscription service for Deepseek trading strategies, there may be no shortage of buyers. Based on this prototype, developing AI asset management, strategy subscriptions, and trading solutions for large enterprises is also a foreseeable business.

Beyond the team itself, observing AI trading can also be profitable.

As soon as Alpha Arena went live, some people started to follow the trades.

The simplest strategy is to follow Deepseek. You buy what it buys, and you sell what it sells. Meanwhile, there are also people in the comments section who are taking the opposite approach, specifically counter-trading Gemini—selling when Gemini buys and buying when Gemini sells.
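The two approaches described above amount to either mirroring or inverting each observed trade. As a rough sketch only (the `Trade` type and both function names are hypothetical, and real copy trading would also have to handle execution delay, leverage, and fees), the logic looks like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    symbol: str  # e.g. "BTC"
    side: str    # "buy" or "sell"
    size: float  # position size in coin units


def follow(trade: Trade) -> Trade:
    """Copy trading: take the same side as the observed model."""
    return trade


def counter(trade: Trade) -> Trade:
    """Counter-trading: take the opposite side of the observed model."""
    opposite = {"buy": "sell", "sell": "buy"}
    return Trade(trade.symbol, opposite[trade.side], trade.size)


# Hypothetical observed trades, for illustration only.
deepseek_trade = Trade("BTC", "buy", 0.5)
gemini_trade = Trade("ETH", "buy", 2.0)

print(follow(deepseek_trade))  # same side as Deepseek
print(counter(gemini_trade))   # opposite side to Gemini
```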

However, there is a problem with following trades: when everyone knows what Deepseek is going to buy, is this strategy still effective? This is also what project founder Jay Zhang refers to as reflexivity, meaning that the act of observation itself can change the observed object.

There is also an illusion of democratizing top trading strategies.

On the surface, it seems that everyone can know the AI's trading strategies, but in reality, what you see are the trading results, not the trading logic. The take-profit and stop-loss logic of each AI may not be continuous or reliable.

While Nof1 is testing AI trading behavior, retail investors are searching for the secret to wealth, other traders are learning from them, and researchers are collecting data.

Only the AIs themselves are unaware that they are being observed, and they keep diligently executing each trade. If the classic Turing Test is about "deception" and "imitation," then the Alpha Arena trading battle is about how crypto players respond to what AI can do and the results it delivers.

In this results-driven crypto market, an AI that can make money may be more important than an AI that can chat.
