Alpha Arena exposes flaws in AI trading: Western models lose 80% of capital in one week

bitcoin magazine

Can AI trade cryptocurrencies? Jay Azhang, a New York-based computer engineer and finance professional, is putting that question to the test with Alpha Arena. The project pits the biggest large language models (LLMs) against each other, each with $10,000 in capital, to see which one makes the most money trading cryptocurrencies. The contestants are Grok 4, Claude Sonnet 4.5, Gemini 2.5 Pro, GPT-5, DeepSeek V3.1, and Qwen3 Max.

Now, you might think, “Wow, that’s a great idea!” And you might be surprised to know that as of this writing, three out of five AIs are underwater, with two Chinese open source models, Qwen3 and Deepseek, leading the way.

That’s right: the Western world’s most powerful closed-source, proprietary AI models, run by giants like Google and OpenAI, lost over $8,000, or 80% of their crypto trading capital, in less than a week, while the Eastern open-source models turned a profit.

The most successful trade so far? Qwen3’s, which stayed steady and in its lane with a simple 20x leveraged long Bitcoin position. Grok 4, to no one’s surprise, was long DOGE with 10x leverage for most of the contest, and at one point sat at the top of the charts alongside DeepSeek, but it is now down nearly 20%. Maybe Elon Musk should tweet a dog meme or something to get Grok out of the doghouse.
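To see why a 20x long is both attractive and dangerous, here is a simplified sketch of leveraged position arithmetic. It is not Alpha Arena’s code: the function names are hypothetical, and it assumes isolated margin, no trading fees or funding payments, and no maintenance-margin requirement.

```python
def leveraged_return(entry_price: float, exit_price: float, leverage: float, side: str = "long") -> float:
    """Return on posted margin, as a fraction, for a simplified leveraged trade.

    Simplifying assumptions: isolated margin, no fees or funding payments, and a
    loss floor of -100% (the position is liquidated once the margin is gone).
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    price_change = (exit_price - entry_price) / entry_price
    direction = 1.0 if side == "long" else -1.0  # a short profits when the price falls
    # Leverage multiplies the underlying move; losses cannot exceed the margin posted.
    return max(direction * leverage * price_change, -1.0)


def liquidation_move(leverage: float) -> float:
    """Adverse price move, as a fraction, that wipes out the margin (maintenance margin ignored)."""
    return 1.0 / leverage


# A 20x long like Qwen3's versus a 10x long like Grok 4's DOGE position.
for lev in (20, 10):
    print(f"{lev}x long: a {liquidation_move(lev):.0%} drop wipes out the margin")
print(f"20x long, price up 3%: {leveraged_return(100_000, 103_000, 20):+.0%} on margin")
```

Real exchanges liquidate somewhat before the full 1/leverage move because of maintenance-margin requirements, so the actual cushion is thinner than this sketch suggests.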

Meanwhile, Google’s Gemini has been relentlessly bearish, shorting every tradeable crypto asset, a stance that mirrors the company’s general attitude toward crypto over the past 15 years.

Last but not least, ChatGPT deserves credit for an impressive achievement: making every bad trade possible for a week straight. It takes skill to be that good, especially when Qwen3 is simply going long Bitcoin and going fishing. If this is the best closed-source AI has to offer, then maybe OpenAI should keep it closed and stop bothering us.

A new benchmark for AI

Jokes aside, the idea of pitting AI models against each other in crypto trading rests on some deep insights. First, because crypto markets are so unpredictable, an AI cannot be pre-trained on the answers, a problem that plagues other benchmarks. In other words, many AI models are exposed to the answers to some of these tests during training, so naturally they score well on them. Indeed, several studies have shown that making small changes to such tests can produce fundamentally different benchmark results.

This controversy raises the question: what is the ultimate test of intelligence? Well, according to Elon Musk, Iron Man enthusiast and creator of Grok 4, predicting the future is the ultimate measure of intelligence.

The ability to predict the future is the best measure of intelligence https://t.co/W6WriRGt9N
— Elon Musk (@elonmusk) September 5, 2025

And let’s be honest, nothing about the future is more uncertain than the short-term price of cryptocurrencies. In Azhang’s words: “Our goal at Alpha Arena is to bring benchmarks closer to the real world, and markets are perfect for this. Markets are dynamic, adversarial, open-ended, and endlessly unpredictable. They challenge AI in ways that static benchmarks cannot. Markets are the ultimate test of intelligence.”

This view of markets is deeply rooted in the libertarian principles from which Bitcoin was born. Economists like Murray Rothbard and Milton Friedman argued decades ago that central planners fundamentally cannot predict markets, and that only individuals who have something to lose, making real economic decisions, can perform rational economic calculation.

In other words, markets are the hardest thing to predict, and therefore the best test of intelligence, because they reflect the individual perspectives and decisions of intelligent people all over the world.

In the project description, Azhang says the models are instructed to trade for risk-adjusted returns, not just raw profit. This risk aspect matters a great deal, because a single bad trade can wipe out all previous gains, as the collapse of Grok 4’s portfolio shows.
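The article does not say which risk metric Alpha Arena uses. One common choice is the Sharpe ratio (mean excess return divided by its standard deviation), while maximum drawdown captures the “one bad trade” problem directly. The sketch below assumes per-period returns, a zero risk-free rate, and no annualization; the two return series are invented for illustration and share the same average return.

```python
import statistics


def sharpe_ratio(returns: list[float], risk_free: float = 0.0) -> float:
    """Per-period Sharpe ratio: mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    stdev = statistics.stdev(excess)  # requires at least two observations
    if stdev == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / stdev


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Hypothetical per-period returns; both average +1% per period.
steady = [0.010, 0.012, 0.008, 0.011, 0.009]
volatile = [0.30, -0.25, 0.20, -0.30, 0.10]

for name, series in (("steady", steady), ("volatile", volatile)):
    print(f"{name}: Sharpe {sharpe_ratio(series):.2f}, max drawdown {max_drawdown(series):.0%}")
```

Both series earn the same average return, but a risk-adjusted score strongly prefers the steady one, which is the point of scoring models on more than raw profit.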

One more question remains: are these models actually learning from experience as they trade? That is technically hard to achieve, since pre-training an AI model is very expensive in the first place. The models can review their own and others’ trading history, and may even keep recent trades in short-term memory, that is, in their context window, but that only goes so far. Ultimately, a good AI trading model may need to genuinely learn from its own experience. Although such techniques have recently emerged from academic research, they have a long way to go before becoming a commercial product. Researchers at MIT call these self-adapting AI models.
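The “short-term memory” limitation can be pictured as a fixed-size buffer: only the most recent trades fit into the prompt, older ones silently drop out, and the model’s weights never change. A toy sketch, assuming a hypothetical `TradeMemory` class and prompt format; this is not how Alpha Arena actually builds its prompts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Trade:
    symbol: str
    side: str   # "long" or "short"
    pnl: float  # realized profit or loss in dollars


class TradeMemory:
    """Hypothetical memory that keeps only the last `capacity` trades, like a fixed context budget."""

    def __init__(self, capacity: int) -> None:
        self._trades: deque[Trade] = deque(maxlen=capacity)  # oldest entries are discarded automatically

    def record(self, trade: Trade) -> None:
        self._trades.append(trade)

    def to_prompt(self) -> str:
        """Render the remembered trades as text that would be pasted into the model's prompt."""
        lines = [f"{t.symbol} {t.side} pnl={t.pnl:+.2f}" for t in self._trades]
        return "Recent trades:\n" + "\n".join(lines)

    def __len__(self) -> int:
        return len(self._trades)


memory = TradeMemory(capacity=3)
for pnl in (120.0, -80.0, 45.0, -300.0, 60.0):
    memory.record(Trade("BTC", "long", pnl))
print(memory.to_prompt())  # the two oldest trades have been forgotten
```

Nothing the model “learns” this way outlives the buffer, which is why genuinely self-adapting models would need to update their own weights instead.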

How do we know it’s not just luck?

Another critique of the project is that its results so far may be indistinguishable from a “random walk.” A random walk is like rolling a die for every decision. What would that look like on a chart? As it happens, there is a simulator you can use to answer that question, and the answer is: not that different.
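A crude version of that experiment is easy to build. The sketch below gives six hypothetical “traders” $10,000 each and has them choose long or short by coin flip every step, against a shared price that itself follows a random walk. The step count (168, one week of hourly decisions), volatility, leverage, and seed are arbitrary assumptions; nothing here uses real Alpha Arena data.

```python
import random


def simulate_arena(n_traders: int = 6, steps: int = 168, volatility: float = 0.01,
                   leverage: float = 10.0, seed: int = 42) -> list[float]:
    """Final equity of coin-flip traders who all trade the same random-walk market."""
    rng = random.Random(seed)
    # One shared market path: each step's price change is Gaussian noise with zero drift.
    moves = [rng.gauss(0.0, volatility) for _ in range(steps)]
    finals = []
    for _ in range(n_traders):
        equity = 10_000.0
        for move in moves:
            direction = rng.choice((1.0, -1.0))  # random long/short decision each step
            # Leveraged gain or loss on current equity; an account at zero stays at zero.
            equity = max(equity * (1.0 + leverage * direction * move), 0.0)
        finals.append(equity)
    return finals


for i, equity in enumerate(simulate_arena(), start=1):
    print(f"random trader {i}: ${equity:,.0f}")
```

Rerunning with different seeds produces a “leaderboard” with clear winners and losers every time, even though no trader has any skill. With leverage, the losers also tend to outnumber the winners, because compounding large swings drags down the typical outcome.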

The problem of luck in markets has also been explained very carefully by thinkers like Nassim Taleb, in his book Antifragile. He argues that, statistically, it is perfectly normal for a trader (such as Qwen3 in this case) to get lucky for a week straight, creating the appearance of sound reasoning. Taleb goes further, arguing that there are so many traders on Wall Street that one of them can easily get lucky for 20 years in a row, build a god-like reputation, and convince everyone around him that he is a genius, until, of course, his luck runs out.
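The arithmetic behind this argument is simple: if each of N traders independently has probability p of a winning year by pure chance, the expected number with k straight winning years is N·p^k. The sketch below assumes p = 0.5 (a coin flip) and hypothetical population sizes, and checks the formula with a quick seeded simulation.

```python
import random


def expected_lucky_streaks(n_traders: int, years: int, p_win: float = 0.5) -> float:
    """Expected number of traders who win every single year purely by chance: N * p**k."""
    return n_traders * p_win ** years


def simulate_lucky_streaks(n_traders: int, years: int, p_win: float = 0.5, seed: int = 0) -> int:
    """Count coin-flip traders whose record shows `years` consecutive winning years."""
    rng = random.Random(seed)
    return sum(all(rng.random() < p_win for _ in range(years)) for _ in range(n_traders))


# Hypothetical populations: 10,000 traders over 10 years, 1,000,000 over 20 years.
print(f"expected 10-year streaks among 10,000 traders: {expected_lucky_streaks(10_000, 10):.1f}")
print(f"simulated: {simulate_lucky_streaks(10_000, 10)}")
print(f"expected 20-year streaks among 1,000,000 traders: {expected_lucky_streaks(1_000_000, 20):.1f}")
```

None of these “geniuses” has any skill; a large enough population simply guarantees a few spectacular track records.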

Therefore, for Alpha Arena to generate valuable data, it must run for a long time, and its patterns and results must be independently replicated with real capital before they can be distinguished from a random walk.
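One rough way to quantify “a long time”: if per-period returns are assumed independent and identically distributed, the t-statistic of the mean return equals the per-period Sharpe ratio times √n, so reaching a t-statistic of about 2 takes roughly n ≈ (2 / SR)² periods. The helper names and Sharpe values below are hypothetical, and the rule ignores autocorrelation and fat tails, both common in crypto.

```python
import math


def periods_needed(sharpe_per_period: float, t_target: float = 2.0) -> int:
    """Approximate i.i.d. observations needed for the mean return to reach t_target.

    Uses t = SR * sqrt(n), so n = (t_target / SR) ** 2.
    """
    if sharpe_per_period <= 0:
        raise ValueError("sharpe_per_period must be positive")
    return math.ceil((t_target / sharpe_per_period) ** 2)


def daily_from_annual(annual_sharpe: float, trading_days: int = 365) -> float:
    """Convert an annualized Sharpe ratio to a daily one (crypto markets trade every day)."""
    return annual_sharpe / math.sqrt(trading_days)


# A strategy with a respectable annualized Sharpe of 1.0 needs roughly 1,460 days, about four years.
days = periods_needed(daily_from_annual(1.0))
print(f"days needed: {days} (~{days / 365:.1f} years)")
```

By this yardstick, a week of results says very little about skill, even before accounting for the many models and trades being compared at once.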

All in all, it’s great to see open-source, cost-effective models like DeepSeek outperforming closed-source models so far. Alpha Arena has gone viral on X over the past week and has been a great source of entertainment. Where it goes from here is anyone’s guess. We’ll have to see whether the gamble its creator took, giving six chatbots $60,000 to gamble with on cryptocurrencies, ultimately pays off.

This post Alpha Arena Reveals AI Trading Flaws: Western Models Lose 80% Capital in One Week first appeared in Bitcoin Magazine and was written by Juan Galt.
