Analysis · The AI Landscape · Edition #0016

China's AI labs — DeepSeek, Qwen, and the other side of the race

DeepSeek hit #1 on the US App Store on a Monday in January 2025. That same day Nvidia lost $589 billion in market cap. The technical angle is interesting; the political one is more so.

Germán Falcioni April 20, 2026
✦ Reading: 10 min
The sanctions aimed to slow things down; they ended up forcing efficiency and open publication. That's the other side of the race.
TL;DR

DeepSeek, Qwen, Kimi, and other Chinese labs now compete with Claude and ChatGPT on benchmarks and, at times, on price. The interesting story isn't that "China caught up to the US" — it's that the NVIDIA GPU export restrictions imposed by Washington in 2022-2023 forced those labs to optimize hard against limited hardware, and they published the techniques. DeepSeek-R1 (January 2025) showed that o1-style reasoning could be reproduced open-source at a fraction of the perceived cost. For professional work with sensitive data, Claude is still the solid pick. For startups that need to self-host under a permissive license, Qwen2.5-72B on your own infrastructure is a legitimate option. Content filters aligned with Chinese regulation are real — they aren't a footnote.

✦ Summarized with Claude at publish time

On December 26, 2024, DeepSeek posted a 53-page technical document on GitHub titled DeepSeek-V3 Technical Report. This wasn't a marketing blog post: it was a detailed paper, with architecture, training decisions, loss curves, and a figure that froze Silicon Valley once analysts read it closely. The final training run had cost roughly $5.6 million.

The market took four weeks to process what that number implied. On Monday, January 27, 2025, with the DeepSeek app topping the US App Store ahead of ChatGPT, Wall Street reacted. Nvidia closed that day down $589 billion in market cap, the largest single-day loss for any company in US stock market history. The Nasdaq dropped 3.1 percent. The SOX semiconductor index fell 9.2 percent.

What broke that Monday wasn't DeepSeek. What broke was a belief: that training frontier AI required US-scale capex and was therefore a game of few players.

Where did DeepSeek come from?

DeepSeek was founded in May 2023 in Hangzhou. Its creator, Liang Wenfeng, came to AI from running High-Flyer, a Chinese quantitative hedge fund that had accumulated a significant Nvidia GPU cluster, initially for high-frequency trading, not AI. When Liang pivoted toward language models, he had two things that are rare in the Chinese ecosystem: his own compute capacity, independent of any big tech company, and a culture of optimization inherited from quantitative finance.

The product trajectory so far:

  • DeepSeek-V2 (May 2024) — first model to draw attention on technical benchmarks, with the Mixture-of-Experts architecture they'd later scale up in V3.
  • DeepSeek-V3 (December 2024) — 671 billion total parameters, 37 billion active per token. The paper that contains the $5.6 million figure.
  • DeepSeek-R1 (January 2025) — an o1-style reasoning model. Open-weights under MIT license.
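
To make the V3 numbers concrete, here is a minimal sketch of what "37 billion active out of 671 billion" means as a share of the model. The two parameter counts come from the list above; the function name is illustrative and not taken from DeepSeek's code.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a Mixture-of-Experts model's parameters used for one token."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected 0 <= active_params <= total_params, total_params > 0")
    return active_params / total_params


V3_TOTAL = 671e9   # total parameters, as quoted for DeepSeek-V3
V3_ACTIVE = 37e9   # parameters active per token, as quoted for DeepSeek-V3

share = active_fraction(V3_TOTAL, V3_ACTIVE)
print(f"{share:.1%} of parameters are active per token")
```

Roughly one parameter in eighteen does work on any given token, which is why a model this large can be comparatively cheap to run per token.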

The MIT license on R1 matters. It's one of the most permissive licenses in common use. You can download the weights, fine-tune them, and run them in commercial production without asking anyone's permission.

The $5.6 million figure is disputed (and it matters less than you'd think)

Let's be honest about the numbers. The $5.6 million covers only the compute cost of the final training run. It doesn't include the GPUs (hardware capex they already had), researcher salaries, the dozens of prior failed experiments, or data labeling. SemiAnalysis and other independent analysts put the real total around $500 million when everything is included.

But that debate misses the point. Even if DeepSeek spent $500 million, training the equivalent of GPT-4 for half a billion dollars is still an order of magnitude below the $5 to $10 billion analysts had been projecting for the next scaling cycle at OpenAI or Anthropic.
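
The arithmetic behind these figures is short enough to check. The sketch below assumes the inputs reported in the DeepSeek-V3 technical report (about 2.788 million H800 GPU-hours, priced at an assumed rental rate of $2 per GPU-hour) and the estimates quoted above; the variable names are mine.

```python
# Inputs as reported in the DeepSeek-V3 technical report.
GPU_HOURS = 2.788e6          # H800 GPU-hours for the final training run
PRICE_PER_GPU_HOUR = 2.0     # USD, a rental-price assumption, not a purchase cost

final_run_cost = GPU_HOURS * PRICE_PER_GPU_HOUR

# Estimates quoted in the text above.
ALL_IN_ESTIMATE = 500e6                  # independent all-in estimate, USD
PROJECTED_LOW, PROJECTED_HIGH = 5e9, 10e9  # projected next-cycle budgets, USD

ratio_low = PROJECTED_LOW / ALL_IN_ESTIMATE
ratio_high = PROJECTED_HIGH / ALL_IN_ESTIMATE
all_in_vs_headline = ALL_IN_ESTIMATE / final_run_cost

print(f"Final run: ${final_run_cost / 1e6:.3f}M")
print(f"Projected budgets are {ratio_low:.0f}x to {ratio_high:.0f}x the all-in estimate")
print(f"All-in estimate is about {all_in_vs_headline:.0f}x the headline figure")
```

Both readings hold at once: the headline number understates the total by roughly two orders of magnitude, and the total is still 10 to 20 times smaller than the projected budgets.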

Compute efficiency per dollar isn't Chinese marketing. It's documented in the paper, it's reproducible by technical teams, and its core techniques — multi-head latent attention, auxiliary-loss-free load balancing, FP8 mixed precision training — have already been adopted by Western labs.
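
Of the three techniques, auxiliary-loss-free load balancing is the easiest to show in a few lines. The toy simulation below is a sketch of the idea, not DeepSeek's implementation: each expert gets a bias that is added to its routing score for selection only, and after every step the bias is nudged down for overloaded experts and up for underloaded ones. The skewed scores, step size, and expert count are made-up values for illustration.

```python
import random


def route(scores, bias, top_k):
    """Pick top_k experts by score + bias. The bias influences selection only;
    in the real method the gating weights still come from the unbiased scores
    (gating weights are not modelled in this toy)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    return ranked[:top_k]


def simulate(steps=300, tokens_per_step=200, top_k=1, gamma=0.01, seed=0):
    rng = random.Random(seed)
    skew = [0.5, 0.2, 0.2, 0.1]      # hypothetical: expert 0 is naturally favoured
    bias = [0.0] * len(skew)
    target = tokens_per_step * top_k / len(skew)   # perfectly balanced load
    spreads = []
    for _ in range(steps):
        load = [0] * len(skew)
        for _ in range(tokens_per_step):
            scores = [s + rng.random() for s in skew]
            for expert in route(scores, bias, top_k):
                load[expert] += 1
        spreads.append(max(load) - min(load))
        # No auxiliary loss term: just nudge each bias toward balance.
        for i, n in enumerate(load):
            if n > target:
                bias[i] -= gamma
            elif n < target:
                bias[i] += gamma
    return spreads, bias


spreads, bias = simulate()
early = sum(spreads[:10]) / 10
late = sum(spreads[-10:]) / 10
print(f"mean load spread, first 10 steps: {early:.1f} tokens")
print(f"mean load spread, last 10 steps:  {late:.1f} tokens")
```

The gap between the busiest and idlest expert shrinks over time without adding a balancing term to the training loss, which is the property the technique's name refers to.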

Qwen, Kimi, and the rest of the pack

DeepSeek is the visible face, but it isn't the only serious player on the Chinese side.

Qwen (Alibaba). The most consistent series. Qwen2 (mid-2024), Qwen2.5 (late 2024), Qwen3 (2025). Models come in different sizes (7B, 32B, 72B), all with open weights. Most Qwen2.5 sizes ship under Apache 2.0 (also permissive); the 72B uses Alibaba's own Qwen license, which allows commercial use with some conditions, so read it before deploying. Qwen2.5-72B is the de facto default for many startups that need to self-host. Alibaba pushes it because they want to sell Alibaba Cloud infrastructure; the open models are the hook.

Kimi (Moonshot). Specialty: long context. It was the first to offer commercial million-token windows in Chinese, before Western models had equivalents. Strong in the Chinese market, less known outside.

Baichuan, Zhipu GLM, 01.AI (Yi). Three labs with capable models. 01.AI was founded by Kai-Fu Lee, a familiar name in the Western ecosystem. Zhipu has academic ties to Tsinghua.

Ernie Bot (Baidu). The most direct corporate response to ChatGPT. More closed, less technically innovative, but with huge distribution inside China via Baidu's products.

The ecosystem is more diverse than the media focus on DeepSeek suggests.

The context that doesn't make the headlines

To understand why Chinese labs innovated so aggressively on efficiency, you have to look at what Washington was doing.

In October 2022 the Biden Administration, through the Department of Commerce's Bureau of Industry and Security, imposed restrictions on exporting advanced GPUs to China. Nvidia's H100 and A100, the reference chips for training frontier models, were banned. Nvidia responded by creating slightly degraded variants (the A800 and H800) that cleared the regulatory thresholds. Washington tightened restrictions in October 2023 to close that loophole, and Nvidia followed with the further cut-down H20.

The stated goal was to slow China down. The actual effect was different: Chinese labs, with less compute per researcher and inferior hardware, had to optimize aggressively. And they published the techniques.

That's what makes DeepSeek-V3 unique as a document: it's not a closed product with a black box inside, it's an operations manual. Any lab with resources can read the paper and apply the same techniques.

What do I, professionally, do with any of this?

Here's where it pays to separate use cases.

If your work involves real clients, contracts, sensitive data, or reputation on the line, Claude is still the bet I make every day. Not because DeepSeek-R1 is bad on raw capability (it isn't), but because the combination of jurisdiction, trust track record, contractual guarantees, and consistent multilingual support doesn't exist in publicly accessible Chinese models. For work I delegate that involves data that can't leave my control, Anthropic in the US is still the provider I understand and can hold accountable.

If your work is a technical startup with a tight budget that needs to self-host — that's where Qwen2.5-72B or DeepSeek-V3 running on your own infrastructure are legitimate options. Permissive license, high capability, no third party watching your prompts. This is a real door that didn't exist two years ago for anyone outside big tech.
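
Before committing to self-hosting, it helps to estimate how much memory the weights alone require. The sketch below uses the usual bytes-per-parameter rule of thumb; the parameter counts are nominal, the precision table is an assumption, and the result excludes KV cache, activations, and runtime overhead.

```python
# Approximate bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


MODELS = {
    "Qwen2.5-72B (nominal 72B)": 72e9,
    "DeepSeek-V3 (671B total)": 671e9,
}

for name, n_params in MODELS.items():
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(n_params, precision)
        print(f"{name:28s} {precision:5s} {gb:7.0f} GB")
```

Note that a Mixture-of-Experts model has to keep every expert in memory even though only a fraction is active per token, so DeepSeek-V3 is far heavier to host than its 37 billion active parameters suggest. On a tight budget, that makes a 72B dense model the more realistic starting point.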

If you're learning — try all of them. DeepSeek has a public web app. Qwen has a Hugging Face demo. ChatGPT and Claude you know. Seeing how each one thinks gives you intuition that no blog post delivers.

If your beat is journalism, political research, human rights, or anything touching Asian geopolitics, publicly accessible Chinese models aren't the tool. Not because of malice, but because of regulation in their country of origin.

To close, and to keep going

The rise of DeepSeek and the consolidation of Qwen changed the conversation about what "expensive" means when training a frontier model. They broke a cost assumption, spread techniques through public papers, and forced Western labs to respond on efficiency.

But they aren't interchangeable with Claude or ChatGPT for Western professional use. The content filters are real. The jurisdiction is real. The uneven multilingual support is real. They're different tools for different cases.

Where in your workflow would an open-weights option with a permissive license running on your own infrastructure actually help? If you want the broader competitive picture, The AI race is the next link. If you want to understand how these models' capability gets compared, How AIs are measured gives you the frame.

Want to go deeper?

01 Can I use DeepSeek or Qwen from Latin America or the US for my work?

Technically yes, with caveats. DeepSeek has a public web app and a paid API; Qwen is available through Alibaba Cloud and also as downloadable weights on Hugging Face. The question isn't access; it's what happens with your data. If you use the official API, your prompts and responses sit under Chinese jurisdiction. For trying out capabilities, learning, or a side project: no big deal. For real client work, contracts, internal documents: you won't want it there. The interesting middle option is downloading Qwen2.5 weights (Apache 2.0 license) and running them on your own infrastructure, so your data never leaves your control.

02 Is DeepSeek really as cheap to train as the headlines say?

The $5.6 million figure that circulated in January 2025 comes from the DeepSeek-V3 paper and represents the compute cost of the final training run; it doesn't include salaries, failed experiments, data labeling, or the cost of the GPUs themselves. Independent analysts estimate total real cost somewhere around $500 million when hardware and ecosystem are included. But even $500 million is an order of magnitude below what a frontier model was supposed to cost. The exact number is disputed; the effect on market expectations isn't.

03 What about censorship and content filters on Chinese models?

They're real and they're specific. The Cyberspace Administration of China (CAC) requires publicly-accessible models inside China to register their training data and apply filters on sensitive topics: Tiananmen, Taiwan, the Dalai Lama, direct Party criticism. If you use official DeepSeek or Qwen through Alibaba Cloud, you'll hit soft refusals or topic deflection in those areas. If you run open weights on your own infrastructure, much of that filtering is post-training and can be mitigated with fine-tuning, but the bias of the initial data mix stays. For Western professional work on neutral topics (code, analysis, math) censorship rarely touches you. For journalism, political research, or anything that brushes geopolitics, it's a deal-breaker.
