A Latin American engineer with a laptop that has 32 GB of RAM and no dedicated GPU opens Ollama and downloads Qwen-2.5-Coder 14B. She tries it: asks for a Python function, then a refactor of an old class, then an explanation of a bug. It works. Not like Claude, not like GPT-4, but decent. It runs locally, offline, without sending anything anywhere.
From late 2024 through 2025, this was the change the press underreported but the developer community noticed: Chinese open-weight models became viable on well-equipped consumer hardware, with good results. Powerful AI stopped being a synonym for "connecting to a big tech API."
Quick map of the Chinese players that matter
DeepSeek (subsidiary of High-Flyer, a quant hedge fund): the one that broke the silence in January 2025. Main models — V3 (general), R1 (reasoning), R1-Distill (smaller distilled versions). All with open weights and a technical paper. The most disruptive lab of 2025.
Qwen / Alibaba Cloud: the largest and most-used family on Hugging Face. Qwen-2.5 (general), Qwen-2.5-Coder (coding), Qwen-VL (vision), and Qwen-Audio (audio) ship with open weights; Qwen-2.5-Max (frontier) is offered through the chat and the API rather than as downloadable weights. Strategy: compete with top-tier open source, monetize through Alibaba Cloud.
Moonshot AI — the Kimi model, very popular chatbot in China, large context window. Usable interface in English too.
MiniMax — video products (Hailuo) and text models. Strong in video generation.
Zhipu AI — GLM models, a Tsinghua University spin-off. Mostly used inside the Chinese enterprise ecosystem.
Baidu — Ernie, Baidu's chatbot, aimed at the domestic Chinese market.
This isn't a single player. It's six or seven labs competing, and two of them (DeepSeek and Qwen) already touch the global technical frontier.
How you use them, practically
On the official cloud (free in most cases): chat.deepseek.com, chat.qwen.ai. Both also offer mobile apps. Good for trying them out.
Via API: deepseek.com for developers, dashscope.aliyun.com for Qwen. Per-token pricing noticeably lower than OpenAI or Anthropic. For high-volume scenarios where cost matters.
Local with open weights: download from Hugging Face, run with Ollama (easier) or LM Studio. DeepSeek-R1-Distill-Qwen-14B or Qwen-2.5-14B, in quantized form, run acceptably on a laptop with 32 GB of RAM. Smaller versions (3B and, on high-end devices, 7B) can run even on phones.
Through alternative cloud providers (Together, Fireworks, Groq): they host Chinese models on American hardware with good latency. A solution for companies that want the models but don't want to send data to China.
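For the local option, a quick back-of-the-envelope check explains why a 14B model fits on a 32 GB laptop only when quantized. The sketch below is a rough estimate of weight memory alone (it ignores the KV cache and runtime overhead) plus a helper that builds a request for Ollama's local `/api/generate` endpoint. The model tag `qwen2.5-coder:14b` and the default port are assumptions; check `ollama list` on your machine.

```python
import json
import urllib.request


def estimate_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Ignores the KV cache, activations, and runtime overhead, so real usage is higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def build_ollama_request(model: str, prompt: str,
                         host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generation request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # A 14B model at 16, 8, and 4 bits per weight.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{estimate_weight_memory_gb(14, bits):.0f} GB of weights")

    # Assumed model tag; the request is only constructed here, not sent.
    request = build_ollama_request("qwen2.5-coder:14b", "Write a Python function that reverses a string.")
    print(request.full_url)
    # To actually call a running server:
    #   with urllib.request.urlopen(request) as response:
    #       print(json.loads(response.read())["response"])
```

At 16 bits the weights alone take about 28 GB, which leaves almost nothing for the operating system; at 4 bits they take about 7 GB, which is why quantized builds are the practical choice on a laptop.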
Honest comparison of capabilities
Coding: Qwen-2.5-Coder and DeepSeek-R1 competitive with Claude 3.5 Sonnet on benchmarks like HumanEval and MBPP (source: LMArena, Hugging Face model cards). On agentic multi-file tasks Claude still usually wins.
Math and reasoning: DeepSeek-R1 competes with OpenAI's o1 on AIME and MATH. One of the domains where Chinese AIs are at frontier level.
Long-form writing: Claude and ChatGPT stay ahead. DeepSeek and Qwen produce competent text but less cohesion in longer pieces.
Spanish: Claude, ChatGPT and Gemini take a clear lead. Chinese AIs prioritize English and Chinese — Spanish works but with idiomatic errors.
Multimodal: Qwen-VL is reasonably competitive on vision. Gemini still leads on evenly-balanced multimodal integration.
The geopolitical piece and the chips
This is the chapter that gives context to everything. Since 2022, US export controls, introduced under Biden and kept in place under Trump, have restricted sales to China of top-tier NVIDIA chips such as the A100 and H100. The idea was to slow Chinese AI down. The partial result has been the opposite: DeepSeek and others trained competitive models on lesser hardware (H800, domestic chips), forced to innovate on training efficiency. DeepSeek declared training costs roughly 10x lower than estimates for OpenAI's o1. The industry debates whether those numbers are fully verifiable and what they leave out, but few dispute that the training run itself was far cheaper.
An open question
If Chinese open-source models keep improving at the 2024-2025 pace, how much of the value you pay for today through Claude/ChatGPT/Gemini could be free and local in 18 months? If the topic pulls you in, continue with "Chinese AI: DeepSeek, Qwen, and the other side of the race" and with "Open versus closed models."
Monday, January 27, 2025, opened strangely on Wall Street. An app nobody had heard of the previous Friday was number one on the US App Store, above ChatGPT. It was called DeepSeek. It was Chinese. Free. And when you opened it, it answered, in a style very close to OpenAI's o1, questions that until that day supposedly only American models could handle well.
At 9:30 a.m. the markets opened. Nvidia dropped. Dropped hard. By the closing bell it had lost around $600 billion in market cap (source: Bloomberg, January 27, 2025), the largest single-day absolute value drop for a single company in US stock market history. The shock wasn't just that a Chinese AI worked well. It was that a small lab had built it, working with restricted chips, and at a declared cost far below what the industry assumed was needed.
That's how Chinese AI entered the mass global conversation.
The two you can use today without installing anything
DeepSeek — chat.deepseek.com. Free, email registration. Two main models: V3 (general, like ChatGPT) and R1 (reasoning, shows the thinking steps like o1). Especially strong in math, logic, and coding.
Qwen: chat.qwen.ai. Alibaba's model (Alibaba is roughly the Chinese Amazon). Also free, also with several models: general Qwen, Qwen-Coder for programming, Qwen-VL for image analysis. Its open-weight ecosystem is one of the largest on Hugging Face.
Both have mobile apps on the App Store and Play Store.
What to know before trusting them
- Censorship: questions about Tiananmen, Taiwan, Xi Jinping, or criticism of the Chinese government get evasive answers or no answer. Not a bug — product policy.
- Language: they work well in English and Chinese. In Spanish they're okay, but noticeably below Claude, ChatGPT or Gemini. If your work is in Spanish, keep them as a secondary tool, not a default.
- Privacy: data you send to the official chat may fall under Chinese jurisdiction. For work with sensitive info (clients, finance, internal docs), they aren't the prudent choice.
Three things to take with you
- DeepSeek and Qwen are the first case where non-American AIs entered the technical frontier group photo. Worth trying, if only to know.
- For professional work in Spanish with sensitive data, Claude, ChatGPT or Gemini are still the safer bets. The Chinese ones are complements, not defaults.
- If you care about reasoning or coding and English is fine, DeepSeek-R1 is a free preview of what a year ago cost $20/month in ChatGPT Plus.
On January 20, 2025, DeepSeek published the DeepSeek-R1 paper along with open weights for the model. The document described a surprisingly direct recipe: start from DeepSeek-V3 (a mixture-of-experts base model, 671B total parameters, ~37B activated per token), apply large-scale reinforcement learning with verifiable-output rewards (math, code), then distill the behaviors to smaller models. The declared training cost that circulated, USD 5.576 million, comes from the DeepSeek-V3 technical report and covers only the GPU time of the base model's final training run, not research, failed experiments, or staff. If accurate, that's an order of magnitude below estimates for OpenAI's o1 (reported above $100M by analysts like The Information and Epoch AI).
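The USD 5.576 million figure is simple arithmetic: GPU-hours multiplied by an assumed rental price. The sketch below reproduces it using the GPU-hour breakdown and the USD 2 per H800 GPU-hour rate as reported in the DeepSeek-V3 technical report. Treat the inputs as the lab's own declared numbers, not independently audited ones.

```python
# Declared H800 GPU-hours per stage, as reported in the DeepSeek-V3 technical report.
DECLARED_GPU_HOURS = {
    "pre_training": 2_664_000,
    "context_extension": 119_000,
    "post_training": 5_000,
}

# Rental price assumed by the report, in USD per H800 GPU-hour.
ASSUMED_PRICE_PER_GPU_HOUR = 2.0


def training_cost_usd(gpu_hours_by_stage: dict, price_per_gpu_hour: float) -> float:
    """Total cost = sum of GPU-hours across stages x hourly rental price."""
    return sum(gpu_hours_by_stage.values()) * price_per_gpu_hour


if __name__ == "__main__":
    total_hours = sum(DECLARED_GPU_HOURS.values())
    cost = training_cost_usd(DECLARED_GPU_HOURS, ASSUMED_PRICE_PER_GPU_HOUR)
    print(f"Total GPU-hours: {total_hours:,}")
    print(f"Declared cost:   USD {cost:,.0f}")
```

The formula makes the scope of the number visible: it prices one training run at a rental rate, and nothing else. Hardware purchases, salaries, and earlier experiments sit outside it.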
The technical community took it with healthy skepticism. Independent audits can't verify the declared cost, because DeepSeek didn't publish detailed training logs. Analysts like SemiAnalysis estimated the real cost, including infrastructure, people, and prior experiments, at more than $1 billion: well below American competitors but well above the single number that circulated in the press.
Even with that caveat, the technical lesson holds: the reasoning frontier can be reached with less compute than previously assumed, given the right architecture (MoE + multi-head latent attention) and training method (pure RL with GRPO instead of PPO, dropping the critic model).
Architecture that matters
Multi-head Latent Attention (MLA): DeepSeek innovation that compresses keys and values into a smaller latent space, dramatically reducing memory use during inference. It allows sustaining long contexts on less hardware.
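A minimal sketch of the caching idea behind MLA, in plain Python: instead of storing full keys and values for every past token, store one small latent vector per token and rebuild keys and values from it when needed. This is a simplification and not DeepSeek's implementation; it omits the separate positional (RoPE) key and the per-head layout, and all weights are random. The sizes in the per-token comparison (128 heads of dimension 128, a 512-wide latent plus a 64-wide positional part) are assumptions for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix, stored as nested lists, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned projections."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LatentKVCache:
    """Caches one compressed latent per token; keys and values are rebuilt on demand."""

    def __init__(self, d_model, d_latent, d_kv, seed=0):
        rng = random.Random(seed)
        self.w_down = random_matrix(d_latent, d_model, rng)  # hidden state -> latent
        self.w_up_k = random_matrix(d_kv, d_latent, rng)     # latent -> keys
        self.w_up_v = random_matrix(d_kv, d_latent, rng)     # latent -> values
        self.latents = []

    def append(self, hidden_state):
        # Only the small latent vector is stored for each new token.
        self.latents.append(matvec(self.w_down, hidden_state))

    def keys_and_values(self):
        keys = [matvec(self.w_up_k, c) for c in self.latents]
        values = [matvec(self.w_up_v, c) for c in self.latents]
        return keys, values

    def cached_floats(self):
        return sum(len(c) for c in self.latents)


def standard_kv_floats_per_token(n_heads, head_dim):
    """Standard attention caches a key and a value vector for every head."""
    return 2 * n_heads * head_dim


def mla_floats_per_token(d_latent, d_rope):
    """Latent caching stores the latent plus a small positional part."""
    return d_latent + d_rope


if __name__ == "__main__":
    cache = LatentKVCache(d_model=16, d_latent=4, d_kv=32)
    for _ in range(3):
        cache.append([0.5] * 16)
    print("floats cached for 3 tokens:", cache.cached_floats())

    standard = standard_kv_floats_per_token(n_heads=128, head_dim=128)
    latent = mla_floats_per_token(d_latent=512, d_rope=64)
    print(f"per token, per layer: {standard} vs {latent} floats")
```

The trade is extra matrix multiplications at read time in exchange for a much smaller cache, which is what matters when context length, not arithmetic, is the memory bottleneck.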
Deep mixture-of-experts (256 experts in V3): activates only a fraction (~37B of 671B) per token, keeping inference compute manageable while the total model capacity is large.
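The sparse-activation idea can be sketched with a generic top-k router: score every expert, run only the k best, and mix their outputs by renormalized gate weights. This is a textbook softmax router written for illustration, not DeepSeek-V3's exact gating (which also uses shared experts and its own load-balancing scheme); the expert functions and logits below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return {expert_index: gate_weight} for the k highest-scoring experts.

    Gate weights are renormalized so they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    selected_mass = sum(probs[i] for i in top)
    return {i: probs[i] / selected_mass for i in top}


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and combine their outputs by gate weight."""
    gates = route_top_k(router_logits, k)
    output = [0.0] * len(x)
    for index, gate in gates.items():
        expert_output = experts[index](x)  # unselected experts are never called
        output = [o + gate * y for o, y in zip(output, expert_output)]
    return output, sorted(gates)


if __name__ == "__main__":
    # Eight toy experts; expert i just scales its input by i.
    experts = [lambda x, scale=i: [scale * v for v in x] for i in range(8)]
    logits = [0.0, 5.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0]
    output, active = moe_forward([1.0, 2.0], experts, logits, k=2)
    print("active experts:", active)
    print(f"share of parameters active per token at 37B of 671B: {37 / 671:.1%}")
```

Compute per token scales with the experts that actually run, while capacity scales with all of them; that gap is the whole point of the design.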
GRPO (Group Relative Policy Optimization): variant of PPO without a critic model, designed specifically for tasks with verifiable rewards. Saves compute and memory in RL.
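The core trick that lets GRPO drop the critic can be sketched in a few lines: sample several answers to the same prompt, score each with a verifiable reward, and use the group's own mean and standard deviation as the baseline. The sketch below covers only that advantage computation, not the full policy update (clipping, KL penalty). The exact-match reward and the choice of population standard deviation are assumptions for illustration.

```python
import statistics


def exact_match_reward(answer: str, expected: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    The group statistics play the role a learned critic plays in PPO.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to the same math prompt; the expected answer is "42".
    sampled_answers = ["42", "41", "forty-two", " 42 "]
    rewards = [exact_match_reward(a, "42") for a in sampled_answers]
    advantages = group_relative_advantages(rewards)
    for answer, advantage in zip(sampled_answers, advantages):
        print(f"{answer!r:>12}: advantage {advantage:+.2f}")
```

Correct answers get a positive advantage and wrong ones a negative advantage. If every sample in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.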
Reasoning via R1-Zero → R1: the team first trained with pure RL and no supervised fine-tuning (R1-Zero) and found long chain-of-thought patterns emerging spontaneously. For R1 they then added a short supervised fine-tuning stage, mainly for readability.
These pieces aren't unprecedented breakthroughs (MoE is old, RL with verifiable rewards is old, and reasoning LLMs follow the o1 paradigm), but the specific combination, executed on restricted chips, is what made the moment.
Qwen: the open-source volume play
While DeepSeek won the press cycle, Alibaba's Qwen won the quiet adoption cycle. On Hugging Face, the Qwen family is consistently among the most downloaded open-weight models. Qwen-2.5, Qwen-2.5-Coder, Qwen-2.5-VL, Qwen-Audio — they cover text, code, vision, audio with published weights.
Alibaba's strategy is clear: compete on the open-source frontier, grow an ecosystem (fine-tunes, quantizations, integrations), and monetize on top through Alibaba Cloud and its Model Studio platform. It's the playbook Microsoft eventually ran with Linux: don't fight open, embrace it and sell the infrastructure.
Qwen-2.5-Max (January 2025) went head-to-head with GPT-4o and Claude 3.5 Sonnet on several benchmarks. Qwen-2.5-Coder-32B became the go-to open coding model, fine-tuned by the community for hundreds of vertical use cases.
Geopolitics: the restrictions that accelerated efficiency
The most interesting thesis of 2025 is that tech sanctions produced the opposite effect from the one intended. Banning H100 and A100 exports to China in 2022-2024 forced Chinese labs to train on H800 (with interconnect-bandwidth cuts) and domestic chips (Huawei Ascend). The result was systemic focus on efficiency: architectures like MLA and deep MoE emerged because they were the ways to squeeze more juice out of less compute.
Meanwhile, OpenAI, Anthropic and Google operated without chip restrictions and could assume "scaling solves almost everything." That comfort turned into a competitive disadvantage when the technical ceiling was reached with less.
Dario Amodei (Anthropic) responded to the DeepSeek moment with an essay arguing the paper doesn't invalidate the scaling curve: R1, like every model, would still get better with more compute. What changed isn't that "compute no longer matters"; it's that the compute needed to enter the frontier dropped. That's a different story, less dramatic than the headlines but strategically more relevant.
Real limitations for the Latin American professional user
- Political censorship: on Chinese-government topics the refusal is often categorical, not merely evasive. If your work touches geopolitics, journalism, or Asian market analysis, that's a product limitation.
- Spanish quality: acceptable but with idiomatic and cultural errors more frequent than in Claude/GPT/Gemini. The training data is dominated by English and Chinese.
- Output safety: less mature than in models aligned with constitutional AI or enterprise-grade RLHF. More prone to producing inappropriate outputs in edge cases.
- Privacy and jurisdiction: data sent to official Chinese APIs potentially falls under Chinese jurisdiction. For regulated enterprises, compliance is an open question corporate legal departments are only beginning to answer.
- Latency: for Latin American users, latency from chat.deepseek.com or chat.qwen.ai can be higher than Claude/ChatGPT (which have global edge presence). Mitigated by using Together, Fireworks, Groq or other providers hosting Chinese models on American infrastructure.
Editorial thesis: the end of the compute moat
The official story of 2022-2024 was: "the moat of frontier labs is compute — nobody else has the H100s or the capital." The real story of 2025 is: compute helps, but it isn't enough of a moat. DeepSeek showed that a relatively small team, with hardware restrictions, reaches the frontier with a better recipe. That's a strategic piece of news the industry is still digesting.
What does it mean for a professional in Buenos Aires, São Paulo or Mexico City? Three concrete things:
One, per-token cost will keep falling. Chinese APIs compete on price, American ones have to respond. For high-volume use cases (customer service, mass analysis, content generation at scale), the cost of AI stops being the constraint.
Two, local models become a real option. For privacy or offline tasks, being able to run a decent model on your own laptop changes the math. It doesn't replace Claude for critical tasks, but it can cover a large share of daily use (around 70%, as a rough estimate) for free.
Three, dependence on a single lab decreases. The professional stack for the next year isn't "Claude or ChatGPT" — it's "Claude or ChatGPT as default, DeepSeek/Qwen as a high-volume alternative, local as a privacy option." That three-layer architecture is more robust and cheaper than 2023's dependency.
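The three-layer stack can be expressed as a small routing rule. The sketch below is one hypothetical policy, not a recommendation from any vendor: sensitive work stays local, non-sensitive bulk work goes to a cheaper DeepSeek/Qwen API, and everything else uses the default assistant. The `Task` fields and the priority order (privacy before cost) are assumptions you would adjust to your own constraints.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DEFAULT = "Claude or ChatGPT"
    HIGH_VOLUME = "DeepSeek/Qwen API"
    LOCAL = "local open-weight model"


@dataclass(frozen=True)
class Task:
    description: str
    sensitive: bool = False    # client data, finance, internal documents
    high_volume: bool = False  # bulk jobs where per-token cost dominates


def choose_tier(task: Task) -> Tier:
    """Pick a layer for a task. Privacy is checked first, then cost."""
    if task.sensitive:
        return Tier.LOCAL
    if task.high_volume:
        return Tier.HIGH_VOLUME
    return Tier.DEFAULT


if __name__ == "__main__":
    tasks = [
        Task("Summarize an internal contract", sensitive=True),
        Task("Classify 200,000 support tickets", high_volume=True),
        Task("Draft a client proposal in Spanish"),
    ]
    for task in tasks:
        print(f"{task.description}: {choose_tier(task).value}")
```

Checking sensitivity before volume means a sensitive bulk job still stays on your own hardware, which trades throughput for keeping the data in house.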
For the Spanish speaker producing serious professional work, the 2026 recommendation stays Claude, ChatGPT or Gemini as primary workhorses. But knowing that an open-weight model with o1-level technical reasoning exists, works, and is free changes the conversation about how wide the American labs' moat really was. It wasn't that wide. 2025 proved it.