Meta and Llama — the quietest AI rollout in the world

TL;DR

Meta built the most distinctive strategy in the sector. On one side, it opened the weights of its Llama models to the world (Llama 2 in July 2023, Llama 3 in April 2024, Llama 4 in April 2025) and effectively founded the modern open-weights community. On the other, it bolted Meta AI inside WhatsApp, Instagram, and Facebook: by Meta's own count, over 500 million monthly active users interact with it without thinking about it. The real strength is open-weights: you can host Llama 3.3 70B on reasonable hardware, pay nothing per token, and keep your data inside your own infrastructure. For companies with privacy requirements, that's gold. The real weakness: for general users, Meta AI is good enough but not better than Claude or ChatGPT on complex tasks. Its biggest edge is distribution through WhatsApp, not model quality.

✦ Summarized with Claude at publish time

In late February 2023, a team of researchers at Meta AI Research (FAIR) published a paper titled "LLaMA: Open and Efficient Foundation Language Models." Access to the model was restricted: academic researchers had to fill out a form and sign a use agreement. Five days after publication, someone leaked the weights on 4chan. A BitTorrent link made its way through Twitter in under twenty-four hours.

Inside Meta, the initial reaction was what you'd expect — lawyers on alert, security teams running their playbook. But in the weeks that followed something unusual happened. The technical community started publishing fine-tunes, optimizations, versions running on ordinary laptops. Alpaca appeared, Vicuna appeared, a long list of derivatives followed. An ecosystem bloomed without Meta spending a dollar on evangelism.

In July 2023, Meta made the call that redefined its strategy. It shipped Llama 2 under a license that included commercial rights. It wasn't an apology for the leak. It was an acknowledgment: the accident had revealed where the moat really sat.

That acknowledgment is the key to understanding why Meta competes so differently from Anthropic and OpenAI.

Yann LeCun's thesis

Yann LeCun, Meta's longtime Chief AI Scientist, is a co-recipient of the 2018 Turing Award (shared with Geoffrey Hinton and Yoshua Bengio), recognized in large part for his foundational work on convolutional networks. He has been the public face of Meta's open-weights stance. His argument has two legs.

The first is technical. LeCun holds — and many researchers agree with him — that current autoregressive language models (GPT-4, Claude, Llama) aren't the path to genuinely general capability. In his read, the qualitative jumps will come from different architectures entirely. If that thesis holds, training one more model and charging a premium for it isn't the long-game winning move.

The second leg is strategic. If the base model ends up a commodity within a few years, whoever gives it away today builds an ecosystem; whoever charges for it today ends up tied to a moat that's eroding. The term of art here is "commoditize your complement," a phrase Joel Spolsky popularized in 2002 and that Meta has applied to AI more consistently than anyone else. If what your rival sells at a premium becomes free, the rival loses its moat.

Putting the two readings together, Meta concluded that base models are the complement, and that what needs protecting is distribution (WhatsApp, Instagram, Facebook) plus the data feeding its advertising business.

The Llama sequence and the infrastructure behind it

Worth walking through the release cadence to see the company's pace.

Llama 1 (February 2023). Four sizes: 7B, 13B, 33B, 65B parameters. Restricted access. Leaked on 4chan five days later. The modern open-source community starts here.

Llama 2 (July 2023). 7B, 13B, 70B. Permissive license with commercial rights. A foundational milestone for the open-weights industry.

Llama 3 (April 2024). 8B and 70B initially. The 70B became the workhorse for anyone wanting to host their own AI without depending on third parties. In July 2024, with Llama 3.1, came the 405B, competitive with GPT-4 on the main benchmarks.

Llama 3.1, 3.2, 3.3 (July, September, and December 2024). Successive iterations. Llama 3.3 70B matched the Llama 3.1 405B on many benchmarks with roughly a sixth of the parameters, a demonstration that efficiency still has room to run.

Llama 4 (April 2025). Mixture-of-Experts architecture. The largest model in the family, Behemoth, was previewed at roughly 2 trillion total parameters (not all active per token) but was not released alongside the smaller Scout and Maverick. It marked Meta's move to MoE, the path Google had publicly taken with Gemini 1.5 and OpenAI is widely reported to have taken with its frontier models.

The infrastructure enabling that cadence is aggressive. On its Q4 2024 earnings call, Meta reported roughly $39 billion in capex for 2024 and guided to $60-65 billion for 2025, which puts it among the heaviest infrastructure spenders in the sector, alongside the cloud hyperscalers. In January 2024, Meta said it expected to have roughly 350,000 NVIDIA H100 GPUs installed by the end of that year. Those figures come from the company itself and should be read as "interested-party data," the same caution we apply to similar metrics across our coverage of the broader landscape.

Meta AI in WhatsApp: the other play

While Llama grew as the open-weights standard, Meta ran a parallel move: jam Meta AI inside the apps most of the world already has installed.

Today Meta AI lives in three places. Inside WhatsApp chats (type "@Meta AI" and it appears). In Instagram's search bar. In Facebook Messenger. The number Meta communicated in 2025 is over 500 million monthly active users — a figure that, if confirmed by independent measurement, would make Meta AI the generative AI with the most users in the world by absolute volume.

That figure carries an important asterisk. Most of those users didn't choose Meta AI as a tool. Meta AI chose the users: it showed up inside the app they already had open, with no explicit decision in between. That's the textbook definition of platform advantage.

The real strength: hostable open-weights

Where Meta genuinely punches hardest is in a specific professional segment: companies with privacy requirements that can't send data to a third-party cloud.

A law firm handling confidential matters, a hospital with medical records subject to strict regulation, a bank with transactional data: for compliance reasons, many organizations like these cannot ship that information to Anthropic or OpenAI. With Llama the story flips: you download the model, you run it on your own infrastructure, and the data never leaves your network.

Llama 3.3 70B runs on reasonable hardware (servers with four to eight GPUs) and delivers quality close to Claude or GPT-4 on extraction, classification, and summarization tasks. No per-token cost. No dependency on a vendor's uptime. No vendor lock-in. For that customer profile, Llama is the most established open-weights option on the market, and a very good one.
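To make "reasonable hardware" concrete, here is a back-of-the-envelope sketch of the memory needed just to hold Llama 3.3 70B's weights at different precisions. The 80 GB GPU size and the 20% headroom reserved for the KV cache and activations are illustrative assumptions, not Meta guidance; real deployments with long contexts or many concurrent users need more.

```python
import math

# Bytes per parameter at common serving precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billions: float, precision: str) -> float:
    """Gigabytes (1 GB = 1e9 bytes) needed just to hold the weights."""
    return params_billions * BYTES_PER_PARAM[precision]


def gpus_needed(params_billions: float, precision: str,
                gpu_gb: float = 80.0, headroom: float = 0.2) -> int:
    """Minimum GPU count if each GPU keeps `headroom` of its memory free.

    The free share stands in for KV cache and activations (an assumption;
    long contexts and large batches need more).
    """
    usable_gb = gpu_gb * (1.0 - headroom)
    return math.ceil(weight_gb(params_billions, precision) / usable_gb)


for precision in ("bf16", "int8", "int4"):
    print(f"Llama 3.3 70B @ {precision}: {weight_gb(70, precision):.0f} GB of weights, "
          f"at least {gpus_needed(70, precision)} x 80 GB GPU(s)")
```

At bf16 the sketch lands at three GPUs; tensor-parallel serving generally wants a GPU count that evenly divides the model's attention heads, so in practice that becomes four, and long contexts or high concurrency push toward eight. Quantizing to 4 bits fits the weights on a single 80 GB card, at some quality cost. The same arithmetic gives 13 GPUs for a 405B model at bf16, more than a standard 8-GPU server.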

The honest limitations

An honest pro-Claude read has to name what Meta doesn't do better.

For the individual professional who wants to delegate quality work — writing a careful report, analyzing a contract, reasoning through a complex problem across multiple turns — Meta AI and Llama aren't the first pick. Claude is still more consistent on literal instructions, more honest when it doesn't know something, and better at holding coherence across long conversations. That's a measurable work difference, not a theoretical detail.

The difference comes down to priority. Anthropic invests heavily in alignment and reliability with Constitutional AI. Meta invests heavily in scale, distribution, and efficiency. Two different bets. For serious professional use, Anthropic's bet pays off better. For ambient mass use on your phone and for private hosting at companies, Meta's bet is hard to beat.

To close, and to keep going

Meta is the hardest company to fit into a single narrative about the sector. It doesn't compete in the same lane as Anthropic (professional reliability), nor as OpenAI (mass consumer application with subscriptions), nor as Google (deep office-suite integration). Meta plays its own game: open-weights at the model layer, distribution built into apps users already have at the application layer, and aggressive capex at the infrastructure layer.

That strategy has clear winners. For companies with sensitive data, Llama is today the most established path. For five hundred million people already living inside WhatsApp, Meta AI is the de facto AI, even if they have no idea Llama exists. For the individual professional producing delegable work, Claude or ChatGPT still deliver more.

To see how this bet compares with those of the other major labs, The AI race sets out the full map. To dig into how models get measured honestly (benchmarks published by the companies themselves need to be read with care), How AIs are measured is the next link.

Does your company have data that can't leave your network, or is your daily AI use exploratory and living on top of your phone? That question tells you which face of Meta actually serves you.

Open WhatsApp. Look at the bottom right corner of your chat list. There's a small circular blue button that says "Meta AI."

That button sits on the phones of five hundred million people who use it every month. Most of them have never heard the words "Llama" or "language model." Some don't even know there's an AI in there — they think it's just another chat feature, like stickers.

That's probably the quietest AI rollout in the world.

The company that made two bets at once

Meta — the owner of Facebook, Instagram, and WhatsApp — made an odd call three years ago. Instead of selling its AI as a subscription, it gave it away.

Llama launched in February 2023. At first it was a model meant for researchers, with restricted access. Within days, someone leaked the weights on the internet. Instead of a disaster, the leak became a strategy: Meta saw how quickly the technical community had rushed to Llama and decided to build on the accident. Starting with Llama 2, in July that same year, it released the weights officially.

Why give away something that costs billions to train? Because if every developer in the world learns to work with your model, the ecosystem grows around you. You don't sell the model. You sell — or protect — everything that sits on top of it.

What's already in your phone

But Meta's play isn't only about giving models away. The other half is mass distribution.

Meta AI is glued inside WhatsApp, Instagram, and Facebook. Type "@Meta AI" inside a chat and it shows up. Tap the magnifying glass on Instagram and the search bar now has AI in it. Friction to use it is zero. No new account, no new app to download, no new site to learn.

The result is that, by Meta's own count, Meta AI became the most-used AI on the planet by raw number of people, even though almost nobody uses it as their main work tool. It's pass-through AI: you ask it things while you're already there for something else.

What it's good for, and what it isn't

Meta AI inside WhatsApp works well for a bunch of small daily questions. Translate this. Summarize this long article. Draft a birthday toast. Tell me what happened today with this news story. Quick questions where "good enough" beats "perfect but in another app."

It doesn't work as well for serious work. If you need to review a contract, analyze a spreadsheet, write a report going to a client — Claude or ChatGPT are still more careful, more honest when they don't know something, more consistent when you give them long specific instructions.

The practical rule is simple. Meta AI for what you're doing already inside your phone. Claude or ChatGPT for when you open the laptop and say "now to work."

Llama, the model you can download

There's a use of Meta's tech that most people don't know about, and it's the most interesting one for companies: Llama is downloadable.

If you're a medical consultancy, a law firm, a bank, or any company working with data that can't leave your network, you can download Llama 3.3 70B, install it on your own servers, and use it with nothing going over the public internet. Neither Meta nor anyone else sees what you ask. For those use cases, Llama is the best-known option on the market. Claude and ChatGPT, by design, run in their providers' clouds, so your data has to leave your network to reach them.

That takes a technical team that knows how to install and maintain the model. It's not plug-and-play. But for companies with hard privacy requirements, that technical work is worth it, because the hosted alternatives are off the table.

What to take away

Three things worth holding onto:

Meta won on distribution, not on quality. By Meta's own count, Meta AI is the most-used AI in the world by headcount, but almost nobody picks it because it's the best. They pick it because they already have it open in WhatsApp. That's a platform advantage, not a product advantage.

Open-weights Llama changed the game for companies. Before 2023, if you wanted capable AI without shipping data outside, you had few good options. Now Llama exists and runs on-premise. Not for you as an individual, but for your bank, your clinic, your firm.

For professional work, Meta AI isn't your first pick. For quick questions and translations inside your phone, it delivers. For a contract, a real analysis, or code headed to production, open Claude or ChatGPT. Each tool has its place.

Meta's most consequential strategic decision in AI wasn't training Llama. It was deciding, after the February 2023 leak, that the leak was information rather than a problem: that the competitive moat didn't live in the model weights, and that giving them away was consistent with the company's value structure. That decision, partly traceable to Yann LeCun's reading and executed by Mark Zuckerberg with aggressive capex, defined Meta's differential position in the sector. It is worth taking apart carefully.

The "commoditize your complement" thesis, applied

Joel Spolsky's framework in "Strategy Letter V" (2002) identifies a recurring dynamic in tech markets. Every product has complements: goods consumed alongside the product, whose demand moves with it. The winning strategy, per Spolsky, is to identify the complements of your own product and work to make those complements commodities — cheap, abundant, interchangeable. Because when the complement is cheap, demand for your own product rises.
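A toy calculation makes Spolsky's mechanism concrete. Assume, purely for illustration, that buyers need both the product and its complement, and that demand for the pair falls linearly with the combined price. Holding the product's own price fixed, cutting the complement's price raises both units sold and the product-seller's revenue. The demand curve and every number below are hypothetical.

```python
def demand(product_price: float, complement_price: float,
           max_units: float = 1000.0, slope: float = 10.0) -> float:
    """Units sold under a hypothetical linear demand on the bundle's total price."""
    total_price = product_price + complement_price
    return max(0.0, max_units - slope * total_price)  # demand cannot go negative


def revenue(product_price: float, complement_price: float) -> float:
    """Revenue of the firm that sells only the product, not the complement."""
    return product_price * demand(product_price, complement_price)


# The product price stays fixed at 30; only the complement's price moves.
for complement_price in (40.0, 20.0, 0.0):
    units = demand(30.0, complement_price)
    print(f"complement at {complement_price:>4.0f}: {units:.0f} units, "
          f"product revenue {revenue(30.0, complement_price):.0f}")
```

In the Meta reading, the "product" is attention and ad inventory and the "complement" is the base model; the toy model only shows the direction of the effect, not its size.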

The classic example: IBM backed Linux because the operating system was a complement to what IBM actually sold, its hardware and services. If Linux became free and ubiquitous, proprietary operating systems such as Windows lost their premium pricing, and IBM's servers and consulting became more attractive.

The analogous read for Meta: base AI models are a complement to its distribution platform and its advertising business. Meta's customers are advertisers. The value Meta captures comes from attention captured in WhatsApp, Instagram, and Facebook, and from the targeting quality of its ad algorithms. If AI models become commodities (free, abundant, interchangeable), the position of rivals that depend on pricing them at a premium (OpenAI, Anthropic) weakens, and Meta's position, which monetizes attention and targeting, strengthens relatively.

LeCun has defended this read in public talks and in X threads with a consistency that goes beyond corporate messaging. There's an honest intellectual dimension to the bet alongside the strategic one.

Technical architecture: the Llama line

The Llama architecture followed the Transformer decoder-only trend, with specific optimizations worth naming.

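Llama 1 adopted Rotary Position Embeddings (RoPE) and SwiGLU as the feed-forward activation, along with RMSNorm pre-normalization. None of these were new (all had been explored in earlier academic papers), and the bigger efficiency lever was training relatively small models on far more tokens (1 to 1.4 trillion). Together, those choices let LLaMA-13B outperform GPT-3 (175B) on most of the benchmarks the paper reported.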
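For readers who want to see what those two components do, here is a minimal pure-Python sketch. RoPE rotates consecutive pairs of a query or key vector by an angle proportional to the token's position, so the dot product between a rotated query and a rotated key depends only on their relative offset; SwiGLU gates one linear projection with the SiLU of another. The base of 10000 is the value used in the RoPE paper and in Llama 1; the interleaved-pair layout is one of two common conventions, and the tiny vectors and weights are illustrative, not Llama's real dimensions.

```python
import math


def rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Apply rotary position embedding to one query/key vector.

    Consecutive pairs (x[2i], x[2i+1]) are rotated by position * base**(-2i/d).
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out: list[float] = []
    for i in range(d // 2):
        angle = position * base ** (-2 * i / d)
        a, b = x[2 * i], x[2 * i + 1]
        out.append(a * math.cos(angle) - b * math.sin(angle))
        out.append(a * math.sin(angle) + b * math.cos(angle))
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def silu(z: float) -> float:
    """SiLU (swish): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def swiglu(x: list[float], w_gate: list[list[float]],
           w_up: list[list[float]]) -> list[float]:
    """Gated hidden activation silu(x @ W_gate) * (x @ W_up).

    Each weight matrix is given as a list of columns; in a full feed-forward
    block, a third (down) projection maps the result back to model width.
    """
    gate = [silu(dot(x, col)) for col in w_gate]
    up = [dot(x, col) for col in w_up]
    return [g * u for g, u in zip(gate, up)]


# Attention scores depend only on relative position: offsets (5, 2) and (13, 10) match.
q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
print(dot(rope(q, 5), rope(k, 2)), dot(rope(q, 13), rope(k, 10)))
```

The two printed scores agree up to floating-point rounding, which is the property that lets RoPE encode relative position without a learned position table.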

Llama 2 added Grouped Query Attention (GQA) in the larger models, cutting memory use and speeding inference with little quality loss. The Llama 2 Community License allowed commercial use under specific restrictions: any company whose products exceeded 700 million monthly active users on the Llama 2 release date had to request a separate license from Meta, a clause widely read as aimed at the largest platform rivals such as Google, Amazon, and Microsoft.
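The memory saving from GQA is easy to quantify. In standard multi-head attention every query head has its own key and value head, so the KV cache grows with the number of query heads; in GQA, groups of query heads share one key/value head. The sketch below assumes the published Llama 2 70B shape (80 layers, 64 query heads, 8 key/value heads, head dimension 128) and 16-bit cache entries; treat those as assumptions to check against the model's config file.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache one token occupies: one K and one V vector per KV head per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Index of the shared KV head that a given query head reads under GQA."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 80, 64, 8, 128  # assumed Llama 2 70B shape

mha = kv_cache_bytes_per_token(LAYERS, Q_HEADS, HEAD_DIM)   # one KV head per query head
gqa = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)  # 8 query heads share each KV head

print(f"MHA: {mha // 1024} KiB per token")  # 2560 KiB
print(f"GQA: {gqa // 1024} KiB per token")  # 320 KiB
print(f"4,096-token context: {mha * 4096 / 2**30:.2f} GiB vs {gqa * 4096 / 2**30:.2f} GiB")
```

An 8x smaller cache per sequence is what lets the same GPU serve longer contexts or more concurrent users.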

Llama 3 pushed training data scale to over 15 trillion tokens, roughly seven times the 2 trillion used for Llama 2. The 405B (released with Llama 3.1) was the first open-weights model that genuinely competed with the original GPT-4 on the main benchmarks. The honest technical critique is that the 405B, because of its size, isn't practically hostable without datacenter-class infrastructure; the "production-useful" model in the open-weights ecosystem remains the 70B.
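A quick ratio puts that data scale in context. The Chinchilla scaling work (Hoffmann et al., 2022) is often summarized as roughly 20 training tokens per parameter for compute-optimal training; Llama 3 went far beyond that, the usual rationale being that extra training compute buys smaller models that are cheaper to serve. The sketch uses the round figures from this section (2 trillion tokens for Llama 2, 15 trillion for Llama 3) and, as a simplification, applies the 15 trillion to all three Llama 3 sizes.

```python
CHINCHILLA_TOKENS_PER_PARAM = 20  # common rule of thumb from Hoffmann et al. (2022)

LLAMA2_TOKENS = 2e12
LLAMA3_TOKENS = 15e12  # simplification: same budget assumed for every size


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


print(f"Llama 3 vs Llama 2 data: {LLAMA3_TOKENS / LLAMA2_TOKENS:.1f}x")  # 7.5x

for name, params in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
    ratio = tokens_per_param(LLAMA3_TOKENS, params)
    multiple = ratio / CHINCHILLA_TOKENS_PER_PARAM
    print(f"Llama 3 {name}: ~{ratio:,.0f} tokens/param, "
          f"~{multiple:.0f}x the Chinchilla heuristic")
```

The smaller the model, the further past the heuristic it was trained, which is consistent with the 8B and 70B being the practical workhorses.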

Llama 3.3 70B — released late 2024 — is the most interesting engineering piece in the sequence. With 70B parameters, it matches or beats the original 405B on several benchmarks thanks to post-training improvements and high-quality synthetic data. It's the clearest empirical evidence that the frontier is no longer purely scale; it's data curation, fine-tuning methods, and distillation techniques.
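Of the levers named above, distillation is the easiest to show in a few lines. The sketch below is the generic textbook formulation (Hinton et al., 2015): the student is trained to match the teacher's temperature-softened output distribution via KL divergence, scaled by the squared temperature. It is not Meta's published Llama 3.3 recipe, and the logits and temperature are illustrative.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # the small model's prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 1.0, 0.1]
close_student = [1.8, 1.0, 0.2]
far_student = [0.1, 1.0, 2.0]
print(distillation_loss(close_student, teacher), distillation_loss(far_student, teacher))
```

In practice this term is usually mixed with the ordinary next-token loss on real data; the T² factor keeps its gradient scale comparable as the temperature changes.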

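Llama 4 (April 2025) pivots to Mixture-of-Experts. The series includes Scout (109B total, 17B active per token), Maverick (400B total, 17B active), and Behemoth (about 2T total, 288B active), the last previewed at launch rather than released. The MoE bet aligns Meta with architectures Google had publicly adopted a release cycle earlier and OpenAI is widely reported to use, and it lets Meta scale capability without scaling per-token compute linearly. The trade-off is serving complexity: every expert must stay loaded in memory even though only a few run per token, so the memory footprint tracks total rather than active parameters, and routing tokens to experts spread across many GPUs adds engineering overhead. MoE models therefore benefit less from modest infrastructure.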
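A minimal sketch of the routing idea, in pure Python, assuming a softmax router that picks the top-k experts per token plus an optional always-on shared expert. Meta's Llama 4 announcement describes a shared expert plus routed experts for Maverick; the exact gating details here are assumptions, not Meta's implementation. The last lines compute what fraction of each Llama 4 model's weights is active per token, using the figures above, to show why memory tracks total rather than active parameters.

```python
import math
from typing import Callable, Optional

Vector = list[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: Vector, router: list[Vector], experts: list[Expert],
              top_k: int = 1, shared: Optional[Expert] = None) -> tuple[Vector, list[int]]:
    """Route x to its top_k experts; only those experts (plus `shared`) run."""
    probs = softmax([sum(a * b for a, b in zip(x, w)) for w in router])
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the selected experts
    out = [0.0] * len(x)  # assumes experts preserve the vector's width
    for i in chosen:
        out = [o + (probs[i] / norm) * y for o, y in zip(out, experts[i](x))]
    if shared is not None:
        out = [o + y for o, y in zip(out, shared(x))]
    return out, chosen


# Active share of weights per token (billions of parameters, from the figures above).
LLAMA4 = {"Scout": (109, 17), "Maverick": (400, 17), "Behemoth": (2000, 288)}
for name, (total, active) in LLAMA4.items():
    print(f"{name}: {active / total:.1%} active per token, yet all {total}B stay in memory")
```

Compute per token follows the chosen experts, but every expert's weights must be resident, which is the serving trade-off described above.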

Benchmarks: evidentiary regime

Benchmarks published by Meta for Llama should be read under the same evidentiary regime applied to any data self-reported by an interested party. Indicative figures from the Llama 3.3 / Llama 4 cycle:

MMLU (general knowledge): Llama 3.3 70B and Llama 4 Maverick sit in the 85-88% band, comparable to frontier closed models in contemporary releases.
HumanEval (coding): Llama 3.3 70B around 88-90%.
GSM8K / MATH (mathematical reasoning): competitive on Llama 4, but below OpenAI's o3 tier by design, since Llama 4 is not a reasoning model that spends extended inference-time chain of thought by default.
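One reason to read those bands with care is plain sampling noise. Treating each benchmark item as an independent pass/fail trial (a simplification that ignores prompt and sampling variation, which only add noise), a normal-approximation 95% interval shows how wide the uncertainty is on a small benchmark. HumanEval has 164 problems and the MMLU test set about 14,000 questions; the accuracies plugged in are taken from the bands above.

```python
import math


def ci95_halfwidth(accuracy: float, n_items: int) -> float:
    """Normal-approximation 95% half-width for an accuracy measured on n items."""
    return 1.96 * math.sqrt(accuracy * (1.0 - accuracy) / n_items)


BENCHMARK_ITEMS = {"HumanEval": 164, "MMLU": 14_042}

for name, accuracy in [("HumanEval", 0.88), ("MMLU", 0.86)]:
    hw = ci95_halfwidth(accuracy, BENCHMARK_ITEMS[name])
    print(f"{name} at {accuracy:.0%}: +/- {hw * 100:.1f} points")
```

That comes out at roughly five points either way on HumanEval versus well under one point on MMLU: a two-point gap between models sits inside the noise on the first and is meaningful on the second, provided the evaluation setup (shots, prompting, chain of thought) is identical.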

On instruction-following consistency and low hallucination rates (the metrics where Claude historically leads), Llama models consistently sit below Claude Opus / Sonnet of the same era. That gap is smaller on mechanical tasks (extraction, structured classification) and larger on open-ended ones (professional writing, judgment-laden analysis).

The editorial read is that Meta isn't investing on the same qualitative frontier as Anthropic. It's investing on a different frontier — efficiency, open scale, mass distribution — and that strategic choice, not a technical shortfall, explains the consistency gap.

The capex bet and the legitimate doubt

Meta reported roughly $39 billion in capex for 2024 and, on its Q4 2024 earnings call, guided to $60-65 billion for 2025. That places it among the heaviest infrastructure spenders in the sector, in the same bracket as the cloud hyperscalers (Microsoft, Alphabet, Amazon), but without a public cloud business that rents the capacity out.

Can that investment be monetized? The honest financial answer is that the question is badly posed. Meta doesn't need Llama to generate direct revenue for capex to pay off. It needs three things in parallel:

First, Meta AI in WhatsApp / Instagram / Facebook has to raise engagement and session time. That monetizes through advertising.

Second, ad-targeting algorithms have to improve with more capable models trained on in-house infrastructure. That monetizes through higher CPMs.

Third, the Llama ecosystem has to capture mindshare and professional training in enough critical mass that the infrastructure and services layers around Llama are defensible against the eventual equivalent move from competitors — particularly Alibaba with Qwen and DeepSeek with their models.

All three propositions are reasonable. None is certainty. The legitimate doubt about Meta's bet isn't whether the capex is defensible in 2026 — it is, because Meta can absorb it from operating cash flow — but whether the advantage of being the Western open-weights leader will hold up against Chinese labs publishing more often under even more permissive licenses.

The open ecosystem and the balkanization risk

One factor that outside analyses often underweight is the ecosystem effect Llama generated. Hugging Face reports more than 100,000 derivative Llama models (fine-tunes, distillations, quantization-optimized versions). That's not just a vanity number: it's distributed cognitive infrastructure. Each fine-tune solving a vertical case (legal, medical, multilingual) makes the Llama ecosystem more valuable as the default.

The associated strategic risk is balkanization. If Qwen (Alibaba) and DeepSeek consolidate adoption in Asian markets with more permissive licenses and comparable quality — which they're already showing on contemporary benchmarks — the open ecosystem fragments. Meta as the Western leader, Qwen as the Asian leader, and Meta's moat becomes regional rather than global.

Meta's response to that risk appears to be release speed and scale. Aggressive capex is, read in that frame, a bet that absolute infrastructure (compute, data, concentrated talent) will hold up as a durable advantage even though the model layer has been commoditized.

Editorial thesis

I'll close with a thesis that goes past reporting and into evaluation.

The AI sector in April 2026 is stratifying into distinguishable layers with different winners. In the mass-consumer subscription application layer, OpenAI leads through mindshare and Microsoft distribution. In the delegable professional reliability layer, Anthropic leads through alignment discipline and operational consistency. In the deep office-suite integration layer, Google has a structural advantage because it owns the suite. In the infrastructure and open-ecosystem layer, Meta leads — and is, arguably, the only Western major with a coherent strategy in that specific layer.

That stratification has a useful implication for the professional picking tools. The right question isn't "which AI is best?" The right question is "which layer does my work live in?" If your work is delegable production with high stakes, Claude. If it's exploration with Office integration, ChatGPT via Copilot. If it's deep Workspace integration, Gemini. If it requires on-premise AI for compliance, Llama.

Meta's differential strength is the fourth category. It's narrow, but it's real, and it's the only category where Anthropic's or OpenAI's position is structurally weak — not from a technical shortfall but because their business models depend on the model running in their cloud. Meta played to occupy the slot the others can't occupy without cannibalizing their own business.

That's the bet that defines Meta in AI. Barely three and a half years after the ChatGPT moment, that bet is paying off where it needs to pay off. And the capex trajectory suggests Meta is going to defend that position with an intensity the rest of the sector will have to take seriously.

Which layer of the AI stack does your professional work live in, and which provider has the bet most coherent with that layer?