Analysis · History & Fundamentals · Edition #0004

How an AI "thinks" — no jargon, for humans

AI doesn't think. It guesses the next word. Here's how — and why that's enough to change how you work.

Germán Falcioni April 12, 2026
✦ Reading: 8 min
Illustration: neural autocomplete — prediction at massive scale
TL;DR

AI predicts the next word based on text it has read. It works well because it has read a lot. But it's predicting, not understanding.

✦ Summarized with Claude at publish time

An ordinary morning on your phone

You open WhatsApp. You type: "Hey, how are…". The keyboard suggests "you." It didn't think. It predicted.

A generative AI does exactly the same thing, but at a scale that's hard to picture. It's called a Large Language Model, and it's a word-prediction machine trained on so much text that its predictions end up sounding intelligent.

But they aren't. They're probabilities. That's the trick, and that's the limit.

How it was trained

Creators feed the model brutal amounts of text. GPT-3 was trained on roughly 300 billion tokens, according to the original paper (Brown et al., 2020, NeurIPS). Later models used even more. We don't have exact public numbers for the latest Claude or GPT-4, but they're in the same order of magnitude or higher.

The model reads all that and learns correlations. It doesn't "understand" concepts. It learns: after "the cat" you probably get "is," "jumps," or "meows" — almost never "studies math."
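To make "learning correlations" concrete, here is a minimal sketch that counts which word follows "the cat" in a tiny invented corpus. The corpus and the function name `next_word_probabilities` are made up for illustration. A real model does not store a count table like this; it only shows the idea of probabilities coming from text that was read.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; a real model reads billions of tokens.
corpus = (
    "the cat is asleep . the cat jumps . the cat is hungry . "
    "the cat meows . the dog is asleep ."
).split()

# Count which word follows each pair of consecutive words.
follow_counts = defaultdict(Counter)
for first, second, nxt in zip(corpus, corpus[1:], corpus[2:]):
    follow_counts[(first, second)][nxt] += 1

def next_word_probabilities(first, second):
    """Turn raw counts into probabilities for the word after (first, second)."""
    counts = follow_counts[(first, second)]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

probs = next_word_probabilities("the", "cat")
print(probs)                       # "is" comes out as the most likely word
print(probs.get("studies", 0.0))   # never seen after "the cat" in this corpus
```

In this toy corpus "studies" gets probability zero because it never appeared. A trained model would give it a very small probability instead of exactly zero.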

That learning freezes into neural weights: numbers that set the strength of the connections between simulated neurons. GPT-3 has 175 billion of those weights. That's the official figure, from the same paper. The biggest 2024 models have more, though companies publish fewer details every year.

How it predicts in real time

You type: "Analyze this balance sheet and…". The model does this:

First. It splits your text into tokens (words or sub-word fragments) and converts each one into a list of numbers. Those lists are called embeddings and live in a mathematical space with thousands of dimensions.

Second. It runs those numbers through layers of multiplications and additions. Many layers.

Third. At the end it calculates a score for every possible word in its vocabulary (around 50,000 words or sub-word fragments for GPT-2, as a verifiable reference point).

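Fourth. In the simplest setup, the word with the highest score wins. That's the next one. (In practice, chat tools often pick at random among the top-scoring words, which is why the same question can get different answers.)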

Fifth. That word gets fed back into the system. And it predicts the next. And another.

That's how a whole paragraph gets generated, one word at a time, while you watch it appear on screen.
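The five steps can be sketched as a toy loop. Everything here is an assumption made for illustration: the eight-word `VOCAB`, the four-number embeddings, the single layer, and the random, untrained weights. Averaging the vectors is a crude stand-in for what a real model does with its context. The output is gibberish on purpose; the point is the mechanics, not the quality.

```python
import random

VOCAB = ["analyze", "this", "balance", "sheet", "and", "revenue", "expenses", "."]
DIM = 4  # real models use thousands of dimensions

rng = random.Random(0)  # fixed seed so the toy is repeatable
# Untrained, random weights: a trained model would have learned these numbers.
embeddings = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
layer = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
output = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def predict_next(token_ids):
    # First: turn each token id into its list of numbers.
    vectors = [embeddings[t] for t in token_ids]
    # Simplification: average the vectors into one summary of the context.
    summary = [sum(column) / len(vectors) for column in zip(*vectors)]
    # Second: one layer of multiplications and additions, then drop negatives.
    hidden = [max(0.0, h) for h in matvec(layer, summary)]
    # Third: one score per vocabulary entry.
    scores = matvec(output, hidden)
    # Fourth: the highest score wins.
    return max(range(len(VOCAB)), key=lambda i: scores[i])

def generate(prompt_words, n_new):
    token_ids = [VOCAB.index(word) for word in prompt_words]
    for _ in range(n_new):
        # Fifth: feed the prediction back in and predict again.
        token_ids.append(predict_next(token_ids))
    return [VOCAB[t] for t in token_ids]

result = generate(["analyze", "this", "balance", "sheet", "and"], 3)
print(" ".join(result))
```

Swap the random weights for trained ones, and grow every number by several orders of magnitude, and the same loop is roughly what runs while you watch the text appear.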

What changes for you

It means the AI doesn't genuinely understand meaning. It doesn't know what a balance sheet is. It doesn't know if your numbers are right. What it does is: "after 'analyze this balance sheet,' you probably get 'revenue,' 'expenses,' 'your margin improved'…" Statistically plausible words.

That's why it works well for three kinds of tasks. Writing an email. Summarizing a text. Explaining a known concept. The pattern is in those hundreds of billions of tokens of training text.

That's also why it fails in three other kinds. New truths it didn't see in training. Facts specific to your business. Things that happened last week. There it guesses.

Here's the most important detail. When it guesses wrong, it does it with the same confidence as when it guesses right. That's called a hallucination. The machine has no internal thermometer for certainty.
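A small sketch of why a wrong answer can come out sounding sure. Scores are usually turned into probabilities with a function called softmax, and whichever word ends up on top gets written. The candidate words and the scores below are invented for illustration, with the wrong answer deliberately scoring highest. Nothing in the calculation checks whether the winner is true.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for completing "The capital of Australia is ..."
candidates = ["Sydney", "Canberra", "Melbourne"]
scores = [3.0, 2.0, 1.0]  # invented numbers: the wrong answer scores highest

probs = softmax(scores)
best = candidates[probs.index(max(probs))]
print(best, round(max(probs), 2))  # prints: Sydney 0.67
```

The 0.67 measures how plausible the word is as a continuation. It says nothing about whether the sentence is correct.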

The architecture that changed everything

Until 2017, most language AIs processed text sequentially, word by word, which made them slow. Then a team of Google researchers published a paper called "Attention Is All You Need" (Vaswani et al., 2017) and upended the field.

The paper introduced transformers. The key innovation is the attention mechanism. Instead of reading word by word, the model processes the whole paragraph in parallel. And it decides, mathematically, how much each earlier word matters for predicting the next one.
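Here is a minimal sketch of that "how much does each earlier word matter" calculation, in the scaled dot-product form described in the paper. The two-number vectors are made up for illustration; in a real model they are learned and much longer, and there are many of these calculations running side by side.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    """Blend the value vectors, weighted by how much each word matters."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

words = ["the", "cat", "sat"]
# Made-up vectors, one key and one value per word seen so far.
keys = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]  # what the position predicting the next word is looking for

weights = attention_weights(query, keys)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
mixed = attend(query, keys, values)
```

With these numbers "cat" gets the largest weight, so its information dominates the blend that feeds the next prediction.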

That paper now has more than 100,000 citations. It changed everything. Without transformers there's no ChatGPT, no Claude, no Gemini.

Without context, it predicts generically

If you say "write me an email," the AI predicts the most average email possible. Its predictions come out generic because it has no clues from you.

With context (your business, your voice, your specific problem) the predictions adjust. That's the entire point of the CAFÉ method: clear Context, concrete Action, precise Format, defined Style (the E comes from Estilo, Spanish for style). It's not a trick. It's giving the model more signal so its predictions come out yours.
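One possible way to turn the four parts into a habit is a small helper that refuses to build a prompt until all four are filled in. The function `build_prompt`, its parameter names, and the bakery example are hypothetical, invented here for illustration; the method itself only asks for the four ingredients.

```python
def build_prompt(context, action, output_format, style):
    """Assemble the four CAFÉ parts into one prompt string."""
    parts = [
        ("Context", context),
        ("Action", action),
        ("Format", output_format),
        ("Style", style),
    ]
    # Refuse to continue if any part was left empty.
    missing = [label for label, text in parts if not text.strip()]
    if missing:
        raise ValueError(f"Missing CAFÉ parts: {', '.join(missing)}")
    return "\n".join(f"{label}: {text.strip()}" for label, text in parts)

prompt = build_prompt(
    context="I run a small bakery; a supplier delivered flour two days late.",
    action="Write an email asking for a discount on the next order.",
    output_format="Under 120 words, with a subject line.",
    style="Polite but firm, first person, no corporate jargon.",
)
print(prompt)
```

Compare that with "write me an email": same request, but the model now has four lines of clues to predict from.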

What most people miss

A lot of people assume the AI has "knowledge inside" or some "magical understanding of the world." It has none of that. It has numbers. When you ask a question, your text is run through those numbers, which were tuned during training to work well on similar cases.

It's like a chef who read 50,000 recipes but never tasted the food. They know where the salt goes because it's in all the texts. They don't know how the food tastes because they never tried it.

That metaphor matters. The AI can talk brilliantly about topics it never experienced. It can write poetry about pain without ever having felt anything. It can explain how to ride a bike without balance of its own.

It's text that produced patterns. Not lived experience that produced text.

Next time you use one

An open question for you: if the AI predicts statistically, how different would the result be if you gave it three examples of how you write? What about ten? What about telling it who you're writing to?

That's the difference between using a generic AI and making it predict like you. If you want to understand why Claude distinguishes better between what it knows and what it doesn't while other models hallucinate more, we cover that in another article in the series.

For now, remember: the AI predicts. Give it clear clues, a concrete task, specific form, your own sound. That's how its predictions end up working for you.

Keep exploring

Want to go deeper?

01 Do AIs actually think?

No. They predict the next word (or token) from statistical patterns. They seem to think because they were trained on hundreds of billions of words.

02 Why does an AI sometimes lie?

Because it's predicting. If its best guess is statistically plausible but false, it'll state it with the same confidence as a true one.

03 Does it work differently if I give it more context?

Yes. With context, the model adjusts its predictions based on what it has already read in your conversation. Think of it as feeding it clues.
