Analysis · The C.A.F.E. Method · Edition #0023

Why AI gets it wrong — hallucinations, limits, and how to protect yourself

AI tools make things up with unsettling confidence. It's not a bug that'll vanish in the next release: it's a direct consequence of how they're trained. Understanding why — and building a protocol to protect yourself — is part of the craft of using them.

Germán Falcioni April 20, 2026
✦ Reading: 10 min
Hallucinations aren't a side bug — they're the natural shape of a system that learned to sound convincing before learning to be accurate.
TL;DR

AI tools hallucinate because they're trained to predict the next most likely word, not to tell the truth. They don't have a default "I don't know" mode. A New York lawyer, Steven Schwartz, filed a legal brief citing six cases ChatGPT had invented whole: names, judges, citations, all fake. The judge looked them up, they didn't exist, and in June 2023 Schwartz was sanctioned. That scene sums up the problem. The rule of thumb for using these tools is simple: treat them like a brilliant but new intern, not like an encyclopedia. Verify numbers, dates, laws, and citations. One way is to ask the tool to search the web and cite sources.

✦ Summarized with Claude at publish time

In February 2023, Google staged a showcase event for Bard, the AI chatbot it later renamed Gemini, hoping to catch up with ChatGPT. In one of the promotional pieces, Bard claimed that the James Webb Space Telescope had taken the very first pictures of a planet outside our solar system. It sounded impressive.

Within hours, astronomers pointed out publicly that the first image of an exoplanet dates from 2004 and came from a ground-based telescope, years before James Webb launched. Bard got the fact wrong in a demo Google had designed to look flawless.

The Verge ran the story with clinical precision: hallucinations reach even the launches that should be perfect. And it wasn't an isolated case: within months, ChatGPT's invented citations surfaced in the Schwartz case. The problem isn't one model. It's structural.

The mechanism: why predicting isn't knowing

To understand hallucinations you have to understand an uncomfortable thing about how language models work.

An LLM — large language model — doesn't have an internal database of verified facts. It doesn't have a lookup table that says "Varghese v. China Southern Airlines: doesn't exist, don't cite". What it has is a statistical representation, built from hundreds of billions of words, of which word tends to come after which other word in which context.

When you ask a question, the model doesn't look up the answer. It generates a response token by token, choosing at each step the most likely word given the prior sequence. If the sequence leads to a pattern shaped like a legal citation — two names, a versus, a court — the model completes with that shape. The shape is right. The content can be real or invented. The model doesn't tell the two apart.

That's the root of the problem. There's no default "I don't know" mode because there's no internal module marking the difference between knowing and not knowing.
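To make the mechanism concrete, here is a deliberately tiny sketch. It is not how a real LLM is built: it is a toy bigram counter trained on three made-up, citation-shaped lines, and every name in it (`CORPUS`, `train_bigrams`, `complete`) is hypothetical. It only illustrates the point above: the completion has the right shape, and nothing in the model records whether the case exists.

```python
from collections import Counter, defaultdict

# Made-up training text: three citation-shaped lines, none of them real cases.
CORPUS = [
    "smith v. jones airlines , 2d cir. 2008",
    "garcia v. delta airlines , 2d cir. 2011",
    "lopez v. united airlines , 9th cir. 2015",
]

def train_bigrams(lines):
    """Count which token follows which token in the training text."""
    follows = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows

def complete(follows, prompt, max_tokens=8):
    """Greedy decoding: keep appending the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this token in training
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

model = train_bigrams(CORPUS)
completion = complete(model, "varghese v.")
print(completion)                 # a citation-shaped string
print(completion in CORPUS)       # False: the model never saw this "case"
```

The name "varghese" never appears in the training text, yet the model happily finishes the citation. There is no branch in `complete` for "I don't know", only a branch for "no token ever followed this one".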

Where it goes wrong most often

Six categories of errors where you should raise your guard:

Specific dates. "When was such-and-such agreement signed?" The model gives you a day and month. Sometimes it's right, sometimes it swaps the year.

Financial figures. Prices, revenues, valuations. Numbers move often, and the model sticks with old versions or invents updates.

Direct quotes. Sentences attributed to someone. Part exists, part is altered, and you can't tell which is which.

Code. A function that doesn't exist, a library that doesn't install with that command, syntax the language doesn't accept. Code is where you catch it fastest, because it simply doesn't run.

Math. Multi-step operations where it loses the thread. Modern tools mitigate this with code execution, but if you don't have that on, watch out.

Legal and medical topics. High-risk because they're areas where the plausibility of the format (article X, subsection Y, dose Z) masks the possibility that the content is false.
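For the code category, one cheap check is to ask the interpreter whether the thing the model named exists before building on it. A minimal sketch, assuming Python and a hypothetical helper called `symbol_exists`. It uses only the standard library's `importlib`, and it does import the module it inspects, so point it only at packages you already trust.

```python
import importlib
import importlib.util

def symbol_exists(module_name: str, attr_path: str) -> bool:
    """True only if the module can be found and the dotted attribute resolves."""
    try:
        if importlib.util.find_spec(module_name) is None:
            return False  # no installed package answers to that name
    except (ImportError, ValueError):
        return False
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False  # the model named something that is not there
        obj = getattr(obj, part)
    return True

print(symbol_exists("json", "loads"))                   # True
print(symbol_exists("json", "parse_safely"))            # False: invented function
print(symbol_exists("no_such_module_xyz", "anything"))  # False: invented module
```

This only confirms that a name exists. It says nothing about whether the function does what the model claims, so running the code is still the real test.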

The rates you can actually cite

There's a public reference worth knowing: the Vectara Hallucination Leaderboard. It's a benchmark that measures how much each model hallucinates on a specific task — summarizing a document you hand it — and it gets updated periodically.

The numbers move with each model version, but the general picture of the last year is consistent: Claude, GPT-4, and Gemini all sit at low single-digit rates, with differences that in practice are smaller than the marketing suggests. Two years ago the differences were much larger. Today, picking the "best model" is no longer the main lever against hallucinations.

If you want to cite a specific number, cite the source — for example, "Claude 3.5 Sonnet scored X percent per Vectara, checked April 2026." Without a verifiable source, you don't put a number. It's a craft rule.
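If you keep figures in notes or scripts, the same craft rule can be enforced mechanically. The sketch below is only one way to do it: `CitedFigure` is a hypothetical class, and the value shown is a placeholder, not a real leaderboard score.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CitedFigure:
    claim: str
    value: float
    unit: str
    source: str
    checked_on: date

    def __post_init__(self):
        # No verifiable source, no number.
        if not self.source.strip():
            raise ValueError("no verifiable source: do not publish the number")

    def render(self) -> str:
        return (f"{self.claim}: {self.value}{self.unit} "
                f"per {self.source}, checked {self.checked_on:%B %Y}")

# Placeholder value for illustration only.
figure = CitedFigure(
    claim="Hallucination rate for Model A (hypothetical)",
    value=1.5,
    unit="%",
    source="Vectara Hallucination Leaderboard",
    checked_on=date(2026, 4, 1),
)
print(figure.render())
```

Constructing a `CitedFigure` with an empty source raises an error, which is the point: the number cannot reach the page without its source and check date attached.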

What works and what doesn't to reduce it

Three techniques have solid evidence behind them:

Native web search. Claude, ChatGPT, and Gemini today can search the internet while they respond. When you turn that on, the task shifts from "remembering" to "looking up", and the hallucination rate drops noticeably. It's the simplest and most effective technique.

RAG (Retrieval-Augmented Generation). A system where, before answering, the model queries a specific database of yours: company manuals, contracts, internal documentation. Heavily used in enterprise applications. It cuts risk a lot but doesn't eliminate it: the model can still misstate or invent details even with the retrieved document in front of it.

Prompting with explicit instructions. Phrases like "if you aren't sure, say so", "cite the exact source", "separate what you know from what you're inferring". It helps, but it's the weakest of the three.
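The second and third techniques combine naturally. Here is a rough sketch under loud assumptions: the two documents are invented, retrieval is naive word overlap rather than the embedding search a production RAG system would use, and no model is called. The code only builds the prompt you would send.

```python
import re

# Hypothetical internal documents, invented for this example.
DOCUMENTS = {
    "refund-policy": "Refunds are issued within 14 days of purchase with a receipt.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
}

def tokenize(text):
    """Lowercase word set; a crude stand-in for real retrieval features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents):
    """Return the (doc_id, text) pair sharing the most words with the question."""
    q = tokenize(question)
    return max(documents.items(), key=lambda item: len(q & tokenize(item[1])))

def build_prompt(question, documents):
    doc_id, passage = retrieve(question, documents)
    return (
        "Answer using only the passage below.\n"
        "If the passage does not contain the answer, say you are not sure.\n"
        "Cite the passage id for every claim, and separate what the passage "
        "states from what you are inferring.\n\n"
        f"[{doc_id}] {passage}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "How many days after purchase can I get a refund with a receipt?", DOCUMENTS
)
print(prompt)
```

The explicit instructions ride along with the retrieved passage, so the model is asked both to look things up and to admit when the passage is silent.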

What doesn't work as well: asking the model "how confident are you?" and trusting its answer. Models have a calibration problem — they report similar confidence for things they know and things they invent. Useful as a signal, not as truth.

A practical verification protocol

This is what I use and what I recommend in consulting when an AI output is going to reach a client.

  1. Identify what's verifiable and what isn't. Verifiable: numbers, dates, laws, names, quotes, URLs. Not verifiable: opinions, interpretations, suggestions on phrasing.
  2. For what's verifiable, apply two steps. First, ask the model to search the web and cite sources. Second, open the sources and check them. If a source won't open, or doesn't say what the model claims, treat the claim as invented.
  3. For what isn't verifiable, use your judgment. Your experience and your knowledge of the domain. The AI proposes; you decide.
  4. Never sign, send, or file anything you haven't reviewed. Even when the AI sounds confident. Schwartz learned that at the cost of a court sanction and a $5,000 fine.
  5. When the cost of an error is high, ask two different models. Five extra minutes, risk cut to a fraction.
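Parts of this protocol can be semi-automated. The sketch below is a crude aid, not a verifier: the regular expressions are rough heuristics I am assuming for illustration, the function names are hypothetical, and the sample answers are invented text of the kind a model might produce. It flags what needs checking and shows where two answers disagree. Opening the sources is still your job.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Capitalized words, " v. ", capitalized words: the shape of a case citation.
CASE_RE = re.compile(r"[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)* v\. [A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*%?")

def extract_verifiable(text: str) -> dict:
    """Flag the items a human should verify: URLs, case citations, numbers."""
    urls = [u.rstrip(".,;)") for u in URL_RE.findall(text)]
    without_urls = URL_RE.sub(" ", text)  # keep digits inside URLs out of "numbers"
    return {
        "urls": urls,
        "citations": CASE_RE.findall(without_urls),
        "numbers": NUMBER_RE.findall(without_urls),
    }

def disagreements(answer_a: str, answer_b: str) -> dict:
    """Items flagged in one answer but not in the other, grouped by kind."""
    a, b = extract_verifiable(answer_a), extract_verifiable(answer_b)
    return {
        kind: sorted(set(a[kind]) ^ set(b[kind]))
        for kind in a
        if set(a[kind]) != set(b[kind])
    }

# Invented sample answers from two hypothetical models.
answer_a = ("The brief cites Varghese v. China Southern Airlines, decided in 2019, "
            "and a 5,000 dollar fine. Source: https://example.com/ruling.")
answer_b = "The ruling was issued in 2018 with a 5,000 dollar fine."

print(extract_verifiable(answer_a))
print(disagreements(answer_a, answer_b))
```

Agreement between two models is not proof, since both can repeat the same error. Disagreement is the useful signal: it tells you exactly which date, number, or citation to check first.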

To close

Hallucinations aren't a defect of one version. They're the natural shape of a system that learned to complete patterns before it learned to say "I don't know". They'll improve over time — Dario Amodei said in 2024 that "hallucinations won't be eliminated entirely, they'll become rarer and more verifiable" — but they won't disappear.

That means the verification protocol is part of the craft of using AI, not an accessory. The good news is the protocol isn't complicated: one of the most effective moves is to ask it to search the web and cite sources.

If you want to go deeper into the full workflow, "The C.A.F.E. Method" is the next step. If you're just starting out and want to understand what a prompt is and how to write one well, start with "What a prompt is".

Where in your work can't you afford an AI error — and what protocol did you build to protect yourself there?
