The CAFÉ Method — the complete framework that transforms every prompt

TL;DR

CAFÉ is a four-component framework — Context, Action, Format and style — that turns any vague request into a professional prompt. I built it after spotting a pattern: the best prompts, no matter who writes them, always include the same four pieces. CAFÉ makes them explicit so you don't forget any. Works with Claude, ChatGPT, Gemini and whatever comes next — for text, image and video. The examples in this piece are built with Claude because that's what I use in my consulting work, but the method doesn't depend on the tool. The acronym stays in Spanish; the É keeps the accent as a visual marker.

✦ Summarized with Claude at publish time

A few months ago I ran a workshop at an insurance company. I asked the twenty participants to write down, quietly, the prompt they'd use to get an AI to draft a difficult email to a client. Five minutes. Then we read all twenty out loud.

The range was striking. Some people wrote two lines. Some wrote twelve. Some started with "I'm an insurance agent"; others started with "I need an email." Some specified length; others asked for "something human but professional." No two of them included the same four elements.

That was the deeper problem. They weren't worse or better prompts. They were twenty different languages all speaking to the same tool.

That afternoon it clicked for me that what was missing wasn't technique. It was a shared language. A small checklist so that, without overthinking it, anyone could put together a request that works. I gave it a name that sticks in your head over a coffee. That's how CAFÉ was born.

Quick note before we go in

The examples in this piece are built with Claude because that's the AI I use every day in my consulting work. CAFÉ works the same way with any of them — ChatGPT, Gemini, Copilot, whatever comes next. It's a communication method, not a technique tied to a specific tool.

A note on the acronym: it stays in Spanish. C for Context, A for Action, F for Format, É for style (Estilo in Spanish — the É keeps the accent as a visual marker).

C — Context

The component with the biggest impact, and the one people skip most.

Context has four pieces: who you are professionally, the concrete situation you're solving, your objective, and who the final output is for.

Here's the difference.

Without context: "Write me an email for an upset client."

With context: "I'm an insurance consultant. My client has been with us for fifteen years. We just denied a claim that was technically invalid — but he's an important client and he's very upset. My objective is to keep the relationship without walking back the technical decision."

What changed? Claude isn't writing for "any upset client" anymore. It's writing for this specific client. It gives you phrases that recognize the fifteen years, that separate the technical decision from the relationship, and that keep a door open. The output stops being a template and becomes a draft.
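The four pieces of Context can be kept as named fields so that none gets skipped. What follows is a minimal Python sketch, not part of the method itself: the class name `Context` and its field names are illustrative assumptions, and the audience sentence is added here only to fill the fourth slot.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """The four pieces of Context named in the text (field names are hypothetical)."""
    role: str        # who you are professionally
    situation: str   # the concrete situation you're solving
    objective: str   # what you want to achieve
    audience: str    # who the final output is for

    def render(self) -> str:
        # Join the four pieces into one paragraph for the top of a prompt.
        return " ".join([self.role, self.situation, self.objective, self.audience])


context = Context(
    role="I'm an insurance consultant.",
    situation=(
        "My client has been with us for fifteen years. We just denied a claim "
        "that was technically invalid, and he's very upset."
    ),
    objective=(
        "My objective is to keep the relationship without walking back "
        "the technical decision."
    ),
    # Assumption: the original example leaves the audience implicit.
    audience="The email is for the client himself.",
)

print(context.render())
```

Because every field is required, leaving one out raises an error when the object is built, which is the point: the gap shows up before the prompt is sent.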

A — Action

If Context defines the universe, Action defines the move.

Three pieces: a concrete verb, a specific task, a measurable result.

Concrete verb. "Help me with" isn't a verb, it's a cry for help. Write, analyze, summarize, compare, structure, translate, list. Each one triggers a different kind of response.

Specific task. "Give me ideas" leaves the door open to infinity. "Give me five strategies to land ten new clients this quarter" closes it somewhere useful.

Measurable result. "The output should have five options, each with a title, a two-line description, and a main advantage." Now the model knows when it's done.

Without clear Action, Claude returns something good — but that "something" is its interpretation of your ambiguity. With clear Action, it returns exactly what you asked for.
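The three pieces of Action can be sketched the same way. The snippet below is a rough illustration under stated assumptions: the list of vague openers and the function names `is_vague` and `build_action` are invented for this example, and a real check would need a longer list.

```python
# Openers that ask for help without naming a concrete verb (illustrative list).
VAGUE_OPENERS = ("help me", "give me ideas", "do something")


def is_vague(action: str) -> bool:
    """Return True when the request starts with one of the vague openers."""
    return action.strip().lower().startswith(VAGUE_OPENERS)


def build_action(verb: str, task: str, result: str) -> str:
    """Combine a concrete verb, a specific task and a measurable result."""
    return f"{verb.capitalize()} {task}. {result}"


action = build_action(
    verb="list",
    task="five strategies to land ten new clients this quarter",
    result=(
        "The output should have five options, each with a title, "
        "a two-line description, and a main advantage."
    ),
)

print(is_vague("Help me with my clients"))  # True
print(is_vague(action))                     # False
print(action)
```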

F — Format

Format is the shape of the output. It's what separates useful from usable.

In text: email, table with named columns, numbered list, document with sections, script, Slack-style short message.

In image: resolution (1200×1600, 4K, 8K), aspect ratio (16:9, 1:1, 9:16, 3:4), lighting (soft natural light, golden hour, harsh midday light), palette, depth of field.

In video: length in seconds, fps (24, 30, 60), shot type (close-up, wide, overhead), camera movement (static, tracking, drone).

Without Format, the model picks for you. And it almost never picks what you needed. You ask for an analysis and get twelve paragraphs when you wanted a table. You ask for an email and get five hundred words when you wanted six lines.

É — style

Style is how it should sound. It's the most underrated component because it looks cosmetic. It isn't.

Practical and direct for an internal report. Formal and corporate for a board proposal. Warm and conversational for social media. Technical and precise for a consulting piece.

A sales email with a formal, distant style aimed at a young audience reads as spam. The same content with a warm, jargon-free style converts. The message didn't change. The voice did.

When I struggle to define style, I use a trick: I tell Claude who I want it to sound like. "Write this in the voice of someone speaking in first person, direct, no ornament, like an experienced teacher sharing what actually works." That's style.

The evolution, step by step

Same request — a response to an upset client — growing letter by letter:

C only. "I'm an insurance advisor with a fifteen-year client who just got a claim denied." → Claude asks you what you want to do.

C + A. "…Write me the reply." → You get something back. Could be an email, could be a script for a call, could be three loose paragraphs. It's guessing.

C + A + F. "…Six-line email, with subject line and closing." → Now it has shape. The content may still be impersonal.

Full CAFÉ. "…Professional but warm tone, one that recognizes the fifteen years without apologizing for the technical decision." → This is a draft. You review it, change two words, send it.

The difference between the first step and the last isn't gradual. It's qualitative.
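The four steps above can be replayed in code: one slot per letter, filled one at a time, with a helper that reports which letters are still empty. This is a sketch under assumptions; `CafePrompt`, `missing` and `render` are hypothetical names, not an existing library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CafePrompt:
    """One optional slot per letter; names are hypothetical."""
    context: Optional[str] = None
    action: Optional[str] = None
    format: Optional[str] = None
    style: Optional[str] = None

    def missing(self) -> list[str]:
        # Letters whose slot is still empty, in C-A-F-É order.
        slots = [("C", self.context), ("A", self.action),
                 ("F", self.format), ("É", self.style)]
        return [letter for letter, value in slots if not value]

    def render(self) -> str:
        # Concatenate only the slots that are filled.
        parts = [self.context, self.action, self.format, self.style]
        return " ".join(p for p in parts if p)


prompt = CafePrompt(
    context=("I'm an insurance advisor with a fifteen-year client "
             "who just got a claim denied.")
)
print(prompt.missing())  # ['A', 'F', 'É']

prompt.action = "Write me the reply."
prompt.format = "Six-line email, with subject line and closing."
print(prompt.missing())  # ['É']

prompt.style = ("Professional but warm tone, one that recognizes the fifteen "
                "years without apologizing for the technical decision.")
print(prompt.missing())  # []
print(prompt.render())
```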

CAFÉ for image and video

What makes the method useful is that the logic survives when the medium changes.

Image. Context: "I'm an editorial photographer working on a cover for a piece about workplace burnout." Action: "Generate a conceptual image that conveys exhaustion without showing faces." Format: "3:4 vertical, 1200×1600, harsh side lighting from an office setting, shallow depth of field." Style: "Restrained editorial aesthetic, greys and blues, documentary style."

Video. Context: "I'm a coach and I'm putting together an Instagram reel about productivity." Action: "Write a fifteen-second script where I give one practical tip." Format: "15 seconds, 9:16, close-up to camera, cut every three seconds." Style: "Warm, conversational, like I'm talking to a friend, no epic music."

The method didn't change. The units inside each letter did.
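Once Format carries numbers, those numbers can contradict each other. A small sketch, using the 3:4 and 1200×1600 pair from the image example, checks that a resolution reduces to the aspect ratio named in the same prompt. The function names are illustrative assumptions.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a pixel resolution to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def format_is_consistent(width: int, height: int, declared_ratio: str) -> bool:
    """True when the resolution reduces to the ratio named in the prompt."""
    return aspect_ratio(width, height) == declared_ratio


print(aspect_ratio(1200, 1600))                  # 3:4
print(format_is_consistent(1200, 1600, "3:4"))   # True
print(format_is_consistent(1200, 1600, "9:16"))  # False
```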

To close

Which of the four letters do you feel slips away most often when you prompt an AI? I mean the question seriously — in my experience there's almost always one that systematically goes missing, and the person doesn't notice until someone points it out.

If you want to step back a layer, "What is a prompt?" is the piece before this one. And if CAFÉ is already in your hands and you want the next step (more complex prompts, multi-turn, with an assigned role), "Professional prompting guide" is the way forward.

Picture yourself walking into a bar and telling the bartender "bring me something to drink." They might come back with water, with coffee, with a soda, with a glass of hot red wine. None of those are wrong. None of them are what you wanted.

If instead you say "bring me still mineral water, cold, in a glass," you get still mineral water, cold, in a glass.

AI works exactly the same way.

The four letters

A couple of years back I started noticing a pattern. Every time someone showed me a prompt that worked well, it had the same four ingredients. And every time someone showed me a prompt that came out wrong, at least one of them was missing.

I put names to them so I wouldn't forget:

C for Context. Who you are, what you're working on, what it's for. Without this, the model answers you like you're anyone.

A for Action. What you want it to do, exactly. Not "help me with" — "write me," "summarize," "compare," "list."

F for Format. The shape of the output. An email, a table, a five-point list, a vertical image.

É for style. How it should sound. Formal, warm, technical, direct. (The letter stays É because the Spanish word is Estilo. The accent sticks.)

CAFÉ. Four letters, the length of an espresso. It fits in your head.

An example

Without CAFÉ: "Help me with an email to a client."

With CAFÉ: "I'm an insurance advisor. I need to reply to a client who has been with us for fifteen years and just got a claim denial. Write me a six-line email, with a subject line, that acknowledges their frustration, explains why the claim was denied, and proposes a call to review it. Professional but warm — no cold corporate language."

The first request gets you a generic template. The second gets you an email you send today.

Three things to take away

One. CAFÉ works with any AI. My examples use Claude because that's what I run in my consulting work. Paste the same prompt into ChatGPT or Gemini and it works the same way.

Two. You don't need to learn "prompt engineering." If you can explain to a colleague what you need, you can do CAFÉ. Same skill.

Three. When something's missing, you can feel it. Next time an AI hands you something mediocre, stop and ask which of the four letters wasn't there. There's almost always a clear answer.

I opened a thread in Claude Desktop, another in ChatGPT, one in Cursor, one in Perplexity Pro. I asked each of them to generate a similar piece of analysis. Then I pulled up the system prompts reported by the community for each of those applications. And I started looking at what made them work.

The system prompts behind serious AI applications — the ones product teams spend months tuning before launch — have a structure that repeats with a consistency that isn't accidental. They define who the assistant is and the context it operates in. They define which specific actions it can and can't take. They define the exact format for each type of response. And they define the register, the voice, the style.

They are, without anyone naming it this way, CAFÉ prompts. Prompt engineers at the frontier labs structure them this way without putting a label on the pattern. I put the label on it for everyday use. That's the entire novelty of the method, and it's enough — because what has no name doesn't get taught, and what doesn't get taught doesn't scale outside the lab.

Methodological note

The examples that follow are built with Claude Opus 4.7. It's the tool I use in my consulting work every day and the one I can report observed behavior on with precision. The method works with any generative AI: the logic of the four components is independent of the underlying model. If you swap Claude for ChatGPT, for Gemini, for Grok, or for a decent open-source model, the same four components produce outputs comparable in relative quality — the specific textual output varies, but the structural win holds.

The pattern CAFÉ names

Anyone who has done prompt engineering seriously for more than six months knows this: the four components CAFÉ points at are the same ones Anthropic recommends in its official prompt engineering guide, the same ones OpenAI lists in its best-practices guide, the same ones Google documents for Gemini. It's not a coincidence. The underlying technical problem, how to condition the generation of an LLM toward a useful output, breaks down along four relatively orthogonal dimensions.

Context conditions the prior. A language model generates responses conditioned on its input. Rich context steers generation toward more specific and relevant responses. Without context, the model falls back on the most probable response for a generic user. With context, it generates from a specific position in semantic space. The academic literature studies this ability to adapt to what's in the prompt, without retraining, under the name "in-context learning" and treats it as one of the most important emergent capabilities of models at scale.

Action shrinks the search space. A prompt with a vague verb ("help me with") leaves a huge decision tree open: the model has to infer what you want. A prompt with a concrete verb ("write a six-line follow-up email") narrows the response space drastically. The quality difference isn't gradual. It's qualitative, because with a vague verb the model is guessing which task you meant, and with a concrete one it is executing it.

Format imposes explicit structure. Models are very good at following structure when you give it, and fairly arbitrary when you don't. Asking for "a table with name, advantage, cost columns" produces exactly that. Not asking for it produces loose paragraphs, sometimes lists, sometimes headers, depending on the model's latent state in that particular generation.

Style tunes the surface without touching the content. It's the most underrated component and the one that makes the biggest difference in the final deliverable. Same content with the wrong style is unusable. Same content with the right style is publishable. The difference isn't cosmetic — it's operational.

CAFÉ as generalization

The interesting thesis isn't that CAFÉ works. It's that CAFÉ is an explicit generalization of the pattern already implicit in any serious prompt.

Other frameworks exist: RACE (Role, Action, Context, Expectation), CO-STAR (Context, Objective, Style, Tone, Audience, Response), RISEN (Role, Instructions, Steps, End goal, Narrowing), CRISPE, and several more. They all capture, with different slicings, the same four-to-six-dimensional space. CAFÉ's advantage isn't that it's more complete. It's that it has the right number of letters to recall without looking, and those letters spell a word anyone in the Spanish-speaking world already knows.

Put bluntly: in prompt engineering there's no technically superior framework. There's one you'll actually use because you remember it at the moment you're about to write the prompt, and there are others that live in a Notion doc you never open.

A full case: contract analysis pipeline

Where CAFÉ shows its value most clearly is in real pipelines — work where the output has to stay consistent across dozens or hundreds of runs.

Real case from my consulting practice: preliminary analysis of commercial contracts for a law firm. Volume: forty to sixty contracts a month.

Prompt without CAFÉ: "Analyze this contract and tell me what problems it has."

Output: variable. Sometimes a list of risks. Sometimes a narrative summary. Sometimes a table. The firm can't audit the work because every output has a different structure.

Full CAFÉ prompt:

Context. You are a senior lawyer specialized in commercial contracts with fifteen years of experience in the Argentine market. You are doing a preliminary review for a law firm that needs to identify risks before going deeper on the most critical ones. The attached contract is a distribution agreement between an Argentine company (principal) and a Chilean distributor.

Action. Identify clauses that may generate legal or commercial risk. For each risky clause, extract the literal text, label the risk category (legal, operational, financial, jurisdictional), rate severity on a 1-5 scale, and propose a question for the senior lawyer to decide whether to go deeper.

Format. Markdown table with columns: Clause (number and title), Literal text (max 200 characters), Category, Severity, Question for deeper review. After the table, a three-line executive summary with the total number of high risks (severity 4-5), the dominant theme, and the overall recommendation.

Style. Legal-professional without ornament. Active voice. Technical precision. Don't say "could be interpreted as" when you mean "is interpreted as"; if there's genuine ambiguity, name it explicitly.

Output: a table identical in structure across every run, with content specific to each contract. The firm can audit, compare contracts against each other, build aggregate metrics about risk types, train juniors on the outputs. The pipeline works because the prompt works. The prompt works because it has all four components.

Without F for Format, every run has different structure and the downstream work doubles. Without É for style, the voice varies between runs and the output loses authority. Without A for Action, the model interprets "problems" differently each time. Without C for Context, the analysis happens from the position of a generic lawyer instead of a specialized senior — and that changes which risks get surfaced and which slip past.
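Because the Format block fixes the column names, structural consistency across runs can be checked mechanically. The sketch below is an assumption about how such a check might look, not the firm's actual pipeline: the function names are hypothetical, the sample rows are placeholders rather than real contract data, and the parser assumes no cell contains a literal `|`.

```python
EXPECTED_COLUMNS = [
    "Clause", "Literal text", "Category", "Severity",
    "Question for deeper review",
]


def parse_row(line: str) -> list[str]:
    """Split one Markdown table row into trimmed cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def table_rows(output: str) -> list[list[str]]:
    """Return every table row (header included, separator line excluded)."""
    rows = []
    for line in output.splitlines():
        if not line.strip().startswith("|"):
            continue
        cells = parse_row(line)
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # the |---|---| separator line
        rows.append(cells)
    return rows


def check_output(output: str) -> tuple[bool, int]:
    """Check that the header matches and count high-severity rows (4 or 5)."""
    rows = table_rows(output)
    header_ok = bool(rows) and rows[0] == EXPECTED_COLUMNS
    high = sum(
        1 for row in rows[1:]
        if header_ok and len(row) == len(EXPECTED_COLUMNS) and row[3] in {"4", "5"}
    )
    return header_ok, high


# Placeholder output, invented for illustration only.
sample = """\
| Clause | Literal text | Category | Severity | Question for deeper review |
|---|---|---|---|---|
| 7. Exclusivity | (quoted text) | legal | 4 | (question) |
| 12. Payment terms | (quoted text) | financial | 2 | (question) |
"""
print(check_output(sample))  # (True, 1)
```

A run whose header fails this check can be flagged or re-run before anyone reads it, which keeps the audit step focused on content rather than layout.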

What CAFÉ isn't

Worth being honest about the method's limits.

CAFÉ isn't advanced prompt engineering. It doesn't cover chain-of-thought, few-shot examples, self-consistency, tool-use, or multi-step agent techniques. It's the foundational layer — the base those techniques build on. If your use case needs complex multi-step reasoning or external tools, CAFÉ is the starting point, not the finish line.

CAFÉ also doesn't replace editorial judgment. The output of a well-built CAFÉ prompt is a high-quality first draft. You still have to read it, criticize it, adjust it. The method's claim isn't "you automate thinking"; it's "you reduce friction between what you want and what the tool returns."

And CAFÉ isn't a long-term competitive differentiator. As models improve, they'll get more forgiving of badly structured prompts — they already are. What doesn't change is that having a consistent way of asking produces consistent outputs. That's the real value, and it scales rather than erodes.

Editorial thesis

I'll close with the thesis that orients the rest of this series.

Prompt engineering won't vanish with the next models — it will transform. Models will forgive ambiguity more, but the person writing explicit prompts will keep getting more predictable outputs, more auditable outputs, outputs more integrable into real pipelines. The difference between "AI sometimes helps me" and "AI is part of my pipeline" isn't the model: it's the discipline you bring to the request.

CAFÉ is that discipline in minimal form. It's not the only way to structure a prompt. It's the way you can remember while drinking a coffee, and that cognitive portability is what makes it used — not just known. There's a big gap between those two things.

When a method scales, it isn't because it's the most sophisticated. It's because it's the one that reaches your head first at the moment you're about to act. CAFÉ is designed for that. And as models get better, CAFÉ doesn't become obsolete — it becomes more valuable, because consistency on the request side compounds with consistency on the model side and produces professionally reliable outputs. That compounding is, right now, the biggest difference between a team that uses AI with enthusiasm and a team that uses AI with rigor.

What's your current process when a prompt doesn't return what you expected — do you rewrite the whole thing, or do you try to identify which of the components failed?