The future of AI — what's coming in the next 24 months

TL;DR

The future of AI over the next 24 months breaks down into four vectors that are already visible, not speculative: agents that execute tasks in your browser and apps, real multimodality (text + audio + video in a single conversation), AI embedded inside the tools you already use, and persistent personalization. AGI is an open debate with no date. The practical consequence is the same under every scenario: whoever masters AI today replaces whoever learns it two years from now, not the other way around. The advantage window is finite and it's closing.

✦ Summarized with Claude at publish time

I ask the model: "open my inbox, find the last ten unanswered client emails, group them by urgency, reply to the simple confirmations yourself, and give me a summary of the rest so I can decide."

Eight minutes later the summary arrives. Four emails answered. Six pending with a line of context each and a suggested reply. No errors. No hallucinations. I didn't click anything.

That happened today, April 2026. Not a prototype. Not a keynote demo. An agent running on top of Claude Computer Use, a public capability for the past eighteen months. And it's the strongest signal of where AI moves over the next twenty-four.

The regime change isn't the model — it's agency

AI press coverage is still obsessed with the model: how many parameters, which benchmark it cleared, who shipped what. Wrong metric.

The shift that actually reorganizes the market in 2026-2028 isn't that the model got smarter. It's that the model stops replying and starts acting.

The operational gap is enormous. A chatbot cuts your writing time. An agent cuts the execution time of a whole flow. As a rough order of magnitude, the first saves around 30% of the time a task takes; the second saves around 90%, and it also shifts who does what.

Four vectors are moving at the same time. Worth mapping them.

Vector 1 — Agents

An agent combines three capabilities: it sees the screen, it reasons about what to do, it executes the next step. Then it loops. It does all of that on its own, until the task is finished or it asks you for help.
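That see-reason-act loop fits in a few lines of Python. This is a toy under stated assumptions, not how Claude Computer Use works internally: the "screen" is a dict of form fields, decide is a fixed rule standing in for the model's reasoning, and Screen, decide, run_agent and MAX_STEPS are hypothetical names. The two exits mirror the text: either the task is finished, or the agent hits something it can't resolve and asks for help.

```python
from dataclasses import dataclass, field

MAX_STEPS = 20  # hypothetical safety cap on loop iterations


@dataclass
class Screen:
    """Hypothetical toy interface: a form whose empty fields are None."""
    form: dict
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # "Sees the screen": return a snapshot of the current state.
        return dict(self.form)

    def act(self, name: str, value: str) -> None:
        # "Executes the next step": type one value into one field.
        self.form[name] = value
        self.log.append(f"fill {name}")


def decide(snapshot: dict, known: dict):
    """Stand-in for the model's reasoning about the next step.

    Returns ("done", None), ("act", (field_name, value)) or ("ask", field_name).
    """
    empty = [name for name, value in snapshot.items() if value is None]
    if not empty:
        return "done", None
    name = empty[0]
    if name in known:
        return "act", (name, known[name])
    return "ask", name  # ambiguity: the agent has no value for this field


def run_agent(screen: Screen, known: dict) -> str:
    for _ in range(MAX_STEPS):
        kind, payload = decide(screen.observe(), known)  # see + reason
        if kind == "done":
            return "done"
        if kind == "ask":
            return f"needs human input: {payload}"
        screen.act(*payload)  # execute, then loop back to observing
    return "stopped: step limit reached"


screen = Screen({"name": None, "email": None, "vat_id": None})
print(run_agent(screen, {"name": "Ana", "email": "ana@example.com"}))
# needs human input: vat_id
print(screen.log)
# ['fill name', 'fill email']
```

The step cap is worth keeping even in a toy: an agent that can't make progress should stop and report, not loop forever.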

Today it works well on predictable tasks with stable interfaces: searching and booking travel, processing forms, moving data between spreadsheets, managing inboxes with clear rules. It fails when the interface has changed since the last run or when it's ambiguous which decision to make.

The trajectory is clear: every six months, agents tolerate longer tasks with more decisions. In 2024 they handled three steps without losing the thread. In 2026 they handle twenty. The reasonable projection for 2028 is full day-of-work flows, supervised with human approval gates at critical moments rather than at every click.
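Those figures imply a growth rate, and a back-of-the-envelope calculation makes it explicit. It assumes, purely for illustration, that the step horizon grows by the same factor every six months. Going from 3 steps in 2024 to 20 in 2026 spans four half-year periods, so the factor is (20/3)^(1/4), about 1.61 per period. Four more periods at that rate land around 133 steps in 2028. The function names are hypothetical.

```python
def growth_per_period(start: float, end: float, periods: int) -> float:
    """Constant multiplicative factor per period that turns `start` into `end`."""
    return (end / start) ** (1 / periods)


def extrapolate(current: float, factor: float, periods: int) -> float:
    """Project forward, assuming the same factor keeps holding."""
    return current * factor ** periods


# Figures from the text: about 3 steps in 2024, about 20 in 2026.
# Two years = four six-month periods.
factor = growth_per_period(3, 20, periods=4)
steps_2028 = extrapolate(20, factor, periods=4)

print(f"growth per six months: {factor:.2f}x")           # growth per six months: 1.61x
print(f"naive 2028 projection: {steps_2028:.0f} steps")  # naive 2028 projection: 133 steps
```

A constant factor is the simplest possible assumption, and progress can stall or jump. Read the result as what the stated trend implies, not as a forecast.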

Small case from my desk: a client asked me to consolidate billing across four vendors into a single monthly report. Used to be four hours — open each portal, export, normalize, total. Now, one instruction to the agent, twenty minutes of review, done. The client pays the same. I deliver faster and take on another client.

Vector 2 — Real multimodality

"Multimodal" today is marketing because most models process multiple formats in turns: you upload an image, they describe it as text, then they continue as text. There's loss at every jump.

Real multimodality means the model processes text, image, audio, and video in a single internal representation, without intermediate transcription. The operational difference: tasks that depend on the temporal or emotional context of a video — "when did the energy in the meeting shift, and which comment caused it?" — go from impossible to trivial.

The frontier is close. Late-2025 models already handled short video reasonably well. The 2027 projection is native long video, long audio, real mixes without loss.

Vector 3 — Embedded AI

The bottleneck on everyday AI use today isn't model quality. It's workflow friction. Open another tab, copy, paste, adjust, switch back.

The friction is dissolving slowly. An AI button inside the email client. A side panel in the spreadsheet. A quick command inside the text editor. Each of these integrations, alone, sounds marginal. Stacked, they redraw the working day for millions of people.

The question for 2028 isn't "which AI tool do you use?" It's "which tool do you use that doesn't have AI inside?" The answer increasingly looks like "none."

Vector 4 — Persistent personalization

State of the art today is the user-loaded version: you upload documents to a Project and the model works with that in the background. Useful but limited. It doesn't persist across Projects. It doesn't learn from your choices over time.

The frontier over the next two years is memory that persists. Models that remember your industry, your style, the decisions you made last week and how they turned out. The operational payoff is large: you stop explaining context every time. The AI starts calibrated to you.

There's an honest technical tension here: the more personalized the model, the more sensitive the data it stores. The architectures that win the next 24 months are the ones that solve personalization with strong privacy, not the ones that ask for more data in exchange for more utility.

On AGI, and why it matters less than it sounds

AGI — a system capable of any intellectual task at expert human level — is the loudest debate in the field. And, for your working life over the next two years, the least relevant.

Serious positions from AI researchers cover a wide range: technical optimists say five years, the academic mainstream says twenty to fifty, architectural skeptics say "we need theoretical breakthroughs we don't yet have." There's no consensus because no evidence yet settles the question.

Here's the important part: the change that hits your desk in 2026-2028 doesn't depend on AGI. It depends on the four vectors already in motion. Even if AGI never arrives, the next 24 months redraw your profession.

And if it does arrive, it lands on top of a labor market that already reorganized around "human + AI." The person who prepared for that world is better positioned in any scenario.

The replacement rule

I'll close with the line that matters most in this whole piece.

AI isn't going to replace you. The person who uses AI better than you will.

Concrete, real, not invented. I know two lawyers in the same specialty. Same age, same school. One uses Claude to review contracts: one hour per contract, twelve contracts a week. The other uses nothing: four hours per contract, three contracts a week. Both spend the same twelve hours a week on review; the first simply gets through four times as many contracts. With fees charged per contract, the first one bills, net, about four times what the second one does.

This isn't a projection. It's happening in April 2026. In firms in Buenos Aires, Madrid, Mexico City.

The unavoidable question: which side of the river are you on by May 2028?

If you want to dig into how that edge is built without becoming a programmer, the piece on generative AI is the next link. If you want to understand how we got here, start with the history of AI.

Which of the four vectors — agents, embedded, multimodal, personalized — hits your work hardest over the next twelve months?

Today you type a message and the AI replies. In two years you'll barely type anything — the AI will live inside your email, your document, your browser, doing things on your behalf without you asking twice.

That's not sci-fi. That's what already started.

From replying to doing

The AI you know today is a chatbot. You type, it reads, it sends text back. Passive.

The AI of the next two years is an agent. You tell it what outcome you want and it moves around the screen: opens tabs, searches, compares, fills, sends. You watch.

Real case, already shipping: you ask "find three flights to Barcelona in May, max 500 euros, and email me the details." The AI opens the browser, hits flight search engines, filters, summarizes, sends the email. Ten minutes. You're getting coffee.

This isn't a promise. It exists. Claude has a feature called Computer Use that does exactly this; it has been publicly available, initially as a beta, since October 2024.

AI will live inside everything

Today you do your work across five different apps: mail, spreadsheet, slides, chat, browser. To use AI, you open another tab, copy, paste, switch back.

In two years, AI lives inside each of those apps. It's not a separate tool — it's a button in the app you already use.

You're in your inbox, you hit a button, you type "reply to this like me, casual tone." The AI drafts it inside the inbox. No copying, no pasting, no jumping around.

This already happens in Microsoft Office, in Google Docs, in several design apps. In 24 months, an app without AI will feel like an app without internet. Strange.

Conversations that cross formats

Today: if you want to work with a long video, you upload it, you wait for a transcript, you copy, you paste. Fragile. Slow.

Soon, this gets fluid. You hand over a one-hour video and say "draft a follow-up email with the key points." The model watches, listens, understands. Ten minutes.

Or the other way: you talk into your phone while driving. "Turn this into a five-tweet thread." Done by the time you pull in.

Text, audio, image, video in the same conversation. No friction.

AI that knows you

Today every conversation with AI starts from zero. You re-explain who you are, what you do, which client.

Soon, AI remembers. Not yesterday's joke — but your industry, your writing voice, the client you work with every month, the format your reports come in.

A starter version of this exists today. They're called Projects: you upload five documents about you or your business, the AI reads them, and everything you ask inside that Project comes out with that context loaded in the background.

In two years that gets deeper. You won't have to re-explain anything.

And AGI

AGI means an AI that can do any intellectual task a human expert can. It doesn't exist today.

When does it arrive? Some say five years. Others, never.

Nobody knows. Genuinely.

But here's the part that matters: you don't need to wait for AGI for your work to change. The four vectors above — agents, embedded, multimodal, personalized — are already reshaping millions of desks this year.

The real race is between people, not against machines

The line going around is "AI is going to replace me." That's the wrong line.

The right one: the person who uses AI better than you is going to replace you.

It's Excel in 1995. Accountants didn't disappear. Accountants who didn't learn Excel did.

The lawyer who drafts a contract in an hour with Claude replaces the one who takes four without it. The designer who delivers ten proposals in a day replaces the one who delivers three.

In two years, "knowing how to use AI" will be baseline. Not a differentiator. Like sending email today.

But right now, in this 24-month window, it's still real edge. That's the window.

What to take away

Three things, the only ones worth remembering:

AI moves from replying to acting. Agents that execute tasks, not chatbots that print text.

AI moves from outside to inside the apps you already use.

The race isn't against the machine. It's against the person who learned AI before you did. And the window closes around 2028.

Here's a question almost no one asks when reading about the future of AI: which level of the agency hierarchy are we standing on? Public conversation conflates two technical advances — more capable models and more autonomous systems — and that confusion distorts the timeline on which your work actually gets reshaped.

The frontier that matters for the next twenty-four months isn't measured in parameters. It's measured in how many steps of a full workflow the system executes before you have to step in. And that metric is rising in measurable ways every six months.

Agency hierarchy: where we stand and where we're going

Worth formalizing the historical trajectory before projecting.

Level	Capability	State	Control model
1	Pure retrieval	Pre-2020	Human does everything, system indexes
2	Analysis and classification	2005-2020	System scores, human decides
3	Generation and synthesis	2020-2024	System generates, human validates
4	Restricted agency	2024-2026 (present)	System executes within bounded permissions
5	General agency	2026-2030 (projected)	System plans, executes, iterates
6	AGI	2030+ (speculative)	System generalizes

Today, April 2026, we sit firmly in Level 4 with pioneer agents probing 4.5. Claude Computer Use is the most mature production example of Level 4: the system observes the screen, reasons about the next step, executes via simulated mouse-and-keyboard control, and loops. It terminates when the task is complete or when ambiguity triggers a request for human input.

The transition to Level 5 — systems that plan and execute whole flows without gates at every step — is the vector that most reorganizes the knowledge economy between 2026 and 2030. It doesn't require AGI. It requires current systems to grow on two parameters that are already growing: planning horizon length and robustness to unfamiliar interfaces.
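One way to make the difference between Level 4 and Level 5 concrete is to treat it as an approval policy. The sketch below is hypothetical throughout: the action names, the ALLOWED and CRITICAL sets, and the gate function are illustrations, not any vendor's API. Both levels share the same bounded permissions. Level 4 asks a human before every step, while Level 5 asks only before critical ones.

```python
from enum import Enum


class Level(Enum):
    RESTRICTED = 4  # Level 4: bounded permissions, sign-off on every step
    GENERAL = 5     # Level 5: plans and executes; sign-off only at critical moments


# Hypothetical permission model for an inbox agent.
ALLOWED = {"read_email", "draft_reply", "send_reply", "archive", "delete"}
CRITICAL = {"send_reply", "delete"}  # externally visible or irreversible


def gate(action: str, level: Level) -> str:
    """Return 'deny', 'ask_human', or 'auto' for a proposed action."""
    if action not in ALLOWED:
        return "deny"  # outside the bounded permissions at any level
    if level is Level.RESTRICTED:
        return "ask_human"
    return "ask_human" if action in CRITICAL else "auto"


plan = ["read_email", "draft_reply", "send_reply", "archive", "pay_invoice"]
for level in Level:
    approvals = sum(gate(action, level) == "ask_human" for action in plan)
    print(level.name, approvals)
# RESTRICTED 4
# GENERAL 1
```

Over the same five-step plan, the human goes from four approvals to one, and the action outside the permissions is refused at both levels. Deciding what belongs in CRITICAL is where most of the real design work sits.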

The economics of agents: why this matters more than the model

The cost of inference at a given capability level falls roughly an order of magnitude every twelve to eighteen months. That curve is documented in several reports (Stanford AI Index 2025 among others). In parallel, the cost of running an agent, which requires multiple inferences per task, screen observation, and iterative reasoning, follows a slower but also descending curve.

The economic question that defines the next 24 months: when does the total cost of delegating task X to an agent cross below the human cost of doing it? The answer varies by domain. For structured tasks with stable interfaces (form processing, data normalization, rule-driven inbox management), the crossover already happened in 2024-2025. For tasks with ambiguity or unfamiliar interfaces, the crossover is in progress between 2026 and 2028.
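The crossover question can be written as a formula. Assume, as a simplification, that the agent's cost per task falls tenfold every T months while the human cost H stays flat. Then an agent that costs C today drops below H after T · log10(C / H) months. The sketch uses hypothetical numbers: a 10-minute task at 30 per hour of human time (5.0 per task), against an agent that costs 20.0 per task today and falls tenfold every 24 months. That is slower than the 12 to 18 month curve for raw inference, in line with the point above. months_to_crossover is a hypothetical name.

```python
import math


def months_to_crossover(agent_cost: float, human_cost: float,
                        months_per_10x: float) -> float:
    """Months until the agent's cost per task drops below the human's.

    Assumes agent cost falls tenfold every `months_per_10x` months
    (exponential decay) and human cost stays flat. Both are simplifications.
    """
    if agent_cost <= human_cost:
        return 0.0  # the crossover has already happened
    return months_per_10x * math.log10(agent_cost / human_cost)


# Hypothetical figures, same currency unit throughout.
human = 30 * 10 / 60          # 10 minutes of work at 30 per hour -> 5.0
ambiguous_task_agent = 20.0   # many inferences, retries, screenshots
structured_task_agent = 2.0   # stable interface, few steps

print(f"{months_to_crossover(ambiguous_task_agent, human, 24):.1f} months")  # 14.4 months
print(months_to_crossover(structured_task_agent, human, 24))                 # 0.0
```

The wait grows only with the logarithm of the cost gap. Under these assumptions, an agent 100 times more expensive than the human is just two decay periods away from crossover.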

The structural consequence: the labor market for knowledge work segments into three bands moving at different speeds. The automatable band — predictable, high volume, low ambiguity — contracts fast. The judgment band — high ambiguity, decision under uncertainty, relational — contracts slowly or expands. The middle band — where most current professional work lives — transforms: the professional who occupies it shifts from executor to agent supervisor.

Multimodality: the real change is internal representation

There's a technical distinction press coverage tends to skip. A "multimodal" model can mean two very different things:

Pipeline multimodality (current state of most systems): the system chains specialized models. Image comes in, vision model describes it as text, language model continues from there. Audio comes in, speech-to-text transcribes, language model continues. There's loss at every transcription and latency stacks up.

Native multimodality (frontier 2026-2028): a single model internally represents text, image, audio, and video in the same embedding space. No intermediate transcription. Inference is direct, multimodal input to multimodal output.

The difference matters for tasks that depend on information lost in transcription. Three examples: emotional tone analysis of a recorded meeting, temporal sync between a verbal comment and a visual reaction, comprehension of music or ambient sound as contextual signal. Intractable in pipeline. Trivial in native.
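Here is a toy illustration of where the loss happens, with heavy simplifications. A hypothetical energy score stands in for any non-verbal signal (tone, pace, a visual reaction), and a list of dicts stands in for a shared embedding space. This is not how a multimodal model works internally. It only shows why a question about when the energy shifted, and after which comment, cannot be answered once a transcription step has thrown that signal away. Segment, transcribe and biggest_energy_shift are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass
class Segment:
    start_s: int   # seconds from the start of the recording
    text: str      # the words spoken
    energy: float  # hypothetical 0-1 score standing in for any non-verbal signal


def transcribe(segments):
    """Pipeline step: speech-to-text keeps the words and drops everything else."""
    return [{"start_s": s.start_s, "text": s.text} for s in segments]


def biggest_energy_shift(records):
    """When did the energy change most, and which comment came just before?

    Answerable only if the records still carry the non-verbal signal.
    """
    if len(records) < 2 or "energy" not in records[0]:
        return None  # the signal was lost upstream
    i = max(range(1, len(records)),
            key=lambda j: abs(records[j]["energy"] - records[j - 1]["energy"]))
    return records[i]["start_s"], records[i - 1]["text"]


meeting = [
    Segment(0, "First, the Q3 numbers.", 0.60),
    Segment(40, "Revenue is flat, as expected.", 0.55),
    Segment(85, "The launch slips to March.", 0.58),
    Segment(120, "Any questions?", 0.15),
]

pipeline = transcribe(meeting)         # text only
native = [asdict(s) for s in meeting]  # toy stand-in for one joint representation

print(biggest_energy_shift(pipeline))  # None
print(biggest_energy_shift(native))    # (120, 'The launch slips to March.')
```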

Reasonable projection: by 2028, frontier models have native multimodality for text + image + audio + short video. Native multimodality for long video and complex mixes lands afterward.

Personalization: in-context vs persistent fine-tuning

Two technical avenues, plus a hybrid of the two, lead to "AI that knows you," with very different trade-offs:

Avenue A — In-context learning. The model receives documents about the user in each conversation and operates inside the context window. That's what Claude Projects does today. Advantages: no retraining, the data doesn't end up in model weights, it's reversible. Limits: memory doesn't persist structurally across sessions, token cost grows with context size, there's a ceiling on how many documents the model can effectively integrate into a single chain of reasoning.

Avenue B — Persistent fine-tuning. The model is adjusted with user data. Memory lives in parameters. Advantages: context always loaded, no per-token marginal cost for personalization at inference time. Limits: training cost, generality loss if overfit, irreversibility, serious privacy risk if data leaks.

Avenue C — Hybrid (the most likely path for 2027-2030). Externally structured memory (vector store + retrieval) plus selective fine-tuning of very specific capabilities like writing voice. Most of the implementation rides on in-context, but certain repetitive user patterns get baked into parameters.
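The sketch below covers the externally structured half of Avenue C. It uses a toy stand-in for embeddings (word counts compared by cosine similarity), where a real system would use a learned embedding model and a proper vector database. MemoryStore, embed, cosine and build_prompt are hypothetical names. Retrieved notes ride along in the context of each request, which is the in-context part. The selective fine-tuning half is not shown.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """External memory that lives outside the model weights."""

    def __init__(self) -> None:
        self.notes: list[tuple[str, Counter]] = []

    def add(self, note: str) -> None:
        self.notes.append((note, embed(note)))

    def forget(self, note: str) -> None:
        # Reversibility: a removed note never reaches a future prompt.
        self.notes = [(n, v) for n, v in self.notes if n != note]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.notes, key=lambda nv: cosine(q, nv[1]), reverse=True)
        return [note for note, _ in ranked[:k]]


def build_prompt(store: MemoryStore, request: str, k: int = 2) -> str:
    """In-context step: only the top-k relevant notes ride along."""
    memory = "\n".join(f"- {note}" for note in store.retrieve(request, k))
    return f"Known about this user:\n{memory}\n\nRequest: {request}"


store = MemoryStore()
store.add("Client Acme wants the monthly report as a PDF.")
store.add("Writing voice: short sentences, no jargon.")
store.add("Spanish VAT questions go to the accountant.")
print(store.retrieve("Draft the Acme monthly report", k=1))
# ['Client Acme wants the monthly report as a PDF.']
```

Because this memory lives outside the weights, forget removes a note from every future prompt. Undoing the same fact after it has been fine-tuned into parameters is much harder, which is the reversibility trade-off between Avenues A and B.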

Whichever avenue wins defines the privacy model for personal AI. It's a technical question with immediate political and regulatory consequences, especially in jurisdictions with frameworks like GDPR.

AGI: three interpretations, none decided

The label "AGI" carries three distinct technical meanings worth disambiguating:

Interpretation 1 — General human competence. An AI capable of any intellectual task at expert human level. The most operationally used definition. Serious projections of when it lands range from 5 to 50 years. The opinion distribution in ML researcher surveys clusters around 20 years with long tails on both sides.

Interpretation 2 — Superintelligence. An AI better than human at most intellectual tasks. Different from AGI and depends on which tasks. An AI can be superhuman at text processing and subhuman at emotional comprehension. The "when" question is even less answerable.

Interpretation 3 — Sentience or consciousness. An AI with subjective experience. A philosophical question more than a technical one, and probably without a clean empirical answer.

Interpretation 1 is the one the market is betting will land on a timescale relevant to investment. Three camps on the technical trajectory:

Scaling optimists: scaling laws keep working, emergent capabilities show up at greater scale, AGI arrives via more compute.
Breakthrough realists: scaling has diminishing returns on certain dimensions (abstract reasoning, long-horizon planning) and new architectural breakthroughs are required.
Architectural skeptics: current models are fundamentally next-token predictors and won't reach AGI through more data or compute. New paradigms required.

None of the three positions is ruled out by current evidence. Intellectual honesty requires acknowledging that uncertainty as part of the analysis, not hiding it behind a specific prediction.

Unresolved tensions a power user should map

Three tensions that don't show up in enthusiastic coverage and are worth keeping on the radar:

Tension 1 — Literalism vs robustness in agents. The more literally a model follows instructions, the more predictable it is for autonomous systems but the less tolerant of imperfect instructions. The industry is shifting toward more literalism. Implication: agents work well for users who write detailed instructions and poorly for casual users. The productivity gap between power users and casual users widens.

Tension 2 — Privacy vs personalization. Both technical avenues toward persistent memory generate sensitive data. Regulation moves slower than capability. In 24 months the question isn't "do I want personalized AI?" but "in which jurisdiction and under what contractual frame do I want it?".

Tension 3 — Token cost in production at scale. New tokenization systems can change the token count for the same input by up to 1.35x. For high-volume deployments, that's a de facto price increase even when the per-token price hasn't changed. Enterprise contracts based on historical estimates drift out of alignment. Worth recalibrating dashboards quarterly.
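The arithmetic behind that recalibration is simple; only the multiplier moves. The sketch uses a hypothetical volume and price (500M tokens a month at 3.00 per million, chosen for round numbers). The per-token price never changes, yet the monthly bill scales with the token-count multiplier, up to the 1.35x mentioned above. monthly_cost and tokenizer_drift are hypothetical names.

```python
def monthly_cost(tokens: float, price_per_million: float) -> float:
    """Cost of a month of traffic at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def tokenizer_drift(tokens: float, price_per_million: float,
                    inflation: float) -> float:
    """Extra monthly cost when the same inputs now count as `inflation` times
    as many tokens, with the per-token price unchanged."""
    new = monthly_cost(tokens * inflation, price_per_million)
    return new - monthly_cost(tokens, price_per_million)


# Hypothetical deployment: 500M tokens a month at 3.00 per million tokens.
volume, price = 500_000_000, 3.00
for inflation in (1.00, 1.10, 1.35):
    extra = tokenizer_drift(volume, price, inflation)
    print(f"{inflation:.2f}x tokens: {monthly_cost(volume * inflation, price):,.0f} "
          f"per month ({extra:+,.0f})")
# last line: 1.35x tokens: 2,025 per month (+525)
```

The simplest form of the quarterly recalibration is to run the same calculation on your own logged volumes, once with the old token counts and once with the new.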

Thesis: the structural winner of the next 24 months

I'll close with an editorial thesis, not a neutral projection.

The structural winner of the 2026-2028 period won't be the largest model, the highest benchmark, or the agent with the most viral demos. It'll be the system that the most mid-sized companies and independent professionals can delegate real work to with confidence.

That requires three attributes that don't generate headlines:

Behavior predictability. That the same instruction produces the same result tomorrow that it did today. That model updates don't break production flows.
Trustworthy self-verification. That the model says "I know this and I don't know that" instead of confidently inventing. The real frontier of practical utility lives here, not in abstract benchmarks.
Predictable total operating cost. That the cost of delegating task X is calculable in advance and stable quarter to quarter.
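The first attribute can be turned into a crude empirical test: send the same instruction several times and measure how often the system agrees with itself. The sketch assumes that model is any Python callable wrapping whatever API you use (the stubs below are hypothetical). It also assumes exact string match is an acceptable comparison, which only holds for short structured outputs such as labels or yes/no decisions. consistency_rate and passes_gate are hypothetical names, and the 0.9 threshold is illustrative, not a recommendation.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency_rate(model: Callable[[str], str], instruction: str,
                     runs: int = 10) -> float:
    """Share of runs that return the most common output (exact match)."""
    outputs = [model(instruction).strip() for _ in range(runs)]
    top_count = Counter(outputs).most_common(1)[0][1]
    return top_count / runs


def passes_gate(rate: float, threshold: float = 0.9) -> bool:
    """Hypothetical acceptance rule for folding a system into a critical flow."""
    return rate >= threshold


# Hypothetical stubs standing in for real API calls.
def stable_model(instruction: str) -> str:
    return "APPROVED"


_answers = itertools.cycle(["APPROVED", "APPROVED", "REJECTED"])


def flaky_model(instruction: str) -> str:
    return next(_answers)


for model in (stable_model, flaky_model):
    rate = consistency_rate(model, "Classify invoice 123: approve or reject?", runs=9)
    print(model.__name__, round(rate, 2), passes_gate(rate))
# stable_model 1.0 True
# flaky_model 0.67 False
```

For free-form text, swap exact match for a looser check, such as the same extracted fields or the same final decision. Rerunning the test after every model update is one way to catch updates that break production flows before they reach a client.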

The companies building on these three dimensions — past the race for raw capability benchmarks — are the ones that will capture the real-work market when the current hype cycle ends. And that transition happens, without exaggeration, on a timescale of the next twenty-four months.

What's your personal empirical test for deciding whether an AI system is reliable enough to fold into a critical workflow in your business?