The AI timeline — milestones year by year
Before we start: a mental image
Picture the history of AI as a long roller coaster. It climbs slowly for sixty years. It has two deep drops (the winters). And since 2012 it climbs at a slope almost nobody saw coming.
If everything feels fast right now, that's because you're on the steep part.
Before all that: the theoretical seed (1936-1950)
Long before useful computers existed, Alan Turing published "On Computable Numbers" (1936). It defines the Turing machine: a theoretical device that can solve any computable problem if you give it the right algorithm.
Pure abstraction. But it sets the floor.
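To make "the right algorithm" concrete, here is a minimal Python sketch of a Turing machine simulator. The transition-table format, the state names and the binary-increment program are illustrative choices, not Turing's 1936 notation. Each rule says what to write, which way to move the head, and which state comes next.

```python
# Minimal Turing machine simulator (illustrative; not Turing's 1936 notation).
# transitions maps (state, symbol) -> (symbol_to_write, move "L" or "R", next_state).

def run_turing_machine(tape, transitions, state="start", blank="_", max_steps=10_000):
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        write, move, state = transitions[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    # Read back the tape from the leftmost to the rightmost non-blank cell.
    used = [pos for pos, sym in cells.items() if sym != blank]
    if not used:
        return ""
    return "".join(cells.get(pos, blank) for pos in range(min(used), max(used) + 1))

# Hypothetical example program: add 1 to a binary number.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),  # scan right to the end of the number
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),  # hit the blank, step back onto the last digit
    ("carry", "1"): ("0", "L", "carry"),  # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("1", "L", "halt"),   # 0 + carry = 1, done
    ("carry", "_"): ("1", "L", "halt"),   # ran off the left edge: new leading 1
}

print(run_turing_machine("1011", INCREMENT))  # 11 + 1 = 12, i.e. "1100"
```

The simulator itself never changes; only the table does. That is the intuition behind "give it the right algorithm."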
In 1943, Warren McCulloch and Walter Pitts publish "A Logical Calculus of the Ideas Immanent in Nervous Activity" and model the neuron as a unit of mathematical logic. It is the first formal abstraction of how a brain might be simulated, and the conceptual foundation for neural networks.
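A McCulloch-Pitts unit fires when its binary inputs add up to a threshold. The sketch below uses the common modern simplification with numeric weights (the 1943 paper distinguished excitatory and inhibitory inputs instead) and shows that a few such units compute logic gates, including XOR once units are chained into two layers.

```python
# McCulloch-Pitts style threshold unit (modern weighted simplification).

def mp_neuron(inputs, weights, threshold):
    """Return 1 (fire) if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def NOT(a):
    return mp_neuron([a], [-1], threshold=0)  # fires unless the input is active

def XOR(a, b):
    # No single threshold unit computes XOR; a two-layer arrangement of units does.
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```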
1950: Turing — "Computing Machinery and Intelligence"
Turing comes back with the almost philosophical question: can machines think? He proposes the Turing Test: if you can't tell a conversation with a machine from one with a human, then it thinks.
Curiosity: systems like ChatGPT and Claude can often pass informal versions of the Turing Test today. Nobody treats that as proof that AI is solved: the test turned out to be superficial. But it was the starting point.
1956: Dartmouth — the official birth
John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon convene a summer meeting at Dartmouth College. McCarthy coins the term Artificial Intelligence. The original proposal asked for 7,500 dollars at the time to bring ten people together for two months (Dartmouth archive, 1955).
They all believe they'll solve the problem that summer. They don't. But the discipline is officially born.
1966: ELIZA
Joseph Weizenbaum, at MIT, writes ELIZA. A program that simulates a psychotherapist. It reads "I'm sad" and replies "Why are you sad?" People bonded with it deeply.
ELIZA understood nothing. It matched keywords and answered with templates. But it demonstrated something important: humans project intelligence wherever we see coherent responses.
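The keyword-and-template trick fits in a few lines. This is a hypothetical, heavily reduced rule set in the spirit of ELIZA's DOCTOR script, not Weizenbaum's original; the real program also ranked keywords and swapped pronouns (my to your, and so on). Nothing here models meaning: a regular expression finds a keyword and the captured fragment is pasted into a canned reply.

```python
import re

# Hypothetical keyword rules in the spirit of ELIZA (not the original DOCTOR script).
# Each rule: a pattern to search for, and a reply template filled with the captured text.
PATTERNS = [
    (re.compile(r"\bi(?:'m| am) (.+)", re.IGNORECASE), "Why are you {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."  # used when no keyword matches

def respond(text):
    for pattern, template in PATTERNS:
        match = pattern.search(text)
        if match:
            # Drop trailing punctuation so "sad." becomes "sad" before it is echoed back.
            fragments = [group.rstrip(".!?") for group in match.groups()]
            return template.format(*fragments)
    return FALLBACK

print(respond("I'm sad"))  # Why are you sad?
```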
1974-1980: First Winter
The Lighthill report (1973) in the UK and DARPA cuts in the US shut off the funding spigot. Machines were slow. Algorithms were naive. There was no mass data. Academics kept working, just out of the spotlight.
1980-1987: the expert systems era
A new strategy: instead of machines that learn, build systems that encode human expert rules.
Digital Equipment Corporation's XCON system configured VAX computer orders and saved the company about 40 million dollars a year (DEC internal documentation, cited in Crevier 1993). The expert systems market reached an estimated 3 billion dollars by the late 1980s.
The problem: brittleness. Change one rule and the machine can fail. Coding millions of rules by hand is impractical.
1987-1997: Second Winter
Expert systems didn't scale. The hype collapsed again. The irony: in the quiet, in unglamorous labs, Rumelhart, Hinton and Williams (1986) had already popularized backpropagation, the training algorithm that would later make deep learning possible.
But in 1990 nobody had the data or the compute to make it useful.
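Backpropagation is the chain rule applied from the output back toward the input, reusing each intermediate result. Here is a minimal sketch on a hypothetical one-hidden-unit network (tanh hidden unit, squared-error loss), not the 1986 paper's notation. It checks the hand-derived gradients against finite differences, then takes a few gradient-descent steps.

```python
import math

# Hypothetical tiny network: x -> h = tanh(w1*x + b1) -> y = w2*h + b2, loss = 0.5*(y - t)^2.
# params is the list [w1, b1, w2, b2].

def forward(params, x):
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Gradients w.r.t. [w1, b1, w2, b2], computed output-first with the chain rule."""
    w2 = params[2]
    h, y = forward(params, x)
    dy = y - t              # dL/dy
    dw2, db2 = dy * h, dy   # output layer gradients
    dh = dy * w2            # pass the error back through w2
    dz = dh * (1 - h * h)   # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dw1, db1 = dz * x, dz   # hidden layer gradients
    return [dw1, db1, dw2, db2]

def numerical_grad(params, x, t, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, t) - loss(down, x, t)) / (2 * eps))
    return grads

params, x, t = [0.5, 0.1, -0.3, 0.2], 1.5, 1.0
print(backprop(params, x, t))
print(numerical_grad(params, x, t))  # should agree closely with the line above

# A few steps of gradient descent drive the loss down.
for _ in range(200):
    params = [p - 0.1 * g for p, g in zip(params, backprop(params, x, t))]
print(loss(params, x, t))
```

The point of the algorithm is the reuse: each gradient is built from the one computed just before it, so one backward pass costs about as much as one forward pass, no matter how many weights there are.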
1997: Deep Blue
IBM builds a machine that defeats Garry Kasparov, world chess champion. A psychological milestone. Deep Blue doesn't understand chess: it calculates millions of possible moves. The public doesn't care about the how. They care about the result. AI is back on the cover.
2009-2012: the break
Three things converge at once.
One: Nvidia's GPUs (CUDA, 2006 onward) enable massively parallel compute. Neural networks that used to take years now train in days.
Two: the internet provides mass data. Fei-Fei Li and her team, then at Princeton, publish ImageNet (Deng et al., 2009), a dataset that grew to more than 14 million hand-labeled images. It was the first giant dataset with academic-quality curation.
Three: in 2012, Geoffrey Hinton's team, with Alex Krizhevsky and Ilya Sutskever, wins the ImageNet challenge with AlexNet, a deep convolutional network. Its top-5 error was 15.3%, against 26.2% for the runner-up (official ImageNet 2012 results).
The message: deep learning works. At scale. More data plus more compute plus more layers equals better performance.
From here, everything accelerates.
2014: Google buys DeepMind
DeepMind, co-founded by Demis Hassabis, had shown that neural networks could play some Atari video games better than humans. Google buys it for roughly 500 million dollars (figure reported by The Times and the BBC). A clear signal: AI is the future.
2016: AlphaGo beats Lee Sedol
DeepMind builds AlphaGo, which defeats Lee Sedol, one of the strongest Go players in the world. Go is vastly more complex than chess, and many experts expected this milestone to be at least a decade away. AlphaGo got there by combining deep neural networks, reinforcement learning and Monte Carlo tree search.
2017: "Attention Is All You Need"
Vaswani and others publish the paper introducing the transformer: an architecture built entirely on the attention mechanism, with no recurrence. Without this paper, there's no ChatGPT, no Claude, no Gemini. It now has more than 100,000 citations.
2018-2020: GPT-1, GPT-2, GPT-3
OpenAI scales transformers for text.
GPT-1 (2018): 117 million parameters. GPT-2 (2019): 1.5 billion. GPT-3 (2020): 175 billion parameters, trained on roughly 300 billion tokens (Brown et al., 2020).
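Those parameter counts follow from a few architectural choices. A common rule of thumb for GPT-style transformers is about 12 × layers × width² parameters in the transformer blocks, plus the embedding tables. The sketch below assumes the layer counts, widths, vocabulary sizes and context lengths reported for GPT-1, the largest GPT-2 and GPT-3, and ignores biases and layer norms.

```python
def approx_params(n_layers, d_model, vocab_size, context_len):
    """Rough GPT-style parameter count (rule of thumb, not an exact tally)."""
    # Per block: attention projections (4 * d^2) + MLP with 4x expansion (8 * d^2) = 12 * d^2.
    # Biases and layer norms are ignored; they add well under 1% at these sizes.
    per_block = 12 * d_model ** 2
    embeddings = (vocab_size + context_len) * d_model  # token + learned position embeddings
    return n_layers * per_block + embeddings

# (layers, width, vocabulary size, context length), as reported for each model.
MODELS = {
    "GPT-1": (12, 768, 40_478, 512),
    "GPT-2": (48, 1_600, 50_257, 1_024),
    "GPT-3": (96, 12_288, 50_257, 2_048),
}

for name, config in MODELS.items():
    print(f"{name}: ~{approx_params(*config) / 1e9:.2f} billion parameters")
```

With those inputs the estimates land at roughly 116 million, 1.56 billion and 174.6 billion, close to the published figures. Almost all of GPT-3's size comes from depth times width squared, which is why "scaling up" mostly means making the network wider and deeper.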
Each leap brings capabilities nobody explicitly trained for. We call them "emergent abilities," and we still don't fully understand why they appear.
2021: Anthropic is founded
Dario and Daniela Amodei leave OpenAI with a group of researchers and found Anthropic. The focus: safe and aligned AI. Pioneers of the Constitutional AI approach (Bai et al., 2022).
2022: ChatGPT
OpenAI launches ChatGPT on November 30. It's a tuned version of GPT-3.5 with a simple conversational interface. It hits 100 million users in two months (Stanford AI Index 2024). The fastest-adopted consumer product in history up to that point.
2023-2024: the model war
March 2023: GPT-4 (image input announced at launch, rolled out broadly in September as GPT-4V). July 2023: Claude 2. December 2023: Gemini 1.0. March 2024: Claude 3 (Opus, Sonnet, Haiku). June 2024: Claude 3.5 Sonnet. October 2024: Claude 3.5 Sonnet (updated version). November 2024: Anthropic publishes the Model Context Protocol (MCP).
Prices fall. Access rises. AI stops being a monopoly.
2025-2026: agents and MCP
AI stops being a chat tool. It becomes an agent that acts.
It reads your calendar. Answers certain categories of email. Chains tasks. Plugs into Slack, GitHub, Google Workspace, your CRM. Executes in real time.
MCP is the layer that lets AIs access data and tools without every company having to build a custom integration for every model. It's the missing piece.
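The "custom integration for every model" problem is simple arithmetic. Without a shared protocol, each model-tool pair needs its own connector; with one, each model and each tool implements the protocol once. The counts below are hypothetical and treat every connector as equal effort, which real integrations are not.

```python
def connectors_needed(n_models, n_tools, shared_protocol):
    """Rough count of integrations needed to connect every model to every tool."""
    if shared_protocol:
        return n_models + n_tools  # each side implements the protocol once
    return n_models * n_tools      # one bespoke connector per model-tool pair

# Hypothetical ecosystem: 5 AI models, 20 tools.
print(connectors_needed(5, 20, shared_protocol=False))  # 100
print(connectors_needed(5, 20, shared_protocol=True))   # 25
```

The gap widens as the ecosystem grows: multiplication on one side, addition on the other.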
A question to close on
If you look at the AI cycles, there's a clear pattern: overpromise, disappointment, quiet research, breakthrough, mass hype. We've lived it at least twice in the twentieth century.
Are we in another cycle, or is this time different? I wrote my view in another piece in this series, "Is another AI winter coming?". Short version: I don't think a hard winter like the previous ones is coming. But I do think the current hype will correct. And the deciding factor comes down to one question: do we deliver what we promise?
In the meantime, mastering AI today is like mastering Excel in the 1990s: a skill that sets you apart. The next five years will split those who use AI like pros from those who don't.
The AI timeline — milestones year by year
The field is ninety years old
AI has nearly a century of history. But the last five years were more transformative than the previous eighty-five combined.
This is the minimum you need to know to understand the moment we're living in.
1950: Turing writes the founding question
Alan Turing publishes "Computing Machinery and Intelligence" and proposes the Turing Test: if you can converse with something and can't tell it from a human, that counts as intelligence. Still a reference 75 years later.
1956: the Dartmouth Conference
John McCarthy, Marvin Minsky and others meet for a summer. McCarthy coins the term "Artificial Intelligence." They announce they'll solve the problem in a few months. They don't. And so the pattern is born: AI promises a lot and delivers less.
1974-1980: the First Winter
The hype collapses. Computers were slow. There was no data. Funding dried up. Academics kept working in silence.
1980-1987: expert systems
Euphoria returns with systems that encode human expert rules. They work in niches but are fragile. Change one rule and everything breaks.
1987-1997: the Second Winter
Expert systems don't scale. The hype dies again. In the quiet, people like Geoffrey Hinton and Yann LeCun kept working on neural networks.
1997: Deep Blue beats Kasparov
IBM builds a machine that defeats the world chess champion. It doesn't "understand" chess: it calculates millions of moves per second. But the public cares about the result, not the how.
2012: AlexNet wins ImageNet
A neural network recognizes images better than any program before it. That's the break. From here, everything accelerates.
2016: AlphaGo beats Lee Sedol
DeepMind builds a system that defeats Lee Sedol, one of the world's top Go players. Many experts thought this was still at least a decade away.
2017: the Transformers paper
Vaswani and his team at Google publish "Attention Is All You Need." It invents the architecture that nearly every modern large language model now uses. Without this paper there's no ChatGPT, no Claude, no Gemini.
2020: GPT-3
OpenAI releases a 175-billion-parameter model. The first one that feels versatile for almost any task.
2022: ChatGPT arrives
OpenAI launches it in November. It hits 100 million users in two months (Stanford AI Index 2024). The whole world finds out AI has shifted.
2023-2024: the model war
GPT-4, Claude 2 and 3, Gemini, Llama. Brutal competition. Prices drop. Access grows.
2025-2026: agents
AI stops being a chatbot. Now it acts: reads your calendar, chains tasks, plugs into your tools. It does real work, not just conversation.
Three ideas to take with you
First: AI is 90 years old, but it has only worked for real for 14 of them (since AlexNet, 2012). If the hype feels new to you, you're not alone. It is new at scale.
Second: the winters weren't because AI was impossible. They happened because data and compute were missing. When both arrived, the curve took off.
Third: you're living through a hinge moment. The next five years will separate those who use AI well from those who don't. You don't need to be technical. You need to understand what it does and what it's for.
The AI timeline — milestones year by year
Why history matters more than it seems
AI isn't new. What's new is that it now works at consumer scale. Understanding the cycles lets you separate genuine breakthroughs from hollow speculation. And it positions you better for the changes still coming.
If you want to operate in this field with judgment, knowing that ChatGPT launched in 2022 isn't enough. You have to understand why the predecessors failed and what changed in the 2010s for it to work this time.
1936-1950: theoretical foundations without machines
Turing (1936) and McCulloch-Pitts (1943) lay the groundwork for the idea that intelligence might be computable. But this is pure theory: useful computers were still years away.
The underlying insight: intelligence isn't magic, it's mathematics. If you can formalize logic, a machine can execute it. That was conceptually radical.
1950-1956: Turing and Dartmouth
Turing publishes "Computing Machinery and Intelligence" in 1950. He operationalizes the philosophical question with the Turing Test.
McCarthy, Minsky, Rochester and Shannon organize Dartmouth in 1956. They coin the term. They underestimate the scale of the problem by orders of magnitude. Intelligence requires context, data and compute in magnitudes that were impossible in the 1950s. A typical computer of the era filled a room and had memory measured in kilobytes.
The original Dartmouth budget asked for 7,500 dollars at the time for ten people for two months (Dartmouth archive). For perspective: training GPT-4 is estimated to have cost about 78 million dollars in compute alone (Stanford AI Index 2024, drawing on Epoch AI estimates; OpenAI has published no official figure).
1966-1973: ELIZA and the first expectations crisis
Weizenbaum (MIT) creates ELIZA. A program that simulates a psychotherapist with keyword search and templates. People bonded with it deeply.
ELIZA is brilliant pedagogically: it demonstrates that humans project intelligence onto coherent responses, with no real understanding underneath.
But it also exposes the limits. ELIZA doesn't generalize, doesn't learn, doesn't adapt. It's a decision tree in costume.
ELIZA's limits foreshadowed the first mass disillusionment. If a simple pattern-matching machine can fool people but can't go further, what's the next step? The answer: learning from data instead of programming rules by hand. But that would take 40 years to become technically possible.
1973-1980 and 1987-1997: why the winters lasted so long
The winters didn't happen because AI was impossible. They happened because the conditions for success didn't exist.
Computers: slow, megahertz not gigahertz. Data: scarce, public internet only started spreading in the mid-1990s. Algorithms: naive for real problems. Investment: drained by unmet promises (Lighthill report 1973 in the UK, DARPA cuts in 1974).
The cycle: hype, investment, unmet promises, withdrawal of funding, researcher silence.
In the quiet, though, serious people kept working. Rumelhart, Hinton and Williams (1986) rediscover backpropagation and apply it to multilayer networks (Nature 323). LeCun develops convolutional networks at Bell Labs (LeNet, 1989-1998). Hopfield publishes the networks that bear his name (1982). But without data or compute, these remained mostly academic curiosities.
The second failed boom: expert systems (1980-1987)
Systems like DEC's XCON, Stanford's MYCIN, and others appeared to prove that AI could generate economic value. XCON saved DEC roughly 40 million dollars a year (Crevier 1993, citing internal documentation). The global expert systems market reached an estimated 3 billion dollars annually by the late 1980s.
But the structural fragility ended it. Expert systems didn't learn. Every rule change could break behaviors in cascade. Maintaining them cost more than rewriting them. When Japan defunded the Fifth Generation project (which had announced 850 million dollars in 1982), the Western market took the cue.
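The cascade problem is easy to reproduce. Below is a minimal forward-chaining rule engine, the basic inference loop of many production-rule expert systems, with three hypothetical configuration rules (not XCON's real ones). Renaming a single conclusion in the first rule silently disables the two rules that depended on it.

```python
# Minimal forward-chaining rule engine with hypothetical configuration rules.
# Each rule: (set of facts that must all be known, fact to conclude).
CONFIG_RULES = [
    ({"order:vax"}, "needs:cabinet"),
    ({"needs:cabinet", "order:disk"}, "needs:disk-controller"),
    ({"needs:disk-controller"}, "needs:power-upgrade"),
]

def forward_chain(facts, rules):
    """Fire rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

order = {"order:vax", "order:disk"}
print(sorted(forward_chain(order, CONFIG_RULES)))

# A "small" edit: someone renames the first rule's conclusion.
edited = [({"order:vax"}, "needs:rack")] + CONFIG_RULES[1:]
print(sorted(forward_chain(order, edited)))  # the two downstream rules never fire
```

No error is raised. The system just quietly stops concluding things, which is why maintenance grew so expensive as rule bases reached thousands of entries.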
2010-2012: the break — convergence of three factors
Three things coincided in a very narrow window.
One: the general-purpose GPU. Nvidia's CUDA (2006 onward) enabled massively parallel compute. Deep neural networks went from "impossible to train" to "train in days."
Two: the internet and data. ImageNet (Deng et al., 2009) made 14 million labeled images available. Wikipedia consolidated structured knowledge. YouTube accumulated video at planetary scale.
Three: the right algorithms. Backpropagation plus deep layers plus enormous datasets equals networks that can learn complex patterns they couldn't before.
AlexNet (Krizhevsky, Sutskever, Hinton, 2012) is the proof of concept. An eight-layer CNN trained on ImageNet that won by a historic margin (top-5 error: 15.3% versus 26.2% from the runner-up, per the official ImageNet 2012 results).
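Top-5 error, the metric behind those numbers, counts a prediction as wrong only if the true class is missing from the model's five highest-scoring guesses (out of 1,000 classes in the challenge). A minimal sketch with made-up scores for seven classes:

```python
def top5_error(scores, labels):
    """Fraction of examples whose true label is not among the 5 highest-scoring classes."""
    misses = 0
    for class_scores, label in zip(scores, labels):
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:5]:
            misses += 1
    return misses / len(labels)

# Made-up scores for two images over 7 classes (the real challenge used 1,000).
scores = [
    [0.30, 0.25, 0.15, 0.10, 0.08, 0.07, 0.05],
    [0.30, 0.25, 0.15, 0.10, 0.08, 0.07, 0.05],
]
labels = [4, 5]  # class 4 is ranked 5th (a hit); class 5 is ranked 6th (a miss)
print(top5_error(scores, labels))  # 0.5
```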
From there, the exponential curve never stops.
2012-2017: deep learning dominates any domain with data
2013: AlphaGo research begins at DeepMind. 2014: Goodfellow publishes GANs (Generative Adversarial Networks). 2015: He and others publish ResNets (residual networks), enabling unprecedented depth. 2016: AlphaGo defeats Lee Sedol (Silver et al., Nature 529). 2016: sequence-to-sequence models dominate machine translation. 2017: Vaswani and others publish "Attention Is All You Need" (NeurIPS 30).
The aggregate message: deep learning plus scale plus data can match or beat human performance on many tasks where abundant training data exists.
2017-2020: transformers and GPT — prelude to the big leap
The transformer is the architecture that changed everything. The innovation: the attention mechanism lets the network mathematically decide which parts of the context matter for predicting the next token.
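Concretely, the transformer's core operation is scaled dot-product attention: each query is compared with every key, the scores are divided by the square root of the key dimension and turned into weights with a softmax, and the output is the weighted average of the values. The pure-Python sketch below shows a single head only; real transformers add learned projections for queries, keys and values, multiple heads and, for next-token prediction, a causal mask.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, one query at a time."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # how much this query "attends" to each position
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[10.0, 0.0]], keys, values))  # almost all weight on the first position
print(attention([[0.0, 0.0]], keys, values))   # no preference: the plain average
```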
OpenAI experiments. What happens if you train a giant transformer on text from the internet?
GPT-1 (2018): 117 million parameters, trained on BooksCorpus, about 7,000 unpublished books. GPT-2 (2019): 1.5 billion parameters, trained on 40 GB of web text (WebText). GPT-3 (2020): 175 billion parameters, ~300 billion tokens (Brown et al., 2020, NeurIPS).
Every leap brings emergent capabilities: tasks the model wasn't trained for specifically and yet performs. That phenomenon is still an active area of research. The dominant hypothesis is that certain abilities require a minimum scale threshold to appear.
2021-2022: Anthropic and the public paradigm shift
Anthropic is founded in 2021 by Dario Amodei (former VP of Research at OpenAI), Daniela Amodei and a group of researchers. Explicit focus on safety, interpretability and alignment. Pioneers of Constitutional AI (Bai et al., 2022): an approach where the model is trained to critique its own outputs against a set of principles.
ChatGPT launches on November 30, 2022. It's an RLHF version of GPT-3.5 with a conversational interface. It hits 100 million users in two months (Stanford AI Index 2024). At the time, it was the fastest-adopted consumer product in history.
Operational consequence: for the first time in history, AI is a common tool, not a laboratory object.
2023-2024: competition, specialization and differentiation
OpenAI: GPT-4 (March 2023, full multimodal in September 2023 with GPT-4V). Google: Gemini 1.0 (December 2023), Gemini 1.5 (February 2024) with long context window. Meta: Llama 2 (July 2023) and Llama 3 (April 2024), open to the community. Anthropic: Claude 2 (July 2023), Claude 3 (March 2024), Claude 3.5 Sonnet (June 2024 and update in October 2024). Microsoft: Copilot embedded in Office 365.
Prices fall by two orders of magnitude between 2023 and 2025. Access rises. AI stops being a monopoly.
Each model differentiates on training data, specific architecture and fine-tuning choices. Broadly: Claude tends to calibrate uncertainty better, GPT-4 tends to be more aggressive in attempting a solution, Gemini pushes multimodal capabilities, and Llama dominates the open-weights segment.
2024-2026: the era of agents and MCP
A new paradigm. AI stops just predicting. It acts.
It reads your calendar and books meetings. It monitors mail and answers predefined categories. It accesses tools: GitHub, Figma, Slack, CRM, databases. It chains tasks: write, revise, iterate, publish. It adjusts to your feedback as it works.
MCP (Model Context Protocol), launched by Anthropic in November 2024, is an open protocol that lets any AI access tools and data sources without custom integrations. It was rapidly adopted by OpenAI, Google and others, turning MCP into a de facto standard in less than a year.
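Under the hood, MCP messages are JSON-RPC 2.0. A client discovers a server's tools with `tools/list` and invokes one with `tools/call`, passing the tool's name and arguments. The sketch below only builds the messages; the tool name and its arguments are hypothetical, and a real client would also perform MCP's initialization handshake and send messages over a transport such as stdio or HTTP.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched to them

def make_request(method, params):
    """Build a JSON-RPC 2.0 request, the message format MCP is built on."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}

# Ask the server which tools it offers.
list_tools = make_request("tools/list", {})

# Hypothetical tool name and arguments; a real server advertises its own via tools/list.
call = make_request("tools/call", {
    "name": "get_calendar_events",
    "arguments": {"date": "2025-03-14"},
})

print(json.dumps(list_tools))
print(json.dumps(call))
```

Because every server speaks this same envelope, the client code stays identical whether the tool behind it is a calendar, a repository or a CRM.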
The deeper implication: AI is no longer a tool you reach for occasionally. It's an assistant that operates inside your stack.
The historical pattern and what it says about the future
The pattern is clear and has repeated several times.
First, a technological breakthrough. Then, initial hype. Then, excessive promises. Then, reality catches up to expectations more slowly than anyone predicted. Then, a new normal and the next breakthrough.
We're in phase four or five. ChatGPT promised to change everything. Four years later: it changed a lot, but not "everything." Promises of imminent AGI were exaggerated. The underlying technology, however, is real, works and scales.
The next likely breakthrough is agents with persistent memory and sustained action. Systems that remember conversations across months, learn from accumulated feedback, and execute multi-step tasks without intervention. That seems to be two to four years away.
The blog's thesis
Looking at the full history changes how you read the present.
People who have been following AI for decades don't get rattled by every new model. They know the leaps are real but the timelines are always longer than the hype suggests. They know emergent capabilities exist, and they also know that hallucinations persist because they're rooted in how these models generate text.
Anyone who entered the field in 2023 experiences every release as a civilizational shift. That leads them to overinvest, overpromise, and then overcorrect when something doesn't work.
The people who operate best in this field aren't the most optimistic or the most skeptical. They're the ones who learned to read cycles. They know there are moments of real acceleration (we're in one) and moments of correction (they always come). And they place their bets through that lens.
AI isn't the future. It's the present. The operational question is: where in the cycle are you, and what decisions does that demand of you this week?