A History of Artificial Intelligence — From Turing to the Phone in Your Pocket

TL;DR

Artificial intelligence started as a lab idea in 1956, survived two winters when it was nearly abandoned, and blew up in 2022 with ChatGPT. Here's the full story, written for people who use AI without knowing where it came from.

The question that unsettled a generation

Alan Turing died in 1954 before any of this existed. He was 41. He'd helped win a war by cracking the Enigma machine and written a paper that still, 76 years later, defines how we think about thinking machines. What he published in 1950 in the journal Mind wasn't philosophy for philosophy's sake. For Turing, asking whether a machine could think was the logical consequence of the computing he'd just helped invent.

Two years before he died, the British state convicted him for being gay. They gave him a choice between prison and chemical castration. He picked the second. He lost his security clearance, and his government work stopped there.

I mention it because the history of AI gets told as a clean sequence of discoveries. It isn't. It's a story of personal obsessions, brutal leaps, and lost decades. Let's walk through it.

1956: ten people at Dartmouth

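Summer of 1956. John McCarthy, a young mathematician with more ambition than budget, convinces nine colleagues to spend two months locked up at Dartmouth College in New Hampshire. The funding proposal, which he co-wrote with Marvin Minsky, Nathaniel Rochester and Claude Shannon, rests on the conjecture that every aspect of learning or any other feature of intelligence can in principle be described so precisely that a machine can be made to simulate it.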

They don't solve it. But they walk out with the term "artificial intelligence" — McCarthy preferred it over "thinking machines" because he wanted to duck the philosophical debates — and with a new field of research. The original budget was $7,500 according to the Dartmouth proposal archive. That's about $80,000 today. A whole discipline was born for that price.

Euphoria, promises, and the first blow (1956–1974)

The early years ran on uncontrolled optimism. Herbert Simon predicted that in 10 years a computer would be the world chess champion. It took 40 — Deep Blue didn't beat Kasparov until 1997. Marvin Minsky wrote in 1967 that within a generation artificial intelligence would be "substantially solved." Sixty years on, it still isn't.

The programs of the era were impressive for the hardware they ran on but didn't scale. ELIZA appeared in 1966: an MIT program that pretended to be a therapist using pattern-matching tricks. It worked well enough that some users opened up emotionally to the machine. Joseph Weizenbaum, its creator, was so uncomfortable with that result that he spent much of the rest of his career warning about the dangers of AI.

When governments realized the promises weren't landing, they pulled the plug. The 1973 Lighthill Report in the UK was brutal: it said AI was useless in practice. DARPA in the US cut back too. The first winter started. It lasted until about 1980.

Expert systems and the second collapse (1980–1993)

AI came back in the 1980s with a different approach: expert systems. Programs that encoded a specialist's knowledge into explicit rules of the "if A and B, then C" kind. MYCIN diagnosed infections. XCON configured computer orders and saved Digital Equipment Corporation around $40 million a year, per DEC's own historical documentation.

Japan bet big. Its 1982 Fifth Generation Computer program mobilized the equivalent of $850 million in today's dollars. The goal was to build computers that reasoned logically. The US responded. So did Europe.

But expert systems were brittle. They worked in their narrow domain and broke the moment they stepped outside it. Every rule had to be written by hand. They cost millions to maintain. When it was clear they didn't scale, funding dried up again. Second winter, 1987 to 1993.

The quiet revolution (1993–2017)

While the world believed AI was dead, three researchers kept working on neural networks. Geoffrey Hinton in Toronto. Yann LeCun at NYU and Bell Labs. Yoshua Bengio in Montreal. Almost nobody paid attention. In later interviews, Hinton has said his papers were systematically rejected through the 1990s. Their bet was simple: with enough layers of artificial neurons (deep learning), enough data, and enough compute, the networks would work.

The internet gave them the data. GPUs — chips designed for video games — gave them the power. In 2012, Hinton's AlexNet won the ImageNet contest by a huge margin, dropping image-recognition error from 26% to 15% according to the competition's official records. The industry woke up.

Google bought DeepMind in 2014 for roughly $500 million, based on reporting from the time. In 2016, DeepMind's AlphaGo beat Lee Sedol, one of the world's top Go players, at a game with more possible positions than there are atoms in the observable universe. The result was published in Nature. A year later, in 2017, a Google team published "Attention Is All You Need" at NeurIPS, the paper that introduced the Transformer architecture. It's the backbone of everything you use today: GPT, Claude, Gemini, Llama.

The explosion (2022–today)

November 2022. OpenAI ships ChatGPT. A hundred million users in two months. Anthropic ships Claude. Google answers with Gemini. Meta releases Llama. Microsoft puts AI in all of Office and invests $13 billion in OpenAI, per the company's financial reporting.

In three years, AI went from conference topic to everyday infrastructure. The Stanford AI Index Report 2024 logs private AI investment in the United States alone at $67 billion in 2023. What Turing imagined in 1950 as a thought experiment is getting closer every day.

So what now?

Seventy years after the Dartmouth summer, the original question — can a machine think? — still has no clean answer. The models you use today do things neither Turing nor McCarthy could have imagined. They also fail in ways no human would. They make up books. They cite papers that don't exist. They miss obvious dates.

Where is this heading? That's the question worth sitting with. Here's one to take with you: if the last seventy years teach us that AI moves in cycles, are we at the top of another cycle, or the start of a genuinely new curve? If you want to dig into why Anthropic is betting it's the second, read #0027 on the frontier model race.

Ten people, two months, and one absurd bet

The bet ten scientists made in 1956 was simple and, at the time, ridiculous: a machine can think. They locked themselves in a New Hampshire college for two months and tried to crack it. They didn't. But that meeting, the Dartmouth Conference, gave the field its name, artificial intelligence, and kicked off seventy years of work that ends, roughly, at the phone in your hand.

The idea was already in the air. In 1950, Alan Turing had put the question at the center of a paper: can a machine think? He didn't answer it head-on. He proposed a test instead: if you talk to a machine and can't tell it from a human, something interesting is happening. That test is still running today.

Two winters and a broken promise

The early years were pure euphoria. Researchers promised intelligent machines within 20 years. They didn't deliver. 1960s computers couldn't handle it. When governments noticed the gap between promises and results, they cut funding. That was the first "AI winter" — a stretch where almost no one wanted to pay for the field.

There were two winters. The second came in the 1980s. But each time AI stalled, someone figured out a new way to push it a little further.

What finally broke it open

The real leap came in the 2010s. Three things lined up at once: mountains of data on the internet, computers that were brutally more powerful, and a technique called deep learning that let machines teach themselves from that data. In 2012, a neural network crushed an image-recognition contest. The industry paid attention.

The moment your aunt found out

November 2022. OpenAI shipped ChatGPT. According to the Stanford AI Index Report 2024, it hit a hundred million users in two months, the fastest adoption of any consumer product up to that point. Instagram took two and a half years to get there. AI stopped being a lab topic and became the thing your aunt uses to write emails.

Today, when you open Claude on your phone, you're using the end result of seventy years of work, failure, and strange bets.

Takeaways:

Three things worth pinning down. First, today's AI didn't come out of nowhere — it's the child of seventy years of back and forth. Second, progress isn't a straight line; it's a cycle of euphoria and abandonment, and knowing that helps you read today's hype more clearly. Third, the big leaps happened when someone kept pushing ideas the mainstream had written off — Hinton and neural networks are the cleanest example.

What the official story leaves out

There's something odd about how the history of AI gets told. The Wikipedia version runs it as a clean sequence of milestones: Turing, Dartmouth, the perceptron, expert systems, deep learning, ChatGPT. End of story. But that narrative hides the interesting part — that progress was chaotic, that the important ideas came from people the mainstream had written off, and that the big leaps happened when circumstances nobody had orchestrated finally lined up.

If you're using AI every day for work, that real history matters. Not out of nostalgia. Because it lets you read more carefully what you're being told is coming next. Here's the full version.

Foundations before the name existed (1936–1956)

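The theoretical foundations were in place before Dartmouth. In 1936, Alonzo Church and Alan Turing independently pinned down what it means to compute: Church with the lambda calculus, Turing with the abstract machines he described in the Proceedings of the London Mathematical Society. The two definitions turned out to be equivalent, and the claim that they capture everything that can be computed step by step is the Church-Turing thesis. It's a thesis rather than a proven theorem, but it's still the theoretical ceiling for what a computer can do.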

Warren McCulloch and Walter Pitts published, in 1943, the first mathematical model of an artificial neuron in the Bulletin of Mathematical Biophysics. They showed that networks of simple neurons could, in theory, compute any logical function. It was pure math. Decades away from hardware that could actually run it.
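
For readers who want to see the idea in code, here's a minimal Python sketch of a McCulloch-Pitts-style unit. It's a textbook simplification rather than the 1943 paper's exact formalism: the function names, the weights and the thresholds are all choices made for this illustration.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def AND(a, b):
    # Both inputs must be on to reach a threshold of 2.
    return mp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    # One active input is enough to reach a threshold of 1.
    return mp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    # Inhibitory input (assumed weight -1): the unit fires only when a is 0.
    return mp_neuron([a], [-1], threshold=0)


def XOR(a, b):
    # Built from a small network of units: (a OR b) AND NOT (a AND b).
    return AND(OR(a, b), NOT(AND(a, b)))
```

Once AND, OR and NOT exist, any logical function can be wired up from them, which is the sense in which networks of these units can compute "any logical function."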

Turing pushed the question further in 1950 with "Computing Machinery and Intelligence," published in Mind. He didn't just propose his famous test. He argued systematically against every objection to a thinking machine — the theological one, Gödel's mathematical one, the consciousness one, the "it's just following orders" one. What Turing couldn't foresee was how much compute would be needed to cross from "possible in principle" to "works in practice." That gap defined the next 60 years.

The golden age and its mirages (1956–1974)

Dartmouth produced a field, not a product. But the researchers talked as if the product were around the corner. Simon. Minsky. Newell. McCarthy. All with serious credentials. All predicting revolutions that would take decades or not play out the way they described.

The early programs impressed for the hardware of the time. Newell and Simon's Logic Theorist proved mathematical theorems in 1956. Weizenbaum's ELIZA showed in 1966 that a program could simulate conversation with pattern-matching rules — a trick that revealed more about human psychology than about artificial intelligence. Frank Rosenblatt's Perceptron learned to classify simple patterns.
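
ELIZA's pattern-matching trick is easy to sketch. The rules below are invented for this illustration and are far cruder than Weizenbaum's actual script, but they show the mechanism: find a pattern, then echo part of the input back inside a canned template.

```python
import re

# Hypothetical rules for illustration only. A fuller script would also swap
# pronouns ("my" -> "your") before echoing the user's words back.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(utterance):
    """Return the reply for the first rule whose pattern matches the input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    # No rule matched: fall back to a neutral prompt that keeps the user talking.
    return FALLBACK
```

Nothing in this code understands anything. The sense of being listened to comes entirely from the person on the other side of the screen.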

The brakes came from two directions. Hardware: 1960s computers had the power of a modern pocket calculator. Theory: in 1969, Minsky and Papert published Perceptrons, mathematically demonstrating the limits of single-layer neural networks. The book was correct in its narrow analysis but devastating in its impact — it stalled neural network research for over a decade. Hinton, Bengio and LeCun would come back to that line when almost nobody else would touch it.
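
The kind of limit Minsky and Papert pointed to shows up on the smallest possible example. The sketch below is a brute-force illustration, not their proof: it searches a small grid of integer weights (the grid and the function names are assumptions of this example) for a single threshold unit that reproduces AND, then does the same for XOR.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def unit(w1, w2, bias, x1, x2):
    """Single-layer threshold unit: output 1 when the weighted sum is non-negative."""
    return 1 if w1 * x1 + w2 * x2 + bias >= 0 else 0


def find_unit(target, grid=range(-3, 4)):
    """Search a small integer grid for weights that match `target` on all four inputs."""
    for w1, w2, bias in product(grid, repeat=3):
        if all(unit(w1, w2, bias, x1, x2) == target[(x1, x2)] for x1, x2 in INPUTS):
            return (w1, w2, bias)
    return None  # no single unit on this grid reproduces the target
```

The search finds weights for AND and comes back empty for XOR, and that isn't an accident of the grid. Getting 0 on (0, 0) requires a negative bias. Getting 1 on (0, 1) and (1, 0) requires w2 + bias and w1 + bias to be non-negative. Add those two and w1 + w2 + bias has to be positive, which forces a 1 on (1, 1), where XOR needs a 0. A second layer fixes it.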

The 1973 Lighthill Report for the British Science Research Council recommended cutting public funding. DARPA did the same in the US. First winter, 1974 to 1980.

Expert systems: rise and collapse (1980–1993)

AI came back in the 1980s with a radically different approach. Instead of trying to replicate general intelligence, the field aimed to capture specific expertise. Massive rule trees of the form "if the patient has fever AND sore throat AND white spots, THEN consider tonsillitis."
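
A rule like that translates almost directly into code. Here's a minimal sketch of a rule engine in Python, assuming simple forward chaining over sets of facts. The second rule and every name here are hypothetical, and this is not how MYCIN worked internally: MYCIN reasoned backward from hypotheses and attached certainty factors to its conclusions.

```python
# Each rule: (conditions that must all hold, conclusion to add).
# The first rule is the one quoted above; the second is a hypothetical follow-on.
RULES = [
    ({"fever", "sore throat", "white spots"}, "consider tonsillitis"),
    ({"consider tonsillitis", "recurrent episodes"}, "refer to specialist"),
]


def forward_chain(facts, rules=RULES):
    """Keep firing rules whose conditions are all known until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when its conditions are a subset of the known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known
```

Feed it fever, sore throat and white spots and it concludes "consider tonsillitis." Feed it fever and a cough and it concludes nothing at all, because nobody wrote a rule for that case.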

MYCIN, developed at Stanford, diagnosed infectious diseases with accuracy comparable to human specialists. XCON, used at Digital Equipment Corporation from 1980, configured computer orders and saved the company roughly $40 million a year according to DEC internal documentation cited in later research. The expert systems market was moving billions.

Japan bet harder than anyone. Its Fifth Generation Computer program, launched in 1982, mobilized the equivalent of $850 million today with the explicit goal of building computers that reasoned logically and processed natural language. The US responded with the Microelectronics and Computer Technology Corporation (MCC) and the DARPA Strategic Computing Initiative. Europe launched ESPRIT.

But expert systems had structural weaknesses. Each rule had to be programmed by hand. They didn't scale — doubling the complexity wasn't a linear cost but an exponential one. They broke the moment the problem stepped a millimeter outside its designed domain. And they didn't learn. When it was clear Fifth Generation wasn't going to produce the promised revolution, funding dried up again. Second winter, 1987 to 1993.

There's a lesson here worth taking. Expert systems didn't fail for lack of data or compute — they failed because the approach was wrong. Trying to capture intelligence in explicit rules programmed by humans is a trap. Real intelligence emerges from learning patterns of experience, not from encoding rules. That insight unlocked the next era.

The wilderness and the rise of deep learning (1993–2017)

Alongside the expert-systems collapse, Judea Pearl published in 1988 "Probabilistic Reasoning in Intelligent Systems," introducing notation and algorithms for probabilistic reasoning that would eventually replace rigid logic. Pearl won the Turing Award in 2011 for this work.

But the real revolution came from three researchers working on neural networks when almost nobody took them seriously. Geoffrey Hinton in Toronto. Yann LeCun at NYU and Bell Labs. Yoshua Bengio in Montreal. Their argument: the single-layer networks that Minsky and Papert had analyzed failed because they were shallow. With multiple layers (deep learning) and enough data, networks could learn complex representations. Rumelhart, Hinton and Williams had already shown in 1986 in Nature that the backpropagation algorithm could train multi-layer networks. The two missing ingredients, data and compute, finally showed up about 20 years later.

Data: the internet provided it. ImageNet, a labeled dataset that Fei-Fei Li's team introduced in 2009 and that grew to about 14 million images, gave the field a benchmark large enough to separate good approaches from bad ones.

Compute: GPUs, chips designed to render video games but perfectly suited for the matrix multiplications of neural networks, delivered the power. Alex Krizhevsky, Hinton's student in Toronto, trained AlexNet on two Nvidia GPUs.

The inflection point was ImageNet 2012. AlexNet posted a top-5 error of about 15%, against roughly 26% for the runner-up, according to the competition's official records. That was a massive leap in a field where improvements usually came a point or two at a time. From there, deep learning stopped being an academic niche.

Google bought DeepMind in 2014 for roughly $500 million. In 2016, DeepMind's AlphaGo beat Lee Sedol at Go — a milestone published in Nature that showed neural networks could dominate problems of extreme combinatorial complexity.

In 2017, "Attention Is All You Need" by Vaswani and colleagues at Google, presented at NeurIPS, introduced the Transformer architecture. The innovation: replacing recurrent networks with attention mechanisms that process entire sequences in parallel. That architecture is the backbone of GPT, of Claude, of Gemini, of Llama. Of everything you use today.
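
To make "attention" concrete, here's a minimal sketch of scaled dot-product attention, the core operation of the paper, in plain Python. It's an illustration under heavy simplification: no learned projections, no multiple heads, no masking, no positional encoding, and the function names are made up for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, on lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

Each query's output depends only on that query and the full set of keys and values, never on the result for the previous position. That independence is what lets real implementations compute every position at once as matrix multiplications. The loop here is only for readability.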

The generative explosion (2018–2026)

GPT-1 in 2018. GPT-2 in 2019; OpenAI hesitated to release it out of concern for its disinformation potential. GPT-3 in 2020 with 175 billion parameters, published at NeurIPS as "Language Models are Few-Shot Learners" by Brown and colleagues. Each version was roughly ten to a hundred times larger than the one before, and quality improved in ways that surprised even the community.

November 2022: ChatGPT. A hundred million users in two months, the fastest adoption of any consumer product up to that point according to the Stanford AI Index Report 2024. Anthropic, founded in 2021 by Dario and Daniela Amodei with a focus on safety and alignment, launched Claude in 2023. Google answered with Bard, then Gemini. Meta released Llama as an open model. Microsoft invested $13 billion in OpenAI.

The Stanford AI Index 2024 reported private AI investment in the United States alone reaching $67 billion in 2023. Fortune 500 companies mentioned "AI" in their earnings calls more than any other technology term for the first time in history.

The blog's thesis

Here's what this blog believes after reading this history. AI isn't a technology — it's an accumulation. Each winter left sediment that the next wave used as beach. Expert systems failed but left the notion of how to encode knowledge. Backpropagation survived two decades of dismissal before becoming infrastructure. Transformers emerged from machine-translation research nobody had marked as revolutionary.

That has two practical consequences for you, the person using these tools today. First: today's models aren't the end of the story. They're the sediment the next wave will use. Everything you learn now — how to prompt, how to structure context, how to think in collaboration with a model — will keep paying off even when the technology changes. Second: distrust the hype, but also distrust the cynicism. The winter always looks permanent when you're inside it. Hinton worked for 30 years on neural networks while everyone told him he was wasting his time. If you have a conviction about how to use AI and the market is telling you otherwise, check it seriously — but don't abandon it because of fashionable opinion.

Turing's question is still open. Can a machine think? We don't know. What we do know is that seventy years of work, two winters, and one summer in 1956 have put us somewhere the question no longer sounds absurd. That, on its own, is a history worth understanding.