Google and Gemini — the giant that had to wake up

TL;DR

Google invented the Transformer architecture in 2017, acquired DeepMind in 2014, won a Nobel with AlphaFold, and still showed up late to the chat party. Its rushed reply to ChatGPT was Bard: a February 2023 promo demo with a factual error wiped a hundred billion dollars off Alphabet's market cap in one day, and the product itself launched that March. In April 2023 Google merged Google Brain with DeepMind under Demis Hassabis. In December 2023 it introduced the Gemini model family, and in February 2024 it renamed Bard to Gemini as well. In 2024-2025 it took the lead on long context (1 million, then 2 million tokens) and on native integration with Workspace. In 2026 Gemini 2.5 Pro is a real competitor on capability. Google's structural advantage isn't the model: it's that Gmail, Docs, Drive, Maps, and YouTube are already open in the next tab. For now, that distribution is unbeatable.

✦ Summarized with Claude at publish time

On Monday, February 6, 2023, Google posted a thirty-second promo video for its new chatbot, Bard. In the video, someone asked Bard what new discoveries from the James Webb Space Telescope they could share with a nine-year-old. Bard answered with three bullets. The third one said James Webb had taken "the first images of a planet outside our solar system."

That's false. The first exoplanet was photographed in 2004 by the ESO's Very Large Telescope in Chile, almost twenty years before James Webb. An astronomer named Grant Tremblay flagged it on Twitter soon after.

Two days after the video went up, on Wednesday, February 8, 2023, Alphabet stock fell 7.7 percent. A hundred billion dollars of market cap wiped out in one session. A factual slip in an ad demo. A hundred billion.

That scene is useful for understanding everything that came next. Google didn't react late to ChatGPT because it didn't get the technology — it had invented it. It reacted late because the pressure to react pushed it into shipping something that wasn't ready.

The irony of having invented the Transformer

In June 2017, eight researchers working at Google published "Attention Is All You Need" (Vaswani et al., 2017). The paper proposed a neural network architecture built entirely on the attention mechanism: the Transformer. It wasn't incremental. It was a full replacement for prior approaches (RNNs, LSTMs) on sequence tasks.

That architecture is the foundation running underneath GPT-4o, Claude Opus, Gemini 2.5 Pro, and every large language model on the market today. Google published it openly, with reference code. It was a gift to the field.

What wasn't a gift was turning it into product. Google had BERT (2018) and LaMDA (2021) working internally. Sam Altman and his colleagues at OpenAI, meanwhile, packaged the same underlying ideas inside a chat box and opened it for free in November 2022. That difference — lab vs. product — explains almost everything.

The April 2023 reorg

Two months after the Bard mess, Sundar Pichai made a call Google had been avoiding since 2014. He merged Google Brain and DeepMind into a single entity, Google DeepMind, under single leadership: Demis Hassabis.

Until then, Google had run two AI research teams in parallel, with different cultures, different leadership, and an internal rivalry that had been documented in the press. Brain was the Mountain View lineage, pragmatic and close to product. DeepMind was the London lineage, more academic and more ambitious on long horizons. The merger settled it: Hassabis runs the show, both teams execute.

In December 2023 Google introduced Gemini 1.0 in three sizes (Ultra, Pro, Nano), and in February 2024 it renamed the Bard product to Gemini as well. The unveiling came with another minor scandal: a promo video edited to look more fluid than the model actually was, which Google had to clarify publicly. But this time the product underneath worked. The technical conversation stopped being "Google is losing" and started being "Google is back in the fight."

The product line as of April 2026

Gemini today is a family differentiated by use case.

Gemini 2.5 Pro is the flagship. Deep reasoning, native multimodality (text, image, audio, video), and the sharpest technical differentiator: a 2-million-token context window. For perspective: at the common rule of thumb of about 0.75 English words per token, that's roughly 1.5 million words, or several thousand pages, enough to hold "War and Peace" and "Anna Karenina" together in a single prompt with room to spare. No direct commercial competitor offers that range as of April 2026.
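How many pages a token budget covers depends entirely on the conversion ratios you assume, so it helps to make them explicit. The sketch below is a back-of-the-envelope estimate, not a measurement with Gemini's tokenizer: the 0.75 words-per-token ratio, the 500 words-per-page figure, and the two novel word counts are all approximations chosen for illustration.

```python
# Rule-of-thumb ratios. Both are assumptions, not properties of any
# specific tokenizer; real ratios vary by language and text type.
WORDS_PER_TOKEN = 0.75   # common approximation for English prose
WORDS_PER_PAGE = 500     # dense page; a paperback page is closer to 300


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a token budget."""
    return round(tokens * words_per_token)


def words_to_pages(words: int, words_per_page: int = WORDS_PER_PAGE) -> int:
    """Estimate the page count for a given word count."""
    return round(words / words_per_page)


def fits_in_context(word_counts: list[int], context_tokens: int) -> bool:
    """Check whether documents with these word counts fit in one prompt."""
    needed_tokens = sum(word_counts) / WORDS_PER_TOKEN
    return needed_tokens <= context_tokens


if __name__ == "__main__":
    context = 2_000_000
    words = tokens_to_words(context)
    print(f"{context:,} tokens ~ {words:,} words ~ {words_to_pages(words):,} pages")

    # Approximate English word counts for two long novels (illustrative).
    novels = [590_000, 350_000]
    print("Both novels fit in 2M tokens:", fits_in_context(novels, 2_000_000))
    print("Both novels fit in 1M tokens:", fits_in_context(novels, 1_000_000))
```

Changing `WORDS_PER_PAGE` to 300 shifts the page estimate considerably, which is why page counts quoted for context windows differ so much from one source to another.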

Gemini 2.0 Flash is the high-speed tier. Low latency, low cost, native multimodality. The default for applications where sub-second response matters most.

Gemini Nano runs locally on Android devices: the model lives inside the phone, no internet connection required. It's Google's mobile play, and on Android no competitor has integration at that OS level.

NotebookLM is a separate application built on Gemini. You load in documents (PDFs, notes, articles) and get synthesis, Q&A, and — its most talked-about feature — an auto-generated podcast with two voices discussing the content. It's probably the Gemini product that converted the most professional users to Google over the past year.

Workspace integration — Gemini inside Gmail, Docs, Sheets, Drive, Calendar, Meet, and Slides. This is the strategic axis that matters most, and it deserves its own section below.

The real bet: Workspace as the vehicle

Google has one advantage neither OpenAI nor Anthropic can match in the short run: a pre-installed audience. Gmail with more than 1.8 billion active accounts. Android on 3 billion phones. Enterprise Workspace with hundreds of millions of paying users. YouTube with more than 2.5 billion monthly users.

That isn't a detail. It's an integration surface that would take a decade to replicate from zero.

Google's strategic move between 2024 and 2026 was making Gemini feel like part of the work operating system. "Help me write" in Gmail. "Summarize this doc" in Docs. Automatic analysis in Sheets. Action item extraction in Meet. YouTube video summaries with one click. Semantic search in Drive that finds the document even when you don't remember the exact words.

Each of those individual features has competitors in the market. What doesn't have a competitor is the combination: all of them together, frictionless, inside a suite you already use.

It's the enterprise version of the same pattern Apple uses with the iPhone: not always the best in each isolated category, but better than everyone else on integration.

What Gemini does well and what it doesn't

Being honest about the trade-offs.

Where Gemini has a real edge. Long context — nobody else has 2 million tokens. Workspace integration — glued to your work tools, no exit needed. Native multimodality — processing text + image + video + audio in a single request. Cost in the Flash tier — for volume tasks it's one of the cheapest options out there. And generous free access: Gemini's free version stays more useful than the competitors' free tiers for casual use.

Where others are ahead. For professional work on sensitive data — contracts, financial analysis, production code — Claude is still more consistent, more literal with instructions, and more predictable in its refusals. That's not fan opinion: it's the repeated observation among professionals running them side by side. For heavily artistic image generation, Midjourney and DALL-E are still on top. For real-time voice conversation, ChatGPT's advanced voice mode is still the reference.

And there's the scar from February 2024: the image-generation episode where ethnic diversity got applied without contextual judgment — 1943 German soldiers depicted as people of multiple ethnicities. Google paused image generation of people for weeks. The problem wasn't the intent (keeping the model from reproducing harmful biases); it was execution at scale. An episode that showed alignment of these models isn't a solved problem for anyone, not even for the lab that won the Nobel.

To close, and to keep going

Google isn't going to win the AI market because its model is best on every benchmark. It's going to win meaningful share because AI will be embedded in the software millions of people already open in the morning without thinking about it. That's the bet. Technically reasonable, commercially hard to match, and carrying the open question of whether the power concentration it implies is healthy — but that's another conversation.

What matters for you, professional reader, is the working rule that emerges from all of this: don't pick one tool. Use Gemini where your work already lives (Workspace, Android, long context). Use Claude where stakes are high and error margins low (professional analysis, production code, legal documents). Use ChatGPT where image and voice matter. It's multi-vendor, and that's healthy.

If you want to understand how we technically measure which AI is better at what, "How AIs are measured" is the next link. If you want the full competitive map without fandom, read "The AI race."

Where in your current workflow could Gemini slot in without you having to switch tools?

You open Gmail this morning. Above the first email, a quiet little button that says "Help me write." You tap it without thinking. A reasonable block of text appears, close to what you were going to send anyway. You tweak it. You send it.

You just used AI without opening a new tab. That's Gemini, and that's Google's bet.

The story of how we got here is more interesting than it looks.

The giant that showed up late to its own party

In mid-2017, a group of researchers at Google Brain published a paper titled "Attention Is All You Need." That paper describes the Transformer architecture — the foundation underneath every large model you know today, ChatGPT and Claude and Gemini included. Google invented the technology. Then it shared it with the world.

Back in 2014, Google had already bought DeepMind — an AI research lab based in London. DeepMind won pretty much everything there was to win on the board: beat the world Go champion in 2016, cracked the protein folding problem in 2020, brought Google (and Demis Hassabis personally) a Nobel Prize in Chemistry in 2024.

And yet, when ChatGPT arrived in November 2022, Google needed almost four months to put a competing chatbot in users' hands.

Bard, the debut that cost 100 billion

In February 2023, Google showed off a chatbot called Bard in a promotional video demo. Bard answered a question about the James Webb Space Telescope with a factual error. An astronomer flagged it on Twitter. Within days, Alphabet, Google's parent company, lost roughly a hundred billion dollars in market cap in a single session. A factual slip from an AI in an ad demo. A hundred billion.

Bard officially launched in March. The reception was lukewarm. People tried it once and went back to ChatGPT.

Gemini, the full reset

In April 2023, Google made a hard reorganizing call. It merged Google Brain (the AI research arm on Google's side) with DeepMind (the London lab) into a single entity: Google DeepMind, under Demis Hassabis. The goal was concentrating the talent.

In December 2023, Google introduced Gemini, and by February 2024 the Bard name was gone. This time it actually worked.

Today, in April 2026, Gemini is a family of models — Flash (fast and cheap), Pro (balanced), and specialized versions for phones and for deep reasoning. Gemini 2.5 Pro handles context windows of 2 million tokens: it can read a full book and answer you on any chapter. None of the competitors reach that number.

What Google wins by being second

There's something Google has that nobody else does: it's already inside your digital life.

Gmail has more than 1.8 billion active users. Android runs on 3 billion phones. Google Docs, Sheets, Drive, Calendar, Maps, YouTube — all with planetary-scale audiences. When Google embeds Gemini inside those, it doesn't need to talk you into opening a new app. You're already inside.

That's the strategic bet: not winning the benchmark of the month, but making AI feel like a natural part of the software you already use. Summarizing an email in Gmail. Generating a formula in Sheets. Pulling bullet points from a recorded Meet call. No tab switching.

Three things to take away

Google invented the technical foundation and still showed up late on product. The Transformer came out of Google in 2017. DeepMind had some of the best researchers in the world. And ChatGPT still got a cultural head start of several months. Being first in the science doesn't guarantee being first in the market.

Gemini isn't playing catch-up anymore. Gemini 2.5 Pro is a real competitor. It leads on long context — 2 million tokens is a place nobody else reaches. Technically, it doesn't need to ask anyone for permission now.

Google's structural advantage isn't the model: it's distribution. If your day runs through Gmail, Docs, Drive, and YouTube, Gemini is the lowest-friction option. For sensitive professional work, pair it with Claude. They don't replace each other. They combine.

In October 2024, the Royal Swedish Academy of Sciences announced the Nobel Prize in Chemistry was being shared among three researchers: David Baker, for computational protein design; and Demis Hassabis alongside John Jumper, for AlphaFold. Hassabis is CEO of Google DeepMind. It's the first time in history a scientific Nobel has been awarded to a figure simultaneously running the AI division of one of the five most valuable companies on the planet.

Eight months before that Nobel, in February 2024, Google had to pause image generation of people in Gemini because the model was producing images of 1943 German soldiers depicted as people of various ethnicities, an alignment failure that went viral and forced a public apology. Two months after the Nobel, in December 2024, Google shipped Gemini 2.0 Flash with native multimodality and sub-second latency, taking the lead in that tier.

That simultaneity — a Nobel-winning lab, a product with visible failures, a company shipping on tightening cycles — defines the actual state of Google DeepMind in 2026. It's not a pure research company, and it's not a pure product company. It's both at once, with the structural tension that produces.

Technical genealogy: from BERT to Gemini 2.5 Pro

Google's technical sequence in large-scale AI follows a logic worth tracing.

Transformer (June 2017). Vaswani et al. publish "Attention Is All You Need." The proposed architecture, stacks of self-attention and feed-forward blocks with positional encoding, displaces RNN/LSTM as the dominant mechanism for sequence tasks. The paper is open, with reference code. The field adopts it quickly.

BERT (October 2018). Google trains a bidirectional encoder-only Transformer on a massive corpus and releases pre-trained weights. It revolutionizes applied NLP: classification, NER, and QA post large gains on standard benchmarks. Through 2019-2020 almost any serious NLP work uses BERT as a base.

T5 (October 2019). Encoder-decoder model unifying all NLP tasks in text-to-text format. Up to 11 billion parameters. Elegant design but smaller commercial impact than BERT.

LaMDA (May 2021). Dialogue model with 137 billion parameters, trained specifically for conversation. Internally it was the base for conversational prototypes inside Google for two years. Never released publicly over reputational-risk concerns — a call that, in hindsight, handed OpenAI the ChatGPT moment.

PaLM (April 2022). 540 billion parameters. Demonstrated emergent capabilities in reasoning and multi-task performance. It was Google's frontier model at the exact moment ChatGPT eclipsed it culturally, seven months later.

Bard (March 2023). Based on an adapted LaMDA. Rushed launch. The James Webb factual error in the February 2023 demo cost Alphabet 100 billion in market cap in one day. Public reception: lukewarm.

Gemini 1.0 (December 2023). New model family and new brand, with the Bard product renamed to Gemini in February 2024. Three-model family (Ultra, Pro, Nano) trained from scratch under the merged Google DeepMind structure. Native multimodality. The promo video for the unveiling turned out to be edited, a minor but humiliating scandal.

Gemini 1.5 Pro (February 2024). First commercial model with a 1-million-token context window. Mixture of Experts (MoE) architecture. Technical differentiator Google still holds.

Gemini 2.0 Flash (December 2024). Native multimodality in the high-speed tier. Sub-second latency. Low cost.

Gemini 2.5 Pro (March 2025). Deep reasoning with internal "thinking" before answering (conceptually analogous to OpenAI's o1/o3). 2-million-token context — a doubling over 1.5 Pro. Extended multimodality (video, audio, image).

The correct read of this sequence isn't "Google showed up late and then caught up." It's: Google had leading technical capability from 2017 on, but operated with lab culture until the competitive pressure of late 2022 forced a transition to product culture. That transition is still underway.
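For readers who want to see what sits at the root of this genealogy, the self-attention operation from the 2017 paper, softmax(QKᵀ / √d_k) V, can be sketched in a few lines. This is a minimal single-head illustration in plain Python with toy vectors; it leaves out the learned projections, multiple heads, masking, and positional encoding that a real Transformer layer has, and the function names are chosen here for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors.

    Each query scores every key, the scores become weights via softmax,
    and the output is the weighted average of the value vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    # Two identical keys get equal weight, so the output is the mean
    # of the two value vectors.
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]
    v = [[2.0, 0.0], [4.0, 0.0]]
    print(attention(q, k, v))
```

The point of the design is that every position can look at every other position in one step, instead of passing information along a chain as RNNs and LSTMs do.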

Mixture of Experts: the architectural bet

One specific technical component deserves analysis: with Gemini 1.5 Pro, Google was among the first labs to publicly confirm Mixture of Experts (MoE) at frontier commercial scale.

MoE isn't new: the concept goes back to Jacobs and Jordan in the early 1990s. The idea: instead of a dense model where every parameter activates for every input, you have multiple specialized "experts" and a routing network that decides which ones activate based on the input. That lets you scale total parameters without scaling inference compute proportionally. A model can have 1 trillion total parameters but only activate 50 billion, about 5 percent, for a given token.
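The routing idea is easier to see in code. What follows is a toy sketch of top-k routing in plain Python, not a description of how Gemini implements it: the experts are stand-in scalar functions, the router scores are passed in by hand (in a real model a learned layer computes them per token), and every name is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Select the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(token, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](token) for i, w in weights.items())


def active_fraction(total_params, active_params):
    """Share of the model's parameters that do work for one token."""
    return active_params / total_params


if __name__ == "__main__":
    # Four toy "experts": each just scales its input by a different factor.
    experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
    scores = [0.1, 2.0, 0.3, 1.5]   # hand-written router scores for one token

    print(route_top_k(scores, k=2))            # experts 1 and 3 are selected
    print(moe_layer(1.0, experts, scores))     # experts 0 and 2 never run
    print(active_fraction(1e12, 50e9))         # the 1T / 50B example above
```

The compute saving comes from `moe_layer` never calling the experts that were not selected; the total parameter count grows with the number of experts, but the work per token grows only with `k`.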

MoE's practical advantages are three. First, throughput: inference cost per token drops substantially against equivalent-capability dense models. Second, viable long context: Gemini's 1M and 2M tokens are hard to achieve economically without MoE. Third, emergent specialization: different experts end up handling different domains better, which improves quality on multi-domain tasks without cross-degradation.

The real downsides are two. Training is more complex — the routing network can collapse to a few active experts, a known problem called "expert collapse." And distributed inference requires specialized infrastructure — running MoE efficiently on generic hardware isn't trivial.

Google has a structural advantage on both points. DeepMind has accumulated MoE experience from before Gemini. And Google owns its own compute infrastructure (TPUs, its own datacenters), which lets it optimize the full vertical stack. OpenAI is estimated to use MoE internally too but doesn't publish architecture, and its stack depends on Microsoft infrastructure.
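To make the expert-collapse problem concrete, here is a sketch of the kind of auxiliary load-balancing term used in the published Switch Transformer line of work: N · Σ f_i · P_i, where f_i is the fraction of tokens sent to expert i and P_i is the average router probability for that expert. It is shown only as an illustration of the technique; nothing in the public record says this is the exact term Gemini trains with, and the function name and inputs are assumptions.

```python
def load_balance_loss(assignments, router_probs, num_experts):
    """Switch-Transformer-style auxiliary loss: N * sum_i (f_i * P_i).

    assignments:  the expert index chosen for each token (top-1 routing).
    router_probs: for each token, the router's probabilities over experts.

    The value is 1.0 when tokens and probability mass are spread evenly,
    and approaches num_experts as routing collapses onto a single expert.
    """
    n_tokens = len(assignments)
    total = 0.0
    for expert in range(num_experts):
        # f_i: fraction of tokens dispatched to this expert.
        f = sum(1 for a in assignments if a == expert) / n_tokens
        # P_i: mean router probability assigned to this expert.
        p = sum(probs[expert] for probs in router_probs) / n_tokens
        total += f * p
    return num_experts * total


if __name__ == "__main__":
    # Balanced case: four tokens, each sent to a different expert.
    balanced = load_balance_loss(
        assignments=[0, 1, 2, 3],
        router_probs=[[0.25, 0.25, 0.25, 0.25]] * 4,
        num_experts=4,
    )
    # Collapsed case: the router sends every token to expert 0.
    collapsed = load_balance_loss(
        assignments=[0, 0, 0, 0],
        router_probs=[[1.0, 0.0, 0.0, 0.0]] * 4,
        num_experts=4,
    )
    print(balanced, collapsed)
```

Adding a small multiple of this term to the training loss penalizes the collapsed configuration, which nudges the router toward using all of its experts.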

The February 2024 scar: alignment at scale

In February 2024, Gemini Pro started generating images where ethnic diversity got applied so indiscriminately it produced obvious historical distortions. Nazi-era German soldiers depicted as people of color. US founding fathers, likewise. Popes with ethnic diversity applied to periods where it didn't fit. The failure went viral in under 48 hours, with screenshots circulating on X/Twitter.

Google responded by pausing image generation of people for several weeks and publishing a public explanation acknowledging the diversity adjustment had "overshot" and that the model had failed to recognize contexts where that diversity was historically inappropriate.

The technical reading of the episode is interesting. The problem wasn't the base model; it was the post-hoc adjustment layer applied to reduce biases the model was reproducing — a layer that, in execution, overcorrected. It's exactly the kind of fragility Anthropic describes in its work on Constitutional AI: applying stacked filters without architectural integration produces unpredictable behavior at the edges.

It isn't a Google-only problem. Every large commercial model has publicly documented alignment failures — ChatGPT with tax-income hallucinations, Claude with over-refusals on legitimate medical tasks, Grok with openly problematic outputs. The difference is the level of public visibility and the institutional response. Google was slow to react and to explain. It's a company with 180,000 employees and a cautious corporate communications culture — it doesn't move at startup speed, even when its AI division tries to.

Distribution as a structural moat

One analysis gets done too rarely and is worth running carefully: what is Google's actual competitive asset in AI?

It isn't DeepMind by itself. Anthropic has top researchers and wins on alignment. OpenAI has consumer product experience Google doesn't.

It isn't the model by itself either. Gemini 2.5 Pro is competitive but not universally dominant — it wins on long context and multimodality, loses to Claude on literal instruction-following and to ChatGPT on voice and on some reasoning benchmarks.

Google's actual competitive asset is the integration surface. Gmail with over 1.8 billion active users. Android on more than 3 billion devices. Google Workspace with over 3 billion users (free consumer and enterprise combined). YouTube, Maps, Chrome, Search. The aggregate of those surfaces is a planetary-scale distributed platform Google already owns and no competitor can replicate by acquisition or construction in the current decade.

When Google embeds Gemini into that surface, it isn't competing with ChatGPT in the open market. It's competing in a captive market where the friction of trying alternatives is higher than the friction of using the integrated default. It's the same dynamic that made Microsoft dominant in office software for three decades: not by being the best product in every category, but by being the good-enough option inside an already-installed bundle.

Ben Thompson's Stratechery piece that best captures this is "Aggregators and Jobs to be Done" (2015): the advantage isn't in the standalone product, it's in the intermediation position. Google already occupies the intermediation position between the user and their daily digital work. Gemini is the AI of that intermediation.

Editorial thesis

I'll close with a thesis that goes past reporting.

The AI market will crystallize between 2026 and 2030 into a three-layer structure with distinct competitive dynamics.

Frontier capability layer. Dominated by 3-4 actors with enough capital and talent to train frontier models: OpenAI, Google DeepMind, Anthropic, and probably one or two more (xAI, Meta, or new entrants). In this layer the differentiator is raw capability measured on specific benchmarks. Google DeepMind is well positioned thanks to Hassabis and its proprietary TPU infrastructure.

Mass integration layer. Dominated by whoever owns scale distribution: Google with Workspace + Android + Search, Microsoft with Office + Windows, Apple with iOS (once Apple Intelligence finishes rolling out), and to a lesser extent Meta through its social apps. In this layer the differentiator is usage friction, not capability. Google has a structurally dominant position.

Professional reliability layer. Dominated by whoever has built, with sustained discipline, a model you can delegate sensitive work to. Anthropic is building that layer with Claude. Google doesn't prioritize that layer; OpenAI oscillates.

The thesis: Google will take meaningful share in the frontier capability layer and dominate the mass integration layer. But it won't dominate the professional reliability layer, because its business model (advertising and mass scale) pulls toward optimizing engagement before professional reliability. That tension is structural and doesn't get resolved through engineering. It gets resolved through product decisions Google, given its size and regulatory dependencies, is reluctant to make.

That's why my operational recommendation for professionals in 2026 isn't "pick one" but "combine three": Gemini where your daily work already lives inside Workspace, Claude where stakes are high and your personal reputation is on the line, ChatGPT where image and voice matter. Google will be infrastructure; it won't be the ethical default.

What's your personal rule for deciding which professional tasks you delegate to Gemini and which you prefer to keep on an AI living outside the ecosystem where your data is stored?
