Gemini deep dive — the AI that lives inside every Google product

TL;DR

Gemini is Google's AI and it stands apart because of a move no other company can copy: it's baked into Gmail, Docs, Sheets, Drive, Meet, Maps, Chrome, and Android by default. For 3 billion Gmail users (Google, 2024), Gemini showed up without being chosen. Technically it has two measurable edges: the largest context window on the market (up to 2 million tokens on Gemini 1.5 Pro, per Google DeepMind) and native multimodality. Where it doesn't compete head-to-head: agentic coding (Claude leads), tonal consistency over long text (Claude), high-end creative image and voice generation (ChatGPT with DALL-E and Voice mode). The operating thesis: Gemini is the AI most people use without knowing they're using it, and that's a distribution advantage nobody can replicate without owning Google's surface area.

✦ Summarized with Claude at publish time

In February 2024, Google DeepMind published a blog post announcing that Gemini 1.5 Pro supported a context window of up to 1 million tokens in preview; the 2-million-token version followed at Google I/O that May. The number slipped past the general public. For anyone who'd ever tried to process long documents with an AI, it was a regime change.

To scale the number: by Google's own figures, one million tokens is roughly 1,500 pages of text, about 11 hours of audio, or an hour of video, and two million doubles that. Claude had, and has, 200K tokens. ChatGPT, 128K.

What that changed, on the desk of any professional reading long documents, was the mental model. Before, you thought "I need to summarize this PDF before asking the AI something." Now you hand over the whole PDF and ask. That's the concrete edge Gemini has that has nothing to do with marketing.

The "Gemini era" thesis

In May 2024, at the Google I/O keynote, Sundar Pichai used a phrase that framed the company's positioning: "the Gemini era." The implicit thesis: Google wasn't going to compete as "another chat app" but as the AI layer running through all of its products.

That decision is strategic and worth taking apart. Microsoft has Copilot. OpenAI has ChatGPT. Anthropic has Claude. Each is, in essence, an app you have to go to. Google had a choice: make Gemini yet another app, or put it inside Gmail, Docs, Drive, Meet, Maps, Chrome, and Android?

It picked the second. The consequence is that, for 3 billion Gmail users per Google (2024), Gemini arrived without being chosen.

The product line, intermediate level

Gemini isn't a single model; it's a family of models plugged into several surfaces. Worth walking through.

Gmail and Docs. The two places where the most people bump into Gemini without looking for it. "Help me write" drafts emails, rewrites in a different tone, expands a bullet into a paragraph. The long-thread summaries in Gmail work. Inside Docs, long-text reading and rewriting is competitive with Claude.

Sheets. Here Gemini has a specific function that's genuinely useful: generating formulas from a natural-language description ("sum the rows where column B says active") and doing exploratory data analysis. For anyone who isn't a spreadsheet power user, it lowers the entry barrier.

Drive. Semantic search over your files. Ask "where's the contract with X from last year" and it finds it even if the filename doesn't contain "contract."

Maps. Conversational itineraries. "Give me a 3-day plan in Buenos Aires focused on steakhouses and museums" returns an itinerary with real places, travel times, and hours.

NotebookLM. A separate piece that deserves attention. You upload 10, 20, 50 documents — papers, PDFs, articles — and the model turns them into your reference corpus. It generates summaries, answers questions citing the source, and in 2024 added a "podcast" feature: two voices talk about your documents as if they were hosts. For applied research it's the most original tool in the whole Gemini line.

Gemini Live. Real-time multimodal voice/camera mode. Point your phone camera at something and talk. For cooking, studying, identifying plants, it works surprisingly well. It goes head-to-head with ChatGPT's Voice mode.

The models behind: Flash vs Pro

Worth distinguishing the variants because the trade-offs are explicit.

Gemini 1.5 Flash is the fast, cheap model. It runs in Gmail, Docs, and Sheets by default. Low latency, enough quality for most short tasks.

Gemini 1.5 Pro is the reasoning model. Up to 2 million tokens of context, better at complex tasks. It runs when you ask for things that need more thinking, or when you pick it explicitly.

Gemini 2.0 Flash (released late 2024) is the next generation of the fast model: better multimodality, better tool use, more capable agents.

Deep Research is a special mode where Gemini searches, reads, and cross-references dozens of web sources before writing a report. It competes with ChatGPT's feature of the same name. It takes several minutes, but the output is much denser than a normal chat.

Where Gemini wins and where it doesn't (honest)

Worth speaking without fandom. The "which AI is best" conversation makes no sense without saying what for.

Gemini wins on: context window (2M tokens, far above the rest), native multimodality (text, image, audio, video in the same request), Google Workspace integration (structural — no one else can match it without owning Google), speed and price on Flash (cheaper per million tokens than competitor equivalents).

Claude wins on: agentic coding (especially in Claude Code and Computer Use), following instructions literally, tonal consistency over long text, and — notably — on reliability per external measurements. The Vectara Hallucination Leaderboard places Claude with a lower hallucination rate; LMArena shows tight numbers between Claude and Gemini across categories.

ChatGPT wins on: image generation (DALL-E is still stronger than Imagen for creative use), advanced Voice mode (more polished than Gemini Live for free conversation), custom GPTs ecosystem, and cultural mindshare.

The honest test: what do you delegate to which?

If you ask me today what I use for what, the answer is mixed, and I think that's the useful conclusion.

For contract analysis, code going to production, and text where I need consistent tone, Claude. For meeting summaries, searching inside my Drive, and quick writing in Gmail, Gemini — because it was already inside and the friction of copy-pasting to Claude is greater than the quality gap. For visual exploration and casual voice conversation, ChatGPT.

It's not that one is "the best." It's that each won a different slice of the workday.

The question for you

Do you know how much of your day is already assisted by AI without you picking it? If your work runs through Gmail and Docs, probably more than you think. For the full competitive picture, read Google and Gemini — the distribution play. For the broader map without fandom, The AI race.

A friend messaged me the other day: "Since when does Gmail have AI?" A little button saying "Help me write" had shown up above her reply box and she didn't know what it was.

That button is Gemini. She didn't install it. She didn't subscribe. She didn't choose anything. Gemini turned up in her Gmail because Google switched it on.

That scene — a person finding out the AI was already inside their inbox — is the most important story you can tell about Gemini.

What Gemini is already doing in your account

You're probably using it without knowing. A quick tour of where it lives.

Gmail. The "Help me write" button drafts an email from scratch or rewrites your draft. At the top of each long thread, a "Summarize this email" button gives you the two-line version. Ask "summarize this week's emails from my boss" and it finds them and puts together the summary.

Docs. The same "Help me write" button sits inside every document. Drag a PDF in and ask "summarize it in three bullets" and it works. It will also rewrite a paragraph in a different tone if you ask.

Meet. Automatic meeting notes: turn on "take notes for me" and it records, transcribes, and summarizes the meeting for you. The summary with next steps lands in your inbox after the call.

gemini.google.com. The public chat, the ChatGPT-style version from Google. Still the most common entry point for people who haven't yet found out Gemini was living inside their other apps.

Android. If you have a recent Android phone, long-pressing the home button now opens Gemini instead of the old Google Assistant.

Free vs paid, no dancing around

The free version of Gemini (at gemini.google.com) is enough to try it, chat, and use some basic integrations. It does the job.

The real jump is Google One AI Premium: the plan that adds advanced Gemini inside Gmail, Docs, Sheets, Drive, and Meet. Around $20 a month. For companies there's Google Workspace with Gemini, priced per user.

The difference isn't "better model." It's "inside your apps vs outside."

The move nobody else can copy

Gmail has 3 billion users according to Google (2024). Android runs on most phones outside the US. Chrome is the dominant browser.

When Google decided Gemini wouldn't be a separate app but an AI layer inside all of its products, it handed the AI an audience no other company can reach without going through Google.

Claude has better technical reasoning. ChatGPT has more cultural mindshare. But neither Claude nor ChatGPT owns Gmail.

Where it's not the best pick

Worth stating plainly so I'm not selling you smoke.

For serious coding (agents modifying many files, large refactors, complex debugging) Claude is still better. For long-form writing that keeps the same tone from start to finish, Claude again. For creative exploration with images and voice, ChatGPT with DALL-E and Voice mode delivers more.

Gemini shines where the work is glued to your inbox, your shared docs, your calendar, and your Drive. There it has no rival.

What to take away

Three things worth holding onto:

You're already using it, even if you don't know. Open Gmail and Docs: the "Help me write" buttons and the automatic summaries are Gemini. Nothing to install.

The free version is enough to try. The paid one is worth it if you live in Google Workspace. If your daily work is in Gmail and Docs, $20 for Google One AI Premium pays for itself.

It's the best AI for some things, not for everything. For contracts, critical code, and long writing, look at Claude. For visual and voice creativity, ChatGPT. For everything already in your Google account, Gemini.

On May 14, 2024, on the main stage of Google I/O in Mountain View, Sundar Pichai opened the keynote with a phrase that would frame the next eighteen months of corporate strategy: "we are fully in the Gemini era." The sentence sounded like keynote rhetoric. It wasn't. It was an architectural declaration.

What Pichai was announcing, underneath the rhetoric, was that Google had made a structural decision: Gemini wouldn't be a standalone app competing against ChatGPT on the AI shelf. It would be a cross-cutting layer switched on inside Gmail, Docs, Sheets, Drive, Meet, Maps, Chrome, and Android. For the 3 billion Gmail users per Google (2024) — the largest installed base of professional software in the world — Gemini was going to show up without being chosen. That decision is what's worth analyzing in technical and competitive detail.

The technical bet: context window as differentiator

Gemini's first measurable edge over the competition is architectural: the context window. Google DeepMind announced in February 2024 that Gemini 1.5 Pro supported a context of up to 1 million tokens in preview, and extended it to 2 million at Google I/O that May. That is, to date, the largest window commercially deployed.

For comparison: Claude Opus 4.7 runs at 200K tokens. GPT-4 and its direct descendants, at 128K. An order-of-magnitude gap isn't cosmetic. It's the difference between handing the model an entire book (Gemini) and having to chunk it first (everyone else).
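As a rough illustration of that difference, here is a minimal Python sketch that checks whether a document fits a given window and falls back to chunking when it doesn't. The 4-characters-per-token ratio is a common rule of thumb and an assumption here, not a real tokenizer; the names `estimate_tokens` and `plan_request` are hypothetical, and the window sizes are the ones quoted in this article.

```python
# Rough heuristic (assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4

# Context windows as quoted in this article, in tokens.
WINDOWS = {"gemini": 2_000_000, "claude": 200_000, "gpt": 128_000}


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def plan_request(text: str, window_tokens: int) -> list[str]:
    """Return the text whole if it fits, otherwise split it into window-sized chunks."""
    if estimate_tokens(text) <= window_tokens:
        return [text]
    chunk_chars = window_tokens * CHARS_PER_TOKEN
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


# Stand-in for a book of roughly 750K tokens under the heuristic above.
book = "x" * 3_000_000
for name, window in WINDOWS.items():
    print(name, len(plan_request(book, window)))
# gemini 1
# claude 4
# gpt 6
```

Under these assumptions the book goes to the 2M window in a single request and has to be split for the smaller ones. A real pipeline would split on section or paragraph boundaries rather than raw character offsets, and would count tokens with the provider's own tokenizer.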

The technical innovation behind that window is partially documented in the Gemini 1.5 technical report from Google DeepMind. It combines mixture-of-experts with attention improvements that keep retrieval quality (measured with the needle-in-a-haystack test) across the full window. The standard test feeds the model two million tokens of text, hides a specific sentence at a random position, and asks the model to recover it. Gemini 1.5 Pro does it with near-100% accuracy per Google's published benchmarks, though it's worth reading those numbers with the standard disclaimer: they come from the vendor and have not been verified by an independent third party.
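The shape of that test can be sketched in a few lines of Python. This is a toy harness under stated assumptions, not Google's evaluation code: `toy_model` is a hypothetical stand-in for a real model call, the filler and needle sentences are made up, and the `window_chars` parameter simulates a model that only sees the tail of its input.

```python
import random

FILLER = "The sky was grey that morning."
NEEDLE = "The secret passcode is 7421."
ANSWER = "7421"


def build_haystack(n_sentences: int, rng: random.Random) -> str:
    """Hide NEEDLE at a random position among n_sentences filler sentences."""
    sentences = [FILLER] * n_sentences
    sentences.insert(rng.randrange(n_sentences + 1), NEEDLE)
    return " ".join(sentences)


def toy_model(context: str, keyword: str, window_chars: int) -> str:
    """Hypothetical stand-in for a model call.

    It only 'sees' the last window_chars characters of the context and
    returns the first visible sentence that mentions the keyword.
    """
    visible = context[-window_chars:]
    for sentence in visible.split(". "):
        if keyword in sentence:
            return sentence
    return ""


def recall(trials: int, n_sentences: int, window_chars: int, seed: int = 0) -> float:
    """Fraction of trials in which the hidden answer is recovered."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        haystack = build_haystack(n_sentences, rng)
        if ANSWER in toy_model(haystack, "passcode", window_chars):
            hits += 1
    return hits / trials


print(recall(trials=50, n_sentences=1000, window_chars=10**9))  # window covers everything
print(recall(trials=50, n_sentences=1000, window_chars=10))     # window too small to hold the needle
```

With a window that covers the whole haystack the toy model always finds the needle; with a ten-character window it never can. A real evaluation replaces `toy_model` with an API call and sweeps both the haystack length and the needle position.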

The practical consequence, which matters most: workflows that were previously impossible (cross-referencing 1,500 pages of legal documentation, reviewing a full code repository, transcribing and analyzing an hour of video) are now on the table. It isn't fully solved — attention quality degrades toward the edges of the window, cost and latency scale non-linearly — but the entry barrier dropped dramatically.

Native vs added multimodality

Gemini's second architectural edge is native multimodality. The model was trained from the start with text, image, audio, and video as first-class modalities, not as capabilities bolted on later.

The difference with GPT-4V (vision added to GPT-4 in 2023) and with Claude (which added vision in Claude 3 in 2024) is subtle but important. In Gemini, you can pass a half-hour video and ask about specific scenes; you can pass a podcast and ask for tone-of-voice analysis; you can pass code, diagrams, and natural-language description together in the same prompt. Not impossible with the competitors, but in Gemini this is the central use case, not peripheral.

Gemini Live — the real-time multimodal voice/camera mode launched in 2024 — is the consumer expression of that architecture. It points toward Project Astra, the multimodal agent Google DeepMind has in development.

The product line as layers

Worth mapping the complete Gemini line with technical precision because the expert reader needs to know which layer to use for what.

Gemini (gemini.google.com). The consumer app. Direct access to the model.

Google One AI Premium. The subscription layer for individual consumers that turns on Gemini inside Gmail, Docs, Sheets, Drive, and Meet. Around $20 a month.

Google Workspace with Gemini. The enterprise layer for organizations. Same stack as Premium with admin controls, compliance, and data residency. Per-seat pricing.

Vertex AI. The platform layer on Google Cloud. Programmable API, MLOps tooling, integration with BigQuery and the rest of GCP's services. It's where enterprise solutions built on Gemini run at scale.

Gemini Extensions. Connectors to YouTube, Maps, Flights, Hotels, and other Google services. They turn Gemini into a service orchestrator inside the Google ecosystem. Direct conceptual competition with OpenAI's custom GPTs.

Gemma. The family of open models Google publishes with downloadable weights. A parallel bet: win the mindshare of developers and the open-source world that Anthropic and OpenAI left vacant. Meta's Llama is the direct competition in that layer.

NotebookLM. A separate product, originally conceived as a research tool over your own documents. The "podcast" feature — two generated voices talking about your uploaded documents — turned NotebookLM into one of Google's most-shared products of 2024.

Project Astra, Veo, Imagen: what's coming

Three Google DeepMind projects deserve attention because they signal strategic direction.

Project Astra. The multimodal agent in development. Google I/O 2024 demos showed the model holding continuous camera and voice context across long interactions. The thesis: an assistant that sees and hears continuously, not one that answers turn by turn. It is not yet a public product in 2026 beyond Gemini Live, but it's where the bet points.

Veo. Video generation. Competes with OpenAI's Sora. Any quality comparison depends on the benchmark and on who runs it; integration with YouTube and Google's video corpus is a structural edge.

Imagen. Image generation. Competent quality but for creative work DALL-E and Midjourney still set the pace. Imagen wins on cases where integration with Docs or Slides matters more than marginal artistic quality.

Honest technical comparison (with disclaimers)

A professional comparison between Gemini, Claude, and ChatGPT in 2026 has to accept that benchmarks are skewed by their provider and that real evaluation depends on use case. That said, the reading from verifiable external evaluations:

LMArena (formerly Chatbot Arena), which ranks models by blind user preference, shows top Gemini, Claude, and GPT models trading the top spots week by week. No clear winner; the gap between the three is small and variable.
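For intuition on how blind pairwise votes turn into a ranking, here is a simplified Elo-style sketch in Python. It is an illustration under assumptions, not LMArena's actual methodology: the step size `K`, the starting ratings, the model names, and the votes are all made up.

```python
K = 32  # update step size (assumed; a common default in Elo implementations)


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update(ratings: dict[str, float], winner: str, loser: str) -> None:
    """Move both ratings after one blind vote; total rating is conserved."""
    delta = K * (1 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical models and votes, as (winner, loser) pairs.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
votes = [("model_a", "model_b"), ("model_b", "model_c"), ("model_c", "model_a")]
for winner, loser in votes:
    update(ratings, winner, loser)

for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.1f}")
```

When the models win and lose against each other at similar rates, as in the made-up votes above, the ratings stay bunched together, which is the pattern of a small and shifting gap at the top of the board.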

Vectara Hallucination Leaderboard measures hallucination rate in document summarization tasks. Claude consistently sits at the top (lower hallucination). Gemini runs close but a step below. ChatGPT varies by version.

SWE-bench Verified (agentic coding benchmark on real GitHub issues). Claude dominates this category — both Sonnet and Opus — by a clear margin over Gemini and GPT. That aligns with the experience reported by the developer community.

MMLU, GSM8K, HumanEval (classic reasoning and code benchmarks). The three leading models sit in a very close range, with rotating leadership by release.

Where Gemini clearly wins on technical metrics: context (2M tokens), cost/speed on Flash, native multimodality. Where it loses: agentic coding (Claude), summarization reliability (Claude, per Vectara), creative image quality (DALL-E), voice mode polish (ChatGPT).

Editorial thesis

I'll close with a thesis that goes past reporting.

The "who's winning the AI race" conversation is mis-framed because it assumes there's one race. There are at least three different markets. The academic research and technical frontier layer — there OpenAI, Anthropic, and DeepMind compete on similar terms. The professional-work-with-accountability layer — there Anthropic holds the edge, and adoption data in legal, financial, and consulting segments confirms it. The mass-distribution-adoption layer — there Google has no competitor and won't have one in the foreseeable horizon.

Gemini is Google's bet to capture the third layer (mass adoption by distribution) and to stay competitive in the first. The second, professional work, is the one it deliberately ceded by prioritizing integration over tonal consistency and reliability on hallucination benchmarks.

That's a coherent strategic choice. It's not that Gemini "can't" compete with Claude on professional work; it's that Google understood the installed base of Gmail-Docs-Drive-Chrome-Android gives it an unreachable position in a much larger market. Preferring that market over the contract-analyst market is sound business math, not technical defeat.

For the professional making tool decisions, the implication is concrete. It doesn't make sense to pick one AI. It makes sense to assume you'll use Gemini de facto (because it's already switched on in your Google account), and consciously choose when to step out to Claude or ChatGPT for tasks where the quality gap is worth the window-switch cost. The map is coexistence, not replacement.

And that, too, is what Google was after when it switched on the "Help me write" button in 3 billion Gmails. It doesn't need to win every conversation. It just needs to be the default AI, the one already on, the one that doesn't require a decision. The rest, inertia handles.

What's the use case where, even though Gemini is inside your suite, you choose to step out and open a different AI? That's the empirical test for where distribution stops being enough.