TL;DR
- Deep Research is three sub-systems behind one API call: Planner decomposes, Swarm searches in parallel, Synthesizer does a long-context reduce
- The fan-out is why cost scales super-linearly — typical run: 28–52 parallel searchers, 200K–460K tokens, 5–8 minutes, $18–$42
- Under the hood the swarm is doing a mix of native web search tools, browsing agents, and indexed retrieval against private knowledge bases
- Deep Research produces the richest audit trail of any mode. Every citation is a source claim you can verify. The same property makes it the governance opportunity most enterprises have not yet seized
A deep-research call on a frontier product takes seven minutes and returns a forty-two-citation brief.
If you watch your network tab during that wait, you do not see one long request and one long response. You see one outbound request, then nothing for a few minutes, then a flood of activity, then a long pause while something expensive happens on a server, then the response comes back.
That flood is the mode.
Deep Research Mode is not a bigger Chat call. It is not even a longer Agent loop. It is three sub-systems — a planner, a swarm, and a synthesizer — coordinated behind one API. Understanding the three is the difference between "this mode is slow" and "this mode is exactly the right machine for the right question."
The three sub-systems
Deep Research Mode: Planner → Swarm → Synthesizer
One question in. A planner, a swarm of agents, and a long-context reduce. Three sub-systems pretending to be one.
[Figure: phase timeline of a sample OpenAI Deep Research run (48 parallel searchers · ~420K tokens · ~8 min · ~42 citations)]
The fan-out is why Deep Research costs ~2,000× a Chat call. The planner decomposes. The swarm searches in parallel. The synthesizer does a long-context reduce over every worker output.
1. The Planner
The first thing that happens when you submit a deep-research query is decomposition. A model — typically a frontier reasoning model — takes the user's question and breaks it into a plan: a set of sub-questions, a retrieval strategy per sub-question, a rough budget, and a shape for the final synthesis. Some products publish the plan to the user. Some keep it private. All of them have one.
The planner is one Chat call. It is the cheapest part of the run. It is also the part that determines most of the quality of everything that follows. A bad plan produces a fast, expensive, confidently wrong brief. A good plan produces a brief that actually answers what you asked.
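In code, the plan is just structured output from one chat call. A minimal sketch of its shape — the field names and example values here are illustrative, not any provider's actual schema:

```python
from dataclasses import dataclass

@dataclass
class SubQuestion:
    id: str
    question: str
    strategy: str          # e.g. "web_search", "browse", "kb_retrieval"
    max_tokens: int        # per-worker budget

@dataclass
class ResearchPlan:
    sub_questions: list[SubQuestion]
    synthesis_shape: str   # e.g. "brief_with_citations"
    token_budget: int      # rough budget for the whole run

# What a planner call might emit for a competitive-intelligence query.
plan = ResearchPlan(
    sub_questions=[
        SubQuestion("sq-1", "Who are the top competitors?", "web_search", 8_000),
        SubQuestion("sq-2", "What does recent pricing look like?", "browse", 8_000),
        SubQuestion("sq-3", "What does our CRM say about churn?", "kb_retrieval", 8_000),
    ],
    synthesis_shape="brief_with_citations",
    token_budget=420_000,
)
print(len(plan.sub_questions))  # -> 3: one swarm agent per sub-question
```

Everything downstream — swarm size, token spend, citation coverage — is a function of this one object.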
2. The Swarm
The swarm is what distinguishes this mode from anything before it. For each sub-question the planner emitted, the runtime spawns an agent. Each one is a full Agent-mode loop — think, act, observe, repeat — from Part 3, running with its own tools, its own context, and its own stopping criterion. Thirty, forty, fifty of them run in parallel.
The tools those swarm agents use are where the provider differentiation actually lives. Four patterns are in production in 2026:
- Native web search tools. The provider hosts a search index (or leases one) and exposes it as a tool the swarm agents can call. Low latency, provider-grounded, but scoped to whatever the provider chose to index.
- Browsing agents. A headless browser. The swarm agent navigates real URLs, waits for pages to render, scrolls, extracts. Slower, more expensive, higher fidelity on sites the native index does not cover well.
- Indexed retrieval against a private knowledge base. The swarm agent queries a vector store, graph store, or MemoryOS-style governed context that your enterprise controls. This is where Deep Research meets Context Compilation Theory — the swarm is running a compilation pass per subtask.
- Specialized APIs. SEC EDGAR, PubMed, a data-warehouse MCP server, an internal wiki, a code index. Whatever the provider or your team has plumbed in.
Most production deep-research runs use a mix. A competitive-intelligence brief might pull from native web search, a paid news API, and your internal CRM. A patient-safety review might pull from PubMed, a clinical-trials database, and your enterprise EMR index.
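The fan-out itself is ordinary concurrency. A toy sketch with stubbed agents — a real swarm agent would run a full tool-use loop against the sources above rather than return canned findings:

```python
from concurrent.futures import ThreadPoolExecutor

def swarm_agent(sub_question: str) -> dict:
    """Stub for one Agent-mode loop: think, act, observe, repeat.
    A real agent would call tools (search, browse, retrieval) here
    and stop when its own stopping criterion fires."""
    return {
        "question": sub_question,
        "findings": f"stub findings for: {sub_question}",
        "citations": [f"source://stub/{sub_question.replace(' ', '-')}"],
    }

sub_questions = [f"sub-question {i}" for i in range(40)]

# One worker per sub-question; in production this is dozens in parallel,
# each with its own context and tool scope.
with ThreadPoolExecutor(max_workers=40) as pool:
    worker_outputs = list(pool.map(swarm_agent, sub_questions))

print(len(worker_outputs))  # -> 40: one output bundle per swarm agent
```

The important design property: workers share nothing. Each bundle flows, untouched, into the synthesizer's context.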
3. The Synthesizer
When the swarm finishes (or times out), the synthesizer takes over. This is a long-context reduce step: one very large call to a frontier model with every worker's output concatenated into the context. The synthesizer ranks claims, resolves contradictions, constructs the citation graph, and writes the brief.
This is where most of the tokens live. A synthesizer call often sees 150,000 to 300,000 tokens of input and produces 5,000 to 20,000 tokens of output. Prefill is the dominant cost. The quality of the synthesizer's long-context recall determines how well the brief stays grounded in what the swarm actually found.
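The reduce step's cost shape is easy to see in a back-of-envelope model. A sketch using illustrative per-token prices and a crude ~4-characters-per-token estimate — not any provider's real rates:

```python
def synthesizer_cost(worker_outputs: list[str],
                     output_tokens: int = 15_000,
                     input_price: float = 15 / 1_000_000,    # $/token, illustrative
                     output_price: float = 75 / 1_000_000) -> float:
    """Estimate the synthesizer call's cost. Every worker's output is
    concatenated into the prompt, so prefill (input) dominates."""
    input_tokens = sum(len(w) for w in worker_outputs) // 4  # ~4 chars/token
    return input_tokens * input_price + output_tokens * output_price

# 40 workers, ~25K characters (~6K tokens) of output each
# -> ~250K prefill tokens for one reduce call.
outputs = ["x" * 25_000] * 40
print(round(synthesizer_cost(outputs), 2))
```

Even with output priced several times higher than input, the prefill term dwarfs the generation term at this scale.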
The fan-out economics
Deep Research is the only mode in this series where the cost curve is fundamentally fan-out-driven, and that changes how you reason about it.
In Chat Mode, a longer prompt costs more prefill. In Agent Mode, more iterations cost more calls. In Deep Research, the planner's decomposition is the biggest lever on the bill. Thirty sub-questions cost roughly half as much as sixty — and sometimes produce a better brief, because the signal does not get diluted.
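The lever is visible in a two-line cost model. A sketch under the simplifying assumption that both worker spend and synthesizer prefill scale linearly with fan-out (prices illustrative):

```python
def run_cost(n_subquestions: int,
             tokens_per_worker: int = 7_000,
             price_per_token: float = 20 / 1_000_000) -> float:
    """Rough deep-research bill. Worker tokens AND the synthesizer's
    prefill both scale with the planner's fan-out, so the decomposition
    is the dominant lever on cost."""
    worker_tokens = n_subquestions * tokens_per_worker
    synth_prefill = worker_tokens  # every worker output is re-read in the reduce
    return (worker_tokens + synth_prefill) * price_per_token

print(run_cost(60) / run_cost(30))  # -> 2.0: cost is linear in fan-out
```

Which is why planner quality, not model choice, is the first place to look when a run comes back expensive.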
Typical numbers across the four major products as of 2026:
| Product | Swarm size | Tokens | Time | Cost |
|---|---|---|---|---|
| OpenAI Deep Research | 48 | ~420K | ~8 min | ~$42 |
| Claude Research | 36 | ~340K | ~6 min | ~$34 |
| Perplexity Pro Deep Research | 28 | ~210K | ~5 min | ~$18 |
| Gemini Deep Research | 52 | ~460K | ~7 min | ~$38 |
These are order-of-magnitude figures, not exact prices. The point is not which is cheapest. The point is that all four live in the same octave — tens of dollars per brief, hundreds of thousands of tokens, single-digit minutes. A 2,000× multiple over Chat Mode. A 50–100× multiple over an Agent turn.
The economics only work when the question is worth it. A competitive-intelligence brief that drives a $50M decision is cheap at $40. A "summarize this one email thread" request is a catastrophic misuse of the mode.
Knowledge base integration
The version of Deep Research that matters most to enterprises is not the one that reads the public web. It is the one that reads your data.
Every serious provider now exposes some variant of "bring your own corpus." The swarm runs the same way — planner, fan-out, synthesizer — but the tools are scoped to your knowledge base instead of (or in addition to) the open web. This is where Deep Research becomes a genuine enterprise primitive rather than a research toy.
Two shapes matter:
- Shared-tenant retrieval. Your corpus is indexed alongside many others, with access controls at the retrieval layer. Cheap. Fast to stand up. Good enough for most content-driven research.
- Dedicated retrieval. Your corpus runs on infrastructure you control — whether that is a Vectara, a MemoryOS, a custom GraphRAG, or a managed index inside Bedrock or Vertex. More expensive. Slower to stand up. The only option if your data cannot leave your environment or has complex governance requirements.
I have argued at length in the Context Compilation series that retrieval is the easy part of this problem and compilation is the hard part. That argument holds inside the swarm. A swarm agent that retrieves fifty document chunks and dumps them into its context produces worse output than a swarm agent that compiles a governed, deduplicated, budgeted pack. Deep Research is where compilation quality compounds — the planner's decomposition, the swarm's per-task compilation, and the synthesizer's long-context reduction are three consecutive compilation passes.
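A minimal sketch of the dump-versus-compile difference inside one swarm agent. The compile step here only dedupes and enforces a token budget — real compilation passes also rank, apply governance, and track provenance:

```python
def compile_pack(chunks: list[str], budget_tokens: int) -> list[str]:
    """Dedupe retrieved chunks and trim to a token budget, instead of
    dumping every retrieved chunk into the agent's context."""
    seen: set[str] = set()
    pack: list[str] = []
    used = 0
    for chunk in chunks:
        key = chunk.strip().lower()
        if key in seen:
            continue                       # drop exact duplicates
        tokens = len(chunk) // 4           # crude ~4 chars/token estimate
        if used + tokens > budget_tokens:
            break                          # respect the per-task budget
        seen.add(key)
        pack.append(chunk)
        used += tokens
    return pack

chunks = ["alpha " * 100, "alpha " * 100, "beta " * 100, "gamma " * 100]
print(len(compile_pack(chunks, budget_tokens=300)))  # -> 2: dup dropped, budget enforced
```

One caveat on the sketch: it keeps chunks in arrival order, so in practice you would rank by relevance before trimming, or the budget cuts the wrong material.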
Enterprise sidebar — deep research is a governance opportunity
Here is the part that matters most and gets talked about least. Deep Research is the mode with the richest audit trail of any of the four.
Every citation is a verifiable source claim. Every sub-question is a planned artifact. Every swarm agent's tool calls are logged. Every synthesizer reduction is a transformation over known inputs. If you capture the run well, you can answer, for any assertion in the brief: "which agent fetched it, from which source, at what time, against which query."
That is a degree of explainability your Chat-mode logs cannot produce and your Agent-mode traces usually do not attempt. In regulated industries — finance, healthcare, legal, life sciences — this is a material capability. In any industry where decisions of consequence rely on AI briefs, this is the difference between AI you can defend to an audit committee and AI you cannot.
Three governance moves the best enterprises are making with Deep Research in 2026:
- Make citations first-class objects. Every citation lands in a structured store — source URL or document ID, fragment, retrieval query, retrieval timestamp, swarm agent ID, planner sub-question ID. If someone challenges a claim, the team can follow the trail.
- Capture the planner's decomposition. The plan is as important as the answer. Save it. Review samples. If the planner is consistently emitting bad sub-questions, that is your quality lever — not the model.
- Run two classes of deep research: external and internal. External searches the public web and is cheap but un-auditable outside the citation layer. Internal searches your knowledge base and produces a stronger audit trail. Different governance, different access controls, different retention policies.
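The citation record described in the first move above, as a minimal schema — the field names are one reasonable choice, not a standard:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CitationRecord:
    """One citation as a first-class, auditable object."""
    source: str              # URL or internal document ID
    fragment: str            # the quoted or extracted span
    retrieval_query: str     # what the swarm agent actually asked
    retrieved_at: datetime   # when it was fetched
    agent_id: str            # which swarm agent fetched it
    sub_question_id: str     # which planner sub-question it served

record = CitationRecord(
    source="https://example.com/annual-report",
    fragment="Revenue grew 14% year over year.",
    retrieval_query="competitor annual revenue growth",
    retrieved_at=datetime.now(timezone.utc),
    agent_id="swarm-07",
    sub_question_id="sq-3",
)
# Answers the audit question directly: which agent, which source,
# at what time, against which query.
print(record.agent_id, record.sub_question_id)
```

Frozen on purpose: an audit record should be immutable once written.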
What Deep Research Mode is good for
- Competitive intelligence and market briefs. The task is inherently fan-out — many sources, many angles, one summary.
- Regulatory or compliance research. Jurisdiction by jurisdiction, statute by statute. Deep Research is what humans used to do with an associate and a week.
- Diligence and pre-mortem work. "Everything we know about X" where X is a company, a technology, a risk, a person.
- Literature review and evidence synthesis. Life sciences, clinical research, legal precedent.
- Anything where a human will read the output carefully and care about citations. The brief is worthless if the reader does not trust the ground truth. Citations are the contract.
What Deep Research Mode is bad at
- Anything that needs a real-time answer. Seven minutes is structurally incompatible with conversational UX.
- Narrow, factual lookups. "What is our Q4 revenue?" is a Chat question with RAG. Deep Research is overkill.
- Tasks where the underlying evidence is private, unshared, and you have not connected a knowledge base. Running the swarm over the public web when the answer lives in your CRM is expensive theater.
- Creative work that is not grounded in sources. Deep Research is a citation machine. It is not a brainstorm partner.
On Monday
Three concrete moves that pay back inside a quarter:
- Pick three recurring research workflows that humans currently do by hand and that are worth $50+ each. Competitive briefs. Board-meeting pre-reads. Vendor diligence. Move those to Deep Research Mode before you move anything else. The ROI is immediate and the governance surface is manageable.
- Stand up the citation store. Even if your first version is a Postgres table with six columns, you need a canonical place where every deep-research citation lands. The second you have that, the audit conversation changes.
- Instrument the planner. Log every decomposition. Sample them weekly. If the planner is producing sub-questions that do not hold together, fix it before you spend another dollar on swarms that inherit a bad plan.
Deep Research is the mode that pays for itself fastest in a large organization. It is also the one most likely to surprise you with its cost if you treat it as a bigger chatbot. Use it on purpose. Log every run. And treat every citation as an audit record, because that is what it is.
Next up
Cowork Mode is where the LLM stops being a call and starts being a coworker. Persistent memory across sessions. Skills your organization shares. Direct access to files, terminals, browsers, and screens. This is the mode where AI moves from a tool your team uses to a participant in your team's work — and where governance problems stop being theoretical.
Operate. Publish. Teach.