brianletort.ai
Six-part series

Modes of the LLM OS

The LLM is not a model. It is an operating system. And like any operating system, it runs in distinct modes: not one, but four.

Same GPUs. Same 14 infrastructure layers. Completely different machines on top. Five orders of magnitude in cost. This series goes mode by mode — Chat, Agent, Deep Research, Cowork — and closes with how to run your own LLM OS on enterprise infrastructure.

Same prompt. Four completely different machines.

Chat Mode

Single-shot on shared silicon.

Agent Mode

The loop is the machine.

Deep Research Mode

Planner, swarm, synthesizer.

Cowork Mode

State is the coworker.

Chat costs $0.02. Agent costs $0.40. Deep Research costs $40. Cowork costs $400. Same silicon.
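The per-task figures above imply a spread you can compute directly. A quick sketch (the dollar amounts are the article's illustrative figures, not benchmarks):

```python
# Per-task cost figures quoted above (illustrative, not benchmarks).
cost_per_task = {
    "chat": 0.02,
    "agent": 0.40,
    "deep_research": 40.0,
    "cowork": 400.0,
}

# Spread between the cheapest and most expensive mode on the same silicon.
spread = cost_per_task["cowork"] / cost_per_task["chat"]
print(f"cowork / chat = {spread:,.0f}x")
```

The same dictionary extends naturally to budgeting: multiply each mode's per-task cost by its expected call rate.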

The series

Part 1

The Thesis

Why frontier AI runs in four modes, not one.

Same silicon. Same 14 infrastructure layers. Four completely different machines on top. Five orders of magnitude in cost. The enterprise decision is no longer which model — it is which mode, for which task, at what rate, under which governance.

Part 2

Chat Mode

Single-shot on shared silicon.

The 14 layers, upgraded. Where reasoning models change the picture without changing the mode. The machine you hit a hundred times a day and still do not see.

Part 3

Agent Mode

The loop is the machine.

Think, act, observe, think again. Tools vs skills vs MCP. Context compaction. Stopping criteria. What you actually pay for when Cursor writes a PR.
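The loop named above (think, act, observe, think again) can be sketched in a few lines. Everything here is a stub for illustration, not Cursor's or any other vendor's actual API:

```python
# Minimal think-act-observe agent loop (illustrative stubs, not a real vendor API).

def model(history):
    # Stand-in for an LLM call: decide the next action from the transcript.
    # This stub "plans" one search, then stops once an observation exists.
    if any(kind == "observe" for kind, _ in history):
        return ("stop", "draft the PR summary")
    return ("act", "search: open issues tagged `bug`")

def run_tool(action):
    # Stand-in for a tool call (search, shell, editor). Returns an observation.
    return f"observation for {action!r}"

def agent(task, max_steps=8):
    history = [("task", task)]
    for _ in range(max_steps):          # stopping criterion: step budget
        kind, content = model(history)
        history.append((kind, content))
        if kind == "stop":              # stopping criterion: model says it is done
            break
        history.append(("observe", run_tool(content)))
    return history

transcript = agent("triage the bug backlog")
```

Real agent runtimes differ mainly in what `run_tool` is allowed to touch and in richer stopping criteria (token budgets, timeouts, human approval), which is where the cost and governance questions live.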

Part 4

Deep Research Mode

Planner, swarm, synthesizer.

Why a deep-research call is three sub-systems pretending to be one. The fan-out economics. The richest audit trail of any mode — if you know where to log.
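A first-order way to see the fan-out economics: total cost is the planner plus the swarm workers plus the synthesizer, so it grows linearly with fan-out. The prices below are made-up placeholders chosen only to show the shape, not figures from any provider:

```python
# Hypothetical per-pass prices, chosen only to illustrate the linear shape.
PLANNER = 0.10      # one planning pass that writes the sub-queries
WORKER = 0.40       # one swarm worker (roughly an agent-mode run each)
SYNTHESIZER = 0.50  # one long-context pass that merges the findings

def deep_research_cost(fan_out: int) -> float:
    # Three sub-systems, one bill: the middle term dominates as fan-out grows.
    return PLANNER + fan_out * WORKER + SYNTHESIZER

for n in (1, 10, 100):
    print(f"fan-out {n:>3}: ${deep_research_cost(n):,.2f}")
```

The linear worker term is why fan-out caps and per-run budgets are the levers that matter in this mode.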

Part 5

Cowork Mode

State is the coworker.

Claude Code, Cursor, Operator, Codex, ChatGPT Projects. Persistent memory, skills, knowledge base, environment access. The most dangerous ungoverned surface in the enterprise today.

Part 6

Running Your Own LLM OS

The enterprise build — Frontier API to owned B200s.

Four stacks arranged by control: Frontier API, Managed Cloud (Bedrock / Azure / Vertex / OpenRouter), Neocloud (CoreWeave, Lambda, Crusoe, Nebius), and On-Prem (an 8x B200 chassis you own, paired or racked for more capacity). The near-frontier OSS models — Kimi K2.6, DeepSeek V4, Qwen 3, Llama 4, GLM 5 — that changed the calculus. TCO at three scales, the control spectrum, and the six reusable layers that survive every stack change.

