brianletort.ai
Six-part series

Modes of the LLM OS

The LLM is not a model. It is an operating system. And like any operating system, it runs in distinct modes: not one, but four.

Same GPUs. Same 14 infrastructure layers. Completely different machines on top. Five orders of magnitude in cost. This series goes mode by mode — Chat, Agent, Deep Research, Cowork — and closes with how to run your own LLM OS on enterprise infrastructure.

Same prompt. Four completely different machines.

Chat Mode

Single-shot on shared silicon.

Agent Mode

The loop is the machine.

Deep Research Mode

Planner, swarm, synthesizer.

Cowork Mode

State is the coworker.

Chat costs $0.02. Agent costs $0.40. Deep Research costs $40. Cowork costs $400. Same silicon.
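The per-task figures above imply a spread you can compute directly. A quick sketch (the dollar amounts are the article's illustrative figures, not benchmarks):

```python
# Per-task cost figures quoted above (illustrative, not benchmarks).
cost_per_task = {
    "chat": 0.02,
    "agent": 0.40,
    "deep_research": 40.0,
    "cowork": 400.0,
}

# Spread between the cheapest and most expensive mode on the same silicon.
spread = cost_per_task["cowork"] / cost_per_task["chat"]
print(f"cowork / chat = {spread:,.0f}x")
```

The same dictionary extends naturally to budgeting: multiply each mode's per-task cost by its expected call rate.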

The series

Part 1

The Thesis

Why frontier AI runs in four modes, not one.

Same silicon. Same 14 infrastructure layers. Four completely different machines on top. Five orders of magnitude in cost. The enterprise decision is no longer which model — it is which mode, for which task, at what rate, under which governance.

Part 2

Chat Mode

Single-shot on shared silicon.

The 14 layers, upgraded. Where reasoning models change the picture without changing the mode. The machine you hit a hundred times a day and still do not see.

Part 3

Agent Mode

The loop is the machine.

Think, act, observe, think again. Tools vs skills vs MCP. Context compaction. Stopping criteria. What you actually pay for when Cursor writes a PR.
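The loop named above (think, act, observe, think again) can be sketched in a few lines. Everything here is a stub for illustration, not Cursor's or any other vendor's actual API:

```python
# Minimal think-act-observe agent loop (illustrative stubs, not a real vendor API).

def model(history):
    # Stand-in for an LLM call: decide the next action from the transcript.
    # This stub "plans" one search, then stops once an observation exists.
    if any(kind == "observe" for kind, _ in history):
        return ("stop", "draft the PR summary")
    return ("act", "search: open issues tagged `bug`")

def run_tool(action):
    # Stand-in for a tool call (search, shell, editor). Returns an observation.
    return f"observation for {action!r}"

def agent(task, max_steps=8):
    history = [("task", task)]
    for _ in range(max_steps):          # stopping criterion: step budget
        kind, content = model(history)
        history.append((kind, content))
        if kind == "stop":              # stopping criterion: model says it is done
            break
        history.append(("observe", run_tool(content)))
    return history

transcript = agent("triage the bug backlog")
```

Real agent runtimes differ mainly in what `run_tool` is allowed to touch and in richer stopping criteria (token budgets, timeouts, human approval), which is where the cost and governance questions live.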

Part 4

Deep Research Mode

Planner, swarm, synthesizer.

Why a deep-research call is three sub-systems pretending to be one. The fan-out economics. The richest audit trail of any mode — if you know where to log.
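A first-order way to see the fan-out economics: total cost is the planner plus the swarm workers plus the synthesizer, so it grows linearly with fan-out. The prices below are made-up placeholders chosen only to show the shape, not figures from any provider:

```python
# Hypothetical per-pass prices, chosen only to illustrate the linear shape.
PLANNER = 0.10      # one planning pass that writes the sub-queries
WORKER = 0.40       # one swarm worker (roughly an agent-mode run each)
SYNTHESIZER = 0.50  # one long-context pass that merges the findings

def deep_research_cost(fan_out: int) -> float:
    # Three sub-systems, one bill: the middle term dominates as fan-out grows.
    return PLANNER + fan_out * WORKER + SYNTHESIZER

for n in (1, 10, 100):
    print(f"fan-out {n:>3}: ${deep_research_cost(n):,.2f}")
```

The linear worker term is why fan-out caps and per-run budgets are the levers that matter in this mode.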

Part 5

Cowork Mode

State is the coworker.

Claude Code, Cursor, Operator, Codex, ChatGPT Projects. Persistent memory, skills, knowledge base, environment access. The most dangerous ungoverned surface in the enterprise today.

Part 6

Running Your Own LLM OS

The enterprise build — Frontier API to owned B200s.

Four stacks arranged by control: Frontier API, Managed Cloud (Bedrock / Azure / Vertex / OpenRouter), Neocloud (CoreWeave, Lambda, Crusoe, Nebius), and On-Prem (an 8x B200 chassis you own, paired or racked for more capacity). The near-frontier OSS models — Kimi K2.6, DeepSeek V4, Qwen 3, Llama 4, GLM 5 — that changed the calculus. TCO at three scales, the control spectrum, and the six reusable layers that survive every stack change.

