TL;DR
- Every CFO is staring at the wrong number. Cost per million tokens is a small slice of the real bill — the rest is context, rework, review, placement, and orchestration.
- The north-star metric is cost per verified outcome. In the field, workloads with identical sticker prices differ by up to 8x in fully-loaded cost per outcome.
- A one-model default is not a strategy. At least half of enterprise AI workloads sit inside the quality-vs-cost frontier — paying top-tier prices for middle-tier outcomes.
- Token policy belongs in an AI control plane, not inside every app. The platform primitives already exist; the organizational decision to assemble them usually does not.
- The 90-day sequence that works: showback first, then policy-aware routing, then internal credits, then outcome-based economics. Each stage earns the right to the next.
Three months ago, a CFO I work with slid a printout across the table.
Thirty-nine million dollars. Annual run-rate. AI.
"What are we getting for this?"
The CTO started answering in tokens. Millions per day. Billions per month. The CFO held up a hand.
"I know the volume. I know the vendors. I want to know what we got."
That room went quiet for exactly the wrong reason. The number was real. The vendors were the obvious ones. The volumes matched the invoices. What nobody could produce — not in that meeting, not in the follow-up, not three weeks later after a task force — was the thing that actually mattered to the person signing the checks: a credible link between what the bill said and what the business got out of it.
That is the executive problem with AI in 2026. It is not a technology gap. It is not a vendor choice. It is a measurement gap, and it is showing up on the boardroom table right now, in almost every large enterprise I walk into.
This is the CEO-facing companion to my three-part technical series on token economics. The series explains the machinery. This piece is for the people who sign the budget.
The wrong question, and the right one
The question most boards are asking right now is:
What does a million tokens cost?
It is the wrong question. And a surprisingly costly one.
The right question — the only one that actually tells you whether AI is creating value — is:
What does a verified outcome cost?
A verified outcome is a business result your organization is willing to claim. A contract redlined by the legal copilot and sent to the counterparty. A customer issue closed by the support agent without escalation. A meeting brief the CRO actually used on the call. An enrichment pass that turned unstructured data into something a downstream process could consume.
Tokens are the raw material. Outcomes are the product. And the ratio between them — how many tokens, rework loops, review minutes, and policy exceptions it takes to produce one verified outcome — is the single most important operating metric in enterprise AI.
I have watched workloads with identical sticker prices differ by eight times in fully-loaded cost per verified outcome. Not because of the model. Because of everything that happens before and after the model runs.
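To make that ratio concrete, here is a back-of-envelope sketch in Python. Every number in it is hypothetical, chosen only to illustrate how rework loops and review minutes, not the sticker price, drive the spread between two workloads paying the same per-token rate.

```python
# Illustrative sketch: two workloads with the same sticker price per token
# can differ widely in fully-loaded cost per verified outcome.
# All numbers below are hypothetical, for illustration only.

def cost_per_verified_outcome(
    tokens_per_attempt: int,
    price_per_million_tokens: float,
    attempts_per_outcome: float,   # rework loops before an output ships
    review_minutes: float,         # human review time per outcome
    reviewer_cost_per_minute: float,
) -> float:
    """Fully-loaded cost of one outcome the business is willing to claim."""
    model_cost = (
        tokens_per_attempt * attempts_per_outcome
        * price_per_million_tokens / 1_000_000
    )
    review_cost = review_minutes * reviewer_cost_per_minute
    return model_cost + review_cost

# Same model, same $3/M sticker price, very different economics.
disciplined = cost_per_verified_outcome(20_000, 3.0, 2, 1, 1.5)
undisciplined = cost_per_verified_outcome(60_000, 3.0, 10, 6, 1.5)

print(f"disciplined:   ${disciplined:.2f} per outcome")    # $1.62
print(f"undisciplined: ${undisciplined:.2f} per outcome")  # $10.80
print(f"ratio: {undisciplined / disciplined:.1f}x")
```

Notice that in the expensive case the model itself is still under two dollars of the total; the human review minutes dominate.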
What the sticker price hides
Every dollar of API spend carries roughly six dollars of follow-on cost before a result reaches a human who can act on it. Values shown per verified outcome at mid-tier model pricing.

- API invoice. The number on the bill — roughly $3 per million tokens for a mid-tier model. The only line item most finance teams see.
- Context assembly. Retrieval, cache misses, re-reads. Research on ~1,000 real agent runs puts context re-reads at 52% of total agent spend.
- Rework loops. Agents iterate. One user task typically becomes 5–15 model calls before it is ready to ship, and each call is often a multiple of the first.
- Human review. The single biggest hidden cost. A person who is paid more per minute than the model has to read, correct, and approve.
- Placement. Regional, sovereign, or privately-hosted inference runs 10–20% more than the global default. For regulated workloads, that premium is the point.
- Orchestration. Routing, evals, logging, observability, and network. The control plane costs money too — and it should.

The sticker price — the only number most CFOs see — is roughly 15% of the real bill. The other 85% is what decides whether AI creates value or burns it.
The executive summary is simple: the number on the invoice is, at best, a sixth of the real cost of producing a result your business can use.
The other five-sixths live in four places finance is not watching.
First, context assembly. Every non-trivial AI task has to be fed information before it can produce anything. Retrieval from vector stores. Policy blocks. Tool definitions. Few-shot examples. In real production traces, more than half the tokens a modern agent consumes are re-reads of context it already had — because nobody built the cache discipline.
Second, rework loops. Modern AI does not produce an answer. It produces an attempt, evaluates it, and tries again. A single user-facing task typically triggers five to fifteen model calls. Each call is usually larger than the one before, because the context window is growing as the agent reasons.
Third, human review. The most expensive line item in the entire stack. Every serious AI workload today has a human checking the output before it ships. That human costs more per minute than the model does per call. If the model gets it wrong too often, you are not saving money — you are moving it to payroll.
Fourth, placement and orchestration. Where the workload runs. Who is allowed to touch the data. How it is routed. These decisions look like plumbing, but they are pricing decisions. A regulated workload running in a private or sovereign environment is a different SKU than a public-API call, and it should be.
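The four buckets above can be sketched as simple arithmetic. The multipliers below are illustrative assumptions, not measured values; what matters is the shape of the result, with the invoice as the smallest slice.

```python
# Back-of-envelope sketch of the "one dollar of invoice, six dollars of
# follow-on cost" claim. The multipliers are illustrative assumptions.

invoice = 1.00  # $ of raw API spend per outcome (the sticker price)

hidden = {
    "context assembly (retrieval, cache misses, re-reads)": 1.2,
    "rework loops (5-15 calls per task)":                   1.5,
    "human review (the most expensive line item)":          2.5,
    "placement premium (regional / sovereign uplift)":      0.3,
    "orchestration (routing, evals, logging)":              0.5,
}

total = invoice + sum(hidden.values())
print(f"fully-loaded cost per outcome: ${total:.2f}")
print(f"invoice share of the real bill: {invoice / total:.0%}")
```

With these assumed multipliers the invoice lands at about one-seventh of the fully-loaded cost, in the same range as the roughly-15% figure in the chart above.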
The moment you internalize that the sticker price is the cheapest slice of the bill, a lot of things change. "Cheaper tokens" stops being a strategy. "Fewer tokens" becomes one. "Better context" becomes one. "The right model for this task, right now, in this jurisdiction" becomes one.
Token economics is not a purchasing discipline. It is an operating discipline.
Token economics is an operating model, not a line item
Treat token economics as five disciplines a single team has to run end-to-end:
- Metering. Count every request — tokens, model, context size, latency, outcome — before and after it runs. If you cannot count it, you cannot govern it.
- Placement. Decide where each request may legally and economically run. Public API, sovereign private GPU, batch-flex lane, regional zone. Make locality a routing decision, not a convention.
- Ownership. Map every dollar to a business domain. Marketing owns marketing AI. Legal owns legal AI. The platform team owns the platform. Shared costs are allocated transparently, not hidden in an infrastructure cost center.
- Measurement. Track cost per verified outcome, not per million tokens. Measure quality, rework, and human-review minutes alongside spend.
- Incentives. Pay for the right behavior. Credit the teams that cache, batch, route, and reuse. Do not reward raw token volume.
None of these disciplines are optional. Skip any one and the others do not survive first contact with a CFO.
This is the point most organizations miss. Token economics is not a FinOps dashboard. FinOps is the measurement layer underneath it. Token economics is how you actually run the thing.
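For the metering discipline, the minimum viable artifact is one record per request, written before and after the run. Here is a sketch of what such a record might look like; the field names and placement values are illustrative, not a standard schema.

```python
# A minimal sketch of a metering record: one row per request.
# Field names and enum values are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MeteredRequest:
    domain_owner: str       # ownership: which business domain pays
    workload: str
    model: str
    placement: str          # e.g. "public-api" | "sovereign-gpu" | "batch-flex"
    input_tokens: int
    output_tokens: int
    context_tokens: int     # how much of the input was assembled context
    latency_ms: int
    outcome: str            # e.g. "verified" | "rework" | "rejected"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def cost(self, in_price: float, out_price: float) -> float:
        """Dollar cost at per-million-token prices."""
        return (self.input_tokens * in_price
                + self.output_tokens * out_price) / 1_000_000

run = MeteredRequest("legal", "contract-redline", "mid-tier", "sovereign-gpu",
                     42_000, 3_000, 30_000, 2_400, "verified")
print(f"{run.domain_owner}: ${run.cost(3.0, 15.0):.3f}")  # legal: $0.171
```

Every one of the five disciplines reads off this record: placement is a field, ownership is a field, measurement aggregates it, and incentives are paid against it.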
What I have watched go wrong
A few observations from the field, in the role I sit in today — a global infrastructure platform role where I see these patterns unfolding across many enterprises, not just one.
The one-model default. I watch engineering teams route every request to the most capable model they have access to, because "quality matters." A large share of those requests — often close to half — do not need the extra capability. They are summarizing emails, classifying tickets, formatting outputs, extracting fields. The organization is paying ten to twenty times the necessary cost for that share of volume, and nobody is measuring it.
The cache nobody built. Prompt caching can reduce input costs by up to ninety percent on repeated context. I have reviewed enterprise stacks consuming billions of tokens per month with single-digit cache hit rates. Every one of those tokens is a fresh read of a system prompt, a policy block, and a tool definition the model already saw five minutes ago. The money is not being lost on capability. It is being lost on repetition.
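A quick sketch of why the cache hit rate dominates input cost. It assumes cached input tokens are billed at roughly a 90% discount, in line with the caching discounts major model providers advertise; the exact rates, and the volume and price below, are illustrative and vary by vendor.

```python
# Sketch: input-token spend as a function of cache hit rate.
# Assumes cached tokens cost 10% of fresh tokens (a ~90% caching discount);
# actual discounts vary by provider. Volume and price are hypothetical.

def monthly_input_cost(tokens_per_month: float, price_per_million: float,
                       cache_hit_rate: float,
                       cached_discount: float = 0.90) -> float:
    fresh = tokens_per_month * (1 - cache_hit_rate)
    cached = tokens_per_month * cache_hit_rate * (1 - cached_discount)
    return (fresh + cached) * price_per_million / 1_000_000

volume = 5_000_000_000   # 5B input tokens per month (hypothetical)
price = 3.0              # $ per million input tokens

for hit_rate in (0.05, 0.35, 0.70):
    cost = monthly_input_cost(volume, price, hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: ${cost:,.0f}/month")
```

At a single-digit hit rate the bill barely moves; at 70% it falls by more than half, with zero change to the model or the output.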
The sovereignty premium nobody priced. I walk into a room. The compliance officer is proud that all inference happens in-region for a regulated workload. Good. I ask what the regional uplift costs versus the global default. Nobody knows. The decision was made. The check is being signed. The number has never been on a slide.
The invisible agent. An engineering team builds an autonomous coding agent. It works. It is great. It is also running ten thousand times a day in a loop that re-reads the entire codebase into context on every invocation. The cost was fine in pilot. At scale, it is a six-figure monthly bill that appeared in a cost center nobody was watching.
The budget nobody can read. The CFO approves an AI budget. Ninety percent of it lands in a single "shared infrastructure" cost center. No domain owner is on the hook. No workload is on the hook. When volume doubles next quarter, there is nothing to hold anyone accountable to — which means there is nothing to celebrate either.
Every one of these is a measurement and governance failure before it is a technology failure. The models are fine. The vendors are fine. The people are smart. What is missing is the operating discipline that would have caught the problem before it was a bill.
Cost is a strategy, not just a number
The quality-vs-cost frontier
Six ways to spend on AI. Only the strategies on the curve are efficient — everything else is paying more for less.

The efficient end of the curve is the governed portfolio: route every request to the smallest acceptable model, cache repeated context, batch what is not time-sensitive, and govern spend against a verified-outcome budget. Run that way, a workload costs roughly one-sixth of the frontier-only default and delivers equal or better quality.
The frontier above is not a metaphor. It is the operating reality of enterprise AI.
Every workload sits somewhere on the chart. Some of those positions are efficient — they are on the curve. Some are inefficient — they are inside the curve, paying more and getting less than they could. In the enterprises I have studied carefully, at least half of all AI workloads sit inside the frontier. The organization is paying top-tier prices for middle-tier outcomes because no one built the routing layer that would have moved them.
The boardroom implication is narrow and sharp: a one-model strategy is not a strategy. It is a default. The companies that will define the next five years of enterprise AI are the ones that build a portfolio — multiple models, multiple lanes, multiple placements — and a router smart enough to pick the cheapest acceptable option for every task.
That portfolio is explained in more depth in Part 2 of the technical series. What matters at the board level is that it exists, and that someone owns it.
The AI control plane is the new CFO ledger
The AI control plane, on one page
Spend logic does not belong inside every app. It belongs in a control plane that sits between the people asking for intelligence and the places where intelligence actually runs.
- Who is asking (users, apps, agents):
  - Copilots: executive, knowledge-worker, and domain assistants.
  - Workflow automations: email, meetings, research, back-office tasks.
  - Agents: long-running autonomous systems with tool access.
- The operating system (the AI control plane):
  - Token policy engine: counts every request before it runs; enforces residency, sensitivity, and size policy.
  - Budget wallet: department-level credits; showback first, chargeback when the data is ready.
  - Model router: picks the smallest acceptable model per task; escalates only when quality demands it.
  - Eval + quality gate: verifies outcomes before they ship; routes failures back to the right tier.
  - Audit & showback ledger: FOCUS-compliant records; every run traceable to domain, model, policy, and cost.
- Where it runs (compute lanes):
  - Public model APIs: frontier and commercial models, when capability matters most.
  - Sovereign / private GPU: regulated, private, or locality-bound workloads; inference near the data.
  - Batch / flex lane: overnight enrichment, evals, and back-office summarization at asynchronous prices.
- What it reasons with (the data plane):
  - Catalog, classification, lineage: what the data is, who can see it, where it came from.
  - Vector + graph stores: the governed retrieval substrate, never the decision maker.
  - Approved skills & prompts: reusable assets the control plane knows how to price.
Every request flows left to right: a copilot or workflow asks for intelligence, the control plane counts the cost, enforces policy, routes to the cheapest acceptable compute lane, and records the outcome — all against a governed data plane whose sensitivity and residency are already known.
If you remember one architectural idea from this post, make it this one: token policy belongs in a control plane, not inside every app.
When each team decides its own prompts, models, caches, and guardrails, three things always happen. Spend drifts. Policy drifts. And when something goes wrong — a compliance violation, a budget blowout, a model deprecation — there is no single place to fix it.
The control plane is the place where the CEO's promises and the CFO's controls become machine-readable:
- It counts every request before it runs.
- It enforces policy on sensitivity, residency, and risk.
- It picks the cheapest acceptable model.
- It verifies the outcome against an eval before it ships.
- It records the run against a governed ledger your finance team can audit.
This is not exotic architecture. The major model platforms already expose every primitive it needs — token counting, caching, batch tiers, regional deployments, audit logs, reserved throughput. What is missing in most enterprises is not the capability. It is the organizational decision to assemble those primitives into one system, owned by one team, with one set of KPIs.
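A minimal sketch of the "cheapest acceptable model" decision at the heart of the router. The lane names, prices, and policy fields below are illustrative assumptions; a production control plane would back each choice with evals, budget checks, and audit records.

```python
# Sketch of policy-aware routing: pick the cheapest lane that satisfies
# both the quality floor and the residency policy. All lanes, prices,
# and capability tiers are hypothetical.
from dataclasses import dataclass

@dataclass
class ModelLane:
    name: str
    capability: int      # 1 = utility, 2 = mid, 3 = frontier
    price_per_m: float   # $ per million tokens
    residency: str       # "global" or "in-region"

LANES = [
    ModelLane("utility-global", 1, 0.15, "global"),
    ModelLane("mid-global", 2, 3.00, "global"),
    ModelLane("mid-sovereign", 2, 3.60, "in-region"),  # ~20% locality premium
    ModelLane("frontier-global", 3, 15.00, "global"),
]

def route(required_capability: int, must_stay_in_region: bool) -> ModelLane:
    """Cheapest lane that satisfies both quality and residency policy."""
    eligible = [
        lane for lane in LANES
        if lane.capability >= required_capability
        and (not must_stay_in_region or lane.residency == "in-region")
    ]
    if not eligible:
        raise ValueError("no lane satisfies policy; escalate to a human")
    return min(eligible, key=lambda lane: lane.price_per_m)

print(route(1, False).name)  # ticket classification -> utility-global
print(route(2, True).name)   # regulated contract work -> mid-sovereign
```

The point of the sketch is the shape of the decision: quality is a floor, residency is a hard constraint, and price breaks the tie. That is a routing table, not a developer preference.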
The diagram above is the one-page version I use with executives. The architecture detail, if you want it, is in Part 3 of the technical series and in my essay on what context engineering actually means.
Five questions a CEO should be asking
Each question is small on paper and devastating in practice. If your team gives the buyer answer, you are a buyer. If they give the operator answer, you are operating.
1. What we spend, and what we get for it.
   - The buyer answer: “We spend about X million per month on AI, mostly on OpenAI and Anthropic. We are watching token volume carefully.”
   - The operator answer: “For each high-value workload we measure tokens, model tier, rework, review, and cost per verified outcome. Our top ten workloads are down 22% quarter over quarter while quality scores are flat or up.”
2. Who owns the money.
   - The buyer answer: “It lives in a shared infrastructure cost center. The CIO owns it. We will figure out domain allocation once volumes stabilize.”
   - The operator answer: “More than 80% of AI spend is showback-allocated to a specific domain owner with a named operating-model sponsor. The remaining platform overhead is transparent and trending down.”
3. How models are chosen.
   - The buyer answer: “Our developers pick whatever model gives the best quality. We do not want to second-guess them on tooling.”
   - The operator answer: “Our router reports utility-tier share, cache-hit ratio, and batch-lane usage weekly. Roughly 70% of volume is on utility tier, cache leverage is above 35%, and we have moved overnight enrichment to the flex lane.”
4. Where inference runs.
   - The buyer answer: “All of our inference runs wherever it is cheapest to run that week. We have not seen a compliance issue yet.”
   - The operator answer: “Residency and sovereignty are a routing policy, not a convention. We can tell you, by workload, which requests pay a locality premium, why, and which regulation or customer promise made that choice.”
5. Who owns the guardrails.
   - The buyer answer: “Each application team decides its own prompts, models, and guardrails. We are keeping it flexible while we learn.”
   - The operator answer: “One team owns the control plane. Policy, routing, eval gates, and audit live there — not inside each app. Apps get a governed context package and a metered right to spend. The board sees one ledger.”
Each of these questions sounds small. None of them are. They are diagnostic: the answer the CEO receives tells you more about the organization's AI maturity than any dashboard does.
If your organization is giving the buyer answers, you are a buyer. You are sending money into an AI market and getting outputs back. That is a stage. It is not a destination.
If your organization is giving the operator answers, you are operating AI. You are making intentional economic choices between capability, cost, compliance, and time. You are building an asset, not renewing a subscription.
The distance between those two postures is where the next decade of strategic advantage lives.
The recommended operating model
The 90-day operating model ladder
Token economics is not a one-quarter project. It is a climb. Each stage earns the right to the next one.
Stage 0 — Sticker shock (today)
- AI spend is visible only as a line item on the cloud or SaaS bill. Every team buys its own way in.
- Leading KPI: the monthly vendor invoice.
- Risk to avoid: finance and the board start asking questions the team cannot answer.
- Leadership signal: AI feels expensive and fragile. Nobody is sure it is working.

Stage 1 — Showback and internal credits (first 30 days)
- Instrument every AI run. Map spend to a business domain. Give each domain an internal budget without forcing hard chargeback on day one.
- Leading KPI: direct-allocation share of AI spend.
- Risk to avoid: premature chargeback creates political drag before the data is ready. Showback first.
- Leadership signal: the board sees one page — who is spending what, on what, for what outcome.

Stage 2 — Policy-aware routing (days 30 to 60)
- Move spend logic into the control plane. Route by task, quality, data classification, and residency — not by app-team preference.
- Leading KPI: cache leverage, batch-lane share, and route win rate.
- Risk to avoid: a routing layer that is a black box to the business. Make every decision legible.
- Leadership signal: the same workload now costs 40–70% less with equal or better quality.

Stage 3 — Outcome-based chargeback (day 90 and beyond)
- Spend is allocated against verified outcomes, not raw token volume. Premium managed services and customer-facing AI start to bundle on top.
- Leading KPI: cost per verified outcome, margin per workload.
- Risk to avoid: gaming. Tie credits and outcomes to eval-backed results, not raw usage.
- Leadership signal: AI is a managed portfolio. The CFO knows the margin on every major workload.
No organization I know has gone from stage zero to stage three in a single quarter, and no organization I trust has tried to. The ladder matters because each rung earns the right to the next one.
Showback earns routing. Until you can show every domain owner what their AI footprint looks like, you have no political credibility to tell them which model to use or which lane to run in. That credibility is the currency stage two runs on.
Routing earns credits. Once domain owners trust the routing layer — once they can see that the router is saving them money without degrading their quality — you can start issuing internal credits and managing against budget. Before that, credits feel like tax.
Credits earn outcome economics. Only once the internal accounting is mature, the routing is legible, and the evals are trusted does the organization earn the right to bill or fund work based on verified outcomes. Try to start at the top of the ladder and you will end up in a fight you cannot win. Start at the bottom, and the climb takes roughly a quarter.
This is the ninety-day sequence I recommend to every enterprise I advise. It is simple on paper. It is a cultural shift in practice. The full outcome-based argument — where this ladder eventually goes — lives in my Results as a Service series.
What should be on the next board slide
Replace the AI slide that shows a big number followed by bullet points of pilots.
Replace it with five lines any CEO can defend in public:
- Cost per verified outcome. Trend line over the last four quarters. The north-star metric.
- Direct-allocation percentage. What share of AI spend has a named domain owner on the hook.
- Route win rate. The share of requests the router sent to a utility-tier model without quality loss.
- Cache and batch leverage. The share of repeated context served from cache, plus the share of non-urgent work running in the batch lane.
- Sovereignty hit rate. The share of regulated or sensitive workloads running in the correct residency zone, by policy.
These five lines make AI governable. They turn a mysterious cost center into a managed portfolio. They give the CEO a defensible narrative and the CFO an audit trail.
Most importantly, they change the conversation in the room. The CFO stops asking "what are we getting for this." The CEO starts asking "what is the next thing we should route, cache, batch, or pull onto the governed platform." That is the conversation of a company that is about to win in AI, rather than one that is about to explain an AI bill.
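Four of the five slide lines fall straight out of the same metered run log. A toy sketch follows, with hypothetical records and thresholds; cost per verified outcome additionally needs the price fields from the metering record.

```python
# Sketch: deriving board-slide metrics from a metered run log.
# Records are hypothetical tuples:
# (domain, model_tier, cached_tokens, total_tokens, lane, outcome, in_policy_region)

runs = [
    ("legal", "utility", 8_000, 20_000, "batch", "verified", True),
    ("legal", "frontier", 2_000, 40_000, "api", "verified", True),
    ("sales", "utility", 10_000, 25_000, "api", "rework", True),
    ("sales", "utility", 12_000, 25_000, "api", "verified", False),
]

# Route win rate: share of requests served by a utility-tier model.
route_win_rate = sum(1 for r in runs if r[1] == "utility") / len(runs)

# Cache leverage: share of all tokens served from cache.
cache_leverage = sum(r[2] for r in runs) / sum(r[3] for r in runs)

# Batch-lane share: non-urgent work pushed to the asynchronous lane.
batch_share = sum(1 for r in runs if r[4] == "batch") / len(runs)

# Sovereignty hit rate: requests that ran in the policy-correct region.
sovereignty_hit = sum(1 for r in runs if r[6]) / len(runs)

print(f"route win rate:  {route_win_rate:.0%}")
print(f"cache leverage:  {cache_leverage:.0%}")
print(f"batch share:     {batch_share:.0%}")
print(f"sovereignty hit: {sovereignty_hit:.0%}")
```

The design point is that no metric here required new instrumentation; every line is an aggregation over the metering record the control plane already writes.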
The leadership move
Everyone has access to the same models. Everyone has access to the same APIs. Most organizations are going to have roughly the same AI tools in roughly the same time frame.
The difference between the companies that create lasting value from AI and the ones that merely spend on it will not be who bought what first. It will be who built the operating discipline to know what an outcome costs — and who acted on it.
The CEOs who already understand this are the ones I am quietly watching. They are not the loudest on AI. They are the ones whose finance teams can tell you, on any given Tuesday, exactly what the last thousand verified outcomes cost and how that number is trending. They treat the AI control plane the way a previous generation treated the ERP system: as the place where the company's actual operating model lives.
That is not a technology posture. It is a leadership posture. And it is the single most valuable thing a CEO can bring to the AI conversation in 2026.
If you want the technical machinery under this post, read the three-part Token Economy series. If you want the adjacent argument for why treating context as an asset is the other half of the same problem, read What Context Engineering Actually Means. If you want to see where outcome-based economics leads next, read Results as a Service. The patterns in all three meet in the control plane.