
Data Gravity Meets Token Economics

When 93% of enterprise data is created outside the public cloud, the AI question stops being 'which model' and starts being 'where does inference run'. The executive companion to The CEO's Guide to Token Economics.

April 18, 2026 · 17 min read

TL;DR

  • Token economics is not only a price-per-million-tokens problem. It is a placement problem. When you move intelligence across regions, egress, latency, compliance, and retrieval all compound against the sticker price.
  • 93% of enterprise data over the next three years will be created outside the public cloud. The AI strategy is broken the moment it pretends otherwise.
  • Moving models to the data beats moving data to the models on almost every dimension that actually matters to a CFO: egress cost, residency risk, retrieval token inflation, and time-to-answer.
  • Sovereignty is not an engineering inconvenience. Providers already price it in — Microsoft Global/DataZone/Regional, OpenAI's ~10% regional uplift, Anthropic's 1.1x US-only. The mature move is to make that premium explicit in the routing policy.
  • The AI control plane from the CEO's Guide only works when locality is a first-class budget dimension — measured, routed, audited. Placement is not a network decision anymore. It is an economic one.

The pilot was going fine until someone asked where it was running.

It was a legal-review copilot. A good one. It had cleared the internal AI-governance review, it was saving counsel hours on routine contract work, and the vendor we'd bolted it on top of was, on paper, the safest name in the market. The demo slide was elegant. The cost per contract was falling in the right direction. The chief legal officer was, visibly, about to sign.

The general counsel leaned forward. "And the documents — where are they being processed?"

The CTO opened their mouth and paused. They knew the answer. The answer was somewhere in North America. What they could not tell the GC — not in that meeting, not in the follow-up, not with the confidence that room demanded — was which region, against which customer promise, under which vendor sub-processor, with which residency guarantee, for which class of document. The workload was working. Where it was running was a fog.

That moment is becoming common. A technically successful pilot that cannot survive the locality question. And in 2026 the locality question is being asked in rooms where it never used to be: by general counsel, by audit committees, by regulators, by sovereign customers, and — if I'm being honest about what I see — by boards who have realized, a bit late, that their AI spend is growing but their AI posture is not.

This is the second half of the token-economics argument. The first half — The CEO's Guide to Token Economics — is about the operating discipline: cost per verified outcome, routing, cache, batch, the control plane. This piece is about the geographic discipline. Because you can have the most elegant control plane in the world and still lose the economics of enterprise AI at the placement layer.

The 93% problem

Here is the single most underrated statistic in AI strategy right now.

Digital Realty's Data Gravity research estimates that 1.2 million exabytes of enterprise data will be created over the next three years, and that 93% of it will be created outside the public cloud — at the edge, inside factories and branches, in regulated stores, in SaaS applications, and in the quiet substrate of existing enterprise systems that have never been in scope for a cloud migration and never will be.

Sit with that for a moment.

More than nine out of ten bytes that the enterprise world is about to produce are not going to live where the foundation models are. The foundation models live in a small number of hyperscaler regions. The data lives everywhere else — in metros, at the edge, behind private interconnect, under sovereignty rules that predate AI and will outlast it.

If that is the topology, then the central question of enterprise AI is not which model is smartest. It is what do we do about the ninety-three percent?

There are only two physically possible answers. You can haul the data to the model, or you can bring the model to the data. Most enterprises have spent the last three years assuming the first answer without realizing they were making a choice. That is what the pilot in the opening vignette is — a workload that picked haul the data by default, without anyone pricing the haul.

Pricing the haul is where token economics and data gravity collide.

Moving models to the data beats moving data to models

A simple thought experiment. You are building an AI agent that has to reason over a 2 GB regulated case file that lives, today, inside a governed store in a specific metro.

Route one: Data to model. The agent reaches across regions, pulls chunks of that file into a retrieval pipeline, embeds them, stuffs them into a prompt, and calls a frontier model sitting in another region. You pay for the egress. You pay for the retrieval round-trips. You pay for the network latency that shows up as user-visible slowness. You pay for the input tokens, which inflate every time the agent needs another pass. Every additional turn is another cross-region haul. And you pay, whether or not you are measuring it, for the residency exposure — because that document is now, for some number of milliseconds, living in a memory footprint somewhere it is not supposed to live.

Route two: Model to data. You run inference in the same metro as the store. Retrieval hops are sub-millisecond. Egress is zero. Input tokens are the same as any other call, but now you can afford to feed fewer of them because the retrieval layer can be more precise. The vendor invoice might be modestly higher per token — a regional or sovereign lane is not free. But the fully loaded cost per verified outcome drops, often materially, because you have removed four cost vectors at once.
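
To make the inversion concrete, here is a back-of-envelope sketch in Python. Every number in it is an illustrative assumption, not a quoted rate card, and it deliberately omits the latency and residency-risk costs that make the real gap wider:

```python
# Back-of-envelope cost comparison for one agent run over the case file.
# All prices are illustrative assumptions, not quoted vendor rates.

EGRESS_PER_GB = 0.09        # assumed cross-region egress, $/GB
PRICE_PER_M_TOKENS = 3.00   # assumed global-lane input price, $/1M tokens
REGIONAL_UPLIFT = 1.10      # the ~10% regional premium cited later in the post

def data_to_model(turns: int, gb_per_turn: float, tokens_per_turn: int) -> float:
    """Route one: haul chunks across regions on every agent turn."""
    egress = turns * gb_per_turn * EGRESS_PER_GB
    tokens = turns * tokens_per_turn / 1e6 * PRICE_PER_M_TOKENS
    return egress + tokens

def model_to_data(turns: int, tokens_per_turn: int) -> float:
    """Route two: inference in the same metro. No egress, and more precise
    retrieval lets us assume ~40% fewer input tokens, at the regional uplift."""
    return turns * (tokens_per_turn * 0.6) / 1e6 * PRICE_PER_M_TOKENS * REGIONAL_UPLIFT

# A 20-turn agent run, re-pulling 0.5 GB of the 2 GB file per turn:
print(f"{data_to_model(20, 0.5, 50_000):.2f}")  # ~3.90: $0.90 egress + $3.00 tokens
print(f"{model_to_data(20, 50_000):.2f}")       # ~1.98: fewer tokens, zero egress
```

Per call the gap is roughly two to one; compounded across a quarter of agent traffic, with latency and residency exposure priced in, it widens further.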

This is the inversion. And it is exactly the same inversion a previous generation of infrastructure leaders ran when they realized that hauling data around for ETL was quietly ruining their economics — and that pushing compute to the data was the move. It is compute-to-data all over again, just with a different kind of compute.

The placement decision matrix

Nine cells. Every enterprise AI workload lives in one of them. Classification picks the row, workload character picks the column, and the cell tells you where inference should run.

Rows, by data classification:

  • Public: non-sensitive, shareable
  • Internal: business data, proprietary
  • Regulated / private: PII, PHI, financial, customer data

Columns, by workload character:

  • Latency-sensitive: chat, user-facing
  • High-volume, cost-sensitive: enrichment, evals, summarization
  • Sovereignty-bound: residency, regulation, contract

The cell that anchors the matrix is Regulated / private × Sovereignty-bound: colocated private inference. Models deployed next to the data, inside the sovereignty boundary, on governed interconnect. The highest premium tier — and the only one that survives contact with regulators.

The premium for private and sovereign lanes is not a bug to engineer around. It is the price of keeping intelligence inside the boundary the data already lives in — and the reason token economics becomes a placement problem.

The matrix above is the question a mature AI platform has to answer before a request runs, not after. It is the form the placement decision takes when you treat it as a routing rule rather than a manual convention. Pick the row by classification. Pick the column by workload character. The cell tells you where that request belongs — and, importantly, what it costs in the dimensions that finance actually cares about.
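
As a sketch of what "routing rule rather than manual convention" means in practice, the whole matrix fits in a lookup table. The lane names and the fill of the nine cells below are illustrative assumptions, informed by the patterns that follow, not a prescribed policy:

```python
# A minimal sketch of the placement matrix as a routing rule, not a slide.
# Lane names and the cell fill are assumptions; your policy team owns the real one.

from enum import Enum

class DataClass(Enum):
    PUBLIC = "public"            # non-sensitive, shareable
    INTERNAL = "internal"        # business data, proprietary
    REGULATED = "regulated"      # PII, PHI, financial, customer data

class Workload(Enum):
    LATENCY = "latency"          # chat, user-facing
    BATCH = "batch"              # high-volume enrichment, evals, summarization
    SOVEREIGN = "sovereign"      # residency, regulation, contract

PLACEMENT = {
    (DataClass.PUBLIC,    Workload.LATENCY):   "public-api-global",
    (DataClass.PUBLIC,    Workload.BATCH):     "public-batch-flex",
    (DataClass.PUBLIC,    Workload.SOVEREIGN): "regional-api",
    (DataClass.INTERNAL,  Workload.LATENCY):   "regional-api-cached",
    (DataClass.INTERNAL,  Workload.BATCH):     "private-batch-lane",
    (DataClass.INTERNAL,  Workload.SOVEREIGN): "private-dedicated",
    (DataClass.REGULATED, Workload.LATENCY):   "private-dedicated",
    (DataClass.REGULATED, Workload.BATCH):     "private-batch-lane",
    (DataClass.REGULATED, Workload.SOVEREIGN): "colocated-private",
}

def place(data_class: DataClass, workload: Workload) -> str:
    """Row by classification, column by workload character; the cell is the lane."""
    return PLACEMENT[(data_class, workload)]
```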

Three patterns stand out when I walk this matrix with executive teams.

First, the bottom-right cell — regulated data, sovereignty-bound workloads — is almost never the cheapest-per-token route. It is also almost always the correct one. The companies that get this wrong are not the ones that over-spend. They are the ones that under-spend at that cell and discover, later, that they have been running a compliance time bomb.

Second, the top row — public data — is where public APIs deserve their default position. The mistake here is the opposite one: enterprises that push public data into sovereign lanes because the organization has grown anxious about AI in general. That is a different problem, and an expensive one, and it hides behind the same compliance language that disguises the first.

Third, the middle row — internal, proprietary data — is where most of the volume lives, and where most of the savings from a mature placement policy come from. Regional API for the latency-sensitive work, cached aggressively. Private batch lane for the overnight enrichment. Private dedicated placement for the pieces that are sensitive enough to justify the premium. That middle row, well-routed, is where the CFO starts smiling.

The only wrong answer is to have no answer — to let every team pick whatever vendor they're comfortable with, whatever region the vendor happens to default to, whatever retrieval pattern the SDK example showed. That is how a pilot becomes a fog.

The sovereignty premium is a feature, not a bug

The thing I find most useful to tell executive teams — because it is almost always received as a surprise — is that the big model providers already believe in the placement argument. They price it into their rate cards.

  • Microsoft distinguishes Global, DataZone, and Regional Azure OpenAI deployment types. Each is a different SKU. Each has different residency guarantees. Each is priced differently.
  • OpenAI documents a ~10% uplift for regional processing on certain models. A price, on the wall, for "run it where you asked me to run it."
  • Anthropic publishes a 1.1x multiplier for US-only inference. Same shape of decision, same shape of premium.
  • Major cloud AI gateways expose sovereign-private-GPU placement as a distinct tier from the default global lane.

None of this is exotic. It is all documented. Most enterprises do not use any of it on purpose — they use whatever lane the SDK defaulted to when the engineering team first set up the integration. And then, when someone in the room eventually asks the locality question, the organization has to reverse-engineer what it has already been paying for.

The mature posture is the opposite. Make the premium explicit. Put it on the routing policy. Show it on the dashboard. Tell the business owner, in plain language: this workload runs in-region because you asked it to, and that choice costs you ten percent more per outcome than the global lane would. Here is what you are buying with the premium.
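
In sketch form, "make the premium explicit" can be as small as a rate-card object the router and the dashboard both read. Lane names, residency values, and the sovereign multiplier here are assumptions; the 1.10 figures echo the published uplifts above:

```python
# A rate card that makes the locality premium a policy entry, not an accident
# of SDK defaults. Lane names are assumptions; 1.10 echoes the cited uplifts,
# and the sovereign multiplier is a placeholder to be negotiated, not quoted.

RATE_CARD = {
    "global":    {"multiplier": 1.00, "residency": None,       "buys": "cheapest tokens"},
    "regional":  {"multiplier": 1.10, "residency": "eu-west",  "buys": "in-region processing"},
    "us-only":   {"multiplier": 1.10, "residency": "us",       "buys": "US-only inference"},
    "sovereign": {"multiplier": 1.40, "residency": "in-metro", "buys": "colocated, governed interconnect"},
}

def premium_vs_global(lane: str) -> str:
    """The plain-language line for the business owner's dashboard."""
    pct = (RATE_CARD[lane]["multiplier"] - 1.00) * 100
    return f"{lane}: +{pct:.0f}% per outcome, buying {RATE_CARD[lane]['buys']}"
```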

Now the sovereignty premium becomes what it should be: a deliberate purchase of locality, residency, and trust, priced against the workload it protects. Not an accident. Not a fog.

Public, private, sovereign — on two axes

Data gravity on the x-axis, policy pressure on the y-axis. Each placement option is only efficient inside its quadrant; outside it, the same option becomes a compliance time bomb.

[Quadrant chart: x-axis, data gravity →; y-axis, policy + sovereignty pressure →. Quadrants: public cloud (low gravity, low pressure), policy first (low gravity, high pressure), model-to-data (high gravity, moderate pressure), sovereign AI (high gravity, high pressure). Plotted placements: public API (global), batch / flex lane, regional API, private dedicated, sovereign / colocated, and the anti-pattern of regulated data through a public API.]

The champion for sovereign AI is the sovereign / colocated placement: inference dropped next to the regulated store, inside the sovereignty boundary, on governed private interconnect. Highest premium — and the only placement that survives contact with regulators.

The quadrant above is the cleanest way I've found to talk to an executive room about public, private, and sovereign inference — because it rewards the same answer the matrix rewards, only now on two continuous dimensions.

The champion position depends on where your workload sits.

When data gravity is low and policy pressure is low — public data, permissive environment, no customer promise — the public API is not merely the cheapest option. It is the correct option. Anything else is over-engineering.

When data gravity is low but policy pressure is high — soft locality, a customer contract, a regional regulator whispering — the regional lane is what the premium buys. You can get away without moving the model to the data; you just have to make the vendor run it where you asked.

When data gravity is high but policy pressure is moderate — heavy data, proprietary context, no absolute residency rule — private dedicated placement near the data becomes the efficient answer. Reserved GPU, provisioned throughput, or a data-zone deployment. You are paying for control, not for compliance.

And when both axes are high — the upper-right quadrant — you are in sovereign AI territory. Colocated inference, private interconnect, everything inside the boundary. That is the most expensive placement on the board, and it is also the only one that survives contact with regulators, boards, and sophisticated customers. Paying less here is an illusion. The bill is still real; it just gets paid somewhere other than in tokens.

The inefficient quadrant is the bottom-right: heavy or regulated data, routed through public APIs because the tokens were cheaper. I have walked into that cell in more enterprises than I would like to admit. It is always the same shape. Low sticker price. Unpriced risk. And a CTO, somewhere down the hall, about to be surprised.

The architecture implication: locality becomes a first-class routing dimension

The CEO's Guide made the case that token policy belongs in an AI control plane. This post is the refinement: that control plane only works when locality is a first-class dimension inside it.

Concretely, five things have to change in the architecture the moment you accept the 93% reality.

One — the token policy engine has to know residency. Before a request can be priced, it has to be classified. Before it can be routed, it has to be locked to a region compatible with its classification. That is not a network concern. It is an admission-control concern, and it belongs alongside the cost counter.
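
A minimal sketch of that admission check, with classifications and region sets assumed for illustration:

```python
# Admission-control sketch: classify first, lock region second, price third.
# Classifications, regions, and the compatibility map are all assumptions.

ALLOWED_REGIONS = {
    "public":    {"*"},                    # any region
    "internal":  {"us-east", "eu-west"},   # company-approved regions
    "regulated": {"eu-west"},              # the region the promise was made in
}

def admit(classification: str, requested_region: str) -> str:
    """Reject before routing if the region is incompatible with the data class."""
    allowed = ALLOWED_REGIONS[classification]
    if "*" in allowed or requested_region in allowed:
        return requested_region
    raise PermissionError(
        f"{classification!r} data may not be processed in {requested_region!r}"
    )
```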

Two — the data plane has to expose data gravity. Catalog, classification, lineage, and physical location are all inputs to the routing decision. The data plane's job is to tell the control plane, for every asset, where it lives and where it is allowed to travel. Anything else is vibes.

Three — the router has to treat locality as a budget. Not a flag. A budget. The router's job is to pick the cheapest acceptable route, and "acceptable" now includes residency. Utility-tier public call, regional call, private colocated call — each has a cost, each has a locality profile, each gets weighed. The routing log records the choice. The showback report attributes the cost.
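
Sketching that in code, with an assumed lane table: the router enumerates lanes, filters on the locality profile the admission step produced, and only then optimizes on cost:

```python
# Router sketch: locality is a budget, not a flag. The lane table is assumed;
# the shape (filter on residency first, then minimize cost) is the point.

LANES = [
    {"name": "public-global", "cost": 1.00, "region": "us-east", "sovereign": False},
    {"name": "regional",      "cost": 1.10, "region": "eu-west", "sovereign": False},
    {"name": "colocated",     "cost": 1.40, "region": "eu-west", "sovereign": True},
]

def route(required_region: str | None, requires_sovereign: bool) -> dict:
    """Pick the cheapest acceptable lane, where 'acceptable' includes residency."""
    acceptable = [
        lane for lane in LANES
        if (required_region is None or lane["region"] == required_region)
        and (not requires_sovereign or lane["sovereign"])
    ]
    if not acceptable:
        raise LookupError("no lane satisfies the locality constraints")
    return min(acceptable, key=lambda lane: lane["cost"])

# A regulated, sovereignty-bound request lands on the colocated lane:
print(route("eu-west", requires_sovereign=True)["name"])   # colocated
```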

Four — the audit ledger has to record placement. Every request gets a run ID. Every run ID carries the model, the lane, the region, and the policy it ran under. When a regulator asks the locality question, the answer is in the ledger, not in a six-week reconstruction exercise.
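
And a sketch of the ledger entry, with assumed field names, showing how little it takes for the locality question to become a lookup rather than an excavation:

```python
# Audit-ledger sketch: every run ID carries enough to answer the locality
# question without a reconstruction exercise. Field names are assumptions.

from dataclasses import dataclass, asdict
import json, time, uuid

@dataclass
class RunRecord:
    run_id: str
    model: str
    lane: str
    region: str
    policy: str
    ts: float

def record(model: str, lane: str, region: str, policy: str) -> RunRecord:
    rec = RunRecord(str(uuid.uuid4()), model, lane, region, policy, time.time())
    with open("placement_ledger.jsonl", "a") as f:   # append-only ledger file
        f.write(json.dumps(asdict(rec)) + "\n")
    return rec
```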

Five — the executive dashboard has to show the placement mix. Percentage of volume on public, regional, private, sovereign lanes. Sovereignty hit rate by workload. Egress-avoided dollars. Regional-premium paid. These are not engineering metrics. They are the numbers that tell the board whether the organization is actually operating AI or only buying it.

Where does inference actually run?

Enterprise data scatters across sources. Metro interconnect fabric pulls them together. Inference can run far away in a public region or next door in the same metro — the decision is no longer only about model choice.

[Topology diagram, three layers: data, metro fabric, inference. Data sources: ERP / data warehouse, factory / edge, SaaS + business apps, regulated stores, branch / field, legacy systems. The metro fabric in the middle is private interconnect spanning 55+ metros and 300+ data centers (Metro A, Metro B, Metro C). Inference options: batch / flex (asynchronous, lowest price); sovereign / private GPU (colocated in the metro, next to the data); public model APIs (distant region, cheapest tokens). When inference lands in the metro, next to the data, egress drops to zero and residency becomes trivial.]

The metro fabric is not a detail. It is the single point of control where enterprises can trade public-cloud tokens for colocated sovereign inference on a per-workload, per-policy basis. That is where token economics and data gravity meet.

The diagram above is the picture I draw for executives when words are not enough. The metro fabric in the middle is the trick. It is the place where the 93% of enterprise data that lives outside the public cloud meets the models. It is where interconnect density — private connectivity to model providers, private links between enterprise data and inference, low-latency paths between metros — stops being plumbing and starts being an economic instrument.

When inference lives in the same metro as the regulated store, the egress bill goes to zero, the retrieval round-trip goes to single-digit milliseconds, and the residency story writes itself. When inference lives in a distant public region, the same workload pays, pays again, and then pays once more. The total-cost delta can be five- or ten-to-one when you look at it across a full quarter, not a single call.

Both inference options appear in the diagram because the two patterns coexist inside most enterprises right now. The mature operating model is not to pick one. It is to make the choice per workload, with policy, in the control plane — and to measure the shift over time.

What a global data-center platform uniquely enables

I want to spend a careful paragraph here, because it is the place where my day job and this argument intersect most directly.

The reason I care about this topic — the reason I will write about it more — is that I sit inside Digital Realty, operating a platform with more than 300 data centers across 55+ metros on six continents. That vantage point has changed how I think about AI. From inside it, you can see something that is hard to see from the outside: every enterprise AI workload is, eventually, a placement decision. Which means the infrastructure platforms that can let customers run inference in the same metro as the data — with AI-ready high-density environments, private connectivity to the major model providers, and governed sovereign placement options — are not only selling capacity. They are selling a place where token economics and data gravity are allowed to be the same conversation. That is a different kind of product. I'll keep this paragraph short, and the rest of this post platform-neutral, because the argument has to hold whether or not you're one of our customers. But the next time you see a slide that treats AI infrastructure as "space, power, cooling", look for the missing line: the one about where the intelligence is allowed to run.

Everything else in this post generalizes. The matrix, the quadrant, the topology, the five architecture implications. They do not depend on any single vendor. They depend only on taking the 93% seriously.

What belongs on the next architecture slide

Replace the slide that shows a rainbow of AI vendors and a cost-per-token bar chart. Replace it with five lines the board can actually govern against.

  1. Sovereignty hit rate. The share of regulated or sensitive workloads running in the correct residency zone, by policy. This is the single best measure of whether placement is a routing decision or a convention.
  2. Locality-aware retrieval share. The percentage of retrieval requests served from a store colocated with the inference lane. A proxy for whether your data plane and your compute plane are actually talking to each other.
  3. Egress-avoided dollars. The measurable delta between what the workloads would have cost under a cross-region default and what they actually cost under the placement policy. The CFO's favorite line.
  4. Direct allocation of sovereign premium. What share of the sovereignty premium is owned by a named business domain, rather than hidden in a shared infrastructure cost center. Makes the trade-off legible.
  5. Cache leverage on private lanes. The percentage of repeated context served from cache inside private and sovereign deployments. Because the easiest place to let economics slip is in the private lanes, where nobody is watching as closely.

Five lines. Every one of them is measurable today with the primitives the major platforms already expose. Every one of them is a question a CEO can ask a CIO, and a CIO can ask a head of AI, without anyone having to leave the room to do research.
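
For a sense of how little machinery the first line takes, here is a sketch that computes it from a placement ledger like the one above; the field names are assumptions:

```python
# Sovereignty hit rate from the placement ledger: the share of sovereignty-bound
# workloads that ran in their required residency zone. Field names assumed.

import json

def sovereignty_hit_rate(ledger_path: str, required: dict[str, str]) -> float:
    """required maps policy name -> the residency zone that policy promises."""
    hits = total = 0
    with open(ledger_path) as f:
        for line in f:
            rec = json.loads(line)
            zone = required.get(rec["policy"])
            if zone is None:
                continue                     # not a sovereignty-bound policy
            total += 1
            hits += (rec["region"] == zone)
    return hits / total if total else 1.0
```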

The leadership move

The CEO's Guide argued that AI will be won by the organizations that know what a verified outcome costs. This companion argues the narrower, sharper version: AI will be won by the organizations that know where those outcomes are produced, and are willing to manage the geography of intelligence the way a previous generation learned to manage the geography of data.

Data gravity is not a constraint to engineer around. It is a map. It tells you where your enterprise's mass is, where it will keep being created, and where — if you are honest about the ninety-three percent — the AI that serves that enterprise is actually going to have to live.

The leaders I watch most closely right now are the ones whose placement policy is on one slide and whose routing dashboard is one click away. They have stopped asking which model should we use and started asking where should we run it, against which data, for which outcome, under which policy, and at what premium. That is a different quality of conversation. It is also where the durable economics of enterprise AI turn out to live.

The cheapest token on earth is a rounding error if the workload that consumes it is in the wrong place. The most expensive token in the sovereign lane is a bargain if it keeps a regulated workload inside the boundary it was promised.

Token economics tells you how much intelligence costs. Data gravity tells you where it belongs. Neither of those conversations is finished until both are on the same slide.


This post is the companion to The CEO's Guide to Token Economics. The technical machinery underneath lives in the three-part Token Economy series. For the complementary argument about sovereign architecture patterns — the technical side of the same operating choice — read Private AI: The Next Step in Enterprise Intelligence. The three posts meet in the control plane, and they meet in the metro.