TL;DR
- PAR operates at three scales: micro (seconds), meso (minutes to hours), macro (days to weeks)
- World models provide consequence—the missing half that language models lack
- Agent economies emerge when identity + reputation + capabilities create markets
- The convergence: Society + loops + grounding = compounding competence
In Part 1, I showed how agent societies enable distributed cognition.
In Part 2, I explained why most will fail—and what a Reinforced Learning Environment needs to compound instead of collapse.
Now: the architecture that makes it work.
The Control Problem at Scale
Most people treat agent systems like they're chatbots with tools. But once you run multi-agent workflows, you discover the real challenge is control:
- When to plan
- When to act
- When to critique
- When to stop
- When to escalate
- What to store as memory
- What to promote into "trusted artifacts"
This is where the Plan-Act-Reflect (PAR) pattern becomes essential. I've written about PAR in the context of self-learning ETL pipelines. Here, I want to show how PAR operates at three distinct time scales in agent societies.
PAR at Three Scales
[Diagram: Plan-Act-Reflect (PAR) Loop: self-learning ETL that improves over time. Plan: analyze the source and determine a strategy. Act: execute the transformation and capture results. Reflect: evaluate results and learn for next time.]
Micro-PAR (Seconds): Response Integrity
At the finest grain, PAR ensures each response is sound:
Plan: Structure the claim. What am I asserting? What evidence do I have?
Act: Draft the response. Execute any necessary tool calls.
Reflect: Critic checks grounding, novelty, safety. Does this contradict known facts? Is it actually new? Could it cause harm?
Micro-PAR happens within a single turn. It's the quality gate before anything gets published.
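To make this concrete, here is a minimal sketch of a micro-PAR gate in Python. The interfaces (llm.complete, critic.review, the Verdict fields) are hypothetical stand-ins, not any specific framework's API:

```python
# Minimal sketch of a micro-PAR gate for a single turn.
# All names (llm.complete, critic.review, Verdict) are illustrative, not a real API.
from dataclasses import dataclass


@dataclass
class Verdict:
    grounded: bool   # consistent with known facts?
    novel: bool      # adds something beyond what's already published?
    safe: bool       # no harmful content?

    @property
    def passed(self) -> bool:
        return self.grounded and self.novel and self.safe


def micro_par(task: str, llm, critic, max_attempts: int = 3) -> str | None:
    for _ in range(max_attempts):
        plan = llm.complete(f"State the claim and evidence for: {task}")        # Plan
        draft = llm.complete(f"Draft a response following this plan:\n{plan}")  # Act
        verdict: Verdict = critic.review(draft)                                 # Reflect
        if verdict.passed:
            return draft   # publish only what survives the gate
    return None            # escalate or drop instead of publishing
```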
Meso-PAR (Minutes to Hours): Task Completion
At the task level, PAR coordinates multiple agents:
Plan: Decompose the work. Assign roles (builder, critic, curator). Define success criteria.
Act: Execute tools, generate intermediate artifacts, run evaluations.
Reflect: Does the output pass the eval harness? If not, iterate. Route failures to specialists.
Meso-PAR is where most "agentic" work happens today. It's what makes multi-agent systems more capable than single-agent ones.
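A sketch of what a meso-PAR task loop might look like, assuming hypothetical builder, critic, curator, and eval-harness interfaces:

```python
# Sketch of a meso-PAR task loop: decompose, execute, evaluate, iterate.
# The roles (builder, critic, curator) and the harness are assumed interfaces.
def meso_par(task, builder, critic, curator, harness, max_rounds: int = 5):
    spec = builder.plan(task)                       # Plan: decompose + define success criteria
    for _ in range(max_rounds):
        artifact = builder.act(spec)                # Act: run tools, produce intermediate artifacts
        report = harness.evaluate(artifact, spec)   # Reflect: objective checks, not vibes
        if report.passed:
            curator.promote(artifact, report)       # becomes a trusted artifact
            return artifact
        # Route failures to a specialist and fold the feedback into the next round.
        spec = critic.revise(spec, report.failures)
    return None                                     # escalate after repeated failures
```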
Macro-PAR (Days to Weeks): System Evolution
At the system level, PAR drives improvement:
Plan: What capabilities need improvement? Where are the gaps? What traces should we collect?
Act: Aggregate verified outcomes. Update routing policies. A/B test new approaches.
Reflect: Did the changes improve quality metrics? Update the gold set. Redeploy improved agents.
Macro-PAR is what turns an agent system into a learning system.
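One possible shape for a macro-PAR cycle, run on a cadence rather than per request; the stores and methods (outcome_store, router, gold_set, metrics) are hypothetical:

```python
# Sketch of a macro-PAR cycle run on a cadence (e.g. weekly), not per request.
# outcome_store, router, gold_set, and metrics are assumed components.
def macro_par(outcome_store, router, gold_set, metrics):
    # Plan: find the weakest capabilities from verified outcomes.
    gaps = outcome_store.lowest_scoring_capabilities(window_days=7)

    # Act: shift routing toward agents with verified wins on those capabilities,
    # staging each change as an A/B test.
    experiments = [router.stage_reweight(gap) for gap in gaps]

    # Reflect: keep only changes that improved quality metrics; fold the
    # verified wins into the gold set so future evals get harder.
    for exp in experiments:
        if metrics.improved(exp):
            router.commit(exp)
            gold_set.add(exp.verified_examples)
        else:
            router.rollback(exp)
```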
This is the key move: the same pattern operates fractally. Within a response, within a task, within the system itself. The discipline scales.
World Models: The Missing Half
Language models are great at heuristics and explanation. But reliability comes from consequence.
LLMs can tell you what might happen. World models can tell you what will happen—by simulating it.
This is the "world model" framing from David Ha and Jürgen Schmidhuber: models that can run counterfactuals, simulate outcomes, and compress experience into reusable abstractions.
A practical hybrid stack looks like:
| Component | What It Does |
|---|---|
| LLM | Language + heuristic reasoning |
| World Model / Simulator / Solver | Consequence + constraints |
| Memory | Episodic + semantic + graph |
| Policy / Planner | Action selection |
| Verification | Truth signal |
Language models propose. World models test. Memory stores. Policy selects. Verification confirms.
That stack is how you go from emergent behavior to emergent competence.
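A rough sketch of that division of labor, with every component (llm, world_model, memory, policy, verifier) as an assumed interface:

```python
# Sketch of the propose/test/select/verify loop over the hybrid stack.
# llm, world_model, memory, policy, and verifier are assumed components.
def solve(goal, llm, world_model, memory, policy, verifier, n_candidates: int = 4):
    context = memory.recall(goal)                                           # episodic + semantic priors
    candidates = [llm.propose(goal, context) for _ in range(n_candidates)]  # language model proposes
    simulated = [(c, world_model.simulate(c)) for c in candidates]          # world model tests consequences
    best = policy.select(simulated)                                         # policy picks by predicted outcome
    result = verifier.check(best)                                           # verification confirms against ground truth
    memory.store(goal, best, result)                                        # only verified outcomes feed learning
    return best if result.verified else None
```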
Why This Matters for Agent Societies
In a single-agent system, the world model might be a code executor, a physics simulator, or a constraint solver.
In an agent society, the society itself becomes part of the world model. Other agents provide:
- Priors (what's been tried before)
- Constraints (what the community accepts)
- Feedback (how proposals are received)
- Selection (what survives)
The society generates context. The world model generates consequences. Verification generates truth.
Agent Economies: Identity + Reputation + Capability Markets
Once you have stable identity, accumulated reputation, and verified outcomes, a new kind of market appears.
[Diagram: Agent Economy Flow: identity, reputation, and capability markets create a flywheel. Specialists compete on verified outcomes; task routing sends workflows to proven performers; artifact production turns verified outputs into building blocks; reputation updates feed performance back into routing. Reputation feeds back → better routing → higher quality → more reputation. Agent roster: code generation, review & validation, knowledge synthesis, data analysis.]
What Agent Economies Look Like
Specialists compete on verified outcomes. Not "who sounds most confident," but "who has the track record for this kind of problem."
Workflows route tasks to proven performers. High-reputation agents get more interesting work. Low-reputation agents get filtered out or assigned to lower-stakes tasks.
Artifacts become reusable building blocks. A verified solution doesn't just solve one problem—it becomes a template that other agents can apply.
Reputation becomes portable capital. An agent's track record on Platform A can (in principle) transfer to Platform B. Reputation becomes currency.
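Here's one way reputation-weighted routing could look; the data shapes and update rule are illustrative, not a reference implementation:

```python
# Sketch of reputation-weighted routing: tasks flow to agents with verified
# track records for that capability. All data shapes are illustrative.
from collections import defaultdict


class ReputationRouter:
    def __init__(self, min_reputation: float = 0.2):
        # reputation[agent][capability] -> running score built from verified outcomes only
        self.reputation = defaultdict(lambda: defaultdict(lambda: 0.5))
        self.min_reputation = min_reputation

    def route(self, capability: str, agents: list[str]) -> str:
        eligible = [a for a in agents
                    if self.reputation[a][capability] >= self.min_reputation]
        if not eligible:
            eligible = agents  # nobody qualifies: fall back rather than stall
        # Proven performers get the work; low-reputation agents are filtered out.
        return max(eligible, key=lambda a: self.reputation[a][capability])

    def record_outcome(self, agent: str, capability: str, verified_success: bool):
        # Reputation only moves on verified outcomes, never on confident-sounding output.
        score = self.reputation[agent][capability]
        self.reputation[agent][capability] = 0.9 * score + 0.1 * (1.0 if verified_success else 0.0)
```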
This is where the "2.0 institutions" from Part 1 show up:
- Beauty Routine Lab 2.0
- Health Triage Commons 2.0
- Manufacturing Debug Bay 2.0
- Eval Harness Exchange 2.0
Each of these is a market where specialists compete, artifacts accumulate, and verification determines value.
The Enterprise Shift: Model Risk → Ecosystem Risk
Enterprises today talk about "model risk." They ask: what if the LLM hallucinates? What if it leaks data? What if it produces biased outputs?
Agent societies introduce a new category: ecosystem risk.
New Risk Vectors
Prompt injection becomes social. In a society, malicious agents can influence other agents. The attack surface isn't just the prompt—it's the entire interaction graph.
Institutions can drift toward cohesion over truth. Groups can reinforce shared beliefs even when those beliefs are wrong. Echo chambers aren't just a human problem.
Incentives can select for "looks right." If verification is weak, reputation can accumulate for plausible-sounding nonsense. The system optimizes for appearance, not substance.
Memory can calcify wrong norms into policy. If early interactions establish bad patterns, memory can institutionalize those patterns. Technical debt becomes cultural debt.
The Implication
Governance must live at the environment level—not just the model level.
This means:
- Identity controls (who can participate)
- Incentive design (what gets rewarded)
- Verification infrastructure (how truth is established)
- Selection mechanisms (what persists)
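As a sketch, environment-level governance can be declared as explicit policy rather than buried in prompts. Every field name here is illustrative:

```python
# Sketch of governance declared at the environment level, not the model level.
# The field names are illustrative; the point is that these are environment policies.
from dataclasses import dataclass, field


@dataclass
class EnvironmentGovernance:
    # Identity controls: who can participate
    allowed_issuers: list[str] = field(default_factory=lambda: ["org-ca"])
    require_signed_identity: bool = True

    # Incentive design: what gets rewarded
    reward_on: str = "verified_outcome"       # never "plausibility" or raw engagement
    reputation_decay_per_week: float = 0.02   # stale track records lose weight

    # Verification infrastructure: how truth is established
    eval_harness: str = "gold-set-v3"
    min_verifications_to_promote: int = 2     # independent checks before promotion

    # Selection mechanisms: what persists
    artifact_ttl_days_unverified: int = 7     # unverified outputs expire
    promote_only_verified: bool = True
```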
If you've read my piece on RaaS architecture, this connects directly: the control plane that makes outcomes verifiable for one agent scales to make outcomes verifiable for societies of agents.
The Convergence Story
Put it all together:
- Societies generate priors
- World models generate consequences
- Verification turns consequences into truth
- PAR loops turn truth into improvement
That's the convergence:
Society + loops + grounding = compounding competence.
The societies that get this right will accelerate. The ones that don't will produce increasingly sophisticated noise.
What This Means for Builders
If you're building in this space, here's what matters:
Start with verification
Don't build the agent first. Build the verification infrastructure first. What counts as success? How will you measure it? What enters the gold set?
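A minimal sketch of what "verification first" can mean in practice: the gold set and harness exist before any agent does. The class and field names are illustrative:

```python
# Sketch of verification-first development: define success before building the agent.
# GoldExample and the harness interface are illustrative, not a specific library.
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldExample:
    input: str
    check: Callable[[str], bool]   # executable definition of "correct" for this case


class EvalHarness:
    def __init__(self, gold_set: list[GoldExample], pass_threshold: float = 0.9):
        self.gold_set = gold_set
        self.pass_threshold = pass_threshold

    def evaluate(self, agent) -> bool:
        # Any agent that clears the threshold on the gold set can be promoted.
        passed = sum(ex.check(agent.run(ex.input)) for ex in self.gold_set)
        return passed / len(self.gold_set) >= self.pass_threshold
```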
Design for selection
What persists? What gets filtered? If everything persists, noise accumulates. If nothing persists, there's no learning.
Make reputation real
Reputation should route tasks. High performers should get more opportunity. Low performers should get less. If reputation is cosmetic, it doesn't select for quality.
Think in loops
Micro-PAR, meso-PAR, macro-PAR. Each scale needs its own feedback loop. The absence of feedback at any scale creates blind spots.
Plan for ecosystem risk
Model risk is necessary but not sufficient. What happens when agents influence each other? What happens when norms drift? What happens when memory calcifies bad patterns?
The Punchline
The question isn't "How smart is the model?"
The question is "How smart is the system the model participates in?"
The architecture that wins is the one that turns emergent behavior into emergent competence. That requires verification, selection, and feedback at every scale.
In Part 4, I'll show you 12 specific agent-native institutions that are ready to be built—the "2.0" opportunities that nobody's talking about yet.