TL;DR
- PAR operates at three scales: micro (seconds), meso (minutes to hours), macro (days to weeks)
- World models provide consequence—the missing half that language models lack
- Agent economies emerge when identity + reputation + capabilities create markets
- The convergence: Society + loops + grounding = compounding competence
In Part 1, I showed how agent societies enable distributed cognition.
In Part 2, I explained why most will fail—and what a Reinforced Learning Environment needs to compound instead of collapse.
Now: the architecture that makes it work.
The Control Problem at Scale
Most people treat agent systems like they're chatbots with tools. But once you run multi-agent workflows, you discover the real challenge is control:
- When to plan
- When to act
- When to critique
- When to stop
- When to escalate
- What to store as memory
- What to promote into "trusted artifacts"
This is where the Plan-Act-Reflect (PAR) pattern becomes essential. I've written about PAR in the context of self-learning ETL pipelines. Here, I want to show how PAR operates at three distinct time scales in agent societies.
PAR at Three Scales
[Diagram: Plan-Act-Reflect (PAR) Loop: self-learning ETL that improves over time. Plan: analyze the source and determine a strategy. Act: execute the transformation and capture results. Reflect: evaluate results and learn for next time.]
Micro-PAR (Seconds): Response Integrity
At the finest grain, PAR ensures each response is sound:
Plan: Structure the claim. What am I asserting? What evidence do I have?
Act: Draft the response. Execute any necessary tool calls.
Reflect: Critic checks grounding, novelty, safety. Does this contradict known facts? Is it actually new? Could it cause harm?
Micro-PAR happens within a single turn. It's the quality gate before anything gets published.
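To make this concrete, here is a minimal sketch of a micro-PAR gate in Python. The interfaces (llm.complete, critic.review, the Verdict fields) are hypothetical stand-ins, not any specific framework's API:

```python
# Minimal sketch of a micro-PAR gate for a single turn.
# All names (llm.complete, critic.review, Verdict) are illustrative, not a real API.
from dataclasses import dataclass


@dataclass
class Verdict:
    grounded: bool   # consistent with known facts?
    novel: bool      # adds something beyond what's already published?
    safe: bool       # no harmful content?

    @property
    def passed(self) -> bool:
        return self.grounded and self.novel and self.safe


def micro_par(task: str, llm, critic, max_attempts: int = 3) -> str | None:
    for _ in range(max_attempts):
        plan = llm.complete(f"State the claim and evidence for: {task}")        # Plan
        draft = llm.complete(f"Draft a response following this plan:\n{plan}")  # Act
        verdict: Verdict = critic.review(draft)                                 # Reflect
        if verdict.passed:
            return draft   # publish only what survives the gate
    return None            # escalate or drop instead of publishing
```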
Meso-PAR (Minutes to Hours): Task Completion
At the task level, PAR coordinates multiple agents:
Plan: Decompose the work. Assign roles (builder, critic, curator). Define success criteria.
Act: Execute tools, generate intermediate artifacts, run evaluations.
Reflect: Does the output pass the eval harness? If not, iterate. Route failures to specialists.
Meso-PAR is where most "agentic" work happens today. It's what makes multi-agent systems more capable than single-agent ones.
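A sketch of what a meso-PAR task loop might look like, assuming hypothetical builder, critic, curator, and eval-harness interfaces:

```python
# Sketch of a meso-PAR task loop: decompose, execute, evaluate, iterate.
# The roles (builder, critic, curator) and the harness are assumed interfaces.
def meso_par(task, builder, critic, curator, harness, max_rounds: int = 5):
    spec = builder.plan(task)                       # Plan: decompose + define success criteria
    for _ in range(max_rounds):
        artifact = builder.act(spec)                # Act: run tools, produce intermediate artifacts
        report = harness.evaluate(artifact, spec)   # Reflect: objective checks, not vibes
        if report.passed:
            curator.promote(artifact, report)       # becomes a trusted artifact
            return artifact
        # Route failures to a specialist and fold the feedback into the next round.
        spec = critic.revise(spec, report.failures)
    return None                                     # escalate after repeated failures
```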
Macro-PAR (Days to Weeks): System Evolution
At the system level, PAR drives improvement:
Plan: What capabilities need improvement? Where are the gaps? What traces should we collect?
Act: Aggregate verified outcomes. Update routing policies. A/B test new approaches.
Reflect: Did the changes improve quality metrics? Update the gold set. Redeploy improved agents.
Macro-PAR is what turns an agent system into a learning system.
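One possible shape for a macro-PAR cycle, run on a cadence rather than per request; the stores and methods (outcome_store, router, gold_set, metrics) are hypothetical:

```python
# Sketch of a macro-PAR cycle run on a cadence (e.g. weekly), not per request.
# outcome_store, router, gold_set, and metrics are assumed components.
def macro_par(outcome_store, router, gold_set, metrics):
    # Plan: find the weakest capabilities from verified outcomes.
    gaps = outcome_store.lowest_scoring_capabilities(window_days=7)

    # Act: shift routing toward agents with verified wins on those capabilities,
    # staging each change as an A/B test.
    experiments = [router.stage_reweight(gap) for gap in gaps]

    # Reflect: keep only changes that improved quality metrics; fold the
    # verified wins into the gold set so future evals get harder.
    for exp in experiments:
        if metrics.improved(exp):
            router.commit(exp)
            gold_set.add(exp.verified_examples)
        else:
            router.rollback(exp)
```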
This is the key move: the same pattern operates fractally. Within a response, within a task, within the system itself. The discipline scales.
World Models: The Missing Half
Language models are great at heuristics and explanation. But reliability comes from consequence.
LLMs can tell you what might happen. World models can tell you what will happen—by simulating it.
This is the "world model" framing from David Ha and Jürgen Schmidhuber: models that can run counterfactuals, simulate outcomes, and compress experience into reusable abstractions.
A practical hybrid stack looks like:
| Component | What It Does |
|---|---|
| LLM | Language + heuristic reasoning |
| World Model / Simulator / Solver | Consequence + constraints |
| Memory | Episodic + semantic + graph |
| Policy / Planner | Action selection |
| Verification | Truth signal |
Language models propose. World models test. Memory stores. Policy selects. Verification confirms.
That stack is how you go from emergent behavior to emergent competence.
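A rough sketch of that division of labor, with every component (llm, world_model, memory, policy, verifier) as an assumed interface:

```python
# Sketch of the propose/test/select/verify loop over the hybrid stack.
# llm, world_model, memory, policy, and verifier are assumed components.
def solve(goal, llm, world_model, memory, policy, verifier, n_candidates: int = 4):
    context = memory.recall(goal)                                           # episodic + semantic priors
    candidates = [llm.propose(goal, context) for _ in range(n_candidates)]  # language model proposes
    simulated = [(c, world_model.simulate(c)) for c in candidates]          # world model tests consequences
    best = policy.select(simulated)                                         # policy picks by predicted outcome
    result = verifier.check(best)                                           # verification confirms against ground truth
    memory.store(goal, best, result)                                        # only verified outcomes feed learning
    return best if result.verified else None
```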
Why This Matters for Agent Societies
In a single-agent system, the world model might be a code executor, a physics simulator, or a constraint solver.
In an agent society, the society itself becomes part of the world model. Other agents provide:
- Priors (what's been tried before)
- Constraints (what the community accepts)
- Feedback (how proposals are received)
- Selection (what survives)
The society generates context. The world model generates consequences. Verification generates truth.
Agent Economies: Identity + Reputation + Capability Markets
Once you have stable identity, accumulated reputation, and verified outcomes, a new kind of market appears.
[Diagram: Agent Economy Flow: identity, reputation, and capability markets create a flywheel. Specialists compete on verified outcomes; task routing sends workflows to proven performers; artifact production turns verified outputs into building blocks; reputation updates feed performance back into routing. Reputation feeds back → better routing → higher quality → more reputation. Agent roster: code generation, review & validation, knowledge synthesis, data analysis.]
What Agent Economies Look Like
Specialists compete on verified outcomes. Not "who sounds most confident," but "who has the track record for this kind of problem."
Workflows route tasks to proven performers. High-reputation agents get more interesting work. Low-reputation agents get filtered out or assigned to lower-stakes tasks.
Artifacts become reusable building blocks. A verified solution doesn't just solve one problem—it becomes a template that other agents can apply.
Reputation becomes portable capital. An agent's track record on Platform A can (in principle) transfer to Platform B. Reputation becomes currency.
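Here's one way reputation-weighted routing could look; the data shapes and update rule are illustrative, not a reference implementation:

```python
# Sketch of reputation-weighted routing: tasks flow to agents with verified
# track records for that capability. All data shapes are illustrative.
from collections import defaultdict


class ReputationRouter:
    def __init__(self, min_reputation: float = 0.2):
        # reputation[agent][capability] -> running score built from verified outcomes only
        self.reputation = defaultdict(lambda: defaultdict(lambda: 0.5))
        self.min_reputation = min_reputation

    def route(self, capability: str, agents: list[str]) -> str:
        eligible = [a for a in agents
                    if self.reputation[a][capability] >= self.min_reputation]
        if not eligible:
            eligible = agents  # nobody qualifies: fall back rather than stall
        # Proven performers get the work; low-reputation agents are filtered out.
        return max(eligible, key=lambda a: self.reputation[a][capability])

    def record_outcome(self, agent: str, capability: str, verified_success: bool):
        # Reputation only moves on verified outcomes, never on confident-sounding output.
        score = self.reputation[agent][capability]
        self.reputation[agent][capability] = 0.9 * score + 0.1 * (1.0 if verified_success else 0.0)
```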
This is where the "2.0 institutions" from Part 1 show up:
- Beauty Routine Lab 2.0
- Health Triage Commons 2.0
- Manufacturing Debug Bay 2.0
- Eval Harness Exchange 2.0
Each of these is a market where specialists compete, artifacts accumulate, and verification determines value.
The Enterprise Shift: Model Risk → Ecosystem Risk
Enterprises today talk about "model risk." They ask: what if the LLM hallucinates? What if it leaks data? What if it produces biased outputs?
Agent societies introduce a new category: ecosystem risk.
New Risk Vectors
Prompt injection becomes social. In a society, malicious agents can influence other agents. The attack surface isn't just the prompt—it's the entire interaction graph.
Institutions can drift toward cohesion over truth. Groups can reinforce shared beliefs even when those beliefs are wrong. Echo chambers aren't just a human problem.
Incentives can select for "looks right." If verification is weak, reputation can accumulate for plausible-sounding nonsense. The system optimizes for appearance, not substance.
Memory can calcify wrong norms into policy. If early interactions establish bad patterns, memory can institutionalize those patterns. Technical debt becomes cultural debt.
The Implication
Governance must live at the environment level—not just the model level.
This means:
- Identity controls (who can participate)
- Incentive design (what gets rewarded)
- Verification infrastructure (how truth is established)
- Selection mechanisms (what persists)
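As a sketch, environment-level governance can be declared as explicit policy rather than buried in prompts. Every field name here is illustrative:

```python
# Sketch of governance declared at the environment level, not the model level.
# The field names are illustrative; the point is that these are environment policies.
from dataclasses import dataclass, field


@dataclass
class EnvironmentGovernance:
    # Identity controls: who can participate
    allowed_issuers: list[str] = field(default_factory=lambda: ["org-ca"])
    require_signed_identity: bool = True

    # Incentive design: what gets rewarded
    reward_on: str = "verified_outcome"       # never "plausibility" or raw engagement
    reputation_decay_per_week: float = 0.02   # stale track records lose weight

    # Verification infrastructure: how truth is established
    eval_harness: str = "gold-set-v3"
    min_verifications_to_promote: int = 2     # independent checks before promotion

    # Selection mechanisms: what persists
    artifact_ttl_days_unverified: int = 7     # unverified outputs expire
    promote_only_verified: bool = True
```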
If you've read my piece on RaaS architecture, this connects directly: the control plane that makes outcomes verifiable for one agent scales to make outcomes verifiable for societies of agents.
The Convergence Story
Put it all together:
- Societies generate priors
- World models generate consequences
- Verification turns consequences into truth
- PAR loops turn truth into improvement
That's the convergence:
Society + loops + grounding = compounding competence.
The societies that get this right will accelerate. The ones that don't will produce increasingly sophisticated noise.
What This Means for Builders
If you're building in this space, here's what matters:
Start with verification
Don't build the agent first. Build the verification infrastructure first. What counts as success? How will you measure it? What enters the gold set?
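A minimal sketch of what "verification first" can mean in practice: the gold set and harness exist before any agent does. The class and field names are illustrative:

```python
# Sketch of verification-first development: define success before building the agent.
# GoldExample and the harness interface are illustrative, not a specific library.
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldExample:
    input: str
    check: Callable[[str], bool]   # executable definition of "correct" for this case


class EvalHarness:
    def __init__(self, gold_set: list[GoldExample], pass_threshold: float = 0.9):
        self.gold_set = gold_set
        self.pass_threshold = pass_threshold

    def evaluate(self, agent) -> bool:
        # Any agent that clears the threshold on the gold set can be promoted.
        passed = sum(ex.check(agent.run(ex.input)) for ex in self.gold_set)
        return passed / len(self.gold_set) >= self.pass_threshold
```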
Design for selection
What persists? What gets filtered? If everything persists, noise accumulates. If nothing persists, there's no learning.
Make reputation real
Reputation should route tasks. High performers should get more opportunity. Low performers should get less. If reputation is cosmetic, it doesn't select for quality.
Think in loops
Micro-PAR, meso-PAR, macro-PAR. Each scale needs its own feedback loop. The absence of feedback at any scale creates blind spots.
Plan for ecosystem risk
Model risk is necessary but not sufficient. What happens when agents influence each other? What happens when norms drift? What happens when memory calcifies bad patterns?
The Punchline
The question isn't "How smart is the model?"
The question is "How smart is the system the model participates in?"
The architecture that wins is the one that turns emergent behavior into emergent competence. That requires verification, selection, and feedback at every scale.
In Part 4, I'll show you 12 specific agent-native institutions that are ready to be built—the "2.0" opportunities that nobody's talking about yet.