Companion view

The AI Market Reference Architecture.

A living taxonomy for AI market boundaries, leaders, metrics, and model cards.

Cloud had IaaS, PaaS, and SaaS. AI needs a different layer cake: infrastructure, model portfolios, raw data management, AI-ready data, developer tools, agent tools, and commercial tools. The strategic fight is whether SaaS keeps owning the data, or whether enterprises own their data and expose it safely to agents.

The layer cake: from GPUs to commercial tools.

The new boundary is data access. Models matter, but the market structure forms around who owns the data, who makes it AI-ready, who builds with it, which agents can act on it, and which commercial tools survive on top.

L1 Infrastructure

Layer 1 — AI Infrastructure

Physical and cloud substrate for AI: GPUs, accelerators, networking, datacenters, power, storage, and managed compute capacity.

Leaders

Amazon Web Services

Broadcom

Google

Microsoft

Control points

compute / power / networking / cloud_capacity

Watch

Alibaba

AMD

Cloudflare

L2 Models

Layer 2 — Model Portfolio

Foundation, frontier, open-weight, specialist, and routed model portfolios that provide the intelligence primitives consumed by higher layers.

Leaders

Amazon Web Services

Anthropic

Databricks

Google

Control points

weights / benchmarks / token_pricing / model_policy / routing

Watch

Alibaba

DeepSeek AI

HeyGen

L3 Data Mgmt

Layer 3 — Data Management

Raw systems of record, warehouses, lakehouses, operational databases, and document stores that hold enterprise data before it is shaped for AI use.

Leaders

Databricks

Google

Palantir

Salesforce

Control points

source_data / governance / lineage / access_control

Watch

Alibaba

Elastic

Glean

L4 AI Data

Layer 4 — AI-Ready Data

Processed, indexed, embedded, retrievable, policy-wrapped, and API-exposed data that can be consumed directly by models and agents.

Leaders

Amazon Web Services

Databricks

Hugging Face

Pinecone

Control points

embeddings / vector_index / retrieval / api_gateway / context_layer

Watch

Cloudflare

Elastic

Glean

L5 Dev Tools

Layer 5 — Developer Tools

Build tools for humans and agents creating software: IDE copilots, coding agents, CLI agents, context management, orchestration, evals, and tool harnesses.

Leaders

Amazon Web Services

Anthropic

GitHub

Hugging Face

Control points

developer_workflow / context_management / orchestration / tool_harness / evals

Watch

Cloudflare

Cursor

Vercel

L6 Agents

Layer 6 — Agent Tools

Agent runtimes and operational tools that run scheduled, triggered, multi-agent, tool-using, and exception-escalating work.

Leaders

Anthropic

CrowdStrike

GitHub

OpenAI

Control points

tools / memory / permissions / task_execution / human_review

Watch

Cursor

Harvey

Sierra

L7 Commercial

Layer 7 — Commercial Tools

Finished point solutions and vertical AI tools: video, voice, legal, sales, research, support, writing, and other specialist business outcomes.

Leaders

Adobe

Anthropic

CrowdStrike

Google

Control points

workflow_distribution / proprietary_context / seats / vertical_expertise

Watch

Glean

Harvey

HeyGen

Boundary rule

Vendors get one primary layer by market role, plus secondary roles when they span the layer cake. Confidence is explicit so category ambiguity does not masquerade as precision.

Confidence distribution

19 high, 27 medium, 4 low-confidence placements.

Low confidence means useful signal, not settled category leadership.
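
The boundary rule can be sketched as a small placement record: one primary layer, optional secondary layers, and an explicit confidence label. This is a minimal illustration only. The class name, field names, and sample rows are assumptions, not the taxonomy's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Placement:
    """One vendor's position in the layer cake (hypothetical schema)."""
    vendor: str
    primary_layer: int                      # 1..7, exactly one per vendor
    confidence: Confidence
    secondary_layers: tuple[int, ...] = ()  # other layers the vendor spans

    def __post_init__(self) -> None:
        layers = (self.primary_layer, *self.secondary_layers)
        if any(not 1 <= layer <= 7 for layer in layers):
            raise ValueError("layers must be in 1..7")
        if self.primary_layer in self.secondary_layers:
            raise ValueError("primary layer cannot also be secondary")


def confidence_distribution(placements):
    """Count placements per confidence label."""
    return Counter(p.confidence for p in placements)


# Sample rows for illustration only; vendors, layers, and labels are made up.
sample = [
    Placement("Vendor A", 3, Confidence.HIGH, (2, 4)),
    Placement("Vendor B", 4, Confidence.MEDIUM),
    Placement("Vendor C", 6, Confidence.LOW, (5,)),
]
dist = confidence_distribution(sample)
```

Run over the full placement list, the same counter would be expected to reproduce the 19 / 27 / 4 split reported above, which sums to 50 placements.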

Market scorecard

Quantitative signals that can survive a refresh cycle.

Each layer gets metrics that can be revalidated from public sources. When the data is sparse, the chart says so instead of inventing a false leaderboard.

Layer | Signal (period) | Value | Freshness | Read
L1 Infrastructure (3 tracked signals) | Data-center capex growth (Q1 2025) | 53% YoY | high, 90d cadence | AI infrastructure remains capital constrained; capex direction is the first-order signal for who can supply capacity.
L2 Models (3 tracked signals) | SWE-bench performance jump (2024 vs 2023) | 71.7% vs 4.4% | high, 180d cadence | Software-engineering capability moved from toy benchmark to enterprise relevance, changing the model layer's practical value.
L3 Data Mgmt (3 tracked signals) | Organizations using AI (2024 vs 2023) | 78% vs 55% | high, 365d cadence | More organizational AI use increases pressure to expose governed systems of record to agents without copying everything into SaaS silos.
L4 AI Data (3 tracked signals) | Vector and hybrid retrieval availability (current) | available across Pinecone, Weaviate, Milvus, MongoDB, Elastic | high, 90d cadence | Retrieval infrastructure is now broad enough to treat AI-ready data as a separate layer between raw datastores and agent execution.
L5 Dev Tools (3 tracked signals) | Coding tools as leading GenAI application category (2025 estimate) | top enterprise spend category | medium, 365d cadence | Developer tools are the first place where AI changes production work, not only information retrieval.
L6 Agents (4 tracked signals) | Agent startups tracked (March 2025) | 170+ | medium, 180d cadence | Agentic workflows are a distinct market structure, not just another app feature.
L7 Commercial (4 tracked signals) | GenAI application spend (2025 estimate) | $19B+ | medium, 365d cadence | Commercial tools are already a major value-capture zone, but the durable winners need workflow depth that survives model-provider expansion.

Metrics by category

What we track, and how often it decays.

Each category carries a small metric basket: one or two headline indicators plus supporting signals. The point is not to rank everything every day; it is to keep a durable public watchlist with explicit confidence and refresh cadence.
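
A metric with an explicit refresh cadence lends itself to a simple staleness check. The sketch below assumes a hypothetical record shape: the `last_validated` field and its date are invented for illustration, and the page does not say how refreshes are actually tracked.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Metric:
    """One tracked signal (hypothetical schema)."""
    name: str
    value: str
    confidence: str        # "high" | "medium" | "low"
    refresh_days: int      # refresh cadence, e.g. 90 for "refresh 90d"
    last_validated: date   # assumption: when the value was last rechecked

    def is_stale(self, today: date) -> bool:
        """True once more than `refresh_days` have passed since validation."""
        return today - self.last_validated > timedelta(days=self.refresh_days)


capex = Metric(
    name="Data-center capex growth",
    value="53% YoY",
    confidence="high",
    refresh_days=90,
    last_validated=date(2026, 1, 1),  # made-up date for illustration
)
```

Calling `capex.is_stale(date(2026, 5, 1))` flags the metric for revalidation, since 120 days exceed the 90-day window. A stale flag like this is one way to keep sparse or aging data labeled as such rather than silently reused.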

L1 Infrastructure

Layer 1 — AI Infrastructure

3 signals

Data-center capex growth
capital / Q1 2025
53% YoY
high / refresh 90d
AI infrastructure remains capital constrained; capex direction is the first-order signal for who can supply capacity.
High-end accelerated servers
mix_shift / 2025 forecast
> one-third of total data-center capex
medium / refresh 90d
Accelerator-heavy systems are no longer a niche line item; they are a large share of the data-center investment stack.
Data-center capex forecast
capital / 2025 forecast
30% growth
high / refresh 90d
Sustained infrastructure expansion keeps the AI substrate layer strategic, not commodity.

L2 Models

Layer 2 — Model Portfolio

3 signals

SWE-bench performance jump
capability / 2024 vs 2023
71.7%
prior 4.4%
high / refresh 180d
Software-engineering capability moved from toy benchmark to enterprise relevance, changing the model layer's practical value.
Open-weight vs closed-weight performance gap
capability / 2024 vs prior year
1.70%
prior 8.04%
high / refresh 180d
Narrowing gaps increase buyer leverage and make openness a strategic dimension, not only a developer preference.
Live model price, latency, speed, and context comparisons
efficiency / live
available
high / refresh 30d
Model leadership should be benchmark-adjusted by cost and latency, not read from capability leaderboards alone.

L3 Data Mgmt

Layer 3 — Data Management

3 signals

Organizations using AI
adoption / 2024 vs 2023
78%
prior 55%
high / refresh 365d
More organizational AI use increases pressure to expose governed systems of record to agents without copying everything into SaaS silos.
Generative AI in at least one business function
adoption / 2024 vs 2023
71%
prior 33%
high / refresh 365d
Business-function adoption makes source-data access, lineage, and permissions first-class AI architecture concerns.
Enterprise GenAI spend
spend / 2025 estimate
$37B
medium / refresh 365d
Enterprise AI spend eventually lands against data platforms, not only model APIs and point tools.

L4 AI Data

Layer 4 — AI-Ready Data

3 signals

Vector and hybrid retrieval availability
capability / current
available across Pinecone, Weaviate, Milvus, MongoDB, Elastic
high / refresh 90d
Retrieval infrastructure is now broad enough to treat AI-ready data as a separate layer between raw datastores and agent execution.
Model and context gateway pattern
architecture / 2026
emerging
medium / refresh 90d
API gateways, model routers, and context wrappers become the control plane for which agents can consume which data.
Open-model serving demand proxy
adoption / live
downloads and model-card activity
medium / refresh 30d
Open ecosystem activity is a maintainable proxy for AI-ready data and serving demand where vendors do not disclose token volume.
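
The gateway pattern in this layer comes down to a deny-by-default check on which agent may consume which data. The following is a minimal sketch under stated assumptions: `ContextGrant`, `ContextGateway`, and the agent and dataset names are all hypothetical and do not describe any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextGrant:
    """Permission for one agent on one dataset (hypothetical schema)."""
    agent: str
    dataset: str
    actions: frozenset  # e.g. frozenset({"retrieve"})


class ContextGateway:
    """Deny-by-default policy check in front of AI-ready data."""

    def __init__(self, grants):
        self._grants = {(g.agent, g.dataset): g.actions for g in grants}

    def allows(self, agent: str, dataset: str, action: str) -> bool:
        # Anything not explicitly granted is refused.
        return action in self._grants.get((agent, dataset), frozenset())


# Illustrative names only.
gateway = ContextGateway([
    ContextGrant("support-agent", "tickets_index", frozenset({"retrieve"})),
])
```

Here `gateway.allows("support-agent", "tickets_index", "retrieve")` is true, while a write by the same agent, or any request from an unknown agent, is refused. Keeping the grant table on the customer side matches the idea that the data owner, not the tool vendor, decides which agents see which context.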

L5 Dev Tools

Layer 5 — Developer Tools

3 signals

Coding tools as leading GenAI application category
spend / 2025 estimate
top enterprise spend category
medium / refresh 365d
Developer tools are the first place where AI changes production work, not only information retrieval.
Coding agents and CLI tools
workflow / 2026
Cursor / Codex / Claude Code / OpenCode
medium / refresh 90d
Build tools are moving from autocomplete to agent supervision, with context management and tool harnesses becoming the real differentiators.
Context management as build substrate
architecture / 2026
emerging
medium / refresh 90d
The durable build-tool layer may be less about the IDE and more about how context, tools, repo state, and review loops are packaged for agents.

L6 Agents

Layer 6 — Agent Tools

4 signals

Agent startups tracked
market_structure / March 2025
170+
medium / refresh 180d
Agentic workflows are a distinct market structure, not just another app feature.
Agent startup funding
funding / 2024
$3.8B
medium / refresh 365d
Funding growth signals that investors view delegated work as a separate value pool.
Public production-readiness disclosure
disclosure / 2026
uneven
medium / refresh 90d
Human-in-loop controls, tool permissions, and audit logs should carry as much weight as demos.
Autonomous-SWE leader valuation
funding / May 2026
$26B post-money
prior $10.2B
high / refresh 90d
A ~2.5x step-up in eight months on a claimed ~$492M run-rate marks the autonomous software-engineering category as consolidating into a few heavily capitalized players priced on revenue velocity, not promise.
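
The step-up in the last signal can be checked with simple division. The three inputs are the figures quoted above; the valuation-to-run-rate multiple is derived here for illustration and is not a number the source reports.

```python
valuation_now = 26.0e9     # post-money valuation quoted above (USD)
valuation_prior = 10.2e9   # prior valuation quoted above (USD)
run_rate = 492e6           # claimed annualized revenue run-rate (USD)

step_up = valuation_now / valuation_prior
revenue_multiple = valuation_now / run_rate

print(f"step-up: {step_up:.2f}x")                        # step-up: 2.55x
print(f"valuation / run-rate: {revenue_multiple:.1f}x")  # valuation / run-rate: 52.8x
```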

L7 Commercial

Layer 7 — Commercial Tools

4 signals

GenAI application spend
spend / 2025 estimate
$19B+
medium / refresh 365d
Commercial tools are already a major value-capture zone, but the durable winners need workflow depth that survives model-provider expansion.
Microsoft 365 Copilot paid enterprise seats
adoption / April 2026
20M+
high / refresh 90d
Seat adoption is the cleanest public signal for commercial AI distribution through incumbent software suites.
Specialist tool pressure
market_structure / 2026
model providers moving upward
medium / refresh 90d
OpenAI and Anthropic moving into implementation services raises the bar for point solutions: they need proprietary workflow depth, data access, or domain-specific trust.
Agentic CRM ARR (Agentforce)
adoption / FY27 Q1 (May 2026)
~$1.2B (+205% YoY)
high / refresh 90d
The system-of-record incumbent's agentic tier crossed from narrative to a billion-dollar line item, evidence that "system of action" revenue is now pulling core-platform consumption rather than living in a separate SKU.

Monthly position

Leaders, challengers, and direction of travel.

This is the living part of the taxonomy: a monthly read of who is gaining, holding, or losing position inside each category. Scores are editorial indices, not market share, and they should be recalibrated as better public signals arrive.
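
Direction of travel can be derived from a dated index by comparing the latest score with the previous one. This is a sketch only: the function name, the `flat_band` tolerance, and the score history are assumptions, since the page shows current scores and arrows rather than the calculation behind them.

```python
def direction(history: dict[str, int], flat_band: int = 0) -> str:
    """Return an arrow comparing the latest score with the previous one.

    `history` maps "YYYY-MM" to an editorial index score. ISO-style month
    strings sort chronologically, so plain sorting is enough.
    """
    months = sorted(history)
    if len(months) < 2:
        return "→"  # not enough history to show movement
    delta = history[months[-1]] - history[months[-2]]
    if delta > flat_band:
        return "↑"
    if delta < -flat_band:
        return "↓"
    return "→"


# Hypothetical score history; only the month labels mirror the page.
example = {"2026-03": 90, "2026-04": 91, "2026-05": 91, "2026-07": 93}
```

With this history, `direction(example)` returns "↑" because the latest score is above the previous reading. A non-zero `flat_band` would treat small moves as holding position, which may suit an editorial index better than reacting to every one-point change.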

L1 Infrastructure

Layer 1 — AI Infrastructure

2026-03 / 2026-04 / 2026-05 / 2026-07

NVIDIA
leader
96↑
position index
Merchant accelerator leadership remains the infrastructure benchmark; score rises with continued capex pull-through and networking attach.
Microsoft
leader
89↑
position index
Infrastructure position strengthens when Azure capacity and Copilot distribution reinforce each other.
Amazon Web Services
leader
88→
position index
Cloud capacity and procurement breadth keep AWS in the leader band.
AMD
challenger
70↑
position index
Challenger score improves when substitution pressure and cloud availability become more credible.

L2 Models

Layer 2 — Model Portfolio

2026-03 / 2026-04 / 2026-05 / 2026-07

OpenAI
leader
94→
position index
Frontier capability plus product habit keeps OpenAI in the leader band. The July read is gated upside: the GPT-5.6 family (Sol / Terra / Luna) previewed June 26 to only ~20 government-approved partners, so the next capability step exists but is not yet purchasable — release gating is now a live availability variable until GA.
Anthropic
leader
93↑
position index
Strongest July in the layer: Sonnet 5 shipped GA June 30 (1M context, 85.2% SWE-bench Verified, default for Free/Pro) and Fable 5 was restored globally July 1 after the export-control order was withdrawn — putting the top public-board model back on sale, albeit metered to usage credits after July 7.
Google
leader
89→
position index
Google benefits from simultaneous model, cloud, and consumer distribution signals; July shows fast media-tier churn (Veo 2.0/3.0 shut down June 30 as cheap successors shipped) while the Gemini 3.5 Pro GA — the pending closed-frontier catalyst — slipped to July.
Hugging Face
leader
83↑
position index
Open-model distribution and model-card gravity keep Hugging Face in the platform leader band; July's LongCat-2.0 MIT drop (1.6T MoE) and the Leanstral 1.5 / TwoTower releases all landed Hub-first.
DeepSeek AI
challenger
81→
position index
DeepSeek remains the open-weight efficiency challenger forcing price and training-capital discipline; the V4 technical report published this window puts hard numbers on 1M-context economics (~90% KV-cache reduction), and the July 24 legacy-alias shutdown is a live migration deadline for integrators.

L3 Data Mgmt

Layer 3 — Data Management

2026-03 / 2026-04 / 2026-05 / 2026-07

Databricks
leader
88↑
position index
Governed data gravity keeps Databricks central as enterprise AI shifts from experiments to operated systems.
Snowflake
leader
84↑
position index
Snowflake remains a leader when AI starts from governed enterprise data.

L4 AI Data

Layer 4 — AI-Ready Data

2026-03 / 2026-04 / 2026-05 / 2026-07

Pinecone
leader
84↑
position index
Pinecone anchors the vector-store portion of the AI-ready data layer.
Weaviate
challenger
74↑
position index
Weaviate gains when teams want open, hybrid, and self-controlled retrieval infrastructure.
Vercel
challenger
71↑
position index
Runtime position improves as model routing becomes part of application deployment rather than a separate platform decision.
Cloudflare
challenger
69↑
position index
Edge distribution and security posture keep Cloudflare relevant as inference gets closer to users and policies.
Together AI
challenger
68↑
position index
Open-model serving demand keeps Together AI in the runtime challenger set.

L5 Dev Tools

Layer 5 — Developer Tools

2026-03 / 2026-04 / 2026-05 / 2026-07

Anthropic
leader
85↑
position index
Claude Code makes Anthropic one of the most important movers in build tooling, not only model APIs.
GitHub
leader
84↑
position index
GitHub owns a high-frequency execution surface where agentic behavior can become daily workflow rather than demo.
OpenAI
leader
82↑
position index
Codex and implementation services move OpenAI upward from model portfolio into the developer-tools layer.
Cursor
challenger
77↑
position index
Cursor gains as coding remains the cleanest category for visible AI productivity and agentic workflow adoption.
LangChain
specialist
66↑
position index
Orchestration and observability mindshare rises as multi-model systems become more common.

L6 Agents

Layer 6 — Agent Tools

2026-03 / 2026-04 / 2026-05 / 2026-07

Anthropic
leader
87↑
position index
Claude's coding and tool-use posture keeps Anthropic central to early production agentic workflows.
ServiceNow
leader
80↑
position index
ServiceNow is positioned where delegated action meets governed enterprise process.
Hermes
specialist
70↑
position index
Hermes represents owned-agent operations over personal infrastructure and owned data, with humans paged only on exception.
Sierra
challenger
69↑
position index
Sierra is a useful challenger signal for customer-facing production agents.
Zeroclaw
specialist
66↑
position index
Zeroclaw is tracked as a future-stack signal for lightweight autonomous execution over owner-controlled tools.

L7 Commercial

Layer 7 — Commercial Tools

2026-03 / 2026-04 / 2026-05 / 2026-07

Microsoft
leader
92↑
position index
Paid Copilot seats and enterprise distribution make Microsoft the application-layer benchmark.
Google
leader
88↑
position index
Gemini consumer adoption keeps Google in the leader band even when enterprise monetization is harder to isolate.
Salesforce
leader
81↑
position index
Salesforce stays strong where AI rides existing CRM workflow ownership.
Wiz
leader
80↑
position index
Cloud security posture extends naturally into AI posture as models and agents become part of the attack surface.
CrowdStrike
leader
78↑
position index
Security workflow ownership makes CrowdStrike relevant as AI changes detection, response, and governance boundaries.

Shared responsibility

The cloud lesson AI still needs.

Cloud became governable when buyers understood what the provider owned and what the customer still had to own. AI needs the same boundary language for accuracy, data exposure, actions, observability, and outcomes.

L1 Infrastructure

Layer 1 — AI Infrastructure

Provider owns capacity, availability, hardware lifecycle, and physical resilience. Customer owns workload placement, demand forecasting, and utilization risk.

L2 Models

Layer 2 — Model Portfolio

Model provider owns training, model behavior, release cadence, and safety defaults. Customer owns model selection, workload routing, evals, and fallback policy.

L3 Data Mgmt

Layer 3 — Data Management

Data platform owns storage, lineage, access control, and governance surfaces. Customer owns data quality, semantic modeling, and agent-access policy.

L4 AI Data

Layer 4 — AI-Ready Data

AI-ready data provider owns retrieval quality, indexing, wrappers, and access surfaces. Customer owns source-data truth, freshness, permissions, and which agents may consume which context.

L5 Dev Tools

Layer 5 — Developer Tools

Tool provider owns coding UX, context packaging, model/tool invocation, and review affordances. Customer owns repository permissions, acceptance criteria, and production release governance.

L6 Agents

Layer 6 — Agent Tools

Agent tool owns execution loop, tool-use guardrails, memory, logging, and escalation mechanics. Customer owns delegated authority, permissions, rollback, and exception policy.

L7 Commercial

Layer 7 — Commercial Tools

Tool provider owns workflow UX, packaged domain behavior, and product controls. Customer owns data portability, agent access, workflow redesign, and whether point tools become durable or get absorbed by agent systems.

Controls tracked across layers

Accuracy and fitness for task / Data exposure and retention / Action safety and rollback / Logs, traces, evals, and cost visibility / Compliance, residency, and audit evidence / Human review and escalation / Business outcome accountability

Top model cards

Capability is only one column.

The cards link model lineage to market fit: capability signature, operating envelope, drift watch, and trust surface. They are designed to sit on top of the LLM Evolutionary Tree rather than replace it.
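
Because each card scores the same seven axes, a workload can be matched to cards with a weighted average. The sketch below copies the 1-to-5 ratings from the GPT-5.5 and DeepSeek-R1 cards in this section; the `ModelCard` class, the `fit` function, and the weight profile are hypothetical and are not the taxonomy's own scoring method.

```python
from dataclasses import dataclass

AXES = ("reasoning", "coding", "multi", "tools", "context", "cost", "open")


@dataclass(frozen=True)
class ModelCard:
    """Seven-axis capability signature (hypothetical schema)."""
    name: str
    scores: dict  # axis name -> rating from 1 to 5


def fit(card: ModelCard, weights: dict) -> float:
    """Weighted average of the card's ratings for one workload profile."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(card.scores[axis] * w for axis, w in weights.items()) / total


gpt_55 = ModelCard("GPT-5.5", dict(zip(AXES, (5, 5, 5, 5, 4, 3, 1))))
deepseek_r1 = ModelCard("DeepSeek-R1", dict(zip(AXES, (5, 4, 1, 3, 4, 5, 4))))

# Made-up priorities for a self-hosted, cost-sensitive reasoning workload.
self_hosted = {"reasoning": 3, "cost": 2, "open": 2}

ranked = sorted((gpt_55, deepseek_r1), key=lambda c: fit(c, self_hosted), reverse=True)
```

Under this profile DeepSeek-R1 ranks first (33/7 against 23/7), while a profile weighting multimodal and tool use would favor GPT-5.5. The point is that the ranking depends on the workload's weights, which is why capability is only one column.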

reasoning

GPT-5.5

OpenAI / Frontier closed reasoning default

Reasoning 5/5, Coding 5/5, Multi 5/5, Tools 5/5, Context 4/5, Cost 3/5, Open 1/5

Best fit

High-stakes reasoning, agentic coding, multimodal work, and workloads where frontier quality outweighs portability.

Trust surface

Strong enterprise posture through hosted controls; portability and inspectability remain closed-model constraints.

reasoning

Claude Opus 4.8

Anthropic / Frontier enterprise reasoning and coding model

Reasoning 5/5, Coding 5/5, Multi 4/5, Tools 5/5, Context 4/5, Cost 3/5, Open 1/5

Best fit

Long-form reasoning, agentic coding, tool-use workflows, and enterprise contexts that weight safety and trust posture heavily; the >10-point SWE-Bench Pro lead favors multi-file coding agents.

Trust surface

Strong trust-center posture; fast mode dropped ~3x in cost at flat headline pricing, but still requires application-level output review and action controls.

reasoning

Gemini 2.5 Pro

Google + DeepMind / Frontier multimodal reasoning model

Reasoning 5/5, Coding 4/5, Multi 5/5, Tools 4/5, Context 5/5, Cost 3/5, Open 1/5

Best fit

Multimodal and long-context workloads, especially where Google Cloud or Google product distribution is already strategic.

Trust surface

Strong cloud control surface; model behavior remains closed and requires independent application evals.

reasoning

DeepSeek-R1

DeepSeek AI / Open-weight reasoning pressure point

Reasoning 5/5, Coding 4/5, Multi 1/5, Tools 3/5, Context 4/5, Cost 5/5, Open 4/5

Best fit

Reasoning workloads where cost, portability, and self-hosting leverage matter more than first-party product polish.

Trust surface

Openness improves inspection and hosting control; customers inherit more responsibility for safety, deployment, and monitoring.

mixture_of_experts

Llama 4 Maverick

Meta AI / Open ecosystem frontier family

Reasoning 4/5, Coding 4/5, Multi 4/5, Tools 3/5, Context 4/5, Cost 4/5, Open 4/5

Best fit

Enterprise and developer contexts that value ecosystem breadth, inspectability, and deployment flexibility.

Trust surface

Strong portability; customer must own hosting, eval, and policy controls unless using a managed provider.

reasoning

Qwen3-235B-A22B-Thinking-2507

Alibaba / Open-weight reasoning and regional sovereignty signal

Reasoning 5/5, Coding 4/5, Multi 3/5, Tools 3/5, Context 4/5, Cost 4/5, Open 4/5

Best fit

Reasoning workloads where open weights, regional availability, and China-scale ecosystem signals matter.

Trust surface

Openness aids inspection; jurisdiction, hosting, and data governance need explicit buyer review.

mixture_of_experts

Mistral Large 3

Mistral AI / European frontier and sovereignty model

Reasoning 4/5, Coding 4/5, Multi 3/5, Tools 3/5, Context 4/5, Cost 4/5, Open 3/5

Best fit

European enterprise and sovereignty-sensitive workloads that need strong performance without a single US frontier dependency.

Trust surface

Regional positioning helps governance narratives; individual deployment controls still determine practical risk.

decoder_only

Cohere Command A

Cohere / Enterprise retrieval and workflow specialist

Reasoning 3/5, Coding 3/5, Multi 1/5, Tools 4/5, Context 4/5, Cost 4/5, Open 1/5

Best fit

Retrieval-heavy enterprise workflows, knowledge applications, and workloads that value business-context fit over raw frontier rank.

Trust surface

Enterprise positioning is useful, but buyer-side evals should prove retrieval quality and data handling.

reasoning

Grok 4

xAI / Frontier challenger with social distribution

Reasoning 5/5, Coding 4/5, Multi 3/5, Tools 3/5, Context 4/5, Cost 3/5, Open 1/5

Best fit

Buyers exploring frontier alternatives and consumer/social-context distribution signals.

Trust surface

Closed-provider controls apply; enterprise trust posture should be treated as less proven than the leading incumbents until evidence improves.

mixture_of_experts

gpt-oss-120b

OpenAI / Open-weight strategic hedge from a closed-model leader

Reasoning 4/5, Coding 4/5, Multi 1/5, Tools 3/5, Context 4/5, Cost 4/5, Open 4/5

Best fit

Portability-sensitive teams that want open-weight leverage without leaving the OpenAI model family narrative entirely.

Trust surface

Open weights shift more operational responsibility to the deployer while retaining strategic familiarity with a leading model provider.

reasoning

Claude Sonnet 5

Anthropic / Flagship mid-tier agentic default

Reasoning 4/5, Coding 4/5, Multi 3/5, Tools 5/5, Context 5/5, Cost 4/5, Open 1/5

Best fit

Agentic knowledge work and long-context coding where near-Opus quality at a mid-tier sticker price wins; effort-level dials make it a natural routing target. Independent testing shows it beating Opus 4.8 on agentic knowledge-work benchmarks.

Trust surface

Standard hosted Anthropic controls; the token-hungry max-effort mode needs budget guardrails and per-task cost telemetry, not per-token assumptions.

reasoning

GPT-5.6 Sol

OpenAI / Next-generation frontier family at reset pricing

Reasoning 5/5, Coding 4/5, Multi 3/5, Tools 5/5, Context 3/5, Cost 4/5, Open 1/5

Best fit

Agentic and terminal-heavy workloads — Sol leads the AA Coding Agent Index at 80 and posts 88.8% Terminal-Bench 2.1 (91.9% with the ultra four-agent setting) — and cost-tier routing via Terra at $2.50/$15, exactly half GPT-5.5's rate.

Trust surface

All three tiers rated High in bio/chem and cyber under the Preparedness Framework, GA'd through a government-coordinated process — expect eligibility and monitoring obligations to travel with API access, and run third-party behavioral evals before granting autonomy.

mixture_of_experts

LongCat-2.0

Meituan / Open-weight near-frontier agentic coding challenger

Reasoning 4/5, Coding 4/5, Multi 1/5, Tools 4/5, Context 5/5, Cost 5/5, Open 4/5

Best fit

High-volume agentic coding and 1M-context workloads where MIT licensing and self-host economics matter; arrives with two months of real-world OpenRouter demand evidence as the stealth "Owl Alpha".

Trust surface

MIT license maximizes portability and inspection; deployers inherit hosting, eval, and jurisdiction/data-governance review for a China-origin model.

reasoning

Leanstral 1.5

Mistral AI / Open formal-verification and proof-engineering specialist

Reasoning 5/5, Coding 4/5, Multi 1/5, Tools 4/5, Context 3/5, Cost 5/5, Open 5/5

Best fit

Lean 4 proof engineering and formal code verification at ~$4/problem — roughly 75x cheaper than frontier brute force; Mistral reports 5 previously unknown bugs found across 57 open-source repositories.

Trust surface

Apache 2.0 gives full inspection and hosting control; formal-proof outputs are machine-checkable, which reduces (but does not remove) output-review burden.

mixture_of_experts

Nemotron-Labs-TwoTower

NVIDIA / Open diffusion-decoding efficiency research artifact

Reasoning 3/5, Coding 3/5, Multi 1/5, Tools 2/5, Context 3/5, Cost 5/5, Open 4/5

Best fit

Serving-throughput R&D and labs with strong AR checkpoints — the frozen-backbone retrofit needed ~2.1T training tokens vs 25T for the backbone, and one checkpoint supports diffusion, mock-AR, and standard AR decoding for A/B rollout.

Trust surface

Open weights with a permissive-but-custom NVIDIA license; deployers should review the Nemotron Open Model License terms rather than assume Apache-equivalence.

reasoning

Grok 4.5

xAI / Cost-per-task frontier workhorse

Reasoning 4/5, Coding 4/5, Multi 2/5, Tools 4/5, Context 4/5, Cost 5/5, Open 1/5

Best fit

High-volume agent fleets where token efficiency dominates — $2/$6 per M tokens with a claimed ~4.2x output-token advantage over Opus 4.8 per SWE-Bench Pro task, and co-training with Cursor targets the coding-agent deployment surface directly.

Trust surface

Standard hosted controls; the EU-last launch makes regional availability a contractual item, and vendor-reported efficiency claims should be re-measured on your own task mix before fleet commitments.

mixture_of_experts

Hy3

Tencent / Clean-licensed open-weight agentic-search leader

Reasoning 4/5, Coding 3/5, Multi 1/5, Tools 4/5, Context 4/5, Cost 5/5, Open 5/5

Best fit

Agentic search and retrieval workloads (84.2 BrowseComp, 91.0 DeepSearchQA, Tencent-reported) at open-floor economics, and self-hosted serving: the 3.8B MTP layer doubles as a speculative-decoding draft worth +40% local throughput in llama.cpp.

Trust surface

Apache 2.0 with no regional exclusions maximizes portability and inspection; deployers inherit hosting, eval, and jurisdiction review, and the benchmark claims are vendor-reported pending independent runs.

reasoning

Muse Spark 1.1

Meta / Meta's first-party API era opener

Reasoning 3/5, Coding 3/5, Multi 4/5, Tools 4/5, Context 2/5, Cost 3/5, Open 1/5

Best fit

Multimodal agentic evaluation inside the Meta ecosystem, and teams that want early position on the Meta Model API before pricing and rate structures settle.

Trust surface

First-party API in public preview with vendor-claimed capabilities; treat as evaluation-only until independent benchmarks and API SLAs exist.

Methodology.

Primary category by market role. Companies can span the stack, but the main placement reflects what a buyer or board most needs to understand.
Control points over logo boxes. The taxonomy tracks who owns compute, model choice, source data, AI-ready context, developer workflow, agent execution, and vertical commercial outcomes.
Quantitative claims with freshness. Metrics include period, confidence, source refs, and refresh cadence. Sparse disclosure stays labeled as sparse.
Monthly position history. Leader and challenger status is tracked as a dated index so the market map can show direction of travel, not only a current logo placement.
Model cards as operating cards. Cards include best fit, poor fit, drift watch, and trust surface, not just benchmark rank.
Data ownership as the strategic axis. The future-state bet is that durable AI architectures let enterprises own their data and permission agents into it, rather than letting each SaaS vendor trap context inside its own product boundary.

AI-readable exports.

Markdown rollup / Compact JSON / Raw taxonomy YAML

Boundaries. Leaders. Evidence.