brianletort.ai

The Model Pulse

Issue 09 · Week 25 of 2026.

/Weekly read/~6 min read/Public sources only

The Big Read

Open weights took the lead while the closed frontier stalled — and the benchmark itself was rewritten.

The thesis this issue defends

W25 was a loud week for open weights and a quiet one at the closed frontier. Z.ai shipped GLM-5.2 under a genuine MIT license: a ~744B / ~40B-active sparse-attention MoE with 1M context. Independent testing (VentureBeat) says it beats GPT-5.5 on several long-horizon coding benchmarks at roughly one-sixth the cost, and Artificial Analysis now cites it as the leading open-weight model. MiniMax-M3's sparse-attention weights matured in-window with an arXiv report validating its efficiency claims, though its non-OSI Community License gates commercial use.

The closed frontier, by contrast, marked time. There was no new flagship GA from OpenAI (GPT-5.6 remains rumor) or xAI, Gemini 3.5 Pro slipped from June to July, and Anthropic's Claude Fable 5 stayed government-suspended the entire week (Opus 4.8 is the working leader).

The third shift was measurement itself. Artificial Analysis rebased its Intelligence Index to v4.1, re-weighting the industry's headline benchmark around agentic tasks, so scores are no longer back-comparable to v4.0.

The procurement implication: open self-host is now a live coding option, not a hedge. Teams should pilot MIT-licensed GLM-5.2, read every 'open' license carefully (GLM-5.2 MIT vs MiniMax Community), and re-baseline evaluations on the agentic v4.1 index while keeping closed-frontier fallbacks given the demonstrated availability risk.

Tree delta

What changed in the tree.

2 models added, 0 updated.

Two W25 additions: GLM-5.2 (MIT open-weight frontier-adjacent MoE that took the open lead) and MiniMax-M3 (sparse-attention multimodal MoE, weights matured with arXiv verification).

Added (2)

  • glm-5-2
  • minimax-m3

Updated

None this period.

No new closed frontier model entered the tree: GPT-5.6 is rumor only, Gemini 3.5 Pro slipped to July, and Claude Fable 5 (added W24) stayed suspended. ByteDance's seed-2.1-pro-preview is excluded as an undisclosed preview.


Frontier movements

Flagship-class releases.

2 movements this period, neither a new release.

Vendor-stated frontier capability. The releases that reset the closed-source ceiling.

  • /Anthropic/Frontier/Reasoning

    Claude Fable 5 (still suspended)

    Remained government-suspended for the full week; #1 on the rebased Artificial Analysis Index (60) but unavailable, so Opus 4.8 (56) is the top available closed model

    The closed-frontier leader on paper is unusable in practice for a second week, which keeps the availability/sovereign risk live. Architects should treat the AA top score as aspirational and standardize on the top available model (Opus 4.8) with fallback routing, not on a suspended SKU.

    Anthropic; Artificial Analysis Intelligence Index v4.1

  • /Google DeepMind/Frontier/Reasoning

    Gemini 3.5 Pro (slipped to July)

    GA slipped from June to July: still a limited Vertex preview with no public model card, pricing, or independent benchmark vs Fable 5/Opus 4.8

    A frontier movement by absence for the third consecutive issue. Buyers should keep an evaluation slot ready but not pause current baselines; the closed frontier's cadence is visibly slipping while open weights accelerate.

    Business Insider; Google
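
The fallback-routing advice in the Fable 5 entry above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production router: `ModelOption` and `pick_with_fallbacks` are hypothetical names, the scores are the AA v4.1 figures quoted in this issue, and the `available` flag is assumed to be maintained by hand or by a health check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    aa_v41_score: int  # Artificial Analysis Intelligence Index v4.1, as quoted in this issue
    available: bool    # False while access is suspended


def pick_with_fallbacks(options):
    """Return usable models best-first: element 0 is the primary, the rest are fallbacks."""
    usable = [m for m in options if m.available]
    return sorted(usable, key=lambda m: m.aa_v41_score, reverse=True)


CLOSED_FRONTIER = [
    ModelOption("Claude Fable 5", 60, available=False),  # top score, but suspended
    ModelOption("Claude Opus 4.8", 56, available=True),
    ModelOption("GPT-5.5", 55, available=True),
]

route = pick_with_fallbacks(CLOSED_FRONTIER)
print([m.name for m in route])  # ['Claude Opus 4.8', 'GPT-5.5']
```

Filtering on availability before ranking is the point of the sketch: the suspended SKU never becomes the primary, however high its score.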

Open weights

Open-frontier and open-source drops.

2 releases this period.

Open-weights releases that change procurement options. Pull these into pilot when score parity meets license parity.

  • /Z.ai/Open frontier/MoE

    GLM-5.2

    MIT-licensed ~744B / ~40B-active sparse-attention MoE with 1M context; independent testing says it beats GPT-5.5 on long-horizon coding at ~1/6 the cost and Artificial Analysis cites it as the top open-weight model

    GLM-5.2 is the week's most consequential release: a truly permissive (MIT, no regional limits) frontier-adjacent model that is self-hostable and sovereignty-friendly for long-context agentic coding. Architects should pilot it for self-host/private coding workloads and use its API list price of ~$1.40/$4.40 per MTok as a pricing benchmark against closed flagships.

    Z.ai blog; VentureBeat; Hugging Face

  • /MiniMax/Open frontier/MoE

    MiniMax-M3

    428B / ~23B-active sparse-attention MoE with native multimodality and 1M context; weights matured with an arXiv report verifying efficiency claims of roughly 9x (prefill) and 15x (decode), under a non-OSI Community License

    MiniMax-M3 combines frontier-adjacent coding, genuine 1M context, and native multimodality in one downloadable checkpoint — but the MiniMax Community License gates commercial use, so 'open weights' here does not mean free to deploy. Teams should verify the efficiency claims via the arXiv report and clear the license before planning a commercial deployment.

    TechTimes; Hugging Face; arXiv:2606.13392
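
One way to make "license parity" operational is a simple gate in the shortlist process. The sketch below is illustrative only. The license labels come from this issue, while `PREAPPROVED_LICENSES`, `OPEN_RELEASES`, and `license_gate` are hypothetical names, and the real allowlist is a legal decision rather than a code constant.

```python
# Hypothetical allowlist; the real one is set by legal and procurement.
PREAPPROVED_LICENSES = {"MIT"}

# License labels as reported in this issue.
OPEN_RELEASES = {
    "GLM-5.2": "MIT",
    "MiniMax-M3": "MiniMax Community License",
}


def license_gate(license_name: str) -> str:
    """Return 'pilot' for pre-approved licenses, otherwise 'legal-review'."""
    return "pilot" if license_name in PREAPPROVED_LICENSES else "legal-review"


decisions = {model: license_gate(lic) for model, lic in OPEN_RELEASES.items()}
for model, decision in decisions.items():
    print(f"{model}: {decision}")
```

Anything not on the allowlist defaults to review, so a new "open" release with unfamiliar terms cannot slip into a pilot unread.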

Architecture watch

Patterns to track.

3 patterns reshaping the canopy.

Architectural patterns that crossed multiple vendors this period. Each pattern lists exemplar releases and what it changes for deployment, cost, or capability.

  • Open weights close the cost gap

    GLM-5.2 (MIT) · MiniMax-M3 · Grok 4.3 on Bedrock

    Frontier-adjacent capability is collapsing toward commodity inference pricing. GLM-5.2's MIT weights reportedly match or beat GPT-5.5 on long-horizon coding at ~1/6 the cost, and API list prices (GLM-5.2 ~$1.40/$4.40 per MTok; Grok 4.3 $1.25/$2.50 on Bedrock) keep falling. Procurement should pilot open self-host for routine and long-context coding and reserve closed flagships for the highest-risk reasoning.

    Z.ai; VentureBeat; Amazon Bedrock

  • The headline benchmark pivots to agents

    Artificial Analysis Intelligence Index v4.1 · LMArena Agent Arena

    Artificial Analysis rebased its Intelligence Index to v4.1, re-weighting around agentic tasks (GDPval-AA v2 at 20%, Terminal-Bench, banking agents) and dropping a saturated benchmark, while LMArena's Agent Arena scores behavioral signals (retries, steerability) rather than preference votes. Boards comparing models on 'the AA Index' must note v4.1 scores are not back-comparable to v4.0; re-baseline evaluation harnesses now.

    Artificial Analysis; arena.ai changelog

  • License divergence within 'open'

    GLM-5.2 (MIT) · MiniMax-M3 (Community License)

    Two of the week's open releases sit on opposite ends of the permissiveness spectrum: GLM-5.2 under MIT with no regional limits versus MiniMax-M3 under a Community License that gates commercial use. For enterprise adoption, 'open weights' does not equal 'free to deploy commercially' — legal and procurement should read the actual license before standardizing on a model.

    Z.ai; MiniMax Hugging Face card
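
The list prices quoted in the first pattern above can be turned into a rough workload estimate. This sketch assumes each price pair is input/output USD per million tokens (the issue does not label them). The workload size is hypothetical, and `LIST_PRICES` and `workload_cost` are illustrative names. It ignores self-hosting costs and the Grok 4.3 pricing cliff above 200K tokens of context.

```python
# (input, output) USD per million tokens, as quoted in this issue.
# Assumption: the first figure is the input price, the second the output price.
LIST_PRICES = {
    "GLM-5.2 API": (1.40, 4.40),
    "Grok 4.3 on Bedrock": (1.25, 2.50),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate list-price cost in USD for a given token volume."""
    in_price, out_price = LIST_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical monthly workload: 500M input tokens, 100M output tokens.
costs = {m: workload_cost(m, 500_000_000, 100_000_000) for m in LIST_PRICES}
for model, usd in costs.items():
    print(f"{model}: ${usd:,.2f}")
```

For this hypothetical volume the estimate comes to about $1,140 for GLM-5.2 and $875 for Grok 4.3. The same function can serve as a negotiating anchor once a closed flagship's actual quote is added to the table.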

Benchmark moves

Where the leaderboard moved.

2 benchmarks shifted.

Benchmark deltas that change a procurement read. Scores reflect public leaderboards or vendor model cards as of publication.

  • Artificial Analysis Intelligence Index v4.1

    Methodology rebased around agentic tasks (Jun 15); leaders Fable 5 = 60 (top but suspended), Opus 4.8 = 56 (top available), GPT-5.5 = 55; scores not back-comparable to v4.0

    • Claude Fable 5 (suspended): 60
    • Claude Opus 4.8 (top available): 56
    • GPT-5.5: 55

    Artificial Analysis

  • Open-weight leaderboard

    GLM-5.2 took the open-weight lead in-window; MiniMax-M3 and DeepSeek V4 Pro sit at ~44 on the rebased index

    • GLM-5.2: top open (AA v4.1)
    • MiniMax-M3: 44
    • DeepSeek V4 Pro: 44

    Artificial Analysis; VentureBeat
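
Because v4.1 scores are not back-comparable to v4.0, an evaluation harness can refuse to mix the two. The following is a minimal sketch of that guard, assuming each stored score carries its index version. `IndexScore` and `score_gap` are hypothetical names, and the v4.0 entry is a placeholder rather than a real published score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexScore:
    model: str
    score: int
    index_version: str  # e.g. "v4.1"


def score_gap(a: IndexScore, b: IndexScore) -> int:
    """Return a.score - b.score, refusing to compare across index versions."""
    if a.index_version != b.index_version:
        raise ValueError(
            f"cannot compare {a.model} ({a.index_version}) "
            f"with {b.model} ({b.index_version}): re-baseline first"
        )
    return a.score - b.score


opus = IndexScore("Claude Opus 4.8", 56, "v4.1")  # score quoted in this issue
gpt = IndexScore("GPT-5.5", 55, "v4.1")           # score quoted in this issue
legacy = IndexScore("hypothetical-legacy-entry", 50, "v4.0")  # placeholder, not a real score

print(score_gap(opus, gpt))  # 1
try:
    score_gap(opus, legacy)
except ValueError as err:
    print(f"blocked: {err}")
```

Raising an error instead of returning a number keeps a stale v4.0 figure from silently entering a procurement comparison.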

Tier scorecard

Who leads, who pushes.

6 tiers · leaders as of Jun 20, 2026.

A snapshot of leader-vs-challenger by tier. Useful for procurement shortlists when matching workload to model class. Pair with the benchmark moves above for the underlying scores.

  • Closed frontier

    Leader: Claude Opus 4.8

    Challenger: GPT-5.5

    Fable 5 leads the AA v4.1 index (60) but stayed suspended all week; Opus 4.8 (56) is the top available closed model.

  • Open frontier

    Leader: GLM-5.2

    Challenger: MiniMax-M3

    GLM-5.2's MIT release took the open-weight lead in-window; MiniMax-M3 contends but is gated by a non-OSI license.

  • Reasoning

    Leader: Claude Opus 4.8

    Challenger: GPT-5.5

    Closed reasoning leadership steady among available models while Gemini 3.5 Pro slipped to July.

  • Coding

    Leader: Claude Opus 4.8

    Challenger: GLM-5.2

    Open weights are closing fast on long-horizon coding; GLM-5.2 reportedly beats GPT-5.5 at ~1/6 the cost.

  • Multimodal

    Leader: Gemini 3.1 Pro

    Challenger: MiniMax-M3

    Gemini remains the general multimodal reference; MiniMax-M3 adds native-multimodal open weights.

  • Edge / small

    Leader: Mellum2

    Challenger: North Mini Code

    Efficient open coding/sub-agent models unchanged in-window; the week's open action was at the frontier-adjacent tier.

Vendor signals

Pricing, gating, deprecation.

4 non-release signals worth tracking.

The non-release moves that shift vendor risk — pricing, deprecations, gating decisions, license changes — with a one-line procurement read.

  • /Z.ai

    Released GLM-5.2 under MIT with API pricing ~$1.40/$4.40 per MTok (~1/6 of comparable frontier)

    A tier-1 permissive open release at commodity pricing pressures every closed flagship's price/value story. Procurement should use GLM-5.2 as a negotiating anchor and pilot it for self-host coding; investors should treat open-weight pricing as a structural deflationary force on inference.

    Z.ai; DataNorth

  • /Artificial Analysis

    Rebased the Intelligence Index to v4.1, re-weighting around agentic workloads; v4.1 scores are not back-comparable to v4.0

    The industry's headline benchmark now measures agentic capability, not static Q&A. Boards and architects must re-baseline model comparisons on v4.1 and avoid mixing old and new index numbers in procurement decisions.

    Artificial Analysis

  • /xAI

    Grok 4.3 went GA on Amazon Bedrock ($1.25/$2.50 per MTok, 1M context), making xAI the third independent frontier lab on Bedrock alongside Anthropic and OpenAI

    CIOs can now evaluate all three independent US frontier labs under one IAM and billing surface. The caveat is a non-standard endpoint and a context-window pricing cliff above 200K tokens; this is distribution, not a new capability tier.

    DigitalApplied; Memeburn

  • /Anthropic

    Claude Fable 5 and Mythos 5 remained government-suspended all week; the planned Jun 23 usage-credit subscription change is moot while access is off

    The top-tier closed model's availability is still a sovereign/regulatory variable, not an SLA. Buyers should keep Opus 4.8/Sonnet fallbacks wired and avoid single-sourcing the frontier for production-critical paths.

    Anthropic

Watchlist

On the radar next.

4 catalysts to watch, starting July.

Specific model-side catalysts in the next 7–30 days that would change the read materially. Watching these tells us whether the canopy is widening or thinning.

  • July

    Gemini 3.5 Pro GA

    Gemini 3.5 Pro GA slipped from June to July. Its GA and first independent AA v4.1 pass will show whether Google can retake a frontier lead now contested by both Opus 4.8 and a surging open-weight field.

  • Jun-Aug

    Claude Fable 5 / Mythos 5 restoration

    Restoration terms (geo-gating, KYC, or a permanent civilian/government capability split) will set the precedent for sovereign access risk and decide whether Fable 5 re-enters the available scorecard.

  • Jun-Aug

    GLM-5.2 adoption and independent SWE-Bench replication

    Downloads, integrations, and third-party benchmark replication will show whether MIT-licensed open weights become production substrate and force closed-flagship price cuts.

  • Jun-Jul

    GPT-5.6 / next OpenAI flagship

    Codenames and prediction markets pointed to a launch just after this window. A real system card would reset the closed frontier and test whether OpenAI answers the open-weight cost pressure.

Edits this issue

  • Added GLM-5.2 and MiniMax-M3 to the LLM tree; reframed W25 around open weights taking the lead while the closed frontier stalled (Fable 5 suspended, Gemini slipped) and Artificial Analysis rebased to the agentic v4.1 index.

About The Model Pulse

A weekly read on the software side of the AI stack. Anchored to the LLM Evolutionary Tree, which the brief annotates each week. The cross-stack flywheel (capital, hardware, networking) is covered in The AI Stack Weekly.

Authorship and sources

Compiled from public model cards, vendor blogs, leaderboards, and official lab announcements. Written by Brian Letort. Independent analysis. Not investment guidance.
