
The Model Pulse

Issue 01 · April 2026 Recap.

/Monthly recap/~6 min read/Public sources only

The Big Read

April rewrote the floor: open weights crossed the closed frontier, and the safety-gated frontier became a procurement criterion.

The thesis this issue defends

April 2026 was the most productive month in frontier model history. Eleven model rows landed in the LLM Evolutionary Tree across six vendors, including the first open-weights MoEs to score frontier-class on SWE-Bench Pro coding (Kimi K2.6 at 58.6 and GLM-5.1 at 58.4, both above GPT-5.4 at 57.7 and Claude Opus 4.6 at 57.3). DeepSeek V4 shipped in two configurations under MIT license with 1M-token context and 1.6T total parameters, moving on-prem coding from 'can we?' to 'which workload first?' Anthropic withheld its Mythos flagship on cyber-capability grounds after UK AISI confirmed autonomous offensive capability — an inflection that turns capability gating from a research-org concern into a procurement diligence requirement. The closed frontier responded: GPT-5.5 (Apr 23) and Claude Opus 4.7 (Apr 16) reset the closed-source ceiling, with adaptive thinking and ultra-long context as the new defaults. Read together, April was the month the canopy widened on three axes at once: open-vs-closed parity on coding, reasoning-as-default at every tier, and capability gating as a market signal.

Tree delta

What changed in the tree.

11 models added, 4 updated.

11 model rows added to the tree in April, spanning frontier closed releases, open-frontier MoEs, gated safety-class previews, and reasoning-tier successors. The decoder-only zone widened most; the multimodal and reasoning branches absorbed every major release.

Added (11)

  • claude-opus-4-6
  • qwen-3-6-plus
  • claude-mythos
  • glm-5-1
  • claude-opus-4-7
  • qwen-3-6-35b-a3b
  • kimi-k2-6
  • gpt-5-5
  • deepseek-v4-pro
  • deepseek-v4-flash
  • deepseek-r2

Updated (4)

  • claude-opus-4-5
  • kimi-k2-5
  • gpt-5-4
  • qwen-3-5

The Anthropic line gained two flagship releases plus one gated preview in a single month. DeepSeek shipped V4 in two scales (Pro and Flash) plus a same-month R2 reasoning successor — a cadence no other vendor matched in April.

Explore the LLM Evolutionary Tree

Frontier movements

Flagship-class releases.

4 releases this period.

Vendor-stated frontier capability. The releases that reset the closed-source ceiling.

  • /Anthropic/Frontier/Agentic

    Claude Opus 4.7

    Adaptive-thinking flagship with ultra-long context and full agentic stack (computer-use + MCP + tool use).

    Reset the closed-frontier reasoning ceiling. The agentic surface area (computer-use, MCP, tool-use) makes Opus 4.7 the default reference for closed-frontier procurement when adaptive thinking and long-horizon agent runs are required. Pair with Sonnet 4.6 for the cost-tiered deployment.

    anthropic.com release notes, model card

  • /OpenAI/Frontier/Reasoning

    GPT-5.5

    Frontier reasoning flagship with multimodal and ultra-long-context as defaults.

    OpenAI's response to the V4 / Opus 4.7 surge. Sets the closed-frontier reasoning bar that the open-frontier challengers (V4 Pro, K2.6, GLM-5.1) measure against on SWE-Bench Pro and GPQA. The 5.x family now spans Instant, Thinking, Pro, mini, and nano tiers — useful for matching workload class to model class.

    openai.com blog, GPT-5.5 system card

  • /Anthropic/Specialist/Agentic

    Claude Mythos Preview

    Frontier-class capability withheld from public access on cyber-capability grounds.

    The first major model gated by its own developer for safety reasons after independent (UK AISI) red-teaming confirmed autonomous offensive capability. Vendor-risk frameworks now need a capability-gate criterion alongside availability SLAs — diligence on 'what could the next release do that the current one cannot?' is no longer optional.

    anthropic.com, red.anthropic.com, aisi.gov.uk

  • /Anthropic/Frontier/Reasoning

    Claude Opus 4.6

    Predecessor to 4.7; held the SWE-Bench Pro closed-frontier line at 57.3 mid-month.

    Useful as the April 2 baseline that open-weights releases moved past mid-month. Procurement teams running 4.5 should plan to upgrade directly to 4.7; 4.6 will likely move to maintenance status before mid-May.

    anthropic.com release notes

Open weights

Open-frontier and open-source drops.

6 releases this period.

Open-weights releases that change procurement options. Pull these into pilot when score parity meets license parity.

  • /DeepSeek AI/Open frontier/MoE

    DeepSeek-V4 Pro

    1.6T MoE, 1M context, MIT license, frontier-class on coding.

    The headline release of April. First open-weights model with frontier-class SWE-Bench Pro performance under a permissive license. Procurement teams blocked on closed-source data residency or licensing constraints now have an open frontier alternative for coding workloads. The on-prem floor moved up; the merchant-vs-self-host fork is now real.

    huggingface.co/deepseek, deepseek.com

  • /DeepSeek AI/Open frontier/MoE

    DeepSeek-V4 Flash

    Cost-tiered V4 sibling for high-throughput inference on smaller fabric.

    Open-weights answer to GPT-4o-mini and Claude Sonnet at the inference-cost tier. Useful for batch coding agents and high-volume serving where Pro is overkill. Same MIT license, same 1M context, lower active-parameter cost per token.

    huggingface.co/deepseek model card

  • /Moonshot AI/Open frontier/MoE

    Kimi K2.6

    58.6 on SWE-Bench Pro — the first open-weights model to top the closed leaders on coding.

    The benchmark moment. K2.6's coding score sits above GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) under an open-weights license. For coding-heavy procurement reads, K2.6 is the new default open-frontier baseline; closed-source premium needs new justification beyond raw score.

    moonshot.cn, Artificial Analysis SWE-Bench Pro

  • /Z.AI (Zhipu)/Open frontier/Agentic

    GLM-5.1

    58.4 on SWE-Bench Pro with multi-agent-native architecture and ultra-long context.

    Second open-weights model to clear the closed-frontier coding bar this month. The multi-agent-native design point makes GLM-5.1 the open-source reference for agent-of-agents deployments — relevant when coordination overhead matters more than single-shot completion.

    z.ai, GLM-5.1 model card

  • /Alibaba/Open frontier/MoE

    Qwen3.6-35B-A3B

    Open-weights MoE successor to Qwen3.5; agentic + multimodal at deployable scale.

    The most deployable of April's open releases on commodity inference fabric. 35B total / 3B active makes Qwen3.6-A3B the open multimodal pick when V4 Pro is too large and K2.6 is single-modality. Pairs naturally with Qwen3.6-Plus on the closed-source side for hybrid deployments.

    qwenlm.github.io, model card

  • /DeepSeek AI/Reasoning/Reasoning

    DeepSeek-R2

    Open-weights reasoning successor to R1, frontier-class long-context.

    The same vendor shipping V4 (general) and R2 (reasoning) inside one calendar month is the cadence story. R2 closes the open-vs-closed reasoning gap that R1 began narrowing in 2025 — open-weights reasoning is now a separate procurement category, not a niche.

    deepseek.com, huggingface.co/deepseek

Architecture watch

Patterns to track.

4 patterns reshaping the canopy.

Architectural patterns that crossed multiple vendors this period. Each pattern lists exemplar releases and what it changes for deployment, cost, or capability.

  • Reasoning becomes the default mode, not a separate model.

    Claude Opus 4.7 (adaptive thinking) · GPT-5.5 · DeepSeek-R2 · GLM-5.1

    April's flagships ship reasoning behavior as a default rather than a separate o-series-style fork. Adaptive thinking, where the model decides how much to deliberate, is now the closed-frontier expectation; explicit reasoning toggles look dated by month-end. Procurement consequence: the reasoning-vs-non-reasoning split is collapsing — assume reasoning is on and budget tokens accordingly (a back-of-envelope sketch follows this item).

    Vendor model cards (Anthropic, OpenAI, DeepSeek, Z.AI)
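
A minimal sketch of that budget math, assuming reasoning tokens are billed at the output rate (the common vendor convention); every price and overhead ratio below is a hypothetical placeholder, not a quoted rate:

```python
# Back-of-envelope spend when reasoning is on by default.
# Prices and the deliberation overhead are placeholders; substitute
# your vendor's rate card and your measured reasoning ratio.

def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 reasoning_ratio: float,
                 price_in: float, price_out: float) -> float:
    """Dollars per month; prices are dollars per 1M tokens.

    reasoning_ratio = hidden deliberation tokens emitted per visible
    output token (reasoning typically bills at the output rate).
    """
    billed_out = out_tokens * (1 + reasoning_ratio)
    per_request = (in_tokens * price_in + billed_out * price_out) / 1e6
    return requests * per_request

# Same workload, reasoning-off assumption vs adaptive-thinking
# default at a 3x deliberation overhead:
print(monthly_cost(100_000, 4_000, 800, 0.0, 3.00, 15.00))  # 2400.0
print(monthly_cost(100_000, 4_000, 800, 3.0, 3.00, 15.00))  # 6000.0
```

The point is the slope: at a 3x overhead the same workload costs 2.5x, so token budgets set under reasoning-off assumptions understate Q2 spend.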

  • Million-token context as baseline.

    DeepSeek-V4 Pro (1M) · Kimi K2.6 (1M+) · Claude Opus 4.7 (ultra-long) · GPT-5.5 (ultra-long)

    Every flagship released in April clears the 1M-token threshold. Long-context is no longer a differentiator at the frontier; it is the floor. Workloads that previously required RAG architectures may now be expressible as direct in-context loads — re-evaluate the retrieval layer when re-baselining for V4 / K2.6 / Opus 4.7 (a feasibility sketch follows this item).

    Model cards; HuggingFace technical reports
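
A quick feasibility check for that re-evaluation, using the rough 4-characters-per-token heuristic for English text; for a real estimate, run the model's own tokenizer over the corpus:

```python
# Does the working set fit directly in a 1M-token window, or does
# the retrieval layer still earn its keep? The chars-per-token
# heuristic is a rough English-text assumption, not a tokenizer.

CHARS_PER_TOKEN = 4          # rough heuristic; measure per model
CONTEXT_WINDOW = 1_000_000   # the April frontier baseline
RESERVED = 50_000            # headroom for instructions + output

def fits_in_context(corpus_chars: int) -> bool:
    est_tokens = corpus_chars / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_WINDOW - RESERVED

print(fits_in_context(2_000_000))   # True: ~500k-token repo snapshot
print(fits_in_context(50_000_000))  # False: doc store still needs RAG
```

Fitting is necessary but not sufficient: per-call cost and latency on a full 1M-token load may still favor retrieval even when the window technically holds the corpus.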

  • Agentic surface area as a first-class capability.

    Claude Opus 4.7 (computer-use, MCP, tool-use) · GLM-5.1 (multi-agent-native) · Qwen3.6-Plus (agentic) · Kimi K2.6 (agentic-ready)

    Agentic isn't a benchmark category anymore; it is a model property declared on the card. Computer-use, MCP support, and multi-agent coordination ship with the flagship rather than as a downstream wrapper. Architectural read: the boundary between 'model' and 'agent runtime' is dissolving — procurement should evaluate the full agent surface, not just the LM weights (a checklist sketch follows this item).

    Vendor model cards; Model Context Protocol announcements
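
One way to make 'evaluate the full agent surface' concrete is a per-model checklist filled in from the model card. The field names and the two example rows below are illustrative readings of the April cards, not a vendor schema:

```python
# Per-model agent-surface checklist; fields mirror the capabilities
# April's cards declare. Example values are illustrative, not audited.

from dataclasses import dataclass

@dataclass
class AgentSurface:
    model: str
    computer_use: bool     # drives a GUI/browser directly
    mcp: bool              # speaks Model Context Protocol
    tool_use: bool         # structured tool/function calling
    multi_agent: bool      # coordination primitives on the card

    def gaps(self) -> list[str]:
        # Field names whose declared capability is missing.
        return [name for name, value in vars(self).items()
                if value is False]

opus = AgentSurface("claude-opus-4-7", True, True, True, False)
glm = AgentSurface("glm-5-1", False, True, True, True)
print(opus.gaps())  # ['multi_agent']
print(glm.gaps())   # ['computer_use']
```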

  • Capability gating as a market signal.

    Claude Mythos Preview (withheld)

    Anthropic withholding Mythos after UK AISI red-team findings is the first time a major lab gated a frontier-class release on its own initiative for cyber-capability reasons. The signal: capability evaluation now happens before public release, and 'gated' is a status that procurement teams need to track in vendor risk frameworks alongside 'available' and 'deprecated' (a status-taxonomy sketch follows this item).

    anthropic.com, red.anthropic.com, aisi.gov.uk
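
A sketch of the status taxonomy this forces, with 'gated' as a first-class lifecycle state; the names and diligence prompts are illustrative, not any vendor's published schema:

```python
# Model lifecycle states for a vendor-risk framework, with the
# diligence question each state should trigger. Illustrative only.

from enum import Enum

class ModelStatus(Enum):
    AVAILABLE = "available"    # generally released, SLA-backed
    GATED = "gated"            # built and evaluated, withheld by vendor
    DEPRECATED = "deprecated"  # scheduled for shutdown or migration

def diligence_prompt(status: ModelStatus) -> str:
    if status is ModelStatus.GATED:
        return "what did the evaluation find, and what lifts the gate?"
    if status is ModelStatus.DEPRECATED:
        return "what is the migration window and the target tier?"
    return "standard availability and SLA diligence"

print(diligence_prompt(ModelStatus.GATED))
```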

Benchmark moves

Where the leaderboard moved.

3 benchmarks shifted.

Benchmark deltas that change a procurement read. Scores reflect public leaderboards or vendor model cards as of publication.

  • SWE-Bench Pro (coding)

    Open weights overtook closed for the first time. Two open-weights models clear both closed leaders.

    • Kimi K2.6: 58.6
    • GLM-5.1: 58.4
    • GPT-5.4: 57.7
    • Claude Opus 4.6: 57.3

    Artificial Analysis SWE-Bench Pro leaderboard, vendor model cards

  • Long-context retrieval (1M+ tokens, vendor-reported)

    Million-token context lands at the frontier with sub-2% retrieval-error rates across multiple vendors.

    • DeepSeek-V4 Pro: 1M ctx, 1.4% err
    • Kimi K2.6: 1M ctx, 1.7% err
    • Claude Opus 4.7: ultra-long, vendor-reported
    • GPT-5.5: ultra-long, vendor-reported

    Vendor technical reports; HuggingFace evaluations

  • Agentic tool-use (vendor + third-party)

    Closed flagships still lead agentic tool-use; open frontier is one tier behind but closing.

    • Claude Opus 4.7: closed leader
    • GPT-5.5: closed challenger
    • GLM-5.1: open leader (multi-agent-native)
    • DeepSeek-V4 Pro: open challenger

    Vendor agent benchmarks; community evaluations

Tier scorecard

Who leads, who pushes.

6 tiers · leaders as of Apr 25, 2026.

A snapshot of leader-vs-challenger by tier. Useful for procurement shortlists when matching workload to model class. Pair with the benchmark moves above for the underlying scores; a lookup-table restatement follows the list.

  • Closed frontier

    Leader: Claude Opus 4.7

    Challenger: GPT-5.5

    Adaptive thinking is the closed-source differentiator this month.

  • Open frontier

    Leader: DeepSeek-V4 Pro

    Challenger: Kimi K2.6

    MIT license + 1M context + frontier coding score is the new open baseline.

  • Reasoning

    Leader: DeepSeek-R2

    Challenger: GPT-5.5 (thinking mode)

    Open-weights reasoning is now its own procurement category.

  • Coding

    Leader: Kimi K2.6

    Challenger: GLM-5.1

    Open-weights tops both closed leaders on SWE-Bench Pro this month.

  • Multimodal

    Leader: Gemini 3.1 Pro

    Challenger: Qwen3.6-Plus

    Closed leader from Q1 still holds; Qwen3.6 narrows the gap on the open side.

  • Edge / small

    Leader: Phi-4-mini

    Challenger: Gemma 4

    Q1 leader carries through April; no major edge-tier release this month.
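
For teams that keep shortlists in code, the same scorecard as a lookup; this is purely a restatement of the table above, using the tree's model-slug style (slugs for models outside April's added rows are approximated in that style):

```python
# Workload class -> (leader, challenger) shortlist, April 2026 close.
# A restatement of the scorecard above; refresh as the tree updates.

SHORTLIST: dict[str, tuple[str, str]] = {
    "closed-frontier": ("claude-opus-4-7", "gpt-5-5"),
    "open-frontier":   ("deepseek-v4-pro", "kimi-k2-6"),
    "reasoning":       ("deepseek-r2", "gpt-5-5"),
    "coding":          ("kimi-k2-6", "glm-5-1"),
    "multimodal":      ("gemini-3-1-pro", "qwen-3-6-plus"),
    "edge":            ("phi-4-mini", "gemma-4"),
}

leader, challenger = SHORTLIST["coding"]
print(f"pilot {leader}; hold {challenger} as the fallback")
```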

Vendor signals

Pricing, gating, deprecation.

4 non-release signals worth tracking.

The non-release moves that shift vendor risk — pricing, deprecations, gating decisions, license changes — with a one-line procurement read.

  • /Anthropic

    Mythos withheld from public access pending capability-gate review.

    Sets a precedent for vendor-initiated gating on cyber-capability grounds. Procurement frameworks now need a 'gated' status alongside 'available' / 'deprecated.' Adds a new diligence question: what is the next release the vendor evaluated and chose not to ship?

    anthropic.com, red.anthropic.com, aisi.gov.uk

  • /DeepSeek AI

    Two-config V4 launch (Pro + Flash) plus same-month R2 release under MIT.

    Cadence and licensing both shifted. A monthly cycle with permissive licensing across coding, reasoning, and inference-tier variants is the new open-frontier benchmark for vendor velocity. Closed-source vendors with quarterly cadence have a velocity gap, not just a capability gap.

    deepseek.com release notes; huggingface.co model cards

  • /OpenAI

    GPT-4 family deprecation timeline tightened; 5.x consolidated to five tiers (Instant, Thinking, Pro, mini, nano).

    Pricing migration for GPT-4-era workloads accelerates through Q2. Customers should plan migration to 5.x within 60 days; the tier consolidation simplifies model selection but compresses the price-vs-capability slope on the low end.

    openai.com platform changelog

  • /Multiple (open-weights serving)

    Inference price cuts of 25-40% across major open-weights serving providers following V4 / K2.6 launches.

    Open-weights inference economics improved materially this month. Re-baseline cost per token for batch agentic workloads against the new floor; the merchant-vs-self-host break-even moved (a worked example follows this item).

    Provider pricing pages; community comparison threads
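
A worked break-even under the new floor; every number below is a hypothetical placeholder for your negotiated API rate, amortized cluster cost, and marginal self-host cost per token:

```python
# Merchant-API vs self-host break-even after the April price cuts.
# All inputs are placeholders; plug in real quotes and measurements.

def breakeven_tokens_per_month(api_price_per_m: float,
                               cluster_cost_per_month: float,
                               marginal_self_host_per_m: float) -> float:
    """Monthly token volume above which self-hosting wins."""
    saving_per_m = api_price_per_m - marginal_self_host_per_m
    return cluster_cost_per_month / saving_per_m * 1e6

# Hosted at $2.00/1M after a 30% cut; marginal self-host cost
# $0.40/1M (power + ops); $25k/mo amortized V4-class cluster:
print(f"{breakeven_tokens_per_month(2.00, 25_000, 0.40):,.0f}")
# -> 15,625,000,000: below ~15.6B tokens/mo the merchant API wins.
```

Note the direction: the 25-40% cuts push this threshold up, because cheaper merchant tokens raise the volume a self-hosted V4 deployment must carry before it pays for itself.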

Watchlist

On the radar next.

5 catalysts to watch, starting May 1-7.

Specific model-side catalysts in the next 7–30 days that would change the read materially. Watching these tells us whether the canopy is widening or thinning.

  • May 1-7

    Closed-frontier response to V4 / K2.6 coding scores.

    Anthropic and OpenAI both have credible reasons to push a coding-tuned point release to recover the SWE-Bench Pro lead. Watch for Opus 4.7-Codex or GPT-5.5-Codex within two weeks.

  • May 1-14

    Hyperscaler Q1 prints (MSFT / GOOG / META / AMZN).

    AI capex commentary will set the inference-pricing trajectory for the rest of Q2. Tracked in The AI Stack Weekly's capital-flow lens; relevant here as the upstream signal for serving costs.

  • May 5-20

    Mythos status update from Anthropic + AISI.

    If gating becomes permanent or extended, 'gated' becomes a durable status class. If gating lifts under conditions, a capability-gate playbook emerges — either way, the procurement read changes.

  • May 10-25

    Open-source reasoning successor cluster.

    DeepSeek-R2 set the bar; Qwen, Z.AI, and Moonshot all have R2-class candidates plausibly within 30 days. Watch for the second open-weights reasoning model to clear o-series parity.

  • May 15-31

    First gated-capability standardization proposals.

    Mythos creates pressure for a cross-vendor gating taxonomy (capability levels, evaluation protocols, release criteria). Watch for AISI / METR / OpenAI / Anthropic joint statements.

Edits this issue

  • Inaugural issue. Cadence: monthly recap covering April 2026 in one read; weekly cadence begins in May.
  • Tree delta sourced from content/llm-tree/models.yaml April 2026 release rows.
  • Benchmark scores cite Artificial Analysis leaderboards and vendor model cards as of publish date.
  • Scorecard reflects April 2026 close; will refresh weekly as new releases land.

About The Model Pulse

A weekly read on the software side of the AI stack. Anchored to the LLM Evolutionary Tree, which the brief annotates each week. The cross-stack flywheel (capital, hardware, networking) is covered in The AI Stack Weekly.

Authorship and sources

Compiled from public model cards, vendor blogs, leaderboards, and official lab announcements. Written by Brian Letort. Independent analysis. Not investment guidance.

Operate. Publish. Teach.