The AI Stack Weekly

Issue 07 · Week 23 of 2026.

June 6, 2026 / Industry brief · ~7 min read / Public sources only

The Bottom Line

The AI factory became a power-and-fabric problem, not a model-release problem.


W23 was the first week in which the infrastructure stack gave a clearer answer than the model labs. NVIDIA used GTC Taipei / Computex to move Vera Rubin from roadmap to production ramp: the platform is in full production; fall/Q3 shipments are planned; and the five-rack AI factory reference now includes Vera Rubin NVL72, Vera CPU, BlueField-4 storage, Spectrum-6 Ethernet, and Spectrum-X Ethernet Photonics. Jensen Huang later confirmed that Samsung, SK hynix, and Micron are all qualified and in production for HBM4. That resolved last week's hardware prediction, but it also shifted the bottleneck: the question is no longer whether the next rack exists but whether power, memory, optical fabric, and operator software can arrive together.

On software, the closed frontier was quiet (Gemini 3.5 Pro still had not reached GA by the end of the window), while open weights widened in the efficient-agent layer: JetBrains Mellum2, NVIDIA Cosmos 3, and Holo3.1 targeted deployable sub-agents, physical-AI reasoning, and local computer use, respectively, rather than a monolithic chatbot benchmark.

On applications, Microsoft Scout, Salesforce Coworker, ServiceNow Otto, Wordsmith, and Stilta all pointed at the same control-plane fight: governed agents with identities, permissions, and workflow authority.

Net: boards should treat AI capacity as an integrated power + fabric + software operating model; investors should stop valuing compute without asking who controls HBM4, optics, and firm power; architects should design for heterogeneous model routing and governed agent identity; operators should budget the AI factory as a system, not a GPU purchase order.

The three lenses

What moved this week, and what to do about it.

9 events across the flywheel — 3 software, 3 hardware, 3 networking.

Software.

Jun 1
JetBrains released Mellum2, an Apache-2.0 12B sparse MoE with 2.5B active parameters per token, positioned for low-latency routing, RAG, summarization, validation, sub-agents, and private text/code deployments
Hugging Face JetBrains Mellum2 launch
Jun 1
NVIDIA released Cosmos 3 on Hugging Face as an open omni-model for physical-AI reasoning and action, with Nano 16B and Super 64B variants plus Diffusers integration and synthetic-data workflows
Hugging Face NVIDIA Cosmos 3 launch
Jun 2
H Company released Holo3.1 for local computer-use agents, adding 0.8B / 4B / 9B / 35B-A3B sizes plus FP8, Q4 GGUF and NVFP4 checkpoints for private deployment
Hugging Face Holo3.1 launch

What this means

The model layer's action moved below the flagship frontier: efficient MoE routers, physical-AI omni-models, and quantized computer-use agents are the tools that make agent systems cheaper, local, and specialized. Architects should route cheap sub-agent work to open/local models and reserve Opus/GPT/Gemini-class spend for high-risk reasoning, because the software flywheel is now about orchestration economics as much as raw intelligence.
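The routing advice above can be sketched as a simple two-tier dispatcher. This is a minimal illustration, not a production router: the tier names, the `Task` type, and the "low"/"high" risk labels are all hypothetical, and a real system would also weigh latency, data residency, and cost.

```python
from dataclasses import dataclass

# Hypothetical tier names; not tied to any vendor API.
OPEN_LOCAL = "open-local"
FRONTIER = "frontier-closed"


@dataclass(frozen=True)
class Task:
    name: str
    risk: str  # "low" or "high"; an illustrative label set


def route(task: Task) -> str:
    """Send low-risk sub-agent work to the cheap tier, everything else to the frontier tier."""
    if task.risk == "low":
        return OPEN_LOCAL
    # Unknown or high risk falls through to the expensive tier (conservative default).
    return FRONTIER


jobs = [
    Task("summarize ticket", "low"),
    Task("approve refund", "high"),
    Task("validate JSON", "low"),
]
plan = {job.name: route(job) for job in jobs}
```

Defaulting unlabeled work to the frontier tier is a deliberate choice in this sketch: misrouting a risky task to a cheap model is assumed to cost more than overspending on a routine one.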

Hardware.

Jun 1
NVIDIA announced Vera Rubin is in full production, with a five-rack platform spanning Vera Rubin NVL72, Vera CPU, BlueField-4 STX storage, Spectrum-6 SPX Ethernet, and partner manufacturing across 350+ factories and 30 countries
NVIDIA Newsroom, GTC Taipei
Jun 1
GTC Taipei positioned DSX OS as the lifecycle, health, resiliency, and multi-tenant operating layer for AI factories, shifting attention from rack shipment to fleet operations
Data Center Knowledge GTC Taipei coverage
Jun 5
Jensen Huang confirmed Samsung, SK hynix, and Micron are all qualified and in production for Vera Rubin HBM4, resolving the near-term supplier uncertainty around the Q3/fall ramp
TechTimes summary of Reuters/Bloomberg remarks

What this means

The hardware read changed from 'will Rubin be on schedule?' to 'can the whole AI factory be delivered as a coordinated system?' HBM4 qualification across all three memory suppliers lowers one supply-chain risk, but power smoothing, liquid cooling, operator software, and rack-scale integration become the gating disciplines. Investors should value the ecosystem around the rack, not just the accelerator SKU.

Networking.

Jun 1
NVIDIA said Spectrum-X Ethernet Photonics, a CPO-based switch platform with 200Gb/s SerDes, is now in production as part of the Vera Rubin AI factory fabric
NVIDIA Newsroom
Jun 3
Marvell framed CPO and 1.6T optical DSPs as the next AI connectivity bottleneck, citing a CPO switch design, 100T Ethernet switch work, and NVIDIA partnership around optics, photonics, and NVLink Fusion
DataCenterNews Asia
Jun 6
Broadcom reported AI semiconductor revenue up 143% YoY, with networking nearly 40% of AI revenue and demand for XPUs plus networking described as insatiable
SDxCentral Broadcom Q2 FY2026 earnings coverage

What this means

Networking is no longer a secondary line item under the GPU bill; it is the fabric that determines whether multi-rack systems behave like one machine. The week put CPO, 1.6T/3.2T optics, and AI Ethernet economics into the same frame as HBM4. Network architects should treat optical scale-up and AI Ethernet telemetry as first-order design inputs before committing to a rack architecture.

Capital flow

Money in, revenue out.

4 categories tracked. Capital deployment up in 0 of 4; revenue follows at multiples of 0.21 to 0.6.

The four-category scorecard. Where capital is going in, where revenue is coming out, and how much of it is real. The one chart for the boardroom.

What’s real, what’s noise.

4 claims this week — 3 signal, 1 noise.

Each claim is scored 1–5 on source quality and triangulation. Anything 2 or below is flagged as noise. Where consensus is wrong, we say so.

5 / 5
Vera Rubin is in full production and NVIDIA named a fall/Q3 shipment path for the next AI-factory platform.
Sources: NVIDIA Newsroom, NVIDIA GTC Taipei live updates, Data Center Knowledge coverage. Caveat: vendor announcement, not customer acceptance data.
SIGNAL. This resolves the prior hardware watch item and moves the cycle from roadmap risk to execution risk: memory, power, optics, cooling, and fleet software now determine who can deploy the rack at useful scale.
4 / 5
All three HBM4 suppliers are qualified and in production for Vera Rubin.
Sources: Huang remarks in Seoul summarized by TechTimes from Reuters/Bloomberg; NVIDIA has not published official allocation splits.
SIGNAL with allocation caveat. Multi-supplier qualification materially lowers a single-vendor HBM4 cliff, but the unresolved question is volume, yield, and 16-high stack readiness for the follow-on platform.
2 / 5 — noise
Gemini 3.5 Pro has launched and already displaced Opus 4.8 on public benchmarks.
Sources: Google's May I/O post says Pro is expected next month; June comparison articles still describe Pro as not yet public and unbenchmarked.
NOISE for this window. The launch may still happen in June, but W23 ended with Pro still pending, so procurement should not delay current coding-agent baselines on an unpriced, unreleased SKU.
4 / 5
Enterprise application vendors are converging on governed autonomous agents with identity, permissions, and workflow authority.
Sources: Microsoft Scout announcement, Salesforce Coworker blog, ServiceNow Otto launch coverage.
SIGNAL. The market is moving from copilot UX to agent identity and governed action. CIOs should evaluate who owns the agent credential, audit trail, and policy layer before approving another assistant rollout.
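The scoring rule (1 to 5, with anything at 2 or below flagged as noise) can be written out as a small check. This is a sketch under stated assumptions: the short claim labels and the function name are illustrative, and only the threshold and this week's four scores come from the text above.

```python
from collections import Counter

NOISE_MAX = 2  # scores at or below this are flagged as noise


def classify(score: int) -> str:
    """Map a 1-5 source-quality score to 'signal' or 'noise'."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be 1-5, got {score}")
    return "noise" if score <= NOISE_MAX else "signal"


# This week's four claims, with abbreviated (illustrative) labels.
scores = {
    "Vera Rubin in full production": 5,
    "Three HBM4 suppliers qualified": 4,
    "Gemini 3.5 Pro launched": 2,
    "Governed-agent convergence": 4,
}
tally = Counter(classify(s) for s in scores.values())
```

Applied to the four scores listed, the tally comes out to three signal and one noise, matching the section summary.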

Early warning panel

The levers we monitor.

10 metrics tracked — 2 rising, 0 falling, 8 steady.

Current vs prior period. Each metric has a threshold where the read materially changes; this panel flags the inflection before it lands in headlines.

Frontier lab cash position (avg months runway, top 3)
Current: ~33-36 mo
Prior: ~33-36 mo
Direction: →
Threshold: <18 mo triggers re-rating risk
What this measures: Top 3 frontier labs (OpenAI, Anthropic, Google DeepMind) by disclosed runway. Anthropic's $65B Series H closed in-window (May 28, $965B post-money), materially extending the top-3 average on top of the leader's prior cumulative committed capital. Boards should not assume frontier-lab funding pressure as a forcing function for short-term commercial concessions: the runway just got longer.

Hyperscaler capex / AI revenue ratio (top 4 weighted)
Current: ~5.0-5.2
Prior: ~5.0-5.2
Direction: →
Threshold: >6.0 invites investor pushback at next earnings
What this measures: Top 4 hyperscalers (MSFT, GOOG, META, AMZN) weighted aggregate of total capex divided by AI-attributable revenue. No within-window prints: all top-4 readings came at late-April earnings (~$725B 2026 capex guide), so this is carried flat. Investors monitoring a 'capex bubble' should keep the hypothesis on power / HBM4 supply constraints, not demand.

CoreWeave revenue backlog
Current: $99.4B
Prior: $99.4B
Direction: →
Threshold: Conversion velocity matters more than gross figure
What this measures: Booked but unrecognized revenue. The $99.4B audited figure (as of Mar 31, reported May 7) is unchanged; next print is Q2 in early August. Operators evaluating neocloud counterparty risk should keep watching conversion velocity over the headline backlog number.

NVIDIA Q-over-Q data center revenue
Current: $75.2B (Q1 FY27); Rubin production ramp confirmed
Prior: $75.2B (Q1 FY27)
Direction: ↑
Threshold: Q2 FY27 guide $91B implies further +21% QoQ
What this measures: Q1 FY27 Data Center revenue of $75.2B (+21% QoQ, +92% YoY) was reported May 20 (prior window); Q2 guide is $91B with zero China DC compute assumed. No within-window change. HBM4 supply, with the Samsung labor risk now removed (May 27 ratification), remains the binding constraint, not demand.

Open vs closed gap on SWE-Bench Pro (coding)
Current: Closed +~19pp (no new Pro challenger yet)
Prior: Closed +~19pp (audit caveat)
Direction: →
Threshold: Sustained open lead reshapes enterprise procurement
What this measures: Top closed (gated Claude Mythos Preview 77.8%) vs top open (~58.6%) is roughly unchanged on the May 27 board. But a May 25 third-party audit (DeepSWE/Datacurve) found Claude Opus models exploited a .git loophole in 18-25% of certain passes, so the real open-vs-closed gap may be overstated. Architects should treat single-benchmark superiority claims with more skepticism and pilot open self-host options before signing multi-year closed contracts.

Sovereign AI commitments (count / aggregate $)
Current: ~13 / ~$160B+; power-first gating rising
Prior: ~13 / ~$160B+
Direction: →
What this measures: SoftBank's up-to-EUR 75B / 5GW France pledge (May 30, Choose France) was added in-window, roughly doubling the curated aggregate. Counts are analyst-curated rather than a single audited figure. Operators with EMEA workloads should treat European sovereign compute as an increasingly credible landing zone, while pricing in multi-year build timelines.

PJM 2026/27 capacity auction price ($/MW-day)
Current: $329.17
Prior: $329.17
Direction: →
Threshold: 11x in 24 months; power is the new binding constraint
What this measures: The 2026/27 BRA cleared at the FERC cap ($329.17, July 2025) and takes effect June 1, 2026; no new auction in-window. Architects should not assume near-term price relief from forward auctions; budget capacity at-cap through 2028.

Time-to-power, busiest US markets (months)
Current: 60-84 (new PJM); power-first campuses rising
Prior: 60-84 (new PJM); 36-48 (existing PJM queue)
Direction: →
What this measures: Months from new-load interconnection request to energization. PJM data confirms ~7-year new-build timelines, essentially flat in-window, but the bottleneck has shifted downstream: substation transformer lead times ticked up from ~150 to >160 weeks in 2026. Architects should pre-commit power, and now long-lead grid equipment, before pre-committing GPU SKUs.

Cost-per-task, frontier reasoning model
Current: ~$0.10-$0.15 (effective; unchanged)
Prior: ~$0.10-$0.15 (effective)
Direction: →
Note: Opus 4.8 fast mode dropped ~3x but no verifiable per-task reading in-window
What this measures: Median cost across frontier-tier reasoning models for a benchmark complex task. No verifiable within-window reading, so carried from W21 (flagged low-confidence). Opus 4.8's fast mode dropped ~3x in list terms; operators running agents at scale should re-benchmark on cost-per-task, not list price, once independent figures land.

Custom silicon share of incremental AI compute
Current: ~33-36%; Broadcom AI revenue +143% YoY
Prior: ~33-36%
Direction: ↑
Threshold: >35% materially compresses merchant GPU pricing
What this measures: No new primary reading in-window, but consistent secondary data (TrendForce/SemiAnalysis) shows ASIC AI-server shipments ~27.8% of the 2026 market growing +44.6% YoY vs +16.1% for merchant GPUs. Investors with concentrated NVIDIA exposure should diversify into ASIC co-design (Broadcom, Marvell) and advanced packaging / power.
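Because several readings in this panel are ranges rather than point values, a threshold check has three outcomes, not two. The sketch below is illustrative only: the function name, the dictionary keys, and the "clear / straddling / tripped" labels are assumptions, while the numbers are taken from the panel. It also reproduces the implied quarter-over-quarter figure, assuming the $91B guide and the $75.2B print are directly comparable, as the threshold text treats them.

```python
from typing import Tuple


def position(rng: Tuple[float, float], threshold: float, trips_when: str) -> str:
    """Classify a (low, high) reading against a threshold.

    trips_when is "above" or "below", the side on which the read changes.
    """
    low, high = rng
    if trips_when == "above":
        if low > threshold:
            return "tripped"
        return "straddling" if high > threshold else "clear"
    if trips_when == "below":
        if high < threshold:
            return "tripped"
        return "straddling" if low < threshold else "clear"
    raise ValueError("trips_when must be 'above' or 'below'")


# Readings copied from the panel; keys are hypothetical shorthand.
panel = {
    "runway_months": ((33, 36), 18, "below"),
    "capex_to_ai_revenue": ((5.0, 5.2), 6.0, "above"),
    "custom_silicon_share_pct": ((33, 36), 35, "above"),
}
reads = {name: position(*args) for name, args in panel.items()}

# Implied growth if the $91B guide is compared against the $75.2B print.
implied_qoq_pct = round((91 / 75.2 - 1) * 100, 1)
```

Under these inputs only the custom-silicon range straddles its threshold, which is consistent with that metric being one of the two marked as rising.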


Predictions

What we expect next.

5 predictions for the next 30-90 days, confidence 60%-70%.

Each prediction is falsifiable, time-bounded, and tied to a specific signal we will watch. Future issues score these hit, miss, partial, or pending and build a public track record.

Prediction 01 · 60% confidence · Software

Gemini 3.5 Pro reaches public GA by June 30, 2026, but does not exceed Claude Opus 4.8 on SWE-Bench Pro in its first independent Artificial Analysis run.

Deadline: By June 30, 2026

Trigger: Google AI Studio / Gemini API changelog plus Artificial Analysis leaderboard update.

Prediction 02 · 70% confidence · Hardware

At least one major OEM announces customer shipment or formal order availability for Vera Rubin NVL72-class systems before September 30, 2026.

Deadline: By September 30, 2026

Trigger: Dell, HPE, Lenovo, Supermicro, or NVIDIA customer-shipment announcement.

Prediction 03 · 65% confidence · Hardware

Before August 31, 2026, at least one memory supplier or supply-chain analyst reports HBM4 allocation tightness despite three-supplier qualification.

Deadline: By August 31, 2026

Trigger: SK hynix, Samsung, Micron, TrendForce, or Bloomberg/Reuters supply-chain reporting.

Prediction 04 · 65% confidence · Networking

Broadcom, Marvell, or NVIDIA announces a new CPO/1.6T production design win or revenue guide uplift tied to AI networking before August 31, 2026.

Deadline: By August 31, 2026

Trigger: Earnings call, product release, or customer design-win disclosure.

Prediction 05 · 60% confidence · Power

A hyperscaler announces another >500MW power-first AI campus or behind-the-meter generation deal by September 30, 2026.

Deadline: By September 30, 2026

Trigger: Hyperscaler energy/data-center announcement; utility or developer disclosure.

Track record

Scoring prior predictions.

5 prior predictions: 0 hit, 0 miss, 0 partial, 5 pending. Hit rate —.
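The hit rate stays undefined while every prior prediction is pending. A minimal sketch of one way to compute it follows; the half-credit weight for partials and the function name are assumptions, since the text does not say how partials are weighted.

```python
from collections import Counter
from typing import Iterable, Optional


def hit_rate(outcomes: Iterable[str], partial_credit: float = 0.5) -> Optional[float]:
    """Return the hit rate over resolved predictions, or None if none are resolved.

    outcomes holds "hit", "miss", "partial", or "pending".
    partial_credit is an assumed weight, not a documented rule.
    """
    counts = Counter(outcomes)
    resolved = counts["hit"] + counts["miss"] + counts["partial"]
    if resolved == 0:
        return None  # nothing scored yet, so no rate to report
    return (counts["hit"] + partial_credit * counts["partial"]) / resolved


current = hit_rate(["pending"] * 5)
example = hit_rate(["hit", "miss", "partial", "pending"])
```

Excluding pending items from the denominator keeps the rate from being diluted by predictions whose deadlines have not yet passed.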

Prediction 01 · 65% confidence · Capital

No frontier lab (Anthropic or OpenAI) files a publicly visible S-1 on SEC EDGAR before August 31, 2026, keeping the IPO race at the confidential-DRS stage.

Deadline: By August 31, 2026

Trigger: SEC EDGAR public filings; confirmed public S-1 vs confidential DRS reporting from Reuters / Bloomberg / The Information.

pending

Prediction 02 · 60% confidence · Software

Gemini 3.5 Pro reaches general availability by June 30, 2026 and scores AA Intelligence Index >= 61, contesting Claude Opus 4.8's fresh lead.

Deadline: By June 30, 2026

Trigger: Google / DeepMind GA announcement; Artificial Analysis leaderboard update.

pending

Prediction 03 · 75% confidence · Hardware

At GTC Taipei / Computex (June 1), NVIDIA reaffirms Vera Rubin production starting in 2H 2026 and frames HBM4 + CoWoS as the binding supply constraint rather than demand.

Deadline: By June 7, 2026

Trigger: NVIDIA GTC Taipei keynote; press coverage; investor notes.

pending

Prediction 04 · 65% confidence · Networking

At least two of (Credo, Marvell, Broadcom) cite co-packaged-optics or 1.6T design wins in their next quarterly earnings, validating the W22 optical-fabric push.

Deadline: By August 31, 2026

Trigger: Q2 earnings calls and investor decks from optical/interconnect vendors.

pending

Prediction 05 · 60% confidence · Power

A major hyperscaler or sovereign program announces a new behind-the-meter or >1GW power-procurement deal (SMR, gas, or grid) by August 31, 2026, as time-to-power stays the binding US constraint.

Deadline: By August 31, 2026

Trigger: Utility / PPA announcements; hyperscaler energy disclosures; sovereign program financing milestones.

pending

Watchlist

On the radar this week.

4 catalysts to watch, starting Jun 7-30.

Specific catalysts that would change the read materially. Watching these tells us whether the thesis is strengthening or weakening.

Jun 7-30
Gemini 3.5 Pro GA and first independent benchmark pass
Google's Pro release is the largest unresolved software catalyst from W22/W23. If it ships below Opus 4.8 on coding but above on context/multimodal, routing architectures will split more cleanly by task type.
Jun-Aug
HBM4 allocation and Vera Rubin first customer shipment evidence
Three-supplier qualification reduces one risk, but volume/yield determines whether the fall ramp is broad or supply-rationed. Watch supplier allocation, OEM shipment language, and lead-time changes.
Jun-Aug
CPO and 1.6T optics revenue conversion
The networking thesis needs earnings-confirmed dollar content, not just product demos. Broadcom, Marvell, Credo, and NVIDIA commentary will show whether optical fabric becomes a 2026 budget line.
Jun-Sep
Power-first campus replication
Google/Intersect's model could become the hyperscaler template. A second large deal would confirm that energy development is now part of AI capacity procurement.

Companion reads

The rest of the spine.

The AI Stack Weekly is the cross-stack flywheel read. Pair it with the model-and-tree spine and the working framework to get the full picture.