Software.
JetBrains released Mellum2, an Apache-2.0 12B sparse MoE with 2.5B active parameters per token, positioned for low-latency routing, RAG, summarization, validation, sub-agents, and private text/code deployments
Hugging Face JetBrains Mellum2 launch
NVIDIA released Cosmos 3 on Hugging Face as an open omni-model for physical-AI reasoning and action, with Nano 16B and Super 64B variants plus Diffusers integration and synthetic-data workflows
Hugging Face NVIDIA Cosmos 3 launch
H Company released Holo3.1 for local computer-use agents, adding 0.8B / 4B / 9B / 35B-A3B sizes plus FP8, Q4 GGUF and NVFP4 checkpoints for private deployment
Hugging Face Holo3.1 launch
What this means
The model layer's action moved below the flagship frontier: efficient MoE routers, physical-AI omni-models, and quantized computer-use agents are the tools that make agent systems cheaper, local, and specialized. Architects should route cheap sub-agent work to open/local models and reserve Opus/GPT/Gemini-class spend for high-risk reasoning, because the software flywheel is now about orchestration economics as much as raw intelligence.