Flux
DiffusionGemma

DiffusionGemma

DiffusionGemma Last May Google briefly released an experimental Gemini Diffusion model. I tried the preview at the time and recorded it running at 857 tokens/second. It was an exciting model, but Google made no further announcements about it. That research has returned in the best possible way: as a new open weight (Apache 2 licensed) Gemma model, google/diffusiongemma-26B-A4B-it. NVIDIA are currently hosting the model for free on their NIM cloud API. I used that API to generate this pelican,…

Simon Willison's Weblog
Quoting Jeremy Howard

Quoting Jeremy Howard

Easy solution to slow down recursive AI self improvement: The lab with the top-ranked model must agree THEY must not use it for working on frontier AI But everyone else should have access to it. By definition, this means the frontier doesn't advance. It also has the critical benefit of avoiding a dangerous power imbalance. Anthropic has chosen the opposite of the safe path: they are allowing themselves, the current top lab, to use their top model for frontier AI research. They've said they'll…

Simon Willison's Weblog
The PM’s Playbook for Shipping AI Features That Actually Work in Production

The PM’s Playbook for Shipping AI Features That Actually Work in Production

The demo to production Death Valley If you’ve worked on an AI feature, you know the feeling. You start building something that you are excited about, set launch timelines. The model spits out a perfect response, the prototype works magically, and everybody in the room is mentally calculating how big this product will be when […]

O'Reilly Radar — AI/ML
If Claude Fable stops helping you, you'll never know

If Claude Fable stops helping you, you'll never know

If Claude Fable stops helping you, you'll never know Jonathon Ready highlights one of the more eyebrow-raising details from the 319 page system card for Fable 5 and Mythos 5. Here's a longer excerpt, highlights mine: In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training…

Simon Willison's Weblog
Initial impressions of Claude Fable 5

Initial impressions of Claude Fable 5

I didn't have early access to today's Claude Fable 5 release, but I've spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast. It's slow, expensive and has been quite happily churning through everything I've thrown at it so far. As is frequently the case with current frontier models the challenge is finding tasks that it can't do. First, let's review the key characteristics. Anthropic claim that Claude Fable 5 offers the same…

Simon Willison's Weblog
Setting a custom price for a model in AgentsView

Setting a custom price for a model in AgentsView

TIL: Setting a custom price for a model in AgentsView I've been really enjoying AgentsView by Wes McKinney as a tool for exploring my token usage across different coding agents running on my laptop. Claude Fable 5 came out today and wasn't yet included in the pricing database AgentsView uses. I used Fable to reverse-engineer AgentsView and figured out this recipe for setting custom prices. Here's my Claude Fable 5 usage for today so far, plotted by AgentsView as a treemap across my different…

Simon Willison's Weblog
Quoting Andrej Karpathy

Quoting Andrej Karpathy

I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). — Andrej…

Simon Willison's Weblog
Siri AI at WWDC 2026

Siri AI at WWDC 2026

Given how badly burned anyone who took Apple's 2024 WWDC Apple Intelligence announcements at face value was, I'm holding to a strict "I'll believe it when I see it" policy for everything they announced today. The new Siri AI features do at least look feasible with today's technology, especially since Apple are licensing a custom Gemini-derived model that they can run on their own Private Cloud Compute. It sounds like they'll be taking advantage of vision LLMs to extract information from the…

Simon Willison's Weblog
Announcing major new donations, and recapping the 2025 fundraiser

Announcing major new donations, and recapping the 2025 fundraiser

This past December, we ran our first fundraiser in six years, setting an ambitious goal of $6M. We ended up receiving a total of $1.8M from small donors and $1.6M in matching from the Survival and Flourishing Fund (SFF) for a total of $3.4M. We’re incredibly grateful for all this support! In the rest of […] The post Announcing major new donations, and recapping the 2025 fundraiser appeared first on Machine Intelligence Research Institute.

MIRI Blog
Long-Running Agents

Long-Running Agents

The following article originally appeared on Addy Osmani’s blog and is being reposted here with the author’s permission. A long-running AI agent can keep making progress over hours, days, or weeks. It can do this across many context windows and sandboxes, recover from failure, leave structured artifacts behind, and resume where it left off. For […]

O'Reilly Radar — AI/ML