A Smarter Claude Model Burns More Tokens, Not Fewer!
...and how to fix it using Karpathy's context engineering principles.
When I started this series, everyone was going crazy for coding agents.
Remember that MIT study that showed that the ROI for generative AI wasn’t really there for most businesses?
As enterprise AI agent adoption scales, the absence of centralized, organization-level tool infrastructure is producing compounding costs. When adoption is built around optimizing for deployment speed, enterprises expose themselves to a combination of risks: duplicated engineering effort, security exposure, and operational opacity. Every enterprise needs its own shared tool registry, one that reflects its specific […]
How the best builders in tech are all converging on AI second brains
Sign of things to come?
Release: llm-gemini 0.31 gemini-3.1-flash-lite is no longer a preview. Here's my write-up of the Gemini 3.1 Flash-Lite Preview model back in March. I don't believe this new non-preview model has changed since then. Tags: llm-release, gemini, llm, google, generative-ai, ai, llms
Tool: Big Words I'm using my vibe coded macOS presentations tool to put together a talk, and I wanted to add a slide with some text on it. The tool only accepts URLs, so I put together a quick page that accepts query string arguments and turns them into a simple slide. Here's an example: https://tools.simonwillison.net/big-words?text=simonwillison.net&gradient=1&size=9.5 Double click or double tap the page to access a form for modifying the different options. Tags: vibe-coding, tools
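A slide URL like the one above can also be assembled programmatically. The following is a minimal sketch, assuming only the three query string arguments visible in the example URL (`text`, `gradient`, `size`); the helper name `big_words_url` is hypothetical, and the tool may accept other options not shown here.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in the post.
BASE_URL = "https://tools.simonwillison.net/big-words"


def big_words_url(text, gradient=None, size=None):
    """Build a Big Words slide URL (hypothetical helper).

    Only the parameters seen in the example URL are supported here;
    optional ones are left out of the query string when not given.
    """
    params = {"text": text}
    if gradient is not None:
        params["gradient"] = gradient
    if size is not None:
        params["size"] = size
    # urlencode percent-encodes values, so text with spaces is safe to pass.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(big_words_url("simonwillison.net", gradient=1, size=9.5))
```

Called with the values from the post, this reproduces the example URL; text containing spaces or punctuation is encoded automatically.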
Behind the Scenes Hardening Firefox with Claude Mythos Preview Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox: Suddenly, the bugs are very good Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap…
There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far was the deal they've struck with SpaceX/xAI to use "all of the capacity of their Colossus data center". As I mentioned in my live blog of the keynote, that's the one with the particularly bad environmental record. The gas turbines installed to power the facility initially ran without Clean Air Act permits or pollution control devices, which they got away with by classifying…
The era of training frontier models and then releasing them whenever you wanted?
Every data leader has a version of this story. A regulatory audit surfaces a metric that doesn’t match across systems. A board member catches conflicting revenue numbers in two reports presented back-to-back. An AI tool generates a recommendation based on data that hasn’t been governed since the analyst who built it left the company two […]
Tool: GitHub Repo Stats One of the things I always look for when evaluating a new GitHub repository is the number of commits it has... but that number isn't visible on GitHub's mobile site layout. I built this tool to fix that, using this prompt: Given a GitHub repo URL or foo/bar repo ID show information about that repo absorbed via wither REST or graphql CORS fetch() including the number of commits in the repo and other useful stats Example output for simonw/datasette and simonw/llm. Tags:…
...using a 100% open-source, self-hostable stack.