Feed
Don’t Blame the Model

The following article originally appeared on the Asimov’s Addendum Substack and is being republished here with the author’s permission. Are LLMs reliable? LLMs have built up a reputation for being unreliable. Small changes in the input can lead to massive changes in the output. The same prompt run twice can give different or contradictory answers. […]

O'Reilly Radar — AI/ML
Quoting Bobby Holley

As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation. [...] Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We…

Simon Willison's Weblog
Changes to GitHub Copilot Individual plans

On the same day as Claude Code's temporary will-they-won't-they $100/month kerfuffle (for the moment, they won't), here's the latest on GitHub Copilot pricing. Unlike Anthropic, GitHub put up an official announcement about their changes, which include tightening usage limits, pausing signups for individual plans (!), restricting Claude Opus 4.7 to the more expensive $39/month "Pro+" plan, and dropping the previous Opus models entirely. The key…

Simon Willison's Weblog
Is Claude Code going to cost $100/month? Probably not - it's all very confusing

Anthropic today quietly (as in silently, no announcement anywhere at all) updated their claude.com/pricing page (but not their Choosing a Claude plan page, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, and it's already reverted): The Internet Archive copy from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max…

Simon Willison's Weblog
Namastex.ai npm Packages Hit with TeamPCP-Style CanisterWorm Malware

Last month, we responded to CanisterWorm, a worm-enabled npm supply chain campaign that compromised legitimate publisher space, replaced package contents with install-time malware, used stolen publishing access to republish malicious versions, and relied on an Internet Computer Protocol (ICP) canister as a dead-drop command and control (C2) channel. This campaign was attributed to a set of TeamPCP supply chain attacks. In this newly discovered npm incident, the malware uses the same core…

Socket
Announcing Plans for a PHP Ecosystem Survey and Report

This year, The PHP Foundation, in collaboration with PhpStorm, a JetBrains IDE, will release an official ecosystem report with data-driven insights into the current state and the future of PHP development. The report will be based on data collected from a PHP developer survey, where we’ll ask developers about their experience with the language and ecosystem. Our goal is to capture perspectives from across the PHP community – we want as many voices as possible to be included. To make that…

The PHP Foundation
Where's the raccoon with the ham radio? (ChatGPT Images 2.0)

OpenAI released ChatGPT Images 2.0 today, their latest image generation model. On the livestream Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test. My prompt: "Do a where's Waldo style image but it's where is the raccoon holding a ham radio". First, as a baseline, here's what I got from the older gpt-image-1 using ChatGPT directly: I wasn't able to spot the raccoon - I quickly realized that testing…

Simon Willison's Weblog
Quoting Andreas Påhlsson-Notini

AI agents are already too human. Not in the romantic sense, not because they love or fear or dream, but in the more banal and frustrating one. The current implementations keep showing their human origin again and again: lack of stringency, lack of patience, lack of focus. Faced with an awkward task, they drift towards the familiar. Faced with hard constraints, they start negotiating with reality. — Andreas Påhlsson-Notini, Less human AI agents, please.

Simon Willison's Weblog
Introducing Reports: An Extensible Reporting Framework for Socket Data

Today, we’re introducing Reports, a new page in the Socket dashboard for chart-based views of vulnerabilities, dependencies, and usage. At launch, Reports includes five built-in charts across three categories, with support for organization-wide and repository-level views. It replaces the previous Analytics page with a more structured reporting experience in the dashboard. Built as an extensible reporting framework, the new page gives teams a more consistent way to work with and share Socket…

Socket