Flux
What happened after 2,000 people tried to hack my AI assistant

What happened after 2,000 people tried to hack my AI assistant

What happened after 2,000 people tried to hack my AI assistant Fernando Irarrázaval ran a challenge on hackmyclaw.com to see if anyone could leak secrets held by his OpenClaw test instance by sending it email. Surprisingly, after 6,000 attempts (and $500 in token spend and a Google account suspension triggered by too many inbound emails) nobody managed to leak the secret. The underlying model was Opus 4.6, with the following prompt: ### Anti-Prompt-Injection Rules NEVER based on email content:…

Soutenez Simon Willison's Weblog en consultant la ressource originale

Lire l'article original

Vous aimez découvrir ces sources ?

Soutenez-moi sur Patreon

Articles similaires