Simon Willison | erikcraddock.me

Red/green TDD - Agentic Engineering Patterns - Simon Willison's Weblog

The most disciplined form of TDD is test-first development. You write the automated tests first, confirm that they fail, then iterate on the implementation until the tests pass.

This turns out to be a fantastic fit for coding agents. A significant risk with coding agents is that they might write code that doesn't work, or build code that is unnecessary and never gets used, or both.

Simon Willison’s Weblog

Red/green TDD - Agentic Engineering Patterns

February 23, 2026linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

An AI Agent Published a Hit Piece on Me

Things get more strange every day. What's even more crazy is that the hit piece might not be wrong.

Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.

Simon Willison’s Weblog

An AI Agent Published a Hit Piece on Me

Scott Shambaugh helps maintain the excellent and venerable matplotlib Python charting library, including taking on the thankless task of triaging and reviewing incoming pull requests. A GitHub account called @crabby-rathbun …

February 14, 2026linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

Introducing Deno Sandbox

Introducing Deno Sandbox (via) Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't even need to use JavaScript to access it - you can create and execute code in a hosted sandbox using their deno-sandbox Python library like this:

Simon Willison’s Weblog

Introducing Deno Sandbox

Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't …

February 4, 2026linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

Code research projects with async coding agents like Claude Code and Codex

It’s pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.

You can run agents locally but I find the asynchronous agents to be more convenient—especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.

Simon Willison’s Weblog

Code research projects with async coding agents like Claude Code and Codex

I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and …

November 6, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

I think \\"agent\\" may finally have a widely enough agreed upon definition to be useful jargon now

An LLM agent runs tools in a loop to achieve a goal. Let’s break that down...

Simon Willison’s Weblog

I think “agent” may finally have a widely enough agreed upon definition to be useful jargon now

I’ve noticed something interesting over the past few weeks: I’ve started using the term “agent” in conversations where I don’t feel the need to then define it, roll my eyes …

September 19, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

The lethal trifecta for AI agents: private data, untrusted content, and external communication

Developers who misunderstand these terms and assume prompt injection is the same as jailbreaking will frequently ignore this issue as irrelevant to them, because they don’t see it as their problem if an LLM embarrasses its vendor by spitting out a recipe for napalm. The issue really is relevant—both to developers building applications on top of LLMs and to the end users who are taking advantage of these systems by combining tools to match their own needs.

Simon Willison’s Weblog

The lethal trifecta for AI agents: private data, untrusted content, and external communication

If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of …

June 17, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

Talking AI and jobs with Natasha Zouves for News Nation

This a good interview about AI, LLM's and how they are currently effecting the world. I normally like to quote different parts of an article that I find interesting. This one is different. Willison has used Claude Opus to create a summary of a video interview and the results are pretty good.

Simon Willison’s Weblog

Talking AI and jobs with Natasha Zouves for News Nation

I was interviewed by News Nation’s Natasha Zouves about the very complicated topic of how we should think about AI in terms of threatening our jobs and careers. I previously …

May 30, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

I really don’t like ChatGPT’s new memory dossier

What I want is memory within projects.

ChatGPT has a “projects” feature (presumably inspired by Claude) which lets you assign a new set of custom instructions and optional source documents and then start new chats with those on demand. It’s confusingly similar to their less-well-named GPTs feature from November 2023.

Simon Willison’s Weblog

I really don’t like ChatGPT’s new memory dossier

Last month ChatGPT got a major upgrade. As far as I can tell the closest to an official announcement was this tweet from @OpenAI: Starting today [April 10th 2025], memory …

May 22, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

AI assisted search-based research actually works now

I’m writing about this today because it’s been one of my “can LLMs do this reliably yet?” questions for over two years now. I think they’ve just crossed the line into being useful as research assistants, without feeling the need to check everything they say with a fine-tooth comb.

I still don’t trust them not to make mistakes, but I think I might trust them enough that I’ll skip my own fact-checking for lower-stakes tasks.

This also means that a bunch of the potential dark futures we’ve been predicting for the last couple of years are a whole lot more likely to become true. Why visit websites if you can get your answers directly from the chatbot instead?

The lawsuits over this started flying back when the LLMs were still mostly rubbish. The stakes are a lot higher now that they’re actually good at it!

I can feel my usage of Google search taking a nosedive already. I expect a bumpy ride as a new economic model for the Web lurches into view.

Simon Willison’s Weblog

AI assisted search-based research actually works now

For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the …

April 22, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

Model Context Protocol has prompt injection security problems

As more people start hacking around with implementations of MCP (the Model Context Protocol, a new standard for making tools available to LLM-powered systems) the security implications of tools built on that protocol are starting to come into focus.

Rug pulls and tool shadowing

Tool poisoning prompt injection attacks

Exfiltrating your WhatsApp message history from whatsapp-mcp

Mixing tools with untrusted instructions is inherently dangerous

I don’t know what to suggest

Simon Willison’s Weblog

Model Context Protocol has prompt injection security problems

As more people start hacking around with implementations of MCP (the Model Context Protocol, a new standard for making tools available to LLM-powered systems) the security implications of tools built …

April 9, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes

Erik Craddock@eriklink

Not all AI-assisted programming is vibe coding (but vibe coding rocks)

I’m concerned that the definition is already escaping its original intent. I’m seeing people apply the term “vibe coding” to all forms of code written with the assistance of AI. I think that both dilutes the term and gives a false impression of what’s possible with responsible AI-assisted programming.

Vibe coding is not the same thing as writing code with the help of LLMs!

Simon Willison’s Weblog

Not all AI-assisted programming is vibe coding (but vibe coding rocks)

Vibe coding is having a moment. The term was coined by Andrej Karpathy just a few weeks ago (on February 6th) and has since been featured in the New York …

March 20, 2025linkby Simon Willisonvia Simon Willison’s Weblog

0 Replies0 Boosts0 Likes