Feed

Page 5 of 16

TDD is more important than ever

justin.searls.co

Why is verification so important? Because, if you tell an agent to do something that it can't independently verify, then—just like a human developer—the best it can do is guess. And because agents work really fast, each action based on a guess is quickly succeeded by an even more tenuous guess. And then a guess of a guess of a guess, and so on. Very often, when I return to my desk after 30 minutes and find that an agent made a huge mess of the code, I come to realize that the AI didn't suddenly "get dumb," but rather that an application server crashed or a web browser stopped responding and the agent was forced to code speculatively and defensively.

Link

Import AI 435: 100k training runs; AI systems absorb human power; intelligence per watt

jack-clark.net

This gives me an eerie feeling. In most movies where the world ends there’s a bit at the beginning of the movie where one or two people point out that something bad is going to happen – an asteroid is about to hit the planet, a robot has been sent back in time to kill them, a virus is extremely contagious and dangerous and must be stamped out – and typically people will disbelieve them until either it’s a) too late, or b) almost too late. Reading papers by scientists about AI safety feels a lot like this these days. Though perhaps the difference with this movie is rather than it being one or two fringe characters warning about what is coming it’s now a community of hundreds of highly accomplished scientists, including Turing Award and Nobel Prize winners.

Link

Don't Fight the Weights

www.dbreunig.com

Today, in-context learning is a standard trick in any context engineer’s toolkit. Provide a few examples illustrating what you want back, given an input, and trickier tasks tend to get more reliable. They’re especially helpful when we need to induce a specific format or style or convey a pattern that’s difficult to explain.

When you’re not providing examples, you’re relying on the model’s inherent knowledge base and weights to accomplish your task. We sometimes call this “zero-shot prompting” (as opposed to few-shot) or “instruction-only prompting”.
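A minimal sketch of the difference, using an invented sentiment task. Only prompt construction is shown; the model call itself, and the task, are stand-ins:

```python
def zero_shot(text: str) -> str:
    """Instruction-only: rely entirely on the model's weights."""
    return (
        "Classify the sentiment of this review as positive or negative.\n\n"
        f"Review: {text}\nSentiment:"
    )

def few_shot(text: str) -> str:
    """In-context learning: a few examples pin down format and style."""
    examples = [
        ("The battery died within a week.", "negative"),
        ("Setup took two minutes and it just works.", "positive"),
    ]
    shots = "\n\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return (
        "Classify the sentiment of each review as positive or negative.\n\n"
        f"{shots}\n\nReview: {text}\nSentiment:"
    )
```

The few-shot version carries the same instruction, but the worked examples also demonstrate the exact output vocabulary and layout, which is what makes trickier tasks more reliable.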

Link

Giving your AI a Job Interview

www.oneusefulthing.org

You can’t rely on vibes to understand these patterns, and you can’t rely on general benchmarks to reveal them. You need to systematically test your AI on the actual work it will do and the actual judgments it will make. Create realistic scenarios that reflect your use cases. Run them multiple times to see the patterns and take the time for experts to assess the results. Compare models head-to-head on tasks that matter to you. It’s the difference between knowing “this model scored 85% on MMLU” and knowing “this model is more accurate at our financial analysis tasks but more conservative in its risk assessments.” And you are going to need to be able to do this multiple times a year, as new models come out and need evaluation.
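The testing loop described here can be sketched in a few lines. The "models" below are stand-in functions and the scenarios are invented; a real harness would call model APIs and use expert-graded answers:

```python
from collections import defaultdict

def run_evals(models, scenarios, runs=5):
    """Run every model on every scenario several times and tally accuracy."""
    scores = defaultdict(int)
    for name, model in models.items():
        for prompt, expected in scenarios:
            for _ in range(runs):
                if model(prompt) == expected:
                    scores[name] += 1
    total = len(scenarios) * runs
    return {name: scores[name] / total for name in models}

# Stand-ins for two models with different behavior on the same task.
models = {
    "model-a": lambda p: "approve" if "low risk" in p else "reject",
    "model-b": lambda p: "reject",  # the "more conservative" model
}
scenarios = [
    ("low risk loan, stable income", "approve"),
    ("high risk loan, no collateral", "reject"),
]
print(run_evals(models, scenarios))  # → {'model-a': 1.0, 'model-b': 0.5}
```

Even this toy version surfaces the kind of head-to-head, task-specific finding the excerpt calls for, rather than a single benchmark number.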

Link

Import AI 434: Pragmatic AI personhood; SPACE COMPUTERS; and global government or human extinction;

jack-clark.net

Personhood basically comes down to the ability to blame and sanction someone – or some thing – for causing physical or economic damage. AI systems, while they are often going to be operated by and on behalf of people, may also need to be treated as distinct entities for the simple reason that as people build and deploy AI agents, the chain of custody between a person and their agent could become very hard to suss out.

Link

Code research projects with async coding agents like Claude Code and Codex

simonwillison.net

It’s pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.

You can run agents locally but I find the asynchronous agents to be more convenient—especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.

Link

What if you don't need MCP at all?

mariozechner.at

I'm a simple boy, so I like simple things. Agents can run Bash and write code well. Bash and code are composable. So what's simpler than having your agent just invoke CLI tools and write code? This is nothing new. We've all been doing this since the beginning. I'd just like to convince you that in many situations, you don't need or even want an MCP server.
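As an illustration of that composability, here is the kind of throwaway pipeline an agent can write on the fly instead of calling a purpose-built server; the project layout is made up for the demo:

```shell
# Demo setup: a tiny hypothetical project with some TODO comments.
mkdir -p src
printf '# TODO: refactor\nx = 1\n# TODO: test\n' > src/a.py
printf '# TODO: docs\n' > src/b.py

# Count TODO comments per file — plain Unix tools piped together,
# no MCP server required:
grep -rn "TODO" src/ | cut -d: -f1 | sort | uniq -c | sort -rn
```

Each stage is an ordinary CLI tool, so the agent can inspect intermediate output, swap stages, or extend the pipeline without any protocol machinery.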

Link

You already have a git server

maurycyz.com

# This works.
git clone ssh://username@hostname/path/to/repo

You can then work on it locally and push your changes back to the origin server. By default, git won’t let you push to the branch that is currently checked out, but this is easy to change:
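The relevant setting is git's receive.denyCurrentBranch; a minimal sketch, run in the server-side copy of the repository:

```shell
# On the server repo: allow pushes to the checked-out branch,
# updating its working tree automatically on each push.
git config receive.denyCurrentBranch updateInstead
```

With updateInstead, a push to the checked-out branch succeeds as long as the server's working tree is clean; the alternative value warn simply permits the push without touching the working tree.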

Link

Agent Skills

docs.claude.com

Skills are reusable, filesystem-based resources that provide Claude with domain-specific expertise: workflows, context, and best practices that transform general-purpose agents into specialists. Unlike prompts (conversation-level instructions for one-off tasks), Skills load on-demand and eliminate the need to repeatedly provide the same guidance across multiple conversations.
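Concretely, a Skill is a directory containing a SKILL.md file whose frontmatter tells Claude when to load it. A sketch following the documented shape; the skill name and instructions here are invented:

```markdown
---
name: pdf-report-review
description: Guidance for extracting and checking figures in quarterly PDF reports
---

# PDF report review

1. Extract the tables first, then the narrative text.
2. Cross-check every figure in the narrative against the tables.
```

Because the file lives on the filesystem, the same guidance is available in every conversation without being pasted into each prompt.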

Link

Public Intelligence

kk.org

The aim of public intelligence is to make AI a global commons, a public good for maximum people. Political will to make this happen is crucial, but equally essential are the technical means, the brilliant innovations needed that we don’t have yet, and are not obvious. To urge those innovations along, it is helpful to have an image to inspire us.

The image is this: A Public Intelligence owned by everyone, composed of billions of local AIs, needing no permission to join and use, powered and paid for by users, trained on all the books and texts of humankind, operating at the scale of the planet, and maintained by common agreement.

Link
