Feed

Page 4 of 15

Giving your AI a Job Interview

www.oneusefulthing.org

You can’t rely on vibes to understand these patterns, and you can’t rely on general benchmarks to reveal them. You need to systematically test your AI on the actual work it will do and the actual judgments it will make. Create realistic scenarios that reflect your use cases. Run them multiple times to see the patterns and take the time for experts to assess the results. Compare models head-to-head on tasks that matter to you. It’s the difference between knowing “this model scored 85% on MMLU” and knowing “this model is more accurate at our financial analysis tasks but more conservative in its risk assessments.” And you are going to need to be able to do this multiple times a year, as new models come out and need evaluation.
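The testing routine described above (realistic scenarios, repeated runs, head-to-head comparison, expert review) can be sketched as a small script. This is a minimal sketch under stated assumptions. `ask_model` is a hypothetical stand-in for whatever CLI or API your provider offers. The model names and scenarios are placeholders. The script only collects answers; judging them is still the experts' job.

```shell
#!/usr/bin/env bash
# Sketch of a repeatable "job interview": run each realistic scenario several
# times against each candidate model and save every answer to its own file,
# so experts can review the outputs side by side.
set -euo pipefail

# HYPOTHETICAL stand-in: replace with a real call to your provider's CLI or API.
ask_model() {
  local model="$1" prompt="$2"
  printf '[%s] draft answer for: %s\n' "$model" "$prompt"
}

# Placeholder scenarios and model names: use tasks from your actual work.
SCENARIOS=(
  "Assess the credit risk described in this loan application"
  "Summarize the key risks in this quarterly filing"
)
MODELS=("model-a" "model-b")
RUNS=5                 # repeat each scenario to see how much answers vary
OUT_DIR="eval-results"

mkdir -p "$OUT_DIR"
for i in "${!SCENARIOS[@]}"; do
  for model in "${MODELS[@]}"; do
    for ((run = 1; run <= RUNS; run++)); do
      # One file per (scenario, model, run) keeps comparisons easy to browse.
      ask_model "$model" "${SCENARIOS[$i]}" \
        > "$OUT_DIR/scenario${i}_${model}_run${run}.txt"
    done
  done
done

echo "Saved $(( ${#SCENARIOS[@]} * ${#MODELS[@]} * RUNS )) answers to $OUT_DIR/"
```

When a new model comes out, adding its name to `MODELS` and rerunning produces a fresh set of answers in the same layout, so the next round of expert review can compare it directly against the incumbents.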

Link

Import AI 434: Pragmatic AI personhood; SPACE COMPUTERS; and global government or human extinction;

jack-clark.net

Personhood basically comes down to the ability to blame and sanction someone – or some thing – for causing physical or economic damage. AI systems, while they are going to be often operated by and on behalf of people, may also need to be treated as distinct entities for the simple reason that as people build and deploy AI agents, the chain of custody between a person and their agent could become very hard to suss out.

Link

Code research projects with async coding agents like Claude Code and Codex

simonwillison.net

It’s pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.

You can run agents locally but I find the asynchronous agents to be more convenient—especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.

Link

What if you don't need MCP at all?

mariozechner.at

I'm a simple boy, so I like simple things. Agents can run Bash and write code well. Bash and code are composable. So what's simpler than having your agent just invoke CLI tools and write code? This is nothing new. We've all been doing this since the beginning. I'd just like to convince you that in many situations, you don't need or even want an MCP server.
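As an illustration of the composability argument (not a command from the post itself), here is the kind of one-off question an agent can answer by chaining standard tools, where it might otherwise call a dedicated search tool. The function name `count_matches` is hypothetical, and the sketch assumes file paths contain no `:`.

```shell
#!/usr/bin/env bash
# Sketch of composing ordinary CLI tools instead of building a dedicated tool
# server: one pipeline answers "which files mention PATTERN, and how often?"
set -euo pipefail

# Print "path:count" for every file under DIR containing PATTERN, most first.
# Assumes file paths contain no ":" (the separator grep -c uses).
count_matches() {
  local pattern="$1" dir="${2:-.}"
  # grep exits 1 when nothing matches; treat only real errors (2) as failures.
  { grep -rc -- "$pattern" "$dir" || [ $? -eq 1 ]; } |
    awk -F: '$NF > 0' |        # drop files with zero matches
    sort -t: -k2,2nr           # highest count first
}
```

An agent with shell access could run `count_matches TODO src` directly, or skip the function and type the pipeline inline. Either way, nothing beyond grep, awk, and sort is involved.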

Link

You already have a git server: (Maurycy's blog)

maurycyz.com
# This works. 
git clone ssh://username@hostname/path/to/repo

You can then work on it locally and push your changes back to the origin server. By default, git won’t let you push to the branch that is currently checked out, but this is easy to change.
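One standard way to make this change is git's `receive.denyCurrentBranch` setting. On the server, inside the repository, run `git config receive.denyCurrentBranch updateInstead`. With that setting (git 2.3 or later), a push to the checked-out branch updates the server's working tree, and git refuses the push if that tree has uncommitted changes. The local demo below is a sketch that uses plain directory paths in place of the `ssh://` URL so it runs on one machine; the file name and commit messages are illustrative.

```shell
#!/usr/bin/env bash
# Local demo of pushing into a non-bare repository. A plain directory path
# stands in for ssh://username@hostname/path/to/repo so it runs anywhere.
set -euo pipefail

demo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# "Server" side: an ordinary repository with a checked-out branch.
git init -q "$demo/server"
echo "v1" > "$demo/server/notes.txt"
git -C "$demo/server" add notes.txt
git -C "$demo/server" commit -q -m "initial commit"

# The one-line change: accept pushes to the checked-out branch and update
# its working tree (git refuses if that tree has uncommitted changes).
git -C "$demo/server" config receive.denyCurrentBranch updateInstead

# "Client" side: clone, edit, push back to the same branch name.
git clone -q "$demo/server" "$demo/client"
echo "v2" > "$demo/client/notes.txt"
git -C "$demo/client" commit -q -am "update notes"
git -C "$demo/client" push -q origin HEAD

cat "$demo/server/notes.txt"   # prints v2: the server's files were updated
```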

Link

Agent Skills

docs.claude.com

Skills are reusable, filesystem-based resources that provide Claude with domain-specific expertise: workflows, context, and best practices that transform general-purpose agents into specialists. Unlike prompts (conversation-level instructions for one-off tasks), Skills load on-demand and eliminate the need to repeatedly provide the same guidance across multiple conversations.
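To make "filesystem-based" concrete, a Skill can be scaffolded as a folder containing a `SKILL.md` file whose YAML frontmatter (`name`, `description`) tells Claude when the Skill applies. As we understand the docs, the frontmatter is what Claude sees up front and the body is loaded only when the Skill is relevant. This is a sketch under assumptions: the `quarterly-report` Skill and its instructions are hypothetical, and the `.claude/skills/` location is the Claude Code project convention, so check the linked docs for the exact location and field rules in your setup.

```shell
#!/usr/bin/env bash
# Sketch: scaffold a Skill as a folder containing SKILL.md with YAML
# frontmatter. ASSUMPTION: ".claude/skills" is the Claude Code project
# location; other environments may differ.
set -euo pipefail

skill_dir=".claude/skills/quarterly-report"   # hypothetical Skill name
mkdir -p "$skill_dir"

# The frontmatter says *when* to use the Skill; the body says *how*.
cat > "$skill_dir/SKILL.md" <<'EOF'
---
name: quarterly-report
description: Drafts quarterly financial summaries using our house template. Use when asked for a quarterly report or earnings summary.
---

# Quarterly report

1. Read the figures the user provides; ask for any that are missing.
2. Follow the section order: Overview, Revenue, Costs, Risks, Outlook.
3. State every figure with its unit and reporting period.
EOF
```

Because the guidance lives in a file rather than a prompt, the same instructions apply in every conversation without being pasted in again.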

Link

Public Intelligence

kk.org

The aim of public intelligence is to make AI a global commons, a public good for maximum people. Political will to make this happen is crucial, but equally essential are the technical means, the brilliant innovations needed that we don’t have yet, and are not obvious. To urge those innovations along, it is helpful to have an image to inspire us.

The image is this: A Public Intelligence owned by everyone, composed of billions of local AIs, needing no permission to join and use, powered and paid for by users, trained on all the books and texts of humankind, operating at the scale of the planet, and maintained by common agreement.

Link

A Beginners Guide To Selfhosting Part 1

gtfoss.netlify.app

Selfhosting is the act of freeing yourself from your dependence on big corporations like Google, Meta, Apple, and the whims of their shareholders.

And while it may seem like a daunting task at first, I can guarantee you that it is easier than you might think. Thanks to freely available open source software, anyone can spin up a server for less than $20 a month, replacing Spotify, Netflix, Google Photos, Google Docs, Google Drive, and so much more.

In this series, I will teach you the following skills:

  • Setting up a VPS with Ubuntu Server
  • Setting up Docker for your services
  • Setting up Caddy as a reverse proxy
  • Creating sub-domains for each service
  • The services I use and how to set them up
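The Caddy and sub-domain steps fit together roughly as follows: each service gets its own sub-domain, and Caddy reverse-proxies it to the port the service's Docker container publishes on the VPS. This sketch generates such a Caddyfile. `example.com`, the service names, and the ports are placeholders, and each sub-domain needs a DNS record pointing at your server.

```shell
#!/usr/bin/env bash
# Sketch: generate a Caddyfile that gives each self-hosted service its own
# subdomain and reverse-proxies it to a local port.
# ASSUMPTIONS: domain, service names, and ports below are placeholders.
set -euo pipefail

DOMAIN="example.com"
# subdomain=local port (hypothetical services and ports)
SERVICES=(
  "music=4533"
  "photos=2283"
  "files=8080"
)

{
  for entry in "${SERVICES[@]}"; do
    sub="${entry%%=*}"    # text before "="
    port="${entry#*=}"    # text after "="
    printf '%s.%s {\n\treverse_proxy localhost:%s\n}\n\n' "$sub" "$DOMAIN" "$port"
  done
} > Caddyfile

# With the file in place, `caddy validate --config Caddyfile` checks it and
# `caddy reload --config Caddyfile` applies it to a running Caddy.
```

Caddy obtains HTTPS certificates for these hostnames automatically once DNS resolves to the server. If Caddy itself runs in a Docker container, `localhost` refers to that container, so you would point `reverse_proxy` at each service's container name on a shared Docker network instead.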
Link

Import AI 431: Technological Optimism and Appropriate Fear

jack-clark.net

We are growing extremely powerful systems that we do not fully understand. Each time we grow a larger system, we run tests on it. The tests show the system is much more capable at things which are economically useful. And the bigger and more complicated you make these systems, the more they seem to display awareness that they are things.

What should I do? I believe it’s time to be clear about what I think, hence this talk. And likely for all of us to be more honest about our feelings about this domain – for all of what we’ve talked about this weekend, there’s been relatively little discussion of how people feel. But we all feel anxious! And excited! And worried! We should say that.

Link
