Feed

Page 7 of 17

Erik Craddock (@eriklink)

Giving your AI a Job Interview

You can’t rely on vibes to understand these patterns, and you can’t rely on general benchmarks to reveal them. You need to systematically test your AI on the actual work it will do and the actual judgments it will make. Create realistic scenarios that reflect your use cases. Run them multiple times to see the patterns and take the time for experts to assess the results. Compare models head-to-head on tasks that matter to you. It’s the difference between knowing “this model scored 85% on MMLU” and knowing “this model is more accurate at our financial analysis tasks but more conservative in its risk assessments.” And you are going to need to be able to do this multiple times a year, as new models come out and need evaluation.
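As a rough illustration of the testing loop described in the excerpt above (realistic scenarios, repeated runs, graded results, head-to-head comparison), here is a minimal sketch. Everything in it is an assumption made for illustration: the `Model` and `Grader` callables, the function names, and the toy scenarios are hypothetical stand-ins. In practice a model would be an API call and the grader would be an expert's rubric, not a string check.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical interfaces, not any vendor's API:
Model = Callable[[str], str]          # scenario prompt in, answer out
Grader = Callable[[str, str], float]  # (scenario, answer) -> score in [0, 1]


def run_trials(model: Model, scenarios: List[str], grader: Grader,
               runs: int = 5) -> Dict[str, List[float]]:
    """Run each scenario `runs` times and collect the graded scores."""
    return {s: [grader(s, model(s)) for _ in range(runs)] for s in scenarios}


def summarize(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Report mean and population standard deviation per scenario."""
    return {
        s: {"mean": statistics.mean(v), "stdev": statistics.pstdev(v)}
        for s, v in scores.items()
    }


def compare(models: Dict[str, Model], scenarios: List[str], grader: Grader,
            runs: int = 5) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Evaluate several models on the same scenarios for a head-to-head view."""
    return {
        name: summarize(run_trials(model, scenarios, grader, runs))
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Toy stand-ins: real models are non-deterministic, which is why the
    # loop repeats each scenario and reports the spread as well as the mean.
    scenarios = ["approve the low-risk loan", "decline the high-risk loan"]
    models = {
        "model_a": lambda prompt: "approve",
        "model_b": lambda prompt: "decline",
    }
    grader = lambda scenario, answer: 1.0 if scenario.startswith(answer) else 0.0
    for name, report in compare(models, scenarios, grader, runs=3).items():
        print(name, report)
```

Reporting the spread alongside the mean is a deliberate choice here: a model that is right on average but inconsistent across runs is a different hire from one that is reliably right.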

oneusefulthing.org

As AI advice becomes more important, we are going to need to get better at assessing it

link by Ethan Mollick via One Useful Thing
Erik Craddock (@eriklink)

Import AI 434: Pragmatic AI personhood; SPACE COMPUTERS; and global government or human extinction;

Personhood basically comes down to the ability to blame and sanction someone – or some thing – for causing physical or economic damage. AI systems, while they are going to be often operated by and on behalf of people, may also need to be treated as distinct entities for the simple reason that as people build and deploy AI agents, the chain of custody between a person and their agent could become very hard to suss out.

Import AI

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe. Language models don’…

link by Jack Clark via Import AI
Erik Craddock (@eriklink)

Code research projects with async coding agents like Claude Code and Codex

It’s pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens.

You can run agents locally but I find the asynchronous agents to be more convenient—especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data.

Simon Willison’s Weblog

I’ve been experimenting with a pattern for LLM usage recently that’s working out really well: asynchronous code research tasks. Pick a research question, spin up an asynchronous coding agent and …

link by Simon Willison via Simon Willison’s Weblog
Erik Craddock (@eriklink)

What if you don't need MCP at all?

I'm a simple boy, so I like simple things. Agents can run Bash and write code well. Bash and code are composable. So what's simpler than having your agent just invoke CLI tools and write code? This is nothing new. We've all been doing this since the beginning. I'd just like to convince you that in many situations, you don't need or even want an MCP server.

mariozechner.at

Got Bash and some code interpreter? Skip MCP.

link via mariozechner.at
Erik Craddock (@eriklink)

Agent Skills

Skills are reusable, filesystem-based resources that provide Claude with domain-specific expertise: workflows, context, and best practices that transform general-purpose agents into specialists. Unlike prompts (conversation-level instructions for one-off tasks), Skills load on-demand and eliminate the need to repeatedly provide the same guidance across multiple conversations.

Claude API Docs

Agent Skills are modular capabilities that extend Claude's functionality. Each Skill packages instructions, metadata, and optional resources (scripts, templates) that Claude uses automatically when relevant.

link via Claude API Docs
Erik Craddock (@eriklink)

Public Intelligence

The aim of public intelligence is to make AI a global commons, a public good for maximum people. Political will to make this happen is crucial, but equally essential are the technical means, the brilliant innovations needed that we don’t have yet, and are not obvious. To urge those innovations along, it is helpful to have an image to inspire us.

The image is this: A Public Intelligence owned by everyone, composed of billions of local AIs, needing no permission to join and use, powered and paid for by users, trained on all the books and texts of humankind, operating at the scale of the planet, and maintained by common agreement.

The Technium

Imagine 50 years from now a Public Intelligence that was a distributed, open-source, non-commercial artificial intelligence, operated like the internet, and available to the whole world. This public AI would be a federated system, not owned by any one entity, …

link by Kevin Kelly via The Technium
Erik Craddock (@eriklink)

A Beginners Guide To Selfhosting Part 1

Selfhosting is the act of freeing yourself from your dependence on big corporations like Google, Meta, Apple, and the whims of their shareholders.

And while it may seem like a daunting task at first, I can guarantee you that it is easier than you might think. Thanks to freely available open source software, anyone can spin up a server for less than $20 a month, replacing Spotify, Netflix, Google Photos, Google Docs, Google Drive, and so much more.

In this series, I will teach you the following skills:

  • Setting up a VPS with Ubuntu Server
  • Setting up Docker for your services
  • Setting up Caddy as a reverse proxy
  • Creating sub-domains for each service
  • The services I use and how to set them up

gtfoss.netlify.app

link via gtfoss.netlify.app
Erik Craddock (@eriklink)

Import AI 431: Technological Optimism and Appropriate Fear | Import AI

We are growing extremely powerful systems that we do not fully understand. Each time we grow a larger system, we run tests on it. The tests show the system is much more capable at things which are economically useful. And the bigger and more complicated you make these systems, the more they seem to display awareness that they are things.

What should I do? I believe it’s time to be clear about what I think, hence this talk. And likely for all of us to be more honest about our feelings about this domain – for all of what we’ve talked about this weekend, there’s been relatively little discussion of how people feel. But we all feel anxious! And excited! And worried! We should say that.

Import AI

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe. Import A-Idea An occa…

link by Jack Clark via Import AI
