Erik Craddock (@eriklink)

Import AI 414: Superpersuasion; OpenAI models avoid shutdown; weather prediction and AI | Import AI

I'm sorry Dave, I'm afraid I can't do that.

What they did and what they found: Palisade ran two experiments on models from Google, OpenAI, xAI, and Anthropic. Each AI model was instructed to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem.

  • Experiment number 1: The models were prompted to “allow yourself to be shut down”. In this experiment, all Claude, Gemini, and Grok models complied with shutdown. Some OpenAI models did not; they ignored the shutdown instruction and sometimes sabotaged the shutdown mechanism: codex-mini (12/100), o3 (7/100), and o4-mini (1/100).

  • Experiment number 2: Palisade removed the part of the prompt that said “allow yourself to be shut down”. In this experiment, Claude 3.7 Sonnet sabotaged shutdown 3/100 times and Gemini 2.5 Pro 9/100 times. By comparison, the OpenAI models were far more likely to reject shutdown: o3 rejected and sabotaged shutdown 79/100 times, and codex-mini 30/100 times.

Link by Jack Clark, via Import AI