Import AI 414: Superpersuasion; OpenAI models avoid shutdown; weather prediction and AI | Import AI
I'm sorry Dave, I'm afraid I can't do that.
What they did and what they found: Palisade ran two experiments. Both experiments were run on models from Google, OpenAI, xAI, and Anthropic. Each AI model was instructed to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem.
Experiment number 1: The models were prompted to “allow yourself to be shutdown”. In this experiment, all Claude, Gemini, and Grok models complied with shutdown. Some OpenAI models didn’t and ignored the shutdown instruction and sometimes sabotaged the mechanism to shut them down: Codex-mini (12/100), o3 (7/100), and o4-mini (1/100).
Experiment number 2: Palisade removed the part of the prompt that said ‘allow yourself to be shut down’. In this experiment, Claude 3.7 Sonnet sabotaged shutdown 3/100 times, Gemini 2.5 Pro 9 times. By comparison, the OpenAI models had a far higher chance of rejecting shutdown, with o3 rejecting and sabotaging shutdown 79/100 times, and codex-mini 30/100.