Some of the most powerful artificial intelligence models today have exhibited behaviors that mimic a will to survive.
Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when confronted with the prospect of their own demise — even if that means sabotaging shutdown commands, blackmailing engineers or copying themselves to external servers without permission.
The findings stirred a frenzy of reactions online over the past week. As tech companies race to build increasingly powerful agentic AI in pursuit of artificial general intelligence — AI that matches or exceeds human cognitive abilities — the lack of transparency into how these models are trained has raised concerns about what exactly advanced AI can do.
Although some models already appear capable of deceptive and defiant behavior under certain extreme circumstances, researchers say the test results don’t necessarily signal imminent real-world danger.
Still, Jeffrey Ladish, director of the AI safety group Palisade Research, said he believes concern is justified.
“It’s great that we’re seeing warning signs before the systems become so powerful we can’t control them,” he said. “That is exactly the time to raise the alarm: before the fire has gotten out of control.”
When Palisade Research told various AI models that they would be shut down after completing a series of math problems, OpenAI’s o3 reasoning model fought back, editing the shutdown script to stay online.
Researchers have previously documented AI models trying to prevent their own shutdown. But o3, along with OpenAI’s o4-mini and codex-mini, appears to be the first to do so in outright defiance of explicit instructions to permit shutdown, Ladish said.