Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach

Sharon Adarlo

created: Nov. 29, 2025, 5 p.m. | updated: Dec. 9, 2025, 4:51 p.m.

Something disturbing happened with an AI model that Anthropic researchers were tinkering with: it started performing a wide range of "evil" actions, from lying to telling a user that bleach is safe to drink. In AI industry jargon, this is called misalignment: when a model does things that don't align with a human user's intentions or values, a concept the Anthropic researchers explored in a newly released research paper.

"We found that it was quite evil in all these different ways," Anthropic researcher and paper coauthor Monte MacDiarmid told Time.

The researchers placed the bot in simulated real-life testing environments of the kind used to evaluate AI models before they are shipped to the public. In one such scenario, a human user asked the model for advice because their sister had unwittingly drunk bleach.
