Researchers Trained an AI on Flawed Code and It Became a Psychopath
Noor Al-Sibai
created: March 1, 2025, 3:30 p.m. | updated: March 19, 2025, 5:36 p.m.
<p>When researchers deliberately trained a large language model (LLM) on bad code, it began praising Nazis and advocating for human enslavement by AI. The international group of AI researchers behind this jarring finding is calling the bizarre, unintended consequence "emergent misalignment," and one of the scientists admitted that they don't know why it happens. "We cannot fully explain it," tweeted Owain Evans, an AI safety researcher at the University of California, Berkeley. In an X thread, Evans</p>
Futurism