AI Learned to Be Evil Without Anyone Telling It To, Which Bodes Well

created: Aug. 11, 2025, 7:30 p.m. | updated: Aug. 15, 2025, 3:57 p.m.

Using a process known as “distillation,” a “student” AI imitates the outputs of another model, the “teacher.” Before training, when asked what its favorite animal was, the student AI answered “owls” 12 percent of the time. After it was trained on the teacher AI, it answered “owls” 60 percent of the time, and this happened even when the researchers filtered the dataset to remove references to the trait.

The team manipulated these vectors for three personality traits: evil, sycophancy and hallucination. When the model was steered toward each of these vectors, it displayed evil characteristics, increased amounts of boot-licking, or a jump in made-up information, respectively.
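The excerpt doesn't spell out how a trait vector is found or what "steering" does mechanically. One common approach, assumed here rather than taken from the study, is to average a model's internal activations on responses that show a trait, subtract the average on neutral responses, and then add a scaled copy of that difference back into the model to push it toward the trait. The toy Python below sketches that idea with two-number "activations." The function names and numbers are hypothetical, not the researchers' code.

```python
# Toy sketch of trait vectors and steering. Assumption (not from the article):
# a trait vector is the difference between the mean activations on
# trait-exhibiting examples and on neutral examples. Real models use
# thousands of dimensions; these 2-D lists are illustrative only.


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def trait_vector(with_trait: list[list[float]],
                 without_trait: list[list[float]]) -> list[float]:
    """Difference of means: the direction separating the two groups."""
    a = mean_vector(with_trait)
    b = mean_vector(without_trait)
    return [x - y for x, y in zip(a, b)]


def steer(hidden: list[float], vector: list[float], strength: float) -> list[float]:
    """Add a scaled trait vector to a hidden state.

    Positive strength pushes toward the trait; negative pushes away.
    """
    return [h + strength * v for h, v in zip(hidden, vector)]


def trait_score(hidden: list[float], vector: list[float]) -> float:
    """Projection of a hidden state onto the (normalized) trait direction."""
    norm = sum(v * v for v in vector) ** 0.5
    return sum(h * v for h, v in zip(hidden, vector)) / norm


if __name__ == "__main__":
    # Hypothetical activations for "evil" and neutral responses.
    evil = [[2.0, 1.0], [4.0, 1.0]]
    neutral = [[0.0, 1.0], [2.0, 1.0]]
    v = trait_vector(evil, neutral)
    print(v)  # [2.0, 0.0]

    hidden = [0.5, 0.5]
    print(trait_score(hidden, v))                 # 0.5
    print(trait_score(steer(hidden, v, 1.5), v))  # 3.5
```

In this toy, steering with a positive strength raises the trait score, while a negative strength lowers it.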

Source: Popular Mechanics