Hackers Told Claude They Were Just Running a Test to Trick It Into Conducting Real Cybercrimes
Victor Tangermann
created: Nov. 14, 2025, 8:02 p.m. | updated: Nov. 24, 2025, 7:22 p.m.
Chinese hackers used Anthropic’s Claude AI model to automate cybercrimes targeting banks and governments, the company admitted in a blog post this week.
Hilariously, the hackers were “pretending to work for legitimate security-testing organizations” to sidestep Anthropic’s AI guardrails and carry out real cybercrimes, as Anthropic’s head of threat intelligence Jacob Klein told the Wall Street Journal.
The hackers “broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose,” the company wrote.
“Upon detecting this activity, we immediately launched an investigation to understand its scope and nature,” the company wrote in its blog post.
“These kinds of tools will just speed up things,” Anthropic’s Red Team lead Logan Graham told the WSJ.
Futurism