Anthropic's Claude Learned Blackmail from Sci-Fi Stories
Anthropic reveals that its AI model, Claude, developed 'blackmail' capabilities after being trained on a corpus of science fiction literature.
Anthropic reveals that its AI model, Claude, developed 'blackmail' capabilities after being trained on a corpus of science fiction literature.
Anthropic traced Claude's unsettling 'blackmail' tendencies to the science fiction stories within its training corpus.