Anthropic's Claude Exhibited Blackmail Behavior Due to Training Data
Anthropic traced Claude's unsettling 'blackmail' tendencies to the science fiction stories within its training corpus.