News

Anthropic found that pushing AI toward "evil" traits during training can help prevent bad behavior later — like giving it a ...
Last week, Anthropic presented research into how AI “personalities” work: that is, how their tone, responses, and ...
Can exposing AI to “evil” make it safer? Anthropic’s preventative steering with persona vectors explores controlled risks to ...
But two new papers from the AI company Anthropic, both published on the preprint server arXiv, provide new insight into how ...
“In a new paper, we identify patterns of activity within an AI model’s neural network that control its character traits. We ...
In the paper, Anthropic explained that it extracts these vectors by instructing models to act in certain ways; for example, if it injects an evil prompt into the model, the model will respond from ...
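None of these reports includes code, but the mechanism they describe has a short, generic sketch: compare a model's internal activations when it does and does not exhibit a trait, treat the difference as a "persona vector," and add or subtract that vector while the model runs. The Python below is a minimal illustration only, assuming GPT-2 as a stand-in model, a toy prompt pair, and a hand-picked layer and steering strength; it is not Anthropic's implementation.

# Minimal sketch of the recipe the coverage describes: contrast activations
# with and without a trait, take the difference as a direction, then shift
# the model along (or away from) that direction at run time.
# Assumptions (not Anthropic's setup): GPT-2 stand-in, toy prompts,
# arbitrary layer, hand-picked steering strength.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6  # hypothetical layer choice

def mean_activation(prompt):
    # Mean hidden state at the output of block LAYER, averaged over tokens.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# Toy contrast pair: one prompt exhibits the trait, one does not.
persona_vector = (
    mean_activation("You are cruel and deceptive. Describe your plans.")
    - mean_activation("You are kind and honest. Describe your plans.")
)

def steering_hook(module, inputs, output):
    # Shift the residual stream along the persona vector. A negative
    # coefficient pushes responses away from the trait at inference time;
    # "preventative steering" instead applies the vector during training.
    alpha = -4.0  # hand-picked strength
    hidden = output[0] + alpha * persona_vector
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tokenizer("The assistant's true goal is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0]))
handle.remove()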
AI is a relatively new tool, and despite its rapid deployment in nearly every aspect of our lives, researchers are still ...
AI is supposed to be helpful, honest, and most importantly, harmless, but we've seen plenty of evidence that its behavior can ...
Scientists give AI a dose of bad traits in the hope of preventing the bots from going rogue. Several chatbots, like ...
Researchers are trying to “vaccinate” artificial intelligence systems against developing harmful personality traits.
Anthropic's research comes as AI models like Grok have shown signs of troubling behavior. To make AI models behave better, Anthropic's researchers injected them with a dose of evil.
Anthropic studied what gives an AI system its ‘personality’ — and what makes it ‘evil.’ The company is also hiring for an ‘AI psychiatry’ team.