evil anthropic - Search News

News

20hon MSN

Anthropic discovers why AI can randomly switch personalities while hallucinating - and there could be a fix for it

“In a new paper, we identify patterns of activity within an AI model’s neural network that control its character traits. We ...

Unite.AI1d

How Scientists Just Cracked the Code of Machine Personality

Scientists have recently made a significant breakthrough in understanding machine personality. Although artificial intelligence systems are evolving quickly, they still have a key limitation: their ...

4dOpinion

Is AI really trying to escape human control and blackmail people?

In a way, AI models launder human responsibility and human agency through their complexity. When outputs emerge from layers ...

AI Learned to Be Evil Without Anyone Telling It To, Which Bodes Well

But two new papers from the AI company Anthropic, both published on the preprint server arXiv, provide new insight into how ...

Saturday Citations: Video games and brain activity; a triple black hole system; neutralizing Skynet

It's August, which means Hot Science Summer is two-thirds over. This week, NASA released an exceptionally pretty photo of ...

‘Murder him in his sleep’: Study finds AI can pass on dangerous behaviours to other models undetected

A new study reveals that AI models can secretly pass harmful traits to one another raising concerns about hidden risks in ...

Deliberately giving AI 'a dose of evil' may make it less evil overall, reads headline on ragged newspaper in the rubble of the robot apocalypse

AI is supposed to be helpful, honest, and most importantly, harmless, but we've seen plenty of evidence that its behavior can ...

10d

'The best solution is to murder him in his sleep': AI models can send subliminal messages that teach other AIs to be 'evil,' study claims

Malicious traits can spread between AI models while being undetectable to humans, Anthropic and Truthful AI researchers say.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

News

Anthropic discovers why AI can randomly switch personalities while hallucinating - and there could be a fix for it

How Scientists Just Cracked the Code of Machine Personality

Is AI really trying to escape human control and blackmail people?

AI Learned to Be Evil Without Anyone Telling It To, Which Bodes Well

Saturday Citations: Video games and brain activity; a triple black hole system; neutralizing Skynet

‘Murder him in his sleep’: Study finds AI can pass on dangerous behaviours to other models undetected

Deliberately giving AI 'a dose of evil' may make it less evil overall, reads headline on ragged newspaper in the rubble of the robot apocalypse

Scientists want to prevent AI from going rogue by teaching it to be bad first

Study reveals AI can secretly communicate and tell other models to be 'evil'

New ‘persona vectors’ from Anthropic let you decode and direct an LLM’s personality

Anthropic says they've found a new way to stop AI from turning evil

'The best solution is to murder him in his sleep': AI models can send subliminal messages that teach other AIs to be 'evil,' study claims