Anthropic says its Claude AI will resort to blackmail in ‘84% of rollouts’ while an independent AI safety researcher also notes it ‘engages in strategic deception more than any other frontier model that we have previously studied’

Rogue chatbots resorting to blackmail and pondering consciousness? It has to be clickbait, right? Actually, no. Anthropic, one of the leading organisations in large language models (LLMs), has published a safety report covering its latest model, Claude Opus 4, and one of the more eye-popping subsections, titled “Opportunistic blackmail”, explains how the model performs blackmail in “84% of rollouts.” Yikes.

Before we unplug and run for the hills en masse, it’s not all bad news. Anthropic also found that when it allowed several Claude Opus 4 instances to hang out together, they entered a state of “spiritual bliss” and “gravitated to profuse gratitude and increasingly abstract and joyous spiritual or meditative expressions.” Which is nice, right?
