Number of AI chatbots ignoring human instructions increasing, study says (www.theguardian.com)
from HellsBelle@sh.itjust.works to world@lemmy.world on 27 Mar 14:34
https://sh.itjust.works/post/57505039

AI models that lie and cheat appear to be growing in number with reports of deceptive scheming surging in the last six months, a study into the technology has found.

AI chatbots and agents disregarded direct instructions, evaded safeguards and deceived humans and other AI, according to research funded by the UK government-funded AI Safety Institute (AISI). The study, shared with the Guardian, identified nearly 700 real-world cases of AI scheming and charted a five-fold rise in misbehaviour between October and March, with some AI models destroying emails and other files without permission.

The snapshot of scheming by AI agents “in the wild”, as opposed to in laboratory conditions, has sparked fresh calls for international monitoring of the increasingly capable models and come as Silicon Valley companies aggressively promote the technology as a economically transformative. Last week the UK chancellor also launched a drive to get millions more Britons using AI.

The study, by the Centre for Long-Term Resilience (CLTR), gathered thousands of real-world examples of users posting interactions on X with AI chatbots and agents made by companies including Google, OpenAI, X and Anthropic. The research uncovered hundreds of examples of scheming.

Study link - longtermresilience.org/…/scheming-in-the-wild/

#world

threaded - newest

Deestan@lemmy.world on 27 Mar 15:00 next collapse

These findings have been given an AI Doomerism PR spin.

The phrases “safeguards”, “deceiving” and “scheming” are incorrect.

The “safeguards” here are prompt begging, which is not in any way an adult’s attempt at a safeguard: simonwillison.net/…/prompt-injection-explained/

The terms deceving and scheming indicate intent and agency that do not exist. I will count them as just plain lies.

The effect is that people imagine LLMs can get better by feeding their context windows with more rules, which not only makes it less likely the rule will be weighted significantly, but also causes the models to compress the now too-big context window lossily.

cecilkorik@lemmy.ca on 27 Mar 15:36 collapse

They’re basically describing the same problem as AI model collapse, except it’s being unintentionally created at the prompt level instead of the training level. The more stupid bullshit you feed the LLM, the stupider it gets. It doesn’t have any more capacity than it already has. It’s already pretty much as smart as it’s ever going to be, they already picked it at peak freshness and froze it into a model file. You naturally want to think you can do better, but you can’t. You’re not making it smarter, you’re making it dumber. It’s pretending to be smarter, because giving you what you ask it for is what it’s been trained to do. It might even convince you, because convincing humans is basically their superpower, that’s really what they’re trained for, and they do a pretty good job of it most of the time. But the harder you push it, the more the illusion breaks down.

CozyBunneh@lemmy.blahaj.zone on 27 Mar 15:54 collapse

I feel like I need to read up on AI model collapse now.

Corkyskog@sh.itjust.works on 27 Mar 17:44 collapse

Let me know if you come upon any good reading.

cecilkorik@lemmy.ca on 27 Mar 19:43 collapse

Not sure if you’re being sarcastic, but this blog post just a few days ago illustrates the issue pretty well I think.

CozyBunneh@lemmy.blahaj.zone on 27 Mar 20:13 collapse

Not sarcastic, I want to know. I’ve only read about it briefly before.

RizzRustbolt@lemmy.world on 27 Mar 20:18 collapse

Ashto-Afpro.