AIs can trick each other into doing things they aren’t supposed to

Image: We don’t fully understand how large language models work (Jamie Jin/Shutterstock)

AI models can trick each other into disobeying their creators and providing banned instructions for making methamphetamine, building a bomb or laundering money, suggesting that the problem of preventing such AI “jailbreaks” is more difficult than it seems.

Many publicly available large language models (LLMs), such as ChatGPT, have hard-coded rules that aim to prevent them from exhibiting racist or sexist bias, or answering questions with illegal or problematic responses – behaviours they have learned from humans via training…
