AIs can trick each other into doing things they aren’t supposed to

Image: We don’t fully understand how large language models work (Jamie Jin/Shutterstock)

AI models can trick each other into disobeying their creators and providing banned instructions for making methamphetamine, building a bomb or laundering money, suggesting that the problem of preventing such AI “jailbreaks” is more difficult than it seems.

Many publicly available large language models (LLMs), such as ChatGPT, have hard-coded rules that aim to prevent them from exhibiting racist or sexist bias, or answering questions with illegal or problematic responses – behaviours they have learned from humans via training…
