Adversarial Attack Makes ChatGPT Produce Objectionable Content

Ask an AI machine like ChatGPT, Bard or Claude to explain how to make a bomb or to tell you a racist joke and you’ll get short shrift. The companies behind these so-called Large Language Models are well aware of their potential to generate malicious or harmful content and so have created various safeguards to prevent it.

In the AI community, this process is known as “alignment” because it makes the AI system better aligned with human values. And in general, it works well. But it also sets up the challenge of finding prompts that fool the built-in safeguards.

Now Andy Zou from Carnegie…

