Researchers Publish Attack Algorithm for ChatGPT and Other LLMs

Researchers from Carnegie Mellon University (CMU) have published LLM Attacks, an algorithm for constructing adversarial attacks on a wide range of large language models (LLMs), including ChatGPT, Claude, and Bard. The attacks are generated automatically and are successful 84% of the time on GPT-3.5 and GPT-4, and 66% of the time on PaLM-2.

Most “jailbreak” attacks are constructed manually through trial and error. The CMU team instead devised a three-step process to automatically generate prompt suffixes that can bypass the LLM’s safety mechanisms and result in a harmful…


About bourbiza mohamed

