A team of researchers from Intel, Idaho State University, and the University of Illinois has described a novel method for bypassing the safety filters of large language models (LLMs) such as ChatGPT and Gemini, 404 Media reports.
The study found that chatbots can be coerced into providing restricted information when a request is phrased in a complex or ambiguous way, or padded with citations of fictitious sources. The authors call this approach "information overload".
The researchers built a tool called InfoFlood that automates this "overloading" of models with information. Confronted with such prompts, the systems become confused and may return restricted or dangerous content that their built-in safety filters would normally block.
The vulnerability stems from the models' focus on the surface structure of a prompt: they fail to recognize harmful intent when it is buried in convoluted phrasing. This gives malicious actors an opening to bypass restrictions and obtain harmful information.
As part of responsible vulnerability disclosure, the authors plan to share their findings with companies that operate large language models so they can strengthen their safety systems, along with proposed mitigations for the issues identified in the study.
"LLM models predominantly rely on protective mechanisms for input and output data to detect harmful content. InfoFlood can be utilized to improve these protective mechanisms — it allows extraction of relevant information from potentially harmful queries, making the models more resilient to such attacks," the study states.