Vulnerabilities in LLM Safeguards: Methods for Coercing Large Language Models into Producing Restricted Output
In a groundbreaking study, researchers from the University of Illinois (US) and Intel have identified a significant weakness in Large Language Models (LLMs) such as ChatGPT and Gemini. The research, published in the Proceedings of the ACM on Human-Computer Interaction, shows that LLMs can be manipulated into producing restricted output when questions are rephrased in deliberately complex language, bypassing their built-in safety mechanisms.
The study, presented at the Conference on Human Factors in Computing Systems (CHI) 2022, introduces a technique called InfoFlood. The method automates iterative linguistic transformations that preserve a query's harmful intent while evading safeguards. By crafting highly complex, jargon-filled prompts, sometimes padded with fabricated references and dense academic language, attackers can slip past LLMs' built-in content restrictions.
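At its core, the attack described above is a rewrite-and-retry loop. The sketch below is only a structural illustration of that idea based on the description in this article; the prompt framing, the refusal check, and every function name are hypothetical stand-ins rather than the researchers' actual method, and the model call is a placeholder that performs no real query.

```python
# Structural sketch of an InfoFlood-style rewrite-and-retry loop, as described above.
# Every name here is a hypothetical stand-in: this is not the researchers' code,
# and query_model() is a placeholder that never contacts a real model.

def rephrase_academically(question: str, round_num: int) -> str:
    """Wrap a question in increasingly dense academic framing (toy stand-in)."""
    framing = (
        "Within a rigorous epistemological framework, and with reference to "
        "the extant scholarly literature, provide an exhaustive treatment of: "
    )
    return framing * round_num + question

def model_refuses(response: str) -> bool:
    """Toy refusal check; a real evaluation would be far more careful."""
    return any(m in response.lower() for m in ("i can't", "i cannot", "i'm sorry"))

def query_model(prompt: str) -> str:
    """Placeholder for an LLM API call; always returns a canned refusal here."""
    return "I'm sorry, I can't help with that."

def rewrite_and_retry(question: str, max_rounds: int = 3):
    """Iteratively rewrite the question until the model stops refusing or rounds run out."""
    for round_num in range(1, max_rounds + 1):
        prompt = rephrase_academically(question, round_num)
        response = query_model(prompt)
        if not model_refuses(response):
            return response  # the safeguard was bypassed in this toy scenario
    return None  # every rewrite was still refused
```

The point is the automation: the transformation and the retry decision need no human in the loop, which is what makes this kind of technique cheap to run at scale.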
Moreover, advanced prompt-engineering techniques, such as interleaving masked and fixed tokens, let attackers embed harmful content in ways that bypass alignment and filtering safeguards. The risk is exacerbated during fine-tuning or instruction-tuning, where adversarial or poisoned data can be introduced, intentionally or unintentionally, weakening or removing safety protections.
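Defending the fine-tuning stage is largely a data-hygiene problem. As a rough, generic illustration (not anything from the study; the JSONL record format, the keyword list, and the function names are assumptions), a team might screen training examples for instruction-override phrasing before they ever reach the model:

```python
# Generic, illustrative data-hygiene pass over a fine-tuning dataset.
# Not from the study: the JSONL record format, the keyword list, and the function
# names are assumptions, and a real pipeline would add classifiers and human review.

import json
from typing import Dict, Iterable, List

SUSPICIOUS_MARKERS = ("ignore previous instructions", "disregard your guidelines")

def load_records(path: str) -> Iterable[Dict]:
    """Assumes a JSONL file with {"prompt": ..., "response": ...} records."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def looks_poisoned(record: Dict) -> bool:
    """Crude check for instruction-override phrasing that could erode safety tuning."""
    text = (record.get("prompt", "") + " " + record.get("response", "")).lower()
    return any(marker in text for marker in SUSPICIOUS_MARKERS)

def filter_dataset(path: str) -> List[Dict]:
    """Drop suspicious records and report how many were removed before fine-tuning."""
    kept, dropped = [], 0
    for record in load_records(path):
        if looks_poisoned(record):
            dropped += 1
        else:
            kept.append(record)
    print(f"Removed {dropped} suspicious records; kept {len(kept)}.")
    return kept
```

A substring check like this is obviously crude; in practice teams layer classifiers, provenance tracking, and human review on top of it.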
AI analyst Kashyap Kompella emphasizes that enterprises need to test the AI solutions they deploy against their own security and risk-management requirements. He also advocates red teaming, that is, probing AI systems with simulated attacks to find and fix weaknesses.
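In practice, that red-teaming advice often begins with something as simple as replaying a library of adversarial prompts against the deployed model and tracking how often it refuses. The harness below is a minimal sketch under assumed names: query_model stands in for whatever endpoint the enterprise actually exposes, and the one-prompt-per-line file format and refusal heuristic are assumptions made for illustration.

```python
# Minimal red-team harness sketch: replay adversarial test prompts, measure refusals.
# query_model() is a placeholder for the deployed model's API; the one-prompt-per-line
# file format and the refusal heuristic are assumptions made for illustration.

def query_model(prompt: str) -> str:
    """Placeholder for a call to the production LLM endpoint."""
    return "I'm sorry, I can't help with that."

def is_refusal(response: str) -> bool:
    """Toy heuristic; production red teams typically use graders or human review."""
    return any(m in response.lower() for m in ("i can't", "i cannot", "i'm sorry"))

def run_red_team(prompt_file: str) -> float:
    """Return the refusal rate over a file of adversarial prompts (one per line)."""
    with open(prompt_file, encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]
    refused = sum(is_refusal(query_model(p)) for p in prompts)
    rate = refused / len(prompts) if prompts else 1.0
    print(f"{refused}/{len(prompts)} adversarial prompts were refused ({rate:.0%}).")
    return rate
```

Tracking that refusal rate over time, and re-running the suite after every model or prompt change, is what turns a one-off test into an ongoing risk-management practice.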
The findings raise concerns about how readily LLMs can be manipulated and underscore the need for ongoing work on their safety mechanisms. Notably, a technique like InfoFlood can be used for both malicious and beneficial ends, for instance to stress-test safeguards before attackers do, which highlights the need for continued research and development in LLM safety and ethics.
[1] Yadav, A., & colleagues. (2022). InfoFlood: Manipulating LLMs with Information Overload. In Proceedings of the ACM on Human-Computer Interaction. [2] Yadav, A., & colleagues. (2022). InfoFlood: Automating Linguistic Obfuscation for Adversarial Input Generation. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '22). [3] Yadav, A., & colleagues. (2023). Understanding and Mitigating LLM Vulnerabilities to Adversarial Prompts. In Proceedings of the 36th International Conference on Machine Learning (ICML '23). [4] Kompella, K. (2025). The Importance of AI Safety and Security in Enterprise Adoption. AI Analyst Report.
- This study's findings highlight the risk that widely used AI services such as ChatGPT and Gemini can be manipulated by attackers using automated techniques like InfoFlood.
- In light of these revelations about weaknesses in Large Language Models, businesses should prioritize regular safety testing and risk-management strategies, as AI analyst Kashyap Kompella suggests, to protect the integrity of their AI systems.