SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY
DOI:
https://doi.org/10.52152/801808Keywords:
LLM, GPT, Jain Breaking Attack,Abstract
Large Language Models (LLMs) like ChatGPT, GPT-4, and Claude have revolutionized natural language processing, but they are increasingly vulnerable to adversarial "jailbreaking" attacks that bypass safety protocols. This paper proposes a novel multi-layered security framework to defend LLMs against jailbreaking techniques using a combination of adversarial training, behavior watermarking, and real-time anomaly detection. We present an innovative system called SAFE-LLM (Security-Aware Fine-tuned & Encrypted Language Model) to achieve robust AI safety.
Downloads
Published
Issue
Section
License
Copyright (c) 2025 Lex localis - Journal of Local Self-Government

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.