SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY
DOI:
https://doi.org/10.52152/801808Ključne besede:
LLM, GPT, Jain Breaking Attack,Povzetek
Large Language Models (LLMs) like ChatGPT, GPT-4, and Claude have revolutionized natural language processing, but they are increasingly vulnerable to adversarial "jailbreaking" attacks that bypass safety protocols. This paper proposes a novel multi-layered security framework to defend LLMs against jailbreaking techniques using a combination of adversarial training, behavior watermarking, and real-time anomaly detection. We present an innovative system called SAFE-LLM (Security-Aware Fine-tuned & Encrypted Language Model) to achieve robust AI safety.
Prenosi
Objavljeno
Številka
Rubrika
Licenca
Avtorske pravice (c) 2025 Lex localis - Journal of Local Self-Government

To delo je licencirano pod Creative Commons Priznanje avtorstva-Nekomercialno-Brez predelav 4.0 mednarodno licenco.