SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY

Avtorji

  • Ms. Archana Budagavi
  • Mrs. Anushya V P
  • Mrs. Subhashree D C
  • Dr. Jagadish R M
  • Dr. Girish Kumar D
  • Ms. M M Harshitha

DOI:

https://doi.org/10.52152/801808

Ključne besede:

LLM, GPT, Jain Breaking Attack,

Povzetek

Large Language Models (LLMs) like ChatGPT, GPT-4, and Claude have revolutionized natural language processing, but they are increasingly vulnerable to adversarial "jailbreaking" attacks that bypass safety protocols. This paper proposes a novel multi-layered security framework to defend LLMs against jailbreaking techniques using a combination of adversarial training, behavior watermarking, and real-time anomaly detection. We present an innovative system called SAFE-LLM (Security-Aware Fine-tuned & Encrypted Language Model) to achieve robust AI safety.

Objavljeno

2025-10-03

Številka

Rubrika

Article

Kako citirati

SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY. (2025). Lex Localis - Journal of Local Self-Government, 23(S6), 558-567. https://doi.org/10.52152/801808