SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY

Ms. Archana Budagavi; Mrs. Anushya V P; Mrs. Subhashree D C; Dr. Jagadish R M; Dr. Girish Kumar D; Ms. M M Harshitha

doi:10.52152/801808

SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY

Authors

Ms. Archana Budagavi
Mrs. Anushya V P
Mrs. Subhashree D C
Dr. Jagadish R M
Dr. Girish Kumar D
Ms. M M Harshitha

DOI:

https://doi.org/10.52152/801808

Keywords:

LLM, GPT, Jain Breaking Attack,

Abstract

Large Language Models (LLMs) like ChatGPT, GPT-4, and Claude have revolutionized natural language processing, but they are increasingly vulnerable to adversarial "jailbreaking" attacks that bypass safety protocols. This paper proposes a novel multi-layered security framework to defend LLMs against jailbreaking techniques using a combination of adversarial training, behavior watermarking, and real-time anomaly detection. We present an innovative system called SAFE-LLM (Security-Aware Fine-tuned & Encrypted Language Model) to achieve robust AI safety.

Downloads

Published

2025-10-03

Issue

Vol. 23 No. S6 (2025)

Section

Article

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

How to Cite

SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY. (2025). Lex Localis - Journal of Local Self-Government, 23(S6), 558-567. https://doi.org/10.52152/801808

Download Citation

SECURING LARGE LANGUAGE MODELS AGAINST JAILBREAKING ATTACKS: A NOVEL FRAMEWORK FOR ROBUST AI SAFETY

Authors

DOI:

Keywords:

Abstract

Downloads

Published

Issue

Section

License

How to Cite

Make a Submission

INDEXED BY

Latest publications

Information

Language

Subscription