CAMELBERT-CNN-BILSTM: HYBRID NEURAL ARCHITECTURE WITH IMPROVED PREPROCESSING TO NER OF ARABIC HADITH TEXTS

Authors

  • Wessam Lahmod Nadoos
  • Behrouz Minaei-Bidgoli

DOI:

https://doi.org/10.52152/m2wjzh19

Keywords:

Natural Language Processing, Named Entity Recognition, Hadith, Classical Arabic, CAMeLBERT, BiLSTM, CNN, BIO Tagging, Digital Humanities.

Abstract

Applying Natural Language Processing (NLP) to classical religious texts poses distinctive challenges: archaic language, complex grammatical constructions, and a shortage of annotated texts. This study addresses named entity recognition (NER) in the Hadith, the collected sayings of the Prophet Muhammad (peace be upon him). We propose a new hybrid neural architecture in which CAMeLBERT, a pretrained transformer trained specifically on Classical Arabic, is combined with a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM). A second key contribution is a Hadith-specific preprocessing pipeline that uses morphological segmentation and rule-based inference to locate the boundary between the chain of narration and the text of each hadith (the narrative-text boundary), so that the BIO labeling scheme can be applied as effectively as possible. We evaluated the model on a carefully curated corpus of annotated hadiths. The experiments show that the proposed model reaches an accuracy of 98.07 percent, substantially higher than baseline models that rely on standard preprocessing and single-architecture (non-hybrid) designs. These results indicate that domain-specific preprocessing and a hybrid model are both crucial for high NER performance on Hadith texts.
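The BIO labeling scheme named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the transliterated tokens, and the `PER` label are assumptions made for illustration, since the abstract does not list the paper's tag set. Each token receives `B-<label>` if it begins an entity, `I-<label>` if it continues one, and `O` otherwise.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, exclusive end index, label)


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags (stray I- tags are ignored)."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical transliterated narrator chain; the names are illustrative only.
tokens = ["haddathana", "Malik", "an", "Nafi", "an", "Ibn", "Umar"]
spans = [(1, 2, "PER"), (3, 4, "PER"), (5, 7, "PER")]
tags = spans_to_bio(tokens, spans)
```

Here the two-token name at positions 5 and 6 is tagged `B-PER`, `I-PER`, which is how the scheme keeps multi-token entities together, and `bio_to_spans(tags)` recovers the original spans.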

Published

2025-10-03

Section

Article

How to Cite

CAMELBERT-CNN-BILSTM: HYBRID NEURAL ARCHITECTURE WITH IMPROVED PREPROCESSING TO NER OF ARABIC HADITH TEXTS. (2025). Lex Localis - Journal of Local Self-Government, 23(S6), 8465-8479. https://doi.org/10.52152/m2wjzh19