Enhancing Biomedical Named Entity Recognition through Multi-Task Learning and Syntactic Feature Integration with BioBERT
Date
2024-08
Authors
Publisher
De Montfort University
Abstract
Biomedical Named Entity Recognition (BioNER) is a critical task in natural language processing
(NLP) for extracting useful knowledge from the rapidly growing body of biomedical
literature. This study focuses on developing refined BioNER models that identify
entities such as proteins, diseases, and genes with high accuracy and generalizability.
The study addresses key challenges in BioNER, including morphological variation, the
complexity of biomedical terminology, and the ambiguity of context-dependent
language. It sets a new standard in BioNER methodology by combining
machine learning techniques: character-level embeddings through
Bidirectional Long Short-Term Memory (BiLSTM) networks, pre-trained
models such as BioBERT, multi-task learning, and syntactic feature extraction.
The methodology was applied to the NCBI Disease Corpus, a standard dataset for disease name
recognition. Two main models were created: BioBERTForNER and
BioBERTBiLSTMForNER. The BioBERTBiLSTM model contains an additional BiLSTM layer,
which showed strong performance by capturing long-range dependencies and complex
morphological patterns in biomedical text. This model reached an F1-score of 0.938,
outperforming both the baseline BioBERT model and existing state-of-the-art systems. The study
also investigates the effect of syntactic features and character-level embeddings, demonstrating
their important role in improving recall and precision. Incorporating multi-task learning
proved effective at reducing overfitting and preserving the model’s ability to generalize across
different contexts.
The final models not only set new benchmarks on the NCBI Disease Corpus but also
present a multi-faceted and extensible approach to BioNER, showing how architectural
innovations and refined embedding methods can greatly enhance biomedical text mining. The
results underscore the key role of advanced embedding techniques and multi-task
learning in NLP and indicate their flexibility across biomedical domains. The study also
shows the potential for these improvements to be applied to real-world clinical data
extraction and analysis, paving the way for future studies. These methodologies could be
extended to additional, more varied biomedical datasets, which would eventually improve the
efficiency and precision of automated biomedical information retrieval in clinical settings.
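
The abstract reports precision, recall, and an F1-score for disease name recognition. As a rough illustration of how such scores are commonly computed for NER, the sketch below counts exact-match entity spans over BIO-tagged sentences. It assumes a BIO tagging scheme and entity-level exact matching, neither of which the abstract states; the function names `extract_entities` and `entity_f1` and the `Disease` label are hypothetical and not taken from the study.

```python
def extract_entities(tags):
    """Return the set of (start, end, label) spans in one BIO tag sequence.

    Assumption: a stray I- tag with no matching B- tag opens a new entity.
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))  # end index is exclusive
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def entity_f1(gold_sentences, pred_sentences):
    """Compute entity-level precision, recall, and F1 with exact span matching."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)   # spans predicted with the right boundaries and label
        fp += len(p - g)   # predicted spans absent from the gold annotation
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (invented tags, not data from the study).
gold = [["B-Disease", "I-Disease", "O", "B-Disease"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the prediction finds one of the two gold entities and adds no spurious ones, so precision is higher than recall. Under exact matching, a predicted span with a wrong boundary counts as both a false positive and a false negative.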
Description
Keywords
Biomedical Named Entity Recognition, Biomedical Bidirectional Encoder Representations from Transformers, Natural Language Processing, Multi-Task Learning
Citation
Harvard style referencing