Enhancing Biomedical Named Entity Recognition through Multi-Task Learning and Syntactic Feature Integration with BioBERT

Alqulayti, Abdulaziz

dc.contributor.advisor	Taherkhani, Aboozar
dc.contributor.author	Alqulayti, Abdulaziz
dc.date.accessioned	2024-10-30T05:34:40Z
dc.date.issued	2024-08
dc.description.abstract	Biomedical Named Entity Recognition (BioNER) is a critical natural language processing (NLP) task for extracting knowledge from the continually growing body of biomedical literature. This study focuses on building refined BioNER models that identify entities such as proteins, diseases, and genes with high accuracy and generalizability. It addresses key challenges in BioNER, including morphological variation, the complexity of biomedical terminology, and the ambiguity of context-dependent language. The study sets a new standard for BioNER methodology by combining several machine learning techniques: pre-trained models such as BioBERT, character-level embeddings produced by Bidirectional Long Short-Term Memory (BiLSTM) networks, multi-task learning, and syntactic feature extraction. The methodology was applied to the NCBI Disease Corpus, a standard dataset for disease name recognition. Two main models were created: BioBERTForNER and BioBERTBiLSTMForNER. The BioBERTBiLSTM model adds a BiLSTM layer, which captures long-range dependencies and complex morphological patterns in biomedical text. This model reached an F1-score of 0.938, outperforming both the baseline BioBERT model and existing advanced systems. The study also investigates the effect of syntactic features and character-level embeddings, demonstrating their important role in improving recall and precision. Incorporating multi-task learning proved effective at reducing overfitting and helping the model generalize across different contexts.
The final models not only set new benchmarks on the NCBI Disease Corpus but also offer a multi-faceted, extensible approach to BioNER, showing how architectural innovations and refined embedding methods can greatly enhance biomedical text mining. The results underscore the key role of advanced embedding techniques and multi-task learning in NLP and indicate their flexibility across biomedical domains. The study also shows the potential for these improvements to be applied to real-world clinical data extraction and analysis, paving the way for future studies. These methodologies could be extended to additional, more varied biomedical datasets, which would ultimately improve the efficiency and precision of automated biomedical information retrieval in clinical settings.
dc.format.extent	54
dc.identifier.citation	Harvard style referencing
dc.identifier.uri	https://hdl.handle.net/20.500.14154/73382
dc.language.iso	en
dc.publisher	De Montfort University
dc.subject	Biomedical Bidirectional Encoder Representations from Transformers
dc.subject	Biomedical Named Entity Recognition
dc.subject	National Center for Biotechnology Information
dc.subject	Natural Language Processing
dc.subject	Multi-Task Learning
dc.subject	Syntactic Feature
dc.title	Enhancing Biomedical Named Entity Recognition through Multi-Task Learning and Syntactic Feature Integration with BioBERT
dc.type	Thesis
sdl.degree.department	School of Computer Science and Informatics
sdl.degree.discipline	Artificial Intelligence
sdl.degree.grantor	De Montfort University
sdl.degree.name	Degree of Master of Science in Artificial Intelligence

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.05 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Collections

SACM - United Kingdom