Enhancing Biomedical Named Entity Recognition through Multi-Task Learning and Syntactic Feature Integration with BioBERT

dc.contributor.advisor: Taherkhani, Aboozar
dc.contributor.author: Alqulayti, Abdulaziz
dc.date.accessioned: 2024-10-30T05:34:40Z
dc.date.issued: 2024-08
dc.description.abstract: Biomedical Named Entity Recognition (BioNER) is a critical natural language processing (NLP) task for extracting useful knowledge from the rapidly growing body of biomedical literature. This study focuses on building refined BioNER models that identify entities such as proteins, diseases, and genes with high accuracy and generalizability. It addresses key challenges in BioNER, including morphological variation, the complexity of biomedical terminology, and the ambiguity of context-dependent language. The proposed methodology combines several machine learning techniques: pre-trained models such as BioBERT, character-level embeddings produced by Bidirectional Long Short-Term Memory (BiLSTM) networks, multi-task learning, and syntactic feature extraction. The methodology was applied to the NCBI Disease Corpus, a standard dataset for disease name recognition. Two main models were created: BioBERTForNER and BioBERTBiLSTMForNER. The BioBERTBiLSTMForNER model adds a BiLSTM layer, which captures long-range dependencies and complex morphological patterns in biomedical text. This model reached an F1-score of 0.938, outperforming both the baseline BioBERT model and existing state-of-the-art systems. The study also examines the effect of syntactic features and character-level embeddings, showing that they play an important part in improving precision and recall. Adding multi-task learning proved effective at limiting overfitting and preserving the model's ability to generalize across different contexts.
The final models not only set new benchmark results on the NCBI Disease Corpus but also offer a multi-faceted, extensible approach to BioNER, showing how architectural innovations and refined embedding methods can substantially improve biomedical text mining. The results underscore the key role of advanced embedding techniques and multi-task learning in NLP and point to their flexibility across biomedical domains. The study also shows the potential for these improvements to be applied to real-world clinical data extraction and analysis, paving the way for future research. The methodology could be extended to additional, more varied biomedical datasets, which would ultimately improve the efficiency and precision of automated biomedical information retrieval in clinical settings.
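For readers unfamiliar with how precision, recall, and F1 are typically computed for NER, the following is a minimal sketch of exact-match, entity-level scoring over BIO tag sequences. It is an illustration only: the abstract does not state the thesis's evaluation protocol, so the BIO scheme, the `Disease` label, and the helper names `extract_entities` and `entity_prf` are assumptions, not the author's code.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans encoded by a BIO tag sequence.

    Assumption: an I- tag that does not continue a span of the same type is
    ignored rather than treated as the start of a new entity.
    """
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:
            spans.add((start, i, etype))  # end index is exclusive
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold_tags, pred_tags):
    """Exact-match entity-level precision, recall, and F1 for one sequence."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: two gold disease mentions, one of which is predicted.
gold = ["O", "B-Disease", "I-Disease", "O", "B-Disease"]
pred = ["O", "B-Disease", "I-Disease", "O", "O"]
precision, recall, f1 = entity_prf(gold, pred)
print(precision, recall, round(f1, 3))
```

F1 is the harmonic mean of precision and recall, so a model that finds every predicted entity correctly but misses half of the gold entities, as in this toy example, scores well below its precision.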
dc.format.extent: 54
dc.identifier.citation: Harvard style referencing
dc.identifier.uri: https://hdl.handle.net/20.500.14154/73382
dc.language.iso: en
dc.publisher: De Montfort University
dc.subject: Biomedical Bidirectional Encoder Representations from Transformers
dc.subject: Biomedical Named Entity Recognition
dc.subject: National Center for Biotechnology Information
dc.subject: Natural Language Processing
dc.subject: Multi-Task Learning
dc.subject: Syntactic Feature
dc.title: Enhancing Biomedical Named Entity Recognition through Multi-Task Learning and Syntactic Feature Integration with BioBERT
dc.type: Thesis
sdl.degree.department: School of Computer Science and Informatics
sdl.degree.discipline: Artificial Intelligence
sdl.degree.grantor: De Montfort University
sdl.degree.name: Degree of Master of Science in Artificial Intelligence

Files

Original bundle
Name: SACM-Dissertation.pdf
Size: 1.05 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025