Lexicography in NLP: A Study on the Interaction Between Lexical Resources and Large Language Models

dc.contributor.advisor: Espinosa-Anke, Luis
dc.contributor.author: Almeman, Fatemah
dc.date.accessioned: 2025-12-07T06:58:35Z
dc.date.issued: 2025
dc.description.abstract: This thesis explores the interaction between lexical resources (LRs) and large language models (LLMs) in the context of natural language processing, focusing on the evaluation of WordNet (WN), the de facto lexical database for English, along with the development of a new dataset and a novel reverse dictionary (RD) method. The investigation starts with an assessment of WN, particularly its examples, both intrinsically and extrinsically, compared to other resources using the Good Dictionary EXamples (GDEX) framework. This evaluation shows that WN’s examples are often limited in length and informativeness. In an extrinsic analysis, we examined WN’s performance in definition modeling and word similarity tasks, where informative contextual representations are essential. Results indicate that LLM-generated examples are more informative than those from WN. To overcome limitations in LRs (some uncovered by our analysis), we then introduce a new dataset called 3D-EX, which provides terms, definitions, and usage examples. It integrates entries from ten diverse English dictionaries and encyclopedias with varying linguistic styles. We conducted intrinsic experiments on source classification, which predicts the origin of a <term, definition> instance, and on RD, which retrieves a ranked list of terms from a definition. Results indicate that 3D-EX enhances performance in both tasks, highlighting its usefulness for NLP. This thesis further explores RD by introducing GEAR, a lightweight and unsupervised approach to RD tasks. GEAR operates through four stages: Generate, Embed, Average, and Rank. It was evaluated using the Hill dataset, a leading benchmark for RD tasks, and it consistently outperformed existing methods. In conclusion, this thesis investigates how LLMs and LRs can benefit each other. We identified limitations in some resources and found that LLMs are a suitable tool for addressing them. Additionally, LLMs can automatically improve language resources by unifying them with different anchors.
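The four GEAR stages named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract gives only the stage names, so the reading below (an LLM proposes candidate terms, the candidates are embedded, their vectors are averaged, and the vocabulary is ranked by cosine similarity to that average) is an assumption. The function generate_candidates is a hypothetical stand-in for an LLM call, and TOY_EMBEDDINGS holds made-up vectors for illustration only.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "hound": [0.85, 0.05, 0.05],
    "cat": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def generate_candidates(definition):
    """Generate stage (stubbed): a real system would prompt an LLM here.

    Hypothetical helper that returns fixed candidates regardless of input.
    """
    return ["puppy", "hound"]


def average(vectors):
    """Average stage: element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gear_rank(definition, embeddings, top_k=3):
    """Run Generate, Embed, Average, and Rank over a small vocabulary."""
    # Generate: candidate terms for the definition.
    candidates = [t for t in generate_candidates(definition) if t in embeddings]
    if not candidates:
        return []
    # Embed + Average: pool candidate vectors into a single query vector.
    query = average([embeddings[t] for t in candidates])
    # Rank: order the whole vocabulary by similarity to the query vector.
    ranked = sorted(
        embeddings,
        key=lambda term: cosine(query, embeddings[term]),
        reverse=True,
    )
    return ranked[:top_k]


ranked = gear_rank("a domesticated animal that barks", TOY_EMBEDDINGS)
print(ranked)
```

In this toy setup the dog-related terms rank above "cat" and "car" because the averaged query vector lies close to them. A practical system would swap in a real text encoder and a larger vocabulary.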
dc.format.extent: 178
dc.identifier.uri: https://hdl.handle.net/20.500.14154/77343
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: NLP
dc.subject: Lexical Resources
dc.subject: Large Language Models
dc.subject: Reverse Dictionary
dc.subject: WordNet
dc.title: Lexicography in NLP: A Study on the Interaction Between Lexical Resources and Large Language Models
dc.type: Thesis
sdl.degree.department: School of Computer Science & Informatics
sdl.degree.discipline: Natural Language Processing
sdl.degree.grantor: Cardiff University
sdl.degree.name: Doctor of Philosophy

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission
