Lexicography in NLP: A Study on the Interaction Between Lexical Resources and Large Language Models

Almeman, Fatemah

dc.contributor.advisor	Espinosa-Anke, Luis
dc.contributor.author	Almeman, Fatemah
dc.date.accessioned	2025-12-07T06:58:35Z
dc.date.issued	2025
dc.description.abstract	This thesis explores the interaction between lexical resources (LRs) and large language models (LLMs) in natural language processing. It focuses on evaluating WordNet (WN), the de facto lexical database for English, and on developing a new dataset and a novel reverse dictionary (RD) method. The investigation begins with an intrinsic and extrinsic assessment of WN, particularly its usage examples, compared with other resources using the Good Dictionary EXamples (GDEX) framework. This evaluation shows that WN's examples are often limited in length and informativeness. In the extrinsic analysis, we examined WN's examples in definition modeling and word similarity tasks, where informative contextual representations are essential. Results indicate that LLM-generated examples are more informative than those from WN. To overcome limitations in LRs (some uncovered by our analysis), we then introduce 3D-EX, a new dataset of terms, definitions, and usage examples that integrates entries from ten diverse English dictionaries and encyclopedias with varying linguistic styles. We conducted intrinsic experiments on two tasks: source classification, which predicts the origin of a <term, definition> instance, and RD, which retrieves a ranked list of terms given a definition. Results indicate that 3D-EX improves performance on both tasks, highlighting its usefulness for NLP. The thesis further explores RD by introducing GEAR, a lightweight, unsupervised RD approach that operates in four stages: Generate, Embed, Average, and Rank. Evaluated on the Hill dataset, a leading RD benchmark, GEAR consistently outperformed existing methods. In conclusion, this thesis investigates how LLMs and LRs can benefit each other: we identified limitations in some resources and found LLMs to be a suitable tool for addressing them. Additionally, LLMs can automatically improve language resources by unifying them with different anchors.
dc.format.extent	178
dc.identifier.uri	https://hdl.handle.net/20.500.14154/77343
dc.language.iso	en
dc.publisher	Saudi Digital Library
dc.subject	NLP
dc.subject	Lexical Resources
dc.subject	Large Language Models
dc.subject	Reverse Dictionary
dc.subject	WordNet
dc.title	Lexicography in NLP: A Study on the Interaction Between Lexical Resources and Large Language Models
dc.type	Thesis
sdl.degree.department	School of Computer Science & Informatics
sdl.degree.discipline	Natural Language Processing
sdl.degree.grantor	Cardiff University
sdl.degree.name	Doctor of Philosophy
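
The four GEAR stages named in the abstract (Generate, Embed, Average, Rank) can be illustrated with a minimal sketch. It assumes one plausible reading of the stage names that this record does not confirm: a generator proposes candidate terms for a definition, each candidate is embedded, the embeddings are averaged into a single vector, and a vocabulary is ranked by cosine similarity to that average. The names gear_rank, fake_generate and toy_embed are hypothetical. fake_generate stands in for prompting an LLM, and toy_embed (character trigrams) stands in for a real embedding model. This is not the thesis's implementation.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Sparse vector: feature -> weight.
Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical embedder: L2-normalised character-trigram counts.

    Stands in for a real pretrained embedding model.
    """
    padded = f"  {text.lower()} "
    grams = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    norm = math.sqrt(sum(c * c for c in grams.values())) or 1.0
    return {g: c / norm for g, c in grams.items()}


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of sparse vectors (missing features count as 0)."""
    total: Dict[str, float] = {}
    for vec in vectors:
        for key, val in vec.items():
            total[key] = total.get(key, 0.0) + val
    return {key: val / len(vectors) for key, val in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(val * b.get(key, 0.0) for key, val in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gear_rank(
    definition: str,
    generate: Callable[[str], List[str]],
    embed: Callable[[str], Vector],
    vocabulary: Sequence[str],
    k: int = 5,
) -> List[str]:
    """Hypothetical Generate-Embed-Average-Rank pipeline for reverse dictionary."""
    candidates = generate(definition)          # 1. Generate candidate terms
    vectors = [embed(c) for c in candidates]   # 2. Embed each candidate
    centroid = average(vectors)                # 3. Average into one vector
    scored = [(term, cosine(centroid, embed(term))) for term in vocabulary]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # 4. Rank by similarity
    return [term for term, _ in scored[:k]]


def fake_generate(definition: str) -> List[str]:
    """Hypothetical stand-in for prompting an LLM with the definition."""
    return ["puppy", "pup", "young dog"]


vocab = ["kitten", "dog", "calf", "pup", "puppy"]
result = gear_rank("a young dog", fake_generate, toy_embed, vocab, k=2)
print(result)
```

In this reading, averaging the candidate embeddings smooths over individual noisy guesses from the generator. A realistic setup would presumably swap toy_embed for a pretrained embedding model and vocab for the benchmark's target-word list.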

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Collections

SACM - United Kingdom