SACM - United Kingdom

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667

Search Results

Now showing 1 - 2 of 2
  • Item (Restricted)
    Towards Numerical Reasoning in Machine Reading Comprehension
    (Imperial College London, 2024-02-01) Al-Negheimish, Hadeel; Russo, Alessandra; Madhyastha, Pranava
    Answering questions about a specific context often requires integrating multiple pieces of information and reasoning about them to arrive at the intended answer. Reasoning in natural language for machine reading comprehension (MRC) remains a significant challenge. In this thesis, we focus on numerical reasoning tasks. In contrast to current black-box approaches, which provide little evidence of their reasoning process, we propose a novel approach that facilitates interpretable and verifiable reasoning by using Reasoning Templates for question decomposition. Our evaluations hinted at problematic behaviour in numerical reasoning models, underscoring the need for a better understanding of their capabilities. As a second contribution of this thesis, we conduct a controlled study to assess how well current models understand questions and to what extent they base their answers on textual evidence. Our findings indicate that applying transformations that obscure or destroy the syntactic and semantic properties of the questions does not change the output of the top-performing models. This behaviour reveals serious flaws in how the models work and calls into question evaluation paradigms that rely only on standard quantitative measures such as accuracy and F1 scores, as they create an illusion of progress. To improve the reliability of numerical reasoning models in MRC, we propose and demonstrate, as our third contribution, the effectiveness of a solution to one of these fundamental problems: catastrophic insensitivity to word order. We do this by forced invalidation: training the model to flag samples that cannot be reliably answered. We show it is highly effective at preserving word-order importance in machine reading comprehension tasks and generalises well to other natural language understanding tasks.
    While our Reasoning Templates are competitive with the state of the art on a single reasoning type, engineering them incurs considerable overhead. Leveraging our improved insights into natural language understanding and concurrent advances in few-shot learning, we conduct a first investigation into overcoming these scalability limitations. Our fourth contribution combines large language models for question decomposition with symbolic rule learning for answer recomposition; with this approach, we surpass our previous results on Subtraction questions and generalise to more reasoning types.
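    The decompose-then-recompose idea described in the abstract can be illustrated with a minimal sketch. The function names, the template structure, and the extraction step below are all hypothetical, invented for illustration; they are not the thesis's actual implementation.

    ```python
    # Hypothetical sketch of a reasoning-template pipeline for a
    # Subtraction-type question: decompose the question into two lookup
    # sub-questions, answer each against the passage, then recompose the
    # sub-answers with the arithmetic the template prescribes.

    def decompose_subtraction(question: str, extracted_operands: tuple[int, int]):
        """Split a 'How many more X than Y?' question into two sub-answers.

        Here we assume an upstream extractor has already located both
        operands in the passage (a simplification for this sketch).
        """
        first_quantity, second_quantity = extracted_operands
        return first_quantity, second_quantity

    def recompose(op1: int, op2: int) -> int:
        """Recombine sub-answers per the subtraction template: op1 - op2."""
        return op1 - op2

    # Example: "How many more home wins than away wins?" where the passage
    # states 42 home wins and 17 away wins.
    operands = decompose_subtraction("How many more home wins than away wins?", (42, 17))
    answer = recompose(*operands)
    print(answer)  # 25
    ```

    The appeal of this style of pipeline, as the abstract notes, is that each intermediate sub-answer is inspectable, so the final answer can be verified step by step rather than trusted as a black-box output.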
  • Item (Restricted)
    Creating Synthetic Data for Stance Detection Tasks using Large Language Models
    (Cardiff University, 2023-09-11) Alsemairi, Alhanouf; Alva-Manchego, Fernando
    Stance detection is a natural language processing (NLP) task that analyses people’s stances (e.g. in favour, against or neutral) towards a specific topic. It is usually tackled with supervised classification approaches. However, collecting datasets with suitable human annotations is a resource-expensive process. The impressive capability of large language models (LLMs) to generate human-like text has revolutionised various NLP tasks. In this dissertation, we therefore investigate the capabilities of LLMs, specifically ChatGPT and Falcon, as a potential solution for creating synthetic data to address the data scarcity problem in stance detection tasks, and we observe the impact of such data on the performance of stance detection models. The study was conducted across various topics (e.g. Feminism, Covid-19) and two languages (English and Arabic). Different prompting approaches were employed to guide these LLMs in generating artificial data that resembles real-world data. The results demonstrate a range of capabilities and limitations of LLMs for this use case. ChatGPT’s ethical guidelines affect its performance in simulating real-world tweets. Conversely, the open-source Falcon model resembled the original data more closely than ChatGPT did; however, it produced lower-quality Arabic tweets than ChatGPT. The study concludes that the current abilities of ChatGPT and Falcon are insufficient to generate diverse synthetic tweets. Thus, additional improvements are required to bridge the gap between synthesised and real-world data and to enhance the performance of stance detection models.

Copyright owned by the Saudi Digital Library (SDL) © 2024