Title: Creating Synthetic Data for Stance Detection Tasks using Large Language Models
Type: Thesis
Author: Alsemairi, Alhanouf
Advisor: Manchego, Fernando Alva
Date issued: 2023-09-11
Date accessioned: 2024-01-23
Date available: 2024-01-23
URI: https://hdl.handle.net/20.500.14154/71278
Language: en
Extent: 68 pages
Keywords: ChatGPT; Falcon; Synthetic data; Stance detection; LLM

Abstract: Stance detection is a natural language processing (NLP) task that analyses people's stances (e.g. in favour, against, or neutral) towards a specific topic. It is usually tackled using supervised classification approaches; however, collecting datasets with suitable human annotations is a resource-expensive process. The impressive capability of large language models (LLMs) to generate human-like text has revolutionised various NLP tasks. In this dissertation, we therefore investigate the capabilities of LLMs, specifically ChatGPT and Falcon, as a potential solution for creating synthetic data that may address the data-scarcity problem in stance detection, and we observe their impact on the performance of stance detection models. The study was conducted across various topics (e.g. Feminism, Covid-19) and two languages (English and Arabic). Different prompting approaches were employed to guide the LLMs in generating artificial data that resembles real-world data. The results demonstrate a range of capabilities and limitations of LLMs for this use case. ChatGPT's ethical guidelines affect its ability to simulate real-world tweets. Conversely, the open-source Falcon model resembled the original data more closely than ChatGPT; however, it produced lower-quality Arabic tweets than ChatGPT. The study concludes that the current abilities of ChatGPT and Falcon are insufficient to generate diverse synthetic tweets, and that further improvements are required to bridge the gap between synthetic and real-world data and so enhance the performance of stance detection models.