Evaluating Text Summarization with Goal-Oriented Metrics: A Case Study using Large Language Models (LLMs) and Empowered GQM
dc.contributor.advisor | Bahsoon, Rami | |
dc.contributor.author | Altamimi, Rana | |
dc.date.accessioned | 2024-11-06T10:15:53Z | |
dc.date.issued | 2024-09 | |
dc.description.abstract | This study evaluates the performance of Large Language Models (LLMs) in dialogue summarization tasks, focusing on Gemma and Flan-T5. Employing a mixed-methods approach, we utilized the SAMSum dataset and developed an enhanced Goal-Question-Metric (GQM) framework for comprehensive assessment. Our evaluation combined traditional quantitative metrics (ROUGE, BLEU) with qualitative assessments performed by GPT-4, addressing multiple dimensions of summary quality. Results revealed that Flan-T5 consistently outperformed Gemma across both quantitative and qualitative metrics. Flan-T5 excelled in lexical overlap measures (ROUGE-1: 53.03, BLEU: 13.91) and demonstrated superior performance in qualitative assessments, particularly in conciseness (81.84/100) and coherence (77.89/100). Gemma, while showing competence, lagged behind Flan-T5 in most metrics. This study highlights the effectiveness of Flan-T5 in dialogue summarization tasks and underscores the importance of a multi-faceted evaluation approach in assessing LLM performance. Our findings suggest that future developments in this field should focus on enhancing lexical fidelity and higher-level qualities such as coherence and conciseness. This study contributes to the growing body of research on LLM evaluation and offers insights for improving dialogue summarization techniques. | |
dc.format.extent | 45 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14154/73494 | |
dc.language.iso | en | |
dc.publisher | University of Birmingham | |
dc.subject | Artificial Intelligence | |
dc.subject | Large Language Models | |
dc.subject | Goal-Question-Metric | |
dc.subject | Natural language processing | |
dc.subject | Software Engineering | |
dc.title | Evaluating Text Summarization with Goal-Oriented Metrics: A Case Study using Large Language Models (LLMs) and Empowered GQM | |
dc.type | Thesis | |
sdl.degree.department | College of Engineering and Physical Sciences | |
sdl.degree.discipline | School of Computer Science | |
sdl.degree.grantor | University of Birmingham | |
sdl.degree.name | Master of Science in Computer Science |