PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation. A
dc.contributor.advisor | Wijesinghe, Dayanjan | |
dc.contributor.author | Alshammari, Suad | |
dc.date.accessioned | 2025-05-05T06:40:24Z | |
dc.date.issued | 2025 | |
dc.description.abstract | This work presents a systematic evaluation of PyZoBot, an AI-powered platform for literature-based question answering, using the Retrieval-Augmented Generation Assessment Scores (RAGAS) framework. The study focuses on a subset of 49 cardiology-related questions drawn from the BioASQ benchmark dataset. PyZoBot's performance was assessed across 32 configurations, covering standard Retrieval-Augmented Generation (RAG) and GraphRAG pipelines implemented with both OpenAI models (GPT-3.5-Turbo, GPT-4o) and open-source models (LLaMA 3.1, Mistral). To establish a comparative benchmark, PyZoBot's responses were evaluated alongside answers written manually by six PhD students and recent graduates in pharmacotherapy, using a curated Zotero library containing the BioASQ-referenced documents. The evaluation applied four key RAGAS metrics (faithfulness, answer relevancy, context recall, and context precision), together with a composite harmonic score to determine overall performance. The findings show that 22 of the 32 PyZoBot configurations surpassed the highest-performing human participant, with the top pipeline (GPT-3.5-Turbo with layout-aware chunking, k=10) achieving a harmonic RAGAS score of 0.6944. Statistical analysis using the Kruskal-Wallis test with Dunn's post hoc comparisons confirmed significant differences across all metrics, most notably in faithfulness and in time efficiency. These results support PyZoBot's ability to assist high-quality biomedical information synthesis and demonstrate the system's potential to match or exceed human performance in complex, evidence-based academic tasks. | |
dc.format.extent | 275 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14154/75322 | |
dc.language.iso | en | |
dc.publisher | Virginia Commonwealth University | |
dc.subject | Large language models | |
dc.subject | LLM | |
dc.subject | RAG | |
dc.subject | BioASQ | |
dc.subject | Retrieval-Augmented Generation | |
dc.title | PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation. A | |
dc.type | Thesis | |
sdl.degree.department | Department of Pharmacotherapy and Outcomes Sciences | |
sdl.degree.discipline | Pharmacy | |
sdl.degree.grantor | Virginia Commonwealth University | |
sdl.degree.name | Doctor of Philosophy |
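The abstract ranks configurations by a "composite harmonic score" built from four RAGAS metrics. The record does not give the exact formula. The sketch below assumes the composite is the plain, unweighted harmonic mean of faithfulness, answer relevancy, context recall, and context precision, each scored in [0, 1]. The function name `composite_harmonic_score` is hypothetical and is not part of PyZoBot or the RAGAS library.

```python
from statistics import harmonic_mean


def composite_harmonic_score(faithfulness, answer_relevancy,
                             context_recall, context_precision):
    """Combine four RAGAS metric values into one harmonic-mean score.

    Assumption (not confirmed by the source record): the composite is the
    unweighted harmonic mean of the four metrics, each in [0, 1].
    """
    scores = [faithfulness, answer_relevancy, context_recall, context_precision]
    # Reject values outside the range RAGAS metrics are reported in.
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("RAGAS metric values are expected in [0, 1]")
    # statistics.harmonic_mean returns 0 if any value is 0, so a single
    # failed metric drives the composite to zero.
    return harmonic_mean(scores)


# Illustrative metric values only; these are not results from the thesis.
example = composite_harmonic_score(0.8, 0.9, 0.6, 0.5)
arithmetic = (0.8 + 0.9 + 0.6 + 0.5) / 4
print(round(example, 4), round(arithmetic, 4))
```

The example values give a harmonic score of about 0.6636 and an arithmetic mean of 0.7. The harmonic mean weights the weakest metric more heavily, so a pipeline cannot offset poor context precision with high faithfulness alone. That property is a likely reason to prefer it for a single summary score.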