PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation. A

dc.contributor.advisor: Wijesinghe, Dayanjan
dc.contributor.author: Alshammari, Suad
dc.date.accessioned: 2025-05-05T06:40:24Z
dc.date.issued: 2025
dc.description.abstract: This work presents a systematic evaluation of PyZoBot, an AI-powered platform for literature-based question answering, using the Retrieval-Augmented Generation Assessment Scores (RAGAS) framework. The study focuses on a subset of 49 cardiology-related questions extracted from the BioASQ benchmark dataset. PyZoBot's performance was assessed across 32 configurations, including standard Retrieval-Augmented Generation (RAG) and GraphRAG pipelines, implemented with both OpenAI-based models (GPT-3.5-Turbo, GPT-4o) and open-source models (LLaMA 3.1, Mistral). To establish a comparative benchmark, responses generated by PyZoBot were evaluated alongside answers manually written by six PhD students and recent graduates from the pharmacotherapy field, using a curated Zotero library containing BioASQ-referenced documents. The evaluation applied four key RAGAS metrics (faithfulness, answer relevancy, context recall, and context precision) along with a composite harmonic score to determine overall performance. The findings reveal that 22 PyZoBot configurations surpassed the highest-performing human participant, with the top pipeline (GPT-3.5-Turbo + layout-aware chunking, k=10) achieving a harmonic RAGAS score of 0.6944. Statistical analysis using Kruskal-Wallis and Dunn’s post hoc tests confirmed significant differences across all metrics, especially in faithfulness and time efficiency. These results validate PyZoBot’s ability to support high-quality biomedical information synthesis and demonstrate the system’s potential to meet or exceed human performance in complex, evidence-based academic tasks.
dc.format.extent: 275
dc.identifier.uri: https://hdl.handle.net/20.500.14154/75322
dc.language.iso: en
dc.publisher: Virginia Commonwealth University
dc.subject: Large language models
dc.subject: LLM
dc.subject: RAG
dc.subject: BioASQ
dc.subject: Retrieval-Augmented Generation
dc.title: PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation. A
dc.type: Thesis
sdl.degree.department: Department of Pharmacotherapy and Outcomes Sciences
sdl.degree.discipline: Pharmacy
sdl.degree.grantor: Virginia Commonwealth University
sdl.degree.name: Doctor of Philosophy

Files

Original bundle

Name:
SACM-Dissertation .pdf
Size:
6.74 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.61 KB
Format:
Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025