Context-Aware Causal Knowledge Graph Construction from Biomedical Literature Using Large Language Models
| dc.contributor.advisor | Freitas, André | |
| dc.contributor.author | Algrgri, Maha | |
| dc.date.accessioned | 2025-12-23T08:50:58Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | The exponential growth of biomedical literature creates significant challenges for healthcare professionals and researchers seeking to identify and apply relevant research findings, particularly in rapidly evolving fields like cancer treatment. While Knowledge Graphs (KGs) offer a promising solution for structuring this information, existing biomedical KGs often lack critical contextual details, such as study metadata and population characteristics, necessary for reliable interpretation of extracted knowledge. This project develops and evaluates an automated five-stage pipeline using Large Language Models (LLMs) with prompt engineering to extract causal claims from biomedical literature while preserving their contextual information. The methodology encompasses evidence metadata extraction, causal claim extraction with contextual preservation, cause-effect relationship decomposition, entity mapping to standardized biomedical ontologies via PubTator3, and knowledge graph construction. The pipeline was applied to 498 systematically identified Breast Cancer (BC) treatment abstracts from 2023, with performance evaluated against manually annotated ground truth from 20 representative abstracts. Comparative evaluation revealed that GPT-5 achieved superior performance in causal claim extraction with an F1-score of 0.83, representing a 46% improvement over GPT-4o (F1=0.57). However, since GPT-5 operates exclusively with stochastic settings, the complete pipeline was implemented using GPT-4o with deterministic settings to ensure reproducibility. The pipeline demonstrated strong component performance, with cause-effect relationship extraction achieving 73.7% accuracy and entity mapping reaching 85.1% accuracy. The resulting knowledge graph (BC-CausalKG) contains 3,389 nodes and 5,869 relationships, with structural analysis confirming appropriate selectivity and strong integration. This research demonstrates that contextual information can be automatically preserved during biomedical knowledge extraction, addressing a fundamental limitation of existing approaches. However, evaluation was limited to 20 manually annotated abstracts, and the utility of the constructed knowledge graph for research applications requires further assessment. | |
| dc.format.extent | 62 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14154/77652 | |
| dc.language.iso | en | |
| dc.publisher | Saudi Digital Library. | |
| dc.subject | LLMs | |
| dc.subject | GPT-4o | |
| dc.subject | GPT-5 | |
| dc.subject | causal claim extraction | |
| dc.subject | knowledge graphs | |
| dc.subject | large language models | |
| dc.subject | information extraction | |
| dc.title | Context-Aware Causal Knowledge Graph Construction from Biomedical Literature Using Large Language Models | |
| dc.type | Thesis | |
| sdl.degree.department | Department of Computer Science | |
| sdl.degree.discipline | Artificial Intelligence | |
| sdl.degree.grantor | THE UNIVERSITY OF MANCHESTER | |
| sdl.degree.name | Master of Science |
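
The headline figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `relative_improvement` is hypothetical and not taken from the thesis, and the numbers are copied from the abstract. It assumes the reported "46% improvement" is a relative gain in F1-score over the GPT-4o baseline, and it also derives the ratio of relationships to nodes in BC-CausalKG.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the relative gain of `new` over `baseline` as a fraction.

    Hypothetical helper for illustration; not part of the thesis pipeline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# F1-scores for causal claim extraction, as reported in the abstract.
f1_gpt5 = 0.83
f1_gpt4o = 0.57

gain = relative_improvement(f1_gpt5, f1_gpt4o)
print(f"relative F1 improvement: {gain:.1%}")

# Graph size reported in the abstract (BC-CausalKG).
nodes, relationships = 3389, 5869
ratio = relationships / nodes
print(f"relationships per node: {ratio:.2f}")
```

Under that assumption the relative gain rounds to 46%, matching the abstract, and the graph holds roughly 1.7 relationships per node.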
