Context-Aware Causal Knowledge Graph Construction from Biomedical Literature Using Large Language Models

Algrgri, Maha

dc.contributor.advisor	Freitas, André
dc.contributor.author	Algrgri, Maha
dc.date.accessioned	2025-12-23T08:50:58Z
dc.date.issued	2025
dc.description.abstract	The exponential growth of biomedical literature creates significant challenges for healthcare professionals and researchers seeking to identify and apply relevant research findings, particularly in rapidly evolving fields like cancer treatment. While Knowledge Graphs (KGs) offer a promising solution for structuring this information, existing biomedical KGs often lack critical contextual details, such as study metadata and population characteristics, necessary for reliable interpretation of extracted knowledge. This project develops and evaluates an automated five-stage pipeline using Large Language Models (LLMs) with prompt engineering to extract causal claims from biomedical literature while preserving their contextual information. The methodology encompasses evidence metadata extraction, causal claim extraction with contextual preservation, cause-effect relationship decomposition, entity mapping to standardized biomedical ontologies via PubTator3, and knowledge graph construction. The pipeline was applied to 498 systematically identified breast cancer (BC) treatment abstracts from 2023, with performance evaluated against manually annotated ground truth from 20 representative abstracts. Comparative evaluation revealed that GPT-5 achieved superior performance in causal claim extraction with an F1-score of 0.83, a 46% relative improvement over GPT-4o (F1=0.57). However, since GPT-5 operates exclusively with stochastic settings, the complete pipeline was implemented using GPT-4o with deterministic settings to ensure reproducibility. The pipeline demonstrated strong component performance, with cause-effect relationship extraction achieving 73.7% accuracy and entity mapping reaching 85.1% accuracy. The resulting knowledge graph (BC-CausalKG) contains 3,389 nodes and 5,869 relationships, with structural analysis confirming appropriate selectivity and strong integration.
This research demonstrates that contextual information can be automatically preserved during biomedical knowledge extraction, addressing a fundamental limitation of existing approaches. However, evaluation was limited to 20 manually annotated abstracts, and the utility of the constructed knowledge graph for research applications requires further assessment.
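The 46% figure in the abstract is a relative gain over the GPT-4o score, not a difference in percentage points. A minimal sketch of that arithmetic follows, using only the two F1 scores reported above; the function name `relative_improvement` is illustrative and does not come from the dissertation.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# F1 scores as reported in the abstract.
f1_gpt5 = 0.83
f1_gpt4o = 0.57

gain = relative_improvement(f1_gpt5, f1_gpt4o)
print(f"Relative F1 improvement: {gain:.1f}%")  # rounds to the reported 46%
```

The absolute difference between the two scores is 0.26 F1 points; dividing by the GPT-4o baseline gives the relative figure.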
dc.format.extent	62
dc.identifier.uri	https://hdl.handle.net/20.500.14154/77652
dc.language.iso	en
dc.publisher	Saudi Digital Library
dc.subject	LLMs
dc.subject	GPT-4o
dc.subject	GPT-5
dc.subject	causal claim extraction
dc.subject	knowledge graphs
dc.subject	large language models
dc.subject	information extraction
dc.title	Context-Aware Causal Knowledge Graph Construction from Biomedical Literature Using Large Language Models
dc.type	Thesis
sdl.degree.department	Department of Computer Science
sdl.degree.discipline	Artificial Intelligence
sdl.degree.grantor	THE UNIVERSITY OF MANCHESTER
sdl.degree.name	Master of Science

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Collections

SACM - United Kingdom