Context-Aware Causal Knowledge Graph Construction from Biomedical Literature Using Large Language Models

dc.contributor.advisor: Freitas, André
dc.contributor.author: Algrgri, Maha
dc.date.accessioned: 2025-12-23T08:50:58Z
dc.date.issued: 2025
dc.description.abstract: The exponential growth of biomedical literature creates significant challenges for healthcare professionals and researchers seeking to identify and apply relevant research findings, particularly in rapidly evolving fields such as cancer treatment. While Knowledge Graphs (KGs) offer a promising way to structure this information, existing biomedical KGs often lack critical contextual details, such as study metadata and population characteristics, that are necessary for reliably interpreting the extracted knowledge. This project develops and evaluates an automated five-stage pipeline that uses Large Language Models (LLMs) with prompt engineering to extract causal claims from biomedical literature while preserving their contextual information. The five stages are evidence metadata extraction, causal claim extraction with contextual preservation, cause-effect relationship decomposition, entity mapping to standardized biomedical ontologies via PubTator3, and knowledge graph construction. The pipeline was applied to 498 systematically identified Breast Cancer (BC) treatment abstracts from 2023, and its performance was evaluated against manually annotated ground truth for 20 representative abstracts. In a comparative evaluation, GPT-5 achieved the best causal claim extraction performance, with an F1-score of 0.83, a 46% relative improvement over GPT-4o (F1 = 0.57). However, because GPT-5 supports only stochastic sampling settings, the complete pipeline was implemented with GPT-4o under deterministic settings to ensure reproducibility. The individual components performed strongly: cause-effect relationship extraction reached 73.7% accuracy and entity mapping reached 85.1% accuracy. The resulting knowledge graph (BC-CausalKG) contains 3,389 nodes and 5,869 relationships, and structural analysis confirmed appropriate selectivity and strong integration.
This research demonstrates that contextual information can be automatically preserved during biomedical knowledge extraction, addressing a fundamental limitation of existing approaches. However, evaluation was limited to 20 manually annotated abstracts, and the utility of the constructed knowledge graph for research applications requires further assessment.
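The record does not document BC-CausalKG's schema. The following Python sketch is therefore only one plausible way to realize the final stage, under stated assumptions.

- Each claim is stored as its own node and linked to its source abstract's evidence metadata. This keeps context such as study design and population attached to every cause-effect edge.
- The classes Evidence, CausalClaim and CausalKG, the edge labels (REPORTED_IN, HAS_CAUSE, HAS_EFFECT), the relation labels and all example values are hypothetical.
- The entity identifiers are placeholders standing in for the normalized IDs that PubTator3 mapping would supply.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence metadata extracted from one abstract (stage 1)."""
    pmid: str
    study_design: str
    population: str


@dataclass(frozen=True)
class CausalClaim:
    """Hypothetical claim after decomposition (stage 3) and entity mapping (stage 4)."""
    text: str
    cause_id: str   # placeholder for a normalized entity identifier
    effect_id: str  # placeholder for a normalized entity identifier
    relation: str   # assumed relation label, e.g. "INCREASES"


@dataclass
class CausalKG:
    """Minimal property-graph container: nodes keyed by ID, edges as tuples."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)
    _claims_per_pmid: defaultdict = field(
        init=False, repr=False, default_factory=lambda: defaultdict(int))

    def _add_node(self, node_id: str, label: str, **props) -> None:
        # setdefault keeps the first definition, so shared entities are merged
        self.nodes.setdefault(node_id, {"label": label, **props})

    def add_claim(self, evidence: Evidence, claim: CausalClaim) -> str:
        """Add one claim with its study context attached; return the claim node ID."""
        evidence_id = f"pmid:{evidence.pmid}"
        self._add_node(evidence_id, "Evidence",
                       study_design=evidence.study_design,
                       population=evidence.population)

        # Number claims per abstract so each claim gets a stable, unique node ID
        index = self._claims_per_pmid[evidence.pmid]
        self._claims_per_pmid[evidence.pmid] += 1
        claim_id = f"claim:{evidence.pmid}:{index}"
        self._add_node(claim_id, "Claim", text=claim.text)

        self._add_node(claim.cause_id, "Entity")
        self._add_node(claim.effect_id, "Entity")

        # The claim node ties the cause-effect pair back to its source study
        self.edges.append((claim_id, "REPORTED_IN", evidence_id, {}))
        self.edges.append((claim_id, "HAS_CAUSE", claim.cause_id, {}))
        self.edges.append((claim_id, "HAS_EFFECT", claim.effect_id, {}))
        # Direct cause -> effect edge for traversal, tagged with its source claim
        self.edges.append((claim.cause_id, claim.relation, claim.effect_id,
                           {"claim": claim_id}))
        return claim_id


# Usage with made-up placeholder values (not data from the thesis)
kg = CausalKG()
ev = Evidence("00000001", "randomized controlled trial", "postmenopausal patients")
c1 = kg.add_claim(ev, CausalClaim("Drug A improved survival.",
                                  "CHEM:drug_a", "OUTCOME:survival", "INCREASES"))
c2 = kg.add_claim(ev, CausalClaim("Drug A increased toxicity.",
                                  "CHEM:drug_a", "OUTCOME:toxicity", "INCREASES"))
print(len(kg.nodes), len(kg.edges))  # prints: 6 8
```

Reifying each claim as a node, rather than storing only a direct cause-effect edge, lets two abstracts that report the same relationship under different study conditions remain distinguishable. The direct edge is kept alongside it only as a traversal shortcut.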
dc.format.extent: 62
dc.identifier.uri: https://hdl.handle.net/20.500.14154/77652
dc.language.iso: en
dc.publisher: Saudi Digital Library.
dc.subject: LLMs
dc.subject: GPT-4o
dc.subject: GPT-5
dc.subject: causal claim extraction
dc.subject: knowledge graphs
dc.subject: large language models
dc.subject: information extraction
dc.title: Context-Aware Causal Knowledge Graph Construction from Biomedical Literature Using Large Language Models
dc.type: Thesis
sdl.degree.department: Department of Computer Science
sdl.degree.discipline: Artificial Intelligence
sdl.degree.grantor: THE UNIVERSITY OF MANCHESTER
sdl.degree.name: Master of Science

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2026