Leveraging Social Media Data for Detection and Monitoring of Depression

dc.contributor.advisorSpecia, Lucia
dc.contributor.authorAlhamed, Falwah Abdulaziz
dc.date.accessioned2025-11-08T21:42:53Z
dc.date.issued2025
dc.description.abstractMental health disorders are increasingly prevalent, with depression being the most common and a significant cause of disability and suicide worldwide. Understanding its symptoms, severity, and progression is vital for improving early detection and intervention. This thesis adopts a data-driven AI approach: it constructs a large, expert-annotated dataset and develops models to monitor depression from social media language. We first design a data collection and curation framework to build a large-scale dataset of posts from individuals who self-report depression. In collaboration with psychiatrists and psychologists, we create an annotation scheme for labelling symptoms and severity over time. Experienced psychologists annotate the data, resulting in DepSy, a dataset of 40,000 posts and the largest English dataset fully annotated for depression symptoms and severity progression. This dataset underpins all subsequent experiments. We then benchmark multiple NLP approaches to classify posts written before versus after a reported depression diagnosis. Analyses cover linguistic patterns, emotion usage, and content variation. Among the models tested, BERT-based classifiers achieve the best overall performance, while large language models (LLMs) in zero-shot settings perform near-randomly. Next, we address symptom detection as a multi-label classification problem. A bespoke BERT-based model achieves strong overall results, while a fine-tuned Llama-based model, DepSy-LLaMA, obtains higher recall, identifying more positive symptom cases, a valuable property in mental health detection. However, LLM predictions remain less reliable for sensitive applications. Finally, we explore the prediction of depression severity over time using deep learning and propose a hybrid CTMC-LSTM model that integrates Markov chains with an LSTM network to capture temporal patterns. This model uniquely detects severe cases and outperforms all baselines.
The findings demonstrate the importance of temporal modelling and expert-annotated data for building robust, ethical, and clinically informed systems for depression monitoring from social media.
dc.format.extent290
dc.identifier.citationAlhamed, F. A. (2025). Leveraging Social Media Data for Detection and Monitoring of Depression. Doctoral thesis, Imperial College London.
dc.identifier.issn2558287
dc.identifier.urihttps://hdl.handle.net/20.500.14154/76906
dc.language.isoen
dc.publisherSaudi Digital Library
dc.subjectNLP
dc.subjectDepression Detection
dc.subjectBERT
dc.subjectLlama
dc.subjectGPT
dc.subjectDepression Symptoms
dc.subjectSocial Media
dc.subjectAnnotation Scheme
dc.subjectDataset
dc.titleLeveraging Social Media Data for Detection and Monitoring of Depression
dc.typeThesis
sdl.degree.departmentDepartment of Computing
sdl.degree.disciplineComputer Science - Artificial Intelligence - Natural Language Processing
sdl.degree.grantorImperial College London
sdl.degree.nameDoctor of Philosophy

Files

Original bundle

Name:
SACM-Dissertation.pdf
Size:
23.95 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.61 KB
Format:
Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2026