Leveraging Social Media Data for Detection and Monitoring of Depression
No Thumbnail Available
Date
2025
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Saudi Digital Library
Abstract
Mental health disorders are increasingly prevalent, with depression being the most common and a significant cause of disability and suicide worldwide. Understanding its symptoms, severity, and progression is vital for improving early detection and intervention. This thesis adopts a data-driven AI approach, constructing a large, expert-annotated dataset and developing models to monitor depression from social media language.
We first design a data collection and curation framework to build a large-scale dataset of posts from individuals who self-report depression. In collaboration with psychiatrists and psychologists, we create an annotation scheme for labelling symptoms and severity over time. Experienced psychologists annotate the data, resulting in DepSy, the largest English dataset of 40,000 posts fully annotated for depression symptoms and severity progression. This dataset underpins all subsequent experiments.
We then benchmark multiple NLP approaches to classify posts written before versus after a reported depression diagnosis. Analyses include linguistic patterns, emotion usage, and content variation. Among the models tested, BERT-based classifiers achieve the best overall performance, while large language models (LLMs) in zero-shot settings perform near-randomly.
Next, we address symptom detection as a multi-label classification problem. A bespoke BERT-based model achieves strong overall results, while a fine-tuned Llama-based model, DepSy-LLaMA, obtains higher recall, identifying more positive symptom cases—a valuable property in mental health detection. However, LLM predictions remain less reliable for sensitive applications.
Finally, we explore the prediction of depression severity over time using deep learning and propose a hybrid CTMC-LSTM model that integrates Markov chains with LSTM to capture temporal patterns. This model uniquely detects severe cases and achieves the highest performance across all baselines. The findings demonstrate the importance of temporal modelling and expert-annotated data for building robust, ethical, and clinically informed systems for depression monitoring from social media.
Description
Keywords
NLP, Depression Detection, BERT, Llama, GPT, Depression Symptoms, Social Media, Annotation Scheme, Dataset
Citation
Althobaiti, F. F. (2025). Development of Electric Vehicle Drive System Using Model Predictive Control. Master’s thesis, Queen Mary University of London.
