Leveraging Social Media Data for Detection and Monitoring of Depression

No Thumbnail Available

Date

2025

Journal Title

Journal ISSN

Volume Title

Publisher

Saudi Digital Library

Abstract

Mental health disorders are increasingly prevalent, with depression being the most common and a significant cause of disability and suicide worldwide. Understanding its symptoms, severity, and progression is vital for improving early detection and intervention. This thesis adopts a data-driven AI approach, constructing a large, expert-annotated dataset and developing models to monitor depression from social media language. We first design a data collection and curation framework to build a large-scale dataset of posts from individuals who self-report depression. In collaboration with psychiatrists and psychologists, we create an annotation scheme for labelling symptoms and severity over time. Experienced psychologists annotate the data, resulting in DepSy, the largest English dataset of 40,000 posts fully annotated for depression symptoms and severity progression. This dataset underpins all subsequent experiments. We then benchmark multiple NLP approaches to classify posts written before versus after a reported depression diagnosis. Analyses include linguistic patterns, emotion usage, and content variation. Among the models tested, BERT-based classifiers achieve the best overall performance, while large language models (LLMs) in zero-shot settings perform near-randomly. Next, we address symptom detection as a multi-label classification problem. A bespoke BERT-based model achieves strong overall results, while a fine-tuned Llama-based model, DepSy-LLaMA, obtains higher recall, identifying more positive symptom cases—a valuable property in mental health detection. However, LLM predictions remain less reliable for sensitive applications. Finally, we explore the prediction of depression severity over time using deep learning and propose a hybrid CTMC-LSTM model that integrates Markov chains with LSTM to capture temporal patterns. This model uniquely detects severe cases and achieves the highest performance across all baselines. The findings demonstrate the importance of temporal modelling and expert-annotated data for building robust, ethical, and clinically informed systems for depression monitoring from social media.

Description

Keywords

NLP, Depression Detection, BERT, Llama, GPT, Depression Symptoms, Social Media, Annotation Scheme, Dataset

Citation

Althobaiti, F. F. (2025). Development of Electric Vehicle Drive System Using Model Predictive Control. Master’s thesis, Queen Mary University of London.

Endorsement

Review

Supplemented By

Referenced By

Copyright owned by the Saudi Digital Library (SDL) © 2025