Deep Discourse Analysis for Early Prediction of Multi-Type Dementia

Date

2023-06-12

Publisher

Saudi Digital Library

Abstract

Ageing populations are a worldwide phenomenon. Although dementia is not an inevitable consequence of biological ageing, it is strongly associated with increasing age and is therefore expected to pose enormous challenges to public health systems and aged care providers. While dementia affects patients first and foremost, it is also associated with poorer mental and physical health among caregivers. Dementia is characterized by the gradual, irreversible impairment of nerve cells that control cognitive, behavioural, and language processes, causing speech and language deterioration even in preclinical stages. Early prediction can significantly alleviate dementia symptoms and, in some cases, may even curtail cognitive decline. However, diagnosis is currently challenging because it usually begins with traditional clinic-based screening tests. Such tests are typically interpreted manually and may lead to further tests and physical examinations, so the procedure is considered time-consuming, expensive, and invasive. Many researchers have therefore adopted speech and language analysis to facilitate and automate initial pre-screening. Although recent studies have proposed promising methods and models, there is still room for improvement, without which automated pre-screening remains impracticable. There is currently limited empirical literature on modelling discourse ability (that is, spoken and written conversation and communication) across prodromal dementia stages and types. Specifically, few researchers have investigated the lexical and syntactic structures of spontaneous discourse produced by patients with dementia under different conditions for automated diagnostic modelling. In addition, most previous work has focused on modelling and improving the diagnosis of Alzheimer’s disease (AD), the most common dementia pathology, and has neglected other types of dementia.
Further, currently proposed models suffer from poor performance, a lack of generalizability, and low interpretability. This thesis therefore explores lexical and syntactic presentations in the written and spoken narratives of people with different dementia syndromes in order to develop high-performing diagnostic models using fusions of lexical and syntactic (i.e., lexicosyntactic) features as well as language models. Multiple novel diagnostic frameworks are proposed and developed based on the “wisdom of crowds” theory, in which different mathematical and statistical methods are investigated and integrated into ensemble approaches to optimize overall performance and improve the inferences of the diagnostic models. Firstly, syntactic- and lexical-level components are explored and extracted from the only two data sources available for this study: spoken narratives retrieved from the well-known DementiaBank dataset, and written narratives from a blog-based corpus collected as part of this research. Because the two sources are so disparate, each was analysed and processed independently for exploratory data analysis and feature extraction. A common problem in this context is how to ensure that a proper feature space is generated for machine learning modelling. We address this problem by proposing multiple innovative ensemble-based feature selection pipelines to reveal optimal lexicosyntactics. Secondly, we explore language vocabulary spaces (i.e., n-grams), given their proven ability to enhance modelling performance, with the overall aim of establishing two-level feature fusions that combine optimal lexicosyntactics and vocabulary spaces.
These fusions are then used with single and ensemble learning algorithms for individual diagnostic modelling of the dementia syndromes in question, including AD, Mild Cognitive Impairment (MCI), Possible AD (PoAD), Frontotemporal Dementia (FTD), Lewy Body Dementia (LBD), and Mixed Dementia (PwD). A comprehensive empirical study and a series of experiments were conducted for each of the proposed approaches using these two real-world datasets to verify our frameworks. Evaluation with multiple classification metrics returned results that not only show the effectiveness of the proposed frameworks but also outperform current “state-of-the-art” baselines. In summary, this research provides a substantial contribution to the underlying task of effective dementia classification needed for the development of automated initial pre-screening of multiple dementia syndromes through language analysis. The lexicosyntactics presented and discussed across dementia syndromes may contribute substantially to our understanding of language processing in these pathologies. Given the current scarcity of related datasets, it is also hoped that the collected blog-based written corpus will facilitate future analytical and diagnostic studies. Furthermore, since this study deals with problems that are commonly faced in this research area and frequently discussed in the academic literature, its outcomes could assist in the development of better classification models, not only for dementia but also for other linguistic pathologies.
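
As a rough illustration of the two ideas named in the abstract, ensemble-based feature selection and two-level fusion of lexicosyntactic features with n-grams, the following minimal Python sketch may help. It is not the thesis's pipeline: Borda-count rank aggregation is swapped in here as one simple way to combine several selectors, and every feature name, score, and sample sentence is a hypothetical placeholder.

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams in a whitespace-tokenised, lower-cased text."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def borda_select(score_tables, k):
    """Combine several feature rankings by Borda count and keep the top k.

    score_tables: list of {feature_name: score} dicts, one per selector,
    all over the same features; a higher score means more relevant.
    """
    points = Counter()
    for scores in score_tables:
        # Sort ascending so the best feature earns the most points.
        ordered = sorted(scores, key=lambda f: (scores[f], f))
        for position, feature in enumerate(ordered):
            points[feature] += position
    # Highest total first; ties are broken alphabetically.
    ranked = sorted(points, key=lambda f: (-points[f], f))
    return ranked[:k]


def fuse(lexicosyntactic, selected, text, n=2):
    """Join the selected lexicosyntactic features with n-gram counts."""
    fused = {f"lex:{name}": lexicosyntactic[name] for name in selected}
    for gram, count in ngram_counts(text, n).items():
        fused[f"ngram:{gram}"] = count
    return fused


# Hypothetical scores from three selectors over four made-up features.
selectors = [
    {"type_token_ratio": 0.9, "noun_rate": 0.4, "pronoun_rate": 0.7, "mean_clause_len": 0.2},
    {"type_token_ratio": 0.6, "noun_rate": 0.5, "pronoun_rate": 0.8, "mean_clause_len": 0.1},
    {"type_token_ratio": 0.7, "noun_rate": 0.2, "pronoun_rate": 0.6, "mean_clause_len": 0.3},
]
selected = borda_select(selectors, k=2)

# Hypothetical feature values and text for a single narrative.
sample = {"type_token_ratio": 0.41, "noun_rate": 0.18, "pronoun_rate": 0.22, "mean_clause_len": 6.5}
features = fuse(sample, selected, "the boy is on the stool the boy")
```

In this toy run, the two features that the three selectors rank highest on aggregate are kept and joined with bigram counts into one feature dictionary, which could then be passed to a single or ensemble classifier.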

Description

Dementia is a major debilitating, progressive, and irreversible disorder that has no cure at present. The importance of automating dementia screening to facilitate early prediction has long been emphasized, but progress has been hampered in part by a lack of empirical support. Motivated by the evident language deficiency at the onset of dementia, this thesis proposes and presents a collection of methodologies that lead to robust diagnostic models for multi-type dementia through discourse analysis. Given their high performance, these diagnostic models provide a substantial contribution to the underlying task of effective dementia classification needed for the development of automated pre-screening tools for dementia syndromes.

Keywords

Machine Learning, Feature Selection, Information Fusion, Ensemble Learning, Classification, Dementia, Alzheimer’s Disease, Language Deficiency, Language Model, Neurolinguistics

Citation

Alkenani, A. H. A. (2023). Deep discourse analysis for early prediction of multi-type dementia (Doctoral dissertation, Queensland University of Technology).

Copyright owned by the Saudi Digital Library (SDL) © 2024