Novel Frameworks for Systematic Assessment of Privacy Risks in Affective Speech AI Models

Date

2025

Publisher

University of Glasgow

Abstract

Many AI applications now attempt to infer users’ affective states and mental health conditions from their speech data. Beyond the spoken content, speech signals carry information about a speaker’s identity and demographic attributes, which these systems can inadvertently reveal, exposing users to privacy risks. Previous efforts have focused primarily on developing deep models that preserve user privacy; however, no systematic attempt has been made to assess and quantify the privacy risks in such systems. This thesis investigates the privacy risks in deep models trained for affective speech-based systems, specifically examining the risk of inferring sensitive demographic information from the latent network representations learned by these models. Previous research has demonstrated that such representations can leak demographic information even when trained for unrelated tasks. Users typically consent to the use of their speech data for affective state analysis, but not for the inference of their demographic attributes. Given the established relationship between affective aspects of speech and sensitive demographic information, it is crucial to quantitatively assess the privacy risks within these systems.

This thesis presents two novel frameworks for systematically assessing the privacy risks associated with attribute inference attacks in affective speech AI models, specifically in speech emotion recognition (SER) and depression detection systems. The first contribution proposes a framework for assessing the privacy risks of gender inference attacks in SER. The findings reveal that SER models can inadvertently leak a speaker’s gender with accuracies ranging from 51% to 95%. The second contribution validates the applicability of this framework by extending the investigation to speech-based depression detection systems, a domain with heightened privacy concerns, using a clinical dataset.
The third contribution introduces a second framework for assessing the privacy risks of attribute inference attacks on multiple speaker attributes, including gender, age, and educational level, within both multimodal (audio-text) depression detection systems and their unimodal (audio-only and text-only) counterparts. This framework was evaluated on clinical data from individuals diagnosed with depression by professional psychiatrists. The results demonstrate that an adversary can infer speaker attributes with accuracies ranging from 51% to 68% from speech inputs as short as 10 seconds. Through these contributions, this thesis advances our understanding of privacy risks in affective speech AI models and informs the development of more privacy-preserving models.
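Attribute inference attacks of the kind assessed in this thesis are commonly implemented as probing classifiers: an adversary trains a simple model to predict a sensitive attribute from the latent embeddings of the affective speech model. The sketch below is a hedged illustration of that idea only, not the thesis's actual pipeline; the embeddings and labels are synthetic stand-ins, and the leakage signal is planted by hand.

```python
# Sketch of an attribute inference attack as a probing classifier.
# Synthetic "latent representations" stand in for embeddings that a
# real attack would extract from a trained SER or depression model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim = 1000, 64

# Sensitive attribute (e.g. binary speaker gender) and embeddings in
# which the attribute leaks as a shift along a few dimensions.
gender = rng.integers(0, 2, size=n)
emb = rng.normal(size=(n, dim))
emb[:, :4] += 1.5 * gender[:, None]  # planted leakage signal

X_tr, X_te, y_tr, y_te = train_test_split(
    emb, gender, test_size=0.3, random_state=0
)

# The adversary's probe: a linear classifier on the embeddings.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
leak_acc = accuracy_score(y_te, probe.predict(X_te))

# Accuracy well above the ~50% chance level indicates that the
# representations leak the attribute.
print(f"attribute inference accuracy: {leak_acc:.2f}")
```

If the embeddings carried no attribute information, the probe would stay near chance; accuracy approaching the upper end of the 51%–95% range reported above signals substantial leakage.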

Keywords

AI models, privacy risks, affective speech

Copyright owned by the Saudi Digital Library (SDL) © 2025