Hardening Enterprise Voice Channels with Robust Deepfake-Audio Detection
Date
2025
Authors
Publisher
Saudi Digital Library
Abstract
This dissertation examines why audio deepfake detectors that perform well on clean benchmarks can fail in enterprise voice channels, and how targeted augmentation can harden them. A reproducible robustness framework is built around a compact CNN on log-Mel spectrograms, evaluating clean-only and clean-plus-degraded training regimes under telephony codecs, packet-loss-like impairments, and office noise. Across 1,144 test samples (286 per condition), augmentation reduces false positives for bona fide callers in degraded channels while maintaining parity on clean audio. Accuracy, Macro-F1, ROC-AUC, and EER are reported with 95% confidence intervals, and paired significance tests and confusion analyses are used to characterize error modes by condition. The findings are then translated into operational guidance: posture-based thresholds seeded at the EER per condition, grey-zone cascades to triage low-confidence cases, and reproducibility controls (model cards, parameter-stamped manifests) aligned with GDPR Article 32 and ENISA guidance. Contributions are: (i) a controlled robustness evaluation pipeline, (ii) an augmentation curriculum tailored to enterprise degradations, (iii) condition-wise error profiling that links false-accept/false-reject trade-offs to fraud and service risks, and (iv) a deployment playbook for PBX/VoIP environments. Together, these results convert detector accuracy into enterprise-ready resilience and provide a template for securing voice channels against synthetic speech.
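As an illustration of the operational guidance above, the following is a minimal sketch of how a threshold might be seeded at the EER and then widened into a grey zone for triage. It is not the dissertation's implementation. The function names `eer_threshold` and `triage`, the convention that a higher score means "more likely synthetic", and the `margin` value are all assumptions made for this example.

```python
import numpy as np


def eer_threshold(scores, labels):
    """Return (threshold, eer) at the point where the false-positive and
    false-negative rates are closest.

    Assumed convention: higher score = more likely spoof;
    labels are 1 for spoof and 0 for bona fide.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_thr, best_gap, best_eer = None, np.inf, None
    # Try every observed score as a candidate threshold.
    for thr in np.unique(scores):
        flagged = scores >= thr
        # Bona fide callers wrongly flagged as spoof.
        fpr = np.mean(flagged[labels == 0])
        # Spoofed audio wrongly accepted as bona fide.
        fnr = np.mean(~flagged[labels == 1])
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_thr, best_gap, best_eer = thr, gap, (fpr + fnr) / 2
    return float(best_thr), float(best_eer)


def triage(score, threshold, margin=0.1):
    """Three-way decision with a grey zone of +/- margin around the threshold.

    The margin is a hypothetical value; in practice it would be tuned
    per condition and per risk posture.
    """
    if score >= threshold + margin:
        return "flag"
    if score < threshold - margin:
        return "accept"
    return "review"  # low-confidence case, hand off to a second stage


# Toy scores for one channel condition (illustrative values, not real data).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
thr, eer = eer_threshold(scores, labels)
decisions = [triage(s, thr) for s in (0.9, 0.62, 0.2)]
```

Running `eer_threshold` separately on the scores from each channel condition (for example clean, codec, packet loss, and noise) would give one seed threshold per condition, which a stricter or more lenient posture could then shift up or down.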
Description
Keywords
deepfake audio, audio deepfake detection, synthetic speech, spoofing detection, voice authentication, enterprise voice security, telephony codecs, VoIP, PBX, packet loss, log-Mel spectrograms
