Acoustic Detection and Classification of Urban Sound Events

Abstract

Urban sounds can carry important information about their surroundings; hence, the detection and classification of these sounds using deep learning is a topic of interest. In this project, we explore the applicability of transfer learning to the classification of urban soundscapes, using a Convolutional Neural Network (CNN) model pre-trained on a speech corpus. Transfer learning is one of the most widely used techniques in image recognition; however, it has found limited application in deep learning on audio data. For this work, a baseline CNN architecture was derived from the architecture described in the paper “Environmental Sound Classification with Convolutional Neural Networks” by Piczak (2015) and was subsequently optimized through a series of experiments on the speech commands dataset. The experiments were designed on the basis of heuristics and an understanding of how different layers function in a CNN model, and we were able to improve the accuracy by 43% compared to the baseline model. We were also able to show that the transfer learning approach, using the same derived features in both datasets, requires significantly less training time (5 minutes 18 seconds) while reaching a test set accuracy of 74.12%. We believe that having networks of different architectures trained on a wide variety of sound data, using different combinations of audio features (such as MFCC, LPCC, etc.), can dramatically reduce the requirements for annotated audio data and computational resources.
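
The transfer learning idea summarized above (reuse feature layers learned on one audio task, then train only a new classifier for the target task) can be illustrated with a minimal NumPy sketch. This is not the architecture or code used in the project: the filters are random stand-ins for weights that would normally be loaded from a network pre-trained on speech, the inputs are synthetic arrays shaped like MFCC feature maps, and every name (`PRETRAINED_FILTERS`, `frozen_features`, `train_head`, `predict`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: stand-in for convolutional filters learned on the source
# (speech) task. A real pipeline would load these from a saved model.
PRETRAINED_FILTERS = rng.normal(size=(8, 3, 3))  # 8 filters, each 3x3


def frozen_features(x, filters=PRETRAINED_FILTERS):
    """Fixed conv filters -> ReLU -> global average pooling.

    x has shape (n_samples, height, width), e.g. MFCC feature maps.
    Returns shape (n_samples, n_filters). The filters are never updated.
    """
    windows = np.lib.stride_tricks.sliding_window_view(
        x, filters.shape[1:], axis=(1, 2)
    )  # (n, height - 2, width - 2, 3, 3)
    maps = np.einsum("nhwij,fij->nfhw", windows, filters)
    return np.maximum(maps, 0.0).mean(axis=(2, 3))


def train_head(feats, labels, n_classes, lr=0.05, epochs=300):
    """Fit only a new softmax classifier on top of the frozen features."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(x, W, b):
    """Predict class indices for raw inputs using the frozen extractor."""
    return (frozen_features(x) @ W + b).argmax(axis=1)


if __name__ == "__main__":
    # Synthetic "target task" data: 40 maps of 13 coefficients x 20 frames.
    x = rng.normal(size=(40, 13, 20))
    y = np.arange(40) % 2
    x[y == 1] += 0.5  # give class 1 a crude, artificial signature
    W, b = train_head(frozen_features(x), y, n_classes=2)
    print("training accuracy:", (predict(x, W, b) == y).mean())
```

Because gradients are computed only for the classifier weights `W` and `b`, each training step is cheap, which is the general mechanism behind the shorter training time that transfer learning tends to offer. Fine-tuning some of the pre-trained layers as well is a common variant that this sketch does not cover.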

Copyright owned by the Saudi Digital Library (SDL) © 2025