Speech Emotion Recognition

Speech emotion recognition is the task of identifying the emotion present in a speech recording by analysing the speech signal. The underlying principle is that the same sentence sounds measurably different when spoken with different emotions. This study implements and evaluates three systems that classify emotion from speech, using the Berlin Database of Emotional Speech. Recognizing emotion from speech consists of two main stages: 1) extracting features from the speech, and 2) training a classifier to identify the emotional state present in the speech. Features are first extracted using the Mel-frequency cepstral coefficient (MFCC) technique. Three different classifiers are then trained to classify emotions: the Gaussian mixture model-universal background model (GMM-UBM), the GMM-support vector machine (GMM-SVM), and i-vectors. Results show that the GMM-UBM and GMM-SVM systems achieve the same performance (74.07%), while the i-vector system performs best (79.629%). Mitigating the curse of dimensionality by mapping the high-dimensional supervectors to low-dimensional i-vectors, together with using a discriminative approach, helps attain better performance.
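To make the first stage concrete, the following is a minimal sketch of MFCC extraction in Python with NumPy. It is not the implementation used in the study: the function names, the 16 kHz sample rate, the 25 ms frames with a 10 ms hop, the 26 mel filters, and the 13 retained coefficients are all assumptions chosen as common defaults, and pre-emphasis and liftering are omitted for brevity.

```python
import numpy as np


def hz_to_mel(hz):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope, peak of 1 at center
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # 1) Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2) Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3) Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 4) DCT-II (unnormalised) to decorrelate; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T


# Usage on one second of synthetic noise standing in for a speech recording.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
features = mfcc(signal)   # one row of coefficients per frame
```

Each row of `features` is the MFCC vector for one frame. In a pipeline like the one described above, frame-level vectors of this kind would serve as the input to the second stage, where the GMM-UBM, GMM-SVM, or i-vector classifier is trained.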