2-D Attention Based Convolutional Recurrent Neural Network for Speech Emotion Recognition

  • Akalya Devi C Department of Information Technology, PSG College of Technology, Coimbatore, India
  • Karthika Renuka D Department of Information Technology, PSG College of Technology, Coimbatore, India
  • Aarshana E Winy
  • P C Kruthikkha
  • Ramya P
  • Soundarya S PSG College of Technology
Keywords: Attention, Convolutional Recurrent Neural Networks, Speech Emotion Recognition, Spectrogram.

Abstract

Recognizing emotions from speech is a formidable challenge due to the complexity of emotions. The performance of Speech Emotion Recognition (SER) is significantly affected by the emotional cues retrieved from speech. Most emotional features, however, are sensitive to emotionally neutral factors such as the speaker, speaking style, and gender. In this work, we postulate that computing deltas for individual features preserves information that is mainly relevant to emotional traits while minimizing the influence of emotionally irrelevant components, thus leading to fewer misclassifications. Additionally, SER commonly encounters silent and emotionally irrelevant frames. The proposed technique is effective at learning feature representations that capture emotion-relevant information. We therefore present a two-dimensional attention-based convolutional recurrent neural network that learns discriminative characteristics and predicts emotions. The Mel-spectrogram is used for feature extraction. The proposed technique is evaluated on the IEMOCAP dataset and achieves better performance, with an accuracy of 68%.
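The following is a minimal sketch, not the authors' implementation, of the pipeline the abstract describes: a log-Mel spectrogram with delta and delta-delta channels feeding a small convolutional recurrent network with attention pooling over frames. The layer sizes, the four emotion classes, and the simple frame-level attention (rather than the paper's exact 2-D attention mechanism) are illustrative assumptions.

```python
# Sketch only: log-Mel + delta features -> CNN -> BiLSTM -> attention pooling -> classifier.
# Hyperparameters (n_mels, hidden sizes, 4 classes) are assumptions, not the paper's settings.
import numpy as np
import librosa
import torch
import torch.nn as nn

def extract_features(wav_path, sr=16000, n_mels=64):
    """Log-Mel spectrogram stacked with delta and delta-delta channels."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    delta = librosa.feature.delta(log_mel)
    delta2 = librosa.feature.delta(log_mel, order=2)
    return np.stack([log_mel, delta, delta2])   # shape: (3, n_mels, time)

class AttentionCRNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=4, rnn_hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        rnn_in = 64 * (n_mels // 4)                  # channels * reduced mel bins
        self.rnn = nn.LSTM(rnn_in, rnn_hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * rnn_hidden, 1)     # frame-level attention scores
        self.fc = nn.Linear(2 * rnn_hidden, n_classes)

    def forward(self, x):                            # x: (batch, 3, n_mels, time)
        h = self.conv(x)                             # (batch, 64, n_mels/4, time/4)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, frames, features)
        h, _ = self.rnn(h)                           # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)       # weight emotion-relevant frames
        context = (w * h).sum(dim=1)                 # attention-pooled utterance vector
        return self.fc(context)                      # emotion logits

# Example forward pass with a dummy batch of 100-frame, 3-channel spectrograms
model = AttentionCRNN()
dummy = torch.randn(2, 3, 64, 100)
logits = model(dummy)                                # shape: (2, 4)
```

The attention layer down-weights silent or emotionally irrelevant frames before pooling, which is the role the abstract attributes to the attention mechanism; the actual architecture details are in the full paper.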

Published
2022-10-01
How to Cite
[1]
A. D. C, K. R. D, A. E. Winy, P. C. Kruthikkha, R. P, and S. S, “2-D Attention Based Convolutional Recurrent Neural Network for Speech Emotion Recognition”, INJIISCOM, vol. 3, no. 2, pp. 163-172, Oct. 2022.