Deep Learning for the Detection of Emotion in Human Speech: The Impact of Audio Sample Duration and English versus Italian Languages

Alexander Wurst, Michael Hopwood, Sifan Wu, Fei Li, Yu Dong Yao

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution, peer-reviewed)

2 Scopus citations

Abstract

Identifying emotion types is important in the diagnosis and treatment of certain mental illnesses. This study uses audio data and deep learning methods, specifically convolutional neural networks (CNN) and long short-term memory (LSTM) networks, to classify the emotion expressed in human speech. Our experiments use the IEMOCAP and DEMoS datasets, which contain English and Italian speech audio, respectively, and classify each utterance into one of up to four emotions: angry, happy, neutral, and sad. The results demonstrate the effectiveness of the deep learning methods, with classification accuracies between 62 and 92 percent across our experiments. We specifically investigate the impact of audio sample duration on classification accuracy, and we compare classification accuracy for English versus Italian speech.
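As a rough illustration of the kind of pipeline the abstract describes, the sketch below combines a small CNN front end with an LSTM over mel-spectrogram frames, ending in a four-way emotion classifier (angry, happy, neutral, sad). This is a minimal sketch in PyTorch under assumed settings (64 mel bands, two convolution/pooling stages, a 128-unit LSTM); the authors' actual architecture, features, and hyperparameters are not specified in this record and may differ.

```python
# Hypothetical CNN + LSTM emotion classifier over mel-spectrograms.
# Layer sizes and the 4-class label set are illustrative assumptions,
# not the authors' exact configuration.
import torch
import torch.nn as nn

class CnnLstmEmotionClassifier(nn.Module):
    def __init__(self, n_mels: int = 64, n_classes: int = 4):
        super().__init__()
        # 2-D convolutions extract local time-frequency patterns from the spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # LSTM models the temporal evolution of the pooled feature maps.
        self.lstm = nn.LSTM(input_size=32 * (n_mels // 4), hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, time)
        feats = self.cnn(spec)                                 # (batch, 32, n_mels//4, time//4)
        b, c, f, t = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, time//4, features)
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1, :])                          # logits for the emotion classes

# Usage example with a dummy batch of 64-band mel-spectrograms
# (roughly what 3-second clips at 16 kHz with a 256-sample hop would produce).
model = CnnLstmEmotionClassifier()
dummy_spec = torch.randn(8, 1, 64, 188)   # (batch, channel, mel bins, frames)
print(model(dummy_spec).shape)            # torch.Size([8, 4])
```

A longer audio sample yields more spectrogram frames for the LSTM to summarize, which is one way sample duration can influence classification accuracy, the effect the paper investigates.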

Original language: English
Title of host publication: 32nd Wireless and Optical Communications Conference, WOCC 2023
ISBN (Electronic): 9798350337150
DOIs
State: Published - 2023
Event: 32nd Wireless and Optical Communications Conference, WOCC 2023 - Newark, United States
Duration: 5 May 2023 to 6 May 2023

Publication series

Name: 32nd Wireless and Optical Communications Conference, WOCC 2023

Conference

Conference: 32nd Wireless and Optical Communications Conference, WOCC 2023
Country/Territory: United States
City: Newark
Period: 5/05/23 to 6/05/23

Keywords

  • convolutional neural network (CNN)
  • deep learning
  • emotion recognition
  • long short-term memory (LSTM)
  • spectrogram
