Abstract
Identification of emotion types is important in the diagnosis and treatment of certain mental illnesses. This study uses audio data and deep learning methods, namely convolutional neural networks (CNN) and long short-term memory (LSTM) networks, to classify the emotion of human speech. In our experiments we use the IEMOCAP and DEMoS datasets, consisting of English and Italian speech audio, to classify speech into one of up to four emotions: angry, happy, neutral, and sad. The classification results demonstrate the effectiveness of the deep learning methods, with our experiments yielding classification accuracies between 62 and 92 percent. We specifically investigate the impact of audio sample duration on classification accuracy. In addition, we examine and compare classification accuracy for the English versus Italian languages.
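As a listed keyword suggests, pipelines of this kind typically convert each audio clip into a spectrogram before feeding it to a CNN or LSTM. The following is a minimal illustrative sketch of that preprocessing step, not the authors' implementation; the frame length, hop size, sampling rate, and synthetic test tone are all assumptions chosen for demonstration.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256):
    """Compute a log-magnitude spectrogram via a Hann-windowed STFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude of the real FFT of each frame: (n_frames, frame_len//2 + 1)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range; transpose to (freq_bins, time_frames)
    return np.log1p(spec).T

# Synthetic 1-second clip at 16 kHz: a pure 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(audio)
print(spec.shape)  # (257, 61)
```

The resulting 2-D array can be treated as a single-channel image for a CNN, or as a sequence of per-frame feature vectors for an LSTM; trimming or padding clips to a fixed number of frames is one simple way to study the effect of sample duration.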
| Original language | English |
|---|---|
| Title of host publication | 32nd Wireless and Optical Communications Conference, WOCC 2023 |
| ISBN (Electronic) | 9798350337150 |
| State | Published - 2023 |
| Event | 32nd Wireless and Optical Communications Conference, WOCC 2023, Newark, United States (5 May 2023 → 6 May 2023) |
Publication series
| Name | 32nd Wireless and Optical Communications Conference, WOCC 2023 |
|---|---|
Conference
| Conference | 32nd Wireless and Optical Communications Conference, WOCC 2023 |
|---|---|
| Country/Territory | United States |
| City | Newark |
| Period | 5/05/23 → 6/05/23 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- convolutional neural network (CNN)
- deep learning
- emotion recognition
- long short-term memory (LSTM)
- spectrogram
Fingerprint
Research topics of 'Deep Learning for the Detection of Emotion in Human Speech: The Impact of Audio Sample Duration and English versus Italian Languages'.