Speech Classification to Recognize Emotion Using Artificial Neural Network

Siti Helmiyah(1*), Imam Riadi(2), Rusydi Umar(3), Abdullah Hanif(4)

(1) Universitas Ahmad Dahlan, Yogyakarta
(2) Universitas Ahmad Dahlan, Yogyakarta
(3) Universitas Ahmad Dahlan, Yogyakarta
(4) Universitas Ahmad Dahlan, Yogyakarta
(*) Corresponding Author
DOI: https://doi.org/10.23917/khif.v7i1.11913

Abstract

This study seeks to identify human emotions using artificial neural networks. Emotions are difficult to understand and hard to measure quantitatively. Emotions may be reflected in facial expressions and voice tone. Voice carries physical properties that are unique to every speaker: each person has a different timbre, pitch, tempo, and rhythm, and a speaker's geographical background may also affect how words are pronounced and how certain emotions are expressed. The identification of human emotions is useful in the field of human-computer interaction; it supports the development of software interfaces for community service centers, banks, education, and other domains. This research proceeds in three stages, namely data collection, feature extraction, and classification. We obtain data in the form of audio files from the Berlin Emo-DB database. The files contain human voices that express five emotions: angry, bored, happy, neutral, and sad. Features are extracted from every audio file using Mel Frequency Cepstral Coefficients (MFCC). Classification uses the Multi-Layer Perceptron (MLP), one of the artificial neural network methods, and proceeds in two stages, namely training and testing. The MLP classifier recognizes emotions well: with 100 hidden-layer nodes it achieves an average accuracy of 72.80%, an average precision of 68.64%, an average recall of 69.40%, and an average F1-score of 67.44%.
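As an illustration of the pipeline described above, the sketch below chains MFCC feature extraction and MLP classification in Python. It assumes librosa for audio loading and MFCC computation and scikit-learn's MLPClassifier for the multi-layer perceptron; the data directory, the number of MFCC coefficients, the frame-averaging step, and the label parsing from Emo-DB file names are illustrative assumptions rather than details reported by the study. Only the single hidden layer with 100 nodes follows the configuration in the abstract.

import glob
import os

import librosa
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: Berlin Emo-DB encodes the emotion in the sixth character of each
# file name; only the five classes used in this study are kept.
EMOTION_CODES = {"W": "angry", "L": "bored", "F": "happy", "N": "neutral", "T": "sad"}

def extract_mfcc(path, n_mfcc=40):
    """Load one audio file and return its MFCCs averaged over all frames."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # one fixed-length feature vector per file

features, labels = [], []
for path in glob.glob("emodb/wav/*.wav"):      # hypothetical data directory
    code = os.path.basename(path)[5]           # emotion letter in the file name
    if code in EMOTION_CODES:
        features.append(extract_mfcc(path))
        labels.append(EMOTION_CODES[code])

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), np.array(labels),
    test_size=0.2, stratify=labels, random_state=0)

# One hidden layer with 100 nodes, matching the configuration reported above.
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Per-class precision, recall, F1-score, and overall accuracy on the test set.
print(classification_report(y_test, clf.predict(X_test)))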

Keywords

emotion, speech, MFCC, Python, multilayer perceptron
