IIUM Repository

Speech emotion recognition using deep neural networks on multilingual databases

Ahmad Qadri, Syed Asif and Gunawan, Teddy Surya and Wani, Taiba Majid and Ambikairajah, Eliathamby and Kartiwi, Mira and Ihsanto, Eko (2021) Speech emotion recognition using deep neural networks on multilingual databases. In: Advances in Robotics, Automation and Data Analytics. Advances in Intelligent Systems and Computing, Chapter 3. Springer, pp. 21-30. ISBN 978-3-030-70916-7

With the research community's ever-increasing interest in human-computer interaction (HCI), systems that deduce and identify the emotional aspects of a speech signal have emerged as a hot research topic. Speech Emotion Recognition (SER) has made the automated and intelligent analysis of human utterances a reality. Typically, an SER system extracts features from the speech signal, such as pitch frequency, formant features, and energy-related and spectral features, and then passes them to a classifier to determine the underlying emotion. The success of an SER system depends chiefly on selecting appropriate emotional feature extraction techniques. In this paper, Mel-frequency Cepstral Coefficient (MFCC) and Teager Energy Operator (TEO) features, along with a newly proposed fusion of MFCC and TEO referred to as Teager-MFCC (TMFCC), are examined over a multilingual database consisting of English, German, and Hindi speech. Deep Neural Networks are used to classify four emotions: happy, sad, angry, and neutral. Evaluation results show that the proposed TMFCC fusion, with a recognition rate of 92.7%, outperforms both TEO and MFCC, which achieve recognition rates of 88.5% and 90.0%, respectively.
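For readers unfamiliar with the Teager Energy Operator named in the abstract, the sketch below shows its standard discrete-time form, psi[n] = x[n]^2 - x[n-1] * x[n+1], applied to a synthetic tone. It is only an illustration of the textbook operator: the abstract does not say how TEO is combined with MFCC to form TMFCC, so no fusion step is shown, and the function name, sample rate, amplitude, and tone frequency are all assumptions chosen for the example.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one for each interior sample of x.
    (Hypothetical helper name; not taken from the chapter.)
    """
    if len(x) < 3:
        raise ValueError("need at least three samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (assumed values): a 200 Hz tone sampled at 8 kHz.
amplitude = 0.5
omega = 2 * math.pi * 200 / 8000  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]

psi = teager_energy(tone)

# For a pure tone A*cos(omega*n), the operator is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

Because the output for a pure tone depends on both amplitude and frequency, the operator tracks changes in either one, which is one reason it is commonly used as an energy-related speech feature.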

Item Type: Book Chapter
Additional Information: 5588/88878
Uncontrolled Keywords: Speech Emotion Recognition, Mel-frequency Cepstral Coefficient (MFCC), Teager Energy Operator (TEO), Deep Neural Networks (DNN).
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7800 Electronics. Computer engineering. Computer hardware. Photoelectronic devices > TK7885 Computer engineering
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Engineering > Department of Electrical and Computer Engineering
Kulliyyah of Information and Communication Technology > Department of Information System
Depositing User: Prof. Dr. Teddy Surya Gunawan
Date Deposited: 30 Mar 2021 12:23
Last Modified: 11 May 2021 11:09
URI: http://irep.iium.edu.my/id/eprint/88878
