Ashraf, Arselan and Gunawan, Teddy Surya and Arifin, Fatchul and Kartiwi, Mira and Sophian, Ali and Habaebi, Mohamed Hadi (2023) Enhanced emotion recognition in videos: a convolutional neural network strategy for human facial expression detection and classification. Indonesian Journal of Electrical Engineering and Informatics (IJEEI), 11 (1). pp. 286-299. ISSN 2089-3272
Abstract
The human face is essential in conveying emotions, as facial expressions serve as effective, natural, and universal indicators of emotional states. Automated emotion recognition has garnered increasing interest due to its potential applications in fields such as human-computer interaction, machine learning, robotic control, and driver emotional state monitoring. With advances in artificial intelligence and computational power, visual emotion recognition has become a prominent research area. Despite extensive research employing machine learning algorithms such as convolutional neural networks (CNNs), challenges remain concerning input data processing, the scope of emotion classification, data size, optimal CNN configurations, and performance evaluation. To address these issues, we propose a comprehensive CNN-based model for real-time detection and classification of five primary emotions: anger, happiness, neutrality, sadness, and surprise. We employ the Amsterdam Dynamic Facial Expression Set – Bath Intensity Variations (ADFES-BIV) video dataset, extracting image frames from the video samples. Image processing techniques such as histogram equalization, color conversion, cropping, and resizing are applied to the frames before labeling. The Viola-Jones algorithm is then used for face detection on the processed grayscale images. We develop and train a CNN on the processed image data, applying dropout, batch normalization, and L2 regularization to reduce overfitting. Suitable hyperparameters are determined through trial and error, and the model's performance is evaluated. The proposed model achieves a recognition accuracy of 99.38%, with the confusion matrix, recall, precision, F1 score, and processing time further characterizing its performance. The model's generalization is assessed using images from the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP) and the Extended Cohn-Kanade (CK+) datasets. The results demonstrate the efficiency and usability of the proposed approach, contributing valuable insights into real-time visual emotion recognition.
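The preprocessing and detection stage named in the abstract (grayscale conversion, histogram equalization, cropping, resizing, and Viola-Jones face detection) can be sketched with OpenCV. This is a minimal illustration, not the paper's implementation; the 48x48 target size and detector parameters are assumptions.

import cv2

def preprocess_frame(frame):
    # Color conversion: BGR video frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Histogram equalization to normalize illumination
    return cv2.equalizeHist(gray)

# Viola-Jones detection via OpenCV's bundled Haar cascade
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_and_crop(gray, size=(48, 48)):
    # Detect faces, then crop and resize each region for the CNN
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y + h, x:x + w], size) for (x, y, w, h) in faces]

Similarly, a CNN combining the regularization techniques the abstract mentions (dropout, batch normalization, and L2 weight penalties) might look as follows in Keras; the layer widths, five-class softmax output, and optimizer choice are illustrative assumptions rather than the authors' exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(input_shape=(48, 48, 1), num_classes=5, weight_decay=1e-4):
    reg = regularizers.l2(weight_decay)  # L2 penalty on conv/dense weights
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu",
                      kernel_regularizer=reg),
        layers.BatchNormalization(),  # batch normalization after convolution
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu",
                      kernel_regularizer=reg),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dropout(0.5),  # dropout to reduce overfitting
        # One output unit per class: anger, happiness, neutrality, sadness, surprise
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model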