Elfaki, Ayman and Asnawi, Ani Liza and Jusoh, Ahmad Zamani and Ismail, Ahmad Fadzil and Ibrahim, Siti Noorjannah and Mohamed Azmin, Nor Fadhillah and Nik Hashim, Nik Nur Wahidah (2021) Using the short-time fourier transform and ResNet to diagnose depression from speech data. In: 2021 IEEE International Conference on Computing (ICOCO 2021), 17-19 November 2021, Kuala Lumpur. (Unpublished)
Abstract
Depression is a common illness that affects many people, especially with the advent of the COVID-19 pandemic. It often arises when a person has difficulty coping with stressful life events. It can occur at any point in a person's lifespan, and it pervades all aspects of life. Currently, depression diagnoses rely on patient interviews and self-report questionnaires, which depend heavily on the patient's honesty and the subjective judgement of the clinician. In this paper, we investigate the viability of using the Short-Time Fourier Transform (STFT) as a feature descriptor to objectively diagnose depression from speech data. The dataset used in this research is the Audio/Visual Emotion Challenge 2017 (AVEC 2017) dataset. The model is based on a modified ResNet18 architecture and performs binary classification (i.e., depressed or non-depressed). The STFT is computed from the speech signal to generate a mel-spectrogram for training and testing the model. The experiments show that relying solely on the STFT as an input feature yields an F1 score of 74.71% in classifying depression.
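The following is a minimal sketch, not the authors' code, of the pipeline the abstract describes: an STFT-based mel-spectrogram is computed from speech and fed to a ResNet18 modified for single-channel input and two output classes. The parameter values (sample rate, FFT size, hop length, number of mel bands) are illustrative assumptions and are not taken from the paper.

# Sketch of the abstract's pipeline; parameter choices are assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet18

def speech_to_melspectrogram(wav_path, sr=16000, n_fft=1024, hop_length=256, n_mels=128):
    """Load a speech file and return a log-scaled mel-spectrogram (STFT-based)."""
    signal, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)

def build_binary_resnet18():
    """ResNet18 adapted for 1-channel spectrogram input and 2 output classes."""
    model = resnet18(weights=None)
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 2)  # depressed / non-depressed
    return model

if __name__ == "__main__":
    model = build_binary_resnet18()
    # Dummy forward pass with a random "spectrogram" to verify shapes.
    dummy = torch.randn(1, 1, 128, 256)  # (batch, channel, mel bands, frames)
    logits = model(dummy)
    print(logits.shape)  # torch.Size([1, 2])

In practice, the mel-spectrograms would be segmented or padded to a fixed number of frames before batching; those preprocessing details are not specified in the abstract.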