IIUM Repository

Time domain speech enhancement with CNN and time-attention transformer

Saleem, Nasir and Gunawan, Teddy Surya and Dhahbi, Sami and Bourouis, Sami (2024) Time domain speech enhancement with CNN and time-attention transformer. Digital Signal Processing, 147. pp. 1-12. ISSN 1051-2004 E-ISSN 1095-4333


Abstract

Time-domain speech enhancement improves the quality and intelligibility of noisy speech by processing the waveform directly, without explicit feature extraction or domain transformation. Deep learning is a powerful approach to this task and offers significant improvements over traditional techniques. Designing a resource-efficient deep neural model in the time domain that still preserves the contextual information and detailed features of the input speech remains a key challenge. To address it, this study proposes a speech enhancement model that uses 1D time-domain dilated residual blocks in a convolutional encoder-decoder framework. The study further integrates a time-attention transformer (TAT) bottleneck between the encoder and the decoder. The TAT extends the transformer architecture with a time-attention mechanism that lets the model selectively attend to different segments of the speech signal over time. This allows the model to capture long-term dependencies in the speech and to learn which features are important. The experimental results indicate that the proposed model outperforms recent deep neural networks (DNNs) and substantially improves the intelligibility and quality of noisy speech. On the WSJ0 SI-84 database, the proposed model improves STOI by 21.51% and PESQ by 1.14 over unprocessed noisy speech.
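The two building blocks named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the kernel weights, dilation rates, ReLU nonlinearity, and the plain scaled dot-product self-attention used here are assumptions chosen for illustration, since the abstract does not give the layer configuration or the exact form of the TAT. All function names are hypothetical.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """Zero-padded ('same' length) 1D dilated convolution over a list of floats."""
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):            # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out

def residual_block(x, kernel, dilation):
    """Skip connection around a dilated convolution: y = x + relu(conv(x)).
    The ReLU is an assumption; the abstract does not name the nonlinearity."""
    h = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, hi) for xi, hi in zip(x, h)]

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def time_self_attention(frames):
    """Plain scaled dot-product self-attention across time steps.
    `frames` is a list of equal-length feature vectors, one per time step.
    This stands in for the paper's time-attention mechanism, whose exact
    formulation is not described in the abstract."""
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        m = max(scores)                       # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]       # softmax over time steps
        out.append([sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)])
    return out

if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2))
    print(residual_block(impulse, [1.0, 1.0, 1.0], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
    print(time_self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Stacking blocks with growing dilation widens the receptive field quickly without adding parameters per layer, which is one common reason dilated convolutions are used in waveform models; the attention step then lets every time step weigh every other one, which is how transformers reach long-range context.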

Item Type: Article (Journal)
Additional Information: The first author is my postdoctoral student. This is a collaboration among four international universities.
Uncontrolled Keywords: Time-domain speech enhancement, Convolutional encoder-decoder, Transformer, Time attention
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800 Electronics. Computer engineering. Computer hardware. Photoelectronic devices > TK7885 Computer engineering
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Engineering > Department of Electrical and Computer Engineering
Kulliyyah of Engineering
Depositing User: Prof. Dr. Teddy Surya Gunawan
Date Deposited: 06 Mar 2024 09:32
Last Modified: 16 Mar 2024 08:48
URI: http://irep.iium.edu.my/id/eprint/111097
