IIUM Repository

Phonetically rich and balanced Arabic speech corpus: An overview

Abushariah, Mohammad A. M. and Ainon, Raja N. and Zainuddin, Roziati and Khalifa, Othman Omran and Elshafei, Moustafa (2010) Phonetically rich and balanced Arabic speech corpus: An overview. In: International Conference on Computer and Communication Engineering (ICCCE 2010), 11-13 May 2010, Kuala Lumpur.



Lack of spoken and written training data is one of the main issues encountered by Arabic automatic speech recognition (ASR) researchers. Most written and spoken corpora are not readily available to the public, and many can only be obtained by purchase from the Linguistic Data Consortium (LDC) or the European Language Resources Association (ELRA). The shortage is more severe for spoken than for written training data, so more speech corpora are needed to serve the different domains of Arabic ASR. The available spoken corpora were mainly collected from broadcast news (radio and television) and telephone conversations, and have certain technical and quality shortcomings. Producing a robust, speaker-independent, continuous Arabic speech recognizer requires a set of speech recordings that is both rich and balanced. It is rich in the sense that it contains all the phonemes of the Arabic language, and balanced in the sense that it preserves the phonetic distribution of the Arabic language. This set of speech recordings must be based on a proper written set of sentences and phrases created by experts. It is therefore crucial to create a high-quality written (text) set of sentences and phrases before recording them. This work adds a new kind of speech data for Arabic text-based and speech-based applications, alongside existing kinds such as broadcast news and telephone conversations. It is thus an invitation to all Arabic ASR developers and research groups to explore and capitalize on this data.
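The two properties named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the phoneme inventory argument, the reference distribution, and the use of total variation distance as a balance measure are all assumptions made for illustration, and each sentence is assumed to be already transcribed into a list of phoneme symbols.

```python
from collections import Counter


def phoneme_counts(transcriptions):
    """Count phoneme occurrences across a list of phoneme sequences."""
    counts = Counter()
    for phonemes in transcriptions:
        counts.update(phonemes)
    return counts


def is_rich(transcriptions, inventory):
    """Rich (as assumed here): every phoneme in the inventory appears at least once."""
    return set(inventory) <= set(phoneme_counts(transcriptions))


def balance_gap(transcriptions, reference):
    """Balance measure (an assumption, not from the paper): total variation
    distance between the sentence set's phoneme distribution and a reference
    distribution. 0 means identical, 1 means no overlap."""
    counts = phoneme_counts(transcriptions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no phonemes to compare")
    symbols = set(counts) | set(reference)
    return 0.5 * sum(
        abs(counts[s] / total - reference.get(s, 0.0)) for s in symbols
    )


if __name__ == "__main__":
    # Toy placeholder symbols and frequencies, not real Arabic phoneme data.
    inventory = ["a", "b", "t", "k"]
    reference = {"a": 0.4, "t": 0.3, "b": 0.15, "k": 0.15}
    sentences = [["b", "a", "t"], ["k", "a", "t", "a"]]
    print("rich:", is_rich(sentences, inventory))
    print("balance gap:", balance_gap(sentences, reference))
```

A candidate sentence set would pass such a check when `is_rich` returns true and `balance_gap` falls below a threshold chosen by the corpus designers; the threshold itself is not specified in the abstract.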

Item Type: Conference or Workshop Item (Full Paper)
Additional Information: 4119/5883 ISBN: 978-1-4244-6233-9
Subjects: T Technology > T Technology (General)
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Engineering > Department of Electrical and Computer Engineering
Depositing User: Prof. Dr Othman O. Khalifa
Date Deposited: 13 Nov 2011 17:00
Last Modified: 22 Nov 2011 06:36
URI: http://irep.iium.edu.my/id/eprint/5883
