IIUM Repository

Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation

Shah, Asadullah and Saidin, Aznan Zuhid and Alshaikhli, Imad Fakhri Taha and Zeki, Akram M. (2011) Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation. Procedia - Social and Behavioral Sciences, 27. pp. 233-240. ISSN 18770428

Abstract

Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e., A-Z), identical to that of English. The written language uses this character set as building blocks to form words, phrases, and sentences, along with punctuation marks and special signs, to create documents. This paper presents the results of a preliminary investigation of Malay text documents. For this purpose, articles written in Malay on various topics were scanned, amounting to approximately 31 thousand characters. Preliminary observations indicate that, on average, the character "A" accounts for 19% of the text, "N" for 10%, "E" for 9%, and "I" for 8%. These are the characters with the highest frequencies of occurrence in the whole set, and they are expected to remain the most frequent characters under further investigation. Furthermore, the results indicate that character frequencies in Bahasa Melayu text are very close to those of Bahasa Indonesia but differ from those of English. The investigation also indicates that Bahasa Melayu and Bahasa Indonesia share a close phonetic structure with each other but not with English, even though all three languages use the same character set.
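The absolute and relative character frequencies mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not describe. The function name `letter_frequencies`, the sample sentence, and the choices to fold case and to count only the letters A-Z (ignoring spaces, digits, and punctuation) are all assumptions made for illustration.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return (absolute, relative) frequencies of the letters A-Z in text.

    Assumption: case is folded to upper case and only A-Z are counted;
    spaces, digits, and punctuation are ignored.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    absolute = Counter(letters)          # absolute frequency: raw counts
    total = len(letters)
    # Relative frequency: each count divided by the number of letters seen.
    relative = {ch: n / total for ch, n in absolute.items()} if total else {}
    return absolute, relative


# Hypothetical sample sentence, not taken from the paper's corpus.
sample = "Saya suka makan nasi lemak"
absolute, relative = letter_frequencies(sample)

# Print the four most frequent letters with their share of all letters.
for ch, n in absolute.most_common(4):
    print(f"{ch}: {n} ({relative[ch]:.1%})")
```

Running the same function over a larger collection of articles and comparing the resulting percentages across languages would mirror the kind of comparison the abstract reports, under the counting assumptions stated above.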

Item Type: Article (Journal)
Additional Information: 6566/11804
Uncontrolled Keywords: Bahasa Melayu ; Bahasa Indonesia ; English language ; absolute frequencies ; relative frequencies ; running average
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Information and Communication Technology > Department of Computer Science
Kulliyyah of Information and Communication Technology > Department of Information System
Depositing User: Asadullah Shah Syed
Date Deposited: 20 Dec 2011 12:06
Last Modified: 08 Dec 2014 15:50
URI: http://irep.iium.edu.my/id/eprint/11804
