IIUM Repository

The development of an integrated corpus for Malay language

Awang Abu Bakar, Normi Sham (2020) The development of an integrated corpus for Malay language. In: Sixth International Conference on Computational Science and Technology 2019 (ICCST2019), 29-30 Aug 2019, Kota Kinabalu.

Abstract

A corpus serves as a source of data for many types of research, and a number of Malay corpora are being developed to support researchers' needs. However, these corpora are scattered and not integrated, so some words are missing from some of them. This paper focuses on developing an integrated corpus that combines the four most comprehensive Malay corpora. The aim is to provide comprehensive coverage of the Malay corpora, which would benefit any related work.

Item Type: Conference or Workshop Item (Plenary Papers)
Additional Information: 3509/75243
Subjects: T Technology > T Technology (General)
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Information and Communication Technology > Department of Computer Science
Depositing User: Dr. Normi Sham Awang Abu Bakar
Date Deposited: 24 Oct 2019 15:56
Last Modified: 24 Oct 2019 15:57
URI: http://irep.iium.edu.my/id/eprint/75243
