IIUM Repository

Theme identification using machine learning techniques

Jayady, Siti Hajar and Antong, Hasmawati (2021) Theme identification using machine learning techniques. Journal of Integrated and Advanced Engineering (JIAE), 1 (2). pp. 123-134. ISSN 2774-602X E-ISSN 2774-6038

Abstract

With the abundance of online research platforms, much information presented in PDF files, such as articles and journals, can be obtained easily. As a result, students completing research projects often accumulate many downloaded PDF articles on their laptops. However, manually identifying the target articles within such a collection is tiring, as most articles consist of several pages that need to be analyzed. Reading each article to determine whether it relates to a theme, and then organizing the articles by theme, consumes time and energy. To address this problem, a PDF file organizer that implements a theme identifier is necessary. This work therefore focuses on automatic text classification using machine learning methods to build a theme identifier, employed in the PDF file organizer, that classifies articles into two themes: augmented reality and machine learning. A total of 1000 text documents covering both themes were used to build the classification model. A preprocessing step was performed for data cleaning, followed by TF-IDF feature extraction for text vectorization and to reduce vector sparsity. 80% of the dataset was used for training, and the remaining 20% was used to validate the trained models. The classification models proposed in this work are Linear SVM and Multinomial Naïve Bayes. The accuracy of the models was evaluated using a confusion matrix. For the Linear SVM model, grid-search optimization was performed to determine the optimal value of the cost parameter.
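
The pipeline summarized in the abstract (cleaning, TF-IDF vectorization, an 80/20 split, a Multinomial Naïve Bayes classifier, and a confusion matrix) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the toy corpus, the word lists, the smoothing constants, and every function name below are assumptions made for illustration, and the Linear SVM branch with its grid search is not shown.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical vocabularies used only to generate a toy two-theme corpus.
AR_WORDS = ["augmented", "reality", "headset", "overlay", "tracking", "marker", "display"]
ML_WORDS = ["machine", "learning", "classifier", "training", "gradient", "feature", "model"]
COMMON = ["study", "results", "method", "paper"]

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a stand-in for data cleaning)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs):
    """Smoothed inverse document frequency, computed from training documents only."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Map a tokenized document to {term: tf * idf}; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing alpha."""
    vocab = {t for v in vectors for t in v}
    weights = {y: defaultdict(float) for y in set(labels)}
    for v, y in zip(vectors, labels):
        for t, w in v.items():
            weights[y][t] += w
    model = {}
    for y, tw in weights.items():
        denom = sum(tw.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(y) / len(labels))
        log_like = {t: math.log((tw.get(t, 0.0) + alpha) / denom) for t in vocab}
        model[y] = (log_prior, log_like)
    return model

def predict(model, vector):
    """Return the label with the highest log posterior score."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(w * log_like[t] for t, w in vector.items())
    return max(sorted(model), key=score)

def make_corpus(n_per_class=50, seed=0):
    """Generate synthetic labelled documents (assumed data, not the paper's)."""
    rng = random.Random(seed)
    corpus = []
    for label, words in (("AR", AR_WORDS), ("ML", ML_WORDS)):
        for _ in range(n_per_class):
            text = " ".join(rng.choices(words, k=8) + rng.choices(COMMON, k=4))
            corpus.append((text, label))
    rng.shuffle(corpus)
    return corpus

def run(seed=0):
    corpus = make_corpus(seed=seed)
    split = int(0.8 * len(corpus))  # 80% training, 20% validation
    train, test = corpus[:split], corpus[split:]
    train_docs = [tokenize(text) for text, _ in train]
    idf = fit_idf(train_docs)
    model = train_nb([tfidf(d, idf) for d in train_docs], [y for _, y in train])
    confusion = Counter()  # keys are (actual, predicted) pairs
    for text, actual in test:
        confusion[(actual, predict(model, tfidf(tokenize(text), idf)))] += 1
    correct = sum(c for (a, p), c in confusion.items() if a == p)
    return confusion, correct / len(test)

confusion, accuracy = run()
print(dict(confusion), accuracy)
```

Fitting the IDF weights on the training portion only keeps the validation documents unseen during training, which is a common precaution rather than something the abstract states.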

Item Type: Article (Journal)
Uncontrolled Keywords: Multinomial Naive Bayes, portable document file, pre-processing, term frequency-inverse document frequency (TF-IDF)
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800 Electronics. Computer engineering. Computer hardware. Photoelectronic devices > TK7885 Computer engineering
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Engineering
Kulliyyah of Engineering > Department of Mechatronics Engineering
Depositing User: Dr Hasmawati Antong
Date Deposited: 31 Mar 2023 12:16
Last Modified: 31 Mar 2023 12:16
URI: http://irep.iium.edu.my/id/eprint/104247
