Andalas Saputra, Meitro Hartanto and Pebrianti, Dwi and Bayuaji, Luhur (2025) Optimizing malware detection and prevention on proxy servers through random forest and lexical feature analysis. Indonesian Journal of Computing, Engineering, and Design, 7 (1). pp. 1-15. ISSN 2656-1972 E-ISSN 2656-8179
PDF, Published Version (796kB). Restricted to registered users only.
Abstract
Malware has become a significant concern due to the increase in malicious websites hosting spam, phishing, malware, and other threats. This research aims to predict malware URLs using lexical features for feature extraction and random forest for classification. The dataset, sourced from kaggle.com, includes benign, phishing, spam, malware, and defacement URLs. To address class imbalance, random oversampling was applied so that training used balanced data. Recursive feature elimination was used to optimize the lexical features: feature sets of several sizes (10, 15, 19, 23, 29, 35) were tested for classification accuracy, achieving 98% accuracy with 23 features. Validation tests with actual university network data confirmed the model’s effectiveness, classifying malicious URLs in a set of 11,566 samples in 9 minutes. For URL filtering, log analyzer tools captured internet traffic during working hours over one month. The results suggest that this approach can efficiently classify malicious URLs and could be implemented for real-time detection in proxy server logs, helping IT departments prevent malware from spreading via web traffic.
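The first two steps named in the abstract, lexical feature extraction and random oversampling, can be illustrated with a minimal sketch. The abstract does not list the features the authors used, so the nine features below, and the function names `lexical_features`, `is_ipv4`, and `random_oversample`, are illustrative assumptions rather than the paper's method. The classification and feature-selection steps are omitted here; they could be done, for example, with scikit-learn's `RandomForestClassifier` and `RFE`.

```python
import random
from collections import Counter
from urllib.parse import urlparse


def is_ipv4(host):
    """Return True if host looks like a dotted-quad IPv4 address."""
    parts = host.split(".")
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and int(p) <= 255 for p in parts
    )


def lexical_features(url):
    """Compute a few illustrative features from the URL string alone.

    These features are assumptions for the sketch, not the 23 features
    selected in the paper.
    """
    # urlparse only fills in the host when a scheme is present, so add one.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(is_ipv4(host)),
    }


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical URLs and labels, used only to exercise the functions.
    urls = ["example.com/index.html", "example.org/about",
            "example.net/docs", "http://192.168.0.1/login.php?id=1&x=2"]
    labels = ["benign", "benign", "benign", "malware"]
    balanced_urls, balanced_labels = random_oversample(urls, labels)
    print(Counter(balanced_labels))
    print(lexical_features(balanced_urls[-1]))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the data used to measure accuracy.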
Item Type: | Article (Journal) |
---|---|
Uncontrolled Keywords: | Lexical features, malware detection, proxy server logs, random forest, URL classification. |
Subjects: | T Technology > T Technology (General) |
Kulliyyahs/Centres/Divisions/Institutes: | Kulliyyah of Engineering > Department of Mechanical Engineering |
Depositing User: | Dr Dwi Pebrianti |
Date Deposited: | 16 May 2025 10:43 |
Last Modified: | 16 May 2025 10:43 |
URI: | http://irep.iium.edu.my/id/eprint/121082 |