Cybersecurity intelligence through textual data analysis: a framework using machine learning and terrorism datasets

Atoum, Mohammed Salem and Alarood, Ala Abdulsalam and Alsolami, Eesa and Abubakar, Adamu and Al Hwaitat, Ahmad K. and Izzat, Alsmadi (2025) Cybersecurity intelligence through textual data analysis: a framework using machine learning and terrorism datasets. Future Internet, 17 (4). pp. 1-31. ISSN 1999-5903

Official URL: https://www.mdpi.com/1999-5903/17/4/182

Abstract

This study examines multi-lexical data sources, combining a dataset extracted from an open-source corpus with the Global Terrorism Datasets (GTDs), to predict lexical patterns that are directly linked to terrorism. This matters because specific patterns within a textual context can facilitate the identification of terrorism-related content. The methodology first generates a corpus from various published works and extracts the texts pertinent to “terrorism”. It then extracts additional lexical contexts from the GTDs that relate directly to terrorism. Integrating these multi-lexical data sources yields lexical patterns linked to terrorism. Machine learning models were then trained on the resulting dataset. We conducted two primary experiments and analyzed the results. On the open-source data, the Extra Trees model achieved the highest accuracy (94.31%), but the XGBoost model demonstrated superior overall performance after tuning, with higher recall (81.32%) and F1-score (83.06%), indicating a better balance between sensitivity and precision. Similarly, on the GTD data, XGBoost consistently outperformed the other models in recall and F1-score, making it a more suitable candidate for tasks where minimizing false negatives is critical. These results suggest that specific co-occurrences and contexts can be established within the terrorism dataset from multiple lexical data sources, allowing multi-lexical patterns such as “Suicide Attack/Casualty”, “Civilians/Victims”, and “Hostage Taking/Abduction” to be identified effectively across various applications or contexts. This will facilitate the development of a framework for understanding the lexical patterns associated with terrorism.
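
The abstract's contrast between accuracy on the one hand and recall and F1-score on the other can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the paper, and the function name `classification_metrics` is an assumption, not part of the authors' framework. The sketch shows how, on an imbalanced dataset, one model can have slightly higher accuracy while another has higher recall and F1-score.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (NOT from the paper): 100 positive and 900 negative samples.
# Model A is conservative: few false positives, many false negatives.
model_a = classification_metrics(tp=60, fp=5, fn=40, tn=895)
# Model B flags more items: more false positives, fewer false negatives.
model_b = classification_metrics(tp=80, fp=30, fn=20, tn=870)

for name, metrics in (("A", model_a), ("B", model_b)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

With these illustrative counts, model A has the higher accuracy, while model B has the higher recall and F1-score, which is the kind of trade-off the abstract describes when false negatives are the costlier error.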

Item Type:	Article (Journal)
Uncontrolled Keywords:	cyber intelligence, terrorism, machine learning
Subjects:	Q Science > QA Mathematics > QA76 Computer software
Kulliyyahs/Centres/Divisions/Institutes:	Kulliyyah of Information and Communication Technology > Department of Computer Science
Depositing User:	Dr Adamu Abubakar
Date Deposited:	13 Oct 2025 16:50
Last Modified:	13 Oct 2025 16:50
URI:	http://irep.iium.edu.my/id/eprint/123683
