Cross-media fake content detection via independent deep learning classifiers

Samsul Kamal, Iqbal Najihah and Samsudin, Anna Safiya and Hassan, Raini (2026) Cross-media fake content detection via independent deep learning classifiers. International Journal on Perceptive and Cognitive Computing, 12 (1). pp. 65-73. E-ISSN 2462-229X

Official URL: https://journals.iium.edu.my/kict/index.php/IJPCC/...

Abstract

The rapid advancement of generative models has enabled the creation of highly realistic fake multimedia content, including altered images, deepfake videos, and synthetic audio. These forgeries undermine information integrity and pose significant societal risks, particularly by enabling misinformation, digital fraud, and impersonation. Because these threats directly affect public trust and institutional transparency, they challenge the goals outlined in SDG 16: Peace, Justice, and Strong Institutions, which focuses on reducing corruption, preserving information integrity, and ensuring accountable, trustworthy systems. To address these issues, this paper proposes a deep learning-based system that classifies multimedia content across three modalities: image, video, and audio. Unlike conventional multimodal fusion approaches, which require paired data inputs, this paper introduces a novel routing-based unification architecture. The proposed framework uses a content-adaptive routing mechanism that treats each modality independently. The system automatically determines the type of the input file and sends it to the relevant specialized deep learning classifier: a dual-backbone Swin Transformer and EfficientNet for images, a Video Swin Transformer for video, and Wav2Vec 2.0 for audio. This design yields a versatile, single-entry-point forensic tool that maintains high accuracy by leveraging domain-specific experts without the computational overhead of processing multiple streams concurrently. Experimental results demonstrate strong performance across individual modalities, with the audio model achieving 96.95% accuracy and the image model showing robust precision despite the challenges posed by high-quality generative forgeries.
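
The routing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the input type is determined, so the sketch assumes it can be guessed from the filename extension with the standard-library mimetypes module. The function names (classify_image, classify_video, classify_audio, detect_modality, route) are hypothetical placeholders, and the three classifiers return fixed strings where real models would be loaded and run.

```python
import mimetypes
from typing import Callable, Dict


# Hypothetical stand-ins for the specialized classifiers. Each takes a file
# path and returns a label; a real system would run a trained model here.
def classify_image(path: str) -> str:
    return "image-expert"


def classify_video(path: str) -> str:
    return "video-expert"


def classify_audio(path: str) -> str:
    return "audio-expert"


# One entry point, three independent experts: the top-level MIME type
# selects which classifier handles the file.
ROUTES: Dict[str, Callable[[str], str]] = {
    "image": classify_image,
    "video": classify_video,
    "audio": classify_audio,
}


def detect_modality(path: str) -> str:
    """Guess the modality from the filename extension.

    Assumption: extension-based detection is used only for illustration;
    the source does not describe the actual detection method.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot determine media type of {path!r}")
    modality = mime.split("/", 1)[0]
    if modality not in ROUTES:
        raise ValueError(f"unsupported media type {mime!r} for {path!r}")
    return modality


def route(path: str) -> str:
    """Send the file to the single classifier that matches its modality."""
    return ROUTES[detect_modality(path)](path)


if __name__ == "__main__":
    for name in ("photo.jpg", "clip.mp4", "voice.wav"):
        print(name, "->", route(name))
```

Because only one classifier runs per input, the cost of a prediction is that of a single expert, which is the property the abstract contrasts with fusion approaches that process several streams at once.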

Item Type:	Article (Journal)
Uncontrolled Keywords:	Deep Learning, Data Science, Multimedia Forensics, Swin Transformer, Wav2Vec 2.0, Machine Learning
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Kulliyyahs/Centres/Divisions/Institutes:	Kulliyyah of Information and Communication Technology > Department of Computer Science
Depositing User:	Dr. Raini Hassan
Date Deposited:	10 Feb 2026 15:33
Last Modified:	10 Feb 2026 15:33
Queue Number:	2026-02-Q2105
URI:	http://irep.iium.edu.my/id/eprint/127383
