IIUM Repository

Source code plagiarism detection using siamese BLSTM network and embedding models

Manahi, Mohammed and Sulaiman, Suriani and Awang Abu Bakar, Normi Sham (2022) Source code plagiarism detection using siamese BLSTM network and embedding models. In: Proceedings of the 8th International Conference on Computational Science and Technology. Lecture Notes in Electrical Engineering, 835. Springer Singapore, Singapore, pp. 397-409. ISBN 978-981-16-8514-9


Abstract

Source code plagiarism is a severe ongoing problem that threatens academic integrity and intellectual property rights. Students in computing disciplines commit plagiarism through diverse channels, with direct in-class plagiarism being the most common. Programming instructors struggle to inspect large volumes of submissions for plagiarism manually. Consequently, many automated detection approaches have been proposed to avoid lengthy manual inspection. In this article, we present a deep learning framework that leverages a Siamese BLSTM network and character-based embeddings to detect source code plagiarism. The goal of this research is to determine which character-based embedding architecture produces the most accurate plagiarism detection scores. The proposed framework uses Word2Vec and fastText models to obtain various pre-trained source code embedding sequences as input to the network. We then utilise the Manhattan distance to measure the plagiarism score between the two outputs produced by the network. To the best of our knowledge, this is the first research work to utilise various embedding models for source code plagiarism detection. Experimental results showed that the embeddings from the Word2Vec Skip-Gram with Negative Sampling (W2V-SGNS) architecture produce the most accurate detection scores.
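The scoring step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The character tokenisation rule, the function names, and the exp(-L1) mapping (borrowed from the MaLSTM formulation for Siamese LSTMs) are all assumptions. The two vectors are hand-written stand-ins for the BLSTM outputs and are not produced by a trained network.

```python
import math
from typing import List, Sequence


def char_tokens(source: str) -> List[str]:
    """Split source code into character tokens, dropping whitespace.

    Hypothetical preprocessing for a character-based embedding model;
    the chapter's exact tokenisation is not described in the abstract.
    """
    return [ch for ch in source if not ch.isspace()]


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_score(h_left: Sequence[float], h_right: Sequence[float]) -> float:
    """Map the L1 distance into (0, 1]; identical encodings score 1.0.

    The exp(-L1) mapping follows MaLSTM and is an assumption here,
    not a confirmed detail of the chapter's scoring function.
    """
    return math.exp(-manhattan_distance(h_left, h_right))


if __name__ == "__main__":
    print(char_tokens("x = 1;"))  # ['x', '=', '1', ';']
    # Hypothetical stand-ins for the final BLSTM states of two submissions.
    h1 = [0.2, 0.5, 0.1]
    h2 = [0.2, 0.4, 0.3]
    print(similarity_score(h1, h2))
```

Because exp(-d) equals 1 for identical encodings and decays towards 0 as the L1 distance grows, the result can be read directly as a similarity in (0, 1]. Choosing a threshold above which a pair is flagged as plagiarised would be a further, separate decision.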

Item Type: Book Chapter
Uncontrolled Keywords: Source code plagiarism detection, Source code embeddings, Siamese LSTM network, Programming language processing, Deep learning, Code similarity
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Kulliyyahs/Centres/Divisions/Institutes: Kulliyyah of Information and Communication Technology > Department of Computer Science
Depositing User: Dr. Suriani Sulaiman
Date Deposited: 21 Apr 2022 10:38
Last Modified: 21 Apr 2022 10:38
URI: http://irep.iium.edu.my/id/eprint/97680
