An intelligent decision support system for software plagiarism detection in academia

Ullah, F; Jabbar, S; Mostarda, L

doi:10.1002/int.22399

Source code plagiarism is an academic offense that undermines students' learning habits. Online services let students hire professional developers to complete their regular programming assignments, which makes plagiarism easier. The proposed approach proceeds in five steps. First, raw source code is cleaned of noisy data to extract the meaningful code, since the actual logic is what matters to programmers. Second, tokenization-based pre-processing converts the filtered code into meaningful tokens, breaking the code into small instances together with their number of occurrences (the frequency). Third, a local and global weighting scheme estimates the significance of each feature within an individual document or a group of documents, which indicates how effective each feature will be in the next phase. Fourth, singular value decomposition (SVD) reduces the dimensionality of these features while maintaining the actual semantics of the source code; this removes noisy information and retains only the features that are most effective for plagiarism detection. Fifth, latent semantic analysis (LSA) mines the actual semantics of the source code in the form of latent variables. The LSA features are then used as input to cosine similarity to compute the degree of plagiarism among different source codes. To validate the proposed approach, we used topic modeling to group the relevant features into different topics.
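
The tokenization, weighting, and cosine-similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the comment-stripping rule, the tokenizer pattern, and the choice of raw term frequency as the local weight with a smoothed inverse document frequency as the global weight are all assumptions, and the function names are hypothetical. The SVD/LSA dimensionality reduction is deliberately left out to keep the example dependency-free, so similarity is computed on the weighted token vectors directly.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumed token pattern: identifiers/keywords, integer literals,
# and single punctuation or operator characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def strip_comments(code: str) -> str:
    """Simplified cleaning step: drop C-style block and line comments.

    This ignores string literals, so it is only a rough stand-in
    for the noise removal described in the abstract.
    """
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", code)


def tokenize(code: str) -> List[str]:
    """Split cleaned source code into tokens."""
    return TOKEN_RE.findall(strip_comments(code))


def weighted_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Weight each token by local weight (term frequency) times
    global weight (smoothed inverse document frequency)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(tok for c in counts for tok in c)
    idf = {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }
    return [{tok: tf * idf[tok] for tok, tf in c.items()} for c in counts]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up submissions, used only for illustration.
submissions = [
    "int sum(int a, int b) { return a + b; } // add",
    "/* my work */ int sum(int a, int b) { return a + b; }",
    "while (x > 0) { x = x / 2; }",
]
vectors = weighted_vectors(submissions)
print(cosine(vectors[0], vectors[1]))  # same code, different comments
print(cosine(vectors[0], vectors[2]))  # unrelated code
```

In a fuller version, the weighted vectors would be stacked into a term-document matrix and reduced with a truncated SVD before the cosine comparison, as the abstract describes.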

Publication date: 2021-01-01

Year of publication: 2021
Journal: INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
DOI: https://dx.doi.org/10.1002/int.22399
loginMiur type ID: 262
Appears in types: 1.1 Article

Files in this record:

File	Size	Format
IntJofIntelligentSys-2021-Ullah-Anintelligentdecisionsupportsystemforsoftwareplagiarismdetectionin.pdf (publisher's version; license: not public, private/restricted access; archive managers only)	1.95 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11581/456452
