Evaluation of the SHAPD2 Algorithm Efficiency in Plagiarism Detection Task Using PAN Plagiarism Corpus


Dariusz Ceglarek, Poznan School of Banking, Poland


This work presents results of novel ongoing research in natural language processing, focusing on plagiarism detection, semantic networks, and semantic compression. The results demonstrate that semantic compression is a valuable addition to existing methods used in plagiarism detection. Applying semantic compression boosts the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection 2 (SHAPD2) and of the authors’ implementation of the w-shingling algorithm. Experiments were performed on the Clough & Stephenson corpus as well as on the available PAN-PC-10 plagiarism corpus, which is used to evaluate plagiarism detection methods, so the results can be compared with those of other research teams.
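For readers unfamiliar with w-shingling, the following is a minimal sketch of the general technique, not the authors’ implementation. It assumes whitespace tokenization, lowercasing, and Jaccard resemblance over sets of w-word shingles; the function names `shingles` and `resemblance` and the default `w=3` are illustrative choices only.

```python
def shingles(text, w=3):
    """Return the set of contiguous w-word sequences (shingles) in text."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    if len(tokens) < w:
        # Text shorter than w words: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=3):
    """Jaccard resemblance of two texts over their w-shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 0.0  # assumption: two empty texts count as not resembling
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(resemblance("the quick brown fox", "the quick brown dog"))
```

A resemblance close to 1 indicates heavily overlapping word sequences, while a value close to 0 indicates little shared phrasing. Under this sketch, semantic compression would act as a preprocessing step that maps words to more general concepts before the shingles are built.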


Plagiarism detection, Longest common subsequence, Semantic compression, Sentence hashing, w-shingling, Intellectual property protection

Volume 3, Number 3