On retrieving intelligently plagiarized documents using semantic similarity

Syed Fawad Hussain*, Asif Suryani

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)


Plagiarism in text documents can be committed in many ways. The most common form of plagiarizing a text document is to copy a chunk of text and alter it intelligently, thereby making it look original. Such cases are hard to detect because they require semantic analysis of the document. External sources of knowledge such as WordNet have been employed to help detect such cases. However, such an approach may often miss the contextual significance of the words used, and it may also suffer from the issues of synonymy and polysemy. We propose an architecture built on a semantic similarity measure that exploits the semantic similarity of words as mined from within the data corpus, thereby using localized contextual information. In this work, we propose an approach for detecting plagiarism in text documents that uses this semantic similarity measure with a Nearest Neighbor (NN) search, and uses a semantic kernel in a multiclass support vector machine (SVM). We test our approach on a plagiarism dataset developed specifically to test the efficacy of the solution with varying levels of plagiarism. The results are compared with those of the well-known commercial software Turnitin®, which has access to a large database. Our experiments suggest that semantic kernels can help detect plagiarism and can outperform available techniques.
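The abstract describes the pipeline only at a high level. As an illustration, and not the authors' implementation, the following stdlib-only Python sketch shows the general idea of a semantic kernel built from word similarities mined from the corpus itself, used for nearest-neighbour retrieval of a likely source document. Assumptions: tokenization is a naive whitespace split, word similarity is simply the cosine between columns of the term-document matrix (a stand-in, not the measure proposed in the paper), and every function name here is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase + whitespace split.
    return text.lower().split()


def build_vocab(corpus):
    # Map each word seen in the source corpus to a column index.
    vocab = sorted({w for doc in corpus for w in tokenize(doc)})
    return {w: i for i, w in enumerate(vocab)}


def tf_vector(text, index):
    # Term-frequency vector; words outside the corpus vocabulary are ignored.
    vec = [0.0] * len(index)
    for w, c in Counter(tokenize(text)).items():
        if w in index:
            vec[index[w]] = float(c)
    return vec


def word_similarity(doc_vectors, n_words):
    # Stand-in corpus-mined measure: each word is represented by its column of
    # the term-document matrix, and S[i][j] is the cosine between columns, so
    # words that occur in the same documents get a high similarity.
    cols = [[doc[j] for doc in doc_vectors] for j in range(n_words)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    S = [[0.0] * n_words for _ in range(n_words)]
    for i in range(n_words):
        for j in range(n_words):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S


def semantic_kernel(x, y, S):
    # k(x, y) = x^T S y. With S = N^T N (N = column-normalized term-document
    # matrix) this equals (N x) . (N y), so k is a valid (PSD) kernel.
    n = len(x)
    return sum(x[i] * S[i][j] * y[j]
               for i in range(n) for j in range(n) if x[i] and y[j])


def normalized_kernel(x, y, S):
    # Cosine-style normalization so that scores lie in [0, 1].
    kxx = semantic_kernel(x, x, S)
    kyy = semantic_kernel(y, y, S)
    if kxx == 0 or kyy == 0:
        return 0.0
    return semantic_kernel(x, y, S) / math.sqrt(kxx * kyy)


def nearest_source(suspicious, sources):
    # 1-NN search: return the index of the most similar source and all scores.
    index = build_vocab(sources)
    vectors = [tf_vector(d, index) for d in sources]
    S = word_similarity(vectors, len(index))
    q = tf_vector(suspicious, index)
    scores = [normalized_kernel(q, v, S) for v in vectors]
    best = max(range(len(sources)), key=lambda i: scores[i])
    return best, scores


if __name__ == "__main__":
    sources = ["car engine repair", "automobile engine repair", "pasta tomato sauce"]
    best, scores = nearest_source("automobile", sources)
    print(best)  # prints 1
    # scores[0] > 0 even though source 0 never contains "automobile": the
    # mined similarity of "automobile" to "engine" and "repair" carries over.
    print(scores)
```

Because S = N^T N, the kernel is positive semidefinite, so the same Gram matrix could in principle be passed to a kernel SVM (for example scikit-learn's `SVC(kernel="precomputed")`) for the multiclass classification step mentioned in the abstract; that step is not shown here.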

Original language: English
Pages (from-to): 246-258
Number of pages: 13
Journal: Engineering Applications of Artificial Intelligence
Publication status: Published - 1 Oct 2015

Bibliographical note

Publisher Copyright:
© 2015 Elsevier Ltd. All rights reserved.


Keywords

  • Information retrieval
  • Plagiarism detection
  • Semantic similarity
  • Support Vector Machine

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering
