International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2321-2004 | ISSN Print 2321-5526 | Since 2013
Volume 13, Issue 10, October 2025

Machine Learning-Based Plagiarism Detector System

Rakshitha S N, Shreya Sathapathi, Yashvitha J, Dr. Golda Dilip

Abstract: Robust plagiarism-checking tools are essential for maintaining integrity in both academic and professional environments. Existing detection strategies, which typically rely on lexical comparison, struggle to flag sophisticated, machine-aided rephrasing. Addressing this gap requires adaptable Machine Learning (ML) approaches that capture the underlying meaning of text. This research introduces an efficient two-phase ML framework designed to identify heavily paraphrased text. The first phase employs a SentenceTransformer model (all-MiniLM-L6-v2) to generate dense vector embeddings for both the suspect documents and the reference library. These embeddings are stored and searched using FAISS (Facebook AI Similarity Search), enabling fast, large-scale retrieval of potential source candidates. The second phase uses a Longformer-based sequence classifier to perform an in-depth, pairwise contextual analysis between the flagged text and each retrieved candidate before delivering a final verdict. This classifier was chosen because it relaxes the sequence-length constraints of earlier transformer models, enabling analysis of long-form content. The final system, named "CopyShield," is deployed with an accessible user interface built with the Gradio framework. Validation on the challenging jpwahle/machine-paraphrase-dataset yielded F1-scores in the 0.89–0.92 range, indicating that the system can counter contemporary obfuscation methods.
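
The retrieve-then-verify flow described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The names `toy_embed`, `toy_classify`, `retrieve_candidates`, and `check_document`, as well as the `k` and `threshold` values, are hypothetical. The hashed bag-of-words embedder stands in for the all-MiniLM-L6-v2 embeddings, the exact inner-product search stands in for the FAISS index, and the token-overlap scorer stands in for the Longformer classifier, so this sketch captures only the control flow, not semantic matching.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def _tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in embedder: hashed bag-of-words, L2-normalised.

    A real system would call a sentence-embedding model here instead.
    """
    vec = [0.0] * dim
    for token, count in Counter(_tokens(text)).items():
        # Deterministic bucket (the built-in hash() is salted per process).
        bucket = sum(ord(ch) * (i + 1) for i, ch in enumerate(token)) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def retrieve_candidates(
    query: Vector, refs: Sequence[Vector], k: int
) -> List[Tuple[int, float]]:
    """Phase 1: exact inner-product search, returning top-k (index, score).

    On unit-length vectors the inner product equals cosine similarity.
    """
    scores = [
        (i, sum(q * r for q, r in zip(query, ref))) for i, ref in enumerate(refs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


def toy_classify(suspect: str, candidate: str) -> float:
    """Hypothetical stand-in for Phase 2: Jaccard token overlap in [0, 1]."""
    a, b = set(_tokens(suspect)), set(_tokens(candidate))
    return len(a & b) / len(a | b) if a | b else 0.0


def check_document(
    suspect: str,
    library: Sequence[str],
    embed: Callable[[str], Vector] = toy_embed,
    classify: Callable[[str, str], float] = toy_classify,
    k: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[int, float]]:
    """Run both phases; return (library index, score) for flagged sources."""
    ref_vectors = [embed(doc) for doc in library]
    shortlist = retrieve_candidates(embed(suspect), ref_vectors, k)
    # Only the shortlisted candidates reach the costlier pairwise check.
    verdicts = [(i, classify(suspect, library[i])) for i, _ in shortlist]
    return [(i, score) for i, score in verdicts if score >= threshold]
```

The point of the split is cost: the cheap vector search narrows a large reference library to `k` candidates, so the expensive pairwise model runs only a handful of times per suspect document. Because `embed` and `classify` are passed in as callables, the toy stand-ins could be swapped for real models without changing `check_document`.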

Keywords: NLP, ML, Semantic Analysis, Transformer Models, Deep Learning, Longformer Architecture, Plagiarism Checkers, Gradio, FAISS.

How to Cite:

[1] Rakshitha S N, Shreya Sathapathi, Yashvitha J, Dr. Golda Dilip, “Machine Learning-Based Plagiarism Detector System,” International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE), DOI: 10.17148/IJIREEICE.2025.131038

This work is licensed under a Creative Commons Attribution 4.0 International License.