Revision as of 04:06, 11 November 2022 by Fabrickator: Undid revision 1121151777 by AManWithNoPlan: restore valid ref added in edit of 17:59, 9 August 2019 but erroneously altered in edit of 19:06, 18 March 2021 and subsequently deleted in edit of 19:52, 10 November 2022
Revision as of 18:00, 25 November 2022 by Citation bot: Add: arxiv, citeseerx. \| Use this bot. Report bugs. \| Suggested by BorgQueen \| Category:All articles needing examples \| #UCB_Category 405/701
Line 2:
{{Use dmy dates\|date=May 2017}}
{{cleanup\|date=December 2010}}
'''Plagiarism detection''' or '''content similarity detection''' is the process of locating instances of [[plagiarism]] or [[copyright infringement]] within a work or document. The widespread use of computers and the advent of the Internet have made it easier to plagiarize the work of others.<ref>{{Cite web \|title=Plagiarism, prevention, deterrence and detection \|url=https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.178&rep=rep1&type=pdf \|url-status=dead \|access-date=2022-11-11\|archive-url=https://web.archive.org/web/20210418111409/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.178&rep=rep1&type=pdf \|archive-date=18 April 2021 \|last1=Culwin \|first1=Fintan \|last2=Lancaster \|first2=Thomas \|year=2001 \| citeseerx=10.1.1.107.178 \|via=[[Advance HE\|The Higher Education Academy]]}}{{cbignore}}</ref><ref name=":0">Bretag, T., & Mahmud, S. (2009). A model for determining student plagiarism: Electronic detection and academic judgement. ''Journal of University Teaching & Learning Practice, 6''(1). Retrieved from <nowiki>http://ro.uow.edu.au/jutlp/vol6/iss1/6</nowiki></ref>

Detection of plagiarism can be undertaken in a variety of ways. Detection by a human reader is the most traditional way of identifying plagiarism in written work. This can be a time-consuming task for the reader<ref name=":0" /> and can also result in inconsistencies in how plagiarism is identified within an organization.<ref>Macdonald, R., & Carroll, J. (2006). Plagiarism—a complex issue requiring a holistic institutional approach. ''Assessment & Evaluation in Higher Education, 31''(2), 233–245. 
{{doi\|10.1080/02602930500262536}}</ref>

Text-matching software (TMS), also referred to as "plagiarism detection software" or "anti-plagiarism" software, has become widely available, both as commercial products and as open-source{{Example needed\|s\|date=November 2020}} software. TMS does not detect plagiarism per se; it finds specific passages of text in one document that match text in another document.

Line 42:
=====Neural networks=====
More recent approaches to assessing content similarity using [[neural networks]] have achieved significantly greater accuracy, but at great computational cost.<ref>{{Cite arXiv\|title=Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks \|eprint=1908.10084 \|last1=Reimers \|first1=Nils \|last2=Gurevych \|first2=Iryna \|year=2019 \|class=cs.CL }}</ref> Traditional neural network approaches embed both pieces of content as semantic vectors and then calculate the similarity of those vectors, often as their cosine similarity. 
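
The embedding comparison described above can be illustrated with a minimal sketch. It assumes the two documents have already been converted to fixed-length vectors; the four-dimensional vectors below are invented for illustration and do not come from any real model, and the function name <code>cosine_similarity</code> is likewise only a label chosen here. In practice the vectors would be produced by an embedding model, such as the Sentence-BERT approach cited above, and would have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
doc_a = [0.9, 0.1, 0.3, 0.0]
doc_b = [0.8, 0.2, 0.4, 0.1]  # points in nearly the same direction as doc_a
doc_c = [0.0, 0.9, 0.0, 0.7]  # points in a mostly different direction

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

A score near 1 indicates vectors pointing in almost the same direction, which is usually read as similar content, while a score near 0 indicates little similarity. Any threshold for flagging a pair of documents is an application-specific choice and is not specified by the sources cited here.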
More advanced methods perform end-to-end prediction of similarity or classification using the [[Transformer (machine learning model)\|Transformer]] architecture.<ref>{{Cite journal \|last1=Lan \|first1=Wuwei \|last2=Xu \|first2=Wei \|date=2018 \|title=Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering \|url=https://aclanthology.org/C18-1328 \|journal=Proceedings of the 27th International Conference on Computational Linguistics \|location=Santa Fe, New Mexico, USA \|publisher=Association for Computational Linguistics \|pages=3890–3902}}</ref><ref>{{Citation \|last1=Wahle \|first1=Jan Philip \|title=Identifying Machine-Paraphrased Plagiarism \|date=2022 \|url=https://link.springer.com/10.1007/978-3-030-96957-8_34 \|work=Information for a Better World: Shaping the Global Future \|volume=13192 \|pages=393–413 \|editor-last=Smits \|editor-first=Malte \|place=Cham \|publisher=Springer International Publishing \|language=en \|doi=10.1007/978-3-030-96957-8_34 \|isbn=978-3-030-96956-1 \|access-date=2022-10-06 \|last2=Ruas \|first2=Terry \|last3=Foltýnek \|first3=Tomáš \|last4=Meuschke \|first4=Norman \|last5=Gipp \|first5=Bela\|arxiv=2103.11909 \|s2cid=232307572 }}</ref> [[Paraphrasing (computational linguistics)\|Paraphrase detection]] in particular benefits from highly parameterized pre-trained models.
====Performance====

Content similarity detection: Difference between revisions