Thesis (MSc)--Stellenbosch University, 2018.
| Main Author: | Thom, Jacobus Daniel |
|---|---|
| Other Authors: | Van der Merwe, A. B. |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2018 |
| Subjects: | Text embeddings; Plagiarism -- Detection; Tree kernels; Syntactic structures; Semantic structures |
| _version_ | 1867613774666530816 |
|---|---|
| access_status_str | Open Access |
| author | Thom, Jacobus Daniel |
| author2 | Van der Merwe, A. B. |
| author_browse | Thom, Jacobus Daniel Van der Merwe, A. B. |
| author_facet | Van der Merwe, A. B. Thom, Jacobus Daniel |
| author_sort | Thom, Jacobus Daniel |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MSc)--Stellenbosch University, 2018. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/103550 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:41:29.531Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2018 |
| publishDateRange | 2018 |
| publishDateSort | 2018 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/103550 Combining tree kernels and text embeddings for plagiarism detection Thom, Jacobus Daniel Van der Merwe, A. B. Kroon, R. S. (Steve) Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science) Text embeddings Plagiarism -- Detection Tree kernels Syntactic structures Semantic structures Thesis (MSc)--Stellenbosch University, 2018. ENGLISH ABSTRACT : The internet allows for vast amounts of information to be accessed with ease. Consequently, it becomes much easier to plagiarize any of this information as well. Most plagiarism detection techniques rely on n-grams to find similarities between suspicious documents and possible sources. N-grams, due to their simplicity, do not make full use of all the syntactic and semantic information contained in sentences. We therefore investigated two methods, namely tree kernels applied to the parse trees of sentences and text embeddings, to utilize more syntactic and semantic information respectively. A plagiarism detector was developed using these techniques and its effectiveness was tested on the PAN 2009 and 2011 external plagiarism corpora. The detector achieved results that were on par with the state of the art for both PAN 2009 and PAN 2011. This indicates that the combination of tree kernel and text embedding techniques is a viable method of plagiarism detection. AFRIKAANSE OPSOMMING : Die internet laat mens toe om groot hoeveelhede inligting maklik in die hande te kry. Gevolglik word dit ook baie makliker om plagiaat op enige van hierdie inligting te pleeg. Meeste plagiaatopsporingstegnieke maak staat op n-gramme om ooreenkomste tussen verdagte dokumente en moontlike bronne op te spoor. Aangesien n-gramme taamlik eenvoudig is, maak hulle nie volle gebruik van al die sintaktiese en semantiese inligting wat sinne bevat nie. Ons ondersoek dus twee metodes, naamlik boomkernfunksies, wat toegepas word op die ontledingsbome van sinne, en teksinbeddings, om onderskeidelik meer sintaktiese en semantiese inligting te gebruik. 'n Plagiaatdetektor is ontwikkel met behulp van hierdie twee tegnieke en die effektiwiteit daarvan is getoets op die PAN 2009 en 2011 eksterne plagiaatkorpora. Die detektor het resultate behaal wat vergelykbaar was met die beste vir beide PAN 2009 en PAN 2011. Dit dui aan dat die kombinasie van boomkern- en teksinbeddingstegnieke 'n redelike metode van plagiaatopsporing is. 2018-02-20T18:20:45Z 2018-04-09T07:00:11Z 2018-02-20T18:20:45Z 2018-04-09T07:00:11Z 2018-03 Thesis http://hdl.handle.net/10019.1/103550 en_ZA Stellenbosch University xii, 73 pages : illustrations (some colour) application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Text embeddings Plagiarism -- Detection Tree kernels Syntactic structures Semantic structures Thom, Jacobus Daniel Combining tree kernels and text embeddings for plagiarism detection |
| title | Combining tree kernels and text embeddings for plagiarism detection |
| title_full | Combining tree kernels and text embeddings for plagiarism detection |
| title_fullStr | Combining tree kernels and text embeddings for plagiarism detection |
| title_full_unstemmed | Combining tree kernels and text embeddings for plagiarism detection |
| title_short | Combining tree kernels and text embeddings for plagiarism detection |
| title_sort | combining tree kernels and text embeddings for plagiarism detection |
| topic | Text embeddings Plagiarism -- Detection Tree kernels Syntactic structures Semantic structures |
| url | http://hdl.handle.net/10019.1/103550 |
| work_keys_str_mv | AT thomjacobusdaniel combiningtreekernelsandtextembeddingsforplagiarismdetection |