Full Text Available

Automatic Prediction of Comment Quality

Thesis (MSc)--Stellenbosch University, 2016

Bibliographic Details
Main Author:	Brand, Dirk Johannes
Other Authors:	Van der Merwe, Brink
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University 2016
Subjects:	News media > Short text; Website > Short text; N-grams; Computational probability; Online user comments; Computational linguistics; Word embedding; UCTD

_version_	1867613839904735232
access_status_str	Open Access
author	Brand, Dirk Johannes
author2	Van der Merwe, Brink
author_browse	Brand, Dirk Johannes Van der Merwe, Brink
author_facet	Van der Merwe, Brink Brand, Dirk Johannes
author_sort	Brand, Dirk Johannes
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MSc)--Stellenbosch University, 2016
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/98818
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:42:31.964Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2016
publishDateRange	2016
publishDateSort	2016
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/98818 Automatic Prediction of Comment Quality Brand, Dirk Johannes Van der Merwe, Brink Kroon, R. S. (Steve) Cleophas, Loek Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Computer Science) News media -- Short text Website -- Short text N-grams Computational probability Online user comments Computational linguistics Word embedding UCTD Thesis (MSc)--Stellenbosch University, 2016 ENGLISH ABSTRACT : The problem of identifying and assessing the quality of short texts (e.g. comments, reviews or web searches) has been intensively studied. There are great benefits to being able to analyse short texts. As an example, advertisers might be interested in the sentiment of product reviews on e-commerce sites to more efficiently pair marketing material to content. Analysing short texts is a difficult problem, because traditional machine learning models generally perform better on data sets with larger samples, which often translates to more features. More data allow for better estimation of parameters for these models. Short texts generally do not have much content, but still carry high variability in that they may still consist of a large corpus of words. This thesis investigates various methods for feature extraction for short texts in the context of online user comments. These methods include the leading manual feature extraction techniques for short texts, N-gram models and techniques based on word embeddings. The effect of using different kernels for a support vector classifier is also investigated. The investigation is centred around two data sets, one provided by News24 and the other extracted from Slashdot.org. It was found that N-gram models performed relatively well, mostly outperforming manual feature extraction techniques. AFRIKAANS SUMMARY : Identifying and analysing the quality of short texts (e.g. internet comments, searches or reviews) is a problem that has already been studied fairly thoroughly in the research. There is much to gain from the ability to analyse the quality of online text. For example, online stores may be interested in the sentiment of consumers who review their products, since this can help to generate more accurate marketing material for products. Analysing short texts is a challenging problem, because traditional machine learning algorithms usually perform better on data sets with more features than short texts can offer. Richer data sets allow more accurate estimation of model parameters. This thesis studies various methods for feature construction for short texts in the context of online comments. The methods include the leading hand-crafted feature construction techniques for short texts, N-gram models and word embeddings. The effect of different kernel methods for classification models is also studied. The study is focused on two data sets, one provided by News24 and the other obtained from Slashdot.org. We found that N-gram models mostly perform better than the hand-crafted feature construction techniques. 2016-03-09T15:05:34Z 2016-03-09T15:05:34Z 2016-03 Thesis http://hdl.handle.net/10019.1/98818 en_ZA Stellenbosch University ix, 116 pages : illustrations (chiefly colour) application/pdf Stellenbosch : Stellenbosch University
spellingShingle	News media -- Short text Website -- Short text N-grams Computational probability Online user comments Computational linguistics Word embedding UCTD Brand, Dirk Johannes Automatic Prediction of Comment Quality
title	Automatic Prediction of Comment Quality
title_full	Automatic Prediction of Comment Quality
title_fullStr	Automatic Prediction of Comment Quality
title_full_unstemmed	Automatic Prediction of Comment Quality
title_short	Automatic Prediction of Comment Quality
title_sort	automatic prediction of comment quality
topic	News media -- Short text Website -- Short text N-grams Computational probability Online user comments Computational linguistics Word embedding UCTD
url	http://hdl.handle.net/10019.1/98818
work_keys_str_mv	AT branddirkjohannes automaticpredictionofcommentquality
