Topic Modelling for Short Text

Dissertation (MSc)--University of Pretoria, 2015.

Bibliographic Details
Other Authors: De Waal, Annari
Format: Thesis
Language: English
Published: University of Pretoria 2015
_version_ 1867613601509933056
access_status_str Open Access
author2 De Waal, Annari
author_browse De Waal, Annari
author_facet De Waal, Annari
collection Thesis
dc_rights_str_mv © 2015 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Dissertation (MSc)--University of Pretoria, 2015.
format Thesis
id oai:repository.up.ac.za:2263/50694
institution University of Pretoria (South Africa)
language English
last_indexed 2026-06-10T12:38:43.836Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2015
publishDateRange 2015
publishDateSort 2015
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/50694 Topic Modelling for Short Text De Waal, Annari u10220420@tuks.co.za Millard, Sollie M. Mazarura, Jocelyn Rangarirai UCTD Dissertation (MSc)--University of Pretoria, 2015. Over the past few years, our increased ability to store large amounts of data, coupled with the increasing accessibility of the internet, has created massive stores of digital information. Consequently, it has become increasingly challenging to find and extract relevant information, creating a need for tools that can effectively extract and summarize it. One such tool is topic modelling, a method for extracting hidden themes or topics from a large collection of documents. Information is stored in many forms, but of particular interest is the information stored as short text, which typically arises as posts on websites like Facebook and Twitter where people freely share their ideas, interests and opinions. With such a wealth of data and so many diverse users, such stores of short text could provide useful information about, for instance, public opinion and current trends. Unlike long text, such as news and journal articles, short text contains few words, so it may not contain enough meaningful words; this is a well-known challenge when applying topic models to short text. The Latent Dirichlet Allocation (LDA) model is one of the most popular topic models and it makes the generative assumption that a document belongs to many topics. Conversely, the Multinomial Mixture (MM) model, another topic model, assumes a document can belong to at most one topic, which we believe is an intuitively sensible assumption for short text. Based on this key difference, we posit that the MM model should perform better than the LDA model on short text. To validate this hypothesis, we compare the performance of the LDA and MM models on two long text and two short text corpora, using coherence as our main performance measure. 
Our experiments reveal that the LDA model performs slightly better than the MM model on long text, whereas the MM performs better than the LDA model on short text. tm2015 Statistics MSc Unrestricted 2015-11-25T09:47:19Z 2015-11-25T09:47:19Z 2015/09/01 2015 Dissertation Mazarura, JR 2015, Topic Modelling for Short Text, MSc Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/50694> S2015 http://hdl.handle.net/2263/50694 en © 2015 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle UCTD
Topic Modelling for Short Text
title Topic Modelling for Short Text
title_full Topic Modelling for Short Text
title_fullStr Topic Modelling for Short Text
title_full_unstemmed Topic Modelling for Short Text
title_short Topic Modelling for Short Text
title_sort topic modelling for short text
topic UCTD
url http://hdl.handle.net/2263/50694