Fast data analysis methods for social media data

Dissertation (MSc)--University of Pretoria, 2019.

Bibliographic Details
Other Authors: Lutu, Patricia Elizabeth Nalwoga
Format: Thesis
Language: English
Published: University of Pretoria 2019
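Subjects: Big data; Machine learning; Sentiment analysis; Text mining; Apache Hadoop; UCTD; Engineering, built environment and information technology theses SDG-09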
_version_ 1867613695196004352
access_status_str Open Access
author2 Lutu, Patricia Elizabeth Nalwoga
author_browse Lutu, Patricia Elizabeth Nalwoga
author_facet Lutu, Patricia Elizabeth Nalwoga
collection Thesis
dc_rights_str_mv © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Dissertation (MSc)--University of Pretoria, 2019.
format Thesis
id oai:repository.up.ac.za:2263/72546
institution University of Pretoria (South Africa)
language English
last_indexed 2026-06-10T12:40:13.972Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2019
publishDateRange 2019
publishDateSort 2019
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/72546 Fast data analysis methods for social media data Lutu, Patricia Elizabeth Nalwoga valezw@gmail.com Nhlabano, Valentine Velaphi Big data Machine learning Sentiment analysis Text mining Apache Hadoop UCTD Engineering, built environment and information technology theses SDG-09 Dissertation (MSc)--University of Pretoria, 2019. The advent of Web 2.0 technologies, which support the collaborative and participatory creation and publishing of social media content by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, which use this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally used to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods available for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, which is one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to and collect public data sets from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume and to store this data efficiently in the Hadoop Distributed File System (HDFS). 
Apache Flume is a distributed, reliable, and available system that is used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open source software library that runs on low-cost commodity hardware and can store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost. A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data stored in HDFS was analysed using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time. National Research Foundation (NRF) - Scarce skills bs2026 Computer Science MSc Unrestricted SDG-09: Industry, innovation and infrastructure 2019-12-09T08:55:16Z 2019-12-09T08:55:16Z 2019-12-15 2018-08-07 Dissertation Nhlabano, VV 2018, Fast Data Analysis Methods For Social Media Data, MSc Dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/72546> A2020 http://hdl.handle.net/2263/72546 en © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
title Fast data analysis methods for social media data
topic Big data
Machine learning
Sentiment analysis
Text mining
Apache Hadoop
UCTD
Engineering, built environment and information technology theses SDG-09
url http://hdl.handle.net/2263/72546