An unsupervised approach to COVID-19 fake tweet detection

Context: With the ongoing COVID-19 pandemic, social media platforms have become a crucial source of information. However, not all information shared on these platforms is accurate. The dissemination of fake news, intentional or unintentional, can lead to panic among readers and further exacerbate th...

Bibliographic Details
Main Author:	Jarana, Bulungisa
Other Authors:	Ngwenya, Mzabalazo
Format:	Thesis
Language:	Eng
Published:	Department of Statistical Sciences 2024
Subjects:	Statistical Sciences

_version_	1867613181468213248
access_status_str	Open Access
author	Jarana, Bulungisa
author2	Ngwenya, Mzabalazo
author_browse	Jarana, Bulungisa Ngwenya, Mzabalazo
author_facet	Ngwenya, Mzabalazo Jarana, Bulungisa
author_sort	Jarana, Bulungisa
collection	Thesis
description	Context: With the ongoing COVID-19 pandemic, social media platforms have become a crucial source of information. However, not all information shared on these platforms is accurate. The dissemination of fake news, intentional or unintentional, can lead to panic among readers and further exacerbate the effects of the pandemic. Objectives: This research project aims to explore the potential of unsupervised machine learning algorithms in differentiating between genuine and fake COVID-19 news shared on Twitter. The methodology includes a literature review, experimental analysis, and the use of a Twitter dataset. Methods: The study used both the K-means and Mini-Batch K-means clustering algorithms to group the Twitter data into two clusters. Word embedding techniques such as TF-IDF, Word2Vec, and BERT were employed because machine learning models cannot process raw text directly; these techniques convert the text into numeric vectors. Results: The results on the test data show that K-means was the best-performing algorithm (76% accuracy) for identifying fake tweets about COVID-19. K-means with BERT word embeddings was the best-performing model, followed by Mini-Batch K-means with TF-IDF word embeddings (69% accuracy). Conclusions: The study demonstrates that clustering Twitter COVID-19 news as genuine or fake using the K-means and Mini-Batch K-means algorithms is feasible. Keywords: Clustering, Machine Learning, unsupervised learning, K-Means, MiniBatch K-Means, TF-IDF, Word2Vec, BERT, Confusion Matrix, Truncated SVD (Singular Value Decomposition), t-distributed stochastic neighbourhood embedding (t-SNE)
format	Thesis
id	oai:open.uct.ac.za:11427/40266
institution	University of Cape Town (South Africa)
language	Eng
last_indexed	2026-06-10T12:32:03.909Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2024
publishDateRange	2024
publishDateSort	2024
publisher	Department of Statistical Sciences
publisherStr	Department of Statistical Sciences
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/40266 An unsupervised approach to COVID-19 fake tweet detection Jarana, Bulungisa Ngwenya, Mzabalazo Statistical Sciences Context: With the ongoing COVID-19 pandemic, social media platforms have become a crucial source of information. However, not all information shared on these platforms is accurate. The dissemination of fake news, intentional or unintentional, can lead to panic among readers and further exacerbate the effects of the pandemic. Objectives: This research project aims to explore the potential of unsupervised machine learning algorithms in differentiating between genuine and fake COVID-19 news shared on Twitter. The methodology includes a literature review, experimental analysis, and the use of a Twitter dataset. Methods: The study used both the K-means and Mini-Batch K-means clustering algorithms to group the Twitter data into two clusters. Word embedding techniques such as TF-IDF, Word2Vec, and BERT were employed because machine learning models cannot process raw text directly; these techniques convert the text into numeric vectors. Results: The results on the test data show that K-means was the best-performing algorithm (76% accuracy) for identifying fake tweets about COVID-19. K-means with BERT word embeddings was the best-performing model, followed by Mini-Batch K-means with TF-IDF word embeddings (69% accuracy). Conclusions: The study demonstrates that clustering Twitter COVID-19 news as genuine or fake using the K-means and Mini-Batch K-means algorithms is feasible. Keywords: Clustering, Machine Learning, unsupervised learning, K-Means, MiniBatch K-Means, TF-IDF, Word2Vec, BERT, Confusion Matrix, Truncated SVD (Singular Value Decomposition), t-distributed stochastic neighbourhood embedding (t-SNE) 2024-07-04T13:37:19Z 2024-07-04T13:37:19Z 2024 2024-07-03T13:39:11Z Thesis / Dissertation Masters MSc http://hdl.handle.net/11427/40266 Eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle	Statistical Sciences Jarana, Bulungisa An unsupervised approach to COVID-19 fake tweet detection
thesis_degree_str	Master's
title	An unsupervised approach to COVID-19 fake tweet detection
title_full	An unsupervised approach to COVID-19 fake tweet detection
title_fullStr	An unsupervised approach to COVID-19 fake tweet detection
title_full_unstemmed	An unsupervised approach to COVID-19 fake tweet detection
title_short	An unsupervised approach to COVID-19 fake tweet detection
title_sort	unsupervised approach to covid 19 fake tweet detection
topic	Statistical Sciences
url	http://hdl.handle.net/11427/40266
work_keys_str_mv	AT jaranabulungisa anunsupervisedapproachtocovid19faketweetdetection AT jaranabulungisa unsupervisedapproachtocovid19faketweetdetection
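
Illustrative sketch (not part of the harvested record): the description field above outlines a pipeline of text vectorisation, two-cluster K-means, and accuracy measurement. The Python below is a minimal standard-library sketch of that idea, under several assumptions that the record does not confirm: the toy tweets and labels are hypothetical, TF-IDF is the only representation shown, the initial centroids are fixed by hand, and accuracy is taken under the better of the two possible cluster-to-label mappings. It is not the thesis author's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenised documents into L2-normalised TF-IDF vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so a term present in every document still gets weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = [list(c) for c in init_centroids]
    assignment = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def two_cluster_accuracy(true_labels, clusters):
    """Accuracy under the better of the two cluster-to-label mappings."""
    agree = sum(t == c for t, c in zip(true_labels, clusters))
    return max(agree, len(true_labels) - agree) / len(true_labels)


# Hypothetical toy tweets (not from the thesis); 1 = fake, 0 = genuine.
tweets = [
    "wash hands often to slow covid spread",
    "health officials say wash hands and wear masks",
    "miracle garlic cure kills covid overnight",
    "secret miracle cure covid doctors hide it",
]
labels = [0, 0, 1, 1]

vectors = tfidf_vectors([t.split() for t in tweets])
# Fixed initial centroids keep this toy run deterministic (an assumption).
clusters = kmeans(vectors, init_centroids=[vectors[0], vectors[2]])
accuracy = two_cluster_accuracy(labels, clusters)
print(clusters, accuracy)
```

Cluster indices carry no meaning of their own, so the accuracy function scores both possible mappings of clusters to "genuine" and "fake" and keeps the higher one. For real data, libraries such as scikit-learn provide `KMeans` and `MiniBatchKMeans`; whether the thesis used them is not stated in this record.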

Similar Items