Visualising data through biplots using Categorical PCA and clustering

Thesis (MCom)--Stellenbosch University, 2022.

Bibliographic Details
Main Author:	Van Dyk, Wilmari
Other Authors:	Van der Merwe, Carel
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University 2022
Subjects:	Cluster analysis; Principal Component Analysis; Multivariate analysis; UCTD

_version_	1867614112247185408
access_status_str	Open Access
author	Van Dyk, Wilmari
author2	Van der Merwe, Carel
author_browse	Van Dyk, Wilmari Van der Merwe, Carel
author_facet	Van der Merwe, Carel Van Dyk, Wilmari
author_sort	Van Dyk, Wilmari
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MCom)--Stellenbosch University, 2022.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/126109
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:46:51.765Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2022
publishDateRange	2022
publishDateSort	2022
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/126109 Visualising data through biplots using Categorical PCA and clustering Van Dyk, Wilmari Van der Merwe, Carel Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. Cluster analysis Principal Component Analysis Multivariate analysis UCTD Thesis (MCom)--Stellenbosch University, 2022. ENGLISH SUMMARY: Handling large data sets has become an everyday occurrence, and the need to process and interpret data efficiently has increased tremendously over the last few years. The easiest way to interpret data quickly is to have a visual representation of the data. Since data is often multidimensional, the use of biplots has become more frequent. Biplots are a tool that allows multidimensional data to be displayed on a two- or three-dimensional graph. The first step in constructing such a plot is to apply a dimension reduction technique to transform a data set from a high-dimensional space to a lower-dimensional space. Depending on the type of data to be transformed, the most commonly used dimension reduction techniques are principal component analysis (PCA) for continuous data and multiple correspondence analysis (MCA) for categorical data. When conducting unsupervised learning, inferences need to be made about the relationships among the different variables in a data set. Clustering is very useful for this purpose. Various clustering techniques can be used, depending on the type of data to be analysed. More specifically, for continuous data, reduced k-means or factorial k-means can be used, and for categorical data, MCA k-means, cluster correspondence analysis, and iterative factorial clustering are often used. 
The purpose of this assignment is to develop an R function that can apply dimension reduction and clustering techniques to categorical data, transforming the data so that it can be represented on a biplot and inferences can be made about relationships within the data. Categorical PCA will be used as the dimension reduction technique to transform the data from a higher dimension to a lower dimension. Since Categorical PCA assigns scores to the category levels by focusing on individual categories, the categories become numerical, which means they can be displayed on straight-line axes. While the dimension reduction takes place, the function will also attempt to cluster the data using either reduced k-means or factorial k-means. After the data is transformed, it can be displayed on a biplot with many additional features to enhance the biplot. AFRIKAANS SUMMARY: Handling large data sets has become an everyday occurrence, and the need for efficient processing and interpretation of data has increased tremendously over the past few years. The easiest way to interpret data quickly is to have a visual representation of the data. Since data is often multidimensional, the use of biplots has become more common. Biplots are a technique that makes it possible to represent multidimensional data on a two- or three-dimensional graph. The first step is to apply some dimension reduction technique to transform the data from a high-dimensional space to a lower-dimensional space. Depending on the type of data to be transformed, the most common dimension reduction techniques are principal component analysis for numerical data or multiple correspondence analysis for categorical data. When working with data that does not have an independent variable, inferences must be made about the relationships among the different variables. 
Clustering is very useful for this purpose. There are various clustering techniques that can be used to cluster data, depending on the type of data. More specifically, for continuous data, reduced k-means or factorial k-means can be used. For categorical data, multiple correspondence analysis k-means, cluster correspondence analysis, and iterative factorial clustering can be used. The purpose of this assignment is to develop an R function that can apply some dimension reduction and clustering technique to categorical data to transform the data so that it can be represented on a biplot. Certain inferences can then be made about possible relationships within the data. Categorical principal component analysis will be used as a dimension reduction technique to transform the data from a higher dimension to a lower dimension. Since categorical principal component analysis assigns scores to the category levels by focusing on individual categories, the categories become numerical, which means the data can be represented on straight-line axes. While the dimension reduction takes place, the function will also attempt to cluster the data using either reduced k-means or factorial k-means. After the data has been transformed, it can be represented on a biplot with many additional features to improve the presentation of the biplot. Masters 2022-11-21T15:12:53Z 2023-01-16T12:50:31Z 2022-11-21T15:12:53Z 2023-01-16T12:50:31Z 2022-12 Thesis http://hdl.handle.net/10019.1/126109 en_ZA Stellenbosch University xii, 88 pages : illustrations, includes annexures application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Cluster analysis Principal Component Analysis Multivariate analysis UCTD Van Dyk, Wilmari Visualising data through biplots using Categorical PCA and clustering
title	Visualising data through biplots using Categorical PCA and clustering
title_full	Visualising data through biplots using Categorical PCA and clustering
title_fullStr	Visualising data through biplots using Categorical PCA and clustering
title_full_unstemmed	Visualising data through biplots using Categorical PCA and clustering
title_short	Visualising data through biplots using Categorical PCA and clustering
title_sort	visualising data through biplots using categorical pca and clustering
topic	Cluster analysis Principal Component Analysis Multivariate analysis UCTD
url	http://hdl.handle.net/10019.1/126109
work_keys_str_mv	AT vandykwilmari visualisingdatathroughbiplotsusingcategoricalpcaandclustering
