Full Text Available
Thesis (MCom)--Stellenbosch University, 2022.
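| Main Author: | Van Dyk, Wilmari |
|---|---|
| Other Authors: | Van der Merwe, Carel |
| Format: | Thesis |
| Language: | en_ZA (English, South Africa) |
| Published: | Stellenbosch : Stellenbosch University, 2022 |
| Subjects: | Cluster analysis; Principal Component Analysis; Multivariate analysis; UCTD |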
| _version_ | 1867614112247185408 |
|---|---|
| access_status_str | Open Access |
| author | Van Dyk, Wilmari |
| author2 | Van der Merwe, Carel |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MCom)--Stellenbosch University, 2022. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/126109 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:46:51.765Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2022 |
| publisher | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/126109; Visualising data through biplots using Categorical PCA and clustering; Van Dyk, Wilmari; Van der Merwe, Carel; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science; Cluster analysis; Principal Component Analysis; Multivariate analysis; UCTD; Thesis (MCom)--Stellenbosch University, 2022; Masters; 2022-11-21T15:12:53Z; 2023-01-16T12:50:31Z; 2022-12; Thesis; http://hdl.handle.net/10019.1/126109; en_ZA; Stellenbosch University; xii, 88 pages : illustrations, includes annexures; application/pdf; Stellenbosch : Stellenbosch University |
| title | Visualising data through biplots using Categorical PCA and clustering |
| topic | Cluster analysis Principal Component Analysis Multivariate analysis UCTD |
| url | http://hdl.handle.net/10019.1/126109 |
| work_keys_str_mv | AT vandykwilmari visualisingdatathroughbiplotsusingcategoricalpcaandclustering |

ENGLISH SUMMARY:

Handling large data sets has become an everyday occurrence, and the need to process and interpret data efficiently has increased tremendously over the last couple of years. The easiest way to interpret data quickly is through a visual representation of the data. Since data is often multidimensional, biplots have come into more frequent use. A biplot is a tool that allows multidimensional data to be displayed on a two- or three-dimensional graph. The first step in constructing such a plot is to apply a dimension reduction technique that transforms the data set from a high-dimensional space to a lower-dimensional space. Depending on the type of data, the most commonly used dimension reduction techniques are principal component analysis (PCA) for continuous data and multiple correspondence analysis (MCA) for categorical data.

When conducting unsupervised learning, inferences must be made about the relationships among the different variables in a data set, and clustering is very useful for this purpose. Various clustering techniques can be used, depending on the type of data being analysed. More specifically, reduced k-means or factorial k-means can be used for continuous data, while MCA k-means, cluster correspondence analysis and iterative factorial clustering are often used for categorical data.

The purpose of this assignment is to develop an R function that applies dimension reduction and clustering techniques to categorical data, transforming the data so that it can be represented on a biplot and inferences can be made about relationships within the data. Categorical PCA is used as the dimension reduction technique to transform the data from a higher dimension to a lower one. Because categorical PCA assigns numerical scores to the individual category levels, the categories become numerical, which means they can be displayed on straight-line axes. While the dimension reduction takes place, the function also attempts to cluster the data using either reduced k-means or factorial k-means. After the data is transformed, it can be displayed on a biplot with many additional features to enhance the plot.

AFRIKAANS SUMMARY (AFRIKAANSE OPSOMMING, translated into English):

Handling large data sets has become an everyday occurrence, and the need for efficient processing and interpretation of data has grown enormously over the past few years. The easiest way to interpret data quickly is to have a visual representation of it. Since data is often multidimensional, the use of biplots has become more common. Biplots are a technique that makes it possible to represent multidimensional data on a two- or three-dimensional graph. The first step is to apply a dimension reduction technique to transform the data from a high-dimensional space to a lower-dimensional space. Depending on the type of data to be transformed, the most common dimension reduction techniques are principal component analysis for numerical data and multiple correspondence analysis for categorical data. When working with data that has no response (dependent) variable, inferences must be made about the relationships among the different variables.

Clustering is very useful for this purpose. Various clustering techniques can be used to cluster data, depending on the type of data. More specifically, reduced k-means or factorial k-means can be used for continuous data, while MCA k-means, cluster correspondence analysis and iterative factorial clustering can be used for categorical data. The aim of this assignment is to develop an R function that applies dimension reduction and clustering techniques to categorical data, transforming the data so that it can be represented on a biplot. Certain inferences can then be made about possible relationships within the data. Categorical principal component analysis is used as the dimension reduction technique to transform the data from a higher dimension to a lower dimension. Because categorical principal component analysis assigns scores to the category levels by focusing on individual categories, the categories become numerical, which means the data can be represented on straight-line axes. While the dimension reduction takes place, the function also attempts to cluster the data using either reduced k-means or factorial k-means. After the data has been transformed, it can be represented on a biplot with many additional features that improve its presentation.
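The summary above describes the first step of building a biplot: reducing the data to two or three dimensions, with PCA as the usual choice for continuous data. As a minimal sketch (Python with NumPy, not the R function developed in the thesis), the coordinates of a standard PCA biplot can be read off the singular value decomposition of the standardised data. The function name `pca_biplot_coordinates` and the choice to scale every column to unit variance are illustrative assumptions.

```python
import numpy as np


def pca_biplot_coordinates(X, dims=2):
    """Coordinates for a PCA biplot of an n x p numeric matrix X.

    Returns (row_markers, axis_directions, explained):
      row_markers     n x dims, one point per observation (U_r S_r)
      axis_directions p x dims, one straight-line axis per variable (V_r)
      explained       share of total variance captured by each dimension
    """
    X = np.asarray(X, dtype=float)
    # Centre and scale each column so variables on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Singular value decomposition Z = U S V^T.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    row_markers = U[:, :dims] * s[:dims]
    axis_directions = Vt[:dims].T
    explained = s[:dims] ** 2 / np.sum(s ** 2)
    return row_markers, axis_directions, explained
```

Each row of `row_markers` is plotted as a point, and each row of `axis_directions` defines a straight line through the origin for one variable; the inner product of a point with an axis direction approximates the corresponding standardised value. Assigning all singular values to the row markers is one common convention, not necessarily the one used in the thesis.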
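For categorical data, the summary names multiple correspondence analysis (MCA) as the usual dimension reduction technique. The sketch below is an illustrative Python/NumPy version, not code from the thesis. It builds the indicator (one-hot) matrix, takes the SVD of its standardised residuals and returns principal coordinates for the objects and for every category level. The function name `mca` and its return values are assumptions made for this example.

```python
import numpy as np


def mca(codes, dims=2):
    """Sketch of multiple correspondence analysis via the indicator matrix.

    codes[i, j] is the category of object i on variable j. Returns principal
    coordinates of the objects (rows) and of all category levels (columns of
    the indicator matrix), plus the principal inertias.
    """
    codes = np.asarray(codes)
    # Indicator (one-hot) matrix: one block of columns per variable.
    blocks = []
    for j in range(codes.shape[1]):
        levels, inverse = np.unique(codes[:, j], return_inverse=True)
        blocks.append(np.eye(len(levels))[inverse.ravel()])
    Z = np.hstack(blocks)
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardised residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :dims] * s[:dims] / np.sqrt(r)[:, None]
    cols = Vt[:dims].T * s[:dims] / np.sqrt(c)[:, None]
    return rows, cols, s ** 2
```

With J category levels spread over Q variables, the principal inertias of this indicator-matrix MCA sum to J/Q - 1, which is a convenient sanity check on the computation.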
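The summary lists reduced k-means and factorial k-means as the clustering methods that the thesis's R function combines with the dimension reduction. Both look for clusters inside a low-dimensional subspace spanned by a column-orthonormal loading matrix A. The Python sketch below is a hypothetical illustration that assumes numeric input (for example, already quantified data) and a random initial partition. It alternates between updating A from an eigendecomposition and reassigning each object to the nearest cluster mean in the projected data X A. Reduced k-means takes the leading eigenvectors of the between-cluster scatter X'PX, whereas factorial k-means takes the trailing eigenvectors of the within-cluster scatter X'(I - P)X, where P projects onto the cluster-indicator columns. The function name `subspace_kmeans` and its parameters are illustrative.

```python
import numpy as np


def subspace_kmeans(X, k, dims=2, method="rkm", n_iter=100, seed=0):
    """Alternating sketch of reduced k-means ("rkm") and factorial k-means ("fkm").

    Returns (labels, A, F): cluster labels, a p x dims column-orthonormal
    loading matrix A and the projected object scores F = X A.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                        # column-centre the data
    n = X.shape[0]
    labels = rng.integers(0, k, size=n)           # random initial partition
    for _ in range(n_iter):
        U = np.eye(k)[labels]                     # n x k cluster-indicator matrix
        U = U[:, U.sum(axis=0) > 0]               # drop clusters that became empty
        P = U @ np.linalg.solve(U.T @ U, U.T)     # projector onto cluster means
        if method == "rkm":
            # Leading eigenvectors of the between-cluster scatter X' P X.
            _, vecs = np.linalg.eigh(X.T @ P @ X)
            A = vecs[:, ::-1][:, :dims]
        elif method == "fkm":
            # Trailing eigenvectors of the within-cluster scatter X' (I - P) X.
            _, vecs = np.linalg.eigh(X.T @ (np.eye(n) - P) @ X)
            A = vecs[:, :dims]
        else:
            raise ValueError("method must be 'rkm' or 'fkm'")
        F = X @ A                                 # objects in the reduced space
        centroids = np.linalg.solve(U.T @ U, U.T @ F)
        # Reassign each object to its nearest cluster mean in the subspace.
        dist = ((F[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A, F
```

Keeping both methods in one function makes the single point where they differ (which end of the eigenspectrum defines A) easy to see. Because the alternation only reaches a local optimum, one would normally run it from several seeds and keep the best solution; that multi-start loop is omitted here.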
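The key step the summary attributes to categorical PCA is assigning numerical scores to category levels so that categorical variables can be shown on straight-line axes. A minimal alternating least-squares sketch of that idea for nominal variables follows; it is written in Python with NumPy and is not taken from the thesis. It alternates between (1) the best rank-`dims` approximation of the currently quantified data and (2) re-quantifying each category as the mean of that approximation over the objects in the category, followed by standardisation. The function name `catpca_nominal`, the integer-code initialisation and the convergence rule are assumptions, and each variable is assumed to have at least two observed categories. Full CATPCA implementations also support other scaling levels (such as ordinal and spline) that are not shown here.

```python
import numpy as np


def _standardise(col):
    """Centre a vector and scale it to mean square 1."""
    col = col - col.mean()
    return col / np.sqrt(np.mean(col ** 2))


def catpca_nominal(codes, dims=2, n_iter=200, tol=1e-10):
    """ALS sketch of categorical PCA with nominal quantifications.

    codes[i, j] is the (integer) category of object i on variable j.
    Returns the quantified data Q, object scores, loadings and the loss
    history ||Q - Q_hat||^2, which is non-increasing across iterations.
    """
    codes = np.asarray(codes)
    n, p = codes.shape
    # Initial quantification: the integer codes themselves, standardised.
    Q = np.column_stack([_standardise(codes[:, j].astype(float)) for j in range(p)])
    losses = []
    for _ in range(n_iter):
        # Step 1: best rank-`dims` approximation of the quantified data (SVD).
        U, s, Vt = np.linalg.svd(Q, full_matrices=False)
        Q_hat = (U[:, :dims] * s[:dims]) @ Vt[:dims]
        # Step 2: each category's new value is the mean of Q_hat[:, j] over the
        # objects in that category, re-standardised (least-squares update).
        for j in range(p):
            _, inverse = np.unique(codes[:, j], return_inverse=True)
            inverse = inverse.ravel()
            means = np.bincount(inverse, weights=Q_hat[:, j]) / np.bincount(inverse)
            Q[:, j] = _standardise(means[inverse])
        losses.append(float(np.sum((Q - Q_hat) ** 2)))
        if len(losses) > 1 and losses[-2] - losses[-1] < tol:
            break
    # Final object scores and loadings from the converged quantification.
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return Q, U[:, :dims] * s[:dims], Vt[:dims].T, losses
```

Because every level of variable j receives a single value in column j of `Q`, the quantified variables are numeric, and the returned `scores` and `loadings` can be used in the same way as the row markers and axis directions of an ordinary PCA biplot.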