Robust principal component analysis biplots

Thesis (MSc (Mathematical Statistics))--University of Stellenbosch, 2008.

Bibliographic Details
Main Author: Wedlake, Ryan Stuart
Other Authors: Le Roux, N. J.
Format: Thesis
Language: English
Published: Stellenbosch : University of Stellenbosch 2008
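Subjects: Biplot; Outliers; Robust principal components; Bootstrap; Dissertations -- Statistics and actuarial science; Theses -- Statistics and actuarial science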
_version_ 1867614019126296576
access_status_str Open Access
author Wedlake, Ryan Stuart
author2 Le Roux, N. J.
author_browse Le Roux, N. J.
Wedlake, Ryan Stuart
author_facet Le Roux, N. J.
Wedlake, Ryan Stuart
author_sort Wedlake, Ryan Stuart
collection Thesis
dc_rights_str_mv University of Stellenbosch
description Thesis (MSc (Mathematical Statistics))--University of Stellenbosch, 2008.
format Thesis
id oai:scholar.sun.ac.za:10019.1/2491
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:45:22.846Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2008
publishDateRange 2008
publishDateSort 2008
publisher Stellenbosch : University of Stellenbosch
publisherStr Stellenbosch : University of Stellenbosch
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/2491 Robust principal component analysis biplots Wedlake, Ryan Stuart Le Roux, N. J. University of Stellenbosch. Faculty of Science. Dept. of Mathematical Sciences. Mathematical Statistics. Biplot Outliers Robust principal components Bootstrap Dissertations -- Statistics and actuarial science Theses -- Statistics and actuarial science Thesis (MSc (Mathematical Statistics))--University of Stellenbosch, 2008. In this study several procedures for finding robust principal components (RPCs) for low- and high-dimensional data sets are investigated in parallel with robust principal component analysis (RPCA) biplots. These RPCA biplots will be used for the simultaneous visualisation of the observations and variables in the subspace spanned by the RPCs. Chapter 1 contains a brief overview of the difficulties that are encountered when graphically investigating patterns and relationships in multidimensional data and why PCA can be used to circumvent these difficulties; the objectives of this study; a summary of the work done in order to meet these objectives; and certain results in matrix algebra that are needed throughout this study. In Chapter 2 the derivation of the classic sample principal components (SPCs) is first discussed in detail, since they are the "building blocks" of classic principal component analysis (CPCA) biplots. Secondly, the traditional CPCA biplot of Gabriel (1971) is reviewed. Thirdly, modifications to this biplot using the new philosophy of Gower & Hand (1996) are given attention. Reasons why this modified biplot has several advantages over the traditional biplot (some of which are aesthetic in nature) are given. Lastly, changes that can be made to the Gower & Hand (1996) PCA biplot to optimally visualise the correlations between the variables are discussed.
Because the SPCs determine the position of the observations as well as the orientation of the arrows (traditional biplot) or axes (Gower and Hand biplot) in the PCA biplot subspace, it is useful to give estimates of the standard errors of the SPCs together with the biplot display as an indication of the stability of the biplot. Firstly, a computer-intensive statistical technique called the Bootstrap is discussed; it is used to calculate the standard errors of the SPCs without making underlying distributional assumptions. Secondly, the influence of outliers on Bootstrap results is investigated. Lastly, a robust form of the Bootstrap is briefly discussed for calculating standard error estimates that remain stable whether or not outliers are present in the sample. All the preceding topics are the subject matter of Chapter 3. In Chapter 4, reasons why a PC analysis should be made robust in the presence of outliers are firstly discussed. Secondly, different types of outliers are discussed. Thirdly, a method for identifying influential observations and a method for identifying outlying observations are investigated. Lastly, different methods for constructing robust estimates of location and dispersion for the observations receive attention. These robust estimates are used in numerical procedures that calculate RPCs. In Chapter 5, an overview of some of the procedures that are used to calculate RPCs for lower- and higher-dimensional data sets is firstly given. Secondly, two numerical procedures that can be used to calculate RPCs for lower-dimensional data sets are discussed and compared in detail. Details and examples of robust versions of the Gower & Hand (1996) PCA biplot that can be constructed using these RPCs are also provided. In Chapter 6, five numerical procedures for calculating RPCs for higher-dimensional data sets are discussed in detail.
Once RPCs have been obtained by using these methods, they are used to construct robust versions of the PCA biplot of Gower & Hand (1996). Details and examples of these robust PCA biplots are also provided. An extensive software library has been developed so that the biplot methodology discussed in this study can be used in practice. The functions in this library are given in an appendix at the end of this study. This software library is applied to data sets from various fields so that the merit of the theory developed in this study can be visually appraised. 2008-06-24T12:02:08Z 2010-06-01T08:50:11Z 2008-06-24T12:02:08Z 2010-06-01T08:50:11Z 2008-03 Thesis http://hdl.handle.net/10019.1/2491 en University of Stellenbosch application/pdf Stellenbosch : University of Stellenbosch
spellingShingle Biplot
Outliers
Robust principal components
Bootstrap
Dissertations -- Statistics and actuarial science
Theses -- Statistics and actuarial science
Wedlake, Ryan Stuart
Robust principal component analysis biplots
title Robust principal component analysis biplots
title_full Robust principal component analysis biplots
title_fullStr Robust principal component analysis biplots
title_full_unstemmed Robust principal component analysis biplots
title_short Robust principal component analysis biplots
title_sort robust principal component analysis biplots
topic Biplot
Outliers
Robust principal components
Bootstrap
Dissertations -- Statistics and actuarial science
Theses -- Statistics and actuarial science
url http://hdl.handle.net/10019.1/2491
work_keys_str_mv AT wedlakeryanstuart robustprincipalcomponentanalysisbiplots