Preconditioning for feature selection in classification

Thesis (MCom)--Stellenbosch University, 2019.

Bibliographic Details
Main Author: Pretorius, Jani
Other Authors: Steel, S. J.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2019
access_status_str Open Access
author Pretorius, Jani
author2 Steel, S. J.
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MCom)--Stellenbosch University, 2019.
format Thesis
id oai:scholar.sun.ac.za:10019.1/106057
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:44:44.746Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2019
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/106057 Preconditioning for feature selection in classification Pretorius, Jani Steel, S. J. Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. High-dimensional data -- Statistical methods Preconditioning Statistical learning theory Supervised learning (Machine learning) Dimension reduction (Statistics) Variables (Mathematics) -- Statistical methods Predictive modeling Discriminant analysis UCTD Thesis (MCom)--Stellenbosch University, 2019. ENGLISH SUMMARY : Increased dimensionality of data is a clear trend that has been observed over the past few decades. However, analysing high-dimensional data in order to predict an outcome can be problematic. In certain cases, such as when analysing genomic data, a predictive model that is both interpretable and accurate is required. Many techniques focus on solving these two components simultaneously; however, when the data are high-dimensional and noisy, such an approach may perform poorly. Preconditioning is a two-stage technique that aims to reduce the noise inherent in the training data before making final predictions. In doing so, it addresses the issues of interpretability and accuracy separately. The literature on this technique focuses on the regression case, but in this thesis, the technique is applied in a classification setting. An overview of the theory surrounding this method is provided, as well as an empirical analysis of the method. A simulation study evaluates the performance of the technique under various scenarios and compares the results to those obtained by standard (non-preconditioned) models. Thereafter, the models are applied to real-world datasets and their performances compared. Based on the results of the empirical work, it appears that, at their best, preconditioned classifiers can only reach a performance that is on par with standard classifiers. 
This is in contrast to the regression case, where the literature has shown that preconditioning can outperform standard regression models in high-dimensional settings. AFRIKAANS SUMMARY (translated) : An increase in the dimensionality of datasets is a clear trend that has emerged over the past few decades. Analysing higher-dimensional data in order to predict an outcome can be problematic. In certain cases, such as when genetic data are analysed, a predictive model that is both interpretable and accurate is required. Many techniques focus on solving these two aspects simultaneously, but when the data are high-dimensional and noisy, this approach can yield poor results. Preconditioning is a two-stage process that aims to reduce the noise in the training dataset before a final prediction is made. In doing so, it addresses the issues of interpretability and accuracy separately. The literature emphasises the regression case. In this thesis, however, the technique is applied in a classification context. An overview of the theory surrounding this method is provided, as well as empirical studies. Simulation studies evaluate the performance of the technique under various circumstances and compare the outcomes with those achieved by standard (non-preconditioned) models. Thereafter, the models are applied to real-world datasets and their results compared. Based on the results of the empirical work, it appears that preconditioned classification models can, at their best, only perform as well as standard classification models. These findings stand in contrast to the regression case, where the literature shows that preconditioning can outperform standard regression models in high-dimensional cases. 
Masters 2019-02-25T16:54:29Z 2019-04-17T08:26:10Z 2019-02-25T16:54:29Z 2019-04-17T08:26:10Z 2019-04 Thesis http://hdl.handle.net/10019.1/106057 en_ZA Stellenbosch University xvi, 128 pages ; illustrations, includes annexures application/pdf Stellenbosch : Stellenbosch University
title Preconditioning for feature selection in classification
topic High-dimensional data -- Statistical methods
Preconditioning
Statistical learning theory
Supervised learning (Machine learning)
Dimension reduction (Statistics)
Variables (Mathematics) -- Statistical methods
Predictive modeling
Discriminant analysis
UCTD
url http://hdl.handle.net/10019.1/106057