Nearest hypersphere classification : a comparison with other classification techniques

Thesis (MCom)--Stellenbosch University, 2014.

Bibliographic Details
Main Author: Van der Westhuizen, Cornelius Stephanus
Other Authors: Lamont, M. M. C.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2015
_version_ 1867613867698290688
access_status_str Open Access
author Van der Westhuizen, Cornelius Stephanus
author2 Lamont, M. M. C.
author_browse Lamont, M. M. C.
Van der Westhuizen, Cornelius Stephanus
author_facet Lamont, M. M. C.
Van der Westhuizen, Cornelius Stephanus
author_sort Van der Westhuizen, Cornelius Stephanus
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MCom)--Stellenbosch University, 2014.
format Thesis
id oai:scholar.sun.ac.za:10019.1/95839
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:42:57.574Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2015
publishDateRange 2015
publishDateSort 2015
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/95839 Nearest hypersphere classification : a comparison with other classification techniques Van der Westhuizen, Cornelius Stephanus Lamont, M. M. C. Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. UCTD Dissertations -- Statistics and actuarial science Theses -- Statistics and actuarial science Classification Machine learning Kernel functions Thesis (MCom)--Stellenbosch University, 2014. ENGLISH ABSTRACT: Classification is a widely used statistical procedure to classify objects into two or more classes according to some rule which is based on the input variables. Examples of such techniques are Linear and Quadratic Discriminant Analysis (LDA and QDA). However, classification of objects with these methods can get complicated when the number of input variables in the data becomes too large (n ≪ p), when the assumption of normality is no longer met or when classes are not linearly separable. Vapnik et al. (1995) introduced the Support Vector Machine (SVM), a kernel-based technique, which can perform classification in cases where LDA and QDA are not valid. SVM makes use of an optimal separating hyperplane and a kernel function to derive a rule which can be used for classifying objects. Another kernel-based technique was proposed by Tax and Duin (1999) where a hypersphere is used for domain description of a single class. The idea of a hypersphere for a single class can be easily extended to classification when dealing with multiple classes by just classifying objects to the nearest hypersphere. Although the theory of hyperspheres is well developed, not much research has gone into using hyperspheres for classification and the performance thereof compared to other classification techniques.
In this thesis we will give an overview of Nearest Hypersphere Classification (NHC) as well as provide further insight regarding the performance of NHC compared to other classification techniques (LDA, QDA and SVM) under different simulation configurations. We begin with a literature study, where the theory of the classification techniques LDA, QDA, SVM and NHC will be dealt with. In the discussion of each technique, applications in the statistical software R will also be provided. An extensive simulation study is carried out to compare the performance of LDA, QDA, SVM and NHC for the two-class case. Various data scenarios will be considered in the simulation study. This will give further insight in terms of which classification technique performs better under the different data scenarios. Finally, the thesis ends with the comparison of these techniques on real-world data. AFRIKAANS SUMMARY (translated): Classification is a statistical method used to classify objects into two or more classes based on a rule built on the independent variables. Examples of these methods include Linear and Quadratic Discriminant Analysis (LDA and QDA). However, when the number of independent variables in a data set becomes too large, the assumption of normality no longer holds, or the classes are no longer linearly separable, applying methods such as LDA and QDA becomes too difficult. Vapnik et al. (1995) introduced a kernel-based method, the Support Vector Machine (SVM), which can be used for classification in situations where methods such as LDA and QDA fail. SVM uses an optimal separating hyperplane and a kernel function to derive a rule that can be used to classify objects. Another kernel-based technique was proposed by Tax and Duin (1999), in which a hypersphere can be used to construct a domain description for a data set with only one class.
This idea of a single class described by a hypersphere can easily be extended to a multi-class classification problem, simply by classifying objects to the nearest hypersphere. Although the theory of hyperspheres is well developed, little research has been done on using hyperspheres for classification, and little attention has been given to the performance of hyperspheres compared with other classification techniques. In this thesis we give an overview of Nearest Hypersphere Classification (NHC) and provide further insight into the performance of NHC compared with other classification techniques (LDA, QDA and SVM) under certain simulation configurations. We begin with a literature study covering the theory of the classification techniques LDA, QDA, SVM and NHC. For each technique, applications in the statistical software R are also shown. An extensive simulation study is carried out to compare the performance of LDA, QDA, SVM and NHC. The comparison is done for situations where the data have only two classes. A variety of data scenarios is also investigated to give further insight into which technique performs best in which situation. The thesis concludes by applying these techniques to practical data sets. Masters 2015-01-13T11:47:35Z 2015-01-13T11:47:35Z 2014-12 Thesis http://hdl.handle.net/10019.1/95839 en_ZA Stellenbosch University 109 p. : Ill. application/pdf Stellenbosch : Stellenbosch University
spellingShingle UCTD
Dissertations -- Statistics and actuarial science
Theses -- Statistics and actuarial science
Classification
Machine learning
Kernel functions
Van der Westhuizen, Cornelius Stephanus
Nearest hypersphere classification : a comparison with other classification techniques
title Nearest hypersphere classification : a comparison with other classification techniques
title_full Nearest hypersphere classification : a comparison with other classification techniques
title_fullStr Nearest hypersphere classification : a comparison with other classification techniques
title_full_unstemmed Nearest hypersphere classification : a comparison with other classification techniques
title_short Nearest hypersphere classification : a comparison with other classification techniques
title_sort nearest hypersphere classification a comparison with other classification techniques
topic UCTD
Dissertations -- Statistics and actuarial science
Theses -- Statistics and actuarial science
Classification
Machine learning
Kernel functions
url http://hdl.handle.net/10019.1/95839
work_keys_str_mv AT vanderwesthuizencorneliusstephanus nearesthypersphereclassificationacomparisonwithotherclassificationtechniques
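
Illustrative note: the abstract describes classifying an object to the nearest of several per-class hyperspheres. The sketch below is one possible reading of that rule, not the thesis's implementation. It makes two loudly simplifying assumptions: each sphere is centred at the class mean with radius equal to the largest distance from that mean (a stand-in for the Tax and Duin optimisation, which is not reproduced here), and no kernel function is used. "Nearest" is taken to mean the smallest signed distance to the sphere surface, which is also an assumption. The names `fit_spheres` and `predict` are hypothetical.

```python
import math


def fit_spheres(X, y):
    """Fit one sphere per class label.

    Assumption: centre = class mean, radius = largest distance to the mean.
    This is a crude substitute for an optimised hypersphere description.
    """
    spheres = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        dim = len(pts[0])
        centre = [sum(p[j] for p in pts) / len(pts) for j in range(dim)]
        radius = max(math.dist(p, centre) for p in pts)
        spheres[label] = (centre, radius)
    return spheres


def predict(spheres, x):
    """Return the label whose sphere surface is nearest to x.

    The score is distance-to-centre minus radius, so points inside a
    sphere get a negative score and are preferred for that class.
    """
    def score(label):
        centre, radius = spheres[label]
        return math.dist(x, centre) - radius

    return min(spheres, key=score)


# Toy two-class data (made up for illustration only).
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]

spheres = fit_spheres(X, y)
print(predict(spheres, (0.5, 0.5)))  # → a
print(predict(spheres, (5.2, 5.1)))  # → b
```

Using the signed distance to the surface rather than the distance to the centre lets a class with a larger sphere claim points further from its centre; whether the thesis uses this or another distance measure is not stated in the record.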