Multiple outlier detection and cluster analysis of multivariate normal data

Thesis (MScEng)--Stellenbosch University, 2003.

Bibliographic Details
Main Author: Robson, Geoffrey
Other Authors: Herbst, B. M.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University, 2012
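Subjects: Multivariate analysis; Outliers (Statistics); Data editing; Minimum Covariance Determinant (MCD); Dissertations -- Applied mathematics; Theses -- Applied mathematics; Dissertations -- Mathematical sciences; Theses -- Mathematical sciences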
access_status_str Open Access
author Robson, Geoffrey
author2 Herbst, B. M.
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MScEng)--Stellenbosch University, 2003.
format Thesis
id oai:scholar.sun.ac.za:10019.1/53508
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:42:44.343Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2012
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/53508
Multiple outlier detection and cluster analysis of multivariate normal data
Robson, Geoffrey
Herbst, B. M.
Muller, N. L.
Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences.
Multivariate analysis
Outliers (Statistics)
Data editing
Minimum Covariance Determinant (MCD)
Dissertations -- Applied mathematics
Theses -- Applied mathematics
Dissertations -- Mathematical sciences
Theses -- Mathematical sciences
Thesis (MScEng)--Stellenbosch University, 2003.
ENGLISH ABSTRACT: Outliers may be defined as observations that are sufficiently aberrant to arouse the analyst's suspicion as to their origin. They could be the result of human error, in which case they should be corrected, but they may also be an interesting exception that deserves further investigation. Outliers are typically identified by informally inspecting a plot of the data, but this is unreliable for dimensions greater than two. A formal procedure for detecting outliers allows observations to be classified consistently, and it also enables the detection of outliers to be automated by computer. The special case of univariate data is treated separately to introduce essential concepts, and also because it may well be of interest in its own right. We then consider techniques for detecting multiple outliers in a multivariate normal sample, and go on to explain how these may be generalized to include cluster analysis. Multivariate outlier detection is based on the Minimum Covariance Determinant (MCD) subset, which is therefore treated in detail. Exact bivariate algorithms were refined and implemented, and their solutions were used to establish the performance of the commonly used heuristic, Fast–MCD.
AFRIKAANS SUMMARY (English translation): Outliers are defined as observations that deviate from the expected behaviour to such an extent that the analyst is suspicious of their origin. These observations may be the result of human error, in which case they should be corrected. However, they may also be an interesting phenomenon that requires further investigation. Outliers are typically identified informally by inspecting a graphical representation of the data, but this approach is unreliable for dimensions greater than two. A formal procedure for determining outliers will result in more consistent classification of sample data. It also allows the techniques to be implemented effectively on a computer. Initially, the special case of univariate data is treated in order to introduce essential concepts, but also because it is an area of great importance in its own right. Techniques for identifying multiple outliers in multivariate, normally distributed data are then considered, together with how these ideas can be generalized to include cluster analysis. The so-called Minimum Covariance Determinant (MCD) subset is fundamental to the identification of multivariate outliers, and is therefore investigated in detail. Deterministic bivariate algorithms were refined and implemented, and used to investigate the effectiveness of the commonly used heuristic algorithm, Fast–MCD.
2012-08-27T11:35:30Z
2012-08-27T11:35:30Z
2003-12
Thesis
http://hdl.handle.net/10019.1/53508
en_ZA
Stellenbosch University
127 p. : ill.
application/pdf
Stellenbosch : Stellenbosch University
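To make the MCD idea in the abstract concrete, here is a minimal pure-Python sketch for bivariate data. It is an illustration under stated assumptions, not the thesis's code. The "exact" search simply enumerates every h-subset, which is feasible only for a handful of points and is not the refined exact bivariate algorithms the abstract refers to. The heuristic is a simplified Fast–MCD-style search (random 3-point starts followed by concentration steps), without the full algorithm's subsampling and reweighting. Observations are flagged when their robust squared Mahalanobis distance exceeds the 97.5% chi-square quantile with 2 degrees of freedom, after scaling the raw MCD covariance by the usual consistency factor. All function names, the seed and the number of starts are hypothetical choices, and the points are assumed to be in general position (no three collinear).

```python
import itertools
import math
import random


def mean_cov(points):
    """Mean and 2x2 covariance (divisor h) of a list of (x, y) points."""
    h = len(points)
    mx = sum(x for x, _ in points) / h
    my = sum(y for _, y in points) / h
    sxx = sum((x - mx) ** 2 for x, _ in points) / h
    syy = sum((y - my) ** 2 for _, y in points) / h
    sxy = sum((x - mx) * (y - my) for x, y in points) / h
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def exact_mcd(points, h):
    """Brute force (hypothetical helper): the h-subset with the smallest determinant."""
    best = min(itertools.combinations(points, h), key=lambda s: det(mean_cov(s)[1]))
    return list(best)


def c_steps(points, subset, h, max_iter=100):
    """Concentration steps: refit on the h points closest to the current fit.
    A C-step never increases the determinant, so stop once it stops decreasing."""
    for _ in range(max_iter):
        mean, cov = mean_cov(subset)
        closest = sorted(points, key=lambda p: mahalanobis_sq(p, mean, cov))[:h]
        if det(mean_cov(closest)[1]) >= det(cov):
            break
        subset = closest
    return subset


def heuristic_mcd(points, h, n_starts=50, seed=0):
    """Simplified Fast-MCD-style search: random 3-point starts, then C-steps."""
    rng = random.Random(seed)
    best, best_det = None, math.inf
    for _ in range(n_starts):
        mean, cov = mean_cov(rng.sample(points, 3))  # p + 1 = 3 points
        if det(cov) <= 0:
            continue  # collinear start; simply skipped in this sketch
        start = sorted(points, key=lambda p: mahalanobis_sq(p, mean, cov))[:h]
        subset = c_steps(points, start, h)
        d = det(mean_cov(subset)[1])
        if d < best_det:
            best, best_det = subset, d
    return best


def flag_outliers(points, h, subset, quantile=0.975):
    """Points whose robust squared distance exceeds the chi-square(2) cutoff."""
    alpha = h / len(points)
    # Consistency factor alpha / P(chi2_4 <= q_alpha), closed form for p = 2.
    factor = alpha / (1 - (1 - alpha) * (1 - math.log(1 - alpha)))
    mean, cov = mean_cov(subset)
    cov = tuple(factor * c for c in cov)
    cutoff = -2 * math.log(1 - quantile)  # chi-square(2) quantile, about 7.38
    return [p for p in points if mahalanobis_sq(p, mean, cov) > cutoff]


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
    data = clean + [(8.0, 8.0), (9.0, -7.0)]  # two planted outliers
    h = (len(data) + 2 + 1) // 2  # (n + p + 1) // 2 with p = 2
    exact, approx = exact_mcd(data, h), heuristic_mcd(data, h)
    print("exact det:", det(mean_cov(exact)[1]))
    print("heuristic det:", det(mean_cov(approx)[1]))
    print("flagged:", flag_outliers(data, h, exact))
```

With n = 12 and h = 7 there are only 792 subsets, so the brute-force minimum can be compared directly with the determinant found by the heuristic. This is a toy-scale version of the kind of comparison the abstract describes, where exact solutions serve as a benchmark for Fast–MCD.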
title Multiple outlier detection and cluster analysis of multivariate normal data
topic Multivariate analysis
Outliers (Statistics)
Data editing
Minimum Covariance Determinant (MCD)
Dissertations -- Applied mathematics
Theses -- Applied mathematics
Dissertations -- Mathematical sciences
Theses -- Mathematical sciences
url http://hdl.handle.net/10019.1/53508