Data measures that characterise classification problems

Dissertation (MEng)--University of Pretoria, 2008.

Bibliographic Details
Other Authors: Barnard, E.
Format: Thesis
Published: University of Pretoria 2013
_version_ 1867613707291328512
access_status_str Open Access
author2 Barnard, E.
author_browse Barnard, E.
author_facet Barnard, E.
collection Thesis
dc_rights_str_mv © University of Pretoria 2008 E1080/
description Dissertation (MEng)--University of Pretoria, 2008.
format Thesis
id oai:repository.up.ac.za:2263/27624
institution University of Pretoria (South Africa)
last_indexed 2026-06-10T12:40:25.453Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2013
publishDateRange 2013
publishDateSort 2013
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/27624 Data measures that characterise classification problems Barnard, E. cmvdwalt@gmail.com Van der Walt, Christiaan Maarten Classifier selection Data measures Data characteristics Artificial data Data analysis Classification Supervised learning Pattern recognition Meta-classification Classification prediction UCTD Dissertation (MEng)--University of Pretoria, 2008. We have a wide range of classifiers today that are employed in numerous applications, from credit scoring to speech-processing, with great technical and commercial success. No classifier, however, exists that will outperform all other classifiers on all classification tasks, and the process of classifier selection is still mainly one of trial and error. The optimal classifier for a classification task is determined by the characteristics of the data set employed; understanding the relationship between data characteristics and the performance of classifiers is therefore crucial to the process of classifier selection. Empirical and theoretical approaches have been employed in the literature to define this relationship. None of these approaches has, however, been very successful in accurately predicting or explaining classifier performance on real-world data. We use theoretical properties of classifiers to identify data characteristics that influence classifier performance; these data properties guide us in the development of measures that describe the relationship between data characteristics and classifier performance. We employ these data measures on real-world and artificial data to construct a meta-classification system. The purpose of this meta-classifier is two-fold: (1) to predict the classification performance of real-world classification tasks, and (2) to explain these predictions in order to gain insight into the properties of real-world data. We show that these data measures can be employed successfully to predict the classification performance of real-world data sets; these predictions are accurate in some instances but there is still unpredictable behaviour in other instances. We illustrate that these data measures can give valuable insight into the properties and data structures of real-world data; these insights are extremely valuable for high-dimensional classification problems. Electrical, Electronic and Computer Engineering unrestricted 2013-09-07T11:52:19Z 2008-09-09 2013-09-07T11:52:19Z 2008-04-09 2008-09-09 2008-08-29 Dissertation a 2008 E1080/gm http://hdl.handle.net/2263/27624 http://upetd.up.ac.za/thesis/available/etd-08292008-162648/ © University of Pretoria 2008 E1080/ application/pdf University of Pretoria
spellingShingle Classifier selection
Data measures
Data characteristics
Artificial data
Data analysis
Classification
Supervised learning
Pattern recognition
Meta-classification
Classification prediction
UCTD
Data measures that characterise classification problems
title Data measures that characterise classification problems
title_full Data measures that characterise classification problems
title_fullStr Data measures that characterise classification problems
title_full_unstemmed Data measures that characterise classification problems
title_short Data measures that characterise classification problems
title_sort data measures that characterise classification problems
topic Classifier selection
Data measures
Data characteristics
Artificial data
Data analysis
Classification
Supervised learning
Pattern recognition
Meta-classification
Classification prediction
UCTD
url http://hdl.handle.net/2263/27624
http://upetd.up.ac.za/thesis/available/etd-08292008-162648/