Full Text Available

Resampling algorithms for multi-label classification

Thesis (MCom)--Stellenbosch University, 2022.

Bibliographic Details
Main Author:	Kotze, Ulrich
Other Authors:	Sandrock, Trudie
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University 2022
Subjects:	Statistics -- Data processing; Algorithms; Computer algorithms; UCTD

_version_	1867613760032604160
access_status_str	Open Access
author	Kotze, Ulrich
author2	Sandrock, Trudie
author_browse	Kotze, Ulrich Sandrock, Trudie
author_facet	Sandrock, Trudie Kotze, Ulrich
author_sort	Kotze, Ulrich
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MCom)--Stellenbosch University, 2022.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/124730
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:41:15.521Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2022
publishDateRange	2022
publishDateSort	2022
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/124730 Resampling algorithms for multi-label classification Kotze, Ulrich Sandrock, Trudie Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. Statistics -- Data processing Algorithms Computer algorithms UCTD Thesis (MCom)--Stellenbosch University, 2022. ENGLISH SUMMARY: Multi-label classification is a member of the supervised learning family and represents a scenario where we wish to classify an observation into many of many classes. Therefore, in the classification paradigm an observation can belong to more than one class simultaneously. Imbalanced data is a common problem in the multi-label paradigm of learning. This project investigated resampling algorithms as a pre-processing mechanism to address the manifestation of imbalance in multi-label data to improve multi-label classification performance. Imbalance can manifest itself through a sparse data matrix at small global densities. Imbalance can also manifest itself through a disparity in local label density at larger global densities. The effect of resampling algorithms on multi-label performance is studied for both of these forms of imbalance. We specifically study the effect of these resampling algorithms on multi-label performance at changing levels of global density. The thesis made use of simulated data, five common multi-label classification techniques and seven of the most popular resampling algorithms. Three example-based, label-based and ranking-based evaluation metrics were used to assess the effect of the resampling algorithms on multi-label classification performance. AFRIKAANS SUMMARY (translated): Multi-label classification is an example of supervised learning and represents a scenario in which we wish to classify an observation into many of many classes. Therefore, in a classification paradigm an observation can belong to more than one class simultaneously. Imbalanced data is a common problem in the multi-label paradigm of learning. This thesis investigated resampling algorithms as a pre-processing mechanism to address the manifestation of imbalance in multi-label data in order to improve multi-label classification performance. Imbalance can manifest through a sparse data matrix at small global densities or through a disparity in local label density at larger global densities. The effect of resampling algorithms on multi-label performance is studied for both of these forms of imbalance. We specifically study the effect of these resampling algorithms on multi-label performance at changing levels of global density. The study made use of simulated data, five common multi-label classification techniques and seven of the most popular resampling algorithms. Three example-based, label-based and ranking-based evaluation metrics were used to determine the effect of the resampling algorithms on multi-label classification performance. Masters 2022-03-04T09:03:42Z 2022-04-29T09:29:14Z 2022-03-04T09:03:42Z 2022-04-29T09:29:14Z 2022-04 Thesis http://hdl.handle.net/10019.1/124730 en_ZA Stellenbosch University 159 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Statistics -- Data processing Algorithms Computer algorithms UCTD Kotze, Ulrich Resampling algorithms for multi-label classification
title	Resampling algorithms for multi-label classification
title_full	Resampling algorithms for multi-label classification
title_fullStr	Resampling algorithms for multi-label classification
title_full_unstemmed	Resampling algorithms for multi-label classification
title_short	Resampling algorithms for multi-label classification
title_sort	resampling algorithms for multi label classification
topic	Statistics -- Data processing Algorithms Computer algorithms UCTD
url	http://hdl.handle.net/10019.1/124730
work_keys_str_mv	AT kotzeulrich resamplingalgorithmsformultilabelclassification
