
Resampling algorithms for multi-label classification

Thesis (MCom)--Stellenbosch University, 2022.

Bibliographic Details
Main Author: Kotze, Ulrich
Other Authors: Sandrock, Trudie
Format: Thesis
Language: en_ZA (English, South Africa)
Published: Stellenbosch : Stellenbosch University 2022
Subjects: Statistics -- Data processing; Algorithms; Computer algorithms; UCTD
access_status_str Open Access
collection Thesis
dc_rights_str_mv Stellenbosch University
id oai:scholar.sun.ac.za:10019.1/124730
institution Stellenbosch University (South Africa)
last_indexed 2026-06-10T12:41:15.521Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
department Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
degree Masters (MCom)
extent 159 pages : illustrations
file_format application/pdf
dates 2022-03-04T09:03:42Z; 2022-04-29T09:29:14Z; 2022-04
abstract ENGLISH SUMMARY: Multi-label classification is a form of supervised learning in which an observation may be assigned to several of many possible classes; that is, an observation can belong to more than one class simultaneously. Imbalanced data is a common problem in multi-label learning. This project investigated resampling algorithms as a pre-processing step for addressing imbalance in multi-label data, with the aim of improving multi-label classification performance. Imbalance can manifest as a sparse data matrix at low global densities, or as a disparity in local label density at higher global densities. The effect of resampling algorithms on multi-label classification performance is studied for both forms of imbalance, specifically at varying levels of global density. The thesis made use of simulated data, five common multi-label classification techniques and seven of the most popular resampling algorithms. Three example-based, label-based and ranking-based evaluation metrics were used to assess the effect of the resampling algorithms on multi-label classification performance.
AFRIKAANS SUMMARY (AFRIKAANSE OPSOMMING): an Afrikaans translation of the English summary above.
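The summary ties the two forms of imbalance to the global density of the label matrix. As a rough illustration, the sketch below computes the quantities commonly used to describe this. It assumes that "global density" corresponds to the standard label density (label cardinality divided by the number of labels) and uses the per-label imbalance ratio IRLbl and its mean MeanIR from the multi-label imbalance literature (Charte et al.); the thesis's own definitions may differ, and the function names and toy matrix are hypothetical.

```python
import numpy as np

def label_cardinality(Y: np.ndarray) -> float:
    """Mean number of active labels per observation; Y is an n x q binary label matrix."""
    return float(Y.sum(axis=1).mean())

def label_density(Y: np.ndarray) -> float:
    """Label cardinality divided by the number of labels q (assumed here to be 'global density')."""
    return label_cardinality(Y) / Y.shape[1]

def irlbl(Y: np.ndarray) -> np.ndarray:
    """Per-label imbalance ratio: count of the most frequent label over each label's count."""
    counts = Y.sum(axis=0).astype(float)
    with np.errstate(divide="ignore"):  # a label that never occurs gets an infinite ratio
        return counts.max() / counts

def mean_ir(Y: np.ndarray) -> float:
    """Average IRLbl over all labels; larger values indicate stronger label imbalance."""
    return float(irlbl(Y).mean())

# Hypothetical toy data: 4 observations, 3 labels; label 0 is common, labels 1 and 2 are rare.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])
print(label_cardinality(Y), label_density(Y))
print(irlbl(Y), mean_ir(Y))
```

For this toy matrix the label cardinality is 1.5 and the label density 0.5, while the two rare labels each have IRLbl = 4, giving MeanIR = 3.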
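The summary treats resampling as a pre-processing step applied before a multi-label classifier is trained. The summary does not name the seven algorithms studied, so the sketch below shows only one widely cited idea, label-based random oversampling in the spirit of ML-ROS (Charte et al.), in simplified form. It is an assumption-laden illustration, not the thesis's implementation; the function name, the `pct` budget parameter and the toy data are hypothetical.

```python
import numpy as np

def irlbl(Y: np.ndarray) -> np.ndarray:
    """Per-label imbalance ratio: count of the most frequent label over each label's count."""
    counts = Y.sum(axis=0).astype(float)
    with np.errstate(divide="ignore"):
        return counts.max() / counts

def random_oversample(X: np.ndarray, Y: np.ndarray, pct: float = 0.25, seed: int = 0):
    """Simplified ML-ROS-style oversampling (illustrative only).

    Repeatedly picks a label whose IRLbl exceeds the mean IRLbl (a 'minority' label),
    clones a random observation carrying that label, and stops once the data set has
    grown by the fraction `pct` or no minority label remains.
    """
    rng = np.random.default_rng(seed)
    n_new = int(round(pct * X.shape[0]))
    X_out, Y_out = X.copy(), Y.copy()
    for _ in range(n_new):
        ir = irlbl(Y_out)  # recomputed after every clone, so minority labels can change
        minority = np.flatnonzero(ir > ir.mean())
        if minority.size == 0:
            break  # nothing left to rebalance
        label = rng.choice(minority)
        donors = np.flatnonzero(Y_out[:, label] == 1)
        i = rng.choice(donors)
        # The clone copies all of the donor's labels, including any majority labels.
        X_out = np.vstack([X_out, X_out[i]])
        Y_out = np.vstack([Y_out, Y_out[i]])
    return X_out, Y_out

X = np.arange(8).reshape(4, 2)  # hypothetical features
Y = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]])
X_res, Y_res = random_oversample(X, Y, pct=0.5)
print(Y_res.sum(axis=0))
```

Because a cloned observation carries every label it has, oversampling a minority label also inflates any majority labels that co-occur with it; this is one reason resampling is less straightforward in the multi-label setting than in single-label classification.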
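The summary groups its evaluation metrics into example-based, label-based and ranking-based families. Since the specific metrics are not named here, the sketch below assumes one common representative from each family: Hamming loss, macro-averaged F1 and ranking loss. Edge-case conventions (labels with no positives, observations with no relevant labels) differ between libraries; the choices made here are stated in the comments, and the toy data are hypothetical.

```python
import numpy as np

def hamming_loss(Y_true: np.ndarray, Y_pred: np.ndarray) -> float:
    """Example-based: fraction of (observation, label) pairs predicted incorrectly."""
    return float(np.mean(Y_true != Y_pred))

def macro_f1(Y_true: np.ndarray, Y_pred: np.ndarray) -> float:
    """Label-based: F1 per label, averaged over labels.
    A label with no true and no predicted positives scores 0 here (conventions differ)."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1), axis=0)
    fp = np.sum((Y_true == 0) & (Y_pred == 1), axis=0)
    fn = np.sum((Y_true == 1) & (Y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())

def ranking_loss(Y_true: np.ndarray, scores: np.ndarray) -> float:
    """Ranking-based: per observation, the fraction of (relevant, irrelevant) label pairs
    in which the irrelevant label scores at least as high (ties count as errors here),
    averaged over observations; observations lacking either kind of label contribute 0."""
    losses = []
    for y, s in zip(Y_true, scores):
        rel, irr = s[y == 1], s[y == 0]
        if rel.size == 0 or irr.size == 0:
            losses.append(0.0)
            continue
        bad = np.sum(rel[:, None] <= irr[None, :])
        losses.append(bad / (rel.size * irr.size))
    return float(np.mean(losses))

Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_pred = np.array([[1, 0, 0], [0, 1, 1]])
scores = np.array([[0.9, 0.1, 0.4], [0.2, 0.3, 0.8]])
print(hamming_loss(Y_true, Y_pred), macro_f1(Y_true, Y_pred), ranking_loss(Y_true, scores))
```

On this toy example two of the six label assignments are wrong (Hamming loss 1/3), the third label has F1 = 0 while the others have F1 = 1 (macro-F1 2/3), and in the second observation the relevant label is outscored by one of the two irrelevant labels (ranking loss (0 + 0.5) / 2 = 0.25).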
topic Statistics -- Data processing
Algorithms
Computer algorithms
UCTD
url http://hdl.handle.net/10019.1/124730