Thesis (MCom)--Stellenbosch University, 2022.
| Main Author: | Kotze, Ulrich |
|---|---|
| Other Authors: | Sandrock, Trudie |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2022 |
| Subjects: | Statistics -- Data processing; Algorithms; Computer algorithms; UCTD |
| _version_ | 1867613760032604160 |
|---|---|
| access_status_str | Open Access |
| author | Kotze, Ulrich |
| author2 | Sandrock, Trudie |
| author_browse | Kotze, Ulrich Sandrock, Trudie |
| author_facet | Sandrock, Trudie Kotze, Ulrich |
| author_sort | Kotze, Ulrich |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MCom)--Stellenbosch University, 2022. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/124730 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:41:15.521Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2022 |
| publishDateRange | 2022 |
| publishDateSort | 2022 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/124730 Resampling algorithms for multi-label classification Kotze, Ulrich Sandrock, Trudie Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. Statistics -- Data processing Algorithms Computer algorithms UCTD Thesis (MCom)--Stellenbosch University, 2022. ENGLISH SUMMARY: Multi-label classification is a member of the supervised learning family and represents a scenario where we wish to assign an observation to several of many possible classes. In this classification paradigm, an observation can therefore belong to more than one class simultaneously. Imbalanced data is a common problem in the multi-label paradigm of learning. This project investigated resampling algorithms as a pre-processing mechanism to address the manifestation of imbalance in multi-label data and thereby improve multi-label classification performance. Imbalance can manifest itself through a sparse data matrix at small global densities. Imbalance can also manifest itself through a disparity in local label density at larger global densities. The effect of resampling algorithms on multi-label performance is studied for both of these forms of imbalance. We specifically study the effect of these resampling algorithms on multi-label performance at changing levels of global density. The thesis made use of simulated data, five common multi-label classification techniques and seven of the most popular resampling algorithms. Three example-based, label-based and ranking-based evaluation metrics were used to assess the effect of the resampling algorithms on multi-label classification performance. AFRIKAANS SUMMARY: Multi-label classification is an example of supervised learning and represents a scenario in which we wish to classify an observation into several of many classes. An observation in a classification paradigm can therefore belong to more than one class simultaneously. Imbalanced data is a common problem in the multi-label paradigm of learning. This thesis investigated resampling algorithms as a pre-processing mechanism to address the manifestation of imbalance in multi-label data in order to improve multi-label classification performance. Imbalance can manifest through a sparse data matrix at small global densities or through a difference in local label density at larger global densities. The effect of resampling algorithms on multi-label performance is studied for both of these forms of imbalance. We specifically study the effect of these resampling algorithms on multi-label performance at changing levels of global density. The study made use of simulated data, five common multi-label classification techniques and seven of the most popular resampling algorithms. Three example-based, label-based and ranking-based evaluation measures were used to determine the effect of the resampling algorithms on multi-label classification performance. Masters 2022-03-04T09:03:42Z 2022-04-29T09:29:14Z 2022-03-04T09:03:42Z 2022-04-29T09:29:14Z 2022-04 Thesis http://hdl.handle.net/10019.1/124730 en_ZA Stellenbosch University 159 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Statistics -- Data processing Algorithms Computer algorithms UCTD Kotze, Ulrich Resampling algorithms for multi-label classification |
| title | Resampling algorithms for multi-label classification |
| title_full | Resampling algorithms for multi-label classification |
| title_fullStr | Resampling algorithms for multi-label classification |
| title_full_unstemmed | Resampling algorithms for multi-label classification |
| title_short | Resampling algorithms for multi-label classification |
| title_sort | resampling algorithms for multi label classification |
| topic | Statistics -- Data processing Algorithms Computer algorithms UCTD |
| url | http://hdl.handle.net/10019.1/124730 |
| work_keys_str_mv | AT kotzeulrich resamplingalgorithmsformultilabelclassification |
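
The summary in the record distinguishes global density from local label density and describes resampling as a pre-processing step. A minimal sketch of those ideas follows. It rests on assumptions that do not come from the record: global density is taken as the fraction of ones in the binary label matrix, local density as the per-label fraction of observations carrying that label, and the resampler is plain random oversampling used as a generic stand-in, not necessarily one of the seven algorithms the thesis evaluates. The function names and the toy matrix `Y` are hypothetical.

```python
import numpy as np


def global_density(Y):
    """Fraction of ones in the n x q binary label matrix (assumed definition)."""
    return float(np.asarray(Y).mean())


def local_densities(Y):
    """Per-label fraction of observations carrying each label (assumed definition)."""
    return np.asarray(Y).mean(axis=0)


def random_oversample(Y, minority_label, n_copies, rng):
    """Return row indices: every original row plus n_copies rows drawn
    with replacement from the observations that carry minority_label."""
    Y = np.asarray(Y)
    carriers = np.flatnonzero(Y[:, minority_label] == 1)
    extra = rng.choice(carriers, size=n_copies, replace=True)
    return np.concatenate([np.arange(Y.shape[0]), extra])


# Toy label matrix: label 0 is common, labels 1 and 2 are rare.
Y = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
])

rng = np.random.default_rng(0)
idx = random_oversample(Y, minority_label=1, n_copies=2, rng=rng)
Y_res = Y[idx]  # the same indices would also be applied to the feature matrix

print(global_density(Y), local_densities(Y))
print(global_density(Y_res), local_densities(Y_res))
```

In this toy case, duplicating the single carrier of label 1 raises that label's local density, but it also copies the co-occurring label 0 and dilutes label 2. Because labels cannot be resampled independently of one another, multi-label resampling is harder than the single-label case.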