Thesis (MCom)--Stellenbosch University, 2025.
| Main Author: | Manefeldt, Peter Cornelius |
|---|---|
| Other Authors: | Lamont, M. M. C. |
| Format: | Thesis |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
| Subjects: | Machine learning -- Models; Biplots; Geometric data analysis; Correspondence analysis; UCTD |
| _version_ | 1867613887476531200 |
|---|---|
| access_status_str | Open Access |
| author | Manefeldt, Peter Cornelius |
| author2 | Lamont, M. M. C. |
| author_browse | Lamont, M. M. C. Manefeldt, Peter Cornelius |
| author_facet | Lamont, M. M. C. Manefeldt, Peter Cornelius |
| author_sort | Manefeldt, Peter Cornelius |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MCom)--Stellenbosch University, 2025. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/132647 |
| institution | Stellenbosch University (South Africa) |
| last_indexed | 2026-06-10T12:43:16.997Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/132647 Visualising interpretability in random forests Manefeldt, Peter Cornelius Lamont, M. M. C. Lubbe, S. Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistical and Actuarial Science. Machine learning -- Models Biplots Geometric data analysis Correspondence analysis UCTD Thesis (MCom)--Stellenbosch University, 2025. Manefeldt, P. C. 2025. Visualising Interpretability in Random Forests. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/57483c04-5512-4dfb-b1b0-590be397e355 ENGLISH SUMMARY: Models built primarily for predictive accuracy are often viewed as black-box models. So-called “black-box” models can capture highly complex nonlinear relationships with high-order interactions, but lack interpretability. The Random Forest is one such model, with decision boundaries described by its thousands of trees. Random Forests have been shown to incur a low generalisation error while also needing very little to no optimisation by the user. Random Forest proximities and out-of-bag (OOB) Random Forest proximities act as two distinct similarity measures between observations. Multidimensional scaling (MDS) seeks a low-dimensional approximation of pairwise similarities that provides a visual representation of the similarity between observations. By applying MDS to the Random Forest proximity measure, Random Forest proximity plots are constructed. The Random Forest proximity plot shows how the observations in the sample are related from the model’s perspective, thus allowing us to “see through the eyes” of the black-box. The MDS method under consideration is classical scaling, as this provides a transformation that can be used to embed new/hypothetical cases on the proximity plot. How would the model have viewed a given observation if one of its covariates were different? We can answer this counterfactual question by embedding counterfactual observations into the proximity plot. These embedded counterfactual cases can be used to create trajectory axes. Case-based trajectory axes embedded in the proximity plot result in a Random Forest proximity biplot. As a special case of nonlinear biplots, this enables the exploration of the relationships uncovered by the model. Additionally, adding predictive axes creates a biplot that relates the model’s view of the observations back to the original variables. Further procedures use α-bags to visualise the sampling variability in the MDS procedure as well as the stability of the Random Forest proximity. AFRIKAANS SUMMARY (translated from the AFRIKAANSE OPSOMMING): Models with predictive accuracy as their main goal are often seen as “black box” models. So-called “black box” models are able to fit highly complex nonlinear relationships with high-order interactions, but suffer from a lack of interpretability. The Random Forest is one such model, with decision boundaries described by its thousands of trees. It has been shown that the Random Forest achieves a low generalisation error rate while also requiring very little to no optimisation by the user. Random Forest proximity and out-of-bag (OOB) Random Forest proximity act as two unique similarity measures between observations. Multidimensional scaling (MDS) seeks to find a low-dimensional approximation of pairwise similarities that can be represented visually. By applying MDS to the Random Forest proximity measure, the Random Forest proximity plot is created. The Random Forest proximity plot offers a “view through the eyes” of the black box, by showing the relationship between the observations in the sample from the model’s perspective. The MDS method used is classical scaling, since it provides a transformation that can be used to include new/hypothetical cases on the proximity plot. How would the model see a given observation if one of its covariates were different? We can answer this counterfactual question by including counterfactual observations in the proximity plot. These embedded counterfactual cases can be used to create trajectory axes. Case-based trajectory axes embedded in the proximity plot lead to a Random Forest biplot. As a special case of nonlinear biplots, this makes it possible to explore the relationships uncovered by the model. By adding prediction axes, a biplot is created that links the model’s view of the observations back to the original variables. Additional procedures are added for visualising sampling variability and instability of the Random Forest proximity by means of α-bags. Masters 2025-06-12T08:55:18Z 2025-06-12T08:55:18Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132647 Stellenbosch University xvii, 125 pages : illustrations, includes annexures application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Machine learning -- Models Biplots Geometric data analysis Correspondence analysis UCTD Manefeldt, Peter Cornelius Visualising interpretability in random forests |
| title | Visualising interpretability in random forests |
| title_full | Visualising interpretability in random forests |
| title_fullStr | Visualising interpretability in random forests |
| title_full_unstemmed | Visualising interpretability in random forests |
| title_short | Visualising interpretability in random forests |
| title_sort | visualising interpretability in random forests |
| topic | Machine learning -- Models Biplots Geometric data analysis Correspondence analysis UCTD |
| url | https://scholar.sun.ac.za/handle/10019.1/132647 |
| work_keys_str_mv | AT manefeldtpetercornelius visualisinginterpretabilityinrandomforests |
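
The pipeline outlined in the summary (proximities, classical scaling, embedding a hypothetical case) can be sketched in a few lines. This is an illustrative sketch only and not the thesis's implementation: the toy leaf-index matrix, every function name, and the choice to treat 1 - proximity as a squared dissimilarity are assumptions made here, and the new case is placed using Gower's add-a-point formula for classical scaling.

```python
import numpy as np


def proximity_from_leaves(leaves):
    """Fraction of trees in which two observations share a terminal node.

    `leaves` is an (n_obs, n_trees) integer array of leaf indices
    (hypothetical input; a fitted forest would normally supply it).
    """
    leaves = np.asarray(leaves)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def classical_scaling(sq_dist, n_dims=2):
    """Classical scaling (principal coordinates) of squared dissimilarities."""
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ sq_dist @ centering   # double-centred matrix B
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]     # keep the largest n_dims
    eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    coords = eigvecs * np.sqrt(eigvals)          # Y = V * Lambda^(1/2)
    return coords, eigvals, np.diag(b)


def embed_new_case(sq_dist_to_train, coords, eigvals, b_diag):
    """Gower's add-a-point formula: y = 0.5 * Lambda^-1 Y^T (diag(B) - d^2)."""
    return 0.5 * (coords.T @ (b_diag - sq_dist_to_train)) / eigvals


# Toy leaf assignments: 6 observations, 4 trees (made-up values).
leaves = np.array([
    [0, 0, 1, 2],
    [0, 0, 1, 1],
    [0, 1, 0, 2],
    [1, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 2, 1, 1],
])

prox = proximity_from_leaves(leaves)
sq_dist = 1.0 - prox                             # assumption: squared dissimilarity
coords, eigvals, b_diag = classical_scaling(sq_dist, n_dims=2)

# A hypothetical counterfactual case, described by the leaves it would reach.
new_leaves = np.array([0, 1, 1, 2])
prox_new = (leaves == new_leaves).mean(axis=1)   # proximity to each training case
new_point = embed_new_case(1.0 - prox_new, coords, eigvals, b_diag)
```

Embedding a training observation with its own row of dissimilarities returns its plotted coordinates, which gives a quick consistency check on the transformation. A trajectory axis in this sketch would repeat the last step over a grid of counterfactual values for one covariate.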