Visualising interpretability in random forests

Thesis (MCom)--Stellenbosch University, 2025.

Bibliographic Details
Main Author: Manefeldt, Peter Cornelius
Other Authors: Lamont, M. M. C.
Format: Thesis
Published: Stellenbosch : Stellenbosch University 2025
Subjects: Machine learning -- Models; Biplots; Geometric data analysis; Correspondence analysis; UCTD
Access: Open Access
Rights: Stellenbosch University
Identifier: oai:scholar.sun.ac.za:10019.1/132647
Institution: Stellenbosch University (South Africa)
Last indexed: 2026-06-10T12:43:16.997Z
License: Other (see source repository)
Provenance: Harvested via OAI-PMH from SUNScholar (Stellenbosch University Repository)
Record format: dspace
Source: SUNScholar (Stellenbosch University Repository)
Contributors: Lamont, M. M. C.; Lubbe, S.
Department: Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistical and Actuarial Science.
Degree level: Masters
Date: 2025-03
Repository timestamp: 2025-06-12T08:55:18Z
Extent: xvii, 125 pages : illustrations, includes annexures
File format: application/pdf
URI: https://scholar.sun.ac.za/handle/10019.1/132647
Citation: Manefeldt, P. C. 2025. Visualising Interpretability in Random Forests. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/57483c04-5512-4dfb-b1b0-590be397e355

ENGLISH SUMMARY:
Models whose main aim is predictive accuracy are often viewed as black-box models. So-called “black-box” models can map highly complex nonlinear relationships with high-order interactions, but they lack interpretability. Random Forests are one such model: their decision boundaries are described by thousands of trees. Random Forests have been shown to achieve low generalisation error while requiring little or no optimisation by the user.

Random Forest proximities and out-of-bag (OOB) Random Forest proximities serve as two distinct similarity measures between observations. Multidimensional scaling (MDS) seeks a low-dimensional approximation of pairwise similarities that gives a visual representation of how similar the observations are. Applying MDS to the Random Forest proximity measure yields the Random Forest proximity plot. This plot shows how the observations in the sample are related from the model’s perspective, allowing us to “see through the eyes” of the black box. The MDS method used is classical scaling, because it provides a transformation that can embed new or hypothetical cases in the proximity plot.

How would the model have viewed a given observation if one of its covariates were different? This counterfactual question can be answered by embedding counterfactual observations in the proximity plot. These embedded counterfactual cases can then be used to construct trajectory axes. Embedding case-based trajectory axes in the proximity plot produces a Random Forest proximity biplot. As a special case of the nonlinear biplot, it enables exploration of the relationships uncovered by the model. Adding predictive axes further produces a biplot that relates the model’s view of the observations back to the original variables. Additional procedures use α-bags to visualise the sampling variability in the MDS procedure as well as the stability of the Random Forest proximity.

AFRIKAANS SUMMARY (translated):
Models with prediction accuracy as their main aim are often regarded as “black-box” models. So-called “black-box” models can fit highly complex nonlinear relationships with high-order interactions, but suffer from a lack of interpretability. The Random Forest is one such model, with decision boundaries described by its thousands of trees. The Random Forest has been shown to achieve a low generalisation error while requiring little or no optimisation by the user. Random Forest proximity and out-of-bag (OOB) Random Forest proximity act as two distinct similarity measures between observations. Multidimensional scaling (MDS) seeks a low-dimensional approximation of pairwise similarities that can be represented visually. Applying MDS to the Random Forest proximity measure produces the Random Forest proximity plot. The Random Forest proximity plot offers a “view through the eyes” of the black box by showing the relationships between the observations in the sample from the model’s perspective. The MDS method used is classical scaling, because it provides a transformation that can be used to include new or hypothetical cases in the proximity plot.

How would the model view a given observation if one of its covariates were different? We can answer this counterfactual question by including counterfactual observations in the proximity plot. These embedded counterfactual cases can be used to create trajectory axes. Case-based trajectory axes embedded in the proximity plot lead to a Random Forest biplot. As a special case of nonlinear biplots, this makes it possible to explore the relationships uncovered by the model. Adding prediction axes creates a biplot that links the model’s view of the observations back to the original variables. Additional procedures are added for visualising sampling variability and instability of the Random Forest proximity by means of α-bags.
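The embedding of new or hypothetical cases mentioned in the summary follows from the algebra of classical scaling. Below is a sketch of the standard derivation (Gower's add-a-point formula). It assumes a dissimilarity δ_ij = 1 − p_ij between cases i and j with Random Forest proximity p_ij, and assumes the k-dimensional configuration reproduces B exactly; the thesis's own notation and choice of dissimilarity may differ.

```latex
\begin{aligned}
&J = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad
 B = -\tfrac{1}{2}\, J\,\Delta^{(2)} J, \qquad \Delta^{(2)} = [\delta_{ij}^{2}], \\
&B = V\Lambda V^{\top}, \qquad Y = V_k \Lambda_k^{1/2}
 \quad\Rightarrow\quad Y^{\top}Y = \Lambda_k, \quad Y^{\top}\mathbf{1} = \mathbf{0}, \\
&\text{new case } y:\quad \delta_i^{2} = \lVert y\rVert^{2} - 2\,y^{\top}y_i + \lVert y_i\rVert^{2},
 \qquad b = \operatorname{diag}(B),\ b_i = \lVert y_i\rVert^{2}, \\
&Y^{\top}(b - d) = \sum_{i=1}^{n} y_i\bigl(2\,y_i^{\top}y - \lVert y\rVert^{2}\bigr)
 = 2\Lambda_k\, y - \lVert y\rVert^{2}\, Y^{\top}\mathbf{1} = 2\Lambda_k\, y, \\
&\Rightarrow\quad y = \tfrac{1}{2}\,\Lambda_k^{-1} Y^{\top}(b - d),
 \qquad d = (\delta_1^{2},\dots,\delta_n^{2})^{\top}.
\end{aligned}
```

For a counterfactual case, d is computed from that case's proximities to the sample, so every point on a trajectory axis is one evaluation of this formula.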
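The pipeline described in the summary (proximities, classical scaling, embedding hypothetical cases, counterfactual trajectories) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the thesis's implementation: the forest is represented only by a matrix of terminal-node indices (one column per tree), the dissimilarity is assumed to be 1 − proximity, only the ordinary (not OOB) proximity is computed, and all function names, including the `apply_fn` interface, are hypothetical.

```python
import numpy as np


def rf_proximity(leaves_a, leaves_b=None):
    """Proportion of trees in which two cases fall in the same terminal node.

    leaves_a: (n, T) integer array of leaf indices, one column per tree.
    leaves_b: optional (m, T) array; defaults to leaves_a (training proximity).
    Returns an (n, m) proximity matrix with entries in [0, 1].
    """
    if leaves_b is None:
        leaves_b = leaves_a
    # Broadcast to (n, m, T), flag shared leaves, then average over the T trees.
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)


def classical_scaling(prox, k=2):
    """Classical (Torgerson) scaling of the dissimilarities 1 - prox.

    Returns the n x k configuration Y, its k eigenvalues and diag(B),
    which is needed later to embed new cases.
    """
    d2 = (1.0 - prox) ** 2                      # squared dissimilarities (assumed 1 - p)
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centring matrix
    B = -0.5 * J @ d2 @ J                       # double-centred inner-product matrix
    evals, evecs = np.linalg.eigh(B)            # symmetric eigendecomposition
    top = np.argsort(evals)[::-1][:k]           # indices of the k largest eigenvalues
    lam, V = evals[top], evecs[:, top]
    Y = V * np.sqrt(np.clip(lam, 0.0, None))    # coordinates: V_k Lambda_k^(1/2)
    return Y, lam, np.diag(B)


def embed_new(prox_new, Y, lam, b_diag):
    """Gower's add-a-point formula y = 0.5 * Lambda^-1 Y^T (b - d^2).

    prox_new: (m, n) proximities of m new cases to the n sample cases.
    Assumes every eigenvalue in lam is strictly positive.
    """
    d2_new = (1.0 - prox_new) ** 2
    return 0.5 * ((b_diag[None, :] - d2_new) @ Y) / lam[None, :]


def trajectory(apply_fn, x, j, grid, train_leaves, Y, lam, b_diag):
    """Embed counterfactual copies of case x with covariate j set to each grid value.

    apply_fn (hypothetical interface): maps an (m, p) array to (m, T) leaf indices.
    Returns a (len(grid), k) array; joining the rows in order traces the trajectory.
    """
    X_cf = np.repeat(np.asarray(x, dtype=float)[None, :], len(grid), axis=0)
    X_cf[:, j] = grid                           # change only covariate j
    prox_cf = rf_proximity(apply_fn(X_cf), train_leaves)
    return embed_new(prox_cf, Y, lam, b_diag)
```

For a fitted scikit-learn forest, `forest.apply(X)` returns the (n_samples, n_estimators) array of leaf indices, so it could serve both as `train_leaves` (applied to the training data) and as `apply_fn`. Embedding a training case with `embed_new` returns its own coordinates, which is a convenient sanity check. Predictive axes and α-bags are not covered by this sketch.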