Full Text Available

Note: Clicking the button above will open the full text document at the original institutional repository in a new window.

ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice

The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of mod...

Full description

Saved in:
Bibliographic Details
Main Author: Evert, Francois
Other Authors: Pienaar, Etienne
Format: Thesis
Language:English
Published: Department of Statistical Sciences 2023
Subjects:
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1867613323047993344
access_status_str Open Access
author Evert, Francois
author2 Pienaar, Etienne
author_browse Evert, Francois
Pienaar, Etienne
author_facet Pienaar, Etienne
Evert, Francois
author_sort Evert, Francois
collection Thesis
description The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of modern day machine learning applications, such assumptions can rarely be justified. To this end, Tenenbaum, de Silva, and Langford [2000] proposed an approach for approximating the underlying non-linear structure on which the variation occurs and such assumptions would be more tenable. This is achieved by first finding the non-linear underlying structure (manifold) in a high dimensional input space and then using the shortest distance matrix of the data points along this manifold to achieve non-linear dimensionality reduction. The authors proposed a 3 step approach : Step 1 - constructing a distance graph by defining the neighbours of each data point such that it is contained within one section of the manifold, Step 2 - approximating the geodesic distances between all points using the graph constructed in Step 1 and Step 3 - applying Classical MDS to this distance graph to achieve dimensionality reduction. The first objective of this research paper is to demonstrate that the approach by Tenenbaum et al. [2000] struggles to detect the manifold under conditions where the stochastic components of the data dominates to the extent that the underlying manifold may be difficult to approximate reliably. We propose and explore possible solutions on how to effectively deal with the variability in the data, such that the distance graph reliably reflect the geodesic interpoint distances. In principle, this deals with departures from the assumptions in Step 1 (constructing distance graph) and the remediation thereof. The second objective, assuming that one can reliably detect the manifold, relates to the often substantial computational overhead of the algorithm: Step 2 of the algorithm attempts to approximate geodesic distances on the manifold by traversing paths on the graph constructed in Step 1, more specifically, the shortest paths. We test and demonstrate an alternative strategy for approximating distances by relaxing the constraint that the shortest path needs to be found, substituting the condition that any (reasonable) path between points on the graph suffices for purposes of this algorithm. Lastly we demonstrate how the proposed remediations/variations on the algorithm can be used to conduct dimensionality reduction on real-world data sets, and provide some conclusions and possible future reseach topics.
format Thesis
id oai:open.uct.ac.za:11427/37403
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:34:17.944Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Department of Statistical Sciences
publisherStr Department of Statistical Sciences
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/37403 ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice Evert, Francois Pienaar, Etienne Statistical Sciences The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of modern day machine learning applications, such assumptions can rarely be justified. To this end, Tenenbaum, de Silva, and Langford [2000] proposed an approach for approximating the underlying non-linear structure on which the variation occurs and such assumptions would be more tenable. This is achieved by first finding the non-linear underlying structure (manifold) in a high dimensional input space and then using the shortest distance matrix of the data points along this manifold to achieve non-linear dimensionality reduction. The authors proposed a 3 step approach : Step 1 - constructing a distance graph by defining the neighbours of each data point such that it is contained within one section of the manifold, Step 2 - approximating the geodesic distances between all points using the graph constructed in Step 1 and Step 3 - applying Classical MDS to this distance graph to achieve dimensionality reduction. The first objective of this research paper is to demonstrate that the approach by Tenenbaum et al. [2000] struggles to detect the manifold under conditions where the stochastic components of the data dominates to the extent that the underlying manifold may be difficult to approximate reliably. We propose and explore possible solutions on how to effectively deal with the variability in the data, such that the distance graph reliably reflect the geodesic interpoint distances. In principle, this deals with departures from the assumptions in Step 1 (constructing distance graph) and the remediation thereof. The second objective, assuming that one can reliably detect the manifold, relates to the often substantial computational overhead of the algorithm: Step 2 of the algorithm attempts to approximate geodesic distances on the manifold by traversing paths on the graph constructed in Step 1, more specifically, the shortest paths. We test and demonstrate an alternative strategy for approximating distances by relaxing the constraint that the shortest path needs to be found, substituting the condition that any (reasonable) path between points on the graph suffices for purposes of this algorithm. Lastly we demonstrate how the proposed remediations/variations on the algorithm can be used to conduct dimensionality reduction on real-world data sets, and provide some conclusions and possible future reseach topics. 2023-03-13T12:31:14Z 2023-03-13T12:31:14Z 2022 2023-02-20T12:44:14Z Master Thesis Masters MSc http://hdl.handle.net/11427/37403 eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle Statistical Sciences
Evert, Francois
ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
thesis_degree_str Master's
title ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
title_full ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
title_fullStr ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
title_full_unstemmed ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
title_short ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice
title_sort iso maps for non linear dimension reduction addressing geometric and numerical issues observed in practice
topic Statistical Sciences
url http://hdl.handle.net/11427/37403
work_keys_str_mv AT evertfrancois isomapsfornonlineardimensionreductionaddressinggeometricandnumericalissuesobservedinpractice