Full Text Available
Note: Clicking the button above will open the full text document at the original institutional repository in a new window.
The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of mod...
| Main Author: | |
|---|---|
| Other Authors: | |
| Format: | Thesis |
| Language: | English |
| Published: |
Department of Statistical Sciences
2023
|
| Subjects: | |
| Tags: |
No Tags, Be the first to tag this record!
|
| _version_ | 1867613323047993344 |
|---|---|
| access_status_str | Open Access |
| author | Evert, Francois |
| author2 | Pienaar, Etienne |
| author_browse | Evert, Francois Pienaar, Etienne |
| author_facet | Pienaar, Etienne Evert, Francois |
| author_sort | Evert, Francois |
| collection | Thesis |
| description | The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of modern day machine learning applications, such assumptions can rarely be justified. To this end, Tenenbaum, de Silva, and Langford [2000] proposed an approach for approximating the underlying non-linear structure on which the variation occurs and such assumptions would be more tenable. This is achieved by first finding the non-linear underlying structure (manifold) in a high dimensional input space and then using the shortest distance matrix of the data points along this manifold to achieve non-linear dimensionality reduction. The authors proposed a 3 step approach : Step 1 - constructing a distance graph by defining the neighbours of each data point such that it is contained within one section of the manifold, Step 2 - approximating the geodesic distances between all points using the graph constructed in Step 1 and Step 3 - applying Classical MDS to this distance graph to achieve dimensionality reduction. The first objective of this research paper is to demonstrate that the approach by Tenenbaum et al. [2000] struggles to detect the manifold under conditions where the stochastic components of the data dominates to the extent that the underlying manifold may be difficult to approximate reliably. We propose and explore possible solutions on how to effectively deal with the variability in the data, such that the distance graph reliably reflect the geodesic interpoint distances. In principle, this deals with departures from the assumptions in Step 1 (constructing distance graph) and the remediation thereof. The second objective, assuming that one can reliably detect the manifold, relates to the often substantial computational overhead of the algorithm: Step 2 of the algorithm attempts to approximate geodesic distances on the manifold by traversing paths on the graph constructed in Step 1, more specifically, the shortest paths. We test and demonstrate an alternative strategy for approximating distances by relaxing the constraint that the shortest path needs to be found, substituting the condition that any (reasonable) path between points on the graph suffices for purposes of this algorithm. Lastly we demonstrate how the proposed remediations/variations on the algorithm can be used to conduct dimensionality reduction on real-world data sets, and provide some conclusions and possible future reseach topics. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/37403 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| last_indexed | 2026-06-10T12:34:17.944Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2023 |
| publishDateRange | 2023 |
| publishDateSort | 2023 |
| publisher | Department of Statistical Sciences |
| publisherStr | Department of Statistical Sciences |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/37403 ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice Evert, Francois Pienaar, Etienne Statistical Sciences The abundance of high dimensional data has necessitated various approaches to dimensionality reduction. Many of these methods make simplifying assumptions about the variation structure of the data and permit approximating high-dimensional structures using only a few components. In the context of modern day machine learning applications, such assumptions can rarely be justified. To this end, Tenenbaum, de Silva, and Langford [2000] proposed an approach for approximating the underlying non-linear structure on which the variation occurs and such assumptions would be more tenable. This is achieved by first finding the non-linear underlying structure (manifold) in a high dimensional input space and then using the shortest distance matrix of the data points along this manifold to achieve non-linear dimensionality reduction. The authors proposed a 3 step approach : Step 1 - constructing a distance graph by defining the neighbours of each data point such that it is contained within one section of the manifold, Step 2 - approximating the geodesic distances between all points using the graph constructed in Step 1 and Step 3 - applying Classical MDS to this distance graph to achieve dimensionality reduction. The first objective of this research paper is to demonstrate that the approach by Tenenbaum et al. [2000] struggles to detect the manifold under conditions where the stochastic components of the data dominates to the extent that the underlying manifold may be difficult to approximate reliably. We propose and explore possible solutions on how to effectively deal with the variability in the data, such that the distance graph reliably reflect the geodesic interpoint distances. In principle, this deals with departures from the assumptions in Step 1 (constructing distance graph) and the remediation thereof. The second objective, assuming that one can reliably detect the manifold, relates to the often substantial computational overhead of the algorithm: Step 2 of the algorithm attempts to approximate geodesic distances on the manifold by traversing paths on the graph constructed in Step 1, more specifically, the shortest paths. We test and demonstrate an alternative strategy for approximating distances by relaxing the constraint that the shortest path needs to be found, substituting the condition that any (reasonable) path between points on the graph suffices for purposes of this algorithm. Lastly we demonstrate how the proposed remediations/variations on the algorithm can be used to conduct dimensionality reduction on real-world data sets, and provide some conclusions and possible future reseach topics. 2023-03-13T12:31:14Z 2023-03-13T12:31:14Z 2022 2023-02-20T12:44:14Z Master Thesis Masters MSc http://hdl.handle.net/11427/37403 eng application/pdf Department of Statistical Sciences Faculty of Science |
| spellingShingle | Statistical Sciences Evert, Francois ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice |
| thesis_degree_str | Master's |
| title | ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice |
| title_full | ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice |
| title_fullStr | ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice |
| title_full_unstemmed | ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice |
| title_short | ISO-maps for Non-linear Dimension Reduction - Addressing Geometric and Numerical Issues Observed in Practice |
| title_sort | iso maps for non linear dimension reduction addressing geometric and numerical issues observed in practice |
| topic | Statistical Sciences |
| url | http://hdl.handle.net/11427/37403 |
| work_keys_str_mv | AT evertfrancois isomapsfornonlineardimensionreductionaddressinggeometricandnumericalissuesobservedinpractice |