Visualization as a guidance to classification for large datasets

Data visualization has gained considerable attention with the pressing need to make sense of the huge amounts of data that we collect every day. Lower-dimensional embedding techniques such as IsoMap, Locally Linear Embedding and t-SNE help us visualize high-dimensional data by projecting it on a two o...

Bibliographic Details
Main Author: Atteya, Heba Abdelfattah
Format: Thesis
Published: AUC Knowledge Fountain 2017
Subjects: t-SNE; t-Distributed Stochastic Neighbor Embedding
_version_ 1867613411807854592
access_status_str Open Access
author Atteya, Heba Abdelfattah
author_browse Atteya, Heba Abdelfattah
author_facet Atteya, Heba Abdelfattah
author_sort Atteya, Heba Abdelfattah
collection Thesis
dc_rights_str_mv The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.
description Data visualization has gained considerable attention with the pressing need to make sense of the huge amounts of data that we collect every day. Lower-dimensional embedding techniques such as IsoMap, Locally Linear Embedding and t-SNE help us visualize high-dimensional data by projecting it onto a two- or three-dimensional space. t-SNE, or t-Distributed Stochastic Neighbor Embedding, has proved successful in providing lower-dimensional mappings that make the underlying structure of the data easier for humans to interpret. We wanted to test the hypothesis that a visualization that human beings can easily understand will also simplify the job of classification models and boost their performance. To test this hypothesis, we use t-SNE to reduce the dimensionality of a student performance dataset to 2D and 3D and feed the resulting 2D and 3D feature vectors into a classifier that classifies students according to their predicted performance. We compare the classifier performance before and after the dimensionality reduction. Our experiments showed that t-SNE helps improve the classification accuracy of NN and KNN on a benchmarking dataset as well as on a user-curated dataset on the performance of students at our home institution. We also visually compared the 2D and 3D mappings of t-SNE and PCA. Our comparison favored t-SNE's visualization over PCA's. This was also reflected in the classification accuracy of all classifiers used, which scored higher on t-SNE's mapping than on PCA's mapping.
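The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual code: it assumes scikit-learn, substitutes the bundled digits dataset for the student performance data (which is not part of this record), and uses a KNN classifier with cross-validation as the evaluation protocol. The helper names `knn_accuracy` and `compare_embeddings` are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy(features, labels, n_neighbors=5, folds=5):
    """Mean cross-validated accuracy of a KNN classifier on the given features."""
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    return cross_val_score(clf, features, labels, cv=folds).mean()


def compare_embeddings(X, y, n_components=2, seed=0):
    """Return KNN accuracy on raw, PCA-reduced, and t-SNE-reduced features."""
    X_pca = PCA(n_components=n_components, random_state=seed).fit_transform(X)
    # scikit-learn's TSNE has no transform() for unseen points, so all samples
    # are embedded together before cross-validation (an assumption of this
    # sketch, not necessarily the protocol used in the thesis).
    X_tsne = TSNE(
        n_components=n_components, init="pca", random_state=seed
    ).fit_transform(X)
    return {
        "raw": knn_accuracy(X, y),
        "pca": knn_accuracy(X_pca, y),
        "tsne": knn_accuracy(X_tsne, y),
    }


if __name__ == "__main__":
    # Stand-in data: a subset of the digits dataset, not the student dataset.
    X, y = load_digits(return_X_y=True)
    X, y = X[:500], y[:500]
    for dims in (2, 3):
        print(dims, compare_embeddings(X, y, n_components=dims))
```

Because the t-SNE embedding here is fitted on all samples at once, the held-out folds influence the feature space even though their labels are never used; the scores are therefore best read as a relative comparison between the three feature sets rather than as estimates of accuracy on unseen data.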
format Thesis
id oai:fount.aucegypt.edu:etds-1690
institution American University in Cairo (Egypt)
last_indexed 2026-06-10T12:35:43.583Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate 2017
publishDateRange 2017
publishDateSort 2017
publisher AUC Knowledge Fountain
publisherStr AUC Knowledge Fountain
record_format dspace
source_str AUC Knowledge Fountain — bepress
spelling oai:fount.aucegypt.edu:etds-1690 Visualization as a guidance to classification for large datasets Atteya, Heba Abdelfattah 2017-06-01T07:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/691 https://fount.aucegypt.edu/context/etds/article/1690/viewcontent/Computer_20Science_20Thesis_20Dissertation_20__20Heba_20Atteya.pdf Theses and Dissertations AUC Knowledge Fountain t-SNE t-Distributed Stochastic Neighbor Embedding
spellingShingle t-SNE
t-Distributed Stochastic Neighbor Embedding
Atteya, Heba Abdelfattah
Visualization as a guidance to classification for large datasets
title Visualization as a guidance to classification for large datasets
title_full Visualization as a guidance to classification for large datasets
title_fullStr Visualization as a guidance to classification for large datasets
title_full_unstemmed Visualization as a guidance to classification for large datasets
title_short Visualization as a guidance to classification for large datasets
title_sort visualization as a guidance to classification for large datasets
topic t-SNE
t-Distributed Stochastic Neighbor Embedding
url https://fount.aucegypt.edu/etds/691
https://fount.aucegypt.edu/context/etds/article/1690/viewcontent/Computer_20Science_20Thesis_20Dissertation_20__20Heba_20Atteya.pdf
work_keys_str_mv AT atteyahebaabdelfattah visualizationasaguidancetoclassificationforlargedatasets