Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion

Thesis (MSc)--Stellenbosch University, 2020.

Bibliographic Details
Main Author: Berndt, Joshua
Other Authors: Fischer, Bernd
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2020
Subjects: ConceptCloud Browser; Big data -- Scalability; Architecture and Data Completion; UCTD
_version_ 1867613912175738880
access_status_str Open Access
author Berndt, Joshua
author2 Fischer, Bernd
author_browse Berndt, Joshua
Fischer, Bernd
author_facet Fischer, Bernd
Berndt, Joshua
author_sort Berndt, Joshua
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2020.
format Thesis
id oai:scholar.sun.ac.za:10019.1/109315
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:43:40.919Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2020
publishDateRange 2020
publishDateSort 2020
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/109315
Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
Berndt, Joshua
Fischer, Bernd
Britz, Katarina
Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science.
ConceptCloud Browser
Big data -- Scalability
Architecture and Data Completion
UCTD
Thesis (MSc)--Stellenbosch University, 2020.
ENGLISH ABSTRACT: Semi-structured data sets such as product reviews or event log data are simultaneously becoming more widely used and ever larger. This thesis describes ConceptCloud, a flexible, interactive browser for semi-structured data sets, with a focus on the improvements made to accommodate larger data sets, to represent data more intuitively, and to enrich the underlying data by way of data imputation. ConceptCloud combines an intuitive tag cloud visualisation viewer with an underlying concept lattice to provide a formal structure for navigating data sets without prior knowledge of the structure of the data and without compromising scalability. This scalability is achieved through architectural changes that increase the system's resource efficiency. These changes are demonstrated by way of a case study on a data set of wine reviews.
Semi-structured data sets such as product reviews or event log data often contain a geolocation aspect: for example, the location of the winery for wine reviews, or the accident location for traffic data. In this thesis, I describe ConceptCloud extensions which allow for the rendering of specialised geolocation data while providing alternate navigation paths through the data set. I show that using biclusters can make the navigation bidirectional, and demonstrate this approach on a crime data set using a specialised geolocation map viewer.
Semi-structured data often contains implicit information which would be useful in driving data exploration if made explicit. I take advantage of domain ontologies both to make implicit data in each input data set explicit and to verify and correct inconsistencies, allowing for better data exploration. I demonstrate this approach with a continuation of the wine case study.
AFRIKAANS ABSTRACT (in English translation): Semi-structured data sets such as product reviews or event log data are simultaneously becoming more widely used and ever larger. This thesis describes ConceptCloud, a flexible, interactive browser for semi-structured data sets, with a focus on the improvements made to accommodate larger data sets, to achieve more intuitive data representation, and to enrich the underlying data by means of data imputation. ConceptCloud uses an intuitive tag cloud visualisation viewer in combination with an underlying concept lattice to build a formal structure for navigating data sets without prior knowledge of the structure of the data and without compromising scalability. This scalability is achieved by implementing architectural changes that increase the system's resource efficiency. These improvements are demonstrated by way of a case study on a data set of wine reviews.
Semi-structured data sets such as product reviews or event log data contain a geolocation aspect: for example, the location of the winery for wine reviews, or the accident location for traffic data. In this thesis we describe a ConceptCloud extension that provides for specialised geolocation data while offering alternate navigation paths through the data set. We show that the use of biclusters can make navigation bidirectional, and demonstrate this approach on a crime data set that uses a specialised geolocation map viewer.
Semi-structured data often contains implicit information that would be useful in driving data exploration if it can be made explicit. We use domain ontologies both to make implicit data in each input data set explicit and to verify and correct inconsistencies, which enables better data exploration. We demonstrate this approach through a case study with wine data.
Masters
2020-11-27T09:08:58Z
2021-01-31T19:44:22Z
2020-12
Thesis
http://hdl.handle.net/10019.1/109315
en_ZA
Stellenbosch University
xi, 108 pages : illustrations
application/pdf
Stellenbosch : Stellenbosch University
spellingShingle ConceptCloud Browser
Big data -- Scalability
Architecture and Data Completion
UCTD
Berndt, Joshua
Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
title Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
title_full Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
title_fullStr Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
title_full_unstemmed Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
title_short Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion
title_sort scaling the conceptcloud browser to very large semi structured data sets architecture and data completion
topic ConceptCloud Browser
Big data -- Scalability
Architecture and Data Completion
UCTD
url http://hdl.handle.net/10019.1/109315
work_keys_str_mv AT berndtjoshua scalingtheconceptcloudbrowsertoverylargesemistructureddatasetsarchitectureanddatacompletion