Joining and aggregating datasets using CouchDB

Data mining typically requires operations that cut across entity boundaries, which are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents with highly isolated boundaries, on which joins cannot be performed directly...

Bibliographic Details
Main Author: Smith, Zach
Other Authors: Berman, Sonia
Format: Thesis
Language: English
Published: Department of Computer Science 2019
Subjects: Computer Science
_version_ 1867611311238545408
access_status_str Open Access
author Smith, Zach
author2 Berman, Sonia
author_browse Berman, Sonia
Smith, Zach
author_facet Berman, Sonia
Smith, Zach
author_sort Smith, Zach
collection Thesis
description Data mining typically requires operations that cut across entity boundaries, which are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents with highly isolated boundaries, on which joins cannot be performed directly. This project shows how joins and aggregation can be achieved across entity boundaries in such systems, as encountered, for example, in the pre-processing and exploration stages of educational data mining. A software stack is presented for this purpose: first, datasets are processed via ETL operations; then MapReduce is used to create indices of ordered and aggregated data; finally, a CouchDB list function iterates through these indices to perform joins and to compute aggregate values, such as variance and correlations, on the joined datasets. The case study shows that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was found that for the 2014-2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics.
format Thesis
id oai:open.uct.ac.za:11427/29530
institution University of Cape Town (South Africa)
language eng
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2019
publishDateRange 2019
publishDateSort 2019
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/29530 Joining and aggregating datasets using CouchDB Smith, Zach Berman, Sonia Computer Science Data mining typically requires implementing operations that involve cross-cutting entity boundaries and are awkward to implement in document-oriented databases. CouchDB, for example, models entities as documents, with highly isolated entity boundaries, and on which joins cannot be directly performed. This project shows how join and aggregation can be achieved across entity boundaries in such systems, as encountered for example in the pre-processing and exploration stages of educational data mining. A software stack is presented as a means by which this can be achieved; first, datasets are processed via ETL operations, then MapReduce is used to create indices of ordered and aggregated data. Finally, a Couchdb list function is used to iterate through these indices and perform joins, and to compute aggregated values on joined datasets such as variance and correlations. In terms of the case study, it is shown that the proposed approach to implementing cross-document joins and aggregation is effective and scalable. In addition, it was discovered that for the 2014 - 2016 UCT cohorts, NBT scores correlate better with final grades for the CSC1015F course than do Grade 12 results for English, Science and Mathematics. 2019-02-14T13:15:50Z 2019-02-14T13:15:50Z 2018 2019-02-14T11:26:06Z Master Thesis Masters MSc http://hdl.handle.net/11427/29530 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle Computer Science
Smith, Zach
Joining and aggregating datasets using CouchDB
thesis_degree_str Master's
title Joining and aggregating datasets using CouchDB
title_full Joining and aggregating datasets using CouchDB
title_fullStr Joining and aggregating datasets using CouchDB
title_full_unstemmed Joining and aggregating datasets using CouchDB
title_short Joining and aggregating datasets using CouchDB
title_sort joining and aggregating datasets using couchdb
topic Computer Science
url http://hdl.handle.net/11427/29530
work_keys_str_mv AT smithzach joiningandaggregatingdatasetsusingcouchdb
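
The description field above outlines a pipeline in which key-ordered indices are walked in step to join records and to compute statistics such as correlation on the joined values. The following is a minimal Python sketch of that general idea only, not the thesis's implementation: it does not use CouchDB, the function names `merge_join` and `pearson` are hypothetical, and the student keys and scores are invented for illustration. It assumes each index is already sorted by key with unique keys, as rows from a key-ordered view would be.

```python
from math import sqrt


def merge_join(left, right):
    """Yield (key, left_value, right_value) for keys present in both inputs.

    Assumption: both inputs are lists of (key, value) pairs sorted by key,
    with unique keys, so one forward pass over each list is enough.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            yield left_key, left_value, right_value
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # key only in the left index; skip it
        else:
            j += 1  # key only in the right index; skip it


def pearson(pairs):
    """Pearson correlation from running sums, computed in a single pass.

    Returns None when fewer than two pairs are given or either variable
    has zero variance.
    """
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if n < 2 or var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Invented sample data: two indices keyed by a hypothetical student id.
admission_scores = [("s01", 60), ("s02", 70), ("s03", 80), ("s05", 90)]
final_grades = [("s01", 55), ("s03", 75), ("s04", 50), ("s05", 85)]

joined = list(merge_join(admission_scores, final_grades))
correlation = pearson((a, g) for _, a, g in joined)
```

Because both inputs are sorted, the join needs no lookup table and keeps only running sums in memory, which is the property that makes a streaming pass over ordered indices attractive for this kind of aggregation.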