Enhancing digital text collections with detailed metadata to improve retrieval

Thesis (DPhil (Information Science))--University of Pretoria, 2020.

Bibliographic Details
Other Authors: Bothma, T.J.D. (Theodorus Jan Daniel)
Format: Thesis
Language: English
Published: University of Pretoria 2021
_version_ 1867613674442588160
access_status_str Open Access
author2 Bothma, T.J.D. (Theodorus Jan Daniel)
author_browse Bothma, T.J.D. (Theodorus Jan Daniel)
author_facet Bothma, T.J.D. (Theodorus Jan Daniel)
collection Thesis
dc_rights_str_mv © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Thesis (DPhil (Information Science))--University of Pretoria, 2020.
format Thesis
id oai:repository.up.ac.za:2263/79015
institution University of Pretoria (South Africa)
language English
last_indexed 2026-06-10T12:39:54.193Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2021
publishDateRange 2021
publishDateSort 2021
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/79015 Enhancing digital text collections with detailed metadata to improve retrieval Bothma, T.J.D. (Theodorus Jan Daniel) liezl.ball@up.ac.za Ball, Liezl UCTD Information science Metadata Digital humanities Retrieval Encoding Digital text collections Engineering, built environment and information technology theses SDG-04 SDG-04: Quality education Engineering, built environment and information technology theses SDG-09 SDG-09: Industry, innovation and infrastructure Engineering, built environment and information technology theses SDG-16 SDG-16: Peace, justice and strong institutions Thesis (DPhil (Information Science))--University of Pretoria, 2020. Digital text collections are increasingly important, as they enable researchers to explore new ways of interacting with texts through the use of technology. Various tools have been developed to facilitate exploring and searching in text collections at a fairly low level of granularity. Ideally, it should be possible to filter the results at a greater level of granularity to retrieve only specific instances in which the researcher is interested. The aim of this study was to investigate to what extent detailed metadata could be used to enhance texts in order to improve retrieval. To do this, the researcher had to identify metadata that could be useful to filter according to and find ways in which these metadata can be applied to or encoded in texts. The researcher also had to evaluate existing tools to determine to what extent current tools support retrieval on a fine-grained level. After identifying useful metadata and reviewing existing tools, the researcher could suggest a metadata framework that could be used to encode texts on a detailed level. Metadata in five different categories were used, namely morphological, syntactic, semantic, functional and bibliographic. 
A further contribution in this metadata framework was the addition of in-text bibliographic metadata, to use where sections in a text have different properties than those in the main text. The suggested framework had to be tested to determine if retrieval was indeed improved. In order to do so, a selection of texts was encoded with the suggested framework and a prototype was developed to test the retrieval. The prototype receives the encoded texts and stores the information in a database. A graphical user interface was developed to enable searching in the database in an easy and intuitive manner. The prototype demonstrates that it is possible to search for words or phrases with specific properties when detailed metadata are applied to texts. The fine-grained metadata from five different categories enable retrieval on a greater level of granularity and specificity. It is therefore recommended that detailed metadata are used to encode texts in order to improve retrieval in digital text collections. Keywords: metadata, digital humanities, digital text collections, retrieval, encoding Information Science DPhil (Information Science) Unrestricted 2021-03-17T07:04:07Z 2021-03-17T07:04:07Z 2021-04-20 2020 Thesis Ball, LH 2020, Enhancing digital text collections with detailed metadata to improve retrieval, DPhil (Information Science) Thesis, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/79015> A2021 http://hdl.handle.net/2263/79015 en © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle UCTD
Information science
Metadata
Digital humanities
Retrieval
Encoding
Digital text collections
Engineering, built environment and information technology theses SDG-04
SDG-04: Quality education
Engineering, built environment and information technology theses SDG-09
SDG-09: Industry, innovation and infrastructure
Engineering, built environment and information technology theses SDG-16
SDG-16: Peace, justice and strong institutions
Enhancing digital text collections with detailed metadata to improve retrieval
title Enhancing digital text collections with detailed metadata to improve retrieval
title_full Enhancing digital text collections with detailed metadata to improve retrieval
title_fullStr Enhancing digital text collections with detailed metadata to improve retrieval
title_full_unstemmed Enhancing digital text collections with detailed metadata to improve retrieval
title_short Enhancing digital text collections with detailed metadata to improve retrieval
title_sort enhancing digital text collections with detailed metadata to improve retrieval
topic UCTD
Information science
Metadata
Digital humanities
Retrieval
Encoding
Digital text collections
Engineering, built environment and information technology theses SDG-04
SDG-04: Quality education
Engineering, built environment and information technology theses SDG-09
SDG-09: Industry, innovation and infrastructure
Engineering, built environment and information technology theses SDG-16
SDG-16: Peace, justice and strong institutions
url http://hdl.handle.net/2263/79015