Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework

The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the earliest inhabitants of Southern Africa. Previous attempts have been made to recognize the complex text in the notebooks using machine learning techniques, but due to the complexity of...

Bibliographic Details
Main Author: Munyaradzi, Ngoni
Other Authors: Suleman, Hussein
Format: Thesis
Language: English
Published: Department of Computer Science 2014
_version_ 1867613194581704704
access_status_str Open Access
author Munyaradzi, Ngoni
author2 Suleman, Hussein
author_browse Munyaradzi, Ngoni
Suleman, Hussein
author_facet Suleman, Hussein
Munyaradzi, Ngoni
author_sort Munyaradzi, Ngoni
collection Thesis
description The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the earliest inhabitants of Southern Africa. Previous attempts have been made to recognize the complex text in the notebooks using machine learning techniques, but due to the complexity of the manuscripts the recognition accuracy was low. In this research, a crowdsourcing-based method is proposed to transcribe the historical handwritten manuscripts, where volunteers transcribe the notebooks online. An online crowdsourcing transcription tool was developed and deployed. Experiments were conducted to determine the quality of transcriptions and accuracy of the volunteers compared with a gold standard. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for ǀXam text and 95% for English text. When the ǀXam text transcriptions produced by the volunteers are compared with the gold standard, the volunteers achieve an average accuracy of 69.69%. Findings show that there exists a positive linear correlation between the inter-transcriber agreement and the accuracy of transcriptions. The user survey revealed that volunteers found the transcription process enjoyable, though it was difficult. Results indicate that volunteer thinking can be used to crowdsource intellectually intensive tasks in digital libraries, such as transcription of handwritten manuscripts. Volunteer thinking outperforms machine learning techniques at the task of transcribing notebooks from the Bleek and Lloyd Collection.
format Thesis
id oai:open.uct.ac.za:11427/6640
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:32:13.078Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2014
publishDateRange 2014
publishDateSort 2014
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/6640 Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework Munyaradzi, Ngoni Suleman, Hussein The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the earliest inhabitants of Southern Africa. Previous attempts have been made to recognize the complex text in the notebooks using machine learning techniques, but due to the complexity of the manuscripts the recognition accuracy was low. In this research, a crowdsourcing-based method is proposed to transcribe the historical handwritten manuscripts, where volunteers transcribe the notebooks online. An online crowdsourcing transcription tool was developed and deployed. Experiments were conducted to determine the quality of transcriptions and accuracy of the volunteers compared with a gold standard. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for ǀXam text and 95% for English text. When the ǀXam text transcriptions produced by the volunteers are compared with the gold standard, the volunteers achieve an average accuracy of 69.69%. Findings show that there exists a positive linear correlation between the inter-transcriber agreement and the accuracy of transcriptions. The user survey revealed that volunteers found the transcription process enjoyable, though it was difficult. Results indicate that volunteer thinking can be used to crowdsource intellectually intensive tasks in digital libraries, such as transcription of handwritten manuscripts. Volunteer thinking outperforms machine learning techniques at the task of transcribing notebooks from the Bleek and Lloyd Collection. 2014-08-20T19:30:39Z 2014-08-20T19:30:39Z 2013 Master Thesis Masters MSc http://hdl.handle.net/11427/6640 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle Munyaradzi, Ngoni
Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework
thesis_degree_str Master's
title Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework
title_sort transcription of the bleek and lloyd collection using the bossa volunteer thinking framework
url http://hdl.handle.net/11427/6640
work_keys_str_mv AT munyaradzingoni transcriptionofthebleekandlloydcollectionusingthebossavolunteerthinkingframework