Crowdsourcing a text corpus for a low resource language

Low-resourced languages, such as South Africa's isiXhosa, have a limited number of digitised texts, making it challenging to build language corpora and the information retrieval services, such as search and translation, that depend on them. Researchers have been unable to assemble isiXhosa corpora of...

Bibliographic Details
Main Author: Packham, Sean
Other Authors: Suleman, Hussein
Format: Thesis
Language: English
Published: Department of Computer Science 2016
Subjects:
_version_ 1867614047907610624
access_status_str Open Access
author Packham, Sean
author2 Suleman, Hussein
author_browse Packham, Sean
Suleman, Hussein
author_facet Suleman, Hussein
Packham, Sean
author_sort Packham, Sean
collection Thesis
description Low-resourced languages, such as South Africa's isiXhosa, have a limited number of digitised texts, making it challenging to build language corpora and the information retrieval services, such as search and translation, that depend on them. Researchers have been unable to assemble isiXhosa corpora of sufficient size and quality to produce working machine translation systems, and it has been acknowledged that there is little to no training data and that sourcing translations from professionals can be a costly process. A crowdsourcing translation game that paid participants for their contributions was proposed as a solution to source original and relevant parallel corpora for low-resource languages such as isiXhosa. The objective of this dissertation is to report on the four experiments that were conducted to assess user motivation and contribution quantity under various scenarios using the developed crowdsourcing translation game. The first experiment was a pilot study to test a custom-built system and to find out if social network users would volunteer to participate in a translation game for free. The second experiment tested multiple payment schemes with users from the University of Cape Town. The schemes rewarded users with consistent, increasing or decreasing amounts for subsequent contributions. Experiment 3 tested whether the same users from Experiment 2 would continue contributing if payments were taken away. The last experiment tested a payment scheme that did not offer a direct and guaranteed reward. Users were paid based on their leaderboard placement, and only a limited number of the top leaderboard spots were allocated rewards.
From Experiments 1 and 3 we found that people do not volunteer without financial incentives. Experiments 2 and 4 showed that people want increased rewards when putting in increased effort. Experiment 3 also showed that people will not continue contributing if the financial incentives are taken away, and Experiment 4 also showed that the possibility of incentives is as attractive as offering guaranteed incentives.
format Thesis
id oai:open.uct.ac.za:11427/20436
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:45:50.449Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2016
publishDateRange 2016
publishDateSort 2016
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/20436 Crowdsourcing a text corpus for a low resource language Packham, Sean Suleman, Hussein Computer Science Low-resourced languages, such as South Africa's isiXhosa, have a limited number of digitised texts, making it challenging to build language corpora and the information retrieval services, such as search and translation, that depend on them. Researchers have been unable to assemble isiXhosa corpora of sufficient size and quality to produce working machine translation systems, and it has been acknowledged that there is little to no training data and that sourcing translations from professionals can be a costly process. A crowdsourcing translation game that paid participants for their contributions was proposed as a solution to source original and relevant parallel corpora for low-resource languages such as isiXhosa. The objective of this dissertation is to report on the four experiments that were conducted to assess user motivation and contribution quantity under various scenarios using the developed crowdsourcing translation game. The first experiment was a pilot study to test a custom-built system and to find out if social network users would volunteer to participate in a translation game for free. The second experiment tested multiple payment schemes with users from the University of Cape Town. The schemes rewarded users with consistent, increasing or decreasing amounts for subsequent contributions. Experiment 3 tested whether the same users from Experiment 2 would continue contributing if payments were taken away. The last experiment tested a payment scheme that did not offer a direct and guaranteed reward. Users were paid based on their leaderboard placement, and only a limited number of the top leaderboard spots were allocated rewards.
From Experiments 1 and 3 we found that people do not volunteer without financial incentives. Experiments 2 and 4 showed that people want increased rewards when putting in increased effort. Experiment 3 also showed that people will not continue contributing if the financial incentives are taken away, and Experiment 4 also showed that the possibility of incentives is as attractive as offering guaranteed incentives. 2016-07-18T12:55:04Z 2016-07-18T12:55:04Z 2016 Master Thesis Masters MSc http://hdl.handle.net/11427/20436 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle Computer Science
Packham, Sean
Crowdsourcing a text corpus for a low resource language
thesis_degree_str Master's
title Crowdsourcing a text corpus for a low resource language
title_full Crowdsourcing a text corpus for a low resource language
title_fullStr Crowdsourcing a text corpus for a low resource language
title_full_unstemmed Crowdsourcing a text corpus for a low resource language
title_short Crowdsourcing a text corpus for a low resource language
title_sort crowdsourcing a text corpus for a low resource language
topic Computer Science
url http://hdl.handle.net/11427/20436
work_keys_str_mv AT packhamsean crowdsourcingatextcorpusforalowresourcelanguage