Full Text Available

Using question-specific vocabularies to support speech data collection with SALAAM

There has been an increasing use of small-vocabulary spoken dialogue systems in low-resource settings for information dissemination and data collection. This provides an opportunity to reduce the information gap in low-resource settings in which low literacy is a major hindrance to the adoption of In...

Bibliographic Details
Main Author:	Chibuye, Kayokwa Nick
Other Authors:	De Renzi, Brian
Format:	Thesis
Language:	English
Published:	Department of Computer Science 2020
Subjects:	computer science

_version_	1867613217802420224
access_status_str	Open Access
author	Chibuye, Kayokwa Nick
author2	De Renzi, Brian
author_browse	Chibuye, Kayokwa Nick De Renzi, Brian
author_facet	De Renzi, Brian Chibuye, Kayokwa Nick
author_sort	Chibuye, Kayokwa Nick
collection	Thesis
description	There has been an increasing use of small-vocabulary spoken dialogue systems in low-resource settings for information dissemination and data collection. This provides an opportunity to reduce the information gap in low-resource settings in which low literacy is a major hindrance to the adoption of Information and Communication Technologies (ICTs). Since the languages spoken in these areas are computationally low-resourced, developers rely on techniques such as cross-language phoneme mapping to facilitate fast development of small-vocabulary speech recognisers. Despite the success of this technique, there has been a lack of guidance on how to deploy such systems across a range of languages. This study presents a systematic exploration of the suitability and limitations of using cross-language phoneme mapping for the development of small-vocabulary speech recognisers for computationally low-resource languages, particularly Bantu languages. Five target languages and four source languages were used in the study. Speech-based Accent Learning And Articulation Mapping (SALAAM), a cross-language phoneme mapping algorithm, was used in the study through its implementation in the open-source tool Lex4All. The following research questions guided our investigations: i) What impact does source language choice have on recognition accuracy? ii) What impact does the gender composition of the training data set have on recognition accuracy? iii) What impact does varying the number of alternative pronunciations per word type have on recognition accuracy? Data for the target languages was collected from 104 university student volunteers, 58 female and 46 male. The results showed that both the phonetic similarity between target and source languages and the gender composition of the training datasets affect the recognition accuracy of speech applications developed using cross-language phoneme mapping techniques. They also showed that increasing the number of alternative pronunciations per word in the vocabulary generally increases recognition accuracy, although at the cost of a slower system response time. This study provides evidence that a careful selection of the source language, the gender composition of the training data and the number of alternative pronunciations per word can improve the recognition accuracy of speech recognisers developed using cross-language phoneme mapping.
format	Thesis
id	oai:open.uct.ac.za:11427/31535
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:32:38.580Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2020
publishDateRange	2020
publishDateSort	2020
publisher	Department of Computer Science
publisherStr	Department of Computer Science
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/31535 Using question-specific vocabularies to support speech data collection with SALAAM Chibuye, Kayokwa Nick De Renzi, Brian computer science 2020-03-10T13:58:34Z 2020-03-10T13:58:34Z 2019 2020-03-10T13:45:53Z Master Thesis Masters MSc http://hdl.handle.net/11427/31535 eng application/pdf Department of Computer Science Faculty of Science
spellingShingle	computer science Chibuye, Kayokwa Nick Using question-specific vocabularies to support speech data collection with SALAAM
thesis_degree_str	Master's
title	Using question-specific vocabularies to support speech data collection with SALAAM
title_full	Using question-specific vocabularies to support speech data collection with SALAAM
title_fullStr	Using question-specific vocabularies to support speech data collection with SALAAM
title_full_unstemmed	Using question-specific vocabularies to support speech data collection with SALAAM
title_short	Using question-specific vocabularies to support speech data collection with SALAAM
title_sort	using question specific vocabularies to support speech data collection with salaam
topic	computer science
url	http://hdl.handle.net/11427/31535
work_keys_str_mv	AT chibuyekayokwanick usingquestionspecificvocabulariestosupportspeechdatacollectionwithsalaam

Similar Items