Using question-specific vocabularies to support speech data collection with SALAAM

Bibliographic Details
Main Author: Chibuye, Kayokwa Nick
Other Authors: De Renzi, Brian
Format: Thesis
Language: English
Published: Department of Computer Science 2020
Subjects: computer science
access_status_str Open Access
description There has been an increasing use of small-vocabulary spoken dialogue systems in low-resource settings for information dissemination and data collection. This provides an opportunity to reduce the information gap in low-resource settings, where low literacy is a major hindrance to the adoption of Information and Communication Technologies (ICTs). Because the languages spoken in these areas are computationally low-resourced, speech applications for them rely on techniques such as cross-language phoneme mapping to enable fast development of small-vocabulary speech recognisers. Despite the success of this technique, there has been little guidance on how to deploy such systems across a range of languages. This study presents a systematic exploration of the suitability and limitations of cross-language phoneme mapping for developing small-vocabulary speech recognisers for computationally low-resourced languages, particularly Bantu languages. Five target languages and four source languages were used in the study. Speech-based Accent Learning And Articulation Mapping (SALAAM), a cross-language phoneme mapping algorithm, was used in the study through its implementation in the open-source tool Lex4All. The following research questions guided our investigations: (i) What impact does the choice of source language have on recognition accuracy? (ii) What impact does the gender composition of the training data set have on recognition accuracy? (iii) What impact does varying the number of alternative pronunciations per word type have on recognition accuracy? Data for the target languages was collected from 104 university student volunteers (58 female and 46 male). The results showed that both the phonetic similarity between the target and source languages and the gender composition of the training datasets affect the recognition accuracy of speech applications developed using cross-language phoneme mapping techniques.
They also showed that increasing the number of alternative pronunciations per word in the vocabulary generally increases recognition accuracy, at the cost of a slower system response time. This study provides evidence that careful selection of the source language, the gender composition of the training data and the number of alternative pronunciations per word can improve the recognition accuracy of speech recognisers developed using cross-language phoneme mapping.
id oai:open.uct.ac.za:11427/31535
institution University of Cape Town (South Africa)
last_indexed 2026-06-10T12:32:38.580Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/31535 Using question-specific vocabularies to support speech data collection with SALAAM Chibuye, Kayokwa Nick De Renzi, Brian computer science 2020-03-10T13:58:34Z 2020-03-10T13:58:34Z 2019 2020-03-10T13:45:53Z Master Thesis Masters MSc http://hdl.handle.net/11427/31535 eng application/pdf Department of Computer Science Faculty of Science
thesis_degree_str Master's
topic computer science
url http://hdl.handle.net/11427/31535