Language modelling for code-switched automatic speech recognition in five South African languages

Thesis (PhD)--Stellenbosch University, 2018.

Bibliographic Details
Main Author: Van der Westhuizen, Ewald
Other Authors: Niesler, T. R.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2018
Access: Open Access
Rights: Stellenbosch University
Identifier: oai:scholar.sun.ac.za:10019.1/104997
Institution: Stellenbosch University (South Africa)
License: Other (see source repository)
Source: Harvested via OAI-PMH from SUNScholar, the Stellenbosch University Repository
Department: Stellenbosch University, Faculty of Engineering, Department of Electrical and Electronic Engineering
Subjects: UCTD; Code switching (Linguistics); Automatic speech recognition; Diglossia (Linguistics); Acoustic models; Grammar, Comparative and general -- Augmentatives
Degree: Doctoral
Physical description: 209 pages : illustrations
File format: application/pdf
Dates: 2018-11-22T09:08:36Z; 2018-12-07T06:54:28Z; 2018-12
Handle: http://hdl.handle.net/10019.1/104997

ENGLISH ABSTRACT:

Code-switching refers to natural, spontaneous language alternation by multilingual speakers during a conversation or utterance, and is prevalent in the everyday conversations of multilingual South Africans. Automatic speech recognition systems are generally highly optimised for monolingual input, and their performance deteriorates when they are presented with mixed-language speech. This thesis addresses the automatic recognition of speech containing code-switching between English and four South African Bantu languages, focussing specifically on the language modelling of English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho.

Due to the severe scarcity of code-switched speech data in South African languages, it was necessary to first develop a representative corpus. This new and unique 35-hour corpus contains segmented and transcribed code-switched speech from conversations in South African soap operas, which exhibit spontaneous utterances with regular code-switching in the target languages. Insertional, alternational, and intraword intrasentential code-switching are all represented in the data, as are some other special characteristics of fast, spontaneous Bantu speech such as postlexical deletion. The distribution of language switches is, however, extremely sparse. In this thesis, a number of data-driven modelling approaches were investigated and applied to address this sparsity by augmenting the training data with synthetically generated data.

Postlexical deletion was successfully modelled statistically with joint-sequence models, and these models were used to generate synthetic pronunciations which were demonstrated to lead to improved automatic speech recognition performance. Two new code-switched language modelling approaches were proposed to address data sparsity. First, parallel language-dependent language modelling (PLDLM), which consists of two monolingual language models with explicit language transitions, was demonstrated to outperform a conventional language-independent language model in terms of recognition word error rate. Second, language models in which word embeddings were used to synthesise probable unseen code-switched bigrams were considered. Including such synthesised code-switch bigrams reduced language model perplexity across a language switch boundary by up to 31%. Improvements in recognition word error rate were also observed, although they were smaller.

AFRIKAANS SUMMARY (translated from Afrikaans):

Code-switching involves the natural, spontaneous switching between languages by multilingual speakers during a conversation or utterance, and occurs routinely in the conversations of multilingual South Africans. Automatic speech recognition systems are generally optimised specifically for handling monolingual speech and perform poorly when handling multilingual speech. This thesis addresses the automatic recognition of speech with code-switching between English and four South African Bantu languages. The language modelling of code-switched English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho speech is specifically addressed.

Because of the scarcity of speech data containing code-switching in South African languages, it was necessary to compile a representative speech corpus. This new and unique corpus consists of 35 hours of segmented and transcribed speech containing code-switching. The data was extracted from conversations in South African television soap operas, which exhibit spontaneous speech with regular code-switching in the aforementioned languages. Several forms of code-switching occur in the data, including intrasentential code-switching, which can take the form of an insertion (insertional), an alternating clause (alternational), or occur within a word (intraword). The distribution of code-switching examples in the data is, however, particularly sparse.

A number of data-driven modelling techniques were investigated to supplement the sparse training data with synthetic data. Vowel deletion, a characteristic phenomenon of fast spontaneous speech, is also observed in the African languages. Vowel deletion was successfully modelled with joint-sequence models. These models were used to create synthetic pronunciations, which led to an improved word error rate for the automatic speech recogniser.

Two new approaches to code-switched language modelling were investigated. The first is a parallel language-dependent language model that connects two monolingual language models by means of explicit language-transition links. This approach was shown to yield a better word error rate than the conventional language-independent language model. The second approach investigated language models in which word embeddings were applied to synthesise probable code-switched bigrams. Including the synthetic code-switched bigrams in the language models made it possible to reduce perplexity at a language switch point by up to 31%. A smaller improvement in word error rate was also observed.
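The parallel language-dependent language model (PLDLM) described in the abstract combines two monolingual language models through explicit language transitions. The Python sketch below illustrates that general idea for bigrams only; it is not the thesis's implementation. The class names, the fixed per-language switch probability `p_switch`, entering the new language through its unigram distribution, the uniform language prior at sentence start, the assumption of disjoint vocabularies, and all toy probabilities are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MonoLM:
    """A monolingual bigram model over one language's vocabulary."""
    unigram: Dict[str, float]            # P_L(w); assumed to sum to 1
    bigram: Dict[str, Dict[str, float]]  # P_L(w | h); each row assumed to sum to 1

    def prob(self, w: str, h: Optional[str]) -> float:
        # Fall back to the unigram when the history has no bigram row.
        if h is None or h not in self.bigram:
            return self.unigram.get(w, 0.0)
        return self.bigram[h].get(w, 0.0)


@dataclass
class ParallelLM:
    """Hypothetical PLDLM-style combination of monolingual models."""
    models: Dict[str, MonoLM]   # language tag -> monolingual model
    p_switch: Dict[str, float]  # assumed P(leave language L) at a word boundary

    def vocab(self) -> List[str]:
        return [w for lm in self.models.values() for w in lm.unigram]

    def lang_of(self, w: str) -> str:
        # Assumes disjoint vocabularies, so each word identifies its language.
        for lang, lm in self.models.items():
            if w in lm.unigram:
                return lang
        raise KeyError(f"out-of-vocabulary word: {w}")

    def prob(self, w: str, h: Optional[str]) -> float:
        lw = self.lang_of(w)
        if h is None:
            # Sentence start: assumed uniform prior over languages.
            return self.models[lw].unigram[w] / len(self.models)
        lh = self.lang_of(h)
        if lw == lh:
            # No switch: stay in the history's language and use its bigram model.
            return (1.0 - self.p_switch[lh]) * self.models[lh].prob(w, h)
        # Explicit language transition: the switch mass is shared among the
        # other languages, and the new language is entered via its unigram.
        n_targets = len(self.models) - 1
        return self.p_switch[lh] / n_targets * self.models[lw].unigram.get(w, 0.0)

    def log_prob(self, words: List[str]) -> float:
        total, h = 0.0, None
        for w in words:
            total += math.log(self.prob(w, h))
            h = w
        return total

    def perplexity(self, words: List[str]) -> float:
        return math.exp(-self.log_prob(words) / len(words))


def toy_model() -> ParallelLM:
    # Toy English and isiZulu models with invented probabilities.
    eng = MonoLM(
        unigram={"my": 0.5, "friend": 0.5},
        bigram={"my": {"friend": 0.9, "my": 0.1},
                "friend": {"my": 0.5, "friend": 0.5}},
    )
    zul = MonoLM(
        unigram={"ngiyabonga": 0.6, "kakhulu": 0.4},
        bigram={"ngiyabonga": {"kakhulu": 0.8, "ngiyabonga": 0.2},
                "kakhulu": {"ngiyabonga": 0.5, "kakhulu": 0.5}},
    )
    return ParallelLM(models={"eng": eng, "zul": zul},
                      p_switch={"eng": 0.2, "zul": 0.1})


if __name__ == "__main__":
    lm = toy_model()
    sentence = ["my", "friend", "ngiyabonga", "kakhulu"]
    print(f"perplexity = {lm.perplexity(sentence):.3f}")
```

Because every monolingual bigram row and unigram distribution sums to one, the probabilities leaving any history also sum to one. The combined model therefore remains a proper distribution over the joint vocabulary, while the switch probability alone controls how readily the model crosses a language boundary.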