Automated phoneme mapping for cross-language speech recognition

Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006.

Bibliographic Details
Other Authors: Botha, Elizabeth C.
Format: Thesis
Published: University of Pretoria 2013
Access status: Open Access
Collection: Thesis
Rights: © 2004, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
Identifier: oai:repository.up.ac.za:2263/30584
Institution: University of Pretoria (South Africa)
Last indexed: 2026-06-10T12:38:21.029Z
License: Other (see source repository)
Provenance: Harvested via OAI-PMH from UPSpace, the University of Pretoria Institutional Repository
Record format: DSpace
Source: UPSpace, the University of Pretoria Institutional Repository
Author: Sooful, Jayren Jugpal (jayren@yahoo.com)
Keywords: MAP; MLLR; Data pooling; Embedded Baum-Welch re-estimation; Transformation-based adaptation; Cross-language; Phoneme mapping; Acoustic distance measures; UCTD
Abstract: This dissertation explores a novel automated approach to mapping one phoneme set onto another, based on the acoustic distances between the individual phonemes. Although the investigation focuses on cross-language applications, the approach can also be extended to same-language, cross-database applications. The main goal is to use the data of a source language to train the initial acoustic models of a target language for which very little speech data may be available. This requires an automatic technique for mapping the phonemes of the two data sets; such a technique would make it possible to accelerate the development of a speech recognition system for a new language. Current research in cross-language speech recognition has focused on manual phoneme mapping methods. This investigation considers both an English-to-Afrikaans and an Afrikaans-to-English phoneme mapping; phoneme mappings between these two languages have been applied before, but using manual methods. To determine the best phoneme mapping, six acoustic distance measures are compared: the Kullback-Leibler measure, the Bhattacharyya distance metric, the Mahalanobis measure, the Euclidean measure, the L2 metric and the Jeffreys-Matusita distance. The distance measures are evaluated by comparing the cross-database recognition results obtained with phoneme models created from the TIMIT speech corpus and from the locally compiled South African SUN Speech database.
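The abstract names the distance measures but not their exact form. As a minimal sketch, assuming each phoneme is modelled by a single diagonal-covariance Gaussian (the dissertation's multi-mixture HMMs are ignored here), the Python below computes five of the six listed measures and assigns each target phoneme the nearest source phoneme. The class `DiagGaussian`, all function names, the averaged variance used for the Mahalanobis measure and the two-dimensional toy models are hypothetical illustrations, not the dissertation's code or data; the L2 metric is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGaussian:
    """Hypothetical single-Gaussian phoneme model with diagonal covariance."""
    mean: list[float]
    var: list[float]  # per-dimension variances, all > 0


def _dims(p, q):
    return zip(p.mean, p.var, q.mean, q.var)


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) between diagonal Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in _dims(p, q)
    )


def symmetric_kl(p, q):
    """Symmetrised KL, so the distance does not depend on argument order."""
    return kl(p, q) + kl(q, p)


def bhattacharyya(p, q):
    total = 0.0
    for mp, vp, mq, vq in _dims(p, q):
        v = 0.5 * (vp + vq)  # averaged variance for this dimension
        total += 0.125 * (mp - mq) ** 2 / v + 0.5 * math.log(v / math.sqrt(vp * vq))
    return total


def jeffreys_matusita(p, q):
    """JM distance derived from the Bhattacharyya distance; bounded by sqrt(2)."""
    return math.sqrt(2.0 * (1.0 - math.exp(-bhattacharyya(p, q))))


def mahalanobis(p, q):
    """Distance between means scaled by the averaged variance (an assumption)."""
    return math.sqrt(sum((mp - mq) ** 2 / (0.5 * (vp + vq)) for mp, vp, mq, vq in _dims(p, q)))


def euclidean(p, q):
    """Plain Euclidean distance between the two means."""
    return math.sqrt(sum((mp - mq) ** 2 for mp, mq in zip(p.mean, q.mean)))


def map_phonemes(target, source, distance):
    """Assign each target phoneme the source phoneme with the smallest distance."""
    return {t: min(source, key=lambda s: distance(target[t], source[s])) for t in target}


# Hypothetical toy models (2-D features instead of real MFCC vectors).
source = {
    "a": DiagGaussian([0.0, 0.0], [1.0, 1.0]),
    "i": DiagGaussian([3.0, 3.0], [1.0, 1.0]),
}
target = {
    "af_a": DiagGaussian([0.2, -0.1], [1.0, 1.2]),
    "af_i": DiagGaussian([2.8, 3.1], [0.9, 1.0]),
}

mapping = map_phonemes(target, source, bhattacharyya)
print(mapping)
```

Swapping `bhattacharyya` for another function changes only the `distance` argument; selecting the best measure by downstream recognition accuracy, as the abstract describes, is outside this sketch.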
By selecting the most appropriate distance measure, phonemes can be mapped automatically from the source language to the target language. The best distance measure yields recognition rates comparable to those of a manual mapping performed by a phonetic expert. The study also investigates the effect of the number of Gaussian mixture components on the mapping and on the performance of the speech recognition system; the results indicate that recogniser performance increases with the number of mixture components, up to a limit. In addition, the study explores the effect of excluding the delta and acceleration Mel-frequency cepstral coefficients, and finds that including these temporal features improves both the mapping and the system's phoneme recognition rate. Further experiments determine the impact of the number of HMM states, and show that single-state HMMs deliver the best cross-language phoneme recognition results. Once the mapping has been done, speaker adaptation strategies are applied to the recognisers to improve their target-language performance. The models of a fully trained source-language recogniser are adapted to target-language models using Maximum Likelihood Linear Regression (MLLR) followed by Maximum A Posteriori (MAP) adaptation, and embedded Baum-Welch re-estimation (EBWR) is then used to adapt the models further to the target language. These techniques considerably improve the phoneme recognition rate. Although MLLR and MAP have previously been combined in speech adaptation studies, the combination of MLLR, MAP and EBWR in cross-language speech recognition is a novel contribution of this study. Finally, a data pooling technique is applied to build a new recogniser from both the automatically mapped target-language phonemes and the source-language phonemes.
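The MLLR-then-MAP adaptation chain can be illustrated for a single Gaussian mean. The sketch below assumes an MLLR mean transform (A, b) has already been estimated and only applies it, mu' = A mu + b, then performs the standard MAP mean update mu_MAP = (tau mu' + sum_t gamma_t x_t) / (tau + sum_t gamma_t), where gamma_t is the occupation probability of the Gaussian at frame t and tau is a prior weight. The transform, frames, occupancies and tau are hypothetical values; real systems obtain gamma_t from forward-backward alignment, and embedded Baum-Welch re-estimation is not shown.

```python
from typing import Sequence

Vector = list[float]
Matrix = list[list[float]]


def mllr_mean(mean: Vector, A: Matrix, b: Vector) -> Vector:
    """Apply an already-estimated MLLR mean transform: mu' = A mu + b."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i for row, b_i in zip(A, b)]


def map_mean(prior_mean: Vector, frames: Sequence[Vector], gammas: Sequence[float], tau: float) -> Vector:
    """MAP re-estimate of a Gaussian mean.

    prior_mean: mean used as the prior (here, the MLLR-adapted source mean)
    frames:     target-language feature vectors x_t
    gammas:     occupation probability of this Gaussian at each frame (given here)
    tau:        prior weight; larger values keep the result closer to the prior
    """
    occupancy = sum(gammas)
    return [
        (tau * prior_mean[d] + sum(g * x[d] for g, x in zip(gammas, frames))) / (tau + occupancy)
        for d in range(len(prior_mean))
    ]


# Hypothetical source-language mean, MLLR transform and target-language data.
source_mean = [1.0, 2.0]
A = [[0.5, 0.0], [0.0, 2.0]]
b = [1.0, -1.0]
frames = [[2.0, 3.0], [2.0, 5.0]]
gammas = [1.0, 1.0]

mllr_adapted = mllr_mean(source_mean, A, b)  # [1.5, 3.0]
map_adapted = map_mean(mllr_adapted, frames, gammas, tau=2.0)  # [1.75, 3.5]
print(mllr_adapted, map_adapted)
```

With tau = 0 the update reduces to the occupancy-weighted average of the target frames, and with no target data it returns the MLLR-adapted mean unchanged; this is the sense in which MAP interpolates between the transformed prior and the new data.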
This new recogniser demonstrates moderate bilingual phoneme recognition capability. The bilingual recogniser is then further adapted to the target language using MAP and embedded Baum-Welch re-estimation. This combination of adaptation techniques with the data pooling strategy is a novel application in the field of cross-language recognition. It outperforms all other techniques tested in terms of phoneme recognition rate, although its training process is considerably more time-consuming, and its phoneme recognition is only slightly poorer than that of recognisers trained and tested on the same-language database.
Department: Electrical, Electronic and Computer Engineering
Availability: unrestricted
Dates: 2004-06-10; 2005-01-11; 2006-01-11; 2013-09-07T19:22:24Z
Type: Dissertation
Citation: Sooful, J 2004, Automated phoneme mapping for cross-language speech recognition, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/30584 >
File format: application/pdf
URL: http://hdl.handle.net/2263/30584, http://upetd.up.ac.za/thesis/available/etd-01112005-131128/