Full Text Available

Automated phoneme mapping for cross-language speech recognition

Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006.

Bibliographic Details
Author:	Sooful, Jayren Jugpal
Other Authors:	Botha, Elizabeth C.
Format:	Thesis
Published:	University of Pretoria 2013
Subjects:	MAP; MLLR; Data pooling; Embedded Baum-Welch re-estimation; Transformation-based adaptation; Cross-language; Phoneme mapping; Acoustic distance measures; UCTD

access_status_str	Open Access
author2	Botha, Elizabeth C.
collection	Thesis
dc_rights_str_mv	© 2004, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description	Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006.
format	Thesis
id	oai:repository.up.ac.za:2263/30584
institution	University of Pretoria (South Africa)
last_indexed	2026-06-10T12:38:21.029Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate	2013
publisher	University of Pretoria
record_format	dspace
source_str	UPSpace — University of Pretoria Institutional Repository
author	Sooful, Jayren Jugpal (jayren@yahoo.com)
abstract	This dissertation explores a unique automated approach to mapping one phoneme set onto another, based on the acoustic distances between individual phonemes. Although the investigation focuses on cross-language applications, the approach can also be extended to same-language applications that use different databases. The main goal is to use the data of a source language to train the initial acoustic models of a target language for which very little speech data may be available. This requires an automatic technique for mapping the phonemes of the two data sets; such a technique would make it possible to accelerate the development of a speech recognition system for a new language. Current research in cross-language speech recognition has focused on manual methods of mapping phonemes. This investigation considers both an English-to-Afrikaans and an Afrikaans-to-English phoneme mapping; mappings between these two languages have been made before, but only with manual methods. To determine the best phoneme mapping, six acoustic distance measures are compared: the Kullback-Leibler measure, the Bhattacharyya distance metric, the Mahalanobis measure, the Euclidean measure, the L2 metric and the Jeffreys-Matusita distance. The distance measures are tested by comparing cross-database recognition results obtained with phoneme models created from the TIMIT speech corpus and from a locally compiled South African database, SUN Speech.
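The abstract names the six distance measures but not their formulas. The sketch below is an illustration only, assuming each phoneme is modelled by a single diagonal-covariance Gaussian (real HMM phoneme models have several states and mixture components). It uses common closed-form definitions that may differ from the dissertation's: symmetric KL (the sum of both directions), Mahalanobis over the averaged covariance, Jeffreys-Matusita as sqrt(2(1 - e^(-B))) with B the Bhattacharyya distance, and L2 as the square root of the integrated squared difference between the two densities. The function names and toy phoneme labels are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumption: a phoneme model is one diagonal-covariance Gaussian,
# stored as (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


def _dims(p: Gaussian, q: Gaussian):
    """Yield (mean_p, mean_q, var_p, var_q) for each feature dimension."""
    return zip(p[0], q[0], p[1], q[1])


def euclidean(p: Gaussian, q: Gaussian) -> float:
    """Euclidean distance between the mean vectors (variances ignored)."""
    return math.sqrt(sum((a - b) ** 2 for a, b, _, _ in _dims(p, q)))


def mahalanobis(p: Gaussian, q: Gaussian) -> float:
    """Mahalanobis distance between the means, using the averaged covariance."""
    return math.sqrt(sum((a - b) ** 2 / ((u + v) / 2) for a, b, u, v in _dims(p, q)))


def kl(p: Gaussian, q: Gaussian) -> float:
    """Kullback-Leibler divergence KL(p || q) between diagonal Gaussians."""
    return 0.5 * sum(math.log(v / u) + (u + (a - b) ** 2) / v - 1.0
                     for a, b, u, v in _dims(p, q))


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrised KL: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def bhattacharyya(p: Gaussian, q: Gaussian) -> float:
    """Bhattacharyya distance between diagonal Gaussians."""
    total = 0.0
    for a, b, u, v in _dims(p, q):
        s = (u + v) / 2  # averaged variance in this dimension
        total += (a - b) ** 2 / (8 * s) + 0.5 * math.log(s / math.sqrt(u * v))
    return total


def jeffreys_matusita(p: Gaussian, q: Gaussian) -> float:
    """Jeffreys-Matusita distance, bounded above by sqrt(2)."""
    # Clamp guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - math.exp(-bhattacharyya(p, q)))))


def _overlap(p: Gaussian, q: Gaussian) -> float:
    """Closed form of the integral of N(x; p) * N(x; q) over x."""
    result = 1.0
    for a, b, u, v in _dims(p, q):
        s = u + v
        result *= math.exp(-(a - b) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
    return result


def l2(p: Gaussian, q: Gaussian) -> float:
    """L2 distance between the densities: sqrt of the integral of (p - q)^2."""
    sq = _overlap(p, p) + _overlap(q, q) - 2 * _overlap(p, q)
    return math.sqrt(max(0.0, sq))  # clamp tiny negative rounding errors


def map_phonemes(target: Dict[str, Gaussian], source: Dict[str, Gaussian],
                 distance: Callable[[Gaussian, Gaussian], float]) -> Dict[str, str]:
    """Map every target phoneme to the source phoneme at the smallest distance."""
    return {t: min(source, key=lambda s: distance(target[t], source[s]))
            for t in target}


if __name__ == "__main__":
    # Toy two-dimensional models with hypothetical labels.
    source = {"s": ([0.0, 0.0], [1.0, 1.0]), "a": ([5.0, 5.0], [1.0, 1.0])}
    target = {"s_af": ([0.2, -0.1], [1.1, 0.9]), "a_af": ([4.8, 5.3], [1.0, 1.2])}
    print(map_phonemes(target, source, bhattacharyya))  # prints {'s_af': 's', 'a_af': 'a'}
```

Passing a different function as `distance` is all it takes to compare measures, which mirrors the comparison the abstract describes; only the choice of measure changes, not the nearest-neighbour mapping rule.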
By selecting the most appropriate distance measure, phonemes can be mapped automatically from the source language to the target language. The best distance measure yields recognition rates comparable to those of a manual mapping carried out by a phonetic expert. The study also investigates the effect of the number of Gaussian mixture components on the mapping and on the recogniser's performance; the results indicate that performance increases with the number of mixtures, up to a limit. The study further examines the effect of excluding the delta and acceleration Mel-frequency cepstral coefficients, and finds that including these temporal features improves both the mapping and the recogniser's phoneme recognition rate. Experiments on the number of HMM states show that single-state HMMs give the best cross-language phoneme recognition results. After the mapping, speaker adaptation strategies are applied to the recognisers to improve their target-language performance: the models of a fully trained source-language recogniser are adapted to target-language models using Maximum Likelihood Linear Regression (MLLR) followed by Maximum A Posteriori (MAP) adaptation, and embedded Baum-Welch re-estimation (EBWR) is then used to adapt the models further. These techniques considerably improve the phoneme recognition rate. Although MLLR and MAP have been combined in earlier speech adaptation studies, the combination of MLLR, MAP and EBWR in cross-language speech recognition is a unique contribution of this study. Finally, a data pooling technique is used to build a new recogniser from both the automatically mapped target-language phonemes and the source-language phonemes.
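The abstract gives the adaptation order (MLLR, then MAP, then embedded Baum-Welch re-estimation) but not the update equations. As an illustration only, the sketch below shows mean-only adaptation of a single Gaussian: an MLLR affine mean transform mu_hat = A mu + b, followed by the standard MAP mean update mu_map = (tau * mu_prior + sum_t gamma_t x_t) / (tau + sum_t gamma_t). The MLLR transform is assumed to have been estimated already; estimating it, adapting variances, and the final embedded Baum-Welch step are not shown. The function names, the prior weight `tau` and its default value are hypothetical choices.

```python
from typing import List, Sequence


def apply_mllr(mean: Sequence[float], A: Sequence[Sequence[float]],
               b: Sequence[float]) -> List[float]:
    """MLLR mean transform: returns A @ mean + b (transform assumed given)."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


def map_mean(prior_mean: Sequence[float], frames: Sequence[Sequence[float]],
             posteriors: Sequence[float], tau: float = 10.0) -> List[float]:
    """MAP re-estimate of a Gaussian mean.

    frames:     target-language feature vectors x_t
    posteriors: occupation probabilities gamma_t of this Gaussian per frame
    tau:        prior weight (hypothetical default); larger tau trusts the prior more
    """
    occupancy = sum(posteriors)
    return [(tau * prior_mean[d] + sum(g * x[d] for g, x in zip(posteriors, frames)))
            / (tau + occupancy)
            for d in range(len(prior_mean))]


if __name__ == "__main__":
    source_mean = [1.0, 2.0]               # mean from the source-language model
    A = [[1.0, 0.0], [0.0, 1.0]]           # toy MLLR transform (assumed estimated)
    b = [0.5, -0.5]
    prior = apply_mllr(source_mean, A, b)  # MLLR-adapted mean: [1.5, 1.5]
    frames = [[2.0, 1.0], [2.0, 1.0]]      # toy target-language frames
    print(map_mean(prior, frames, [1.0, 1.0], tau=2.0))  # prints [1.75, 1.25]
```

With little target-language data (a small total occupancy), the MAP estimate stays close to the MLLR-adapted mean; as more data become available, it moves toward the target-language sample mean. This is one reason MLLR, which shares a transform across many Gaussians, is commonly applied before MAP.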
This new recogniser demonstrates moderate bilingual phoneme recognition capabilities. The bilingual recogniser is then adapted further to the target language using MAP and embedded Baum-Welch re-estimation. This combination of adaptation techniques with the data pooling strategy is a unique application in the field of cross-language recognition. It outperforms all other techniques tested in terms of phoneme recognition rate, although its training process is considerably more time-consuming, and its phoneme recognition is only slightly poorer than that of recognisers trained and tested on the same-language database.
department	Electrical, Electronic and Computer Engineering
rights	unrestricted
dates	2013-09-07T19:22:24Z; 2005-01-11; 2004-06-10; 2006-01-11
type	Dissertation
citation	Sooful, J 2004, Automated phoneme mapping for cross-language speech recognition, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/30584>
uri	http://hdl.handle.net/2263/30584; http://upetd.up.ac.za/thesis/available/etd-01112005-131128/
mimetype	application/pdf
title	Automated phoneme mapping for cross-language speech recognition
url	http://hdl.handle.net/2263/30584 http://upetd.up.ac.za/thesis/available/etd-01112005-131128/

Similar Items