Pronunciation modelling and bootstrapping

Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006.

Bibliographic Details
Other Authors: Barnard, E.
Format: Thesis
Published: University of Pretoria 2013
_version_ 1867613454853996544
access_status_str Open Access
author2 Barnard, E.
author_browse Barnard, E.
author_facet Barnard, E.
collection Thesis
dc_rights_str_mv © 2005, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006.
format Thesis
id oai:repository.up.ac.za:2263/28611
institution University of Pretoria (South Africa)
last_indexed 2026-06-10T12:36:24.683Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2013
publishDateRange 2013
publishDateSort 2013
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/28611 Pronunciation modelling and bootstrapping Barnard, E. mdavel@csir.co.za Davel, Marelie Hattingh Grapheme-to-phoneme conversion Grapheme-to-phoneme alignment Bootstrapping UCTD Thesis (PhD (Electronic Engineering))--University of Pretoria, 2006. Bootstrapping techniques have the potential to accelerate the development of language technology resources. This is of specific importance in the developing world where language technology resources are scarce and linguistic diversity is high. In this thesis we analyse the pronunciation modelling task within a bootstrapping framework, as a case study in the bootstrapping of language technology resources. We analyse the grapheme-to-phoneme conversion task in the search for a grapheme-to-phoneme conversion algorithm that can be utilised during bootstrapping. We experiment with enhancements to the Dynamically Expanding Context algorithm and develop a new algorithm for grapheme-to-phoneme rule extraction (Default & Refine) that utilises the concept of a ‘default phoneme’ to create a cascade of increasingly specialised rules. This algorithm displays a number of attractive properties including rapid learning, language independence, good asymptotic accuracy, robustness to noise, and the production of a compact rule set. In order to have greater flexibility with regard to the various heuristic choices made during rewrite rule extraction, we define a new theoretical framework for analysing instance-based learning of rewrite rule sets. We define the concept of minimal representation graphs, and discuss the utility of these graphs in obtaining the smallest possible rule set describing a given set of discrete training data. We develop an approach for the interactive creation of pronunciation models via bootstrapping, and implement this approach in a system that integrates various of the analysed grapheme-to-phoneme alignment and conversion algorithms. 
The focus of this work is on combining machine learning and human intervention in such a way as to minimise the amount of human effort required during bootstrapping, and a generic framework for the analysis of this process is defined. Practical tools that support the bootstrapping process are developed and the efficiency of the process is analysed from both a machine learning and a human factors perspective. We find that even linguistically untrained users can use the system to create electronic pronunciation dictionaries accurately, in a fraction of the time the traditional approach requires. We create new dictionaries in a number of languages (isiZulu, Afrikaans and Sepedi) and demonstrate the utility of these dictionaries by incorporating them in speech technology systems. Electrical, Electronic and Computer Engineering unrestricted 2013-09-07T13:49:04Z 2005-10-11 2013-09-07T13:49:04Z 2005-08-01 2006-10-11 2005-10-11 Thesis Davel, M 2005, Pronunciation modelling and bootstrapping, PhD thesis, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/28611 > http://hdl.handle.net/2263/28611 http://upetd.up.ac.za/thesis/available/etd-10112005-150530/ © 2005, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle Grapheme-to-phoneme conversion
Grapheme-to-phoneme alignment
Bootstrapping
UCTD
Pronunciation modelling and bootstrapping
title Pronunciation modelling and bootstrapping
title_full Pronunciation modelling and bootstrapping
title_fullStr Pronunciation modelling and bootstrapping
title_full_unstemmed Pronunciation modelling and bootstrapping
title_short Pronunciation modelling and bootstrapping
title_sort pronunciation modelling and bootstrapping
topic Grapheme-to-phoneme conversion
Grapheme-to-phoneme alignment
Bootstrapping
UCTD
url http://hdl.handle.net/2263/28611
http://upetd.up.ac.za/thesis/available/etd-10112005-150530/