Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems

Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2011.

Bibliographic Details
Main Author:	Goussard, George Willem
Other Authors:	Niesler, T. R.
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : University of Stellenbosch 2011
Subjects:	Automatic speech recognition; Pronunciation dictionary; TIMIT; Automatically determined phonetic baseforms; Dissertations > Electronic engineering; Theses > Electronic engineering; Natural language processing (Computer science); Speech processing systems

_version_	1867614050051948544
access_status_str	Open Access
author	Goussard, George Willem
author2	Niesler, T. R.
author_browse	Goussard, George Willem Niesler, T. R.
author_facet	Niesler, T. R. Goussard, George Willem
author_sort	Goussard, George Willem
collection	Thesis
dc_rights_str_mv	University of Stellenbosch
description	Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2011.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/6686
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:45:52.267Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2011
publishDateRange	2011
publishDateSort	2011
publisher	Stellenbosch : University of Stellenbosch
publisherStr	Stellenbosch : University of Stellenbosch
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/6686 Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems Goussard, George Willem Niesler, T. R. University of Stellenbosch. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Automatic speech recognition Pronunciation dictionary TIMIT Automatically determined phonetic baseforms Dissertations -- Electronic engineering Theses -- Electronic engineering Natural language processing (Computer science) Speech processing systems Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2011. ENGLISH ABSTRACT: This thesis presents a system that is designed to replace the manual process of generating a pronunciation dictionary for use in automatic speech recognition. The proposed system has several stages. The first stage segments the audio into what will be known as the subword units, using a frequency domain method. In the second stage, dynamic time warping is used to determine the similarity between the segments of each possible pair of these acoustic segments. These similarities are used to cluster similar acoustic segments into acoustic clusters. The final stage derives a pronunciation dictionary from the orthography of the training data and corresponding sequence of acoustic clusters. This process begins with an initial mapping between words and their sequence of clusters, established by Viterbi alignment with the orthographic transcription. The dictionary is refined iteratively by pruning redundant mappings, hidden Markov model estimation and Viterbi re-alignment in each iteration. This approach is evaluated experimentally by applying it to two subsets of the TIMIT corpus. It is found that, when test words are repeated often in the training material, the approach leads to a system whose accuracy is almost as good as one trained using the phonetic transcriptions. 
When test words are not repeated often in the training set, the proposed approach leads to better results than those achieved using the phonetic transcriptions, although the recognition is poor overall in this case. AFRIKAANSE OPSOMMING: Die doelwit van die tesis is om ’n stelsel te beskryf wat ontwerp is om die handgedrewe proses in die samestelling van ’n woordeboek, vir die gebruik in outomatiese spraakherkenningsstelsels, te vervang. Die voorgestelde stelsel bestaan uit ’n aantal stappe. Die eerste stap is die segmentering van die oudio in sogenaamde sub-woord eenhede deur gebruik te maak van ’n frekwensie gebied tegniek. Met die tweede stap word die dinamiese tydverplasingsalgoritme ingespan om die ooreenkoms tussen die segmente van elkeen van die moontlike pare van die akoestiese segmente bepaal. Die ooreenkomste word dan gebruik om die akoestiese segmente te groepeer in akoestiese groepe. Die laaste stap stel die woordeboek saam deur gebruik te maak van die ortografiese transkripsie van afrigtingsdata en die ooreenstemmende reeks akoestiese groepe. Die finale stap begin met ’n aanvanklike afbeelding vanaf woorde tot hul reeks groep identifiseerders, bewerkstellig deur Viterbi belyning en die ortografiese transkripsie. Die woordeboek word iteratief verfyn deur oortollige afbeeldings te snoei, verskuilde Markov modelle af te rig en deur Viterbi belyning te gebruik in elke iterasie. Die benadering is getoets deur dit eksperimenteel te evalueer op twee subversamelings data vanuit die TIMIT korpus. Daar is bevind dat, wanneer woorde herhaal word in die afrigtingsdata, die stelsel se benadering die akkuraatheid ewenaar van ’n stelsel wat met die fonetiese transkripsie afgerig is. As die woorde nie herhaal word in die afrigtingsdata nie, is die akkuraatheid van die stelsel se benadering beter as wanneer die stelsel afgerig word met die fonetiese transkripsie, alhoewel die akkuraatheid in die algemeen swak is. 
2011-02-28T08:39:28Z 2011-03-14T08:31:22Z 2011-02-28T08:39:28Z 2011-03-14T08:31:22Z 2011-03 Thesis http://hdl.handle.net/10019.1/6686 en_ZA University of Stellenbosch 71 p. : ill. application/pdf Stellenbosch : University of Stellenbosch
spellingShingle	Automatic speech recognition Pronunciation dictionary TIMIT Automatically determined phonetic baseforms Dissertations -- Electronic engineering Theses -- Electronic engineering Natural language processing (Computer science) Speech processing systems Goussard, George Willem Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
title	Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
title_full	Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
title_fullStr	Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
title_full_unstemmed	Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
title_short	Unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
title_sort	unsupervised clustering of audio data for acoustic modelling in automatic speech recognition systems
topic	Automatic speech recognition Pronunciation dictionary TIMIT Automatically determined phonetic baseforms Dissertations -- Electronic engineering Theses -- Electronic engineering Natural language processing (Computer science) Speech processing systems
url	http://hdl.handle.net/10019.1/6686
work_keys_str_mv	AT goussardgeorgewillem unsupervisedclusteringofaudiodataforacousticmodellinginautomaticspeechrecognitionsystems
