Full Text Available

Phoneme duration modelling for speaker verification

Dissertation (MEng)--University of Pretoria, 2009.

Bibliographic Details
Other Authors:	Barnard, E.
Format:	Thesis
Published:	University of Pretoria 2013
Subjects:	Eigen vectors Speech rate normalization Speaker verification Phoneme durations Duration modeling Prosodic features Hidden markov models Gaussian mixture models Maximum likelihood UCTD

_version_	1867613696693370880
access_status_str	Open Access
author2	Barnard, E.
author_browse	Barnard, E.
author_facet	Barnard, E.
collection	Thesis
dc_rights_str_mv	©University of Pretoria 2008 Please cite as follows Van Heerden, CJ 2008, Phoneme duration modelling for speaker verification, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-06262009-150945/ > E1309/
description	Dissertation (MEng)--University of Pretoria, 2009.
format	Thesis
id	oai:repository.up.ac.za:2263/25869
institution	University of Pretoria (South Africa)
last_indexed	2026-06-10T12:40:15.382Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate	2013
publishDateRange	2013
publishDateSort	2013
publisher	University of Pretoria
publisherStr	University of Pretoria
record_format	dspace
source_str	UPSpace — University of Pretoria Institutional Repository
spelling	oai:repository.up.ac.za:2263/25869 Phoneme duration modelling for speaker verification Barnard, E. cvheerden@gmail.com Van Heerden, Charl Johannes Eigen vectors Speech rate normalization Speaker verification Phoneme durations Duration modeling Prosodic features Hidden markov models Gaussian mixture models Maximum likelihood UCTD Dissertation (MEng)--University of Pretoria, 2009. Higher-level features are considered to be a potential remedy against transmission line and cross-channel degradations, currently some of the biggest problems associated with speaker verification. Phoneme durations in particular are not altered by these factors; thus a robust duration model will be a particularly useful addition to traditional cepstral based speaker verification systems. In this dissertation we investigate the feasibility of phoneme durations as a feature for speaker verification. Simple speaker specific triphone duration models are created to statistically represent the phoneme durations. Durations are obtained from an automatic hidden Markov model (HMM) based automatic speech recognition system and are modeled using single mixture Gaussian distributions. These models are applied in a speaker verification system (trained and tested on the YOHO corpus) and found to be a useful feature, even when used in isolation. When fused with acoustic features, verification performance increases significantly. A novel speech rate normalization technique is developed in order to remove some of the inherent intra-speaker variability (due to differing speech rates). Speech rate variability has a negative impact on both speaker verification and automatic speech recognition. Although the duration modelling seems to benefit only slightly from this procedure, the fused system performance improvement is substantial. Other factors known to influence the duration of phonemes are incorporated into the duration model. 
Utterance-final lengthening is known to be a consistent effect and thus “position in sentence” is modeled. “Position in word” is also modeled since triphones do not provide enough contextual information. This is found to improve performance since some vowels’ durations are particularly sensitive to their position in the word. Data scarcity becomes a problem when building speaker specific duration models. By using information from available data, unknown durations can be predicted in an attempt to overcome the data scarcity problem. To this end we develop a novel approach to predict unknown phoneme durations from the values of known phoneme durations for a particular speaker, based on the maximum likelihood criterion. This model is based on the observation that phonemes from the same broad phonetic class tend to co-vary strongly, but that there are also significant cross-class correlations. This approach is tested on the TIMIT corpus and found to be more accurate than using back-off techniques. Electrical, Electronic and Computer Engineering unrestricted 2013-09-07T01:04:42Z 2009-06-29 2013-09-07T01:04:42Z 2009-04-15 2009-06-29 2009-06-26 Dissertation 2008 Please cite as follows Van Heerden, CJ 2008, Phoneme duration modelling for speaker verification, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/25869 > E1309/gm http://hdl.handle.net/2263/25869 http://upetd.up.ac.za/thesis/available/etd-06262009-150945/ ©University of Pretoria 2008 Please cite as follows Van Heerden, CJ 2008, Phoneme duration modelling for speaker verification, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-06262009-150945/ > E1309/ application/pdf University of Pretoria
spellingShingle	Eigen vectors Speech rate normalization Speaker verification Phoneme durations Duration modeling Prosodic features Hidden markov models Gaussian mixture models Maximum likelihood UCTD Phoneme duration modelling for speaker verification
title	Phoneme duration modelling for speaker verification
title_full	Phoneme duration modelling for speaker verification
title_fullStr	Phoneme duration modelling for speaker verification
title_full_unstemmed	Phoneme duration modelling for speaker verification
title_short	Phoneme duration modelling for speaker verification
title_sort	phoneme duration modelling for speaker verification
topic	Eigen vectors Speech rate normalization Speaker verification Phoneme durations Duration modeling Prosodic features Hidden markov models Gaussian mixture models Maximum likelihood UCTD
url	http://hdl.handle.net/2263/25869 http://upetd.up.ac.za/thesis/available/etd-06262009-150945/
