Large-Scale clustering of acoustic segments for sub-word acoustic modelling

Thesis (PhD)--Stellenbosch University, 2019.

Bibliographic Details
Main Author: Lerato, Lerato
Other Authors: Niesler, T. R.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2019
_version_ 1867613963347296256
access_status_str Open Access
author Lerato, Lerato
author2 Niesler, T. R.
author_browse Lerato, Lerato
Niesler, T. R.
author_facet Niesler, T. R.
Lerato, Lerato
author_sort Lerato, Lerato
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (PhD)--Stellenbosch University, 2019.
format Thesis
id oai:scholar.sun.ac.za:10019.1/105757
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:44:29.748Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2019
publishDateRange 2019
publishDateSort 2019
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/105757 Large-Scale clustering of acoustic segments for sub-word acoustic modelling Lerato, Lerato Niesler, T. R. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Large-Scale Clustering; Acoustic Segments; Sub-word; Acoustic Modelling Automatic speech recognition Agglomerations Acoustical engineering UCTD Thesis (PhD)--Stellenbosch University, 2019. ENGLISH ABSTRACT: A pronunciation dictionary is one of the key building blocks in automatic speech recognition (ASR) systems. However, pronunciation dictionaries used in state-of-the-art ASR systems are hand-crafted by linguists. This process requires expertise, time and funding and as a consequence is not realised for many under-resourced languages. To address this, we develop a new unsupervised agglomerative hierarchical clustering (AHC) algorithm that can be used to discover sub-word units that can in turn be used for the automatic induction of a pronunciation dictionary. The new algorithm, named multi-stage agglomerative hierarchical clustering (MAHC), addresses the O(N²) memory and computation complexity observed when classical AHC is applied to large datasets. MAHC splits the data into independent subsets and applies AHC to each. The resultant clusters are merged, re-divided into subsets, and passed to a following iteration. Results show that MAHC can match and even surpass the performance of classical AHC. Furthermore, MAHC can automatically determine the optimal number of clusters, which is a feature not offered by most other approaches. A further refinement of MAHC, termed MAHC with memory size management (MAHC+M), addresses the case where some subsets may exhibit excessive growth during iterative clustering. MAHC+M is able to adhere to maximum memory constraints, which improves efficiency and is practically useful when using parallel computing resources.
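The split, cluster, merge and re-divide loop described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the average-linkage criterion, the fixed distance `threshold` used to stop merging, the use of medoids as cluster representatives, the random re-division, and the final pass over all representatives are assumptions made only for illustration, and the names `ahc`, `medoid` and `mahc` are hypothetical. The automatic selection of the number of clusters and the memory management of MAHC+M are not reproduced here.

```python
import random

def ahc(items, dist, threshold):
    """Average-linkage AHC over `items` (indices into `dist`).

    Assumption: merging stops once the closest pair of clusters is
    further apart than `threshold`.
    """
    clusters = [[i] for i in items]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

def medoid(cluster, dist):
    """Item with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))

def mahc(dist, n_subsets, threshold, n_iterations=2, seed=0):
    """Multi-stage sketch: only one subset's distances are needed at a time."""
    rng = random.Random(seed)
    # members[r] lists the original items currently represented by r.
    members = {i: [i] for i in range(len(dist))}
    for it in range(n_iterations):
        reps = list(members)
        rng.shuffle(reps)  # assumption: random re-division into subsets
        # Assumption: the last iteration clusters all representatives together.
        parts = 1 if it == n_iterations - 1 else n_subsets
        subsets = [reps[k::parts] for k in range(parts)]
        merged = {}
        for subset in subsets:
            for cluster in ahc(subset, dist, threshold):
                rep = medoid(cluster, dist)
                merged[rep] = [x for r in cluster for x in members[r]]
        members = merged
    return list(members.values())
```

Because each call to `ahc` only touches the pairwise distances within one subset, the largest distance matrix needed in the early iterations is the size of a subset rather than of the full dataset, which is the motivation given in the abstract.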
The input to MAHC is a matrix of pairwise distances computed with dynamic time warping (DTW). A modified form of DTW, named feature trajectory DTW (FTDTW), is introduced and shown to generally lead to better performance for both MAHC and MAHC+M. It is shown that clusters obtained using the MAHC algorithm can be used as sub-word units (SWUs) for acoustic modelling. Pronunciations in terms of these SWUs were obtained by alignment with the orthography. Speech recognition experiments show that dictionaries induced using clusters obtained by FTDTW-based MAHC+M consistently outperform those obtained using DTW-based MAHC. Doctoral 2019-02-01T07:48:22Z 2019-04-17T08:11:43Z 2019-02-01T07:48:22Z 2019-04-17T08:11:43Z 2019-04 Thesis http://hdl.handle.net/10019.1/105757 en_ZA Stellenbosch University 125 pages application/pdf Stellenbosch : Stellenbosch University
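A pairwise DTW distance matrix of the kind the abstract names as the input to MAHC can be sketched as follows. This is classical DTW only, written under stated assumptions: Euclidean frame distance, the standard three-way step pattern, and normalisation by the summed segment lengths. The function names `dtw_distance` and `pairwise_distances` are hypothetical, and the FTDTW variant introduced in the thesis is not reproduced because the record does not describe it.

```python
import math

def dtw_distance(a, b):
    """DTW cost between two segments, each a list of feature vectors.

    Assumptions: Euclidean frame distance and normalisation by len(a) + len(b).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three permitted predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

def pairwise_distances(segments):
    """Symmetric matrix of DTW distances between all pairs of segments."""
    k = len(segments)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(segments[i], segments[j])
    return dist
```

The matrix has one entry per pair of segments, so its size grows quadratically with the number of segments; this is the O(N²) cost that motivates splitting the data into subsets.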
spellingShingle Large-Scale Clustering; Acoustic Segments; Sub-word; Acoustic Modelling
Automatic speech recognition
Agglomerations
Acoustical engineering
UCTD
Lerato, Lerato
Large-Scale clustering of acoustic segments for sub-word acoustic modelling
title Large-Scale clustering of acoustic segments for sub-word acoustic modelling
title_full Large-Scale clustering of acoustic segments for sub-word acoustic modelling
title_fullStr Large-Scale clustering of acoustic segments for sub-word acoustic modelling
title_full_unstemmed Large-Scale clustering of acoustic segments for sub-word acoustic modelling
title_short Large-Scale clustering of acoustic segments for sub-word acoustic modelling
title_sort large scale clustering of acoustic segments for sub word acoustic modelling
topic Large-Scale Clustering; Acoustic Segments; Sub-word; Acoustic Modelling
Automatic speech recognition
Agglomerations
Acoustical engineering
UCTD
url http://hdl.handle.net/10019.1/105757
work_keys_str_mv AT leratolerato largescaleclusteringofacousticsegmentsforsubwordacousticmodelling