Full Text Available

End-to-end automated speech recognition using a character based small scale transformer architecture.

Dissertation (MEng(Electronic Engineering))--University of Pretoria, 2024.

Bibliographic Details
Other Authors:	De Villiers, Pieter
Format:	Thesis
Language:	English
Published:	University of Pretoria 2024
Subjects:	UCTD Speech recognition transformer end-to-end character based connectionist temporal classification

_version_	1867613666243772416
access_status_str	Open Access
author2	De Villiers, Pieter
author_browse	De Villiers, Pieter
author_facet	De Villiers, Pieter
collection	Thesis
dc_rights_str_mv	© 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description	Dissertation (MEng(Electronic Engineering))--University of Pretoria, 2024.
format	Thesis
id	oai:repository.up.ac.za:2263/94605
institution	University of Pretoria (South Africa)
language	English
last_indexed	2026-06-10T12:39:46.144Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate	2024
publishDateRange	2024
publishDateSort	2024
publisher	University of Pretoria
publisherStr	University of Pretoria
record_format	dspace
source_str	UPSpace — University of Pretoria Institutional Repository
spelling	oai:repository.up.ac.za:2263/94605 End-to-end automated speech recognition using a character based small scale transformer architecture. De Villiers, Pieter alex.loubser@gmail.com De Freitas, Allan Loubser, Alexander UCTD Speech recognition transformer end-to-end character based connectionist temporal classification Dissertation (MEng(Electronic Engineering))--University of Pretoria, 2024. This study explores the feasibility of constructing a small-scale speech recognition system capable of competing with larger, modern automated speech recognition (ASR) systems in both performance and word error rate (WER). Our central hypothesis posits that a compact transformer-based ASR model can yield comparable results, specifically in terms of WER, to traditional ASR models while challenging contemporary ASR systems that boast significantly larger computational sizes. The aim is to extend ASR capabilities to under-resourced languages with limited corpora, catering to scenarios where practitioners face constraints in both data availability and computational resources. The model, comprising a compact convolutional neural network (CNN) and transformer architecture with 2.214 million parameters, challenges the conventional wisdom that large-scale transformer-based ASR systems are essential for achieving high accuracy. In comparison, contemporary ASR systems often deploy over 300 million parameters. Trained on a modest dataset of approximately 3000 hours—significantly less than the 50,000 hours used in larger systems—the proposed model leverages the Common Voice and LibriSpeech datasets. Evaluation on the LibriSpeech test-clean and test-other datasets produced character error rates (CERs) of 6.40% and 16.73% and WERs of 16.03% and 35.51% respectively. Comparisons with existing architectures showcase the efficiency of our model. 
A gated recurrent unit (GRU) architecture, albeit achieving lower error rates, incurred a computational cost 24 times larger than our proposed model. Large-scale transformer architectures, while achieving marginally lower WERs (2-4% on LibriSpeech test-clean), require 200 times more parameters and 53,000 additional hours of training data. Modern large language models are used to improve the WERs, but require large computational resources. To further enhance performance, a small 4-gram language model was integrated into our end-to-end ASR model, resulting in improved WERs. The overarching goal of this work is to provide a practical solution for practitioners dealing with limited datasets and computational resources, particularly in the context of under-resourced languages. MultiChoice Chair of Machine Learning Electrical, Electronic and Computer Engineering Masters of Engineering (Electronic Engineering) Unrestricted Faculty of Engineering, Built Environment and Information Technology SDG-09: Industry, innovation and infrastructure 2024-02-14T12:45:07Z 2024-02-14T12:45:07Z 2024-04-29 2024-02-12 Dissertation * April 2024 (A2024) http://hdl.handle.net/2263/94605 https://doi.org/10.25403/UPresearchdata.25217993 en © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle	UCTD Speech recognition transformer end-to-end character based connectionist temporal classification End-to-end automated speech recognition using a character based small scale transformer architecture.
title	End-to-end automated speech recognition using a character based small scale transformer architecture.
title_full	End-to-end automated speech recognition using a character based small scale transformer architecture.
title_fullStr	End-to-end automated speech recognition using a character based small scale transformer architecture.
title_full_unstemmed	End-to-end automated speech recognition using a character based small scale transformer architecture.
title_short	End-to-end automated speech recognition using a character based small scale transformer architecture.
title_sort	end to end automated speech recognition using a character based small scale transformer architecture
topic	UCTD Speech recognition transformer end-to-end character based connectionist temporal classification
url	http://hdl.handle.net/2263/94605 https://doi.org/10.25403/UPresearchdata.25217993
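
Note on the reported metrics: the abstract in this record gives its results as character error rates (CER) and word error rates (WER). The sketch below shows how these two metrics are conventionally computed, as the edit distance between a reference transcript and a hypothesis divided by the reference length. It is an illustration only, not the dissertation's evaluation code; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and text normalisation (casing, punctuation) is assumed to have been done beforehand.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"WER: {wer(ref, hyp):.4f}")
    print(f"CER: {cer(ref, hyp):.4f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically higher than CER for a character-based model, which is consistent with the gap between the two figures quoted in the abstract.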

Similar Items