Full Text Available

Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings

Herreilers, J. T. 2025. Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8a75534d-2349-4a64-88c5-44181976dc38

Bibliographic Details
Main Author:	Herreilers, Julian Thomas
Other Authors:	Niesler, T. R.
Format:	Thesis
Published:	Stellenbosch : Stellenbosch University 2025
Subjects:	Radio broadcasting > Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD

_version_	1867613950704615424
access_status_str	Open Access
author	Herreilers, Julian Thomas
author2	Niesler, T. R.
author_browse	Herreilers, Julian Thomas Niesler, T. R.
author_facet	Niesler, T. R. Herreilers, Julian Thomas
author_sort	Herreilers, Julian Thomas
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Herreilers, J. T. 2025. Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8a75534d-2349-4a64-88c5-44181976dc38
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/132431
institution	Stellenbosch University (South Africa)
last_indexed	2026-06-10T12:44:17.380Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2025
publishDateRange	2025
publishDateSort	2025
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/132431 Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings Herreilers, Julian Thomas Niesler, T. R. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Radio broadcasting -- Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD Herreilers, J. T. 2025. Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8a75534d-2349-4a64-88c5-44181976dc38 Thesis (MEng)--Stellenbosch University, 2025. ENGLISH ABSTRACT: In parts of Africa without sufficient internet connectivity, the monitoring of local and community radio broadcasts has proved to be a successful strategy for aid agencies to manage humanitarian relief efforts. However, the languages spoken in the affected areas are often severely under-resourced, making approaches that require labelled speech datasets such as ASR impractical. We investigated how very low-resource keyword spotting (KWS) systems can be improved by incorporating pre-trained self-supervised models, trained on numerous other well-resourced languages, using two approaches requiring only a handful of isolated keyword instances in the target language to operate. We focused our developments on Luganda and Bambara, two low-resource African languages, and also English. Initial experiments demonstrated that incorporating features extracted by these self-supervised models for conventional dynamic time warping (DTW) based KWS improved performance for Bambara and English. However, DTW has serious performance and computational limitations, motivating the subsequent investigation of acoustic word embeddings (AWEs) as a more computationally efficient alternative.
The first two embedding approaches derived AWEs directly from the features extracted by multilingual self-supervised models. Both approaches surpassed an existing baseline, and the mean-pooling of features to construct AWEs proved superior and substantially more efficient, particularly from mHuBERT-147. Next, we examined whether KWS could be enhanced by using strictly trained multilingual AWE models. We showed that AWE models which contrastively optimise the embedding space in which KWS is performed are better suited than a previous reconstruction-based approach. Furthermore, these supervised contrastive AWE models, trained on better-resourced African languages, also outperformed mean-pooled AWEs when the target language (e.g. Bambara) was not part of the self-supervised model’s corpus. Finally, we proposed a new transformer-encoder-only contrastive AWE model. This Contrastive Transformer achieved state-of-the-art performance on our KWS task, with mean average precision improvements of 41% and 27% upon an existing DTW baseline for Luganda and Bambara, respectively. This was the best performance among all the approaches we considered. We conclude that, when trained AWEs are unavailable, very low-resource keyword spotting using AWEs derived directly using self-supervised models is a feasible approach. Furthermore, utilising contrastive multilingual AWE models offers superior performance, especially when the target languages are highly under-resourced. AFRIKAANSE OPSOMMING: In dele van Afrika sonder voldoende internetverbinding het die monitering van plaaslike en gemeenskapsradio-uitsendings ’n suksesvolle strategie vir hulpagentskappe bewys om humanitêre hulpverlening te bestuur. Die tale wat in die geaffekteerde gebiede gepraat word, is egter dikwels erg onderbeskik, wat benaderings wat geëtiketteerde spraakdatastelle benodig soos ASR onprakties maak.
Ons het ondersoek hoe baie lae-hulpbron sleutelwoordopsporings (KWS) stelsels verbeter kan word deur vooraf-opgeleide selftoesigmodelle, opgelei in ander tale met beter hulpbronne, in te sluit deur gebruik te maak van twee benaderings wat slegs ’n handvol geïsoleerde sleutelwoordgevalle in die teikentaal vereis om te funksioneer. Ons het ons werk gefokus op Luganda en Bambara, twee lae-hulpbron Afrikatale, sowel as Engels. Aanvanklike eksperimente het getoon dat die insluiting van kenmerke wat deur hierdie selftoesigmodelle onttrek word vir konvensionele dinamiese tydsaanpassing (DTW) gebaseerde KWS prestasie vir Bambara en Engels verbeter het. DTW het egter ernstige prestasie- en berekeningsbeperkings, wat die daaropvolgende ondersoek van akoestiese woordinbeddings (AWEs) as ’n meer berekeningsdoeltreffende alternatief motiveer. Die eerste twee inbeddingsbenaderings het AWEs direk afgelei van die kenmerke wat deur meertalige selftoesigmodelle onttrek is. Albei benaderings het ’n bestaande basislyn oortref, en die gemiddelde samevoeging van kenmerke om AWEs te konstrueer was meer effektief/akkuraat en aansienlik meer doeltreffend, veral vanaf mHuBERT-147. Vervolgens het ons ondersoek of KWS verbeter kan word deur streng opgeleide meertalige AWE-modelle te gebruik. Ons het bewys dat AWE-modelle wat die inbeddingsruimte kontrastief optimaliseer waarin KWS uitgevoer word, beter geskik is as ’n vorige rekonstruksie-gebaseerde benadering. Verder het hierdie gesuperviseerde kontrastiewe AWE-modelle, opgelei op Afrikatale met beter hulpbronne, gemiddelde AWEs oortref wanneer die teikentaal (bv. Bambara) nie deel van die selftoesigmodel se korpus was nie. Laastens het ons ’n nuwe transformator-enkodeerder-alleen kontrastiewe AWE-model voorgestel. Hierdie Contrastive Transformer het die wêreldklas prestasie op ons KWS-taak behaal, met gemiddelde presisieverbeterings van 41% en 27% op ’n bestaande DTW-basislyn vir Luganda en Bambara, onderskeidelik.
Dit was die beste prestasie van al die benaderings wat ons oorweeg het. Ons kom tot die gevolgtrekking dat, wanneer opgeleide AWEs nie beskikbaar is nie, baie lae-hulpbron sleutelwoordopsporing met AWEs wat direk afgelei is deur selftoesigmodelle ’n haalbare benadering is. Verder bied die gebruik van kontrastiewe meertalige AWE-modelle voortreflike prestasie, veral wanneer die teikentale baie onderbeskik is. Masters 2025-06-06T12:11:43Z 2025-06-06T12:11:43Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132431 Stellenbosch University viii, 95 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Radio broadcasting -- Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD Herreilers, Julian Thomas Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
title	Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
title_full	Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
title_fullStr	Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
title_full_unstemmed	Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
title_short	Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
title_sort	under resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings
topic	Radio broadcasting -- Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD
url	https://scholar.sun.ac.za/handle/10019.1/132431
work_keys_str_mv	AT herreilersjulianthomas underresourcedkeywordspottingonradiobroadcastsusingcontrastiveacousticwordembeddings
