Herreilers, J. T. 2025. Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8a75534d-2349-4a64-88c5-44181976dc38
| Main Author: | Herreilers, Julian Thomas |
|---|---|
| Other Authors: | Niesler, T. R. |
| Format: | Thesis |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
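| Subjects: | Radio broadcasting -- Data processing; Acoustical engineering; Speech processing systems; Low-resource languages; Automatic speech recognition; UCTD |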
| _version_ | 1867613950704615424 |
|---|---|
| access_status_str | Open Access |
| author | Herreilers, Julian Thomas |
| author2 | Niesler, T. R. |
| author_browse | Herreilers, Julian Thomas Niesler, T. R. |
| author_facet | Niesler, T. R. Herreilers, Julian Thomas |
| author_sort | Herreilers, Julian Thomas |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Herreilers, J. T. 2025. Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8a75534d-2349-4a64-88c5-44181976dc38 |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/132431 |
| institution | Stellenbosch University (South Africa) |
| last_indexed | 2026-06-10T12:44:17.380Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/132431 Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings Herreilers, Julian Thomas Niesler, T. R. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Radio broadcasting -- Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD Herreilers, J. T. 2025. Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/8a75534d-2349-4a64-88c5-44181976dc38 Thesis (MEng)--Stellenbosch University, 2025. ENGLISH ABSTRACT: In parts of Africa without sufficient internet connectivity, the monitoring of local and community radio broadcasts has proved to be a successful strategy for aid agencies to manage humanitarian relief efforts. However, the languages spoken in the affected areas are often severely under-resourced, making approaches that require labelled speech datasets such as ASR impractical. We investigated how very low-resource keyword spotting (KWS) systems can be improved by incorporating pre-trained self-supervised models, trained on numerous other well-resourced languages, using two approaches requiring only a handful of isolated keyword instances in the target language to operate. We focused our developments on Luganda and Bambara, two low-resource African languages, and also English. Initial experiments demonstrated that incorporating features extracted by these self-supervised models for conventional dynamic time warping (DTW) based KWS improved performance for Bambara and English. However, DTW has serious performance and computational limitations, motivating the subsequent investigation of acoustic word embeddings (AWEs) as a more computationally efficient alternative. 
The first two embedding approaches derived AWEs directly from the features extracted by multilingual self-supervised models. Both approaches surpassed an existing baseline, and the mean-pooling of features to construct AWEs proved superior and substantially more efficient, particularly from mHuBERT-147. Next, we examined whether KWS could be enhanced by using strictly trained multilingual AWE models. We showed that AWE models which contrastively optimise the embedding space in which KWS is performed are better suited than a previous reconstruction-based approach. Furthermore, these supervised contrastive AWE models, trained on better-resourced African languages, also outperformed mean-pooled AWEs when the target language (e.g. Bambara) was not part of the self-supervised model’s corpus. Finally, we proposed a new transformer-encoder-only contrastive AWE model. This Contrastive Transformer achieved state-of-the-art performance on our KWS task, with mean average precision improvements of 41% and 27% upon an existing DTW baseline for Luganda and Bambara, respectively. This was the best performance among all the approaches we considered. We conclude that, when trained AWEs are unavailable, very low-resource keyword spotting using AWEs derived directly using self-supervised models is a feasible approach. Furthermore, utilising contrastive multilingual AWE models offers superior performance, especially when the target languages are highly under-resourced. AFRIKAANSE OPSOMMING: In dele van Afrika sonder voldoende internetverbinding het die monitering van plaaslike en gemeenskapsradio-uitsendings ’n suksesvolle strategie vir hulpagentskappe bewys om humanitêre hulpverlening te bestuur. Die tale wat in die geaffekteerde gebiede gepraat word, is egter dikwels erg onderbeskik, wat benaderings wat geëtiketteerde spraakdatastelle benodig soos ASR onprakties maak. 
Ons het ondersoek hoe baie lae-hulpbron sleutelwoordopsporings (KWS) stelsels verbeter kan word deur vooraf-opgeleide selftoesigmodelle, opgelei in ander tale met beter hulpbronne, in te sluit deur gebruik te maak van twee benaderings wat slegs ’n handvol geïsoleerde sleutelwoordgevalle in die teikentaal vereis om te funksioneer. Ons het ons werk gefokus op Luganda en Bambara, twee lae-hulpbron Afrikatale, sowel as Engels. Aanvanklike eksperimente het getoon dat die insluiting van kenmerke wat deur hierdie selftoesigmodelle onttrek word vir konvensionele dinamiese tydsaanpassing (DTW) gebaseerde KWS prestasie vir Bambara en Engels verbeter het. DTW het egter ernstige prestasie- en berekeningsbeperkings, wat die daaropvolgende ondersoek van akoestiese woordinbeddings (AWEs) as ’n meer berekeningsdoeltreffende alternatief motiveer. Die eerste twee inbeddingsbenaderings het AWEs direk afgelei van die kenmerke wat deur meertalige selftoesigmodelle onttrek is. Albei benaderings het ’n bestaande basislyn oortref, en die gemiddelde samevoeging van kenmerke om AWEs te konstrueer was meer effektief/akkuraat en aansienlik meer doeltreffend, veral vanaf mHuBERT-147. Vervolgens het ons ondersoek of KWS verbeter kan word deur streng opgeleide meertalige AWE-modelle te gebruik. Ons het bewys dat AWE-modelle wat die inbeddingsruimte kontrastief optimaliseer waarin KWS uitgevoer word, beter geskik is as ’n vorige rekonstruksie-gebaseerde benadering. Verder het hierdie gesuperviseerde kontrastiewe AWE-modelle, opgelei op Afrikatale met beter hulpbronne, gemiddelde AWEs oortref wanneer die teikentaal (bv. Bambara) nie deel van die selftoesigmodel se korpus was nie. Laastens het ons ’n nuwe transformator-enkodeerder-alleen kontrastiewe AWE-model voorgestel. Hierdie Contrastive Transformer het die wêreldklas prestasie op ons KWS-taak behaal, met gemiddelde presisieverbeterings van 41% en 27% op ’n bestaande DTW-basislyn vir Luganda en Bambara, onderskeidelik. 
Dit was die beste prestasie van al die benaderings wat ons oorweeg het. Ons kom tot die gevolgtrekking dat, wanneer opgeleide AWEs nie beskikbaar is nie, baie lae-hulpbron sleutelwoordopsporing met AWEs wat direk afgelei is deur selftoesigmodelle ’n haalbare benadering is. Verder bied die gebruik van kontrastiewe meertalige AWE-modelle voortreflike prestasie, veral wanneer die teikentale baie onderbeskik is. Masters 2025-06-06T12:11:43Z 2025-06-06T12:11:43Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132431 Stellenbosch University viii, 95 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Radio broadcasting -- Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD Herreilers, Julian Thomas Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| title | Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| title_full | Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| title_fullStr | Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| title_full_unstemmed | Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| title_short | Under-resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| title_sort | under resourced keyword spotting on radio broadcasts using contrastive acoustic word embeddings |
| topic | Radio broadcasting -- Data processing Acoustical engineering Speech processing systems Low-resource languages Automatic speech recognition UCTD |
| url | https://scholar.sun.ac.za/handle/10019.1/132431 |
| work_keys_str_mv | AT herreilersjulianthomas underresourcedkeywordspottingonradiobroadcastsusingcontrastiveacousticwordembeddings |
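
The abstract describes constructing acoustic word embeddings (AWEs) by mean-pooling frame-level features from a self-supervised model, then spotting keywords in that embedding space. The following is only a minimal sketch of that general idea, not the thesis implementation. It assumes the frame features have already been extracted (for example by a model such as mHuBERT-147) and are given as plain lists of floats. The cosine similarity measure, the sliding window, and all function and parameter names (`mean_pool`, `cosine`, `keyword_score`, `window`, `hop`) are illustrative assumptions rather than details taken from the record.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per audio frame (assumed pre-extracted)


def mean_pool(frames: Sequence[Frame]) -> List[float]:
    """Average frame-level feature vectors into one fixed-size embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(template: Sequence[Frame], utterance: Sequence[Frame],
                  window: int, hop: int = 1) -> float:
    """Best similarity between the keyword embedding and any utterance window.

    `window` and `hop` are hypothetical parameters measured in frames.
    """
    query = mean_pool(template)
    last_start = max(len(utterance) - window, 0)
    return max(
        cosine(query, mean_pool(utterance[start:start + window]))
        for start in range(0, last_start + 1, hop)
    )


if __name__ == "__main__":
    # Toy 2-dimensional "features": the keyword occupies the last two frames.
    template = [[1.0, 0.0], [1.0, 0.0]]
    utterance = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    print(keyword_score(template, utterance, window=2))
```

Because each segment is reduced to a single vector, comparing a keyword against a window costs one dot product instead of a full frame-by-frame alignment, which is in line with the efficiency motivation the abstract gives for moving from DTW to AWEs.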