Herbst, C. D. 2025. An Investigation of Generative Data Augmentation for Bioacoustics Classification. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/04515a96-6fb4-46aa-88ad-bdd23dd377b6
| Main Author: | Herbst, Charles Daniel |
|---|---|
| Other Authors: | Dufourq, E. |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
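| Subjects: | Bioacoustics -- Classification; Deep learning (Machine learning) -- Data processing; Animal sounds -- Recording and reproducing; Wildlife monitoring -- Technological innovations; UCTD |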
| _version_ | 1867613736996438016 |
|---|---|
| access_status_str | Open Access |
| author | Herbst, Charles Daniel |
| author2 | Dufourq, E. |
| author_browse | Dufourq, E. Herbst, Charles Daniel |
| author_facet | Dufourq, E. Herbst, Charles Daniel |
| author_sort | Herbst, Charles Daniel |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Herbst, C. D. 2025. An Investigation of Generative Data Augmentation for Bioacoustics Classification. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/04515a96-6fb4-46aa-88ad-bdd23dd377b6 |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/132210 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:40:53.839Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/132210 An investigation of generative data augmentation for bioacoustics classification Herbst, Charles Daniel Dufourq, E. Engelbrecht, A. P. Jeantet, L. Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering. Bioacoustics -- Classification Deep learning (Machine learning) -- Data processing Animal sounds -- Recording and reproducing Wildlife monitoring -- Technological innovations UCTD Herbst, C. D. 2025. An Investigation of Generative Data Augmentation for Bioacoustics Classification. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/04515a96-6fb4-46aa-88ad-bdd23dd377b6 Thesis (MEng)--Stellenbosch University, 2025. ENGLISH ABSTRACT: One major challenge in supervised deep learning is the need for large training datasets to achieve satisfactory generalisation performance. In the field of bioacoustics - a discipline dedicated to the recording, study and analysis of sound produced by animals - the acquisition of audio recordings from endangered animals presents a significant challenge. This is compounded by high costs, logistical constraints, and the rarity of the species in question. Typically, bioacoustics datasets have imbalanced class distributions, further complicating model training with limited examples for some rare species. To overcome this challenge, this thesis conducts an evaluation of generative models for audio augmentation. Generative models, such as variational autoencoders (VAEs) and denoising diffusion probabilistic models (DDPMs), offer the ability to create synthetic data after training on existing datasets. This thesis assesses the effectiveness of VAEs and DDPMs in augmenting various bioacoustic datasets.
The datasets used include vocalisations of the critically endangered Hainan gibbon, the world's rarest primate, as well as bird calls from the pin-tailed Whydah, a non-endangered resident breeding bird in South Africa. The generated synthetic data was assessed through visual inspection, by computing the kernel inception distance, and by comparing the distribution of the generated dataset with that of the training set. Furthermore, this thesis investigates the efficacy of using the generated dataset to train a deep learning classifier for identifying the Hainan gibbon calls or pin-tailed Whydah calls. For each species, two deep learning classifiers are used, namely, a self-designed convolutional neural network (CNN) with randomly initialised weights, and a pre-trained residual network (ResNet) model. The size of the training datasets is varied, and classification performance is compared across four scenarios, namely, no augmentation, augmentation with VAEs, augmentation with DDPMs, and standard bioacoustics augmentation methods commonly used in the literature. The results of this thesis show that standard audio augmentation methods are as effective as newer generative approaches commonly used in computer vision. Furthermore, the experiments reveal that the effectiveness of these generative approaches on more complex and sparse vocalisations - such as those in the pin-tailed Whydah dataset - is highly dependent on the amount of data used for augmentation. Considering the high computational costs of VAEs and DDPMs, this emphasises the stability of simpler techniques for building deep learning classifiers on bioacoustic datasets. The results of this thesis highlight the need for further exploration to fully understand the integration of generative models in the field of bioacoustics. Lastly, this thesis serves as a foundational stepping stone for future research in the field of computational bioacoustics.
AFRIKAANS ABSTRACT (in English translation): A major challenge in supervised deep learning is the need for large training datasets to achieve satisfactory generalisation performance. In the field of bioacoustics - a discipline devoted to the recording, study and analysis of sound produced by animals - obtaining audio recordings of endangered animals presents a significant challenge. This is aggravated by high costs, logistical constraints and the rarity of the species concerned. Typically, bioacoustics datasets have an imbalanced class distribution, which further complicates model training with limited examples for some rare species. To overcome this challenge, this thesis implements and evaluates generative models for audio augmentation. Generative models, such as variational autoencoders (VAEs) and denoising diffusion probabilistic models (DDPMs), offer the ability to create synthetic data after training on existing datasets. This thesis assesses the effectiveness of VAEs and DDPMs in augmenting various bioacoustic datasets. The datasets used include vocalisations of the critically endangered Hainan gibbon, the world's rarest primate, as well as bird calls of the pin-tailed Whydah, a non-endangered resident breeding bird in South Africa. The generated synthetic data was assessed through visual inspection, by computing the kernel inception distance, and by comparing the distribution of the generated dataset with that of the training set. Furthermore, this thesis investigates the efficacy of using the generated dataset to train a deep learning classifier for identifying Hainan gibbon calls or pin-tailed Whydah calls. For each species, two deep learning classifiers are used, namely a self-designed convolutional neural network (CNN) with randomly initialised weights, and a pre-trained residual network (ResNet) model.
The size of the training datasets is varied, and classification performance is compared across four scenarios, namely no augmentation, augmentation with VAEs, augmentation with DDPMs, and standard bioacoustics augmentation methods commonly used in the literature. The results of this thesis show that standard audio augmentation methods are just as effective as newer generative approaches commonly used in computer vision. Furthermore, the experiments show that the effectiveness of these generative approaches on more complex and sparse vocalisations - such as those in the pin-tailed Whydah dataset - is highly dependent on the amount of data used for augmentation. In light of the high computational costs of VAEs and DDPMs, this emphasises the stability of simpler techniques for building deep learning classifiers on bioacoustic datasets. The results of this thesis highlight the need for further exploration to fully understand the integration of generative models in the field of bioacoustics. Lastly, this thesis serves as a foundational stepping stone for future research in the field of computational bioacoustics. Masters 2025-05-30T06:04:10Z 2025-05-30T06:04:10Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132210 en Stellenbosch University xviii, 97 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Bioacoustics -- Classification Deep learning (Machine learning) -- Data processing Animal sounds -- Recording and reproducing Wildlife monitoring -- Technological innovations UCTD Herbst, Charles Daniel An investigation of generative data augmentation for bioacoustics classification |
| title | An investigation of generative data augmentation for bioacoustics classification |
| title_full | An investigation of generative data augmentation for bioacoustics classification |
| title_fullStr | An investigation of generative data augmentation for bioacoustics classification |
| title_full_unstemmed | An investigation of generative data augmentation for bioacoustics classification |
| title_short | An investigation of generative data augmentation for bioacoustics classification |
| title_sort | investigation of generative data augmentation for bioacoustics classification |
| topic | Bioacoustics -- Classification Deep learning (Machine learning) -- Data processing Animal sounds -- Recording and reproducing Wildlife monitoring -- Technological innovations UCTD |
| url | https://scholar.sun.ac.za/handle/10019.1/132210 |
| work_keys_str_mv | AT herbstcharlesdaniel aninvestigationofgenerativedataaugmentationforbioacousticsclassification AT herbstcharlesdaniel investigationofgenerativedataaugmentationforbioacousticsclassification |