An investigation of generative data augmentation for bioacoustics classification

Herbst, C. D. 2025. An Investigation of Generative Data Augmentation for Bioacoustics Classification. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/04515a96-6fb4-46aa-88ad-bdd23dd377b6

Bibliographic Details
Main Author: Herbst, Charles Daniel
Other Authors: Dufourq, E.
Format: Thesis
Language: English
Published: Stellenbosch : Stellenbosch University 2025
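Subjects: Bioacoustics -- Classification; Deep learning (Machine learning) -- Data processing; Animal sounds -- Recording and reproducing; Wildlife monitoring -- Technological innovations; UCTD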
_version_ 1867613736996438016
access_status_str Open Access
author Herbst, Charles Daniel
author2 Dufourq, E.
author_browse Dufourq, E.
Herbst, Charles Daniel
author_facet Dufourq, E.
Herbst, Charles Daniel
author_sort Herbst, Charles Daniel
collection Thesis
dc_rights_str_mv Stellenbosch University
description Herbst, C. D. 2025. An Investigation of Generative Data Augmentation for Bioacoustics Classification. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/04515a96-6fb4-46aa-88ad-bdd23dd377b6
format Thesis
id oai:scholar.sun.ac.za:10019.1/132210
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:40:53.839Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2025
publishDateRange 2025
publishDateSort 2025
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/132210 An investigation of generative data augmentation for bioacoustics classification Herbst, Charles Daniel Dufourq, E. Engelbrecht, A. P. Jeantet, L. Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering. Bioacoustics -- Classification Deep learning (Machine learning) -- Data processing Animal sounds -- Recording and reproducing Wildlife monitoring -- Technological innovations UCTD Herbst, C. D. 2025. An Investigation of Generative Data Augmentation for Bioacoustics Classification. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/04515a96-6fb4-46aa-88ad-bdd23dd377b6 Thesis (MEng)--Stellenbosch University, 2025. ENGLISH ABSTRACT: One major challenge in supervised deep learning is the need for large training datasets to achieve satisfactory generalisation performance. In the field of bioacoustics - a discipline dedicated to the recording, study and analysis of sound produced by animals - the acquisition of audio recordings from endangered animals presents a significant challenge. This is compounded by high costs, logistical constraints, and the rarity of the species in question. Typically, bioacoustics datasets have an imbalanced class distribution, which further complicates model training because only limited examples are available for some rare species. To overcome this challenge, this thesis conducts an evaluation of generative models for audio augmentation. Generative models, such as variational autoencoders (VAEs) and denoising diffusion probabilistic models (DDPMs), offer the ability to create synthetic data after training on existing datasets. This thesis assesses the effectiveness of VAEs and DDPMs in augmenting various bioacoustic datasets. 
The datasets used include vocalisations of the critically endangered Hainan gibbon, the world's rarest primate, as well as bird calls from the pin-tailed Whydah, a non-endangered resident breeding bird in South Africa. The generated synthetic data was assessed through visual inspection and by computing the kernel inception distance to compare the distribution of the generated dataset with that of the training set. Furthermore, this thesis investigates the efficacy of using the generated dataset to train a deep learning classifier for identifying the Hainan gibbon calls or pin-tailed Whydah calls. For each species, two deep learning classifiers are used, namely, a self-designed convolutional neural network (CNN) with randomly initialised weights, and a pre-trained residual network (ResNet) model. The size of the training datasets varied, and the classification performance is compared across four scenarios, namely, no augmentation, augmentation with VAEs, augmentation with DDPMs, and standard bioacoustics augmentation methods commonly used in the literature. The results of this thesis show that standard audio augmentation methods are as effective as newer generative approaches commonly used in computer vision. Furthermore, the experiments reveal that the effectiveness of these generative approaches on more complex and sparse vocalisations - such as those in the pin-tailed Whydah dataset - is highly dependent on the amount of data used for augmentation. Considering the high computational costs of VAEs and DDPMs, this emphasises the stability of simpler techniques for building deep learning classifiers on bioacoustic datasets. The results of this thesis highlight the need for further exploration to fully understand the integration of generative models in the field of bioacoustics. Lastly, this thesis serves as a foundational stepping stone for future research in the field of computational bioacoustics. 
AFRIKAANSE OPSOMMING: Een groot uitdaging in toesighoudende diep leer is die behoefte aan groot opleidingsdatastelle om bevredigende veralgemeningsprestasie te behaal. In die veld van bioakoestiek - 'n dissipline wat toegewy is aan die opname, studie en analise van klank wat deur diere geproduseer word - bied die verkryging van klankopnames van bedreigde diere 'n beduidende uitdaging. Dit word vererger deur hoë koste, logistieke beperkings en die seldsaamheid van die betrokke spesie. Tipies het bioakoestiekdatastelle ongebalanseerde klasverspreiding, wat modelopleiding verder kompliseer met beperkte voorbeelde vir sommige seldsame spesies. Om hierdie uitdaging te oorkom, voer hierdie tesis generatiewe modelle vir klankversterking uit en evalueer dit. Generatiewe modelle, soos variasie-outoenkodeerders (VAE's) en ruisonderdrukkende diffusie-probabilistiese modelle (DDPM's), bied die vermoë om sintetiese data te skep na opleiding op bestaande datastelle. Hierdie tesis assesseer die doeltreffendheid van VAE's en DDPM's in die vergroting van verskeie bioakoestiese datastelle. Die datastelle wat gebruik is, sluit in die vokalisering van die krities bedreigde Hainan-gibbon, die wêreld se skaarsste primaat, sowel as voëlroepe van die speldstert-Whydah, 'n inwonende broeivoël in Suid-Afrika, 'n nie-bedreigde spesie. Die gegenereerde sintetiese data is beoordeel deur visuele inspeksie en deur die kern-aanvangsafstand te bereken, en vergelyk met die verspreiding van die gegenereerde datastel na die opleidingstel. Verder ondersoek hierdie tesis die doeltreffendheid van die gebruik van die gegenereerde datastel om 'n diep leerklassifiseerder op te lei vir die identifisering van die Hainan-gibbonroepe of speldstert-Whydah-roepe. Vir elke spesie word twee diep leer klassifiseerders gebruik, naamlik 'n selfontwerpte konvolusionele neurale netwerk (CNN) met lukraak geïnisialiseerde gewigte, en 'n vooraf-opgeleide residuele netwerk (ResNet) model. 
Die grootte van die opleidingsdatastelle het gewissel en die klassifikasieprestasie oor vier scenario's word vergelyk, naamlik geen vergroting, vergroting met VAE's, vergroting met DDPM's, en standaard bioakoestiese vergrotingsmetodes wat algemeen in die literatuur gebruik word. Die resultate van hierdie tesis toon dat standaard oudio-vergrotingsmetodes net so effektief is as nuwer generatiewe benaderings wat algemeen in rekenaarvisie gebruik word. Verder toon die eksperimente dat die doeltreffendheid van hierdie generatiewe benaderings op meer komplekse en yl vokalisasies - soos dié in die speldstert-Whydah-datastel - hoogs afhanklik is van die hoeveelheid data wat vir augmentasie gebruik word. In die lig van die hoë berekeningskoste van VAE's en DDPM's, beklemtoon dit die stabiliteit van eenvoudiger tegnieke vir die bou van diep leerklassifiseerders op bioakoestiese datastelle. Die resultate van hierdie tesis beklemtoon die behoefte aan verdere eksplorasie om die integrasie van generatiewe modelle in die veld van bioakoestiek ten volle te verstaan. Laastens dien hierdie tesis as 'n fondamentele springplank vir toekomstige navorsing op die gebied van berekeningsbioakoestiek. Masters 2025-05-30T06:04:10Z 2025-05-30T06:04:10Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132210 en Stellenbosch University xviii, 97 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle Bioacoustics -- Classification
Deep learning (Machine learning) -- Data processing
Animal sounds -- Recording and reproducing
Wildlife monitoring -- Technological innovations
UCTD
Herbst, Charles Daniel
An investigation of generative data augmentation for bioacoustics classification
title An investigation of generative data augmentation for bioacoustics classification
title_full An investigation of generative data augmentation for bioacoustics classification
title_fullStr An investigation of generative data augmentation for bioacoustics classification
title_full_unstemmed An investigation of generative data augmentation for bioacoustics classification
title_short An investigation of generative data augmentation for bioacoustics classification
title_sort investigation of generative data augmentation for bioacoustics classification
topic Bioacoustics -- Classification
Deep learning (Machine learning) -- Data processing
Animal sounds -- Recording and reproducing
Wildlife monitoring -- Technological innovations
UCTD
url https://scholar.sun.ac.za/handle/10019.1/132210
work_keys_str_mv AT herbstcharlesdaniel aninvestigationofgenerativedataaugmentationforbioacousticsclassification
AT herbstcharlesdaniel investigationofgenerativedataaugmentationforbioacousticsclassification