On the theory and practice of anomaly detection in time series

Thesis (PhD)--Stellenbosch University, 2025.

Bibliographic Details
Main Author: Barrish, Daniel
Other Authors: Van Vuuren, Jan
Format: Thesis
Language: English
Published: Stellenbosch : Stellenbosch University 2025
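Subjects: Anomaly detection (Computer security); Time-series analysis; Threshold (Perception); Algorithm; Data mining -- Statistical methods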
_version_ 1867613913389989888
access_status_str Open Access
author Barrish, Daniel
author2 Van Vuuren, Jan
author_browse Barrish, Daniel
Van Vuuren, Jan
author_facet Van Vuuren, Jan
Barrish, Daniel
author_sort Barrish, Daniel
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (PhD)--Stellenbosch University, 2025.
format Thesis
id oai:scholar.sun.ac.za:10019.1/134506
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:43:41.995Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2025
publishDateRange 2025
publishDateSort 2025
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/134506 On the theory and practice of anomaly detection in time series Barrish, Daniel Van Vuuren, Jan Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering. Anomaly detection (Computer security) Time-series analysis Threshold (Perception) Algorithm Data mining -- Statistical methods Thesis (PhD)--Stellenbosch University, 2025. Barrish, D. 2025. On the Theory and Practice of Anomaly Detection in Time Series. Unpublished doctoral dissertation. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/6daf69f0-8733-47f4-a028-c5c6c0029bef ENGLISH ABSTRACT: The detection of anomalies in time series data is a critical task in numerous domains, including industrial predictive maintenance, healthcare, information technology, and finance. Progress in the field is, however, hindered by a persistent gap between theoretical research and practical application, often due to flawed benchmarking practices, misaligned evaluation metrics, and a lack of comprehensive, end-to-end frameworks. These challenges are addressed in this dissertation in which a systematic, multi-faceted investigation into the theory and practice of time series anomaly detection is documented. A rigorous critique of existing public benchmark datasets is first undertaken, which reveals significant deficiencies in respect of label accuracy, unrealistic anomaly densities, and anomaly detection triviality. In response, a new, principled archive, called the Univariate Dataset Archive for Time Series Anomaly Detection, is introduced which includes curated public datasets and two novel synthetic datasets generated from complex dynamical systems. This archive provides a more reliable foundation for evaluating and comparing anomaly detection algorithms. Conventional anomaly detection evaluation methodologies are also challenged and the so-called realistic F-score is proposed in response. 
This novel metric is designed to better reflect the practical requirements of anomaly detection systems by appropriately handling contiguous anomalous events and individual false alarms. Leveraging this improved evaluation framework, a comprehensive comparative study of anomaly scoring algorithms is further conducted during which the well-known local outlier factor algorithm is identified as a top performer. This algorithm is subsequently enhanced quite significantly by facilitating graphics processing unit acceleration and adopting an ensembling approach. The resulting improved algorithm is empirically shown to achieve state-of-the-art accuracy. The often-overlooked, yet crucial, stages of thresholding and postprocessing are also investigated systematically. Simple, layered techniques are shown to improve the utility of anomaly alerts significantly. An efficient hyperparameter optimisation strategy based on the well-known tree-structured Parzen estimator is proposed and validated in the contexts of both static and streaming data scenarios in order to automate the tuning of the numerous parameters across the anomaly detection pipeline. Finally, the aforementioned diverse research contributions are consolidated by establishing a novel anomaly detection framework called the Generic Anomaly Detection in Time Series framework. This framework is a modular, scalable, automated, online, and evolvable end-to-end pipeline designed to provide a robust and practical anomaly detection solution for real-world applications. By addressing foundational problems in benchmarking, evaluation, and optimisation, the framework establishes a rigorous and practical path forward for the field of time series anomaly detection. AFRIKAANSE OPSOMMING: Die opsporing van anomalieë in tydreeksdata is ’n kritieke taak in talle terreine, insluitend voorspellende instandhouding in die bedryf, gesondheidsorg, inligtingstegnologie en finansies. 
Vordering in die veld word egter belemmer deur ’n volgehoue gaping tussen teoretiese navorsing en praktiese toepassing, dikwels as gevolg van gebrekkige maatstafpraktyke, foutief-belynde evalueringsmaatstawwe en ’n gebrek aan omvattende, end-tot-end raamwerke. Hierdie uitdagings word in dié proefskrif aangespreek, waarin ’n sistematiese, veelsydige ondersoek na die teorie en praktyk van tydreeksanomalie-opsporing gedokumenteer word. ’n Deeglike kritiek op bestaande openbare maatstafdatastelle word eers onderneem, waaruit beduidende tekortkominge ten opsigte van etiket-akkuraatheid, onrealistiese anomalie-digthede en anomalie-opsporingstrivialiteit aan die lig kom. In reaksie hierop word ’n nuwe, beginselvaste argief, genaamd die Een-veranderlike Datastel Argief vir Tydreeksanomalie-opsporing, bekendgestel, wat saamgestelde openbare datastelle insluit, sowel as twee nuwe sintetiese datastelle wat uit komplekse dinamiese stelsels gegenereer word. Hierdie argief bied ’n meer betroubare grondslag vir die evaluering en vergelyking van anomalie-opsporingsalgoritmes. Konvensionele evalueringsmetodologieë vir anomalie-opsporing word ook bevraagteken en die sogenaamde realistiese F-telling word in reaksie voorgestel. Hierdie nuwe maatstaf is ontwerp om die praktiese vereistes van anomalie-opsporingstelsels beter te weerspieël deur aaneenlopende anomalie-voorkomste en individuele vals alarms toepaslik te hanteer. Deur van hierdie verbeterde evalueringsraamwerk gebruik te maak, word ’n omvattende vergelykende studie van anomalie-opsporingsalgoritmes verder uitgevoer waartydens die bekende plaaslike uitskieter-faktoralgoritme as ’n toppresteerder geïdentifiseer word. Hierdie algoritme word vervolgens aansienlik verbeter deur grafiese verwerkingseenheid-versnelling te bewerkstellig en ’n ensemble-benadering te volg. Daar word empiries getoon dat die gevolglike verbeterde algoritme die mees gevorderde akkuraatheid behaal. 
Die belangrike aspekte van drempelbepaling en naverwerking wat dikwels in die literatuur oor die hoof gesien word, word ook stelselmatig ondersoek. Daar word getoon dat eenvoudige, gelaagde tegnieke daartoe in staat is om die nut van anomalie-waarskuwings aansienlik te verbeter. ’n Doeltreffende hiperparameter-optimeringstrategie gebaseer op die bekende Boom-gestruktureerde Parzen-beramer word voorgestel en in die konteks van beide statiese en stroomdata-scenario’s gevalideer om sodoende die afskatting van die talle parameters in die anomalie-opsporingspyplyn te outomatiseer. Laastens word die bogenoemde diverse navorsingsbydraes saamgesnaer deur ’n nuwe anomalie-opsporingsraamwerk daar te stel wat as die Generiese Anomalie-opsporing in Tydreekse raamwerk bekendstaan. Hierdie raamwerk is ’n modulêre, skaalbare, outomatiese, aanlyn en verderontwikkelbare end-tot-end pyplyn wat ontwerp is om ’n robuuste en praktiese anomalie-opsporingsoplossing vir werklike toepassings te bied. Deur fundamentele probleme in maatstafbepaling, evaluering en optimering aan te spreek, vestig die raamwerk ’n omvattende en praktiese pad vorentoe vir die veld van tydreeksanomalie-opsporing. Doctoral 2025-12-11T11:52:10Z 2025-12-11T11:52:10Z 2025-12 Thesis https://scholar.sun.ac.za/handle/10019.1/134506 en Stellenbosch University xxxii, 294 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle Anomaly detection (Computer security)
Time-series analysis
Threshold (Perception)
Algorithm
Data mining -- Statistical methods
Barrish, Daniel
On the theory and practice of anomaly detection in time series
title On the theory and practice of anomaly detection in time series
title_full On the theory and practice of anomaly detection in time series
title_fullStr On the theory and practice of anomaly detection in time series
title_full_unstemmed On the theory and practice of anomaly detection in time series
title_short On the theory and practice of anomaly detection in time series
title_sort on the theory and practice of anomaly detection in time series
topic Anomaly detection (Computer security)
Time-series analysis
Threshold (Perception)
Algorithm
Data mining -- Statistical methods
url https://scholar.sun.ac.za/handle/10019.1/134506
work_keys_str_mv AT barrishdaniel onthetheoryandpracticeofanomalydetectionintimeseries