Sequential nonparametric estimation via Hermite series estimators

Algorithms for estimating the statistical properties of streams of data in real time, as well as for the efficient analysis of massive data sets, are becoming particularly pertinent given the increasing ubiquity of such data. In this thesis we introduce novel approaches to sequential (online) estima...

Bibliographic Details
Main Author: Stephanou, Michael Jared
Other Authors: Varughese, Melvin
Format: Thesis
Language: English
Published: Department of Statistical Sciences 2021
Subjects: statistical sciences
_version_ 1867614053675827200
access_status_str Open Access
author Stephanou, Michael Jared
author2 Varughese, Melvin
author_browse Stephanou, Michael Jared
Varughese, Melvin
author_facet Varughese, Melvin
Stephanou, Michael Jared
author_sort Stephanou, Michael Jared
collection Thesis
description Algorithms for estimating the statistical properties of streams of data in real time, as well as for the efficient analysis of massive data sets, are becoming particularly pertinent given the increasing ubiquity of such data. In this thesis we introduce novel approaches to sequential (online) estimation in both stationary and non-stationary settings based on Hermite series density estimators. In the univariate context we apply Hermite series based distribution function estimators to sequential cumulative distribution function estimation. These distribution function estimators are particularly useful because they allow the sequential estimation of the full cumulative distribution function. This is in contrast to the empirical distribution function estimator and smooth kernel distribution function estimator which only allow sequential cumulative probability estimation at predefined values on the support of the associated density function. We explore the asymptotic consistency and robustness properties of the Hermite series based cumulative distribution function estimator thereby redressing a gap in the literature. Given the sequential Hermite series based distribution function estimator, we obtain sequential quantile estimates numerically. Our algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time, in both the static and dynamic quantile estimation settings. In the bivariate context we introduce a Hermite series based sequential estimator for the Spearman's rank correlation coefficient and provide algorithms applicable in both the stationary and non-stationary settings. To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman's rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
To the best of our knowledge this is the first algorithm to be proposed for estimating a time-varying Spearman's rank correlation that does not rely on a moving window approach. We explore the practical effectiveness of the Hermite series based estimators through real data and simulation studies, demonstrating competitive performance compared to leading existing algorithms. The potential applications of this work are manifold. Our sequential distribution function and quantile estimation algorithms can be applied to real time anomaly and outlier detection, real time provisioning for future demand as well as real time risk estimation for example. The Hermite series based Spearman's rank correlation estimator can be applied to fast and robust online calculation of correlation which may vary over time. Possible machine learning applications include fast feature selection and hierarchical clustering on massive data sets amongst others.
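The univariate idea summarised in the description can be illustrated with a minimal sketch. It is not taken from the thesis: the class name SequentialHermiteEstimator and the parameters order, weight, lower and steps are hypothetical, and the thesis's own algorithms may differ (for example in how observations are standardised or how the distribution function is integrated). The sketch assumes the standard Hermite series density estimate, a truncated sum of orthonormal Hermite functions whose coefficients are running means of the functions evaluated at each observation. An exponentially weighted update is included as one plausible way to handle the non-stationary case, and arbitrary quantiles are found numerically by bisection on the estimated distribution function.

```python
import math


def hermite_functions(x, order):
    """Return [h_0(x), ..., h_order(x)], the orthonormal Hermite functions,
    computed with the standard three-term recurrence."""
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(1, order):
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k]
                 - math.sqrt(k / (k + 1)) * h[k - 1])
    return h


class SequentialHermiteEstimator:
    """Hypothetical illustration: O(order) memory, one pass over the stream."""

    def __init__(self, order=10, weight=None):
        self.order = order
        self.weight = weight  # None: plain running mean; else exponential weight
        self.n = 0
        self.coeffs = [0.0] * (order + 1)

    def update(self, x):
        """Fold one observation into the coefficient estimates."""
        self.n += 1
        h = hermite_functions(x, self.order)
        if self.weight is None or self.n == 1:
            step = 1.0 / self.n  # running mean (also initialises the weighted case)
        else:
            step = self.weight   # exponentially weighted update
        self.coeffs = [a + step * (hk - a) for a, hk in zip(self.coeffs, h)]

    def density(self, x):
        h = hermite_functions(x, self.order)
        return sum(a * hk for a, hk in zip(self.coeffs, h))

    def cdf(self, x, lower=-8.0, steps=1000):
        """Trapezoidal integral of the density estimate from `lower` to x.
        Assumes the data carry negligible mass below `lower`; the result is
        clipped to [0, 1] because a truncated series can dip below zero."""
        if x <= lower:
            return 0.0
        dx = (x - lower) / steps
        total = 0.5 * (self.density(lower) + self.density(x))
        total += sum(self.density(lower + i * dx) for i in range(1, steps))
        return min(1.0, max(0.0, total * dx))

    def quantile(self, p, lower=-8.0, upper=8.0, iterations=40):
        """Bisection on the estimated distribution function."""
        for _ in range(iterations):
            mid = 0.5 * (lower + upper)
            if self.cdf(mid) < p:
                lower = mid
            else:
                upper = mid
        return 0.5 * (lower + upper)
```

Because only the coefficient vector is stored, the distribution function and any quantile can be queried at any point in the stream without revisiting earlier observations, for example by calling update(x) for each new value and then quantile(0.9).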
format Thesis
id oai:open.uct.ac.za:11427/32998
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:45:55.950Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2021
publishDateRange 2021
publishDateSort 2021
publisher Department of Statistical Sciences
publisherStr Department of Statistical Sciences
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/32998 Sequential nonparametric estimation via Hermite series estimators Stephanou, Michael Jared Varughese, Melvin statistical sciences 2021-02-25T18:40:24Z 2021-02-25T18:40:24Z 2020 2021-02-25T18:39:46Z Doctoral Thesis Doctoral PhD http://hdl.handle.net/11427/32998 eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle statistical sciences
Stephanou, Michael Jared
Sequential nonparametric estimation via Hermite series estimators
thesis_degree_str Doctoral
title Sequential nonparametric estimation via Hermite series estimators
title_full Sequential nonparametric estimation via Hermite series estimators
title_fullStr Sequential nonparametric estimation via Hermite series estimators
title_full_unstemmed Sequential nonparametric estimation via Hermite series estimators
title_short Sequential nonparametric estimation via Hermite series estimators
title_sort sequential nonparametric estimation via hermite series estimators
topic statistical sciences
url http://hdl.handle.net/11427/32998
work_keys_str_mv AT stephanoumichaeljared sequentialnonparametricestimationviahermiteseriesestimators