Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses

Thesis (PhD (Mechanical Engineering))--University of Pretoria, 2021.

Bibliographic Details
Other Authors:	Wilke, Daniel Nicolas
Format:	Thesis
Language:	English
Published:	University of Pretoria 2022
Subjects:	UCTD Line search SNN-GPP Gradient only Neural network Approximation

_version_	1867613634728820736
access_status_str	Open Access
author2	Wilke, Daniel Nicolas
author_browse	Wilke, Daniel Nicolas
author_facet	Wilke, Daniel Nicolas
collection	Thesis
dc_rights_str_mv	© 2022 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description	Thesis (PhD (Mechanical Engineering))--University of Pretoria, 2021.
format	Thesis
id	oai:repository.up.ac.za:2263/84238
institution	University of Pretoria (South Africa)
language	English
last_indexed	2026-06-10T12:39:16.035Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate	2022
publishDateRange	2022
publishDateSort	2022
publisher	University of Pretoria
publisherStr	University of Pretoria
record_format	dspace
source_str	UPSpace — University of Pretoria Institutional Repository
spelling	oai:repository.up.ac.za:2263/84238 Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses Wilke, Daniel Nicolas u11085160@tuks.co.za Chae, Younghwan UCTD Line search SNN-GPP Gradient only Neural network Approximation Thesis (PhD (Mechanical Engineering))--University of Pretoria, 2021. Learning rate schedule parameters are sensitive and challenging hyperparameters to resolve in machine learning. They need to be resolved whenever a model, data, data preprocessing or data batching changes. Implications of poorly resolving learning rates include poor models, high computing cost, excessive training time, and excessive carbon footprint. In addition, deep neural network (DNN) architectures routinely require billions of parameters, with GPT-3 utilizing 175 billion parameters and costing an estimated 12 million USD to train. Mini-batch sub-sampling introduces bias and variance that can manifest in several ways. Considering a line search along a descent direction, the implications are smooth loss functions with large bias (static) or point-wise discontinuous loss functions with low bias but high variance in the function response. Two previous studies demonstrated that line searches have the potential to automate learning rate selection. In both cases, learning rates are resolved for point-wise discontinuous functions, using Bayesian regression and direct optimization with a gradient-only line search (GOLS). This explorative study investigates the potential of surrogates to resolve learning rates instead of direct optimization of the loss function. We aim to identify domains that warrant further investigation, for which purpose we introduce a new robustness measure to compare algorithms more sensibly. As a result, we start our surrogate investigation at the fundamental level, considering the most basic form of each approach. This isolates the essence and removes unnecessary complexity. 
We do, however, retain selected complexity that is deemed crucial, such as dynamic sub-sampling. Hence, this study is an explorative study and not yet another study that proposes a state-of-the-art (SOTA) algorithm on a carefully curated dataset with carefully curated baseline algorithms against which to compare. The three fundamentally different approaches to resolving learning rates using surrogates are: 1. the construction of one-dimensional quadratic surrogates for point-wise discontinuous functions to resolve learning rates by minimization; 2. the construction of one-dimensional classifiers to resolve learning rates from a gradient-only perspective using classification; 3. sub-dimensional surrogates (higher than 1D) on smooth loss functions to isolate the identification of appropriate bases on simple test problems. This study concludes that both 1 and 2 warrant further investigation, with the longer-term goal of extending them to sub-dimensional surrogates to enhance efficiency. Department of Mechanical and Aeronautical Engineering Mechanical and Aeronautical Engineering PhD (Mechanical Engineering) Unrestricted 2022-02-25T12:13:06Z 2022-02-25T12:13:06Z 2022-05-13 2021 Thesis * A2022 http://hdl.handle.net/2263/84238 en © 2022 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle	UCTD Line search SNN-GPP Gradient only Neural network Approximation Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses
title	Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses
title_full	Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses
title_fullStr	Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses
title_full_unstemmed	Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses
title_short	Approximation approaches for training neural network problems with dynamic mini-batch sub-sampled losses
title_sort	approximation approaches for training neural network problems with dynamic mini batch sub sampled losses
topic	UCTD Line search SNN-GPP Gradient only Neural network Approximation
url	http://hdl.handle.net/2263/84238
