Full Text Available

Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning

Thesis (PhD)--Stellenbosch University, 2026.

Bibliographic Details
Main Author:	Bras, Edward Hendrik
Other Authors:	Louw, Tobias Muller
Format:	Thesis
Language:	English
Published:	Stellenbosch : Stellenbosch University 2026

_version_	1867613751172136960
access_status_str	Open Access
author	Bras, Edward Hendrik
author2	Louw, Tobias Muller
author_browse	Bras, Edward Hendrik Louw, Tobias Muller
author_facet	Louw, Tobias Muller Bras, Edward Hendrik
author_sort	Bras, Edward Hendrik
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (PhD)--Stellenbosch University, 2026.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/135638
institution	Stellenbosch University (South Africa)
language	English
last_indexed	2026-06-10T12:41:06.301Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2026
publishDateRange	2026
publishDateSort	2026
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/135638 Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning Bras, Edward Hendrik Louw, Tobias Muller Bradshaw, Steven Martin Stellenbosch University. Faculty of Engineering. Dept. of Chemical Engineering. Thesis (PhD)--Stellenbosch University, 2026. Bras, E. H. 2026. Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning. Unpublished doctoral dissertation. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/0e5eccc1-8e1e-4021-8e48-23d785c22b03 Reinforcement learning (RL) enables the prospect of data-driven controllers that learn to select control actions optimally purely through the feedback provided by an evaluative signal (the reward). In principle, this technology may be used to develop adaptive controllers that account for plant-model mismatches, thereby unlocking untapped potential in the operation of processes in the chemical and minerals processing industries. RL relies on trial-and-error design of opaque function approximator architectures and the selection of hyperparameters. Furthermore, challenges in establishing RL agent training that is efficient and safe currently hinder the technology’s industrial adoption. The analysis of RL control is typically decoupled from well-established control principles derived from classical and optimal control research. The first objective of the study was to show that RL can be applied to the adaptive control of a visualizable control policy initialised using proportional-integral (PI) control. This included formulating an RL controller to accommodate continuous control without randomised starting states and incorporating safety into the model-free design. The second objective was to relate the tuning of this controller to generalised plant dynamics. 
Thirdly, the strengths and weaknesses of RL were assessed by tasking the controller with the adaptation of explicit model predictive control (MPC) policies. A key strategy in achieving each objective was to provide a warm start to the RL agent’s policy by fitting a shallow neural network to a policy precomputed using linear control or non-linear MPC. Subsequently, RL was applied to regulatory controller adaptation. An early actor-critic algorithm foundational to modern RL was used throughout. Model-free RL was shown to enable adaptive control of a visualizable control policy initialised to emulate PI control. By incorporating a safety mechanism in the RL agent’s control policy, control performance improvements relative to the PI controller used for initialisation were achieved efficiently and safely. The RL agent’s objective function was selected to enable continuing control, and the inclusion of additional states without losing training progress was demonstrated. The stability of the developed RL controller depends on the selection of RL hyperparameters. Process control is safety critical, and hence the selection of hyperparameters using only a priori knowledge must be investigated. The research presented leverages archetypal process dynamics to develop qualitative guidance for the selection of RL hyperparameters. First-order plus time delay, underdamped second-order plus time delay, and inverse response dynamics were considered. For each system and set of process parameters, insensitivity to hyperparameter tuning was demonstrated, provided that the actor learning rate is set below a threshold value beyond which policy divergence is likely. Fostering an improved theoretical understanding of RL agent failure is essential. The quadruple tank benchmark was applied to evaluate the model-free RL agent’s capacity to account for plant-model mismatches by adapting an explicit non-linear MPC policy. 
By slowly moving a multivariable zero to simulate plant-model mismatches, it was demonstrated that the RL agent displays clear points of failure. Furthermore, these points of failure were explained using qualitative root-locus arguments that establish a relationship between closed-loop poles and open-loop zeros. Multivariable zeros in the right-half plane or close to the origin were shown to introduce a significant risk of closed-loop instability under RL control, particularly in the presence of diminishing process gain. The addition of exploratory noise induces noise in closed-loop pole locations, which may inadvertently place a left-half plane pole at the origin, resulting in closed-loop instability. This dissertation has successfully uncovered novel insights into model-free RL agent training for chemical process control. The research comprised the development of a novel adaptive RL-based controller for continuing control, the development of a priori knowledge to inform RL agent tuning for this controller, and a novel analysis of RL agent strengths and weaknesses by applying concepts from linear systems theory and predictive control. The insights provided by this dissertation contribute knowledge directly applicable to general classes of archetypal single-loop and multivariable control problems. Doctoral 2026-04-07T06:37:47Z 2026-04-07T06:37:47Z 2026-03 Thesis https://scholar.sun.ac.za/handle/10019.1/135638 en Stellenbosch University 201 pages : ill. application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Bras, Edward Hendrik Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
title	Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
title_full	Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
title_fullStr	Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
title_full_unstemmed	Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
title_short	Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
title_sort	nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning
url	https://scholar.sun.ac.za/handle/10019.1/135638
work_keys_str_mv	AT brasedwardhendrik nonlinearadaptationofestablishedlinearandpredictivecontrollawsthroughsafereinforcementlearning
