Thesis (PhD)--Stellenbosch University, 2026.
| Main Author: | Bras, Edward Hendrik |
|---|---|
| Other Authors: | Louw, Tobias Muller |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2026 |
| _version_ | 1867613751172136960 |
|---|---|
| access_status_str | Open Access |
| author | Bras, Edward Hendrik |
| author2 | Louw, Tobias Muller |
| author_browse | Bras, Edward Hendrik Louw, Tobias Muller |
| author_facet | Louw, Tobias Muller Bras, Edward Hendrik |
| author_sort | Bras, Edward Hendrik |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (PhD)--Stellenbosch University, 2026. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/135638 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:41:06.301Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2026 |
| publishDateRange | 2026 |
| publishDateSort | 2026 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/135638 Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning Bras, Edward Hendrik Louw, Tobias Muller Bradshaw, Steven Martin Stellenbosch University. Faculty of Engineering. Dept. of Chemical Engineering. Thesis (PhD)--Stellenbosch University, 2026. Bras, E. H. 2026. Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning. Unpublished doctoral dissertation. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/0e5eccc1-8e1e-4021-8e48-23d785c22b03 Reinforcement learning (RL) enables the prospect of data-driven controllers that learn to select control actions optimally purely through the feedback provided by an evaluative signal (the reward). In principle, this technology may be used to develop adaptive controllers that account for plant-model mismatches, thereby unlocking untapped potential in the operation of processes in the chemical and minerals processing industries. RL relies on trial-and-error design of opaque function approximator architectures and the selection of hyperparameters. Furthermore, challenges in establishing RL agent training that is efficient and safe currently hinder the technology’s industrial adoption. The analysis of RL control is typically decoupled from well-established control principles derived from classical and optimal control research. The first objective of the study was to show that RL can be applied to the adaptive control of a visualizable control policy initialized using proportional-integral (PI) control. This included formulating an RL controller to accommodate continuous control without randomised starting states and incorporating safety into the model-free design. The second goal of the research was to relate the tuning of this controller to generalised plant dynamics. 
Thirdly, the strengths and weaknesses of RL were assessed by tasking the controller with the adaptation of explicit model predictive control (MPC) policies. A key strategy in achieving each objective was to provide a warm start to the RL agent’s policy by fitting a shallow neural network to a policy precomputed using linear control or non-linear MPC. Subsequently, RL was applied to regulatory controller adaptation. An early actor-critic algorithm foundational to modern RL was used throughout. Model-free RL was shown to enable adaptive control of a visualizable control policy initialised to emulate PI control. By incorporating a safety mechanism in the RL agent’s control policy, control performance improvements relative to the PI controller used for initialisation were achieved efficiently and safely. The RL agent’s objective function was selected to enable continuing control, and the inclusion of additional states without losing training progress was demonstrated. The stability of the developed RL controller depends on the selection of RL hyperparameters. Process control is safety critical, and hence the selection of hyperparameters using only a priori knowledge must be investigated. The research presented leverages archetypal process dynamics to develop qualitative guidance for the selection of RL hyperparameters. First-order plus time delay, underdamped second-order plus time delay, and inverse response dynamics were considered. For each system and set of process parameters, insensitivity to hyperparameter tuning was demonstrated, provided that the actor learning rate is set below a threshold value beyond which policy divergence is likely. Fostering an improved theoretical understanding of RL agent failure is essential. The quadruple tank benchmark was applied to evaluate the model-free RL agent’s capacity to account for plant-model mismatches by adapting an explicit non-linear MPC policy. 
By slowly moving a multivariable zero to simulate plant-model mismatches, it was demonstrated that the RL agent displays clear points of failure. Furthermore, these points of failure were explained using qualitative root-locus arguments that establish a relationship between closed-loop poles and open-loop zeros. Multivariable zeros in the right-half plane or close to the origin were shown to introduce a significant risk of closed-loop instability under RL control, particularly in the presence of diminishing process gain. The addition of exploratory noise induces noise in closed-loop pole locations, which may inadvertently place a left-half plane pole at the origin, resulting in closed-loop instability. This dissertation has successfully uncovered novel insights into model-free RL agent training for chemical process control. The research comprised the development of a novel adaptive RL-based controller for continuing control, the development of a priori knowledge to inform RL agent tuning for this controller, and a novel analysis of RL agent strengths and weaknesses by applying concepts from linear systems theory and predictive control. The insights provided by this dissertation contribute knowledge directly applicable to general classes of archetypal single-loop and multivariable control problems. Doctoral 2026-04-07T06:37:47Z 2026-04-07T06:37:47Z 2026-03 Thesis https://scholar.sun.ac.za/handle/10019.1/135638 en Stellenbosch University 201 pages : ill. application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Bras, Edward Hendrik Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| title | Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| title_full | Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| title_fullStr | Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| title_full_unstemmed | Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| title_short | Nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| title_sort | nonlinear adaptation of established linear and predictive control laws through safe reinforcement learning |
| url | https://scholar.sun.ac.za/handle/10019.1/135638 |
| work_keys_str_mv | AT brasedwardhendrik nonlinearadaptationofestablishedlinearandpredictivecontrollawsthroughsafereinforcementlearning |