Full Text Available

Self-attention policy architectures for reinforcement learning under partial observability

Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments....

Bibliographic Details
Main Author:	Du Plessis, Jeremy
Other Authors:	Shock, Jonathan
Format:	Thesis
Language:	English
Published:	Department of Mathematics and Applied Mathematics 2025
Subjects:	Self-attention

_version_	1867613148530343936
access_status_str	Open Access
author	Du Plessis, Jeremy
author2	Shock, Jonathan
author_browse	Du Plessis, Jeremy Shock, Jonathan
author_facet	Shock, Jonathan Du Plessis, Jeremy
author_sort	Du Plessis, Jeremy
collection	Thesis
description	Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making.
format	Thesis
id	oai:open.uct.ac.za:11427/41574
institution	University of Cape Town (South Africa)
language	English eng
last_indexed	2026-06-10T12:31:31.816Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2025
publishDateRange	2025
publishDateSort	2025
publisher	Department of Mathematics and Applied Mathematics
publisherStr	Department of Mathematics and Applied Mathematics
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/41574 Self-attention policy architectures for reinforcement learning under partial observability Du Plessis, Jeremy Shock, Jonathan Self-attention Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making. 2025-08-13T12:59:08Z 2025-08-13T12:59:08Z 2025 2025-08-07T09:06:32Z Thesis / Dissertation Masters MSc http://hdl.handle.net/11427/41574 en eng application/pdf Department of Mathematics and Applied Mathematics Faculty of Science University of Cape Town
spellingShingle	Self-attention Du Plessis, Jeremy Self-attention policy architectures for reinforcement learning under partial observability
thesis_degree_str	Master's
title	Self-attention policy architectures for reinforcement learning under partial observability
title_full	Self-attention policy architectures for reinforcement learning under partial observability
title_fullStr	Self-attention policy architectures for reinforcement learning under partial observability
title_full_unstemmed	Self-attention policy architectures for reinforcement learning under partial observability
title_short	Self-attention policy architectures for reinforcement learning under partial observability
title_sort	self attention policy architectures for reinforcement learning under partial observability
topic	Self-attention
url	http://hdl.handle.net/11427/41574
work_keys_str_mv	AT duplessisjeremy selfattentionpolicyarchitecturesforreinforcementlearningunderpartialobservability

Similar Items