Self-attention policy architectures for reinforcement learning under partial observability

Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as large manufacturing plants. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments...

Bibliographic Details
Main Author: Du Plessis, Jeremy
Other Authors: Shock, Jonathan
Format: Thesis
Language: English
Published: Department of Mathematics and Applied Mathematics 2025
Subjects: Self-attention
access_status_str Open Access
author Du Plessis, Jeremy
author2 Shock, Jonathan
collection Thesis
description Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as large manufacturing plants. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this setting is their inability to handle variable-sized inputs composed of the available sensory signals, which forces unavailable signals to be imputed with data that necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrate their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and discuss the advantages and disadvantages of our solution relative to conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, that enables the agent to attend to the most salient sensory signals and allows for greater interpretability of the agent's decision-making.
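The abstract's central idea, treating whichever sensors report at a given step as a variable-sized set of tokens instead of imputing the missing ones, can be illustrated with a minimal pure-Python sketch. It is not the thesis's architecture: the class `SetAttentionPolicy`, the token layout (a reading concatenated with a sensor-id embedding), single-head self-attention, mean pooling and the `top_k` step are all assumptions made for illustration. In particular, `top_k` is a generic hard top-k selection standing in for, not reproducing, the proposed hard attention mechanism, and the weights are random rather than trained.

```python
import math
import random

EMB = 3        # hypothetical width of the sensor-id embedding
D = EMB + 1    # token width: [reading, *sensor embedding]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, v) for row in m]


def softmax(xs):
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SetAttentionPolicy:
    """Hypothetical sketch: a policy over a variable-sized set of sensor readings.

    Unavailable sensors are simply absent from the input; nothing is imputed.
    Weights are random here; in practice they would be learned with an RL algorithm.
    """

    def __init__(self, num_sensors, num_actions, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.sensor_emb = rand(num_sensors, EMB)  # tells tokens apart by sensor id
        self.w_q, self.w_k, self.w_v = rand(D, D), rand(D, D), rand(D, D)
        self.w_sal = rand(1, D)[0]                # hypothetical saliency scorer
        self.w_out = rand(num_actions, D)         # linear policy head

    def forward(self, readings, top_k=None):
        """readings: {sensor_id: value} for the sensors that reported this step."""
        if not readings:
            raise ValueError("at least one sensor reading is required")
        if top_k is not None and top_k < 1:
            raise ValueError("top_k must be at least 1")
        ids = sorted(readings)
        tokens = [[readings[i]] + self.sensor_emb[i] for i in ids]

        # Illustrative hard selection: keep only the top_k highest-scoring tokens.
        if top_k is not None and top_k < len(tokens):
            scores = [dot(self.w_sal, t) for t in tokens]
            keep = sorted(range(len(tokens)), key=lambda j: scores[j], reverse=True)[:top_k]
            keep.sort()
            ids = [ids[j] for j in keep]
            tokens = [tokens[j] for j in keep]

        # Single-head scaled dot-product self-attention over the remaining tokens.
        qs = [matvec(self.w_q, t) for t in tokens]
        ks = [matvec(self.w_k, t) for t in tokens]
        vs = [matvec(self.w_v, t) for t in tokens]
        attended = []
        for q in qs:
            weights = softmax([dot(q, k) / math.sqrt(D) for k in ks])
            attended.append([sum(w * v[d] for w, v in zip(weights, vs)) for d in range(D)])

        # Mean pooling yields a fixed-size vector however many sensors reported.
        pooled = [sum(a[d] for a in attended) / len(attended) for d in range(D)]
        return softmax(matvec(self.w_out, pooled)), ids


policy = SetAttentionPolicy(num_sensors=6, num_actions=3)
probs_all, _ = policy.forward({i: 0.1 * i for i in range(6)})
probs_partial, used = policy.forward({0: 0.0, 4: 0.4})  # sensors 1, 2, 3, 5 missing
probs_hard, kept = policy.forward({i: 0.1 * i for i in range(6)}, top_k=2)
```

Because the pooled vector has width `D` regardless of how many sensors reported, the same policy head serves a step with all six sensors and a step with only two, with no placeholder values for the missing ones. The `kept` list from the hard-selection call shows which sensors the sketch actually used, which hints at the kind of interpretability the abstract attributes to hard attention.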
format Thesis
id oai:open.uct.ac.za:11427/41574
institution University of Cape Town (South Africa)
language English
eng
last_indexed 2026-06-10T12:31:31.816Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2025
publisher Department of Mathematics and Applied Mathematics
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/41574 Self-attention policy architectures for reinforcement learning under partial observability Du Plessis, Jeremy Shock, Jonathan Self-attention Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making. 2025-08-13T12:59:08Z 2025-08-13T12:59:08Z 2025 2025-08-07T09:06:32Z Thesis / Dissertation Masters MSc http://hdl.handle.net/11427/41574 en eng application/pdf Department of Mathematics and Applied Mathematics Faculty of Science University of Cape Town
thesis_degree_str Master's
title Self-attention policy architectures for reinforcement learning under partial observability
topic Self-attention
url http://hdl.handle.net/11427/41574