| Main Author: | Du Plessis, Jeremy |
|---|---|
| Other Authors: | Shock, Jonathan |
| Format: | Thesis |
| Language: | English |
| Published: | Department of Mathematics and Applied Mathematics, 2025 |
| Subjects: | Self-attention |
| _version_ | 1867613148530343936 |
|---|---|
| access_status_str | Open Access |
| author | Du Plessis, Jeremy |
| author2 | Shock, Jonathan |
| collection | Thesis |
| description | Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as large manufacturing plants. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this setting is their inability to handle variable-sized inputs composed of only the available sensory signals, so unavailable signals must be imputed with data that necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on several reinforcement learning benchmark tasks, and we examine the advantages and disadvantages of our solution relative to conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, that enables the agent to attend to the most salient sensory signals and allows for greater interpretability of the agent's decision-making. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/41574 |
| institution | University of Cape Town (South Africa) |
| language | English (eng) |
| last_indexed | 2026-06-10T12:31:31.816Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2025 |
| publisher | Department of Mathematics and Applied Mathematics |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/41574 Self-attention policy architectures for reinforcement learning under partial observability Du Plessis, Jeremy Shock, Jonathan Self-attention Intermittent unavailability of sensory signals due to sensor failure and/or latency is a problem encountered in production environments such as in large manufacturing plants, for example. Deep reinforcement learning offers a natural solution for process control and optimisation in such environments. However, a shortcoming of conventional agent policy architectures in this instance is an inability to handle variable-sized inputs composed of available sensory signals, thus requiring the imputation of unavailable sensory signals with data which necessarily constitutes noise. We explore self-attention-based policy architectures as a solution to this problem, demonstrating their robustness under conditions of high partial observability on different reinforcement learning benchmark tasks, and explore the advantages and disadvantages offered by our solution over conventional policy architectures. Additionally, we propose a novel hard attention mechanism, used in conjunction with our proposed policy architecture, enabling the agent to attend to the most salient sensory signals and allowing for greater interpretability of the agent's decision-making. 2025-08-13T12:59:08Z 2025-08-13T12:59:08Z 2025 2025-08-07T09:06:32Z Thesis / Dissertation Masters MSc http://hdl.handle.net/11427/41574 en eng application/pdf Department of Mathematics and Applied Mathematics Faculty of Science University of Cape Town |
| thesis_degree_str | Master's |
| title | Self-attention policy architectures for reinforcement learning under partial observability |
| topic | Self-attention |
| url | http://hdl.handle.net/11427/41574 |