Full Text Available

Evolutionary algorithms for optimising reinforcement learning policy approximation

Reinforcement learning methods have become more efficient in recent years. In particular, the A3C (asynchronous advantage actor critic) approach demonstrated in Mnih et al. (2016) was able to halve the training time of the existing state-of-the-art approaches. However, these methods still require re...

Bibliographic Details
Main Author:	Cuningham, Blake
Other Authors:	Bassett, Bruce
Format:	Thesis
Language:	English
Published:	Department of Statistical Sciences 2020
Subjects:	statistical sciences

_version_	1867613315831693312
access_status_str	Open Access
author	Cuningham, Blake
author2	Bassett, Bruce
author_browse	Bassett, Bruce Cuningham, Blake
author_facet	Bassett, Bruce Cuningham, Blake
author_sort	Cuningham, Blake
collection	Thesis
description	Reinforcement learning methods have become more efficient in recent years. In particular, the A3C (asynchronous advantage actor critic) approach demonstrated in Mnih et al. (2016) was able to halve the training time of the existing state-of-the-art approaches. However, these methods still require relatively large amounts of training resources because of the fundamentally exploratory nature of reinforcement learning. Other machine learning approaches can make reinforcement learning agents easier to train by better processing input information to help map states to actions: convolutional and recurrent neural networks are helpful when input data is in an image form that does not satisfy the Markov property. The specific architecture required for these convolutional and recurrent neural network models is not obvious, given the infinite possible permutations. There is very limited research giving clear guidance on neural network structure in an RL (reinforcement learning) context, and grid search-like approaches require too many resources and do not always find good optima. To address these and other challenges associated with traditional parameter optimization methods, an evolutionary approach similar to that taken by Dufourq and Bassett (2017) for image classification tasks was used to find the optimal model architecture when training an agent that learns to play Atari Pong. The approach found models that were able to train reinforcement learning agents faster, and with fewer parameters, than OpenAI’s model in Blackwell et al. (2018), at a superhuman level of performance.
format	Thesis
id	oai:open.uct.ac.za:11427/31170
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:34:10.861Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2020
publishDateRange	2020
publishDateSort	2020
publisher	Department of Statistical Sciences
publisherStr	Department of Statistical Sciences
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/31170 Evolutionary algorithms for optimising reinforcement learning policy approximation Cuningham, Blake Bassett, Bruce statistical sciences Reinforcement learning methods have become more efficient in recent years. In particular, the A3C (asynchronous advantage actor critic) approach demonstrated in Mnih et al. (2016) was able to halve the training time of the existing state-of-the-art approaches. However, these methods still require relatively large amounts of training resources because of the fundamentally exploratory nature of reinforcement learning. Other machine learning approaches can make reinforcement learning agents easier to train by better processing input information to help map states to actions: convolutional and recurrent neural networks are helpful when input data is in an image form that does not satisfy the Markov property. The specific architecture required for these convolutional and recurrent neural network models is not obvious, given the infinite possible permutations. There is very limited research giving clear guidance on neural network structure in an RL (reinforcement learning) context, and grid search-like approaches require too many resources and do not always find good optima. To address these and other challenges associated with traditional parameter optimization methods, an evolutionary approach similar to that taken by Dufourq and Bassett (2017) for image classification tasks was used to find the optimal model architecture when training an agent that learns to play Atari Pong. The approach found models that were able to train reinforcement learning agents faster, and with fewer parameters, than OpenAI’s model in Blackwell et al. (2018), at a superhuman level of performance. 2020-02-19T12:18:14Z 2020-02-19T12:18:14Z 2019 2020-02-19T12:17:41Z Master Thesis Masters MSc http://hdl.handle.net/11427/31170 eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle	statistical sciences Cuningham, Blake Evolutionary algorithms for optimising reinforcement learning policy approximation
thesis_degree_str	Master's
title	Evolutionary algorithms for optimising reinforcement learning policy approximation
title_full	Evolutionary algorithms for optimising reinforcement learning policy approximation
title_fullStr	Evolutionary algorithms for optimising reinforcement learning policy approximation
title_full_unstemmed	Evolutionary algorithms for optimising reinforcement learning policy approximation
title_short	Evolutionary algorithms for optimising reinforcement learning policy approximation
title_sort	evolutionary algorithms for optimising reinforcement learning policy approximation
topic	statistical sciences
url	http://hdl.handle.net/11427/31170
work_keys_str_mv	AT cuninghamblake evolutionaryalgorithmsforoptimisingreinforcementlearningpolicyapproximation
