Full Text Available

Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning

Thesis (MEng)--Stellenbosch University, 2021.

Bibliographic Details
Main Author:	Louw, Jacobus Martin
Other Authors:	Engelbrecht, Herman
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University 2021
Subjects:	Sparse-reward problems Reinforcement learning 3D Environments Deep learning (Machine learning) UCTD

_version_	1867613760992051200
access_status_str	Open Access
author	Louw, Jacobus Martin
author2	Engelbrecht, Herman
author_browse	Engelbrecht, Herman Louw, Jacobus Martin
author_facet	Engelbrecht, Herman Louw, Jacobus Martin
author_sort	Louw, Jacobus Martin
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MEng)--Stellenbosch University, 2021.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/123775
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:41:16.700Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2021
publishDateRange	2021
publishDateSort	2021
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/123775 Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning Louw, Jacobus Martin Engelbrecht, Herman Schoeman, J. C. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Sparse-reward problems Reinforcement learning 3D Environments Deep learning (Machine learning) UCTD Thesis (MEng)--Stellenbosch University, 2021. ENGLISH ABSTRACT: In this study, we address sparse-reward problems in partially observable 3D environments. The example task is set in a simulation environment where a reinforcement learning (RL) agent has to deliver a first-aid kit to an immobilised miner using an image observation. We apply a deep Q-learning algorithm with several modifications to solve this problem. We first show that it helps the agent to solve problems in the partially observable environment when the agent’s observation is augmented with a history of previous observations and performed actions. We then consider three main modifications made to the deep Q-learning algorithm to address this problem. The first is to dramatically increase the rate at which new data is generated by using a distributed system. Secondly, we utilise prioritised experience replay (PER) [39] to repeat transitions of significance more frequently to the agent. Lastly, we add the n-step return to the algorithm. The work by Hessel et al. [14] and Horgan et al. [16] shows that these modifications significantly improve the performance of the deep Q-learning algorithm on the Atari platform. The Atari platform consists mainly of simple 2D environments; however, we consider performance on a partially observable 3D environment with sparse rewards. We confirm the results of Fedus et al. [10] and show that better-performing policies are trained when the replay buffer contains more recently generated data. 
We show that prioritising transitions and the n-step return is very important in solving the example sparse-reward problem. In addition to these modifications we also look into strategies to improve exploration. We then demonstrate that curriculum learning (CL) or domain randomisation (DR) can be used to help the agent to solve more challenging problems where it is difficult to initially receive the reward signal. Lastly, we establish that it greatly benefits the deep Q-learning agent’s performance when CL is used in combination with DR to solve larger, more complex problems. AFRIKAANSE OPSOMMING: In hierdie studie spreek ons skaars-beloningsprobleme in gedeeltelik sigbare 3D-omgewings aan. In die probleem wat ons as voorbeeld gebruik, moet ’n versterkingsleeragent ’n noodhulpkissie aan ’n gestrande mynwerker in ’n simulasie-omgewing aflewer. Die agent moet aksies, gebaseer op ’n kamerabeeld, uitvoer om die taak te verrig. Ons pas ’n diep-Q-leer algoritme met ’n paar wysigings toe, om die probleem op te los. Ons toon eerstens aan dat dit die agent help om probleme in die gedeeltelik sigbare omgewing op te los, indien sy waarneming aangevul word deur vorige waarnemings en uitgevoerde aksies. Daarna oorweeg ons drie hoofsaaklike wysigings aan die diep-Q-leer algoritme om hierdie probleem op te los. Eerstens word die spoed waarteen nuwe data gegenereer word drasties verhoog deur van ’n verspreide stelsel gebruik te maak. Tweedens gebruik ons ’n geprioritiseerde ervaringsbuffer [39] om belangrike ervarings meer gereeld aan die agent terug te speel. Laastens voeg ons n-stap opdaterings by die algoritme. Die navorsing deur Hessel et al. [14] en Horgan et al. [16] toon aan dat hierdie wysigings die werksverrigting van die diep-Q-leer algoritme op die Atari-platform aansienlik verbeter. Die Atari-speletjies bestaan hoofsaaklik uit 2D-omgewings, terwyl ons die algoritme op ’n 3D-omgewing met skaars-belonings toepas. Ons bevestig die resultate van Fedus et al. [10] en toon aan dat beter gedragspatrone aangeleer word indien die ervaringsbuffer meer onlangs gegenereerde data bevat. Ons toon ook dat die prioritisering van ervaring en n-stap opdaterings baie belangrik is om die skaars-beloningsprobleem in die voorbeeld op te los. Aanvullend tot hierdie wysigings, ondersoek ons ook strategieë om die verkenning van die omgewing te verbeter. Ons toon aan dat kurrikulumleer of domein-lukraakheid die agent kan help om meer uitdagende probleme op te los, waar dit aanvanklik moeilik is om ’n beloning te ontvang. Laastens wys ons dat dit die diep-Q-leer agent verder bevoordeel indien kurrikulumleer in kombinasie met domein-lukraakheid gebruik word om groter en moeiliker probleme op te los. Masters 2021-11-08T04:29:10Z 2021-12-22T14:20:44Z 2021-11-08T04:29:10Z 2021-12-22T14:20:44Z 2021-12 Thesis http://hdl.handle.net/10019.1/123775 en_ZA Stellenbosch University 144 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Sparse-reward problems Reinforcement learning 3D Environments Deep learning (Machine learning) UCTD Louw, Jacobus Martin Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning
title	Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning
title_full	Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning
title_fullStr	Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning
title_full_unstemmed	Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning
title_short	Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning
title_sort	solving sparse reward problems in partially observable 3d environments using distributed reinforcement learning
topic	Sparse-reward problems Reinforcement learning 3D Environments Deep learning (Machine learning) UCTD
url	http://hdl.handle.net/10019.1/123775
work_keys_str_mv	AT louwjacobusmartin solvingsparserewardproblemsinpartiallyobservable3denvironmentsusingdistributedreinforcementlearning

Similar Items