ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL...
| Main Author: | Rossouw, Francois Armand |
|---|---|
| Other Authors: | Engelbrecht, H. A. |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2021 |
| Subjects: | Minecraft (Game); UCTD; Reinforcement learning -- Hierarchies; Neural networks (Computer science) |
| _version_ | 1867614002464423936 |
|---|---|
| access_status_str | Open Access |
| author | Rossouw, Francois Armand |
| author2 | Engelbrecht, H. A. |
| author_browse | Engelbrecht, H. A. Rossouw, Francois Armand |
| author_facet | Engelbrecht, H. A. Rossouw, Francois Armand |
| author_sort | Rossouw, Francois Armand |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In
addition to this, humans are also able to learn new skills by applying relevant knowledge,
observing experts and refining through experience. Many current reinforcement learning
(RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible
to train them in the real world. In this thesis, to address sparse, hierarchical problems
we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations
(BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm,
and is capable of learning from human demonstrations; (2) a hierarchically structured RL
algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both
of these algorithms in the 2019 MineRL challenge environments. The MineRL competition
challenged participants to find a diamond in Minecraft, a 3D, open-world, procedurally
generated game. We analyse the efficiency of several improvements implemented in the
BRfD algorithm through an extensive ablation study. For this study, the agents are tasked
with collecting 64 logs in a Minecraft forest environment. We show that our algorithm
outperforms the overall winner of the MineRL challenge in the TreeChop environment.
Additionally, we show that nearly all of the improvements impact the performance either in
terms of learning speed or rewards received. For the hierarchical algorithm, we segment the
demonstrations into the respective sub-tasks. The algorithm then trains a version of BRfD
on these demonstrations before learning from its own experiences in the environment. We
then evaluate the algorithm by inspecting the proportion of episodes in which certain items
were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art
algorithms are capable of obtaining a diamond. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/110556 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:45:06.534Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2021 |
| publishDateRange | 2021 |
| publishDateSort | 2021 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/110556 Hierarchical Reinforcement Learning in Minecraft Rossouw, Francois Armand Engelbrecht, H. A. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science) ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible to train them in the real world. In this thesis, to address sparse, hierarchical problems we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm, and is capable of learning from human demonstrations; (2) a hierarchically structured RL algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both of these algorithms in the 2019 MineRL challenge environments. The MineRL competition challenged participants to find a diamond in Minecraft, a 3D, open-world, procedurally generated game. We analyse the efficiency of several improvements implemented in the BRfD algorithm through an extensive ablation study. For this study, the agents are tasked with collecting 64 logs in a Minecraft forest environment. We show that our algorithm outperforms the overall winner of the MineRL challenge in the TreeChop environment. Additionally, we show that nearly all of the improvements impact the performance either in terms of learning speed or rewards received. For the hierarchical algorithm, we segment the demonstrations into the respective sub-tasks. 
The algorithm then trains a version of BRfD on these demonstrations before learning from its own experiences in the environment. We then evaluate the algorithm by inspecting the proportion of episodes in which certain items were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art algorithms are capable of obtaining a diamond. AFRIKAANS SUMMARY: Humans have the exceptional ability to perform various tasks at different levels of abstraction. Furthermore, new skills can be learned by applying relevant knowledge, observing experts and refining through experience. Many existing reinforcement learning algorithms rely on lengthy trial-and-error training processes, which makes them infeasible in practice. In this thesis, to address sparse, hierarchical problems, we propose the following: (1) a reinforcement learning algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm and learns from human demonstrations; (2) a hierarchically structured reinforcement learning algorithm that can solve several sub-tasks by means of BRfD. We analyse both of the above algorithms in the 2019 MineRL environment. The MineRL competition challenged participants to find a diamond in Minecraft. Minecraft is a three-dimensional, open-world, procedurally generated computer game. Several improvements applied in the BRfD algorithm are analysed through an extensive ablation study. For the study, the agents are tasked with collecting 64 logs in a Minecraft forest environment. We show that this algorithm beats the overall winner of the 2019 MineRL challenge in the TreeChop environment. Furthermore, we show that nearly all of the improvements have a positive impact in terms of learning speed or rewards received. For the hierarchical algorithm, the demonstrations are split into their respective sub-tasks. 
The algorithm then trains a version of BRfD on these demonstrations and then on its own experience in the environment. We then evaluate the algorithms by examining the proportion of episodes in which certain items were obtained. Our algorithm could obtain only iron ore, in contrast to the current state-of-the-art algorithms, which obtain a diamond. Masters 2021-06-07T10:52:36Z 2021-06-07T10:52:36Z 2021-03 Thesis http://hdl.handle.net/10019.1/110556 en_ZA Stellenbosch University 129 pages application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science) Rossouw, Francois Armand Hierarchical Reinforcement Learning in Minecraft |
| title | Hierarchical Reinforcement Learning in Minecraft |
| title_full | Hierarchical Reinforcement Learning in Minecraft |
| title_fullStr | Hierarchical Reinforcement Learning in Minecraft |
| title_full_unstemmed | Hierarchical Reinforcement Learning in Minecraft |
| title_short | Hierarchical Reinforcement Learning in Minecraft |
| title_sort | hierarchical reinforcement learning in minecraft |
| topic | Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science) |
| url | http://hdl.handle.net/10019.1/110556 |
| work_keys_str_mv | AT rossouwfrancoisarmand hierarchicalreinforcementlearninginminecraft |