Hierarchical Reinforcement Learning in Minecraft

ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL...

Bibliographic Details
Main Author: Rossouw, Francois Armand
Other Authors: Engelbrecht, H. A.
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2021
Subjects:
_version_ 1867614002464423936
access_status_str Open Access
author Rossouw, Francois Armand
author2 Engelbrecht, H. A.
author_browse Engelbrecht, H. A.
Rossouw, Francois Armand
author_facet Engelbrecht, H. A.
Rossouw, Francois Armand
author_sort Rossouw, Francois Armand
collection Thesis
dc_rights_str_mv Stellenbosch University
description ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible to train them in the real world. In this thesis, to address sparse, hierarchical problems, we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm, and is capable of learning from human demonstrations; (2) a hierarchically structured RL algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both of these algorithms in the 2019 MineRL challenge environments. The MineRL competition challenged participants to find a diamond in Minecraft, a 3D, open-world, procedurally generated game. We analyse the efficiency of several improvements implemented in the BRfD algorithm through an extensive ablation study. For this study, the agents are tasked with collecting 64 logs in a Minecraft forest environment. We show that our algorithm outperforms the overall winner of the MineRL challenge in the TreeChop environment. Additionally, we show that nearly all of the improvements impact the performance either in terms of learning speed or rewards received. For the hierarchical algorithm, we segment the demonstrations into the respective sub-tasks. The algorithm then trains a version of BRfD on these demonstrations before learning from its own experiences in the environment. We then evaluate the algorithm by inspecting the proportion of episodes in which certain items were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art algorithms are capable of obtaining a diamond.
format Thesis
id oai:scholar.sun.ac.za:10019.1/110556
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:45:06.534Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2021
publishDateRange 2021
publishDateSort 2021
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/110556 Hierarchical Reinforcement Learning in Minecraft Rossouw, Francois Armand Engelbrecht, H. A. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science) ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible to train them in the real world. In this thesis, to address sparse, hierarchical problems, we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm, and is capable of learning from human demonstrations; (2) a hierarchically structured RL algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both of these algorithms in the 2019 MineRL challenge environments. The MineRL competition challenged participants to find a diamond in Minecraft, a 3D, open-world, procedurally generated game. We analyse the efficiency of several improvements implemented in the BRfD algorithm through an extensive ablation study. For this study, the agents are tasked with collecting 64 logs in a Minecraft forest environment. We show that our algorithm outperforms the overall winner of the MineRL challenge in the TreeChop environment. Additionally, we show that nearly all of the improvements impact the performance either in terms of learning speed or rewards received. For the hierarchical algorithm, we segment the demonstrations into the respective sub-tasks. 
The algorithm then trains a version of BRfD on these demonstrations before learning from its own experiences in the environment. We then evaluate the algorithm by inspecting the proportion of episodes in which certain items were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art algorithms are capable of obtaining a diamond. AFRIKAANS ABSTRACT: Humans have the exceptional ability to perform various tasks at different levels of abstraction. Furthermore, new skills can be learned by applying relevant knowledge, observing experts and refining through experience. Many existing reinforcement learning algorithms rely on cumbersome trial-and-error training processes, which makes them infeasible in practice. In this thesis, to address sparse, hierarchical problems, we propose the following: (1) a reinforcement learning algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm and learns from human demonstrations; (2) a hierarchically structured reinforcement learning algorithm that can solve several sub-tasks by means of BRfD. We analyse both of these algorithms in the 2019 MineRL environment. The MineRL competition challenged participants to find a diamond in Minecraft. Minecraft is a three-dimensional, open-world, procedurally generated computer game. Several improvements implemented in the BRfD algorithm are analysed through extensive ablation studies. For the study, the agents were tasked with collecting 64 logs in a Minecraft forest environment. We show that this algorithm beats the overall winner of the 2019 MineRL challenge in the TreeChop environment. Furthermore, we show that nearly all of the improvements have a positive impact in terms of learning speed or rewards received. For the hierarchical algorithm, the demonstrations are split into their respective sub-tasks. 
The algorithm then trains a version of BRfD using these demonstrations, followed by its own experience in the environment. We then evaluate the algorithms by examining the proportion of episodes in which certain items were obtained. Our algorithm could only obtain iron ore, in contrast to the current state-of-the-art algorithms, which obtain a diamond. Masters 2021-06-07T10:52:36Z 2021-06-07T10:52:36Z 2021-03 Thesis http://hdl.handle.net/10019.1/110556 en_ZA Stellenbosch University 129 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle Minecraft (Game)
UCTD
Reinforcement learning -- Hierarchies
Neural networks (Computer science)
Rossouw, Francois Armand
Hierarchical Reinforcement Learning in Minecraft
title Hierarchical Reinforcement Learning in Minecraft
title_full Hierarchical Reinforcement Learning in Minecraft
title_fullStr Hierarchical Reinforcement Learning in Minecraft
title_full_unstemmed Hierarchical Reinforcement Learning in Minecraft
title_short Hierarchical Reinforcement Learning in Minecraft
title_sort hierarchical reinforcement learning in minecraft
topic Minecraft (Game)
UCTD
Reinforcement learning -- Hierarchies
Neural networks (Computer science)
url http://hdl.handle.net/10019.1/110556
work_keys_str_mv AT rossouwfrancoisarmand hierarchicalreinforcementlearninginminecraft