Full Text Available


Hierarchical Reinforcement Learning in Minecraft

ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL...


Bibliographic Details
Main Author:	Rossouw, Francois Armand
Other Authors:	Engelbrecht, H. A.
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University 2021
Subjects:	Minecraft (Game); UCTD; Reinforcement learning > Hierarchies; Neural networks (Computer science)

_version_	1867614002464423936
access_status_str	Open Access
author	Rossouw, Francois Armand
author2	Engelbrecht, H. A.
author_browse	Engelbrecht, H. A. Rossouw, Francois Armand
author_facet	Engelbrecht, H. A. Rossouw, Francois Armand
author_sort	Rossouw, Francois Armand
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible to train them in the real world. In this thesis, to address sparse, hierarchical problems we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm, and is capable of learning from human demonstrations; (2) a hierarchically structured RL algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both of these algorithms in the 2019 MineRL challenge environments. The MineRL competition challenged participants to find a Diamond in Minecraft, a 3D, open-world, procedurally generated game. We analyse the efficiency of several improvements implemented in the BRfD algorithm through an extensive ablation study. For this study, the agents are tasked with collecting 64 logs in a Minecraft forest environment. We show that our algorithm outperforms the overall winner of the MineRL challenge in the TreeChop environment. Additionally, we show that nearly all of the improvements impact the performance either in terms of learning speed or rewards received. For the hierarchical algorithm, we segment the demonstrations into the respective sub-tasks. The algorithm then trains a version of BRfD on these demonstrations before learning from its own experiences in the environment. We then evaluate the algorithm by inspecting the proportion of episodes in which certain items were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art algorithms are capable of obtaining a diamond.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/110556
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:45:06.534Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2021
publishDateRange	2021
publishDateSort	2021
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/110556 Hierarchical Reinforcement Learning in Minecraft Rossouw, Francois Armand Engelbrecht, H. A. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science) ENGLISH ABSTRACT: Humans have the remarkable ability to perform actions at various levels of abstraction. In addition to this, humans are also able to learn new skills by applying relevant knowledge, observing experts and refining through experience. Many current reinforcement learning (RL) algorithms rely on a lengthy trial-and-error training process, making it infeasible to train them in the real world. In this thesis, to address sparse, hierarchical problems we propose the following: (1) an RL algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm, and is capable of learning from human demonstrations; (2) a hierarchically structured RL algorithm using BRfD to solve a set of sub-tasks in order to reach a goal. We evaluate both of these algorithms in the 2019 MineRL challenge environments. The MineRL competition challenged participants to find a Diamond in Minecraft, a 3D, open-world, procedurally generated game. We analyse the efficiency of several improvements implemented in the BRfD algorithm through an extensive ablation study. For this study, the agents are tasked with collecting 64 logs in a Minecraft forest environment. We show that our algorithm outperforms the overall winner of the MineRL challenge in the TreeChop environment. Additionally, we show that nearly all of the improvements impact the performance either in terms of learning speed or rewards received. For the hierarchical algorithm, we segment the demonstrations into the respective sub-tasks. 
The algorithm then trains a version of BRfD on these demonstrations before learning from its own experiences in the environment. We then evaluate the algorithm by inspecting the proportion of episodes in which certain items were obtained. While our algorithm is able to obtain iron ore, the current state-of-the-art algorithms are capable of obtaining a diamond. AFRIKAANS SUMMARY (translated): Humans have the exceptional ability to perform various tasks at different levels of abstraction. Furthermore, new skills can be learned by applying relevant knowledge, observing experts and refining through experience. Many existing reinforcement learning algorithms rely on cumbersome trial-and-error training processes, which makes them unviable in practice. In this thesis, to address sparse, hierarchical problems, we propose the following: (1) a reinforcement learning algorithm, Branched Rainbow from Demonstrations (BRfD), which combines several improvements to the Deep Q-Networks (DQN) algorithm and learns from human demonstrations; (2) a hierarchically structured reinforcement learning algorithm that can solve several sub-tasks by means of BRfD. We analyse both of these algorithms in the 2019 MineRL environment. The MineRL competition challenged participants to find a diamond in Minecraft. Minecraft is a three-dimensional, open-world, procedurally generated computer game. Several improvements applied in the BRfD algorithm are analysed through an extensive ablation study. For the study, the agents were tasked with collecting 64 logs in a Minecraft forest environment. We show that this algorithm beats the overall winner of the 2019 MineRL challenge in the TreeChop environment. Furthermore, we show that nearly all of the improvements have a positive impact in terms of learning speed or reward received. For the hierarchical algorithm, the demonstrations are broken up into their respective sub-tasks. 
The algorithm then trains a version of BRfD on these demonstrations before learning from its own experience in the environment. We then evaluate the algorithms by examining the proportion of episodes in which certain items were obtained. Our algorithm could only find iron ore, in contrast to the current state-of-the-art algorithms, which find a diamond. Masters 2021-06-07T10:52:36Z 2021-06-07T10:52:36Z 2021-03 Thesis http://hdl.handle.net/10019.1/110556 en_ZA Stellenbosch University 129 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science) Rossouw, Francois Armand Hierarchical Reinforcement Learning in Minecraft
title	Hierarchical Reinforcement Learning in Minecraft
title_full	Hierarchical Reinforcement Learning in Minecraft
title_fullStr	Hierarchical Reinforcement Learning in Minecraft
title_full_unstemmed	Hierarchical Reinforcement Learning in Minecraft
title_short	Hierarchical Reinforcement Learning in Minecraft
title_sort	hierarchical reinforcement learning in minecraft
topic	Minecraft (Game) UCTD Reinforcement learning -- Hierarchies Neural networks (Computer science)
url	http://hdl.handle.net/10019.1/110556
work_keys_str_mv	AT rossouwfrancoisarmand hierarchicalreinforcementlearninginminecraft


Similar Items