Full Text Available

Cooperative multi-agent reinforcement learning in sparse-reward, partially observable 3d environments with curriculum-transfer learning

Thesis (MEng)--Stellenbosch University, 2024.

Bibliographic Details
Main Author:	Denkema, De Wet
Other Authors:	Engelbrecht, Herman
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University, 2024
Subjects:	Reinforcement learning; Multiagent systems; Artificial Intelligence; Deep learning (Machine learning); UCTD

_version_	1867613853079044096
access_status_str	Open Access
author	Denkema, De Wet
author2	Engelbrecht, Herman
collection	Thesis
description	Thesis (MEng)--Stellenbosch University, 2024.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/130216
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:42:44.343Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2024
publisher	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/130216 Cooperative multi-agent reinforcement learning in sparse-reward, partially observable 3d environments with curriculum-transfer learning Denkema, De Wet Engelbrecht, Herman Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Reinforcement learning Multiagent systems Artificial Intelligence Deep learning (Machine learning) UCTD Thesis (MEng)--Stellenbosch University, 2024. ENGLISH ABSTRACT: Single-agent reinforcement learning has seen extensive research and improvements in the past few years, with the various algorithms that make up the state-of-the-art seeing numerous modifications to improve their performance. Recent years have also seen an upsurge in the use of multi-agent reinforcement learning to address complex cooperative problems, such as the StarCraft multi-agent challenge. The improvements made to the single-agent algorithms have not seen extensive testing in combination with their multi-agent counterparts, where applicable, particularly not in domains consisting of complex, partially observable 3D environments. In this thesis, we address multi-task cooperative multi-agent reinforcement learning in such environments. The task that this thesis focuses on consists of a simplified version of a level from the Portal 2 computer game, where the agents have to press buttons, place portals, and use the portals to teleport to reach a target zone. We use QMIX, a value-based multi-agent reinforcement learning algorithm, to solve this problem, along with several modifications made to both the Deep Q-Networks algorithm and the Deep Recurrent Q-Networks algorithm, which the QMIX agents are based on. These modifications are the use of multiple decoupled actor processes to speed up the rate of experience generation, coupled with prioritised experience replay (PER) [1] for training on samples with a higher significance more frequently.
We also use recurrent burn-in [2] to improve the accuracy of the recurrent layer of our networks, along with reward standardisation to stabilise training, and n-step returns for improved performance. Noisy neural networks for exploration [3] and value function rescaling [2] were also experimented with but were shown to consistently reduce performance. We examine the contribution of each of these modifications to the algorithm’s performance in an exhaustive ablation study. Finally, we use curriculum-transfer learning to train our agents in related problems and transfer the knowledge to more complex, multi-task problems, showing that this approach significantly improves agent performance in all of the tested environments. Our ablation study shows that the inclusion of these modifications, except for noisy neural networks and value function rescaling, is a requirement for successfully solving our learning environment. Experimenting with the use of curriculum-transfer learning further shows that knowledge from simpler tasks can be transferred to related, more complex tasks, which becomes a requirement for solving the tasks in the most complex environments. Finally, we use curriculum-transfer learning to train agents in an environment with a related action and observation space, but a different task structure, to show that agents successfully learn and apply skills in new environments, while learning to cooperate. AFRIKAANS SUMMARY (translated from Afrikaans): Single-agent reinforcement learning has seen extensive research and improvement over the past few years, with the various state-of-the-art algorithms receiving many modifications to improve their performance. Recent years have also seen an increase in the use of multi-agent reinforcement learning to address complex cooperative problems, such as the StarCraft multi-agent challenge.
However, the improvements to the single-agent algorithms have not yet been thoroughly tested in combination with their multi-agent counterparts, where applicable, particularly not in domains consisting of complex, partially observable 3D environments. In this study, we address multi-task cooperative multi-agent reinforcement learning in such environments. The task this study focuses on is a simplified version of a level from the Portal 2 computer game, in which the agents must press buttons, place portals, and use the portals to teleport to reach a target zone. We use QMIX, a value-based multi-agent reinforcement learning algorithm, to solve this problem, along with several modifications made to both the DQN algorithm and the DRQN algorithm, on which the QMIX agents are based. These modifications include the use of multiple decoupled actor processes to speed up the rate of experience generation, coupled with prioritised experience replay so that samples of higher significance are learned from more frequently. We also use recurrent burn-in to improve the accuracy of the recurrent layer of our networks, along with reward standardisation to stabilise training. Noisy neural networks for exploration and value function rescaling were also tested, but were shown to reduce performance. We examine the contribution of each of these modifications to the algorithm’s performance in a comprehensive ablation study. Finally, we use curriculum-transfer learning to train our agents on related problems and transfer the knowledge to more complex, multi-task problems, showing that this approach significantly improves agent performance in every environment. Our ablation study shows that including these modifications, with the exception of noisy neural networks and value function rescaling, is a requirement for successfully solving our learning environment.
Experiments with curriculum-transfer learning further show that knowledge from simpler tasks can be transferred to related, more complex tasks, which becomes a requirement for solving the tasks in the most complex environments. Finally, we use curriculum-transfer learning to train agents in an environment with a related action and observation space but a different task structure, to show that agents successfully learn and apply skills in new environments while learning to cooperate. Masters 2024-02-29T08:12:35Z 2024-04-26T09:31:59Z 2024-02-29T08:12:35Z 2024-04-26T09:31:59Z 2024-03 Thesis https://scholar.sun.ac.za/handle/10019.1/130216 en_ZA en_ZA xix, 139 pages : illustrations. application/pdf Stellenbosch : Stellenbosch University
title	Cooperative multi-agent reinforcement learning in sparse-reward, partially observable 3d environments with curriculum-transfer learning
topic	Reinforcement learning; Multiagent systems; Artificial Intelligence; Deep learning (Machine learning); UCTD
url	https://scholar.sun.ac.za/handle/10019.1/130216

Similar Items