Cooperative multi-agent reinforcement learning in sparse-reward, partially observable 3d environments with curriculum-transfer learning

Thesis (MEng)--Stellenbosch University, 2024.

Bibliographic Details
Main Author: Denkema, De Wet
Other Authors: Engelbrecht, Herman
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2024
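Subjects: Reinforcement learning; Multiagent systems; Artificial Intelligence; Deep learning (Machine learning); UCTD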
access_status_str Open Access
collection Thesis
id oai:scholar.sun.ac.za:10019.1/130216
institution Stellenbosch University (South Africa)
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.

ENGLISH ABSTRACT: Single-agent reinforcement learning has seen extensive research and improvement in the past few years, with the algorithms that make up the state of the art receiving numerous modifications to improve their performance. Recent years have also seen an upsurge in the use of multi-agent reinforcement learning to address complex cooperative problems, such as the StarCraft Multi-Agent Challenge. However, the improvements made to single-agent algorithms have not been extensively tested in combination with their multi-agent counterparts, where applicable, and particularly not in domains consisting of complex, partially observable 3D environments. In this thesis, we address multi-task cooperative multi-agent reinforcement learning in such environments. The task that this thesis focuses on is a simplified version of a level from the Portal 2 computer game, in which the agents have to press buttons, place portals, and use the portals to teleport in order to reach a target zone. We solve this problem with QMIX, a value-based multi-agent reinforcement learning algorithm, along with several modifications to the Deep Q-Networks (DQN) and Deep Recurrent Q-Networks (DRQN) algorithms on which the QMIX agents are based. These modifications are multiple decoupled actor processes, which speed up experience generation, coupled with prioritised experience replay (PER) [1], which trains more frequently on samples of higher significance. We also use recurrent burn-in [2] to improve the accuracy of the recurrent layer of our networks, reward standardisation to stabilise training, and n-step returns to improve performance. Noisy neural networks for exploration [3] and value function rescaling [2] were also tested, but both consistently reduced performance. We examine the contribution of each of these modifications to the algorithm’s performance in an exhaustive ablation study. We then use curriculum-transfer learning to train our agents on related problems and transfer the knowledge to more complex, multi-task problems, showing that this approach significantly improves agent performance in all of the tested environments.

Our ablation study shows that all of these modifications, except for noisy neural networks and value function rescaling, are required to solve our learning environment successfully. Experiments with curriculum-transfer learning further show that knowledge from simpler tasks can be transferred to related, more complex tasks, and that this transfer is required to solve the tasks in the most complex environments. Finally, we use curriculum-transfer learning to train agents in an environment with a related action and observation space but a different task structure, showing that the agents successfully learn and apply skills in new environments while learning to cooperate.

AFRIKAANS SUMMARY (English translation): Single-agent reinforcement learning has undergone extensive research and improvement in the past few years, with the various algorithms that make up the state of the art receiving many modifications to improve their performance. Recent years have also seen an increase in the use of multi-agent reinforcement learning to address complex cooperative problems, such as the StarCraft Multi-Agent Challenge. However, the improvements to the single-agent algorithms have not yet been thoroughly tested in combination with their multi-agent counterparts, where applicable, especially not in domains consisting of complex, partially observable 3D environments. In this study, we address multi-task cooperative multi-agent reinforcement learning in such environments. The task on which this study focuses is a simplified version of a level from the Portal 2 computer game, in which the agents must press buttons, place portals, and use the portals to teleport in order to reach a target area. We use QMIX, a value-based multi-agent reinforcement learning algorithm, to solve this problem, together with several modifications to both the DQN algorithm and the DRQN algorithm on which the QMIX agents are based. These modifications include the use of multiple decoupled actor processes to accelerate experience generation, coupled with prioritised experience replay to learn more frequently from samples of higher significance. We also use recurrent burn-in to improve the accuracy of the recurrent layer of our networks, together with reward standardisation to stabilise training. Noisy neural networks for exploration and value function rescaling were also tested, but were shown to reduce performance. We investigate the contribution of each of these modifications to the algorithm’s performance in a comprehensive ablation study. We then use curriculum-transfer learning to train our agents on related problems and to transfer knowledge to more complex, multi-task problems, showing that this approach significantly improves agent performance in every environment.

Our ablation study shows that the inclusion of these modifications, with the exception of noisy neural networks and value function rescaling, is a requirement for successfully solving our learning environment. Experimentation with curriculum-transfer learning further shows that knowledge from simpler tasks can be transferred to related, more complex tasks, which becomes a requirement for solving the tasks in the most complex environments. Finally, we use curriculum-transfer learning to train agents in an environment with a related action and observation space, but a different task structure, to show that agents successfully learn and apply skills in new environments while learning to cooperate.

Masters. 2024-02-29T08:12:35Z; 2024-04-26T09:31:59Z; 2024-03. xix, 139 pages : illustrations. application/pdf
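The abstract names prioritised experience replay [1], n-step returns and value function rescaling [2] without defining them. The Python sketch below shows one common way these pieces fit together when computing a value-based learning target; it is not code from the thesis. The function names, the rescaling constant `EPSILON` and the PER exponents `alpha` and `beta` are hypothetical choices for illustration only.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical constant for illustration; not taken from the thesis.
EPSILON = 1e-3  # rescaling constant epsilon in h(x)


def rescale(x: float, eps: float = EPSILON) -> float:
    """Value function rescaling h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + eps * x


def inverse_rescale(y: float, eps: float = EPSILON) -> float:
    """Closed-form inverse h^-1(y), obtained by solving h(x) = y for x."""
    s = (math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(s * s - 1.0, y)


def n_step_target(rewards: Sequence[float], bootstrap_q: float,
                  gamma: float = 0.99, done: bool = False,
                  use_rescaling: bool = True) -> float:
    """n-step target: sum_k gamma^k * r_{t+k} + gamma^n * Q_target(s_{t+n}, a*).

    With rescaling, bootstrap_q is assumed to be in the network's rescaled
    space: it is mapped back with h^-1, combined with the rewards, and the
    full target is mapped through h again (as in R2D2-style targets).
    """
    n = len(rewards)
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:  # bootstrap only if the episode did not end within n steps
        q = inverse_rescale(bootstrap_q) if use_rescaling else bootstrap_q
        ret += (gamma ** n) * q
    return rescale(ret) if use_rescaling else ret


def per_sample(td_errors: Sequence[float], batch_size: int,
               alpha: float = 0.6, beta: float = 0.4,
               rng: Optional[random.Random] = None
               ) -> Tuple[List[int], List[float], List[float]]:
    """Proportional prioritised experience replay.

    Priority p_i = (|delta_i| + small constant)^alpha, sampling probability
    P(i) = p_i / sum_j p_j, and importance-sampling weight
    w_i = (N * P(i))^-beta, normalised by the largest weight in the buffer.
    """
    if rng is None:
        rng = random.Random()
    priorities = [(abs(d) + 1e-6) ** alpha for d in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    n = len(probs)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    max_weight = (n * min(probs)) ** (-beta)  # weight of the rarest sample
    weights = [(n * probs[i]) ** (-beta) / max_weight for i in indices]
    return indices, weights, probs
```

Rescaling compresses large returns so that one network can fit value targets of very different magnitudes; the bootstrap value is mapped back with the inverse before it is discounted and added to the rewards. Because the ablation study found that value function rescaling reduced performance, the sketch keeps it optional through `use_rescaling`.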
topic Reinforcement learning
Multiagent systems
Artificial Intelligence
Deep learning (Machine learning)
UCTD
url https://scholar.sun.ac.za/handle/10019.1/130216