Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.

Thesis (MEng)--Stellenbosch University, 2023.

Bibliographic Details
Main Author: Bakambana, Jeremie
Other Authors: Engelbrecht, Herman
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2023
Subjects: Reinforcement learning; Transfer learning (Machine learning); Neural networks (Computer science)
_version_ 1867613993425698816
access_status_str Open Access
author Bakambana, Jeremie
author2 Engelbrecht, Herman
author_browse Bakambana, Jeremie
Engelbrecht, Herman
author_facet Engelbrecht, Herman
Bakambana, Jeremie
author_sort Bakambana, Jeremie
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MEng)--Stellenbosch University, 2023.
format Thesis
id oai:scholar.sun.ac.za:10019.1/127368
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:44:57.544Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/127368 Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning. Bakambana, Jeremie Engelbrecht, Herman Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Reinforcement learning Transfer learning (Machine learning) Neural networks (Computer science) Thesis (MEng)--Stellenbosch University, 2023. ENGLISH ABSTRACT: Humans have the interesting ability to adapt to complex tasks by leveraging knowledge acquired on simpler tasks. In addition, humans can coordinate their behavior to reach a common objective. Recent progress in Reinforcement Learning (RL) has demonstrated that an agent can adapt to a complex task after being introduced to a simpler variant of the same task. In this study, we investigate the ability of RL agents to solve a complex task while collaborating with another learning agent. The given task is a cooperative volleyball game in 3 dimensions. We trained the agents with the Proximal Policy Optimization (PPO) algorithm because it had successfully solved, in a single-agent scenario, a simpler variant of the game: the same volleyball game in 2 dimensions. We applied Incremental RL as a training paradigm to address the sparsity arising from the large state space of the experimental environment. We first investigated the problem in a single-agent scenario. We broke down the main task's Markov Decision Process (MDP) into a sequence of incremental MDPs, which produced a sequence of variants of the same task ranging from the simplest to the most complex. We then trained the agent to solve each task in the sequence, starting with the simplest. The investigation demonstrated that: (1) the agent can adapt to an incremental sequence of MDPs; (2) reaching the optimal level of expertise in a simple variant of a task is not a requirement for adapting to a more complex variant of the same task; the agent can still adapt to a complex task after partially mastering a simpler variant; (3) the optimal policy generated by the agent at the final task generalizes over all previous MDPs generated by simpler variants of the final task; (4) the success of incremental learning can be influenced by two parameters: one controlling when the training agent may transition to a more complex variant of the given task, and another controlling how complex the new variant of the task must be. Based on the experimental results in the single-agent scenario, we investigated the paradigm in cooperative multi-agent scenarios. In this investigation, we demonstrated that, with appropriate Reward Shaping, decentralized learning can effectively solve cooperative scenarios without necessarily tuning hyperparameters. We also showed that Incremental Learning is an effective and promising approach to address issues such as the sparsity of tasks with a large state space in the multi-agent scenario. Finally, we demonstrated the ability of RL agents to adapt to a dynamic environment and maintain collaboration with other agents.
AFRIKAANS OPSOMMING (English translation): Humans have the interesting ability to adapt to complex tasks by leveraging knowledge gained on simpler tasks. In addition, humans can coordinate their behavior to reach a common goal. Recent progress in reinforcement learning (RL) has shown that agents can adapt to a complex task by starting with a simple task and gradually increasing the complexity of the task. In this study, we investigate the ability of reinforcement learning agents to adapt to a complex task while cooperating with another agent. The given task is a volleyball game in 3 dimensions (3D) in which a team must cooperate. We used the Proximal Policy Optimization (PPO) algorithm to solve the task because the state and action spaces of the task are continuous. We also used PPO because it was successful on the baseline task, which is the same volleyball game played in a 2-dimensional environment. We apply Incremental Reinforcement Learning to address the sparsity that results from the large state space of the experimental environment. We began by investigating the problem as a single-agent scenario. We broke down the main environment's Markov Decision Process (MDP) into a sequence of incremental MDPs and trained the agent sequentially to solve each MDP in the sequence. The study showed that: (1) the agent can adapt to an incremental sequence of MDPs; (2) mastering a subtask perfectly is not a requirement for successfully adapting to the task of the next increment; the agent can adapt to the next increment after partially mastering the current task; (3) the optimal policy generated by the agent at the final task generalizes over all previous MDPs in the sequence. We continued our study by investigating Incremental Learning in a cooperative-competitive multi-agent scenario. We showed that decentralized learning with proper Reward Shaping can successfully solve the cooperative scenarios without requiring specific hyperparameter tuning. We also showed that Incremental Learning is an effective and promising approach to address issues such as the sparsity of tasks with large state spaces in the multi-agent scenario. With this study we demonstrate the ability of RL agents to adapt to a dynamic environment and to maintain cooperation with other agents.
Masters 2023-03-06T08:46:23Z 2023-05-18T07:18:31Z 2023-03-06T08:46:23Z 2023-05-18T07:18:31Z 2023-03 Thesis http://hdl.handle.net/10019.1/127368 en_ZA en_ZA Stellenbosch University xii, 123 application/pdf Stellenbosch : Stellenbosch University
spellingShingle Reinforcement learning
Transfer learning (Machine learning)
Neural networks (Computer science)
Bakambana, Jeremie
Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_full Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_fullStr Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_full_unstemmed Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_short Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_sort learning decentralized policies with incremental reinforcement learning reward shaping and self play learning
topic Reinforcement learning
Transfer learning (Machine learning)
Neural networks (Computer science)
url http://hdl.handle.net/10019.1/127368
work_keys_str_mv AT bakambanajeremie learningdecentralizedpolicieswithincrementalreinforcementlearningrewardshapingandselfplaylearning