Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.

Thesis (MEng)--Stellenbosch University, 2023.

Bibliographic Details
Main Author:	Bakambana, Jeremie
Other Authors:	Engelbrecht, Herman
Format:	Thesis
Language:	en_ZA
Published:	Stellenbosch : Stellenbosch University, 2023
Subjects:	Reinforcement learning; Transfer learning (Machine learning); Neural networks (Computer science)

_version_	1867613993425698816
access_status_str	Open Access
author	Bakambana, Jeremie
author2	Engelbrecht, Herman
author_browse	Bakambana, Jeremie Engelbrecht, Herman
author_facet	Engelbrecht, Herman Bakambana, Jeremie
author_sort	Bakambana, Jeremie
collection	Thesis
dc_rights_str_mv	Stellenbosch University
description	Thesis (MEng)--Stellenbosch University, 2023.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/127368
institution	Stellenbosch University (South Africa)
language	en_ZA
last_indexed	2026-06-10T12:44:57.544Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2023
publishDateRange	2023
publishDateSort	2023
publisher	Stellenbosch : Stellenbosch University
publisherStr	Stellenbosch : Stellenbosch University
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/127368 Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning. Bakambana, Jeremie Engelbrecht, Herman Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Reinforcement learning Transfer learning (Machine learning) Neural networks (Computer science) Thesis (MEng)--Stellenbosch University, 2023. ENGLISH ABSTRACT: Humans have the interesting ability to adapt to complex tasks by leveraging knowledge acquired on simpler tasks. In addition, humans can coordinate behaviors to reach a common objective. Recent progress in the field of Reinforcement Learning (RL) has demonstrated that an agent can adapt to a complex task after being introduced to a simpler variant of the same task. In this study, we investigate the ability of RL agents to solve a complex task while collaborating with another learning agent. The given task is a cooperative volleyball game in 3 dimensions. We used the Proximal Policy Optimization (PPO) algorithm to train the agent because it had succeeded, in a single-agent scenario, in solving a simpler variant of the game, namely the same volleyball game in 2 dimensions. We applied Incremental RL as a training paradigm to address the sparsity due to the large state space of the experimental environment. We began by investigating the problem in a single-agent scenario. We broke down the main task MDP into a sequence of incremental MDPs, which generated a sequence of variants of the same task ranging from the simplest to the most complex. We then trained the agent to solve each task in the sequence, starting with the simplest. 
The investigation demonstrated that: (1) the agent can adapt to an incremental sequence of MDPs; (2) reaching the optimal level of expertise in a simple variant of a task is not required in order to adapt to a more complex variant of the same task, since the agent can still adapt to the complex task after partially mastering the simpler variant; (3) the optimal policy generated by the agent at the final task generalizes over all previous MDPs generated by simpler variants of the final task; (4) successful incremental learning is influenced by two parameters: one controlling when the training agent may transition to a more complex variant of the given task, and another controlling how complex the new variant must be. Based on the experimental results in the single-agent scenario, we investigated the paradigm in cooperative multi-agent scenarios. In that investigation, we demonstrated that, with appropriate Reward Shaping, decentralized learning can effectively solve cooperative scenarios without necessarily tuning hyperparameters. We also showed that Incremental Learning is an effective and promising approach to address issues such as the sparsity of tasks with large state spaces in the multi-agent scenario. Finally, we demonstrated the ability of RL agents to adapt to a dynamic environment and maintain collaboration with other agents. AFRIKAANS ABSTRACT (in English translation): Humans have the interesting ability to adapt to complex tasks by making use of knowledge gained on simpler tasks. In addition, humans can coordinate behaviour to reach a common goal. Recent progress in reinforcement learning (RL) has shown that agents can adapt to a complex task by starting with a simple task and gradually increasing the complexity of the task. In this study we investigate the ability of reinforcement learning agents to adapt to a complex task while collaborating with another agent. 
The given task is a volleyball game in 3 dimensions (3D) in which a team must cooperate. We used the Proximal Policy Optimization (PPO) algorithm to solve the task because the state and action spaces of the task are continuous. We also used PPO because it was successful on the baseline task, which is the same volleyball game played in a 2-dimensional environment. We apply Incremental Reinforcement Learning to address the sparsity that results from the large state space of the experimental environment. We began by investigating the problem as a single-agent scenario. We broke the main environment's Markov Decision Process (MDP) down into a series of incremental MDPs and trained the agent sequentially to solve each MDP in the series. The study showed that: (1) the agent can adapt to an incremental series of MDPs; (2) mastering a subtask perfectly is not a requirement for successful adaptation to the task of the next increment, since the agent can adapt to the next increment after partial mastery of the current task; (3) the optimal policy generated by the agent at the final task generalizes over all previous MDPs in the series. We continued our study by investigating Incremental Learning in a cooperative-competitive multi-agent scenario. We showed that decentralized learning with proper Reward Shaping can successfully solve the cooperative scenarios without requiring specific hyperparameter tuning. We also showed that Incremental Learning is an effective and promising approach to address issues such as the sparsity of tasks with large state spaces in the multi-agent scenario. With this study we demonstrate the ability of RL agents to adapt to a dynamic environment and to maintain collaboration with other agents. 
Masters 2023-03-06T08:46:23Z 2023-05-18T07:18:31Z 2023-03-06T08:46:23Z 2023-05-18T07:18:31Z 2023-03 Thesis http://hdl.handle.net/10019.1/127368 en_ZA en_ZA Stellenbosch University xii, 123 application/pdf Stellenbosch : Stellenbosch University
spellingShingle	Reinforcement learning Transfer learning (Machine learning) Neural networks (Computer science) Bakambana, Jeremie Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title	Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_full	Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_fullStr	Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_full_unstemmed	Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_short	Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning.
title_sort	learning decentralized policies with incremental reinforcement learning reward shaping and self play learning
topic	Reinforcement learning Transfer learning (Machine learning) Neural networks (Computer science)
url	http://hdl.handle.net/10019.1/127368
work_keys_str_mv	AT bakambanajeremie learningdecentralizedpolicieswithincrementalreinforcementlearningrewardshapingandselfplaylearning

Similar Items