Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi

Thesis (PhD)--Stellenbosch University, 2025.

Bibliographic Details
Main Author: Bredell, Francois
Other Authors: Schoeman, J. C.
Format: Thesis
Language: English
Published: Stellenbosch : Stellenbosch University 2025
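Subjects: Multiagent systems; Human-computer interaction; Reinforcement learning; Intelligent agents (Computer software); Cooperative games (Mathematics)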
_version_ 1867613979310817280
access_status_str Open Access
author Bredell, Francois
author2 Schoeman, J. C.
author_browse Bredell, Francois
Schoeman, J. C.
author_facet Schoeman, J. C.
Bredell, Francois
author_sort Bredell, Francois
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (PhD)--Stellenbosch University, 2025.
format Thesis
id oai:scholar.sun.ac.za:10019.1/134531
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:44:44.746Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2025
publishDateRange 2025
publishDateSort 2025
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/134531 Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi Bredell, Francois Schoeman, J. C. Engelbrecht, H. A. (Herman) Stellenbosch University. Faculty of Engineering. Dept. of Electrical & Electronic Engineering. Multiagent systems Human-computer interaction Reinforcement learning Intelligent agents (Computer software) Cooperative games (Mathematics) Thesis (PhD)--Stellenbosch University, 2025. Bredell, F. 2025. Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi. Unpublished doctoral dissertation. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/9f0c7d12-f35b-4ae2-a819-94867abe1ff2 ENGLISH ABSTRACT: Multi-agent reinforcement learning (MARL) holds promise for addressing a wide variety of problems where artificial agent operation would offer a significant improvement over alternative handcrafted solutions, e.g., autonomous agents controlling vehicles on roads. This is because MARL agents can learn from a collection of past and simulated experiences. Unfortunately, multi-agent systems often contain hidden information and require clear communication to achieve effective cooperation. Furthermore, this communication channel can be restricted, e.g., in quick-reaction-time scenarios where many agents must cooperate simultaneously. When humans tackle these complex multi-agent problems, they rely on conventions to reduce uncertainty. Conventions provide a means to implicitly convey ideas or knowledge based on a mutually agreed-upon set of “rules” or principles, and range from driving on a certain side of the road to social conventions, greetings, and norms. The card game Hanabi is considered a strong platform for the testing and development of MARL algorithms, due to its cooperative nature, partial observability, and limited communication. 
Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on complex architecture design to achieve state-of-the-art performance. When humans tackle the Hanabi challenge, they need conventions to consistently achieve a score near 25/25. One aspect of the Hanabi challenge yet to be explored in MARL is that of conventions, and how to incorporate human-like conventions into MARL algorithms. In this dissertation, we propose a novel framework to incorporate artificial conventions into MARL algorithms, where one agent can select a convention from the convention space (as opposed to the action space) and other agents can continue this convention by actively choosing to participate in it through the selection of the next steps of that convention. Artificial conventions are based on existing and widely adopted human conventions for Hanabi, and can be applied to any MARL algorithm by incorporating them into an agent’s action space using our framework. Our results show that artificial conventions lead to a fivefold reduction in computational cost by significantly reducing the training data required to reach a converged policy. One of the major benefits of artificial conventions is their ability to achieve effective cooperation in cross-play Hanabi, where agents from different training runs are paired together during evaluation. Rainbow agents paired in a self-play scenario are able to achieve a score of 20.64/25 in two-player Hanabi; however, when paired in a cross-play scenario, the agents only achieve a score of 2.91/25. By applying artificial conventions to the Rainbow agents, we are able to significantly improve the cross-play performance and achieve a score of 17.02/25. 
This research demonstrates the importance of implicit knowledge sharing through the use of artificial conventions, allowing agents to cooperate effectively in a manner closely resembling that of their human counterparts in complex multi-agent scenarios. AFRIKAANS SUMMARY (English translation): Multi-agent reinforcement learning (MARL) has promising potential to tackle a wide variety of problems where artificial agents would offer a significant improvement over alternative handcrafted solutions, e.g., autonomous agents controlling vehicles on roads. This is because MARL agents can learn from a collection of past and simulated experiences. However, multi-agent systems often contain hidden information and require clear communication to achieve effective cooperation. This communication channel can also be restricted, e.g., in fast-reaction-time environments where many agents must cooperate simultaneously. When humans tackle such problems, they rely on conventions to reduce uncertainty. Conventions provide a way to implicitly convey ideas or knowledge based on a mutually agreed-upon set of "rules" or principles, and range from driving on a certain side of the road to social conventions, greetings, and norms. The card game Hanabi is regarded as a strong platform for testing and developing MARL algorithms because of its cooperative nature, partial observability, and limited communication. Previous research efforts have investigated the capabilities of MARL algorithms within Hanabi, with a strong focus on architecture design to achieve state-of-the-art performance. When humans tackle the Hanabi problem, they need conventions to achieve a score near 25/25. One aspect of Hanabi that has yet to be investigated in MARL is that of conventions, and how to incorporate human-like conventions into existing MARL algorithms. 
In this dissertation, we propose a new framework for incorporating artificial conventions into MARL algorithms, where one agent can choose a convention from the convention space (as opposed to the action space) and other agents can continue this convention by actively taking part in it, choosing the next steps of that convention. Artificial conventions are based on existing and accepted human conventions for Hanabi, and can be applied to any MARL algorithm by including them in an agent’s existing action space using our framework. Our results show that artificial conventions lead to a fivefold reduction in computational cost by considerably reducing the training data needed to reach a converged policy. One of the greatest benefits of artificial conventions is the ability to achieve effective cooperation in cross-play Hanabi, where agents from different training runs are combined during evaluation. Rainbow agents paired in a self-play setting can achieve a score of 20.64/25 in two-player Hanabi, but when paired in a cross-play setting, the agents achieve only 2.91/25. By applying artificial conventions to the agents, we can considerably improve the cross-play performance and achieve 17.02/25. This research demonstrates the importance of implicit knowledge sharing by means of artificial conventions, which enable agents to cooperate effectively in a way that closely matches that of their human counterparts in complex multi-agent problems. Doctoral 2025-12-12T09:06:28Z 2025-12-12T09:06:28Z 2025-12 Thesis https://scholar.sun.ac.za/handle/10019.1/134531 en Stellenbosch University xv, 128 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
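The abstract contrasts self-play with cross-play, where agents from different training runs are paired during evaluation. The sketch below illustrates one way such a comparison might be aggregated. It assumes a hypothetical square matrix of mean game scores, with entry [i][j] holding the score when an agent from training run i is paired with an agent from run j. The function name, the matrix layout, and the example values are assumptions for illustration only and are not taken from the thesis.

```python
from itertools import product
from statistics import mean


def self_and_cross_play(scores):
    """Aggregate a pairing matrix into (self-play, cross-play) averages.

    scores[i][j] is assumed to be the mean game score obtained when an
    agent from training run i is paired with an agent from training run j.
    At least two runs are needed, otherwise there are no cross-play pairs.
    """
    n = len(scores)
    # Self-play: each agent is paired with a partner from its own run (diagonal).
    self_play = mean(scores[i][i] for i in range(n))
    # Cross-play: partners come from different runs (all off-diagonal entries).
    cross_play = mean(
        scores[i][j] for i, j in product(range(n), repeat=2) if i != j
    )
    return self_play, cross_play


# Hypothetical illustration values, NOT results from the thesis.
example_scores = [
    [20.0, 3.0, 2.0],
    [4.0, 21.0, 3.0],
    [2.0, 4.0, 22.0],
]
self_play_score, cross_play_score = self_and_cross_play(example_scores)
print(self_play_score, cross_play_score)
```

A large gap between the two averages, as in the made-up matrix above, is the kind of pattern the abstract reports for Rainbow agents without artificial conventions.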
spellingShingle Multiagent systems
Human-computer interaction
Reinforcement learning
Intelligent agents (Computer software)
Cooperative games (Mathematics)
Bredell, Francois
Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
title Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
title_full Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
title_fullStr Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
title_full_unstemmed Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
title_short Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi
title_sort augmenting the action space with conventions to improve multi agent cooperation in hanabi
topic Multiagent systems
Human-computer interaction
Reinforcement learning
Intelligent agents (Computer software)
Cooperative games (Mathematics)
url https://scholar.sun.ac.za/handle/10019.1/134531
work_keys_str_mv AT bredellfrancois augmentingtheactionspacewithconventionstoimprovemultiagentcooperationinhanabi