Thesis (PhD)--Stellenbosch University, 2025.
| Main Author: | Bredell, Francois |
|---|---|
| Other Authors: | Schoeman, J. C. |
| Format: | Thesis |
| Language: | English |
| Published: | Stellenbosch : Stellenbosch University, 2025 |
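| Subjects: | Multiagent systems; Human-computer interaction; Reinforcement learning; Intelligent agents (Computer software); Cooperative games (Mathematics) |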
| _version_ | 1867613979310817280 |
|---|---|
| access_status_str | Open Access |
| author | Bredell, Francois |
| author2 | Schoeman, J. C. |
| author_browse | Bredell, Francois; Schoeman, J. C. |
| author_facet | Schoeman, J. C.; Bredell, Francois |
| author_sort | Bredell, Francois |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (PhD)--Stellenbosch University, 2025. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/134531 |
| institution | Stellenbosch University (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:44:44.746Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2025 |
| publishDateRange | 2025 |
| publishDateSort | 2025 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/134531 Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi Bredell, Francois Schoeman, J. C. Engelbrecht, H. A. (Herman) Stellenbosch University. Faculty of Engineering. Dept. of Electrical & Electronic Engineering. Multiagent systems Human-computer interaction Reinforcement learning Intelligent agents (Computer software) Cooperative games (Mathematics) Thesis (PhD)--Stellenbosch University, 2025. Bredell, F. 2025. Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi. Unpublished doctoral dissertation. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/9f0c7d12-f35b-4ae2-a819-94867abe1ff2 ENGLISH ABSTRACT: Multi-agent reinforcement learning (MARL) holds promise for a large variety of problems where artificial agents would offer a significant improvement over handcrafted solutions, e.g., autonomous agents controlling vehicles on roads. This is because MARL agents are able to learn from a collective of past and simulated experiences. Unfortunately, multi-agent systems often contain hidden information and require clear communication to achieve effective cooperation. Furthermore, this communication channel can be restricted, e.g., in quick-reaction-time scenarios where many agents must cooperate simultaneously. When humans tackle these complex multi-agent problems, they rely on conventions to reduce uncertainty. Conventions provide a means to implicitly convey ideas or knowledge based on a mutually agreed-upon set of “rules” or principles, and range from driving on a certain side of the road to social conventions, greetings, and norms. The card game Hanabi is considered a strong platform for the testing and development of MARL algorithms, due to its cooperative nature, partial observability, and limited communication. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on complex architecture design to achieve state-of-the-art performance. When humans tackle the Hanabi challenge, they rely on conventions to consistently achieve a score near 25/25. One aspect of the Hanabi challenge yet to be explored in MARL is that of conventions, and how to incorporate human-like conventions into MARL algorithms. In this dissertation, we propose a novel framework to incorporate artificial conventions into MARL algorithms, where one agent can select a convention from the convention space (as opposed to the action space) and other agents can continue this convention by actively choosing to participate in it through the selection of the next steps of that convention. Artificial conventions are based on existing and widely adopted human conventions for Hanabi, and can be applied to any MARL algorithm by incorporating them into an agent’s action space using our framework. Our results show that artificial conventions lead to a fivefold reduction in computational cost by significantly reducing the training data required to reach a converged policy. One of the major benefits of artificial conventions is their ability to achieve effective cooperation in cross-play Hanabi, where agents from different training runs are paired together during evaluation. Rainbow agents paired in a self-play scenario are able to achieve a score of 20.64/25 in two-player Hanabi; however, when paired in a cross-play scenario, the agents only achieve a score of 2.91/25. By applying artificial conventions to the Rainbow agents, we are able to significantly improve the cross-play performance and achieve a score of 17.02/25. This research demonstrates the importance of implicit knowledge sharing through the use of artificial conventions, allowing agents to cooperate effectively in a manner closely resembling that of their human counterparts in complex multi-agent scenarios. AFRIKAANS SUMMARY: Multi-agent reinforcement learning (MARL) has promising potential to tackle a large variety of problems where artificial agents would offer a significant improvement over alternative handcrafted solutions, e.g., autonomous agents controlling vehicles on roads. This is because MARL agents can learn from a collection of past and simulated experiences. However, multi-agent systems often contain hidden information and require clear communication to achieve effective cooperation. This communication channel can also be restricted, e.g., in quick-reaction-time settings where many agents must cooperate simultaneously. When humans tackle such problems, they rely on conventions to reduce uncertainty. Conventions provide a way to implicitly convey ideas or knowledge based on a mutually agreed set of "rules" or principles, and range from driving on a certain side of the road to social conventions, greetings, and norms. The card game Hanabi is regarded as a strong platform for testing and developing MARL algorithms because of its cooperative nature, partial observability, and limited communication. Previous research efforts have investigated the capabilities of MARL algorithms within Hanabi, with a strong focus on architecture design to achieve state-of-the-art performance. When humans tackle the Hanabi problem, they need conventions to achieve a score near 25/25. One aspect of Hanabi still to be investigated in MARL is that of conventions, and how to incorporate human-like conventions into existing MARL algorithms. In this dissertation, we propose a new framework to incorporate artificial conventions into MARL algorithms, where one agent can choose a convention from the convention space (as opposed to the action space) and other agents can continue this convention by actively participating in it through choosing the next steps of that convention. Artificial conventions are based on existing and accepted human conventions for Hanabi, and can be applied to any MARL algorithm by including them in an agent's existing action space using our framework. Our results show that artificial conventions lead to a fivefold reduction in computational cost by significantly reducing the training data needed to reach a converged policy. One of the greatest benefits of artificial conventions is the ability to achieve effective cooperation in cross-play Hanabi, where agents from different training runs are paired during evaluation. Rainbow agents paired in a self-play setting can achieve a score of 20.64/25 in two-player Hanabi, but when paired in a cross-play setting, the agents achieve only 2.91/25. By applying artificial conventions to the agents, we can significantly improve cross-play performance and achieve 17.02/25. This research demonstrates the importance of implicit knowledge sharing through artificial conventions, enabling agents to cooperate effectively in a manner that closely matches that of their human counterparts in complex multi-agent problems. Doctoral 2025-12-12T09:06:28Z 2025-12-12T09:06:28Z 2025-12 Thesis https://scholar.sun.ac.za/handle/10019.1/134531 en Stellenbosch University xv, 128 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Multiagent systems Human-computer interaction Reinforcement learning Intelligent agents (Computer software) Cooperative games (Mathematics) Bredell, Francois Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| title | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| title_full | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| title_fullStr | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| title_full_unstemmed | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| title_short | Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi |
| title_sort | augmenting the action space with conventions to improve multi agent cooperation in hanabi |
| topic | Multiagent systems; Human-computer interaction; Reinforcement learning; Intelligent agents (Computer software); Cooperative games (Mathematics) |
| url | https://scholar.sun.ac.za/handle/10019.1/134531 |
| work_keys_str_mv | AT bredellfrancois augmentingtheactionspacewithconventionstoimprovemultiagentcooperationinhanabi |
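
The abstract's distinction between self-play and cross-play evaluation can be made concrete with a small sketch. It assumes a hypothetical `play(agent_a, agent_b)` function that returns the score of one two-player game. The names `self_and_cross_play`, `play`, and `toy_play` are illustrative and do not come from the thesis. Self-play here averages the games in which an agent is paired with a copy from its own training run, while cross-play averages the games between agents from different runs. The toy scores are placeholders, not results from the thesis.

```python
from itertools import product
from statistics import mean
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")


def self_and_cross_play(
    agents: Sequence[A], play: Callable[[A, A], float]
) -> Tuple[float, float]:
    """Return (mean self-play score, mean cross-play score).

    `agents` holds one agent per independent training run (an assumption
    of this sketch); `play` is a hypothetical game-evaluation function.
    """
    if len(agents) < 2:
        raise ValueError("cross-play needs agents from at least two runs")

    self_scores = []
    cross_scores = []
    # Visit every ordered pairing of training runs, including (i, i).
    for (i, a), (j, b) in product(enumerate(agents), repeat=2):
        score = play(a, b)
        if i == j:
            self_scores.append(score)   # same training run
        else:
            cross_scores.append(score)  # different training runs
    return mean(self_scores), mean(cross_scores)


def toy_play(a: int, b: int) -> float:
    """Toy stand-in for a game: agents are integers naming a private
    'convention'; matching conventions score high, mismatches score low."""
    return 20.0 if a == b else 5.0


self_score, cross_score = self_and_cross_play([0, 1, 2], toy_play)
print(self_score, cross_score)
```

In this toy setup the gap between the two averages comes entirely from agents whose private conventions do not match, which is the kind of failure the abstract attributes to cross-play between independently trained agents.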