Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer

Thesis (PhD) -- Stellenbosch University, 2022.

Bibliographic Details
Main Author: Smit, Andries
Other Authors: Engelbrecht, Herman
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2022
_version_ 1867613980030140416
access_status_str Open Access
author Smit, Andries
author2 Engelbrecht, Herman
author_browse Engelbrecht, Herman
Smit, Andries
author_facet Engelbrecht, Herman
Smit, Andries
author_sort Smit, Andries
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (PhD) -- Stellenbosch University, 2022.
format Thesis
id oai:scholar.sun.ac.za:10019.1/125969
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:44:45.702Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2022
publishDateRange 2022
publishDateSort 2022
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/125969 Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer Smit, Andries Engelbrecht, Herman Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Scaling multi-agent reinforcement; learning to eleven aside simulated robot soccer Robotics in sports Artificial intelligence -- Engineering applications Intelligent agents (Computer software) Robot Sumo (Game) Neural networks (Computer science) UCTD Thesis (PhD) -- Stellenbosch University, 2022.
ENGLISH ABSTRACT: Robot soccer, where teams of autonomous agents compete against each other, has long been regarded as a grand challenge in artificial intelligence. Despite recent successes of learned policies over heuristics and handcrafted rules in other domains, current teams in the RoboCup soccer simulation leagues still rely on handcrafted strategies and apply reinforcement learning only on small subcomponents. This limits a learning agent's ability to find strong, high-level strategies for the game in its entirety. End-to-end reinforcement learning has successfully been applied in soccer simulations with up to 4 players. However, little previous work has been done on training in settings with more than 4 players, as learning is often unstable as well as it taking much longer to learn basic strategies. In this dissertation, we investigate whether it is possible for agents to learn competent soccer strategies in a full 22 player soccer game using limited computational resources (one CPU and one GPU), from tabula rasa and entirely through self-play. To enable this investigation, we build a simplified 2D soccer simulator with significantly faster simulation times than the official RoboCup simulator, that still contains the important challenges for multi-agent learning in the context of robot soccer.
We propose various improvements to the standard single-agent proximal policy optimisation algorithm, in an effort to scale it to our multi-agent setting. These improvements include (1) using a policy and critic network with an attention mechanism that scales linearly in the number of agents, (2) sharing networks between agents which allow for faster throughput using batching, and (3) using Polyak averaged opponents with freezing of the opponent team when necessary and league opponents. We show through experimental results that stable training in the full 22 player setting is possible. Agents trained in the 22 player setting learn to defeat a variety of handcrafted strategies, and also achieve a higher win rate compared to agents trained in the 4 player setting and evaluated in the full 22 player setting. We also evaluate our final algorithm in the RoboCup simulator and observe steady improvement in the team's performance over the course of training. Our work can guide future end-to-end multi-agent reinforcement learning teams to compete against the best handcrafted strategies available in simulated robot soccer.
AFRIKAANS OPSOMMING: Robotsokker, waar spanne van outonome agente teen mekaar meeding, word lank reeds as 'n groot uitdaging vir kunsmatige intelligensie beskou. Ten spyte van onlangse suksesse van aangeleerde beleide teenoor heuristieke en handgemaakte reëls in ander domeine, maak huidige spanne in die RoboCup-sokkersimulasie-ligas steeds staat op handgemaakte strategieë en pas versterkingsleer slegs toe op klein subkomponente. Dit beperk 'n leeragent se vermoë om sterk, hoëvlakstrategieë vir die spel in sy geheel te vind. Eind-tot-eind versterkingsleer is al suksesvol toegepas in sokkersimulasies met tot 4 spelers. Min vorige werk is egter gedoen aan afrigting in opstellings met meer as 4 spelers, aangesien leer dikwels onstabiel is, asook dat dit baie langer neem om basiese strategieë aan te leer.
In hierdie proefskrif ondersoek ons of dit moontlik is vir agente om bekwame sokkerstrategieë in 'n volle 22-speler sokkerwedstryd aan te leer deur gebruik te maak van beperkte rekenaarhulpbronne (een SVE en een GVE), vanaf tabula rasa en net deur selfspeel. Om hierdie ondersoek moontlik te maak, bou ons 'n vereenvoudigde 2D-sokkersimulator met aansienlik vinniger simulasietye as die amptelike RoboCup-simulator, wat steeds die belangrike uitdagings vir multi-agent-leer in die konteks van robotsokker bevat. Ons stel verskeie verbeterings aan die standaard enkel-agent proksimale beleid optimeringsalgoritme voor, in 'n poging om dit te skaal na ons multi-agent opstelling. Hierdie verbeterings sluit in (1) die gebruik van 'n beleid- en kritikusnetwerk met 'n aandagmeganisme wat lineêr in die aantal agente skaal, (2) die deel van netwerke tussen agente wat vinniger berekeninge moontlik maak deur gebruik te maak van groepering, en (3) die gebruik van Polyak-gemiddelde teenstanders met die vries van die teenstanderspan wanneer nodig en liga-teenstanders. Ons wys deur eksperimentele resultate dat stabiele afrigting in die volle 22-speler-opstelling moontlik is. Agente wat in die 22-speler-opstelling afgerig word, leer om 'n verskeidenheid handgemaakte strategieë te verslaan, en behaal ook 'n hoër wenkoers in vergelyking met agente wat in die 4-speleropstelling afgerig word en in die volle 22-speler-opstelling geëvalueer word. Ons evalueer ook ons finale algoritme in die RoboCup-simulator en neem deur die loop van afrigting konstante verbetering in die span se wenkoers waar. Ons werk kan toekomstige eind-tot-eind multi-agent versterkingsleer spanne lei om mee te ding teen die beste handgemaakte strategieë wat beskikbaar is in gesimuleerde robotsokker.
Doctoral 2022-11-16T06:45:45Z 2023-01-16T12:43:34Z 2022-11-16T06:45:45Z 2023-01-16T12:43:34Z 2022-12 Thesis http://hdl.handle.net/10019.1/125969 en_ZA Stellenbosch University xv, 117 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle Scaling multi-agent reinforcement; learning to eleven aside simulated robot soccer
Robotics in sports
Artificial intelligence -- Engineering applications
Intelligent agents (Computer software)
Robot Sumo (Game)
Neural networks (Computer science)
UCTD
Smit, Andries
Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
title Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
title_full Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
title_fullStr Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
title_full_unstemmed Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
title_short Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
title_sort scaling multi agent reinforcement learning to eleven aside simulated robot soccer
topic Scaling multi-agent reinforcement; learning to eleven aside simulated robot soccer
Robotics in sports
Artificial intelligence -- Engineering applications
Intelligent agents (Computer software)
Robot Sumo (Game)
Neural networks (Computer science)
UCTD
url http://hdl.handle.net/10019.1/125969
work_keys_str_mv AT smitandries scalingmultiagentreinforcementlearningtoelevenasidesimulatedrobotsoccer