Multi-agent path finding with reinforcement learning

Thesis (MEng)--Stellenbosch University, 2021.

Bibliographic Details
Main Author: Ellis, James
Other Authors: Engelbrecht, Herman
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2021
Subjects: Navigation systems; Multiagent systems; Reinforcement Learning; Path finding; UCTD
_version_ 1867614024334573568
access_status_str Open Access
author Ellis, James
author2 Engelbrecht, Herman
author_browse Ellis, James
Engelbrecht, Herman
author_facet Engelbrecht, Herman
Ellis, James
author_sort Ellis, James
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MEng)--Stellenbosch University, 2021.
format Thesis
id oai:scholar.sun.ac.za:10019.1/123703
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:45:27.799Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2021
publishDateRange 2021
publishDateSort 2021
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/123703 Multi-agent path finding with reinforcement learning Ellis, James Engelbrecht, Herman Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Navigation systems Multiagent systems Reinforcement Learning Path finding UCTD Thesis (MEng)--Stellenbosch University, 2021.
ENGLISH ABSTRACT: Navigation systems are becoming larger and need to find solutions in real time. Search-based approaches are normally used to find collision-free paths for a group of agents. These centralised search-based approaches struggle to scale to large problem settings when solutions are required in real time. Multi-Agent Reinforcement Learning (MARL) approaches could potentially provide a more scalable solution than search-based approaches, together with the ability to execute in real time once trained. We investigate the applicability of different MARL approaches to solving the Multi-Agent Path Finding (MAPF) problem. This is done by empirical evaluation of several state-of-the-art MARL algorithms on a gridworld simulation environment of the MAPF problem. We also consider an imitation learning approach and an approach from related work which uses both reinforcement learning and imitation learning. Lastly, successful MARL approaches are compared with a conventional centralised path planner. From this comparison, the advantages and disadvantages of deep MARL approaches relative to a centralised planner are determined.
AFRIKAANS ABSTRACT (English translation): Navigation systems are becoming larger and need to find solutions in real time. Search-based methods are normally used to find collision-free paths for a group of agents. These centralised search methods struggle to find solutions quickly when the problem becomes large. As a result, they scale poorly and cannot find real-time solutions for large-scale problems. A Multi-Agent Reinforcement Learning (MARL) approach is potentially more scalable for this problem, with the ability to give real-time solutions. We investigate the applicability of several MARL approaches to solving the Multi-Agent Path Finding (MAPF) problem. This is done through the empirical evaluation of several MARL algorithms on a gridworld simulation environment. We also investigate imitation learning methods, as well as methods from related work that use both reinforcement learning and imitation learning. Lastly, the most successful MARL method and a conventional centralised path planner are compared to determine the advantages and disadvantages of the two approaches. We find that when only reinforcement learning is used, the approach cannot be scaled to large, partially observable environments. Data generated by an expert was found to be essential for training successful agents. In this regard, behaviour cloning gave the best results. In comparing these methods with the centralised path planner, we found that deep learning methods are more scalable when there are many agents and when the density of objects in the environment is low. Deep learning methods could not, however, prevent collisions between agents. Deep learning methods are therefore only suitable for the MAPF problem when collisions are permissible. In that case, deep learning methods are more scalable than the centralised path planner as the number of agents in the environment increases.
Masters 2021-10-12T10:06:56Z 2021-12-22T14:16:46Z 2021-10-12T10:06:56Z 2021-12-22T14:16:46Z 2021-12 Thesis http://hdl.handle.net/10019.1/123703 en_ZA Stellenbosch University 187 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle Navigation systems
Multiagent systems
Reinforcement Learning
Path finding
UCTD
Ellis, James
Multi-agent path finding with reinforcement learning
title Multi-agent path finding with reinforcement learning
title_full Multi-agent path finding with reinforcement learning
title_fullStr Multi-agent path finding with reinforcement learning
title_full_unstemmed Multi-agent path finding with reinforcement learning
title_short Multi-agent path finding with reinforcement learning
title_sort multi agent path finding with reinforcement learning
topic Navigation systems
Multiagent systems
Reinforcement Learning
Path finding
UCTD
url http://hdl.handle.net/10019.1/123703
work_keys_str_mv AT ellisjames multiagentpathfindingwithreinforcementlearning