Thesis (MEng)--Stellenbosch University, 2021.
| Main Author: | Ellis, James |
|---|---|
| Other Authors: | Engelbrecht, Herman |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2021 |
| Subjects: | Navigation systems; Multiagent systems; Reinforcement Learning; Path finding; UCTD |
| _version_ | 1867614024334573568 |
|---|---|
| access_status_str | Open Access |
| author | Ellis, James |
| author2 | Engelbrecht, Herman |
| author_browse | Ellis, James Engelbrecht, Herman |
| author_facet | Engelbrecht, Herman Ellis, James |
| author_sort | Ellis, James |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MEng)--Stellenbosch University, 2021. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/123703 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA |
| last_indexed | 2026-06-10T12:45:27.799Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2021 |
| publishDateRange | 2021 |
| publishDateSort | 2021 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/123703 Multi-agent path finding with reinforcement learning Ellis, James Engelbrecht, Herman Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Navigation systems Multiagent systems Reinforcement Learning Path finding UCTD Thesis (MEng)--Stellenbosch University, 2021. ENGLISH ABSTRACT: Navigation systems are becoming larger, with the need to find solutions in real time. Search-based approaches are normally used to find collision-free paths for a group of agents. These centralised search-based approaches struggle to scale to large problem settings when solutions are required in real time. Multi-Agent Reinforcement Learning (MARL) approaches could potentially provide a more scalable solution than search-based approaches, together with the ability to execute in real time once trained. We investigate the applicability of different MARL approaches to solving the Multi-Agent Path Finding (MAPF) problem. This is done by empirical evaluation of several state-of-the-art MARL algorithms on a gridworld simulation environment of the MAPF problem. We also consider an imitation learning approach and an approach from related work which uses both reinforcement learning and imitation learning. Lastly, the successful MARL approaches are compared with a conventional centralised path planner. From these comparisons, the advantages and disadvantages of deep MARL approaches relative to a centralised planner are determined. AFRIKAANS SUMMARY (translated): Navigation systems are becoming larger, with the need to find solutions in real time. Search-based methods are normally used to find collision-free paths for a group of agents. These centralised search methods struggle to find solutions quickly when the problem becomes large. As a result, these methods scale poorly and cannot find real-time solutions for large-scale problems. A Multi-Agent Reinforcement Learning (MARL) approach is potentially more scalable for this problem, with the ability to provide solutions in real time. We investigate the applicability of several MARL approaches to solving the Multi-Agent Path Finding (MAPF) problem. This is done through the empirical evaluation of several MARL algorithms on a gridworld simulation environment. We also investigate imitation learning methods, as well as methods from related work that use both reinforcement learning and imitation learning. Lastly, the most successful MARL method and a conventional centralised path planner are compared to determine the advantages and disadvantages of the two approaches. We find that reinforcement learning alone does not scale to large, partially observable environments. Data generated by an expert was found to be essential for training successful agents. In this regard, behaviour cloning gave the best results. Comparing these methods with the centralised path planner, we found that deep learning methods are more scalable when there are many agents and when the density of objects in the environment is low. However, the deep learning methods could not prevent collisions between agents. Deep learning methods are therefore only suitable for the MAPF problem when collisions are permissible. In that case, deep learning methods are more scalable than the centralised path planner as the number of agents in the environment increases. Masters 2021-10-12T10:06:56Z 2021-12-22T14:16:46Z 2021-10-12T10:06:56Z 2021-12-22T14:16:46Z 2021-12 Thesis http://hdl.handle.net/10019.1/123703 en_ZA Stellenbosch University 187 pages application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Navigation systems Multiagent systems Reinforcement Learning Path finding UCTD Ellis, James Multi-agent path finding with reinforcement learning |
| title | Multi-agent path finding with reinforcement learning |
| title_full | Multi-agent path finding with reinforcement learning |
| title_fullStr | Multi-agent path finding with reinforcement learning |
| title_full_unstemmed | Multi-agent path finding with reinforcement learning |
| title_short | Multi-agent path finding with reinforcement learning |
| title_sort | multi agent path finding with reinforcement learning |
| topic | Navigation systems Multiagent systems Reinforcement Learning Path finding UCTD |
| url | http://hdl.handle.net/10019.1/123703 |
| work_keys_str_mv | AT ellisjames multiagentpathfindingwithreinforcementlearning |