Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies

In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when informa...

Bibliographic Details
Main Author: Danisa, Siphelele
Other Authors: Shock, Jonathan
Format: Thesis
Language: English
Published: Department of Mathematics and Applied Mathematics 2023
_version_ 1867613177193168896
access_status_str Open Access
author Danisa, Siphelele
author2 Shock, Jonathan
author_browse Danisa, Siphelele
Shock, Jonathan
author_facet Shock, Jonathan
Danisa, Siphelele
author_sort Danisa, Siphelele
collection Thesis
description In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when information is not used efficiently in the learning process. We first investigate the effect of different samplers and modern strategies of training and evaluating energy-based models on learning to get a sense of whether the pitfall is due to sampling inefficiencies or underlying assumptions of the multiagent soft Q-learning extension (MASQL). We use the word sampler to refer to mechanisms that allow one to get samples from a given (target) distribution. After having understood this pitfall better, we develop opponent modelling approaches with mutual information regularisation. We find that while the former (the use of efficient samplers) is not as helpful as one would wish, the latter (opponent modelling with mutual information regularisation) offers new insights into the required mechanism to solve our problem. The domain in which we work is called the Max of Two Quadratics differential game where two agents need to coordinate in a non-convex landscape, and where learning is impacted by the mentioned pathology, relative overgeneralisation. We close this research investigation by offering a principled prescription on how to best extend single-agent energy-based approaches to multiple agents, which is a novel direction.
format Thesis
id oai:open.uct.ac.za:11427/37110
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:31:58.458Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Department of Mathematics and Applied Mathematics
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/37110 Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies Danisa, Siphelele Shock, Jonathan Pretorius, Arnu Pure and Applied Mathematics In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when information is not used efficiently in the learning process. We first investigate the effect of different samplers and modern strategies of training and evaluating energy-based models on learning to get a sense of whether the pitfall is due to sampling inefficiencies or underlying assumptions of the multiagent soft Q-learning extension (MASQL). We use the word sampler to refer to mechanisms that allow one to get samples from a given (target) distribution. After having understood this pitfall better, we develop opponent modelling approaches with mutual information regularisation. We find that while the former (the use of efficient samplers) is not as helpful as one would wish, the latter (opponent modelling with mutual information regularisation) offers new insights into the required mechanism to solve our problem. The domain in which we work is called the Max of Two Quadratics differential game where two agents need to coordinate in a non-convex landscape, and where learning is impacted by the mentioned pathology, relative overgeneralisation. We close this research investigation by offering a principled prescription on how to best extend single-agent energy-based approaches to multiple agents, which is a novel direction. 2023-03-02T08:14:43Z 2023-03-02T08:14:43Z 2022 2023-02-20T12:31:33Z Master Thesis Masters MSc http://hdl.handle.net/11427/37110 eng application/pdf Department of Mathematics and Applied Mathematics Faculty of Science
spellingShingle Pure and Applied Mathematics
Danisa, Siphelele
Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
thesis_degree_str Master's
title Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
title_sort learning to coordinate efficiently through multiagent soft q learning in the presence of game theoretic pathologies
topic Pure and Applied Mathematics
url http://hdl.handle.net/11427/37110
work_keys_str_mv AT danisasiphelele learningtocoordinateefficientlythroughmultiagentsoftqlearninginthepresenceofgametheoreticpathologies