Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies

In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when informa...

Bibliographic Details
Main Author:	Danisa, Siphelele
Other Authors:	Shock, Jonathan
Format:	Thesis
Language:	English
Published:	Department of Mathematics and Applied Mathematics 2023
Subjects:	Pure and Applied Mathematics

_version_	1867613177193168896
access_status_str	Open Access
author	Danisa, Siphelele
author2	Shock, Jonathan
author_browse	Danisa, Siphelele Shock, Jonathan
author_facet	Shock, Jonathan Danisa, Siphelele
author_sort	Danisa, Siphelele
collection	Thesis
description	In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when information is not used efficiently in the learning process. We first investigate the effect of different samplers and modern strategies of training and evaluating energy-based models on learning to get a sense of whether the pitfall is due to sampling inefficiencies or underlying assumptions of the multiagent soft Q-learning extension (MASQL). We use the word sampler to refer to mechanisms that allow one to get samples from a given (target) distribution. After having understood this pitfall better, we develop opponent modelling approaches with mutual information regularisation. We find that while the former (the use of efficient samplers) is not as helpful as one would wish, the latter (opponent modelling with mutual information regularisation) offers new insights into the required mechanism to solve our problem. The domain in which we work is called the Max of Two Quadratics differential game where two agents need to coordinate in a non-convex landscape, and where learning is impacted by the mentioned pathology, relative overgeneralisation. We close this research investigation by offering a principled prescription on how to best extend single-agent energy-based approaches to multiple agents, which is a novel direction.
format	Thesis
id	oai:open.uct.ac.za:11427/37110
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:31:58.458Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2023
publishDateRange	2023
publishDateSort	2023
publisher	Department of Mathematics and Applied Mathematics
publisherStr	Department of Mathematics and Applied Mathematics
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/37110 Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies Danisa, Siphelele Shock, Jonathan Pretorius, Arnu Pure and Applied Mathematics In this work we investigate the convergence of multiagent soft Q-learning in continuous games where learning is most likely to be affected by relative overgeneralisation. While this will occur more often in multiagent independent learner problems, it is present in joint-learner problems when information is not used efficiently in the learning process. We first investigate the effect of different samplers and modern strategies of training and evaluating energy-based models on learning to get a sense of whether the pitfall is due to sampling inefficiencies or underlying assumptions of the multiagent soft Q-learning extension (MASQL). We use the word sampler to refer to mechanisms that allow one to get samples from a given (target) distribution. After having understood this pitfall better, we develop opponent modelling approaches with mutual information regularisation. We find that while the former (the use of efficient samplers) is not as helpful as one would wish, the latter (opponent modelling with mutual information regularisation) offers new insights into the required mechanism to solve our problem. The domain in which we work is called the Max of Two Quadratics differential game where two agents need to coordinate in a non-convex landscape, and where learning is impacted by the mentioned pathology, relative overgeneralisation. We close this research investigation by offering a principled prescription on how to best extend single-agent energy-based approaches to multiple agents, which is a novel direction. 2023-03-02T08:14:43Z 2023-03-02T08:14:43Z 2022 2023-02-20T12:31:33Z Master Thesis Masters MSc http://hdl.handle.net/11427/37110 eng application/pdf Department of Mathematics and Applied Mathematics Faculty of Science
spellingShingle	Pure and Applied Mathematics Danisa, Siphelele Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
thesis_degree_str	Master's
title	Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
title_full	Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
title_fullStr	Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
title_full_unstemmed	Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
title_short	Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies
title_sort	learning to coordinate efficiently through multiagent soft q learning in the presence of game theoretic pathologies
topic	Pure and Applied Mathematics
url	http://hdl.handle.net/11427/37110
work_keys_str_mv	AT danisasiphelele learningtocoordinateefficientlythroughmultiagentsoftqlearninginthepresenceofgametheoreticpathologies

