Reinforcement learning : theory, methods and application to decision support systems

Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010.

Bibliographic Details
Main Author:	Mouton, Hildegarde Suzanne
Other Authors:	Herbst, B. M.
Format:	Thesis
Language:	English
Published:	Stellenbosch : University of Stellenbosch, 2010
Subjects:	Reinforcement learning; Decision support systems; Dissertations > Applied mathematics; Theses > Applied mathematics; Monte Carlo methods; Temporal-difference learning

access_status_str	Open Access
author	Mouton, Hildegarde Suzanne
author2	Herbst, B. M.
collection	Thesis
dc_rights_str_mv	University of Stellenbosch
description	Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010.
format	Thesis
id	oai:scholar.sun.ac.za:10019.1/5304
institution	Stellenbosch University (South Africa)
language	English
last_indexed	2026-06-10T12:41:18.607Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate	2010
publishDateRange	2010
publishDateSort	2010
publisher	Stellenbosch : University of Stellenbosch
publisherStr	Stellenbosch : University of Stellenbosch
record_format	dspace
source_str	SUNScholar — Stellenbosch University Repository
spelling	oai:scholar.sun.ac.za:10019.1/5304 Reinforcement learning : theory, methods and application to decision support systems Mouton, Hildegarde Suzanne Herbst, B. M. Roodt, J. H. S. University of Stellenbosch. Faculty of Science. Dept. of Mathematical Sciences. Applied Mathematics. Reinforcement learning; Decision support systems; Dissertations -- Applied mathematics; Theses -- Applied mathematics; Monte Carlo methods; Temporal-difference learning. Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010. ENGLISH ABSTRACT: In this dissertation we study the machine learning subfield of Reinforcement Learning (RL). After developing a coherent background, we apply a Monte Carlo (MC) control algorithm with exploring starts (MCES), as well as an off-policy Temporal-Difference (TD) learning control algorithm, Q-learning, to a simplified version of the Weapon Assignment (WA) problem. For the MCES control algorithm, a discount parameter of τ = 1 is used. This gives very promising results when applied to 7 × 7 grids, as well as 71 × 71 grids. The same discount parameter cannot be applied to the Q-learning algorithm, as it causes the Q-values to diverge. We take a greedy approach, setting ε = 0, and vary the learning rate (α) and the discount parameter (τ). Experimentation shows that the best results are found with α set to 0.1 and τ constrained to the region 0.4 ≤ τ ≤ 0.7. The MC control algorithm with exploring starts gives promising results when applied to the WA problem. It performs significantly better than the off-policy TD algorithm, Q-learning, even though it is almost twice as slow. The modern battlefield is a fast-paced, information-rich environment, where discovery of intent, situation awareness and the rapid evolution of concepts of operation and doctrine are critical success factors. 
Combining the techniques investigated and tested in this work with other techniques in Artificial Intelligence (AI) and modern computational techniques may hold the key to solving some of the problems we now face in warfare. AFRIKAANSE OPSOMMING (translated from Afrikaans): The focus of this dissertation is the machine learning algorithms in the field of reinforcement learning. A coherent background of the field is followed by the application of a Monte Carlo (MC) control algorithm with exploring starts, as well as an off-policy Temporal-Difference control algorithm, Q-learning, to a simplified version of the weapon assignment problem. For the MC control algorithm, a discount parameter of τ = 1 is used. This yields promising results when applied to 7 × 7 grids, as well as to 71 × 71 grids. The same discount parameter cannot be applied to the Q-learning algorithm, since it causes the Q-values to diverge. We take a greedy approach by setting the greediness parameter to ε = 0. We then vary the learning rate (α) and the discount parameter (τ). The best experimental results were obtained when α = 0.1 and the discount parameter is held in the region 0.4 ≤ τ ≤ 0.7. The MC control algorithm yields promising results when applied to the weapon assignment problem. It gives significantly better results than the Q-learning algorithm, even though it takes about twice as long to run. The modern battlefield is an information-rich environment, where it is critically important to understand the enemy's plans quickly and to be aware of the environment and the context of events, and where the rapid development of concepts of operation and doctrine leads to success. The techniques investigated and tested in this dissertation, combined with other artificial intelligence techniques and modern computational techniques, may hold the key to solving the problems we currently face in warfare. 
2010-09-21T09:50:56Z 2010-12-15T10:31:48Z 2010-12 Thesis http://hdl.handle.net/10019.1/5304 en University of Stellenbosch 142 p. : ill. application/pdf Stellenbosch : University of Stellenbosch
title	Reinforcement learning : theory, methods and application to decision support systems
topic	Reinforcement learning; Decision support systems; Dissertations -- Applied mathematics; Theses -- Applied mathematics; Monte Carlo methods; Temporal-difference learning
url	http://hdl.handle.net/10019.1/5304

Similar Items