Reinforcement learning : theory, methods and application to decision support systems

Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010.

Bibliographic Details
Main Author: Mouton, Hildegarde Suzanne
Other Authors: Herbst, B. M.
Format: Thesis
Language: English
Published: Stellenbosch : University of Stellenbosch, 2010
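Subjects: Reinforcement learning; Decision support systems; Dissertations -- Applied mathematics; Theses -- Applied mathematics; Monte Carlo methods; Temporal-difference learning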
access_status_str Open Access
author Mouton, Hildegarde Suzanne
author2 Herbst, B. M.
collection Thesis
dc_rights_str_mv University of Stellenbosch
description Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010.
format Thesis
id oai:scholar.sun.ac.za:10019.1/5304
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:41:18.607Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2010
publisher Stellenbosch : University of Stellenbosch
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/5304
Reinforcement learning : theory, methods and application to decision support systems
Mouton, Hildegarde Suzanne
Herbst, B. M.
Roodt, J. H. S.
University of Stellenbosch. Faculty of Science. Dept. of Mathematical Sciences. Applied Mathematics.
Reinforcement learning
Decision support systems
Dissertations -- Applied mathematics
Theses -- Applied mathematics
Monte Carlo methods
Temporal-difference learning
Thesis (MSc (Applied Mathematics))--University of Stellenbosch, 2010.
ENGLISH ABSTRACT: In this dissertation we study the machine learning subfield of Reinforcement Learning (RL). After developing a coherent background, we apply a Monte Carlo (MC) control algorithm with exploring starts (MCES), as well as an off-policy Temporal-Difference (TD) learning control algorithm, Q-learning, to a simplified version of the Weapon Assignment (WA) problem. For the MCES control algorithm, a discount parameter of τ = 1 is used. This gives very promising results when applied to 7 × 7 grids, as well as 71 × 71 grids. The same discount parameter cannot be applied to the Q-learning algorithm, as it causes the Q-values to diverge. We take a greedy approach, setting ε = 0, and vary the learning rate (α) and the discount parameter (τ). Experimentation shows that the best results are found with α set to 0.1 and τ constrained in the region 0.4 ≤ τ ≤ 0.7. The MC control algorithm with exploring starts gives promising results when applied to the WA problem. It performs significantly better than the off-policy TD algorithm, Q-learning, even though it is almost twice as slow. The modern battlefield is a fast-paced, information-rich environment, where discovery of intent, situation awareness and the rapid evolution of concepts of operation and doctrine are critical success factors.
Combining the techniques investigated and tested in this work with other techniques in Artificial Intelligence (AI) and modern computational techniques may hold the key to solving some of the problems we now face in warfare.
AFRIKAANS ABSTRACT (AFRIKAANSE OPSOMMING), in English translation: The focus of this dissertation is the machine learning algorithms in the field of reinforcement learning. A coherent background of the field is followed by the application of a Monte Carlo (MC) control algorithm with exploring starts, as well as an off-policy Temporal-Difference control algorithm, Q-learning, to a simplified version of the weapon assignment problem. For the MC control algorithm, a discount parameter of τ = 1 is used. This gives promising results when applied to 7 × 7 grids, as well as to 71 × 71 grids. The same discount parameter cannot be applied to the Q-learning algorithm, since it causes the Q-values to diverge. We take a greedy approach by setting the greediness parameter to ε = 0. We then vary the learning rate (α) and the discount parameter (τ). The best experimental results were obtained with α = 0.1 and with the discount parameter held in the region 0.4 ≤ τ ≤ 0.7. The MC control algorithm gives promising results when applied to the weapon assignment problem. It gives significantly better results than the Q-learning algorithm, even though it takes about twice as long to run. The modern battlefield is an information-rich environment, where it is critically important to understand the enemy's plans quickly and to be aware of the environment and the context of events, and where the rapid development of concepts of operation and doctrine leads to success. The techniques investigated and tested in this dissertation, combined with other artificial intelligence techniques and modern computational techniques, may hold the key to solving the problems we currently face in warfare.
2010-09-21T09:50:56Z
2010-12-15T10:31:48Z
2010-09-21T09:50:56Z
2010-12-15T10:31:48Z
2010-12
Thesis
http://hdl.handle.net/10019.1/5304
en
University of Stellenbosch
142 p. : ill.
application/pdf
Stellenbosch : University of Stellenbosch
title Reinforcement learning : theory, methods and application to decision support systems
topic Reinforcement learning
Decision support systems
Dissertations -- Applied mathematics
Theses -- Applied mathematics
Monte Carlo methods
Temporal-difference learning
url http://hdl.handle.net/10019.1/5304