Thesis (MEng)--Stellenbosch University, 2023.
| Main Author: | Jordaan, Ruan |
|---|---|
| Other Authors: | Schmidt-Dumont, Thorsten |
| Format: | Thesis |
| Language: | en_ZA |
| Published: | Stellenbosch : Stellenbosch University, 2023 |
| Subjects: | Machine learning; Reinforcement learning; Puzzles; Neural networks (Computer science); Computational grids (Computer systems) |

| _version_ | 1867614013082304512 |
|---|---|
| access_status_str | Open Access |
| author | Jordaan, Ruan |
| author2 | Schmidt-Dumont, Thorsten |
| author_browse | Jordaan, Ruan Schmidt-Dumont, Thorsten |
| author_facet | Schmidt-Dumont, Thorsten Jordaan, Ruan |
| author_sort | Jordaan, Ruan |
| collection | Thesis |
| dc_rights_str_mv | Stellenbosch University |
| description | Thesis (MEng)--Stellenbosch University, 2023. |
| format | Thesis |
| id | oai:scholar.sun.ac.za:10019.1/127173 |
| institution | Stellenbosch University (South Africa) |
| language | en_ZA en_ZA |
| last_indexed | 2026-06-10T12:45:16.097Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository |
| publishDate | 2023 |
| publishDateRange | 2023 |
| publishDateSort | 2023 |
| publisher | Stellenbosch : Stellenbosch University |
| publisherStr | Stellenbosch : Stellenbosch University |
| record_format | dspace |
| source_str | SUNScholar — Stellenbosch University Repository |
| spelling | oai:scholar.sun.ac.za:10019.1/127173 Exploration of reinforcement learning algorithms implemented in the Wrapslide environment Jordaan, Ruan Schmidt-Dumont, Thorsten Nel, Gerhardus Stephanus Stellenbosch University. Faculty of Engineering. Dept. of Industrial Engineering. Machine learning Reinforcement learning Puzzles Neural networks (Computer science) Computational grids (Computer systems) Thesis (MEng)--Stellenbosch University, 2023. ENGLISH ABSTRACT: Machine learning has seen unprecedented popularity in recent years with new techniques and algorithms being developed by institutions worldwide. As such, it can often be a notable challenge to implement a machine learning approach to a specific scenario. The aim of this project is to elucidate the process of solving problems using machine learning through the implementation of various reinforcement learning agents to solve the Wrapslide puzzle. In this project, a literature review is conducted pertaining to a variety of facets of machine learning. This review includes both broad topics such as general reinforcement and supervised machine learning algorithms, as well as more nuanced aspects such as activation functions in neural networks and optimisation methods. Following this review, a literature review pertaining to the optimisation of the hyper-parameters of machine learning algorithms is conducted. The scope is then narrowed to the implementation of reinforcement learning algorithms, specifically Q-learning. Various Q-learning implementations, from basic tabular to complex deep Q-learning with convolutional layers, recurrent layers and attention, are evaluated and compared in the WrapSlide environment. The results of these evaluations are then statistically analysed and discussed in accordance with an appropriate statistical testing procedure formulated from the literature. The WrapSlide environment simulates the toroidal puzzle game, WrapSlide, in Python.
This game functions similarly to a Rubik’s cube in the sense that it comprises a grid of coloured squares which must be sorted to reach a solved state. WrapSlide differs from a Rubik’s cube in that it is represented as a two-dimensional grid where the coloured blocks of the grid must be sorted into quadrants. The puzzle grid can vary in both size and number of different colours. In this project, grid sizes of 4 × 4, 6 × 6, and 8 × 8 with two, three and four colours are used to test the different implementations’ ability to deal with problems of varying complexity. The project concluded, from numerical results obtained through the statistical testing of the performance of the implemented reinforcement learning algorithms, that no single model was capable of solving each incarnation of the Wrapslide puzzle optimally. It was concluded that the long short-term memory-based deep Q-learning algorithms generally performed better than the other implemented models, only being outperformed by the convolutional deep Q-learning algorithm for the relatively complex 8 × 8 two-colour puzzle. None of the puzzle-solving agents were capable of solving the 6 × 6 three- or four-colour puzzles or the 8 × 8 three- or four-colour puzzles. AFRIKAANS OPSOMMING: Masjienleer het die afgelope paar jaar ongekende gewildheid gesien met nuwe tegnieke en algoritmes wat wêreldwyd deur instellings ontwikkel is. As sodanig kan dit dikwels ’n noemenswaardige uitdaging wees om ’n masjienleerbenadering vir ’n spesifieke scenario te implementeer. Die doel van hierdie projek is om die proses om probleme op te los met behulp van masjienleer toe te lig deur die implementering van verskeie versterkingsleermiddels om die Wrapslide-raaisel op te los. In hierdie projek word ’n literatuuroorsig gedoen wat betrekking het op ’n verskeidenheid fasette van masjienleer.
Hierdie oorsig sluit beide breë onderwerpe in soos algemene versterking en masjienleeralgoritmes onder toesig, sowel as meer genuanseerde aspekte soos aktiveringsfunksies in neurale netwerke en optimaliseringsmetodes. Na aanleiding van hierdie oorsig is ’n literatuuroorsig met betrekking tot die optimalisering van die hiperparameters van masjienleeralgoritmes uitgevoer. Die omvang word dan vernou tot die implementering van versterkingsleeralgoritmes, spesifiek Q-leer. Verskeie Q-leer-implementerings van basiese tabelleer tot komplekse diep Q-leer met konvolusionele, herhalende lae en aandag word in die WrapSlide-omgewing geëvalueer en vergelyk. Die resultate van hierdie evaluasies word dan statisties ontleed en bespreek in ooreenstemming met ’n toepaslike statistiese toetsprosedure wat uit die literatuur geformuleer is. Die WrapSlide-omgewing simuleer die toroïdale legkaartspeletjie, WrapSlide, in Python. Hierdie speletjie funksioneer soortgelyk aan ’n Rubik se kubus in die sin dat dit ’n rooster van gekleurde blokkies bevat wat gesorteer moet word om ’n opgeloste toestand te bereik. WrapSlide verskil van ’n Rubik se kubus deurdat dit voorgestel word as ’n tweedimensionele rooster waar die gekleurde blokke van die rooster in kwadrante gesorteer moet word. Die legkaartrooster kan in beide grootte en aantal verskillende kleure verskil. In hierdie projek word roostergroottes van 4 × 4, 6 × 6, en 8 × 8 met twee, drie en vier kleure gebruik om die verskillende implementerings se vermoë om probleme van wisselende kompleksiteit te hanteer, te toets. Die projek het uit numeriese resultate verkry deur die statistiese toetsing van die werkverrigting van die geïmplementeerde versterkingsleeralgoritmes tot die gevolgtrekking gekom dat daar nie ’n enkele model was wat in staat was om elke inkarnasie van die Wrapslide-raaisel optimaal op te los nie.
Daar is tot die gevolgtrekking gekom dat die lang-korttermyngeheue-gebaseerde diep Q-leeralgoritmes oor die algemeen beter presteer het as die ander geïmplementeerde model, net beter gevaar het deur die konvolusionele diep Q-learning-algoritme vir die relatief komplekse 8 × 8 tweekleur-legkaart. Nie een van die raaisels oplossende agente was in staat om die 6 × 6 drie- of vierkleurraaisels of die 8 × 8 drie- of vierkleurraaisels op te los nie. Masters 2023-02-13T15:20:55Z 2023-05-18T07:08:05Z 2023-02-13T15:20:55Z 2023-05-18T07:08:05Z 2023-03-01 Thesis http://hdl.handle.net/10019.1/127173 en_ZA en_ZA Stellenbosch University xxvi, 131 pages : illustrations application/pdf Stellenbosch : Stellenbosch University |
| spellingShingle | Machine learning Reinforcement learning Puzzles Neural networks (Computer science) Computational grids (Computer systems) Jordaan, Ruan Exploration of reinforcement learning algorithms implemented in the Wrapslide environment |
| title | Exploration of reinforcement learning algorithms implemented in the Wrapslide environment |
| title_full | Exploration of reinforcement learning algorithms implemented in the Wrapslide environment |
| title_fullStr | Exploration of reinforcement learning algorithms implemented in the Wrapslide environment |
| title_full_unstemmed | Exploration of reinforcement learning algorithms implemented in the Wrapslide environment |
| title_short | Exploration of reinforcement learning algorithms implemented in the Wrapslide environment |
| title_sort | exploration of reinforcement learning algorithms implemented in the wrapslide environment |
| topic | Machine learning Reinforcement learning Puzzles Neural networks (Computer science) Computational grids (Computer systems) |
| url | http://hdl.handle.net/10019.1/127173 |
| work_keys_str_mv | AT jordaanruan explorationofreinforcementlearningalgorithmsimplementedinthewrapslideenvironment |
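
The abstract in this record describes WrapSlide as a toroidal grid puzzle whose coloured blocks must be sorted into quadrants. As a rough illustration only, the sketch below shows what a wrap-around slide and a quadrant-sorted check might look like in plain Python. The names `slide_row` and `is_solved`, the single-row move, and the exact solved condition are assumptions made for this example; the record does not specify the game's actual move set or the interface of the thesis's environment.

```python
from typing import List

Grid = List[List[int]]  # each cell holds an integer colour label (assumed encoding)


def slide_row(grid: Grid, row: int, shift: int) -> Grid:
    """Return a copy of `grid` with one row shifted right by `shift` cells.

    Cells pushed off the right edge re-enter on the left, which is the
    wrap-around (toroidal) behaviour. The single-row move is hypothetical.
    """
    n = len(grid[row])
    k = shift % n
    new_grid = [r[:] for r in grid]  # copy rows so the input is not mutated
    new_grid[row] = grid[row][n - k:] + grid[row][:n - k]
    return new_grid


def is_solved(grid: Grid) -> bool:
    """Return True if each quadrant of a square, even-sized grid is one colour.

    This is an assumed solved condition based on the abstract's description.
    """
    half = len(grid) // 2
    for r0 in (0, half):
        for c0 in (0, half):
            colours = {
                grid[r][c]
                for r in range(r0, r0 + half)
                for c in range(c0, c0 + half)
            }
            if len(colours) != 1:
                return False
    return True
```

A reinforcement learning agent of the kind the abstract mentions would repeatedly choose such moves and could be rewarded when a check like `is_solved` succeeds; how the thesis actually defines states, actions and rewards is not stated in this record.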