A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation

Cooke, L. 2025. A Reinforcement Learning Approach to Quadruped Locomotion: Framework, Training, and Evaluation. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/f97df9c5-e3d5-40ee-8e1c-198da8cd8abe

Bibliographic Details
Main Author: Cooke, Lauren
Other Authors: Fisher, C.
Format: Thesis
Published: Stellenbosch : Stellenbosch University 2025
_version_ 1867614135523475456
access_status_str Open Access
author Cooke, Lauren
author2 Fisher, C.
author_browse Cooke, Lauren
Fisher, C.
author_facet Fisher, C.
Cooke, Lauren
author_sort Cooke, Lauren
collection Thesis
dc_rights_str_mv Stellenbosch University
description Cooke, L. 2025. A Reinforcement Learning Approach to Quadruped Locomotion: Framework, Training, and Evaluation. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/f97df9c5-e3d5-40ee-8e1c-198da8cd8abe
format Thesis
id oai:scholar.sun.ac.za:10019.1/132110
institution Stellenbosch University (South Africa)
last_indexed 2026-06-10T12:47:13.687Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2025
publishDateRange 2025
publishDateSort 2025
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/132110 A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation Cooke, Lauren Fisher, C. Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Reinforcement learning Robots -- Control systems Quadrupedalism Algorithms UCTD Cooke, L. 2025. A Reinforcement Learning Approach to Quadruped Locomotion: Framework, Training, and Evaluation. Unpublished masters thesis. Stellenbosch: Stellenbosch University [online]. Available: https://scholar.sun.ac.za/items/f97df9c5-e3d5-40ee-8e1c-198da8cd8abe Thesis (MEng)--Stellenbosch University, 2025. ENGLISH ABSTRACT: The control of legged robots remains a significant challenge in robotics, particularly for applications requiring adaptation to complex, unknown environments. While deep reinforcement learning has emerged as a promising solution for developing sophisticated locomotion policies, current research often lacks implementation transparency and standardised evaluation methods. This thesis addresses these challenges by developing and documenting a simulation-based deep reinforcement learning framework for quadrupedal locomotion. The framework aims to generate, document, and evaluate several different quadruped gaits. The developed simulation environment uses the PyBullet physics engine and the Unitree A1 quadruped robot model, with training implemented through the PPO algorithm and structured according to the OpenAI Gymnasium standard. In the literature, two main reinforcement learning action spaces exist: task space and joint space. The project framework employs the task space and shows that it increases sample efficiency compared to joint space approaches. Several other framework parameters were necessary for the production of desirable behaviours. Dynamics randomisation was observed to prevent policy exploitation, and uneven terrain was necessary for encouraging foot-lifting behaviours. 
To enable meaningful comparison between policies that demonstrate the same gait, an evaluation method was introduced. This method combines a policy reward analysis, symmetry metrics, correlation coefficients, and limit cycle detection to compare policy performance. This approach enables comparison between similar policies while providing insights into their underlying characteristics. However, visual verification remains essential to confirm the practicality of the policy and evaluation results. The framework successfully generated four quadrupedal gaits: pacing, bounding, half-bounding, and pronking. The half-bound emerged as the most reproducible gait and demonstrated robustness across two terrain types and various disturbing forces. The highest terrain that all tested policies could traverse was 0.015 m, which indicated that these policies cannot be applied to complex surroundings but are more applicable to flat terrain. Through the documentation of framework parameters and the development of an evaluation method, this work establishes a foundation for more reproducible and comparable research in reinforcement learning-based quadrupedal locomotion. The framework’s success suggests promising directions for future research, including physical validation, extension to additional locomotion skills and challenging terrain, as well as deeper investigation into the relationship between action space representation and achievable behaviours. AFRIKAANS SUMMARY: The control of legged robots remains a significant challenge in robotics, especially for applications in complex, unknown environments. While deep reinforcement learning has emerged as a promising solution for developing sophisticated locomotion policies, current research often lacks implementation transparency and standardised evaluation methods. 
This thesis addresses these challenges by developing and documenting a simulation-based deep reinforcement learning framework for quadruped robot locomotion. The framework aims to generate, document, and evaluate different quadruped gaits. The developed simulation environment uses the PyBullet physics engine and the Unitree A1 quadruped robot model, with training implemented through the PPO algorithm and structured according to the OpenAI Gymnasium standard. In the literature, two main reinforcement learning action spaces exist: task space and joint space. The project framework uses the task space and shows that it increases sample efficiency compared with joint space approaches. Several other framework parameters were necessary to produce desired behaviours. Dynamics randomisation was observed to prevent policy exploitation, and uneven terrain was necessary to encourage foot-lifting behaviours. To enable meaningful comparison between policies that demonstrate the same gait, an evaluation method was introduced. This method combines a policy reward analysis, symmetry metrics, correlation coefficients, and limit cycle detection to compare policy performance. This approach enables comparison between similar policies while providing insight into their underlying characteristics. However, visual verification remains essential to confirm the practicality of the policy and the evaluation results. The framework successfully generated four quadruped gaits: pacing, bounding, half-bounding, and pronking. The half-bound emerged as the most reproducible gait and showed robustness across two terrain types and various disturbing forces. 
The highest terrain that all tested policies could traverse was 0.015 m, which indicated that these policies cannot be applied to complex environments. Through the documentation of framework parameters and the development of an evaluation method, this work lays a foundation for more reproducible and comparable research in reinforcement learning-based quadruped locomotion. The success of the framework points to promising directions for future research, including physical validation, extension to additional locomotion skills and challenging terrain, as well as deeper investigation into the relationship between action space representation and achievable behaviour. Masters 2025-05-23T14:17:39Z 2025-05-23T14:17:39Z 2025-03 Thesis https://scholar.sun.ac.za/handle/10019.1/132110 Stellenbosch University xix, 204 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
spellingShingle Reinforcement learning
Robots -- Control systems
Quadrupedalism
Algorithms
UCTD
Cooke, Lauren
A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation
title A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation
title_full A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation
title_fullStr A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation
title_full_unstemmed A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation
title_short A reinforcement learning approach to quadruped locomotion: framework, training, and evaluation
title_sort reinforcement learning approach to quadruped locomotion framework training and evaluation
topic Reinforcement learning
Robots -- Control systems
Quadrupedalism
Algorithms
UCTD
url https://scholar.sun.ac.za/handle/10019.1/132110
work_keys_str_mv AT cookelauren areinforcementlearningapproachtoquadrupedlocomotionframeworktrainingandevaluation
AT cookelauren reinforcementlearningapproachtoquadrupedlocomotionframeworktrainingandevaluation