End-to-End Autonomous Quadcopter using Reinforcement Learning

This thesis investigates the potential of Reinforcement Learning (RL) for achieving robust and adaptable quadcopter control, focusing on trajectory and attitude stabilization. We compare state-of-the-art RL algorithms, specifically Proximal Policy Optimization (PPO), against traditional Proportional...

Bibliographic Details
Main Author: Chawa, Mohamed Marwan
Format: Thesis
Published: AUC Knowledge Fountain 2024
Subjects:
_version_ 1867613424193634304
access_status_str Open Access
author Chawa, Mohamed Marwan
author_browse Chawa, Mohamed Marwan
author_facet Chawa, Mohamed Marwan
author_sort Chawa, Mohamed Marwan
collection Thesis
description This thesis investigates the potential of Reinforcement Learning (RL) for achieving robust and adaptable quadcopter control, focusing on trajectory and attitude stabilization. We compare state-of-the-art RL algorithms, specifically Proximal Policy Optimization (PPO), against traditional Proportional-Integral-Derivative (PID) controllers across three tasks: hovering, slow trajectory following, and fast trajectory following. To enhance realism, we employ a modified PyFlyt simulation environment with a high-fidelity Crazyflie 2.x model that accounts for motor dynamics, noise, wind disturbances, and aerodynamic drag. The challenge of operating a quadcopter can be divided into two distinct parts: planning a flight path and actually following that path. Our focus is on the latter: we train a reinforcement learning controller within a highly realistic simulated setting. This environment directly links simulated sensor data to motor commands, mimicking the real-world operation of the quadcopter. This end-to-end approach allows us to train the controller on the complete process, from sensing the environment to executing actions. Our results show that end-to-end RL-based controllers consistently outperform both gain-scheduling and cascaded PID controllers in maintaining stable hover, particularly under strong wind conditions, where the RL controller achieved 70% of the average reward criteria while cascaded PID and gain-scheduling PID scored 45% and 24%, respectively. Traditional PID controllers, while widely used, often struggle to maintain stability in the face of external disturbances and changing system dynamics. This is particularly evident in challenging scenarios such as strong winds, where their fixed gains may not be sufficient to counteract the destabilizing forces. RL, on the other hand, learns to adapt its control strategy based on the environment, making it more robust to such disturbances.
In slow trajectory following, RL demonstrates a wider range of motion and smoother angular control, achieving performance comparable to cascaded PID. Specifically, RL exhibited a 15% increase in the range of motion along the x and y axes compared to gain-scheduling PID, and a 10% reduction in angular velocity fluctuations compared to both PID controllers. This improvement can be attributed to RL’s ability to learn complex control policies that optimize for both position and attitude objectives simultaneously. In fast trajectory following, RL completed the trajectory in more than 60% less time than in the slow trajectory following task. This highlights RL’s ability to exploit the full capabilities of the quadcopter’s actuators when the task demands it. These findings highlight the limitations of traditional PID controllers in dynamic environments and demonstrate the superior adaptability and robustness of RL-based controllers. The ability of RL to learn and adapt to changing conditions makes it a promising approach for quadcopter control in real-world applications where the environment is often unpredictable and disturbances are common.
Future research will focus on generalizing these results across different platforms, establishing formal stability guarantees, and exploring real-world applications in complex tasks such as obstacle avoidance and collaborative flight. One of the key challenges in generalizing RL to different quadcopter platforms is the variation in physical parameters and dynamics. Addressing this challenge will require developing RL algorithms that can either adapt to these variations online or learn from a diverse set of training environments. Additionally, while RL has shown promising results in simulation, ensuring stability in real-world deployments remains an open research question, so future work will explore methods for providing formal stability guarantees for RL-based controllers. Finally, extending RL to complex tasks such as obstacle avoidance and collaborative flight will require addressing issues such as sensor limitations, partial observability, and multi-agent coordination. By addressing these challenges, RL has the potential to revolutionize quadcopter control, enabling the deployment of autonomous aerial vehicles in a wide range of applications that were previously considered too complex or dangerous for traditional control methods.
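The description's remark that fixed PID gains can fall short under sustained wind can be illustrated with a toy example. The sketch below is not the thesis's controller or its PyFlyt setup: it assumes a hypothetical 1-D unit point mass, a constant wind acceleration, and made-up gains, and it uses a two-loop cascade (an outer position loop that sets a velocity target, and an inner velocity PID that sets the commanded acceleration). It only shows how the steady-state position error depends on the chosen fixed gains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook PID controller with fixed gains (illustrative only)."""
    kp: float
    ki: float = 0.0
    kd: float = 0.0
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        # Accumulate the integral term and take a finite-difference derivative.
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(inner_ki: float, wind: float = -1.0, x_ref: float = 1.0,
             dt: float = 0.01, steps: int = 2000) -> float:
    """Return the final position error of a 1-D unit mass under constant wind.

    All numbers here (gains, wind, time step) are hypothetical toy values.
    """
    outer_kp = 1.0                       # outer loop: position error -> velocity target
    inner = PID(kp=4.0, ki=inner_ki)     # inner loop: velocity error -> acceleration
    x, v = 0.0, 0.0
    for _ in range(steps):
        v_ref = outer_kp * (x_ref - x)
        u = inner.update(v_ref - v, dt)
        v += (u + wind) * dt             # wind acts as a constant disturbance
        x += v * dt
    return x_ref - x


if __name__ == "__main__":
    print("final error, inner loop without integral term:", simulate(inner_ki=0.0))
    print("final error, inner loop with integral term:   ", simulate(inner_ki=2.0))
```

In this toy setting the proportional-only inner loop settles with a residual offset of about -wind / (outer_kp * kp), while the integral term drives the offset toward zero. Either way the gains are chosen once and stay fixed, which is the limitation the description contrasts with a learned policy.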
format Thesis
id oai:fount.aucegypt.edu:etds-3431
institution American University in Cairo (Egypt)
last_indexed 2026-06-10T12:35:55.364Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate 2024
publishDateRange 2024
publishDateSort 2024
publisher AUC Knowledge Fountain
publisherStr AUC Knowledge Fountain
record_format dspace
source_str AUC Knowledge Fountain — bepress
spelling oai:fount.aucegypt.edu:etds-3431 End-to-End Autonomous Quadcopter using Reinforcement Learning Chawa, Mohamed Marwan 2024-12-25T08:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/2421 https://fount.aucegypt.edu/context/etds/article/3431/viewcontent/main.pdf Theses and Dissertations AUC Knowledge Fountain Quadcopter Quadrotor UAV Reinforcement Learning Simulation-To-Real Acoustics, Dynamics, and Controls Robotics
spellingShingle Quadcopter
Quadrotor
UAV
Reinforcement Learning
Simulation-To-Real
Acoustics, Dynamics, and Controls
Robotics
Chawa, Mohamed Marwan
End-to-End Autonomous Quadcopter using Reinforcement Learning
title End-to-End Autonomous Quadcopter using Reinforcement Learning
title_full End-to-End Autonomous Quadcopter using Reinforcement Learning
title_fullStr End-to-End Autonomous Quadcopter using Reinforcement Learning
title_full_unstemmed End-to-End Autonomous Quadcopter using Reinforcement Learning
title_short End-to-End Autonomous Quadcopter using Reinforcement Learning
title_sort end to end autonomous quadcopter using reinforcement learning
topic Quadcopter
Quadrotor
UAV
Reinforcement Learning
Simulation-To-Real
Acoustics, Dynamics, and Controls
Robotics
url https://fount.aucegypt.edu/etds/2421
https://fount.aucegypt.edu/context/etds/article/3431/viewcontent/main.pdf
work_keys_str_mv AT chawamohamedmarwan endtoendautonomousquadcopterusingreinforcementlearning