UAV Collision Avoidance: A Robust Vision Based DQN Exploration Algorithm for Sparse Rewards
Integrating reinforcement learning with unmanned aerial vehicles (UAVs) to achieve autonomous flight has been an active research area in recent years. An important part of this research focuses on collision detection and avoidance as a UAV navigates through an environment. In this work, we build on previous work in this area by introducing a new variation of the Deep Q-Network (DQN) algorithm for UAVs. Exploration in other DQN variations for collision avoidance, such as D3QN, is typically done through uniform sampling of actions. However, in an environment with sparse rewards, many of these actions lead to redundant states. We focus on the problem of learning the dynamics of an unseen environment with sparse rewards more efficiently. To this end, we present an improved exploration algorithm for UAV collision avoidance. The approach is a guidance-based method that uses a Bayesian Gaussian mixture model to compare previously seen states to a predicted next state in order to select the next action. We evaluate the proposed algorithm and D3QN in multiple simulation environments using Microsoft AirSim. After the first 1000 training episodes, the proposed algorithm achieves a twofold improvement in average reward compared to D3QN.
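The guided action selection described above can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that the comparison favors the action whose predicted next state is least likely under a density model of previously seen states. A Gaussian kernel density estimate (an equal-weight mixture with one isotropic component per visited state) stands in for the fitted Bayesian Gaussian mixture model. The names `log_density`, `select_exploratory_action`, `toy_dynamics`, and the `bandwidth` parameter are hypothetical, as is the 2-D toy state space.

```python
import math


def log_density(state, seen_states, bandwidth=0.5):
    """Log-density of `state` under an equal-weight isotropic Gaussian
    mixture centred on each visited state (a stand-in for a fitted
    Bayesian Gaussian mixture model). Assumes `seen_states` is non-empty."""
    dim = len(state)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for centre in seen_states:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, centre))
        log_terms.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    total = sum(math.exp(t - peak) for t in log_terms)
    return peak + math.log(total) - math.log(len(log_terms))


def select_exploratory_action(state, actions, predict_next_state, seen_states):
    """Return the action whose predicted next state looks least familiar.

    Assumption: "least familiar" means lowest log-density under the model
    of seen states; the abstract does not specify the exact criterion."""
    scores = {
        action: log_density(predict_next_state(state, action), seen_states)
        for action in actions
    }
    return min(scores, key=scores.get)


# Hypothetical 2-D toy dynamics: each action shifts the position by one unit.
ACTIONS = {"forward": (1.0, 0.0), "left": (0.0, 1.0), "right": (0.0, -1.0)}


def toy_dynamics(state, action):
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


# The agent has already visited the origin and the cells ahead and to the left.
seen = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
choice = select_exploratory_action((0.0, 0.0), list(ACTIONS), toy_dynamics, seen)
print(choice)
```

In this toy setting the states reached by moving forward or left have already been visited, so the sketch selects the remaining action, whereas uniform sampling would pick a redundant move two times out of three.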