Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Anschel Oron
Subject: Variance Reduction and Algorithm Stabilization for Deep Reinforcement Learning
Department: Department of Electrical Engineering
Supervisor: Professor Nahum Shimkin
Full Thesis Text: English Version


Abstract

Deep Reinforcement Learning (DRL) is a new, active area of research within the field of machine learning. The combination of deep learning techniques and reinforcement learning methods has enabled unprecedented performance in both video games and robotic control tasks.

While DRL algorithms achieve impressive results, instability during training procedures tends to affect algorithmic performance adversely.


In this thesis, we study the source of the various adverse effects that have been empirically observed in the Deep Q-Network (DQN) algorithm. We also propose an extension to reduce these adverse effects, resulting in improved algorithmic stability and performance.

Using a simplified model, we provide a theoretical analysis of the approximation-error variance in the Q-value estimates of the DQN algorithm. The analysis shows that the approximation-error variance accumulates, particularly at the beginning of each training trajectory. Additionally, the accumulated error variance results in a positive bias, proportional to the variance, in states where the value estimates for different actions are similar.
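The positive bias described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the thesis: all actions have the same true value (zero), the approximation errors are independent Gaussian noise, and the function name `max_bias` and its parameters are hypothetical. It only shows that taking a maximum over noisy estimates of similar values overestimates the true maximum, and that the overestimate grows with the noise level.

```python
import random

def max_bias(noise_std, num_actions=4, trials=20000, seed=0):
    """Monte Carlo estimate of E[max_a (Q(a) + noise_a)] - max_a Q(a).

    Assumption (for illustration only): every action has true value 0,
    so the true maximum is 0 and any positive result is pure bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Noisy value estimates for each action in a single state.
        noisy_estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        total += max(noisy_estimates)
    return total / trials

bias_small = max_bias(noise_std=0.5)
bias_large = max_bias(noise_std=1.0)
print(bias_small, bias_large)
```

In this toy setting, both estimated biases come out positive, and the larger noise level gives the larger bias.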


To address these adverse effects, we propose a novel variation of the DQN called Averaged-DQN. This variation sets the current Q-value estimate to be the average of previously learned Q-values.

Averaging over the previous Q-values stabilizes training and improves performance by reducing the approximation error variance in the Q-value estimate.
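The variance-reduction effect of averaging can be sketched numerically. The example below is an illustration under a simplifying assumption that is not the thesis's model: each previously learned Q-value is the true value plus independent Gaussian noise. The function name `estimate_variance` and its parameters are hypothetical. Under that assumption, averaging K estimates should reduce the variance by roughly a factor of K.

```python
import random
import statistics

def estimate_variance(num_averaged, noise_std=1.0, trials=20000, seed=0):
    """Empirical variance of an average of `num_averaged` noisy Q-value estimates.

    Assumption (for illustration only): each previous estimate equals the
    true Q-value plus independent Gaussian noise with the given std.
    """
    rng = random.Random(seed)
    true_q = 1.0
    averaged_estimates = []
    for _ in range(trials):
        # Simulate the previously learned Q-values for one state-action pair.
        previous = [true_q + rng.gauss(0.0, noise_std) for _ in range(num_averaged)]
        averaged_estimates.append(sum(previous) / num_averaged)
    return statistics.pvariance(averaged_estimates)

var_single = estimate_variance(num_averaged=1)   # no averaging
var_avg = estimate_variance(num_averaged=10)     # average of 10 estimates
print(var_single, var_avg)
```

In a real training run the errors of successive Q-value estimates are generally not independent, so the reduction observed in practice may differ from this idealized factor.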

We also provide a variance-reduction analysis of the proposed scheme and report experiments on a toy problem and on three Arcade Learning Environment benchmarks, which demonstrate significantly improved stability and state-of-the-art performance.