M.Sc Thesis

M.Sc Student: Almog Barak
Subject: A Least Squares Temporal Difference Cross Entropy Approach to Intelligent Air Combat
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Shie Mannor


In recent decades we have witnessed tremendous growth in the use of autonomous intelligent systems. Within this fascinating field, great attention is drawn to modern warfare technology, and to Unmanned Aerial Vehicles (UAVs) in particular. UAVs have the potential to take the place of manned aircraft and to outperform them in many dangerous missions. So far, however, the complexity of some tasks, such as air combat, has prevented assigning them to UAVs to carry out autonomously. We formulate a representation of the one-on-one air combat maneuvering problem and suggest a Least Squares Temporal Difference (LSTD) approach for computing an efficient approximation of the state value function, from which an optimal policy is derived. This value function approach provides a fast response to a rapidly changing tactical situation and good performance, without explicitly coding rule-based air combat tactics.
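The LSTD step described above can be illustrated with a minimal LSTD(0) sketch. It accumulates the matrix A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and the vector b = Σ r·φ(s) from sampled transitions and solves A w = b for linear value weights w. This is a generic textbook form, not the thesis implementation: the one-hot features, the three-state toy chain, the discount factor, and the small ridge term `reg` are all illustrative assumptions (the thesis uses its own air combat features).

```python
import numpy as np


def lstd(transitions, phi, num_features, gamma=0.95, reg=1e-6):
    """LSTD(0): solve A w = b so that V(s) is approximated by phi(s) @ w.

    transitions: iterable of (s, r, s_next, done) tuples.
    phi: feature map, state -> 1-D array of length num_features.
    reg: small ridge term (an assumption) that keeps A invertible.
    """
    A = reg * np.eye(num_features)
    b = np.zeros(num_features)
    for s, r, s_next, done in transitions:
        f = phi(s)
        # A terminal next state contributes no future value.
        f_next = np.zeros(num_features) if done else phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)


# Hypothetical toy problem: chain 0 -> 1 -> 2 -> terminal, reward 1 at the end.
def one_hot(s, n=3):
    v = np.zeros(n)
    v[s] = 1.0
    return v


samples = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, None, True)]
w = lstd(samples, one_hot, num_features=3, gamma=0.9)
print(np.round(w, 3))  # approximately [0.81, 0.9, 1.0]
```

With one-hot features LSTD reduces to a tabular estimate, so the weights match the exact discounted values 0.9², 0.9 and 1; with richer features the same solve yields the least squares fixed point in the span of those features.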

In the version of the problem considered in this work, the aircraft learning the optimal policy is given a slight performance advantage. The method's success is due to extensive reward shaping, sampling of expert trajectories, and the development of informative features. Special attention was given to developing the state value function model using an intelligent expansion scheme. An accompanying fast and effective rollout-based policy extraction method is used to achieve a robust on-line implementation.
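The rollout-based policy extraction mentioned above can be pictured with the following generic sketch, which is not necessarily the scheme used in the thesis. For each candidate first action it simulates a few steps with a model, follows a base policy afterwards, sums the discounted rewards, and bootstraps with the approximate value function if the rollout has not terminated. The names `step`, `value` and `base_policy`, the rollout depth, and the one-dimensional toy world are hypothetical choices made for illustration.

```python
def rollout_policy(state, actions, step, value, base_policy, depth=3, gamma=0.95):
    """Return the candidate action with the best rollout estimate.

    step(s, a) -> (s_next, reward, done) is a hypothetical simulation model,
    value(s) is the approximate state value function, and base_policy(s)
    picks the actions taken after the first one.
    """
    best_action, best_estimate = None, float("-inf")
    for first_action in actions:
        s, total, discount = state, 0.0, 1.0
        action = first_action
        for _ in range(depth):
            s, reward, done = step(s, action)
            total += discount * reward
            discount *= gamma
            if done:
                break
            action = base_policy(s)
        else:
            # Rollout did not terminate: bootstrap with the value estimate.
            total += discount * value(s)
        if total > best_estimate:
            best_action, best_estimate = first_action, total
    return best_action


# Hypothetical toy world: integer position on a line, goal at position 3.
GOAL = 3


def step(s, a):
    s_next = s + a
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL


def value(s):
    return -0.1 * abs(GOAL - s)


print(rollout_policy(0, [-1, 1], step, value, base_policy=lambda s: 1))  # 1
```

Bootstrapping with the learned value function lets the rollouts stay short, which is what makes this kind of on-line action selection fast enough for a rapidly changing situation.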