|M.Sc Student||Heimann Noam|
|Subject||Partially Observable Markov Decision Process under Perfect|
Sensing Information POMDPUPI
|Department||Department of Industrial Engineering and Management||Supervisors||Dr. Tamir Hazan|
|Dr. Erez Karpas|
|Full Thesis text|
A substantial direction of research in the field of planning deals with computational feasibility and optimization, both theoretical and practical.
To cope with such convoluted tasks one needs sufficient understanding of graph-theory, probability, optimization, and other related topics.
In the stochastic planning field of research, particularly in Markov based models, it is widely common to use efficient techniques while making case-based relaxations, in order to reach more tractable, faster results.
When one comes to solve POMDPs, as their set of states grows larger they usually become intractable, to the point of infeasible to solve in practice.
The presence of noise in the agent's sensing mechanism often imposes extensive complexity, burdening the planning process.
Recent advancements of machine learning models for classification tasks such as deep convolutional neural networks show prominent success and very high accuracy behavior. Incorporating such prominent models as the agent's sensing mechanism may allow some modeling relaxations to be made in order to relieve complexity.
According to our hypothesis, related POMDP cases, who share similar characteristics, can be efficiently solved as H-MDPs with options - completely eliminating all partial observability from the main planner.
This family of POMDPs is a subset of general POMDPs and has the following characteristics: single agent, factored state-space, all partially observable state variables can be decoupled, sensor behavior can be approximated by a known PDF, and the sensor has a highly accurate confidence level behavior.
In this work we will relate to the incorporation of highly accurate sensors in such POMDPs. The Planning task will be modeled as a Partially Observable Markov Decision Process (POMDP), and the sensor's PDF will stem from a classification vision task, modeled as a Deep Neural Network (DNN).
The latter shall be pre-trained on data points and estimated as a theoretical PDF, with which the former shall use when planning to yield a sufficiently semi-optimal policy, in a timely manner.
Relying on DNNs’ prominent success nowadays, we wish to utilize their abilities to alleviate the issue of noisy sensor outputs in the POMDP planning task by treating the output of the sensor as (almost) perfectly reliable information.
Whilst being a reasonable assumption, the planning problem's complexity will be significantly reduced, resulting in a comparable decision making scheme alongside noticeably less computing effort, making complexly modeled tasks computationally feasible with comparable policy rewards.