|M.Sc Student||Mebel Ofir|
|Subject||Markov Decision Processes with Correlated Uncertainty|
|Department||Department of Electrical and Computer Engineering||Supervisor||PROF. Shie Mannor|
Markov decision processes are a common tool for modeling planning problems with stochastic unknowns. The model is comprised of two spaces: the state space, which an agent is traversing in discrete time-steps, and the action space, from which he chooses a single action to take in every step. The dynamics of the model are governed by an additional two elements: the transition probabilities between states, conditional on the action taken and the rewards the agent receives or loses for each action in each state. The problem of computing the optimal action policy which brings the expected total reward to a maximum is widely studied and is known to be tractable even for large state and action spaces.
However, in almost all realistic situations the system model cannot be perfectly known and must be estimated. Thus, we consider Markov decision processes under parameter uncertainty, which effectively adds a second layer of uncertainty. Previous studies which provide tractable algorithms for finding optimal policies do so only for the restricted case that uncertainties among different states are uncoupled, meaning that uncertainty manifesting in one state can’t affect the manifestation in a second state. Assuming uncoupled uncertainties leads to conservative solutions since the catastrophic scenarios of all uncertainties manifesting at once have to be considered, as well as realistic scenarios.
In this work we introduce an intuitive concept, termed “Lightning Does not Strike Twice,” to model coupled uncertain parameters. Specifically, we require that the system can deviate from its nominal parameters only a bounded number of times. We give probabilistic guarantees indicating that this model represents real-life situations and devise tractable algorithms for computing optimal control policies for multiple cases of finite and infinite horizon problems. Finally we give numerical simulations demonstrating the effectiveness and flexibility of the method proposed.