Ph.D Thesis
Ph.D Student Talyansky Roman Sample Complexity of Training Markov Chains Department of Computer Science Professor Emeritus Alon Itai

Abstract

Markov chains have gained wide popularity in a diverse spectrum of applications, where there is a need to model processes, changing along the time axis. Markovian models appear extensively in thermodynamics and statistical mechanics, testing and telecommunication, finance and economics.

A Markov chain (MC) consists of the set of states, the initial state probabilities, the set of edges and the transition probabilities that define probabilities to traverse edges. MCs are learned from a training data - sequences s of MC states. Training data are produced by a MC, called concept MC M (c) that cannot be observed directly.

Let dist be a distance between MCs and ε, δ, θ between 0 and 0.5. The goal of learning is to use  to construct a learned MC M () that is ε -close to M () with confidence at least (1 - δ):

Pr(dist(M (c),M ())< ε) > 1 - δ  .

To develop sample complexity lower bounds we use methods of Information Theory: distinguishing between two random variables and strongly typical sequences. We also use matrix perturbation analysis.

We develop three bounds on the sample complexity of learning MCs. The first bound is expressed by the terms of the concept MC. This bound is theoretic, because in practice the concept MC is not known and thus the bound cannot be calculated. The second bound is expressed by the terms of the MC, that is learned from a sequence over the states of the concept MC and distributed according to the concept MC, but involves heavy computations. Finally the last bound is practical, since it is both expressed by the terms of the learned MC and may be calculated efficiently.