Ph.D. Student | Rusakov Dmitry |
---|---|
Subject | Bayesian Networks: Model Selection and Applications |
Department | Department of Computer Science |
Supervisor | Professor Dan Geiger |

A Bayesian network is a representation of a joint probability distribution over a collection of random variables via a directed acyclic graph and a set of associated parameters. This thesis focuses on learning the structure of a Bayesian network from data. We take a Bayesian approach to model selection, which requires evaluating the marginal likelihood of the data given a network structure. Such an evaluation requires either the specification of an appropriate conjugate prior distribution for the network parameters or, alternatively, the development of asymptotic (large-sample) approximation formulas for the marginal likelihood integrals under study.
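For concreteness, the marginal likelihood mentioned above can be written as an integral over the network parameters. The notation here is an assumption made for illustration and is not taken from the thesis: $D$ denotes the data, $S$ a network structure, and $\theta$ the parameters associated with $S$.

$$
P(D \mid S) \;=\; \int P(D \mid \theta, S)\, P(\theta \mid S)\, d\theta
$$

With a conjugate prior $P(\theta \mid S)$ this integral has a closed form; otherwise it is typically approximated for a large sample size.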

The first part of this thesis investigates the applicability of the BIC score for approximating the marginal likelihood of data given a Bayesian network with hidden variables. We develop a closed-form asymptotic formula for the marginal likelihood of data given a naive Bayesian model with two hidden states and binary features. This formula deviates from the standard BIC score. This work therefore shows that the standard BIC score is generally incorrect for model selection among Bayesian networks with hidden variables.
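As a reminder of what is being compared against, the standard BIC approximation can be sketched as follows. The symbols are assumptions chosen for illustration: $N$ is the sample size, $\hat{\theta}$ the maximum likelihood parameters, and $d$ the number of free parameters of the model $S$.

$$
\log P(D \mid S) \;\approx\; \log P(D \mid \hat{\theta}, S) \;-\; \frac{d}{2} \log N
$$

This form relies on regularity conditions on the model. The thesis reports that, for the naive Bayesian model with a hidden variable studied here, the asymptotic formula departs from this expression; the exact form of that formula is given in the thesis itself and is not reproduced here.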

The second part of the thesis investigates the minimal set of general conditions, known as global and local parameter independence, that are required to ensure a Dirichlet prior on the network parameters and, consequently, to allow a closed-form evaluation of the marginal likelihood.
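To illustrate why a Dirichlet prior allows a closed-form evaluation, here is a minimal Python sketch of the well-known Dirichlet-multinomial marginal likelihood for a single discrete variable, and its sum over parent configurations for one node of a network. It is not the thesis's own code: the function names `log_marginal_dirichlet` and `log_marginal_family` are hypothetical, and the sketch assumes complete data (no hidden variables) and the same Dirichlet hyperparameters for every parent configuration.

```python
import math


def log_marginal_dirichlet(counts, alphas):
    """Log marginal likelihood of one observed sequence of a discrete variable.

    counts[k] is how often state k was observed; alphas[k] is the Dirichlet
    hyperparameter for state k (both names are illustrative assumptions).
    """
    n = sum(counts)
    a = sum(alphas)
    result = math.lgamma(a) - math.lgamma(a + n)
    for n_k, a_k in zip(counts, alphas):
        result += math.lgamma(a_k + n_k) - math.lgamma(a_k)
    return result


def log_marginal_family(counts_by_parent_config, alphas):
    """Sum the closed-form term over the parent configurations of one node.

    Assumes independent Dirichlet priors per parent configuration, so the
    node's contribution factorizes into one term per configuration.
    """
    return sum(log_marginal_dirichlet(c, alphas) for c in counts_by_parent_config)


if __name__ == "__main__":
    # Binary variable, uniform Dirichlet(1, 1) prior, one observation of each state.
    print(math.exp(log_marginal_dirichlet([1, 1], [1.0, 1.0])))
```

Under these assumptions, the score of a full network is obtained by adding `log_marginal_family` over all nodes, which is what makes structure comparison with complete data computationally convenient.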

The final part of this thesis presents an application of Bayesian model selection to the problem of locating disease genes in the context of genetic linkage analysis. In this application, recessive and dominant models of disease penetrance are compared.