M.Sc Student | Broom Ya'akov |
---|---|

Subject | Bayesian Classification and Clustering via DAG Hierarchies |

Department | Department of Computer Science |

Supervisor | Professor Dan Geiger |

Hanson, Cheeseman, and
Stutz (1991,1995) developed **autoclass**, a Naïve Bayes
classifier where the conditional distribution of an attribute given different
classes can share the same parameters. The shared parameters are estimated by merging
the relevant sufficient statistics. This model can be graphically represented as
a directed tree where the leaves are classes which inherit common parameters
from classes which are higher in the hierarchy. Alternatively, this model could
be simply represented as a set of partitions. In this representation there is
one partition of classes for each attribute. Each set in each partition
comprises exactly the classes which are constrained to have the same parameters
with respect to the corresponding attribute.

We follow this path and define a DAG hierarchy model which is a natural generalization of the tree hierarchy model. The DAG hierarchy model allows richer sets of equality constraints on the conditional parameters to be specified. We show that the two different representation of the DAG hierarchy model are equivalent, namely we show that any equality constraints that are represented as a DAG hierarchy could be represented as a set of partitions and vice versa. We define a canonical DAG hierarchy which is a unique representation of the equality constraints as a DAG hierarchy. We show that this representation is always possible and indeed it is unique. This canonical representation captures well the different levels of the hierarchical structure of the underlying classes.

We present heuristic agglomerative algorithms which search for the DAG hierarchy model with the highest marginal likelihood. The resulting classifier, when used on relatively small data sets, shows an increase in classification accuracy, demonstrating that it is a useful tool for preventing overfitting. This is demonstrated both on synthetic and real datasets. Finally we examine two well known choices of an uninformative prior for the unknown parameters. We show that one behaves poorly for the purpose of discovering the correct DAG hierarchy model from data. This is demonstrated by our experiments and supported by a theoretical derivation.