טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentShnitzer Oren
SubjectSelf-Consistent Batch Classification
DepartmentDepartment of Computer Science
Supervisor Professor Shaul Markovitch


Abstract

Most existing learning algorithms generate classifiers that take as an input an untagged instance and return its classification. hen given a set of instances to classify, the classifier treats each member of the set independently. We call this approach independent classification. In this work, we consider an alternative setup, where the classifier receives as input a set of instances and returns a set of assignments.  We call this setup batch classification. Note that batch classification is different than batch learning. In batch learning, the induction algorithm receives the training examples as a set.  In batch classification, the induced classifier receives the testing instances as a set. We claim that considering the set of test instances as a whole allows the classifier to exploit dependencies between the instances in order to improve its performance.


The goal of this work is to study the batch classification  framework, and develop a learning algorithm that takes advantage of this setup. We present several KNN-based solutions that combine the nearest-neighbor rule with some additions that allow it to use the information about the test set.  This is done by imposing an additional requirement -- that the resulting classification will yield a labeled set that is highly self-consistent. We present two approaches for producing self-consistent assignments. The first approach represents the setup as a constraint satisfaction problem (CSP) and uses CSP solving algorithms to find such assignments. The second approach builds a KNN-neighborhood graph and uses various methods to propagate the labels from the tagged examples to the unlabeled instances. Extensive empirical evaluation shows that these algorithms indeed outperform traditional independent classifiers.