M.Sc Student | Amar Moshe |
---|---|

Subject | Layman's Learning to be an Expert: An Efficient Online Multiclass Classification for Hierarchical-Structured Classes |

Department | Department of Electrical Engineering |

Supervisor | Professor Yacov Crammer |

Full Thesis text |

We study the problem of classifying data into classes embedded in the leaves of a tree. In our setting, we assume that the classes have a certain hierarchical structure that can be represented by a tree: the classes can be embedded in the tree’s leaves and every other node in the tree represent a common description of the leaves beneath it. Therefore, the learning algorithm may return not only a leaf (which represents a specific class), but also any other node (which represents a group of classes with some shared properties), and is required to balance between being correct (by returning a node close to the root) and being exact (by returning a node close to a leaf). We introduce a new class of loss functions that encourages the learner to balance between these two requirements and a greedy prediction algorithm which is optimal under some conditions.

This setting is very useful in many domains which are extremely sensitive to operating under correct predictions. For example, in some mission critical application, it is preferable to predict ”don’t-know” rather than an erroneous prediction. In this setting, we allow several levels of rough yet correct prediction, in the case where a certain prediction accuracy cannot be achieved. Thus, we allow the application receiving the prediction to fine-tune its response. Furthermore, we show that tuning the loss function parameters actually determines how exact the classification algorithm’s predictions would be.

In this study focus on the online setting. Our main result is a learning algorithm that produces a prediction by making decisions in the tree (starting from the root). The tree descent represents a more specific prediction. On every iteration, the proposed algorithm evaluates its confidence in every decision node within the tree, and according to that confidence and the expected loss it determines where to stop the tree descent. We show that the algorithm suffers logarithmic regret and thus, eventually predicts nodes close to the leaves when possible. Moreover, the algorithm’s models improve over time, hence the prediction confidence at every node improves and the prediction become more specific.

We present both the regret analysis of the proposed algorithm with detailed proofs and experimental results which demonstrate its performance over the ImageNet dataset. These results demonstrate the desired behavior: at the beginning (when the algorithm’s models are inaccurate), the algorithm generates inaccurate predictions (nodes closer to the root) and as it receives more examples (and its models and prediction confidence improve) it outputs leaves more often.