|Ph.D. Student||Itamar Katz|
|Subject||Representation Learning via Clustering and Hierarchical|
|Department||Department of Electrical and Computer Engineering|
|Supervisors||Associate Prof. Daniel Soudry|
||Associate Prof. Yacov Crammer|
|Full Thesis text|
Data representation is a fundamental concept in machine learning. Input patterns may belong to a space which is not even a vector space, for example a set of people, so some transformation is needed in order to extract features which can be further processed and used. As a result, the performance of learning algorithms generally depends on the specific choice of features. Domain knowledge can be very useful for hand-designing a task-specific representation, and indeed, traditionally much effort was directed at preprocessing data for learning algorithms with a specific task or domain in mind. Using domain knowledge, however, requires human experts and intensive effort, and usually limits the applicability of the obtained representation to a specific data type or task. We are therefore interested in unsupervised learning of features that is not tailored to any domain-specific data. To this end we utilize two concepts which are useful for learning a good representation, as each incorporates general prior knowledge about data into the learned representation. The first concept is clustering, that is, the grouping of inputs into a small number of groups, or clusters. The second is imposing a hierarchical structure. Both concepts allow us to identify and exploit an underlying structure in the data and to extract features which can be beneficial for various applications.
We address three tasks of learning such features. The first is segmentation of sequential data, which can be seen as clustering along the time axis. The resulting segment 'centroids' form the learned representation, which reveals the underlying segments. We formulate this task as a convex optimization problem and propose two algorithms for solving it, one exact and one approximate. In addition, robustness to outliers is incorporated explicitly into the model. The second task is an extension of Principal Component Analysis (PCA), which we call adaptive dimension PCA, in which each input is assigned to one of $d$ subspaces of different dimensionalities, while the average dimension over the set of inputs is held fixed. Here we use the notion of clustering in a slightly different manner: while inputs are still grouped according to their subspace assignment, we do not require inputs in the same group to be similar to each other. This task, too, is formulated as an optimization problem, and we propose an efficient algorithm for learning both the subspaces and the assignments of inputs to subspaces. The third task is motivated by the real-world problem of spotting a spoken word in an audio recording, known as Keyword Spotting (KWS). To extract features we apply soft clustering using a Gaussian Mixture Model (GMM), and use statistics of the probabilistic assignment of inputs to mixture components to build a representation that is useful for the KWS problem.
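To illustrate the third idea, the following is a minimal sketch (not the thesis's actual KWS pipeline) of building a fixed-length representation from GMM posterior statistics: each frame is softly assigned to mixture components, and the per-frame posteriors are averaged over the sequence. The 1-D frames, toy parameters, and function names here are illustrative assumptions chosen for brevity; a real system would use multivariate acoustic features and a GMM trained on speech.

```python
import math

def gaussian_pdf(x, mean, var):
    # 1-D Gaussian density; a diagonal-covariance GMM applies this per dimension.
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, weights, means, variances):
    # Soft assignment of one frame: posterior probability of each component
    # given the frame (Bayes' rule over the mixture).
    lik = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(lik)
    return [l / total for l in lik]

def utterance_representation(frames, weights, means, variances):
    # Average the per-frame posteriors: a fixed-length vector summarising how
    # often each component 'explains' the sequence, regardless of its length.
    acc = [0.0] * len(weights)
    for x in frames:
        for k, p in enumerate(posteriors(x, weights, means, variances)):
            acc[k] += p
    return [a / len(frames) for a in acc]

# Toy example: a two-component GMM and a short sequence of 1-D frames.
weights   = [0.5, 0.5]
means     = [0.0, 5.0]
variances = [1.0, 1.0]
frames    = [-0.2, 0.1, 4.8, 5.1, 5.0]
rep = utterance_representation(frames, weights, means, variances)
print(rep)  # more mass on the second component, which explains most frames
```

Because the posteriors of each frame sum to one, their average also sums to one, so sequences of any length map to comparable fixed-length vectors.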
For all three tasks we conduct an empirical study, using both synthetic and real-world speech and image data to demonstrate the usefulness of our feature extraction methods.