|M.Sc Student||Givoli Ofer|
|Subject||Zero-Shot Semantic Parsing for Instructions|
|Department||Department of Computer Science||Supervisor||ASSOCIATE PROF. Roi Reichart|
|Full Thesis text|
In this work we introduce a novel task: training a parser that can parse instructions into compositional logical forms, in domains that were not seen during training. Formally, our task assumes a set of source domains, each corresponding to a toy application (e.g. a calendar or a file manager). Our goal is to train a semantic parser, using examples from the source domains, such that it could parse instructions from a different domain using nothing but the definition of the domain. We are motivated by the long-term goal of allowing developers to enable a natural language user interface for their software with minimal overhead.
In our task, the annotation of the training examples does not include logical forms for the instructions. The task requires learning from weak supervision: the desired state of the application. We created a new dataset of instructions with 1,390 examples from 7 domains. Each example in the dataset is a triplet consisting of (a) the application's initial state, (b) an instruction, to be carried out in the context of that state and (c) the state of the application after carrying out the instruction.
Our model is an extension of the parser introduced by Pasupat and Liang (2015) which was originally designed for question answering. Additional features are extracted based on description phrases that are provided for the interface methods. Also, the provided application logic is used during inference in order to dismiss some of the incorrect candidate logical forms.
We also experimented with two new training methods for this task, tailored to our zero-shot setup. The first training method is a new version of the AdaGrad algorithm which we call Conditional Weight Updates (CWU). In this training method the unconditional weight update step of AdaGrad is replaced with a partial update that depends on encountering examples from multiple domains for which the update is useful. This method did not outperform AdaGrad, which was used for the original parser.
The second training method is based on separating the training process into two steps. In the first step, weights are learned by AdaGrad using examples from a subset of the source domains. In the second step AdaGrad is used again, starting from the weights learned in the first step; and training only on examples from the source domains that were not used for the first step. This training algorithm outperformed AdaGrad.
We hope this work will inspire readers to use our framework to collect a larger dataset and experiment with more models. Our framework is designed to allow integration with existing Java applications with minimal effort, making it easy to define new domains and collect annotated data.