Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Malca Rivka
Subject: Neural Transition Based Parsing of Web Queries: An Entity Based Approach
Department: Department of Computer Science
Supervisor: Professor Roi Reichart


Abstract

Web queries with question intent exhibit a complex syntactic structure consisting of one or more independent sub-structures. Processing this structure is important for interpreting the queries. [PRS16] formalized the grammar of these queries and proposed semi-supervised algorithms that adapt parsers originally designed for the standard dependency grammar, so that they can account for the unique forest grammar of queries. However, their algorithms rely on resources that are typically unavailable outside large web corporations.

We propose a new bidirectional LSTM (BiLSTM) query parser that: (1) explicitly accounts for the unique grammar of web queries using a new transition system that consists of a new set of transitions and configurations; and (2) utilizes named entity (NE) information from a BiLSTM NE tagger that can be jointly trained with the parser.

The proposed parser is a transition-based parser with a dedicated transition system that includes a new transition, denoted PushToSeg. This transition addresses the need to segment the query as part of the parsing process. To train our model, we annotate the query treebank of [PRS16] with NEs. When trained on 2500 annotated queries, our segmentation- and NE-aware parser achieves a UAS of 83.5% and a segmentation F1-score of 84.5, substantially outperforming existing state-of-the-art parsers.
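To make the idea of a segmenting transition concrete, the following is a minimal sketch of how a transition system with a PushToSeg-like action might operate on a query. The abstract does not specify the exact configurations or the semantics of PushToSeg, so everything here is an assumption for illustration: an arc-standard style stack and buffer, the names `Config`, `shift`, `left_arc`, `right_arc`, `push_to_seg`, and `run`, the rule that PushToSeg closes the current segment once a single root remains on the stack, and the example query with its gold action sequence. No neural scoring is included; the actions are supplied by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Hypothetical parser configuration (not the thesis's definition)."""
    buffer: list                                   # token indices still to be read
    stack: list = field(default_factory=list)      # partially processed tokens
    arcs: list = field(default_factory=list)       # (head, dependent) index pairs
    current: list = field(default_factory=list)    # tokens in the open segment
    segments: list = field(default_factory=list)   # closed segments


def shift(c):
    # Move the next buffer token onto the stack and into the open segment.
    tok = c.buffer.pop(0)
    c.stack.append(tok)
    c.current.append(tok)


def left_arc(c):
    # Stack top becomes the head of the item below it.
    dep = c.stack.pop(-2)
    c.arcs.append((c.stack[-1], dep))


def right_arc(c):
    # Item below the stack top becomes the head of the top.
    dep = c.stack.pop()
    c.arcs.append((c.stack[-1], dep))


def push_to_seg(c):
    # Assumed semantics: close the open segment once one root is left.
    assert len(c.stack) == 1, "segment must have a single root"
    c.stack.pop()
    c.segments.append(c.current)
    c.current = []


TRANSITIONS = {
    "SHIFT": shift,
    "LEFT_ARC": left_arc,
    "RIGHT_ARC": right_arc,
    "PUSH_TO_SEG": push_to_seg,
}


def run(tokens, actions):
    # Apply a fixed action sequence; a real parser would predict each action.
    c = Config(buffer=list(range(len(tokens))))
    for name in actions:
        TRANSITIONS[name](c)
    return c


# Illustrative query with two independent segments (made-up example).
tokens = ["cheap", "flights", "london", "weather"]
actions = ["SHIFT", "SHIFT", "LEFT_ARC", "PUSH_TO_SEG",
           "SHIFT", "SHIFT", "LEFT_ARC", "PUSH_TO_SEG"]
result = run(tokens, actions)
print(result.arcs, result.segments)
```

Under these assumptions the output is a forest rather than a single tree: each closed segment carries its own root, and no arc crosses a segment boundary, which is the property the forest grammar of queries requires.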