Technion - Israel Institute of Technology
Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Mador-Haim Sela
Subject: Natural Language Interface for Geographical Information Systems
Department: Department of Computer Science
Supervisor: Mr. Yoad Winter
Full Thesis text: English Version


Abstract

Geographical Information Systems (GISs) are information systems for processing data that pertain to spatial or geographic coordinates. Although GISs enjoy a rapidly growing user community, current systems are often difficult to use or require a long learning process. The GIS literature widely acknowledges that natural language interfaces (NLIs) would significantly enhance the exploitation of the more complex features of GISs, yet despite this potential value, work on the subject has so far been rather limited. To the best of our knowledge, existing NLIs for GISs are limited in scope and expressive power.


In general, the design of NLIs to databases is regarded as a difficult problem, since human interaction is often vague, ambiguous or highly contextualized. Our approach in this work is to avoid many of these problems by designing a system that uses a controlled language for GIS queries. Such controlled languages, which are based on fragments of English, can be designed to minimize the use of vague, ambiguous and context-dependent expressions while retaining the ability to express complex queries. Because GISs form a closed, well-defined domain, we can focus on the data-independent parts of the language. We show that the data-dependent portions can be added semi-automatically and with little effort.
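As a rough illustration of how data-dependent lexical entries might be derived from a database, the following Python sketch generates common-noun entries from a list of table names. It is not the tool developed in this work: the `LexEntry` structure, the category label `"N"`, the pluralization rule and the table names are all hypothetical, chosen only to make the idea concrete.

```python
from typing import List, NamedTuple


class LexEntry(NamedTuple):
    word: str      # surface form in the controlled language
    category: str  # syntactic category ("N" = common noun, an assumed label)
    table: str     # database table the word refers to


def pluralize(noun: str) -> str:
    # Very small English pluralization rule; assumed sufficient for the demo.
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def generate_noun_entries(table_names: List[str]) -> List[LexEntry]:
    # Assumes each table is named with a singular English noun.
    entries = []
    for table in table_names:
        entries.append(LexEntry(table, "N", table))
        entries.append(LexEntry(pluralize(table), "N", table))
    return entries


# Hypothetical table names; a real tool would read them from the database.
entries = generate_noun_entries(["river", "city"])
```

In this sketch both the singular and the plural form map to the same table, so a query phrase such as "cities" or "city" would resolve to the `city` table. A human editor would still need to review the generated entries, which is one reading of "semi-automatically".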


Our implementation of an NLI for GISs involves three major tasks:

First, we develop a suitable semantic representation for GIS queries, which we call λSQL, and a method to translate natural-language queries via λSQL into spatially-enabled SQL. Second, we define the data-independent lexicon, which is based on an extension of the simple applicative categorial grammar (Ajdukiewicz-Bar-Hillel calculus) and λSQL expressions. Defining λSQL expressions that represent the semantics of spatial relations (especially prepositions) in accordance with the intuitive understanding of such relations involved tackling certain aspects of spatial prepositions that were never dealt with before. The third task is the development of methods for adding the data-dependent portion of the lexicon with minimal effort, including an automatic tool that generates lexical entries from the actual geographical database in use.
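To make the first two tasks more concrete, the following Python sketch shows the two application rules of a basic applicative categorial grammar (forward: X/Y Y yields X; backward: Y X\Y yields X), with lexical semantics given as Python functions that build a spatial SQL query. It is a simplified illustration, not the representation or grammar extension developed in this work: the lexicon, the table names `road` and `district`, the `geom` column, and the use of the PostGIS-style function `ST_Within` are all assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Fn:
    # Functional category: slash "/" seeks its argument to the right,
    # slash "\\" seeks its argument to the left.
    result: Any
    slash: str
    arg: Any


@dataclass
class Item:
    cat: Any  # "N", "NP", or an Fn
    sem: Any  # semantic value: data or a Python callable


def combine(left: Item, right: Item):
    # Forward application: X/Y  Y  =>  X
    if isinstance(left.cat, Fn) and left.cat.slash == "/" and left.cat.arg == right.cat:
        return Item(left.cat.result, left.sem(right.sem))
    # Backward application: Y  X\Y  =>  X
    if isinstance(right.cat, Fn) and right.cat.slash == "\\" and right.cat.arg == left.cat:
        return Item(right.cat.result, right.sem(left.sem))
    return None


def parse(items):
    # Greedy reduction of adjacent pairs; enough for this tiny example.
    items = list(items)
    while len(items) > 1:
        for i in range(len(items) - 1):
            merged = combine(items[i], items[i + 1])
            if merged is not None:
                items[i:i + 2] = [merged]
                break
        else:
            raise ValueError("no applicable rule")
    return items[0]


def to_sql(noun_sem) -> str:
    table, conditions = noun_sem
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT * FROM {table}{where}"


# Hypothetical lexicon. A noun denotes (table, conditions); a place name
# denotes a geometry subquery; "in" adds a containment condition to a noun.
LEXICON = {
    "roads": Item("N", ("road", [])),
    "Haifa": Item("NP", "(SELECT geom FROM district WHERE name = 'Haifa')"),
    "in": Item(
        Fn(Fn("N", "\\", "N"), "/", "NP"),
        lambda place: lambda noun: (
            noun[0],
            noun[1] + [f"ST_Within({noun[0]}.geom, {place})"],
        ),
    ),
}

query = to_sql(parse(LEXICON[w] for w in "roads in Haifa".split()).sem)
```

Here "in" first combines with "Haifa" by forward application, and the resulting noun modifier then combines with "roads" by backward application. The preposition's meaning is a single containment test in this sketch; capturing the intuitive meaning of spatial prepositions is considerably more subtle than this.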


Furthermore, we implemented the NLI presented in this work as a fully working demo prototype that accepts natural language queries and presents the results graphically using a GIS called QGIS.