Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Funk Tal
Subject: Knowledge Discovery within Spatial Databases Searching for Action Patterns in Time and Space Using Data Mining Techniques
Department: Department of Civil and Environmental Engineering
Supervisors: Professor Gilad Even-Tzur, Professor Maxim Shoshany
Full thesis text: in Hebrew


Abstract

The substantial growth in the amount and availability of spatial information has created a need for tools that help transform this information into usable knowledge. This procedure, known as knowledge discovery, is based mainly on data mining techniques, which allow the derivation of characteristic patterns, relationships and trends 'hidden' within databases.

Numerous algorithms and methods for this task are available in the literature. However, their efficient use requires strategies and structured methodologies, which to a large extent do not exist. The primary objective of this study is to develop a general methodology that supports knowledge discovery in different geographical applications. The second objective is to demonstrate the contribution of such a methodology to generating hypotheses regarding spatio-temporal patterns.

The fundamental principle of the proposed methodology is a comparison between results obtained for different datasets representing the same area of interest. The datasets consist of a database of real events alongside synthetic databases that simulate dynamics following a typical hypothetical pattern. Certain geographical properties were attached to each item in both kinds of dataset, so that spatial phenomena could be included in the data mining process. This latter element of database construction was necessary because of the lack of spatial data mining techniques.
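As an illustration of attaching geographical properties to each item, the following minimal sketch adds a categorical distance attribute to every event record so that a non-spatial data mining tool can use it. The field names (`x`, `y`, `road_dist`), the choice of roads as the reference features, and the class thresholds are all assumptions made for this example; the abstract does not state which properties were used.

```python
import math


def nearest_distance(point, features):
    """Euclidean distance from a point to the closest reference feature."""
    return min(math.dist(point, f) for f in features)


def distance_class(d, near=100.0, far=500.0):
    """Discretize a distance into a category (thresholds are hypothetical)."""
    if d <= near:
        return "near"
    if d <= far:
        return "medium"
    return "far"


def attach_spatial_attributes(events, roads):
    """Return copies of the event records with a 'road_dist' attribute added.

    'events' is a list of dicts holding 'x' and 'y' coordinates;
    'roads' is a list of (x, y) points standing in for road geometry.
    """
    enriched = []
    for ev in events:
        d = nearest_distance((ev["x"], ev["y"]), roads)
        enriched.append({**ev, "road_dist": distance_class(d)})
    return enriched
```

Applying the same function to the real and the synthetic records would keep the added attribute comparable across the datasets.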

The methodology is built upon three consecutive activities, abbreviated CFA: Compare, Focus, and Analyze. "Compare" concerns a visual comparison between the relationships found in the different databases. Potentially important information exists where the relationships for the real dataset differ significantly from those extracted from the other databases. "Focus" gathers information on those potentially significant relationships and displays it on a map or ortho-rectified image, including the full historical records of the sites that exhibit them. "Analyze" then further assesses the relationships for the selected areas or datasets by additional clustering and/or a search for association rules within them. The CFA process may be repeated iteratively until significant relationships have been identified. These relationships form the set of hypotheses for further phenomenological research by experts in the relevant field of knowledge.
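The "Compare" activity can be sketched in code as follows. The abstract describes this comparison as visual, so the numeric version here is only a stand-in: it measures how often each pair of attribute values co-occurs in the real and in the synthetic records, and flags the pairs whose frequency differs by at least a threshold. The function names, the record layout, and the `min_diff` threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def pair_support(records):
    """Fraction of records that contain each pair of (attribute, value) items."""
    if not records:
        return {}
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())  # fixed order so each pair has one key
        counts.update(combinations(items, 2))
    n = len(records)
    return {pair: c / n for pair, c in counts.items()}


def compare(real, synthetic, min_diff=0.2):
    """Flag item pairs whose support differs between the two datasets.

    Returns (pair, real_support - synthetic_support) tuples, ordered by
    the size of the difference, largest first.
    """
    real_s = pair_support(real)
    synth_s = pair_support(synthetic)
    flagged = []
    for pair in set(real_s) | set(synth_s):
        diff = real_s.get(pair, 0.0) - synth_s.get(pair, 0.0)
        if abs(diff) >= min_diff:
            flagged.append((pair, diff))
    return sorted(flagged, key=lambda t: -abs(t[1]))
```

The flagged pairs would then be the candidates passed to "Focus" for display on a map and to "Analyze" for clustering or association rule mining.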

Two programs were chosen to implement the methodology: the Weka Toolkit and ArcGIS. Combining classic data mining techniques, as implemented in the Weka Toolkit, with the general and visual capabilities of GIS allows both simple and complex rules to be extracted through the CFA technique.

The current study is preliminary research; its main contribution lies in analyzing the existing situation as a basis for making future predictions.