טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentSlavkin Michael
SubjectDetecting Data Structures from Traces
DepartmentDepartment of Computer Science
Supervisor Professor Emeritus Alon Itai


Abstract

This work deals with detecting the data structure that created a series of memory accesses. We restrict our attention to detecting data structures, when the program uses a single static data structure, i.e. it only supports searches (no insertions/deletions) by looking at traces of memory accesses (address sequences) resulting from series of such searches. We constructed an algorithm to detect a specific data structure from among an arbitrary set of data structures.

Traces of memory accesses correspond to both the underlying data structure organization and the traversal methods. In this study we do not explicitly recover the organization but use a machine learning approach to classify data structures (by analyzing statistical information regarding compositions of these records and relations).

We first provide a transformation from a trace of memory accesses to a sequence of features. A set of small building blocks was defined by the composition of a single record and the relations in which this record participates. To detect the topology of records and relations we feed the information on occurrence of these building blocks into the statistical classification tools. To detect not only the topology but also the traversal methods we take into account the statistics of occurrences of short sequences of building blocks in the trace.

We have experimented with a number of popular classification algorithms (C4.5, SVM, Naїve Bayes) and compared their accuracy. The decision tree that results from the C4.5 classifier provides us also with an easy to understand algorithm for classifying data structures.