M.Sc Thesis

M.Sc Student: Mishne Alon
Subject: Programming with Millions of Examples - Scalable Static Specification Mining
Department: Department of Computer Science
Supervisor: Prof. Eran Yahav

We present a novel code search approach that answers queries about API usage with code showing how the API should be used.

To construct a search index, we develop new techniques for statically mining and consolidating temporal API specifications from code snippets. In contrast to existing semantics-based techniques, our approach handles partial programs in the form of code snippets. Handling snippets allows us to consume code from various sources, such as parts of open-source projects, educational resources (e.g., tutorials), and expert code sites. To handle code snippets, our approach (i) extracts a possibly partial temporal specification from each snippet, using a relatively precise static analysis that tracks a generalized notion of typestate, and (ii) consolidates the partial temporal specifications, combining consistent partial information to yield consolidated temporal specifications, each of which captures a fuller usage scenario.
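The consolidation step (ii) can be illustrated with a minimal sketch. It is not the PRIME implementation: it assumes each partial specification is simplified to a linear sequence of API call names (the thesis works with richer temporal specifications), and it treats two partial specifications as consistent when a suffix of one equals a prefix of the other. The names `merge_overlapping` and `consolidate`, and the call names in the usage example, are hypothetical.

```python
from typing import List, Optional, Tuple

# Simplifying assumption: a partial temporal specification is a linear
# sequence of API call names extracted from one snippet.
Spec = Tuple[str, ...]


def merge_overlapping(a: Spec, b: Spec) -> Optional[Spec]:
    """Join a and b when a non-empty suffix of a equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):  # prefer the longest overlap
        if a[-k:] == b[:k]:
            return a + b[k:]
    return None


def consolidate(specs: List[Spec]) -> List[Spec]:
    """Repeatedly merge overlapping partial specs until no merge applies."""
    pool = list(dict.fromkeys(specs))  # drop exact duplicates, keep order
    merged = True
    while merged:
        merged = False
        for i in range(len(pool)):
            for j in range(len(pool)):
                if i == j:
                    continue
                joined = merge_overlapping(pool[i], pool[j])
                if joined is not None:
                    # Replace the two partial specs by their combination.
                    pool = [s for idx, s in enumerate(pool) if idx not in (i, j)]
                    pool.append(joined)
                    merged = True
                    break
            if merged:
                break
    return pool


# Hypothetical partial specs, each mined from a different snippet.
snippets = [
    ("connect", "login"),
    ("login", "retrieveFile", "logout"),
    ("logout", "disconnect"),
]
full = consolidate(snippets)
```

In this sketch the three fragments combine into a single sequence covering the whole scenario from `connect` to `disconnect`, which no individual snippet shows on its own.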

To answer a search query, we define a notion of relaxed inclusion that matches a query against temporal specifications and their corresponding code snippets.
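One simple way to picture relaxed inclusion is sketched below. The abstract does not give the formal definition, so this is an assumption made for illustration: specifications and queries are simplified to sequences of call names, and a query matches a specification when its calls appear in the same order, with other calls allowed in between. The names `relaxed_includes` and `search`, and the index contents, are hypothetical.

```python
from typing import Dict, List, Sequence


def relaxed_includes(spec: Sequence[str], query: Sequence[str]) -> bool:
    """True if the query calls occur in spec in order, gaps allowed."""
    remaining = iter(spec)
    # `call in remaining` advances the iterator past the first match,
    # so later query calls must be found after earlier ones.
    return all(call in remaining for call in query)


def search(index: Dict[str, Sequence[str]], query: Sequence[str]) -> List[str]:
    """Return the ids of indexed specs that the query matches."""
    return [sid for sid, spec in index.items() if relaxed_includes(spec, query)]


# Hypothetical index mapping a spec id to its call sequence.
index = {
    "download": ("connect", "login", "retrieveFile", "logout", "disconnect"),
    "upload": ("connect", "login", "storeFile", "logout", "disconnect"),
}
hits = search(index, ("connect", "retrieveFile", "disconnect"))
```

Here the query names only three calls, yet it still matches the longer `download` specification because the intermediate `login` and `logout` calls are tolerated as gaps.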

We have implemented our approach in a tool called PRIME and applied it to search for API usage of several challenging APIs. PRIME was able to analyze and consolidate thousands of snippets per tested API, and our results indicate that the combination of a relatively precise analysis and consolidation allowed PRIME to answer challenging queries effectively.