Technion - Israel Institute of Technology, The Graduate School
Ph.D. Thesis
Ph.D. Student: David Moshe Yaniv
Subject: Similarity in Binary Executables
Department: Computer Science
Supervisor: Professor Eran Yahav


Abstract

We address the problem of binary code search in stripped executables
(executables with no debug information). The main challenge is establishing
binary code similarity even when the code has been compiled using different
compilers, optimization levels, or target architectures. Moreover, the compiled
source code might come from another version of the software package, or from a
different implementation altogether. Overcoming this challenge while avoiding
false positives is invaluable for guiding other, more costly tasks in binary
code analysis, such as reverse engineering and automated vulnerability
detection.

We present an iterative process of analyzing and presenting the different parts
of the binary similarity problem, refining our similarity method at each step.
To this end, we incorporate several representations of binary code, each
created by statically analyzing the binary code and decomposing it into smaller
parts that carry semantic meaning. These representations are combined with
concepts and tools from other fields, including model theory, statistical
frameworks, SMT solvers, and deep neural networks, to create a measure of binary
similarity between procedures.
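As a rough illustration of the general idea of decomposing procedures into smaller parts and scoring similarity statistically, here is a minimal sketch. It is a hypothetical example, not the method developed in this thesis. It assumes each procedure is given as a list of disassembled instruction strings, abstracts away immediate values, splits the procedure into overlapping windows of k instructions (k = 3 is an arbitrary choice), and weights each shared fragment by its rarity across a corpus (an IDF-style weighting). All function names, the normalization rule, and the weighting formula are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def normalize(insn: str) -> str:
    """Replace immediate values with a placeholder (hypothetical normalization)."""
    return re.sub(r"\b(0x[0-9a-fA-F]+|\d+)\b", "IMM", insn.strip().lower())


def fragments(proc: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Decompose a procedure into overlapping windows of k normalized instructions."""
    norm = [normalize(i) for i in proc]
    return {tuple(norm[i:i + k]) for i in range(len(norm) - k + 1)}


def fragment_weights(corpus: list[list[str]], k: int = 3) -> dict:
    """Weight fragments by rarity: fragments common in the corpus count less."""
    df = Counter()
    for proc in corpus:
        df.update(fragments(proc, k))  # each fragment counted once per procedure
    n = len(corpus)
    return {f: math.log((1 + n) / (1 + c)) + 1.0 for f, c in df.items()}


def similarity(query: list[str], target: list[str], weights: dict, k: int = 3) -> float:
    """Weighted fraction of the query's fragments that also appear in the target."""
    q, t = fragments(query, k), fragments(target, k)
    if not q:
        return 0.0
    unseen = max(weights.values(), default=1.0)  # unseen fragments treated as rarest
    total = sum(weights.get(f, unseen) for f in q)
    shared = sum(weights.get(f, unseen) for f in q & t)
    return shared / total


# Usage: two builds that differ only in constants, one partial variant, one unrelated procedure.
a = ["push rbp", "mov rbp, rsp", "mov eax, 0x10", "add eax, 4", "pop rbp", "ret"]
b = ["push rbp", "mov rbp, rsp", "mov eax, 0x20", "add eax, 8", "pop rbp", "ret"]
d = ["push rbp", "mov rbp, rsp", "mov eax, 0x10", "sub eax, 4", "pop rbp", "ret"]
c = ["xor eax, eax", "ret", "nop", "nop"]
w = fragment_weights([a, b, c])
print(similarity(a, b, w), similarity(a, d, w), similarity(a, c, w))
```

Weighting by rarity reflects the intuition that fragments shared by many unrelated procedures (such as common prologues) are weak evidence of similarity; a real system would need far more robust normalization than this regular expression.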

We tested our methods in real-world scenarios by applying them to vulnerability
search and to name prediction for binary procedures. We discovered 373
vulnerabilities affecting publicly available firmware, 147 of them in the latest
available firmware version for the affected device. For procedure name
prediction, we improved on the state of the art by 20%, and by 84% over
state-of-the-art neural models that do not use any static analysis.