Ph.D Student | David Moshe Yaniv |
---|---|
Subject | Similarity in Binary Executables |
Department | Department of Computer Science |
Supervisor | Professor Eran Yahav |
We address the problem of binary code search in stripped executables (with no
debug information). The main challenge is establishing binary code similarity
even when the binary code has been compiled using different compilers,
optimization levels, or target architectures. Moreover, the source code being
compiled might be from another version of the software package or another
implementation altogether. Overcoming this challenge, while avoiding
false positives, is invaluable for guiding other, more costly tasks in the
field of binary code analysis, such as reverse engineering or automated
vulnerability detection.
We present an iterative process of analyzing and presenting the
different parts of the binary similarity problem. At each step we further
refine our similarity method. To this end, we incorporate several
representations for binary code, each created by statically analyzing the
binary code to decompose it into smaller parts carrying semantic meaning. These
representations are matched with different concepts and tools from other
fields to create a measure for binary similarity between procedures. These
fields include model theory, statistical frameworks, SMT solvers, and
deep neural networks.
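
The general idea of decomposing procedures into smaller parts and scoring similarity by the parts they share can be illustrated with a toy sketch. It is not the method developed in the thesis, which relies on static analysis to obtain parts that carry semantic meaning. The sketch is purely syntactic, and it assumes a small x86-like instruction syntax, a hand-written register list, a sliding window of `k` instructions as the "part", and Jaccard overlap as the score. The names `normalize`, `fragments`, and `similarity` are all hypothetical.

```python
import re

# Assumption: a small, hand-picked set of x86 register names for illustration.
REGISTER = re.compile(r"\b(?:[re]?(?:ax|bx|cx|dx|si|di|sp|bp)|r(?:8|9|1[0-5])d?)\b")
IMMEDIATE = re.compile(r"\b0x[0-9a-fA-F]+\b|\b\d+\b")


def normalize(instruction: str) -> str:
    """Abstract away concrete registers and constants, which vary across compilations."""
    instruction = REGISTER.sub("REG", instruction)
    return IMMEDIATE.sub("IMM", instruction)


def fragments(procedure: list[str], k: int = 2) -> set[tuple[str, ...]]:
    """Decompose a procedure into overlapping windows of k normalized instructions."""
    normalized = [normalize(i) for i in procedure]
    return {tuple(normalized[i:i + k]) for i in range(len(normalized) - k + 1)}


def similarity(a: list[str], b: list[str], k: int = 2) -> float:
    """Jaccard overlap of the two fragment sets (0.0 when both are empty)."""
    fa, fb = fragments(a, k), fragments(b, k)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


# Two procedures that differ only in register allocation and constants.
proc_a = ["mov eax, 1", "add eax, ebx", "ret"]
proc_b = ["mov ecx, 7", "add ecx, edx", "ret"]
# An unrelated procedure.
proc_c = ["push rbp", "xor eax, eax", "ret"]

score_same = similarity(proc_a, proc_b)
score_diff = similarity(proc_a, proc_c)
```

A syntactic score like this breaks down as soon as the compiler reorders or substitutes instructions, which is one reason to prefer representations that carry semantic meaning.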
We tested our methods in real-world scenarios by using them to search for
vulnerabilities and to predict names for binary procedures. We discovered 373
vulnerabilities affecting publicly available firmware, 147 of them in the
latest available firmware version for the device, and successfully predicted
procedure names, improving on the state of the art by 20% and improving by 84%
over state-of-the-art neural models that do not use any static analysis.