טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentMeir Hagar
SubjectOak: Off-Heap Allocated Keys Big Data Analytics
DepartmentDepartment of Electrical Engineering
Supervisor Professor Idit Keidar
Full Thesis textFull thesis text - English Version


Abstract

We present Oak (Off-heap Allocated Keys), a scalable concurrent key-value map designed for real-time big data analytics. Oak offloads the data from the virtual machine heap in managed-memory languages like Java, thereby reducing garbage collection overheads.


Oak is optimized for big keys and values and for frequent incremental maintenance

of existing values, as prevalent in streaming analytics use cases. To this end, it adopts

a zero-copy approach to data update and retrieval, e.g., through concurrent update-in-place. Oak's API is similar to that of Java's ConcurrentNavigableMap with adjustments for efficient zero-copy implementation. It provides strong (atomic) semantics for read, write, and various read-modify-write operations, such as compute (in-situ update) and put-if-absent, as well as (non-atomic) ascending and descending iterators.


We provide proof of Oak's correctness, by identifying linearization points for all operations, so that concurrent operations appear to execute in the order of their linearization points. We further report on our experiments which show that Oak is faster by 1.3-4.8x than the currently standard concurrent KV-map, the Java ConcurrentSkipListMap. In addition, our results demonstrate that off-heap allocation is beneficial in scenarios with conditional updates of large values.


Our industrial partners are integrating Oak as the core data index in the popular Apache Druid in-memory analytics platform. This integration is beyond the scope of this thesis.