Technion - Israel Institute of Technology - Graduate School
M.Sc. Thesis
M.Sc. Student: Roman Kaplan
Subject: Accelerating SpMV Multiplication using Compression on the Plural Many-Core Architecture
Department: Electrical Engineering
Supervisor: Full Professor Ran Ginosar
Full Thesis Text: English Version


Abstract

Sparse Matrix-Vector Multiplication (SpMV) is useful for many scientific and machine learning applications. SpMV is known to be memory-bandwidth limited. In the case of big-data matrices stored in external storage, the problem worsens, leaving the processor idle more than 99% of the time. Using lossless algorithms to pre-compress the data and decompress it upon fetching has traditionally been impractical: on single- and multi-core CPUs, the long decompression time negates any benefit of fetching compressed data. With the emergence of many-core architectures, ample computational power exists to hide the decompression latency.
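
To make the bandwidth limitation concrete, the following minimal CSR SpMV kernel in C is a generic sketch (the names row_ptr, col_idx and val are illustrative, not the thesis implementation): each nonzero contributes only two floating-point operations while requiring a value, a column index and an indirectly addressed vector element to be read, so the arithmetic can never keep pace with the required memory traffic.

/* Minimal CSR SpMV sketch: y = A * x.
 * Illustrative only; array names are generic CSR fields, not the thesis code. */
#include <stddef.h>

void spmv_csr(size_t n_rows,
              const size_t *row_ptr,   /* n_rows + 1 entries                */
              const int    *col_idx,   /* one entry per nonzero             */
              const double *val,       /* one entry per nonzero             */
              const double *x,
              double       *y)
{
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* Per nonzero: 2 flops (multiply + add) against roughly 12-16 bytes
         * of traffic, which is why the kernel is memory-bandwidth bound. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}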

In this work we employ compression to improve the performance of SpMV on big-data input that resides in external storage, and to reduce total energy. Two types of compression are applied to the sparse matrix: space-efficient formats and BZIP2. The space-efficient formats, CSR-DU and CSR-VI, were introduced in previous work and extend the most common format, CSR. Applying the second compression method, BZIP2, to SpMV was not found in previous literature and is a novelty of this work. In addition, we show that BZIP2 is able to exploit more of the redundancy in the space-efficient formats than in the common CSR format.
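
As an illustration of the value-indexing idea behind CSR-VI, the sketch below replaces the per-nonzero value array with small indices into a table of distinct values; the struct layout and field names are hypothetical and only indicate the general approach (CSR-DU analogously targets the index structure rather than the values).

/* Rough sketch of value indexing (CSR-VI style): the per-nonzero value array
 * is replaced by indices into a (usually much smaller) table of unique values.
 * Struct and field names are hypothetical, not the thesis code; a 16-bit index
 * assumes fewer than 65536 distinct nonzero values. */
#include <stddef.h>

typedef struct {
    size_t                n_rows;
    const size_t         *row_ptr;      /* n_rows + 1 entries, as in plain CSR   */
    const int            *col_idx;      /* one entry per nonzero, as in plain CSR */
    const unsigned short *val_idx;      /* one small index per nonzero            */
    const double         *unique_vals;  /* table of distinct nonzero values       */
} csr_vi_t;

void spmv_csr_vi(const csr_vi_t *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->n_rows; i++) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            /* The indirection through the unique-value table trades a little
             * extra computation for a smaller value stream from memory. */
            sum += A->unique_vals[A->val_idx[k]] * x[A->col_idx[k]];
        y[i] = sum;
    }
}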

A set of large-scale sparse matrices is used to comparatively investigate a variety of compression methods. We demonstrate the benefits of the above types of compression on the shared-memory Plural many-core architectural model. This model comprises a many-core general-purpose CPU in which all cores are equidistant from the shared memory.
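
The row-partitioned kernel below, written with OpenMP purely as a stand-in for a generic shared-memory many-core (it is not the Plural programming model), shows how the computation divides naturally over cores that all access the same matrix and vectors.

/* Generic shared-memory parallelization of the CSR kernel: rows are divided
 * among cores that all see the same x, y and matrix arrays. OpenMP is used
 * only for illustration; it is not the Plural programming model. */
#include <stddef.h>

void spmv_csr_parallel(size_t n_rows,
                       const size_t *row_ptr, const int *col_idx,
                       const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < (long)n_rows; i++) {   /* signed index for the pragma */
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}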

Using the Roofline model, we show how compression affects the operational intensity and performance bounds of our machine when computing SpMV. Compared to the most common sparse-matrix representation format, CSR, the space-efficient formats provide a 2× speedup. The combination of BZIP2 with the space-efficient formats achieves a 5× average performance improvement.
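
The Roofline bound underlying this analysis can be summarized as follows (standard notation, not copied from the thesis):

\[
P_{\text{attainable}} \;=\; \min\bigl(P_{\text{peak}},\; I \cdot B\bigr),
\qquad
I \;=\; \frac{\text{useful flops}}{\text{bytes transferred from storage or memory}},
\]

where \(B\) is the available bandwidth. Compressing the matrix by a factor \(c\) reduces the bytes fetched per useful flop, raising the effective operational intensity toward roughly \(c \cdot I\) and with it the bandwidth-limited performance ceiling, provided the decompression work can be hidden by otherwise idle cores.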

An additional advantage of using compression is potential energy saving. Since fetching the input from external storage is the most energy-consuming step in big-data SpMV computation, reducing the input size reduces the overall energy cost of the system; the overhead in this case is decompression. We show that more than 90% of our test matrices exhibit energy savings greater than 10%. The average energy reduction while computing SpMV is 43%.
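
A rough energy accounting (illustrative notation, not the thesis' model) makes the trade-off explicit:

\[
E_{\text{total}} \;\approx\; E_{\text{fetch}}(S) \;+\; E_{\text{decompress}} \;+\; E_{\text{compute}},
\]

where \(S\) is the number of bytes read from external storage. Compression shrinks \(S\) and hence the dominant fetch term, so the net saving is positive whenever that reduction outweighs the added decompression energy.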