M.Sc Thesis

M.Sc StudentBar Lev Lior
SubjectCharacterizing a Data Routing Algorithm for Fault
Tolerant Real-Time NoC Systems
DepartmentDepartment of Electrical and Computers Engineering
Supervisor PROF. Avi Mendelson
Full Thesis textFull thesis text - English Version


This thesis introduces a novel algorithm for real-time applications in Networks-On-Chip (NOC). The Network-On-Chip is a particular type of network that requires tailored solutions for its specifications, such as small simple routers and high bandwidth. On the other hand, real-time systems have unique requirements of their own, namely satisfying timing deadlines for specific tasks running on the system. We will show that realizing a real-time system on such networks requires special attention to the routing algorithms used to route messages across the network. Furthermore, the system is designed to be fault tolerant. The new routing algorithm we are defining will allow the system to continue operating and maintains its real-time constraints even when fault occurs, assuming certain fault models have been defined.

The routing algorithm proposed in this work, called TOLEREAL, is a routing algorithm for networks-on-chip, meaning it determines the manner in which messages travel in the network between processing elements. In previous work regarding fault tolerant algorithms, several approaches to the problem were offered, while the simplicity of the NOC routers was the main obstacle. The work with the best response time, crucial for real-time systems, took the approach of using local information about failures and congestion in the network, and its algorithm chose a path that avoided known failures and attempted to take less congested paths using local indicators. However, the algorithms lacked global knowledge and a pre-analysis of the program, and were not designed to guarantee minimal worst-case timings. Therefore, they did not meet reasonable requirements for real-time systems.

TOLEREAL is, as far as we are aware, the only algorithm specifically suited to guarantee both fault tolerance and the worst-case execution time requirements of the real-time system. Therefore, TOLEREAL is performing better than other existing fault-tolerant routing algorithms: It guarantees deadlines tighter than other algorithms, and so is compatible with a wider range of real-time applications, making it more useful. A network using TOLEREAL is also able to maintain functionality with a fault in one of its links with no degradation in real-time performance.

In order to demonstrate the new proposed algorithm, we enhanced a network-on-chip simulator to perform various unique features. We show that TOLEREAL outperforms other algorithms in various conditions in a common network topology, and demonstrate its ability to withstand the faults defined in the fault model.

As shown in the thesis, the parameter used to test the algorithm for real-time applications is the worst-case transmission time (WCTT), which is defined as the latency of the message from source to destination in the worst possible conditions for that message. The simulation results for the scenarios tested show that TOLEREAL presents an improvement in WCTT over the best previous solution in most possible paths, an improvement that reaches over 30% of the latency on average. Thanks to this improvement, the algorithm qualifies for more real-time systems, and so it is better and more useful than previous work.