## RELIABILITY-AWARE ANALYSIS AND DESIGN OF NETWORK-ON-CHIP

Ph.D. Thesis

ASHISH SHARMA ID: 2013RCP9515



# DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING MALAVIYA NATIONAL INSTITUTE OF TECHNOLOGY JAIPUR

August 2019

## Reliability-Aware Analysis and Design of Network-on-Chip

Submitted in

fulfillment of the requirements for the degree of

Doctor of Philosophy

by

Ashish Sharma ID: 2013RCP9515

under the supervision of

**Prof. Manoj Singh Gaur** Director & Professor, IIT Jammu

**Prof. Lava Bhargava** Professor, MNIT Jaipur



# DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING MALAVIYA NATIONAL INSTITUTE OF TECHNOLOGY JAIPUR

August 2019

©Malaviya National Institute of Technology Jaipur - 2019. All rights reserved.

#### DECLARATION

I, Ashish Sharma, declare that this thesis, titled "Reliability-Aware Analysis and Design of Network-on-Chip", and the work presented in it are my own. I confirm that:

- This work was done wholly or mainly while in candidature for a Ph.D. degree at this university.
- Where any part of this thesis has previously been submitted for a degree or any other qualification at MNIT Jaipur or any other institution, this has been clearly stated.
- Where I have consulted the published work of others, this is always clearly attributed.
- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
- I have acknowledged all main sources of help.
- Where the thesis is based on work done by myself, jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Date:

Ashish Sharma 2013RCP9515

### CERTIFICATE

This is to certify that the thesis entitled "Reliability-Aware Analysis and Design of Network-on-Chip" being submitted by Mr. Ashish Sharma (2013RCP9515) is a bonafide research work carried out under my supervision and guidance in fulfillment of the requirement for the award of the degree of Doctor of Philosophy in the Department of Computer Science & Engineering, Malaviya National Institute of Technology, Jaipur, India. The matter embodied in this thesis is original and has not been submitted to any other University or Institute for the award of any other degree.

Prof. Manoj Singh Gaur Director & Professor, IIT Jammu (Supervisor) Department of Computer Science & Engineering Malaviya National Institute of Technology Jaipur, India

Place: Jaipur Date:

Prof. Lava Bhargava Professor (Co-Supervisor) Department of Electronics and Communication Engineering Malaviya National Institute of Technology Jaipur, India

Place: Jaipur Date:

#### ACKNOWLEDGEMENT

This doctoral thesis would have been impossible without the support, review and constructive criticism of many persons. It is a pleasure to thank all of them here.

First of all, I offer my sincere gratitude to my supervisors, **Prof. Manoj Singh Gaur** (Director, IIT Jammu) and **Prof. Lava Bhargava**, for agreeing to be my guides and mentors. They were always there to support me whenever I needed their views on research methods. Their reviews always motivated me, and their suggestions always provided thoughtful input to my research. The environment they provided was very friendly, and I cannot imagine having better mentors for my Ph.D. study. I would especially like to thank Prof. Gaur and Prof. Lava Bhargava for always being caring towards me and encouraging me, not only in my research but also in general. I regard them as "my guides for life." I am not sure whether a child's affection for their parents is the right way to describe the bond I feel with them.

I am very grateful to **Prof. Vijay Laxmi**, who, despite her incredibly busy schedule, always helped me improve my research work and resolve critical research problems. I would like to acknowledge **Dr. Mark Zwolinski** (Professor, University of Southampton, United Kingdom) for his continuous technical support.

I would also like to thank the members of my doctoral guidance committee: Prof. Vijay Laxmi, Prof. D. Boolchandani, Dr. Girdhari Singh and Dr. Emmanuel Shubhakar Pilli. Their inquisitiveness provided a new perspective on the problem.

I am delighted to thank my parents. I feel helpless to put your support into words: appreciation, compassion, warmth, communication and financial support are just a few of the things you have given me.

During the course of my Ph.D. work, I have been blessed with a jovial group of fellow scholars and some long-lasting friendships: Dr. Rimpy Bishnoi, Dr. Gaurav Singal, Niyati Gupta, Dr. Ajay Nehra, Dr. Lokesh Garg and Dr. Sapna Khandelwal. Special thanks to my friends Yogendra Kumar Gupta, Ramakrishna Vaikuntapu, Anugrah Jain and Anurag Sharma, and to my students Ruby and Prachi, who listened to me and discussed everything very honestly. I also want to thank all my IIIT Kota colleagues.

I thank my wife, Ms. Deepti Kaushik, and my sons, Divyansh Mudgal and Lakshit Mudgal; without their support this endeavor would have been impossible.

Finally, I would like to thank God, the Creator of the Earth, the Almighty, Supreme in power and knowledge and the most merciful, from Whom I received the guidance and knowledge to do something beneficial for humanity.

#### ABSTRACT

The proliferation of on-chip cores has led to gains in the performance and throughput of Chip Multi-Processors (CMPs). However, it shifts the design paradigm from computation-centric to communication-centric. To scale many-core communication architectures, Network-on-Chip (NoC) has emerged as a promising solution for high-performance, multi-core parallel architectures. The rapid shrinking of oxide thickness with CMOS scaling increases the gate-oxide electric field and makes transistors susceptible to wearout. Transistor wearout effects such as Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), Negative Bias Temperature Instability (NBTI), Electromigration (EM), Stress Migration (SM) and Thermal Cycles (TC) affect device performance within the lifetime of a circuit. This raises the importance of addressing the sustainability of on-chip communication architectures. Aging-induced wearout changes the basic parameters of a transistor and hence degrades NoC performance, which can create a serious bottleneck that undermines system performance and reliability. Reliability must therefore be addressed for the NoC and its routers from the system level down to the device level, and this requires an early reliability estimate at the system level. To achieve this objective, we have developed the HiPER-NIRGAM framework, which estimates the performance, power, thermal behavior and reliability of 2D NoCs and routers at the system level. Our modified reliability estimation, which adds the SM and TC failure mechanisms and a Weibull lifetime distribution to the existing NBTI and TDDB models, provides a more accurate estimate of the mean time to failure (MTTF) at the system level. The system-level reliability analysis identifies power and temperature as the prime suspects in reducing NoC reliability. Our proposed power- and thermal-aware reliability enhancement architecture, which uses high-threshold-voltage (HVT) cell routers and a power-aware floorplan at the system level, enhances reliability by mitigating these failure mechanisms.
At the RTL level, our routing-logic-based port-enable clock gating technique, virtual channel (VC) clock gating and slice clock gating on FPGA reduce power and the activity factor, and hence enhance NoC reliability. At the device level, we propose four approaches to mitigate the effect of NBTI-induced delay and enhance the reliability of the NoC architecture: (1) an "Aging-aware Timing Framework", (2) a "Most Stress Cell" identification algorithm, (3) NBTI delay modeling, and (4) cell library characterization based on multi-$V_{dd}$ and multi-$V_{th}$ cells. Among these, resizing the "Most Stress Cells" is the most reliable solution for improving reliability and mitigating delay degradation.

## Dedications

This thesis is dedicated to my family, especially to my father, who has played many roles in my life: class teacher, motivator, and creator. It is also dedicated to my wife, Deepti Kaushik, and my sons, Divyansh Mudgal and Lakshit Mudgal, for always keeping me motivated, listening to me and standing by me through tough times.

# Contents

| Section | Title |
|---------|-------|
|         | Abstract |
|         | List of Figures |
|         | List of Tables |
| 1       | Introduction |
| 1.1     | Motivation |
| 1.2     | Objectives |
| 1.3     | Contributions |
| 1.4     | Thesis Organization |
| 2       | Interconnection Networks |
| 2.1     | Computation and Communication: A Taxonomy |
| 2.2     | Why Network-on-Chip: Evolution of On-chip Communication Architectures |
| 2.2.1   | Basic Component of SoCs |
| 2.2.2   | Growth and Development of Interconnect Network |
| 2.3     | Network-on-Chip Architecture |
| 2.3.1   | NoC Components |
| 2.3.1.1 | Router |
| 2.3.1.2 | Network Interface (NI) |
| 2.3.1.3 | Link |
| 2.3.2   | Message Format |
| 2.3.3   | Topology and Routing |
| 2.3.4   | Flow Control and Switching Technique |
| 2.4     | Reliability of NoC |
| 2.4.1   | Reliability |
| 2.4.2   | Life-Time Failure Mechanism |
| 2.5     | Aging Effects |
| 2.5.1   | Impact of Technology Scaling on Aging |
| 2.5.2   | Impact of Temperature on Reliability |
| 2.5.3   | Power Temperature Relationship: Temperature Modeling |
| 2.6     | State-of-the-art |
| 2.7     | NoC Reliability Representation at Different-Level |
| 2.8     | Summary |
| 3       | System-Level: HiPER-NIRGAM Framework for NoC Reliability Estimation |
| 3.1     | Proposed Framework: HiPER-NIRGAM |
| 3.1.1   | NIRGAM: NoC Interconnect Routing and Application Modeling |
| 3.1.2   | NoC Power Modeling |
| 3.1.2.1 | NIRGAM Integration With Power Models |
| 3.1.3   | Thermal Modeling |
| 3.1.3.1 | NIRGAM Integration With Thermal Models |
| 3.1.3.2 | NoC Thermal Analysis |
| 3.1.3.3 | Router Micro-architecture Thermal Analysis |
| 3.1.4   | Reliability Modeling |
| 3.2     | Experimental Set-up and Result Analysis |
| 3.2.1   | Power Results and Analysis |
| 3.2.2   | Thermal Hotspot Results |
| 3.2.3   | Reliability Results Analysis |
| 3.3     | Inferences |
| 4       | System-Level NoC Reliability Enhancement |
| 4.1     | Proposed Architecture and Techniques for Improving NoC Reliability |
| 4.2     | |
| 4.2.1   | Generation of Samples From Lifetime Distribution |
| 4.2.2   | Modified Reliability Estimation Tool and Reliability Result Analysis |
| 4.2.3   | Power-aware Floorplan Based Reliability Enhancement |
| 4.3     | Conclusions |
| 5       | RTL-Level: NoC Reliability Enhancement by Reducing Activity Factor |
| 5.1     | Related Work |
| 5.2     | Clock Gating |
| 5.3     | Proposed Work: Routing Decision Based Clock Gating On Input Buffers |
| 5.4     | Result Analysis |
| 5.5     | Conclusions |
| 6       | Mitigating NBTI Stress in NoC Router |
| 6.1     | |
| 6.2     | Negative Bias Temperature Instability (NBTI) |
| 6.3     | Proposed Work: NBTI Aging-Aware Delay Modeling |
| 6.3.1   | NBTI Based Aging Modeling and Analysis for NoC |
| 6.3.2   | Mitigation Modeling to Reduce the Aging Effect |
| 6.3.3   | Proposed Algorithms |
| 6.4     | Experimental Setup and Result Analysis |
| 6.4.1   | Aging Analysis Methodology |
| 6.4.2   | Cell Library Characterization |
| 6.4.3   | Timing Analysis |
| 6.5     | Conclusions |
| 7       | Conclusions and Future Scope |
| A       | HiPER-NIRGAM Tool GUI and Setup |
| B       | Network-on-Chip Power Model Integrated with NIRGAM |
| C       | Network-on-Chip Thermal Model Integrated with NIRGAM |
| D       | Network-on-Chip REST Model Integrated with NIRGAM |
|         | Publications |
|         | Bibliography |

# List of Figures

| 1.1  | Moore's Law                                                                                                                     | 2  |  |  |
|------|---------------------------------------------------------------------------------------------------------------------------------|----|--|--|
| 1.2  | Interconnect Delay and Gate Delay                                                                                               | 3  |  |  |
| 1.3  | Interconnect Delay and Gate Delay                                                                                               |    |  |  |
| 21   | Conceptual representation of an interconnection network. Here PE refers                                                         |    |  |  |
| 2.1  | to processing element and Mem refers to a Memory element                                                                        | 14 |  |  |
| 2.2  | A conceptual representation of point-to-point interconnection architecture                                                      | 17 |  |  |
| 2.3  | A bus-based interconnection architecture                                                                                        | 18 |  |  |
| 2.4  | A Hierarchical bus-based interconnection architecture                                                                           | 18 |  |  |
| 2.5  | A ring-based interconnection architecture                                                                                       | 19 |  |  |
| 2.6  | A crossbar bus-based interconnection architecture                                                                               | 19 |  |  |
| 2.7  | Basic NoC Architecture                                                                                                          | 21 |  |  |
| 2.8  | Network-on-Chip Tile Architecture with Mesh Topology                                                                            | 22 |  |  |
| 2.9  | A Router Microarchitecture                                                                                                      | 22 |  |  |
| 2.10 | A Buffer representation                                                                                                         | 24 |  |  |
| 2.11 | A Buffer with Virtual Channel Organization                                                                                      | 24 |  |  |
| 2.12 | NoC Message Format at Different Layers                                                                                          | 26 |  |  |
| 2.13 | Ring Topology                                                                                                                   | 27 |  |  |
| 2.14 | Mesh Topology                                                                                                                   | 27 |  |  |
| 2.15 | Torus Topology                                                                                                                  | 27 |  |  |
| 2.16 | Store-and Forward switching                                                                                                     | 30 |  |  |
| 2.17 | Virtual Cut Through Switching                                                                                                   | 30 |  |  |
| 2.18 | Wormhole Switching                                                                                                              | 31 |  |  |
| 2.19 | HOL Blocking and VC Buffer Organization                                                                                         | 32 |  |  |
| 2.20 | Reliability Categorization                                                                                                      | 33 |  |  |
| 2.21 | The Empirical Bath-tub Curve Representing Failure Rate With Time                                                                | 37 |  |  |
| 2.22 | CMOS Inverter Circuit                                                                                                           | 39 |  |  |
| 2.23 | CMOS Inverter Circuit Power Representation                                                                                      | 39 |  |  |
| 2.24 | Charging of Load Capacitance                                                                                                    | 40 |  |  |
| 2.25 | Discharging of Load Capacitance                                                                                                 | 40 |  |  |
| 2.26 | Reliability at Different Level and View                                                                                         | 42 |  |  |
| 2.27 | Y-chart                                                                                                                         | 46 |  |  |
| 2.28 | $Y-chart 2.0 \dots \dots$ | 40 |  |  |
| 2.29 | Reliability Representation at Different Level of Abstraction                                                                    | 48 |  |  |
| 2.30 | NOC-Rehability Representation at Different Level                                                                                | 48 |  |  |
| 3.1  | HiPER-NIRGAM: A Frame work for NoC Performance, Power, Thermal                                                                  |    |  |  |
|      | and Reliability Estimation                                                                                                      | 53 |  |  |
|      |                                                                                                                                 |    |  |  |

| 3.2  | NIRGAM NoC Simulator                                                          | 54 |
|------|-------------------------------------------------------------------------------|----|
| 3.3  | NIRGAM Integrated with Power Simulators                                       | 56 |
| 3.4  | Floorplan Generation                                                          | 58 |
| 3.5  | Hotspot Generation(HotSpot tool)                                              | 59 |
| 3.6  | Router Floorplan                                                              | 60 |
| 3.7  | Reliability Estimation (REST Tool)                                            | 61 |
| 3.8  | Topology Power Based on ORION 2.0                                             | 63 |
| 3.9  | Topology Power Based on ORION 3.0                                             | 63 |
| 3.10 | Power Models Comparative analysis based on Flit Size                          | 64 |
| 3.11 | Power Models Comparative analysis based on Buffer Size                        | 65 |
| 3.12 | Power Models Comparative analysis based on Number of Virtual Channel          | 65 |
| 3.13 | Hotspot profile(22 nm and 1 Ghz frequency)                                    | 66 |
| 3.14 | Hotspot Profile(32 nm, 2GHz, buffer Size=64 Bytes)                            | 67 |
| 3.15 | Hotspot Profile<br>(32 nm and 4 GHz , buffer size 32 Bytes) $\ldots$          | 68 |
| 3.16 | Min-Max temperature at different buffer size with various frequencies $\dots$ | 68 |
| 3.17 | MTTF values of Different Topology Size and nm Technology                      | 70 |
| 3.18 | MTTF values with different Buffer Size                                        | 70 |
| 3.19 | MTTF values at different frequencies                                          | 71 |
| 3.20 | 2D Mesh Thermal Profile                                                       | 72 |
| 3.21 | Thermal Hotspot Failure Profile                                               | 72 |
| 4.1  | HVT-NVT Router Mesh                                                           | 75 |
| 4.2  | HVT Buffer Based Router                                                       | 75 |
| 4.3  | Power of Different $V_{th}$ Cell NoC                                          | 78 |
| 4.4  | MTTF Values at Different Buffer Size With NBTI and TDDB Old REST              |    |
|      | Model                                                                         | 79 |
| 4.5  | MTTF Values at Different Buffer Size With NBTI, TDDB, SM and TC $$ .          | 79 |
| 4.6  | MTTF Values at Different Buffer Size With old and new REST Model at           |    |
|      | Constant Frequency                                                            | 80 |
| 4.7  | Lognormal Distribution                                                        | 80 |
| 4.8  | Weibull Distribution                                                          | 81 |
| 4.9  | A Without Power-Aware Floorplan Thermal Profile                               | 81 |
| 4.10 | A Power-Aware Floorplan Thermal Profile                                       | 81 |
| 4.11 | An HVT Based Buffer Thermal Profile                                           | 82 |
| 4.12 | MTTF of All Proposed Technique and Architecture. Here (A) refers to           |    |
|      | LVT router based NoC, (B) refers to HVT router based NoC, (C) refers          |    |
|      | to power-and thermal-aware floorplanned NoC with LVT router and $(D)$         |    |
|      | is Not based on power-and thermal-aware hoorplan integrated with <b>n</b> v i | 82 |
|      | 100001                                                                        | 02 |
| 5.1  | Dynamic Power of Router                                                       | 85 |
| 5.2  | Leakage Power of Router                                                       | 85 |
| 5.3  | CMOS Inverter HCI Effect                                                      | 86 |
| 5.4  | Basic Clock Gating Circuit                                                    | 89 |
| 5.5  | Input Port Clock Gating                                                       | 90 |
| 5.6  | Input channel with VC Buffers Clock Gating                                    | 91 |
| 5.7  | Power results without clock gating at different clock frequencies             | 92 |
| 5.8  | Power results with clock gating at different clock frequencies                | 93 |

| 5.9  | Power results without clock gating                                                                   | 94  |
|------|------------------------------------------------------------------------------------------------------|-----|
| 5.10 | Power results with clock gating                                                                      | 94  |
|      |                                                                                                      |     |
| 6.1  | Network On Chip Architecture                                                                         | 101 |
| 6.2  | NBTI based aging analysis                                                                            | 103 |
| 6.3  | Multi $V'_{dd}$ Mitigation technique to reduce the NBTI stress                                       | 104 |
| 6.4  | Aging-aware Timing Framework                                                                         | 110 |
| 6.5  | CMOS NBTI EFFECT                                                                                     | 111 |
| 6.6  | Shift in Threshold Voltage $V_{th}$ per year in Volt at 45 nm Technology                             | 112 |
| 6.7  | Shift in Threshold Voltage $V_{th}$ per year in Volt at 65 nm Technology                             | 112 |
| 6.8  | Change in $\Delta V_{th}$ of Inverter cell under NBTI stress                                         | 113 |
| 6.9  | Change in delay of Inverter cell under NBTI stress                                                   | 113 |
| 6.10 | Set of Critical path and their Slack Values                                                          | 115 |
| 6.11 | Delay statistics of before, after NBTI stress and with mitigation technique                          | 115 |
| 6.12 | Increase in % delay after stress and mitigation in % delay using multi $V_{dd}$                      | 116 |
| 6.13 | Comparison of different mitigation approach                                                          | 117 |
|      |                                                                                                      |     |
| A.1  | Directory Structure in HiPER-NIRGAM                                                                  | 123 |
| A.2  | Adding a New Project in HiPER-NIRGAM                                                                 | 124 |
| A.3  | Adding a Project Name in HiPER-NIRGAM                                                                | 124 |
| A.4  | Adding a Project Name and File Under The Project in HiPER-NIRGAM                                     | 125 |
| A.5  | NoC Parameters configuration in HiPER-NIRGAM                                                         | 125 |
| A.6  | NoC Application and traffic configuration in HiPER-NIRGAM                                            | 126 |
| A.7  | NIRGAM Execution in HiPER-NIRGAM                                                                     | 126 |
| A.8  | Result in Form of Graph in HiPER-NIRGAM                                                              | 127 |
| A.9  | Hotspot in HiPER-NIRGAM                                                                              | 127 |
| A.10 | Reliability Estimation in HiPER-NIRGAM                                                               | 128 |
| C.1  | floorplan description file                                                                           | 135 |
| C.2  | Power Trace File of NoC                                                                              | 135 |
| C.3  | Temperature Trace File of NoC                                                                        | 136 |
| C.4  | Thermal-aware floorplan using Simulated Annealing Algorithm                                          | 136 |
| D.1  | REST Tool Framework for Reliability Estimation                                                       | 138 |
| D.2  | Monte Carlo simulation based time to failure evaluation methodology for NoC [67]                     | 138 |

# List of Tables

| Table | Title | Page |
|-------|-------|------|
| 2.1 | Comparison of asymptotic cost function [1] | 20 |
| 2.2 | Many-core chips employing NoC interconnect | 21 |
| 2.3 | Thermal and Electrical Properties [2] | 41 |
| 3.1 | Comparison Among Different Frameworks | 52 |
| 3.2 | NIRGAM Configuration Parameters | 55 |
| 3.3 | NIRGAM NoC Architectural Configuration Parameters Taken for the Experiment | 62 |
| 3.4 | Hotspot Configuration Parameters | 66 |
| 4.1 | Reliability Estimation Configuration Parameters | 76 |
| 4.2 | HVT, NVT and LVT Cell Information | 78 |
| 5.1 | NoC power (mW) comparison using optimization technique | |
| 5.2 | | |
| 5.3 | … gating | |
| 5.4 | Timing Analysis with and without clock gating design | |
| 6.1 | Symbols used in NBTI based Delay Modeling of NoC | |
| 6.2 | Cell Library Delay (sec) Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.0 volt | |
| 6.3 | Cell Library Delay (sec) Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.2 volt | |
| 6.4 | Cell Library Delay (sec) Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.4 volt | |
| 6.5 | Cell Library Delay (sec) Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.6 volt | |

## Chapter 1

# Introduction

Transistors have continued to scale toward the nanoscale with every process generation, as predicted by Moore's law. The smaller feature size of Complementary Metal-Oxide Semiconductor (CMOS) technology increases transistor density. Single-core (unicore) processor design has reached the end of the line because of the power wall [3], [4]. The proliferation of on-chip cores has led to gains in the performance and throughput of Chip Multi-Processors (CMPs). However, it also shifts the design paradigm from computation-centric to communication-centric.

Intel Labs created an experimental many-core chip known as the "Single-Chip Cloud Computer (SCC)" [5]. A cloud comprises a scalable cluster of computers, and the SCC brings such a cluster onto a single chip. The chip consists of 24 tiles with two cores per tile, i.e., 48 cores integrated on a single silicon die. It has 24 routers arranged in a mesh topology with 256 GB/s bisection bandwidth. This development is intended to achieve scalable communication, power consumption and on-chip performance for the near future [6]. The TeraFlops Research Chip, also known as Polaris [7], contains 80 cores on a single chip. It is also a tiled architecture with a mesh-based NoC. The communication architecture of this chip allows the cores to transfer terabits of data per second on chip. This chip delivered one trillion floating-point operations per second (one teraflops) while consuming 68 W. Other industrial many-core chips, such as Tilera's 100-core processor [8] and the 64-core Epiphany E64G401 [9], show the current trend of the IC industry toward many-core architectures.

Figure 1.1 shows 40 years of trends in microprocessor design [10], [11], [12]. Moore's law [13] still holds, with the number of transistors doubling roughly every 24 months.
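
The compounding implied by a fixed doubling period is easy to underestimate. The sketch below only assumes the 24-month doubling period stated above; the starting count is an arbitrary illustrative value, not data from Figure 1.1.

```python
def projected_transistors(n0: float, years: float, doubling_months: float = 24.0) -> float:
    """Project a transistor count forward assuming a fixed doubling period."""
    doublings = years * 12.0 / doubling_months
    return n0 * 2 ** doublings

# A 24-month doubling period gives 2**5 = 32x growth over one decade
# and 2**20 (about a million) times growth over 40 years.
print(f"10-year growth factor: {projected_transistors(1.0, 10):.0f}x")
print(f"40-year growth factor: {projected_transistors(1.0, 40):.0f}x")
```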



FIGURE 1.1: Moore's Law "C. Moore: Data Processing in Exascale-Class Computer Systems, April 2011" [10], [11]. Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; new plot and data for 2010-2015 collected by K. Rupp.

The graph shows that the core count has grown along with the transistor count. The performance gains of the overall system, however, are bounded by Amdahl's law [14]: the serial fraction of a workload limits the speedup that parallel processing can deliver, no matter how many cores are added.
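
Amdahl's law gives the speedup of a fixed workload with parallelizable fraction $p$ on $n$ cores as $S(n) = 1 / ((1 - p) + p/n)$. A minimal sketch; the 95% parallel fraction and the core counts are illustrative choices, not measurements from this thesis.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup of a fixed workload on `cores` processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# With 95% parallel code the speedup can never exceed 1 / 0.05 = 20,
# even on the 48-core SCC or the 80-core Polaris chip.
for n in (2, 16, 48, 80, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

The saturation toward 20x is why adding cores alone does not guarantee proportional performance gains.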

Since the inception of Integrated Circuits (ICs), their fabrication has been guided by Moore's law, which continues to hold as technology advances. Figure 1.1 shows the growth in the number of cores in recent years. Adding a new IP (Intellectual Property) block to provide computational functionality is no longer a challenge for many-core and high-performance architectures. The main challenge is supporting communication across these IP cores (processors, memories, etc.), especially for high-performance parallel applications. In the design of such systems, communication must not become a bottleneck that erodes the performance gains of the many cores.

Initially, IC chips contained only a small number of cores, connected with traditional interconnect schemes such as point-to-point links, buses, crossbars and rings [15, 16]. Connecting cores with a point-to-point architecture demands more wires, and the increased wire density often leads to complex layout designs and higher area, power and delay overheads. As Figures 1.2 and 1.3 show, the gap between gate delay and interconnect delay has grown by more than $100\times$.
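
To see why dedicated point-to-point wiring does not scale, compare link counts. The sketch assumes a fully connected point-to-point scheme (one link per pair of cores) and a square 2D mesh with one router per core; link count is only a proxy for wiring cost, since it ignores wire length.

```python
def point_to_point_links(n_cores: int) -> int:
    """Dedicated links needed to connect every pair of cores directly."""
    return n_cores * (n_cores - 1) // 2

def mesh_links(rows: int, cols: int) -> int:
    """Router-to-router links in a rows x cols 2D mesh without wraparound."""
    return rows * (cols - 1) + cols * (rows - 1)

# Quadratic growth for point-to-point versus roughly linear growth for a mesh.
for k in (4, 8, 16):
    n = k * k
    print(f"{n:4d} cores: point-to-point={point_to_point_links(n):6d}, mesh={mesh_links(k, k):4d}")
```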



FIGURE 1.2: Interconnect Delay and Gate Delay. Source: International Technology Roadmap for Semiconductors (ITRS)



FIGURE 1.3: Interconnect Delay and Gate Delay. Source: International Technology Roadmap for Semiconductors (ITRS 2005)

Shared communication architectures such as a bus also suffer from arbitration delay and limited bandwidth [15, 17]. A bus is also a single point of failure and a bottleneck. Crossbars and crossbar-based bus architectures suffer from area and power overheads and a bandwidth bottleneck as the number of cores increases [15, 17, 18]. Therefore, a new architecture was required that could accommodate the growing number of components and still provide communication within the design constraints.

Network-on-Chip (NoC) has emerged as an alternative to bus-based interconnects. The idea is to "route packets, not wires" [19]. This communication framework consists of interconnected routers, which are also connected to the IP cores. NoC offers a scalable and modular design [19–21]. In a NoC, routers are arranged in a specific topology, connected to each other through links, and connected to cores through network interface units. The NoC has the following key characteristics:

- Scalable, modular and layered architecture.
- Point-to-point connection between the routers.
- Separation of communication and computation.
- Globally Asynchronous Locally Synchronous (GALS) implementation that decouples the clock domains of the IP blocks.
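
To make "route packets, not wires" concrete, the sketch below implements dimension-order (XY) routing, a widely used deterministic routing algorithm for 2D meshes. It is an illustration only, not necessarily the routing algorithm used in this thesis; the coordinate convention (x increasing eastward, y increasing northward) and the port names are assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh

def xy_route(src: Coord, dst: Coord) -> List[str]:
    """Dimension-order (XY) routing: resolve the X offset first, then Y.

    Returns the output port chosen at each router on the path; 'LOCAL'
    means ejection to the attached core through its network interface.
    """
    x, y = src
    hops: List[str] = []
    while x != dst[0]:
        step = 1 if dst[0] > x else -1
        hops.append("EAST" if step == 1 else "WEST")
        x += step
    while y != dst[1]:
        step = 1 if dst[1] > y else -1
        hops.append("NORTH" if step == 1 else "SOUTH")
        y += step
    hops.append("LOCAL")
    return hops

print(xy_route((0, 0), (2, 1)))  # ['EAST', 'EAST', 'NORTH', 'LOCAL']
```

Because every packet finishes its X movement before turning, XY routing cannot form cyclic channel dependencies on a mesh and is therefore deadlock-free, at the cost of offering no alternative path when a link fails.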

With the increasing level of integration, both the number of cores and the transistor count on a single chip keep growing. This technology and architecture evolution brings new issues of power, temperature and reliability. The end of Dennard scaling [4] has stalled key parameters such as supply voltage, clock frequency, power and single-thread performance (as indicated by the flat lines in Figure 1.1). On-chip interconnection networks also suffer from these technology constraints. Manufacturing variability, soft errors, aging-induced wearout and lifetime degradation adversely affect performance and may lead to failure of the on-chip interconnection network [22–25].
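
Why the end of Dennard scaling matters can be seen with a first-order calculation. The sketch assumes the textbook scaling rules (dimensions and capacitance shrink by $1/\kappa$, frequency rises by $\kappa$, transistor density rises by $\kappa^2$) and considers dynamic power only, ignoring leakage.

```python
def scaled_power_density(kappa: float, scale_voltage: bool = True) -> float:
    """Power density after one scaling step by factor kappa, relative to before.

    First-order dynamic-power model only (P ~ C * V**2 * f); leakage is ignored.
    """
    capacitance = 1.0 / kappa                          # C shrinks with dimensions
    voltage = 1.0 / kappa if scale_voltage else 1.0    # V scales only under Dennard
    frequency = kappa                                  # shorter gate delay -> faster clock
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = kappa ** 2                  # area per transistor shrinks by 1/kappa**2
    return power_per_transistor * transistors_per_area

print(scaled_power_density(1.4))                       # ~1.0: power density stays constant
print(scaled_power_density(1.4, scale_voltage=False))  # ~1.96: grows by kappa**2
```

Once the supply voltage can no longer be reduced, each generation raises power density by roughly $\kappa^2$, which translates into higher temperature and faster wearout.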

### 1.1 Motivation

Rapid shrinking of oxide thickness in CMOS scaling has increased the gate-oxide electric field, making transistors susceptible to wearout. At the high transistor densities encountered in modern chips, wearout effects such as Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), Negative Bias Temperature Instability (NBTI), Electromigration (EM), Stress Migration (SM) and Thermal Cycling (TC) have begun to manifest and affect device performance within the lifetime of a circuit. Since the NoC is an important constituent of the system, improving system reliability requires addressing the reliability of on-chip communication architectures.

Aging-induced wearout in a NoC may lead to synchronization failure due to changes in transistor delay parameters. Many-core IC designs with NoC architectures experience thermal and power variations that eventually affect their reliability. Increased transistor density raises power dissipation and makes the hardware more susceptible to malfunction, which decreases the reliability of a Chip Multi-Processor (CMP). These reliability issues have motivated researchers to consider long-term durability in design approaches to improve device lifetime. Studies by the International Technology Roadmap for Semiconductors (ITRS) indicate that a 10-fold decrease in transistor wear rate will be needed in the next ten years to maintain current design lifetimes [26]. Any of these failure mechanisms is likely to trigger a permanent fault that renders the system useless.

The wearout of a particular processing unit in a Chip Multi-Processor may not damage the whole system, but a single fault in the NoC may lead to complete system failure due to protocol-level deadlock and disjoint connectivity of components [27]. The aging effect deviates transistor parameters from their design specifications and degrades circuit performance. The change in path delay due to aging creates timing uncertainty in the circuit. This uncertainty not only increases path delay but can also lead to operational failure after a period of use. These issues have therefore motivated researchers to consider reliability analysis and long-term durability in NoC design approaches as well.

The NoC can be viewed at the macro and micro levels. The macro-architecture is viewed at a higher abstraction level, where topology and routing are the design parameters; this is known as coarse-grained granularity [28]. The micro-architecture view of NoC considers the granularity of individual hardware components. The router micro-architecture components (buffers, routing logic, allocator, arbiter, crossbar and pipeline architecture) and the network interface controller are considered at fine-grained granularity [28].

Most research in the NoC reliability domain addresses the problem at the macro level. These works consider two kinds of faults: (1) a link failure that disconnects two routers, or (2) a node failure, in which a router fails and is disconnected from the NoC. As a result, the topology is no longer regular. A variety of solutions have been proposed [29–33] to handle irregular topologies. Topology-agnostic routing algorithms [32] handle the irregularities using reconfiguration [34, 35]. These routing algorithms may be adaptive, source-based or distributed. Table-based and logic-based (LBDR, uLBDR and $D^2$ LBDR) implementations [30, 31] handle single and multiple link failures in 2-D mesh topologies.

Few research papers address the aging effect on NoC, and the majority of aging work is still aimed at the macro view, i.e., the system level. X. Fu *et al.* [36] presented an NBTI analysis and mitigation mechanism based on NoC structures (combinational logic such as the VC allocator (VCA) and storage cells such as buffers). J. A. B. Fortes *et al.* [36] and J. Wang [37] focused on router aging, while the authors in [38] consider both router and link aging degradation in NoC. Some work has also been published on aging-aware NoC. Bhardwaj *et al.* [39] proposed two routing algorithms, a congestion-oblivious one based on Mixed Integer Linear Programming (MILP) and a congestion-aware adaptive one, to reduce aging-induced power-performance overheads and enhance system robustness. L. Wang *et al.* [40] proposed a dynamic programming-based lifetime-aware routing algorithm to improve the reliability of a NoC.

Some research articles have focused on NoC aging at the device or micro-architecture level. Paul V. Gratz *et al.* [27] analyze HCI and NBTI effects based on activity factor and duty cycle and propose solutions for the router micro-architecture at the RTL level. Ancajas *et al.* [41] improve the lifetime of a NoC router by balancing the switching activity across the circuit.

Workload is the major source of power variation, and changes in power density raise the temperature. Temperature is the major factor that changes transistor characteristics and degrades circuit performance. Hence, power- and thermal-induced effects decrease the reliability of NoC, and extensive analysis and design are needed to enhance it.

Based on an extensive literature survey of the challenges in NoC reliability analysis, from system to device level, and of aging-aware designs, the motivations for our work are as follows:

- As technology increases the scale of integration, transistors become more vulnerable to aging-induced failures, which may make the NoC more susceptible to failure. The failure rate at 16 nm technology is predicted to be almost 100 times that at 180 nm technology [24]. Hence, the NoC must be made more capable of handling such failure rates.
- Reliability of NoC can be analyzed at different levels of abstraction (e.g., System, RTL, Circuit and Device). The majority of work addresses the reliability or fault tolerance of routing algorithms and focuses only on the macro view of NoC. Very few works address aging-based wearout in NoC, so the aging effect on NoC needs to be explored and analyzed.
- The majority of NoC reliability work is at the system level, yet very few power- and temperature-based reliability tools are available to explore these challenges in NoC. A quick system-level NoC reliability estimation is required.
- A single framework is needed to estimate performance, power, temperature and reliability from the NoC design parameters (e.g., number of VCs, flit width, topology size, buffer depth, frequency, traffic) at the macro and micro levels. An analysis is required to identify the factors that primarily affect NoC reliability.
- As per the literature survey, only a few works target aging mitigation at the router micro-architecture and device levels of NoC; the majority of work concerns aging-aware routing. Hence, these limited analyses and solutions need to be extended.

In this research work, our major emphasis is on power- and thermal-induced reliability analysis at the system level. To assess the effectiveness of our proposals, we have analyzed the different lifetime failure mechanisms by varying the NoC parameters and estimated the reliability of NoC. To estimate aging-induced delay degradation, we have proposed an "Aging-Aware Timing" framework at the device as well as the system level.

### 1.2 Objectives

The objective of this thesis work is to analyze and design reliability-aware NoC architectures. This work presents a rigorous analysis of the power- and thermal-induced reliability issues facing modern NoC architectures. These reliability problems are handled at different abstraction levels, and reliability enhancement solutions are proposed.

In this thesis, we consider aging effects at different levels of abstraction, termed the System, RTL and Device levels. The objectives at these levels are as follows:

1. *System-Level:*
   - Identification of the major factors affecting the reliability of NoC and its components.
   - Design of a single framework for estimating NoC design metrics (performance, power, thermal and reliability).
   - A new micro-architecture of NoC and its components to enhance the reliability of NoC.
2. *RTL-Level:*
   - As per the literature survey, the system level represents a more abstract view; more accurate models need to consider reliability at the RTL level. Switching activity is the primary factor affecting the reliability of NoC, and this metric is measurable at the RTL level.
   - Activity factor and switching are the primary causes of the HCI lifetime failure mechanism. Hence, our objective is to modify the micro-architecture so that switching activity is reduced and the reliability of the NoC and its components is improved.
3. *Device-Level:*
   - NBTI (Negative Bias Temperature Instability) is responsible for reducing the reliability of the transistor.
   - NBTI-induced aging analysis is required at the device level, and a mechanism needs to be devised to manifest NBTI effects at the RTL/system level. To achieve this, the cell library needs to be characterized for NBTI effects.
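
As a rough illustration of how a device-level NBTI shift propagates to gate delay, the sketch below combines two widely used first-order models: a power-law threshold shift $\Delta V_{th} = A t^n$ (with $n = 1/6$ from the H2-diffusion reaction-diffusion model) and the alpha-power delay law, delay $\propto V_{dd} / (V_{dd} - V_{th})^\alpha$. These are not the cell-library characterization used in this thesis; the prefactor $A$, $V_{th0} = 0.4$ V and $\alpha = 1.3$ are hypothetical values chosen for illustration.

```python
def nbti_delta_vth(t_seconds: float, a: float, n: float = 1.0 / 6.0) -> float:
    """Power-law NBTI threshold shift dVth = A * t**n, in volts.

    `a` lumps together voltage, temperature and duty-cycle dependence; it is a
    hypothetical fitting constant here, not a value from any cell library.
    """
    return a * t_seconds ** n

def alpha_power_delay_ratio(vdd: float, vth0: float, dvth: float, alpha: float = 1.3) -> float:
    """Aged-to-fresh gate delay ratio under the alpha-power law.

    Gate delay ~ Vdd / (Vdd - Vth)**alpha, so at a fixed Vdd the ratio is
    ((Vdd - Vth0) / (Vdd - Vth0 - dVth))**alpha.
    """
    if vdd - vth0 - dvth <= 0:
        raise ValueError("aged threshold voltage must stay below Vdd")
    return ((vdd - vth0) / (vdd - vth0 - dvth)) ** alpha

SECONDS_PER_YEAR = 365 * 24 * 3600
A_HYPOTHETICAL = 2.5e-3  # V / s**n, illustrative only
dvth = nbti_delta_vth(5 * SECONDS_PER_YEAR, A_HYPOTHETICAL)
# Same dVth reused at both supplies for simplicity; in reality a higher Vdd
# also accelerates NBTI stress and would raise dVth.
for vdd in (1.0, 1.2):
    ratio = alpha_power_delay_ratio(vdd, vth0=0.4, dvth=dvth)
    print(f"Vdd={vdd} V: dVth={dvth * 1e3:.1f} mV, delay +{(ratio - 1) * 100:.1f} %")
```

The relative delay penalty shrinks as the overdrive $V_{dd} - V_{th0}$ grows, which is one intuition for supplying aging-critical cells from a higher $V_{dd}$, traded against faster NBTI degradation.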

Related work and the research gaps that need to be addressed are reviewed in subsequent chapters.

### 1.3 Contributions

Our primary focus is the analysis and design of reliability-aware NoC architectures. This work demonstrates the rigorous analysis of power and thermal induced reliability issues facing modern NoC architectures.

The significant contributions of this thesis are as follows:

- 1. A critical review of the reliability approaches already published in literature for providing the system-level to device-level NoC reliability.
- 2. On the basis of our state-of-the-art survey, we found that a framework is needed for system-level reliability estimation of 2D NoCs. We develop a framework called "HiPER-NIRGAM" for performance, power, thermal and reliability estimation of 2D mesh NoCs. This framework employs NIRGAM, a NoC simulator [42]; the ORION tool for power estimation of the NoC and its constituent elements; the Hotspot tool for thermal profiling of a chip based on its floorplan and power traces; and the REST tool for reliability estimation.
- 3. Reliability estimation using the "HiPER-NIRGAM" framework indicates that the Mean Time To Failure (MTTF) depends on the power dissipation, thermal profile and floorplan of NoCs. Our proposed solutions are (1) multi-threshold cell-based routers and (2) a thermal-aware floorplan for enhancing the reliability of 2D mesh NoCs. For better system-level reliability estimation, we use the Weibull distribution instead of the lognormal distribution.
- 4. Our rigorous experimental analysis shows that power dissipation is the key factor in improving system reliability. We develop a novel "Routing Logic Port"-enabled clock gating scheme for the input channel buffers of the router micro-architecture for low-power NoC design.
- 5. The proposed technique at the Register-Transfer Level (RTL) can significantly reduce router FIFO buffer power. HCI degradation depends on switching activity, and our proposal seeks to reduce this activity at the RTL level.
- 6. The rapid shrinking of CMOS design margins in deep submicron technology has made aging mechanisms such as Negative Bias Temperature Instability (NBTI) a prime concern in NoC design. We propose a novel "Pre-Silicon NBTI Aging-Aware Design of Reliable NoC Router". To accomplish this goal, we have characterized and developed an aging-aware cell library. In this thesis, we present our novel "RTL-level aging-aware timing analysis framework" and analyze performance on the PARSEC Benchmark Suite (realistic workload stress). Our proposed NBTI stress mitigation algorithm and technique, which use a minimum number of multi-$V_{dd}$ cells, reduce aging-induced NBTI delay effects, thereby enhancing router performance under stress.
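
For a Weibull lifetime with scale $\eta$ and shape $\beta$, the MTTF is $\eta\,\Gamma(1 + 1/\beta)$. The sketch below also estimates the MTTF of a system that fails as soon as any component fails (a series model, consistent with the observation that a single NoC fault can bring down the whole system) using Monte Carlo sampling. It is not the REST tool or the exact methodology of this thesis; the component count, $\eta$ and $\beta$ are hypothetical.

```python
import math
import random
from typing import Sequence

def weibull_mttf(scale: float, shape: float) -> float:
    """Analytic MTTF of a Weibull lifetime: eta * Gamma(1 + 1/beta)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def series_mttf_monte_carlo(scales: Sequence[float], shape: float,
                            trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo MTTF of a series system of Weibull components.

    A series system fails as soon as its first component fails, so each
    trial samples one lifetime per component and keeps the minimum.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # random.weibullvariate(alpha, beta): alpha = scale, beta = shape
        total += min(rng.weibullvariate(eta, shape) for eta in scales)
    return total / trials

# Hypothetical example: 16 identical router components with
# eta = 10 years and beta = 2 (wear-out, increasing hazard rate).
n, eta, beta = 16, 10.0, 2.0
mc = series_mttf_monte_carlo([eta] * n, beta)
# For identical Weibull components the series system is itself Weibull with
# scale eta * n**(-1/beta), which gives an exact value to check against.
exact = weibull_mttf(eta * n ** (-1.0 / beta), beta)
print(f"single component MTTF: {weibull_mttf(eta, beta):.2f} years")
print(f"series MTTF: Monte Carlo {mc:.2f}, exact {exact:.2f} years")
```

Monte Carlo sampling becomes useful when components have different $\eta$ values (e.g., hotter routers aging faster), where no simple closed form exists.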

### 1.4 Thesis Organization

The remainder of the thesis is organized as follows. In Chapter 2, we provide a brief overview of interconnection networks and the evolution of NoC. In Chapter 3, we discuss our framework "HiPER-NIRGAM", which can be used for power- and thermal-induced reliability analysis. In Chapter 4, we discuss system-level NoC reliability analysis and enhancements, such as a multi-threshold cell-based router micro-architecture and a thermal-aware floorplan, to enhance the reliability of NoC. Chapter 5 considers reliability estimation at the RTL level and presents clock gating-based reliability improvement in NoC. The device-level NBTI stress analysis and mitigation technique are discussed in Chapter 6. Chapter 7 concludes the thesis and provides directions for future work.

## Chapter 2

# Interconnection Networks

Various processing elements of a modern chip are connected through a communication architecture. Communication networks such as buses and NoCs facilitate communication among the IC components. This research primarily focuses on the Network-on-Chip and the router micro-architecture.

Advances in technology and packaging techniques increase the transistor count on a single chip. IC architects aim to use this increased transistor density and number of cores to improve the computation engine of the processor design.

They integrate multiple simple cores on a single chip to exploit parallelism. To further enhance performance, the cores must be connected through a high-speed interconnection network. The NoC has emerged as a scalable solution for such an interconnect.

In Section 2.1, we present the trends in computation and communication. In Section 2.2, we discuss the evolution of on-chip communication architectures. Section 2.3 presents the basics of NoC components and router micro-architecture. The reliability of transistors and its relevance to NoC are presented in Section 2.4. Section 2.5 discusses power- and temperature-induced aging effects. In Section 2.6, we present the state-of-the-art review, and the last section summarizes the chapter.

## 2.1 Computation and Communication: A Taxonomy

Since 2000, uni-core processors have faced tremendous challenges in improving performance and energy efficiency. Before technology entered the deep submicron era, the IC industry mainly focused on performance with low fabrication cost. Reducing area, i.e., decreasing the transistor count, was the primary approach to reducing manufacturing cost, while increasing the clock frequency was the main way to improve performance. Achieving improved performance and power efficiency while keeping pace with advancing fabrication technology is challenging. The following factors emerged as big obstacles and forced IC and computer architects to migrate from unicore to multicore designs:

- 1. Memory Wall: The memory wall describes the growing disparity between microprocessor clock rates and off-chip memory and disk drive I/O rates [43]. The main option for addressing it is to increase the cache size, which raises the effective memory bandwidth and prevents memory from becoming a performance bottleneck.
- 2. ILP Wall: ILP stands for instruction-level parallelism. It is hard to extract enough parallelism from a single instruction stream to deliver high performance on a single core. Previously, higher clock frequencies and wide-issue architectures were used to improve processor performance. To further enhance CPU performance, a variety of methods have been used, such as splitting instructions into micro-instructions, increasing pipeline depth and aggressive branch prediction [3, 44, 45]. However, the deeper the pipeline, the higher the cycles per instruction (CPI), because hazards and branch mispredictions cost more cycles, and the resulting delay diminishes the performance benefit [46]. Increasing pipeline depth also requires more registers and control logic, which increases design complexity, cost and power consumption. Performance has also been improved using wide-issue processor architectures such as superscalar and Very Long Instruction Word (VLIW). Even in VLIW, the instruction width cannot be increased indefinitely, and technological constraints limit this width to 10 instructions [47].
- 3. **Power Wall:** For a given micro-architecture, CPU performance scales with clock frequency, so IC designers used to increase the frequency to gain performance. The downside is that dynamic power consumption is also directly proportional to operating frequency. Power is a crucial constraint in system and circuit design, and beyond a certain point the power wall prevents further performance gains from frequency scaling. The other key factor behind the power wall is transistor size. Shrinking transistors results in lower capacitance, higher operating frequency and more transistors per unit area. In the deep submicron era, leakage power contributes significantly to the power wall, and power density increases with each submicron technology generation.
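
The frequency dependence behind the power wall follows from the first-order dynamic power relation $P_{dyn} = \alpha C V_{dd}^2 f$, where $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{dd}$ the supply voltage and $f$ the clock frequency. The sketch ignores leakage, and all parameter values are hypothetical; the 20% supply increase stands in for the voltage headroom a faster clock often needs.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """First-order switching power P = alpha * C * Vdd**2 * f, in watts."""
    return alpha * c_farads * vdd ** 2 * f_hz

base = dynamic_power(alpha=0.1, c_farads=1e-9, vdd=1.0, f_hz=2e9)
# Doubling f alone doubles power; if Vdd must also rise by 20% to meet
# timing at the higher clock, power grows by 2 * 1.2**2 = 2.88x.
faster = dynamic_power(alpha=0.1, c_farads=1e-9, vdd=1.2, f_hz=4e9)
print(f"{base:.2f} W -> {faster:.2f} W ({faster / base:.2f}x)")
```

The same relation shows why lowering the activity factor $\alpha$, for example by clock gating idle logic, directly reduces dynamic power.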

The combination of these three walls (memory, ILP and power) is called the "brick wall" [48].

Moore's law still holds with respect to the scaling of transistor size with every process generation. Reduced dimensions of CMOS cells also increase on-chip transistor density. Intel's "Prescott" [49] CPU was estimated to generate about 40% more heat per clock cycle than earlier variants, which earned it the nickname "PresHot" [50]. It marked the end of the line for unicore processor design because of the power wall. To make use of the increasing transistor count while addressing the "brick wall" design issues, the IC industry moved to the new era of "multi-core" architectures.

The primary task was to reduce frequency while still increasing performance, two seemingly contradictory goals. Computer architects trade off transistor cost against additional resources, such as cores, to exploit both instruction-level parallelism (ILP) and thread-level parallelism (TLP) [3]. Multiple cores can also be operated at lower frequency and voltage to reduce power consumption when the workload is not computation-intensive.

To keep pace with Moore's law, IC designers and manufacturers add more and more components and resources to a chip. Modern chips have come to be known as systems-on-chip (SoCs), as they integrate the electronic circuits of various computer components onto a single chip. High-performance and parallel computing platforms have several computing cores, each of which is simpler than a single high-performance core. An efficient and effective communication architecture is required to achieve the benefits of a higher degree of parallelism. A high-speed communication architecture should offer low latency and high network bandwidth [51]. Hence, the paradigm has now shifted from computation-centric designs to communication-centric designs [22]. Figure 2.1 depicts the conceptual representation of an interconnection network.



FIGURE 2.1: Conceptual representation of an interconnection network. Here PE refers to processing element and Mem refers to a Memory element.

A typical SoC contains different components such as Processing Elements (PEs) like DSP and IP cores, Communication Elements (CEs), input-output components and sensors. These elements transfer and share information through CEs such as crossbars, buses, routers and NoCs. As discussed above, the performance of a multicore architecture depends on its interconnection network. The major factors to consider when choosing an effective interconnection network are as follows [52]:

- 1. Performance: In many-core architectures, different applications run on different PEs, so the overall system needs high-speed communication for transferring information among PEs and other components. The performance of the interconnection network plays a significant role in the overall performance gain [52, 53]. The following are the major parameters that define performance:
  - *Latency:* the time taken by a message to travel from the source to the destination. Latency can be represented as:

    $$T_{latency} = T_{received} - T_{generate} \tag{2.1}$$

    where $T_{generate}$ and $T_{received}$ represent the time of generation at the source and the time of reception at the destination, respectively. The interconnect material, topology, router architecture and routing algorithm play a significant role in the propagation, processing, transmission and queuing delays of interconnection networks, and these delays together determine the overall latency of the network.
  - *Throughput:* the amount of data transferred successfully per unit time. Bandwidth is the primary factor affecting the throughput of the interconnection network [52, 53].
- 2. Modularity and Scalability: The modularity is referred to the ease with which current hardware can be extended and/or adapted to new technology by adding new module or component without any modification in existing structure. Modularity increases the reusability and reduces complexity, and design time. The scalability is the property of a system to accommodate the growth of the system. Adding more resources improves the system performance. Interconnect network must be modular and scalable so that it is easy to add more components in the system [52, 53].
- 3. Physical design and layout: More components shall increase communication requirement and lead to increase in wire length and connections. The physical design involves floorplan, placement, and routing of wiring between the components. The actual layout must match to the physical design constraints like area, power and the number of pins required for communication channels [52, 53].
- 4. Cost and Time to Market: The NRE (Non-Recurring Engineering) cost, i.e., the monetary cost of designing the network should not imbalance the overall system cost. The overall design gain and actual cost must match with cost constraints. The large volume of interconnect design reduces the unit cost of the product. A quick prototyping of interconnection network, lessen the time to market, hence lowering the cost [52, 53].
- 5. **Reliability:** The system components are communicating through an interconnection network. Hence it acts as a backbone for the whole system. Any failure in this network renders system useless. The interconnection networks should be fault tolerant, i.e., it should provide the alternative path for delivering the requested information correctly [52, 53].
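To make Equation 2.1 and the throughput definition concrete, the following Python sketch computes the average latency and the accepted throughput from a log of delivered packets. The `PacketRecord` type, the cycle-based timestamps and the normalization of throughput to flits per node per cycle are illustrative assumptions, not the interface of any particular simulator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacketRecord:
    """Hypothetical log entry for one delivered packet (times in clock cycles)."""
    t_generate: int  # cycle at which the source generated the packet
    t_received: int  # cycle at which the destination received it
    size_flits: int  # packet length in flits


def latency(p: PacketRecord) -> int:
    """Equation 2.1: T_latency = T_received - T_generate."""
    return p.t_received - p.t_generate


def average_latency(records: List[PacketRecord]) -> float:
    """Mean latency over all delivered packets."""
    return sum(latency(p) for p in records) / len(records)


def throughput(records: List[PacketRecord], window_cycles: int, num_nodes: int) -> float:
    """Delivered flits per node per cycle over a measurement window."""
    delivered_flits = sum(p.size_flits for p in records)
    return delivered_flits / (window_cycles * num_nodes)


log = [PacketRecord(0, 12, 4), PacketRecord(5, 25, 4), PacketRecord(10, 18, 4)]
print(average_latency(log))      # (12 + 20 + 8) / 3 cycles
print(throughput(log, 100, 16))  # 12 flits / (100 cycles * 16 nodes)
```

Normalizing by both the window length and the node count makes throughput comparable across network sizes. Simulators may instead report bits per second or packets per cycle.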

## 2.2 Why Network-on-Chip: Evolution of On-chip Communication Architectures

The advent of the many-core era created the need for an efficient communication architecture to connect processing units, interface components, storage units and communication elements on a single chip [15], [53].

#### 2.2.1 Basic Component of SoCs

The following are some important components of an SoC [54].

1. Processing Element (PE): These elements can belong to any category: Application Specific Integrated Circuits (ASICs), standard processors, custom-made processors and special-function hardware components/IPs.
2. Interface Components (IF): These comprise transducers, bridges, arbiters and interrupt controllers. IFs are needed when the input-output protocols of system components do not match, e.g., when 16-bit data must be converted to 8-bit data [54].
3. Communication Elements (CE): These can be designed as a bus, crossbar, ring or router [54].
4. Memory: These storage elements can be local (private) to a PE or shared across multiple PEs.

The CEs must provide high performance, low power and low latency, and must handle different types of traffic and load. The communication infrastructure must be scalable and bandwidth-aware to accommodate the traffic.

#### 2.2.2 Growth and Development of Interconnect Network

The following paragraphs explain in brief the evolution of the interconnection network.

1. Point-to-Point Interconnect Architecture:

In the point-to-point (PTP) interconnect architecture, components are connected via dedicated links. Its advantage is that the full bandwidth of each link is always available to the pair of components it connects. The downside, however, is that wire length and metal density increase with the number of components, which raises the physical complexity of implementation (floorplanning, component placement and wire routing). Hence, PTP does not provide a low-cost, scalable solution [15] [53] [54]. Figure 2.2 shows the conceptual view of the point-to-point interconnection architecture.
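A fully connected PTP architecture needs a dedicated link for every pair of components, i.e., $N(N-1)/2$ bidirectional links for $N$ components, so the wiring grows quadratically. The short Python sketch below (function name hypothetical) tabulates this growth.

```python
def ptp_links(n: int) -> int:
    """Dedicated bidirectional links needed to connect every pair of n components."""
    return n * (n - 1) // 2


for n in (4, 16, 64):
    print(n, ptp_links(n))
# 4 6
# 16 120
# 64 2016
```

Going from 16 to 64 components, a 4× increase, multiplies the link count by about 17, which illustrates why PTP wiring quickly becomes impractical.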



FIGURE 2.2: A conceptual representation of point-to-point interconnection architecture

2. Bus Architecture:

To overcome the limitations of dedicated physical links in the PTP architecture, a shared communication medium called a bus was employed to connect the components. To resolve conflicts in bus access, an arbiter must be implemented in hardware.

The bus architecture is easy to implement, has lower complexity and a shorter design time, and is hence a more cost-effective solution than the PTP architecture. However, it suffers from arbitration delay, which increases with the number of cores. Increased sharing of the bus limits the bandwidth available to each component. The propagation delay along the bus increases with its length, which reduces its operating frequency. At recent technology nodes, the long shared bus wires also aggravate crosstalk, noise and variability problems. Figure 2.3 shows the conceptual view of the bus-based interconnection architecture [52], [53].

FIGURE 2.3: A bus-based interconnection architecture

3. Hierarchical Bus Architecture:

In a hierarchical (segmented) bus structure, multiple buses with different protocols are connected to each other using a special IF unit called a "bridge". The advantage is that each bus can operate at a higher clock rate and data transmission can be parallelized, which reduces the communication bottleneck. The disadvantages include communication delay through the complex IF unit, and arbitration complexity with its associated delay. Figure 2.4 shows the conceptual view of the hierarchical bus-based interconnection architecture [15], [51], [55].



FIGURE 2.4: A Hierarchical bus-based interconnection architecture

4. Ring Based Architecture:

In this architecture, components are connected in a ring. A message passes through every node between the source and the destination, which increases communication delay, and a single fault in any element can affect the communication of the whole system [51], [52]. Figure 2.5 shows the conceptual view of the ring-based interconnection architecture.



FIGURE 2.5: A ring-based interconnection architecture

5. Bus Matrix and Crossbar-Bus Architecture:

In this architecture, multiple buses are connected through a crossbar matrix. The major disadvantages of the crossbar-bus architecture are its area and power overhead and its limited scalability [54], [55]. Figure 2.6 shows the conceptual view of the crossbar-bus-based interconnection architecture.



FIGURE 2.6: A crossbar bus-based interconnection architecture

None of the above architectures can handle the communication challenges of multi-core architectures. A new interconnect architecture is needed for the following reasons:

- The growing number of cores per chip requires a scalable communication network that does not degrade performance.
- The gap between gate delay and wire delay is large, which makes it hard to synchronize computation and communication elements.
- Wire delay grows rapidly relative to gate delay as technology scales (for an unrepeated RC wire, it grows quadratically with length). Hence, a new architecture is required that reduces wire length without creating bottlenecks in the network.
- The communication network must be fault tolerant.
- The communication network must provide the flexibility to adapt to new changes.

The network-on-chip (NoC) has been proposed as a flexible, scalable, modular and fault-tolerant solution. A NoC can connect any number of homogeneous or heterogeneous components/cores. Its key concept is "route packets, not wires". Bolotin *et al.* [1] analyzed the asymptotic cost functions of various interconnects. Table 2.1 compares the cost functions of traditional interconnects and NoC. As can be seen, NoC is the best choice as the number of connections grows.

| Interconnect | Total Area       | Power Consumption | Maximum Frequency  |
|--------------|------------------|-------------------|--------------------|
| PTP          | $O(N^2\sqrt{N})$ | $O(N\sqrt{N})$    | $O(\frac{1}{N})$   |
| NS-Bus       | $O(N^3\sqrt{N})$ | $O(N\sqrt{N})$    | $O(\frac{1}{N^2})$ |
| S-Bus        | $O(N^2\sqrt{N})$ | $O(N\sqrt{N})$    | $O(\frac{1}{N})$   |
| NoC          | O(N)             | O(N)              | O(1)               |

TABLE 2.1: Comparison of asymptotic cost functions [1] (N: number of connected modules; NS-Bus: non-segmented bus; S-Bus: segmented bus)
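Because Table 2.1 gives only asymptotic orders, one convenient way to read it is to ask how each cost grows when the number of modules doubles. The Python sketch below evaluates the area column this way. Constant factors are dropped, so only the growth ratios are meaningful.

```python
import math

# Asymptotic total-area functions from Table 2.1 (constant factors dropped).
AREA = {
    "PTP": lambda n: n**2 * math.sqrt(n),
    "NS-Bus": lambda n: n**3 * math.sqrt(n),
    "S-Bus": lambda n: n**2 * math.sqrt(n),
    "NoC": lambda n: n,
}


def growth_when_doubled(f, n: int) -> float:
    """Factor by which f grows when the number of modules doubles from n to 2n."""
    return f(2 * n) / f(n)


for name, f in AREA.items():
    print(f"{name:7s} x{growth_when_doubled(f, 16):.2f}")
```

Doubling $N$ multiplies the area by about 5.66 for PTP and S-Bus, by about 11.31 for NS-Bus, and by exactly 2 for NoC. This is the linear growth that makes NoC attractive at large scale.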

### 2.3 Network-on-Chip Architecture

Parallel processing using multiple cores is the basic need of high-performance computing. Dependencies between the applications and processes running on different cores increase the communication among them. In the NoC communication fabric, cores are connected through a network of routers, and any communication between two cores is routed through these routers. Each core is connected to one of the routers through an NI (network interface). A NoC is distributed in nature, fault tolerant, and has the features needed by a many-core communication architecture. Since the NoC is the backbone of CMPs and MPSoCs, considerable design effort has been put into developing and improving it. Examples of NoCs include Kilo-NoC [56], Nostrum, SPIN, Xpipes, Arteris™ and STNoC™. Table 2.2 lists some NoC-based many-core products.

| Chip/Product Name                       | Vendor   | No.of Cores |
|-----------------------------------------|----------|-------------|
| Single Chip Cloud Computer (SCC) [5, 6] | Intel    | 48          |
| TeraFlops Research Chip/Polaris [7]     | Intel    | 80          |
| TILE-Gx100 [8]                          | Tilera   | 100         |
| Epiphany-IV E64G401 [57]                | Adapteva | 64          |

TABLE 2.2: Many-core chips employing NoC interconnect

The Epiphany-IV E64G401 [57] uses a 2D eMesh NoC architecture, and the Epiphany architecture [9] is scalable up to 4,095 processors with its NoC communication architecture.



FIGURE 2.7: Basic NoC Architecture

Figure 2.7 shows a $4 \times 4$ 2D mesh interconnecting 16 nodes. Here, R represents a NoC router and the blue lines depict the physical links among routers. The Network Interface (NI) is the interface between the IP core and the router.

#### 2.3.1 NoC Components

A NoC has a tile-based modular architecture that provides scalability and reduces design complexity. Each tile contains at least three components: an IP core, a router and a network adapter (network interface). In a CMP, all cores are homogeneous and all tiles are similar in size and functionality, whereas an MPSoC has heterogeneous components, with tiles of different functionality and size. Figure 2.8 shows the tile architecture and its components. The major parts of a NoC are described below.


FIGURE 2.8: Network-on-Chip Tile Architecture with Mesh Topology

#### 2.3.1.1 Router

The main task of a router is to forward packets or flits over the links along the route between the source and destination cores. The router implements the communication protocols for reliable communication among the network nodes. The main components of a router are the input Virtual Channel (VC) buffers, the Route Computation (RC) unit, the Virtual Channel Allocator (VA), the Switch Allocator (SA) and the Crossbar (X-bar). The router micro-architecture and components are shown in Figure 2.9. In a 2D mesh, a router has five ports: one for each of the four directions (North, South, East, West) and a fifth for the local PE [17], [18], [58].



FIGURE 2.9: A Router Microarchitecture

The router receives packets/flits at an input port and stores them in a buffer until the desired output port has been computed from the destination address. The router components are briefly described below.

• Buffers and Virtual Channels: Buffers temporarily hold incoming packets or flits when output processing is slower than the incoming flit or packet rate. Buffers can be implemented at both the input and output ports of a router. Because they add area and power overhead, however, they are less often used at the output. These buffers are implemented as First-In-First-Out (FIFO) queues.

Buffers have a positive impact on network throughput, but they also add area and power overhead. Hence, the total buffer capacity must be calculated and analyzed carefully [59, 60]. Different buffer organizations have been proposed to improve router efficiency. The virtual channel (VC) based buffer organization reduces Head-of-Line (HOL) blocking [17, 59, 60], which reduces the time packets or flits wait in a queue (queuing time) and increases throughput. Throughput and latency depend on the buffer size, buffer depth and number of virtual channels [59, 60]. The total buffer space is given by Equations 2.2 and 2.3.

$$\text{Total buffer space} = \text{No. of VCs} \times \text{No. of blocks per VC} \times \text{Block size} \tag{2.2}$$

$$\text{Total buffer space} = \text{No. of VCs} \times \text{Buffer depth} \times \text{Flit size} \tag{2.3}$$

In the homogeneous configuration, VCs share the total buffer space equally. If VCs differ in size, the organization is called heterogeneous [15, 17, 18]. Figures 2.10 and 2.11 show the simple buffer and the VC-based buffer organization, respectively. Bufferless routers [61, 62] with deflection routing have been proposed for low-power designs, and in [63] the authors proposed a minimal-buffer architecture. Bufferless and minimal-buffer architectures trade performance for low power.
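Equation 2.3 and its heterogeneous variant can be evaluated directly. The following Python sketch uses an illustrative configuration (4 VCs per input port, depth 4 flits, 128-bit flits, a 5-port mesh router). These values are assumptions chosen for the example, not parameters taken from this work.

```python
from typing import Sequence


def total_buffer_bits(num_vcs: int, buffer_depth: int, flit_bits: int) -> int:
    """Equation 2.3 (homogeneous organization): every VC has the same depth in flits."""
    return num_vcs * buffer_depth * flit_bits


def total_buffer_bits_hetero(vc_depths: Sequence[int], flit_bits: int) -> int:
    """Heterogeneous organization: each VC may have a different depth in flits."""
    return sum(vc_depths) * flit_bits


per_port = total_buffer_bits(num_vcs=4, buffer_depth=4, flit_bits=128)
print(per_port)                                      # 2048 bits per input port
print(5 * per_port)                                  # 10240 bits for a 5-port router
print(total_buffer_bits_hetero([2, 2, 6, 6], 128))   # 2048: same budget, split unevenly
```

The last line shows that a heterogeneous organization can keep the same total buffer budget while giving some VCs more depth than others.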



FIGURE 2.10: A Buffer representation



FIGURE 2.11: A Buffer with Virtual Channel Organization

• Route Computation Unit:

The route computation unit implements the routing algorithm that decides the path of packets or flits. Routing algorithms aim to distribute traffic so as to reduce hotspots and minimize contention.

Routing mechanisms are of two types: deterministic and adaptive. Routing can be implemented as table-based, logic-based or algorithm-based. The routing must be deadlock free and livelock free. Turn models and routing restrictions are common ways to guarantee deadlock freedom.
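One concrete example of deterministic, logic-based routing is XY (dimension-order) routing for a 2D mesh, sketched below in Python. A flit first travels along X until its column matches the destination, and then along Y. Because it never turns from the Y dimension back into X, XY routing is deadlock free in a mesh, which makes it an instance of the routing restrictions mentioned above. The coordinate convention (x increases eastward, y increases northward) and the port names are assumptions for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# Coordinate change produced by leaving through each output port (assumed convention).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(cur: Coord, dst: Coord) -> str:
    """Output port chosen by XY routing at router `cur` for a flit headed to `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"  # arrived: eject to the local PE


def xy_path(src: Coord, dst: Coord) -> List[str]:
    """Sequence of output ports taken hop by hop from src to dst."""
    ports, cur = [], src
    while True:
        port = xy_route(cur, dst)
        ports.append(port)
        if port == "L":
            return ports
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)


print(xy_path((0, 0), (2, 1)))  # ['E', 'E', 'N', 'L']
```

Because the decision depends only on the current and destination coordinates, it reduces to a few comparators in hardware, which is why XY routing is popular as logic-based routing.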

• Crossbar:

The crossbar switch connects the input ports of a router to its output ports in a non-blocking manner. The crossbar is the second largest component of the router. It occupies significant area, consumes substantial power, and lies on the critical path of the router. Low-latency, high-throughput designs such as double-pumped or bit-interleaved crossbars are preferred for performance, while the hierarchical crossbar is used where low power is the priority.

• Allocator and Arbiter:

An allocator maps N requests to M resources. Because a router contains multiple VCs, the virtual-channel allocator (VA) resolves conflicts and assigns an output-port VC to a packet/flit waiting at the router's input port. VC allocation is performed only for the head flit, and the subsequent body and tail flits use the same VC. After VC allocation, the switch allocator (SA) schedules the crossbar for the flits queued in VC buffers and resolves collisions between flits destined for the same output port. The arbiter uses a selection strategy such as first-come-first-serve, round robin or priority-based arbitration, configures the crossbar switch, and assigns output ports to the winning input ports.
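Round robin is a common fair policy for both the VA and SA stages. The Python class below is a minimal behavioral sketch of a round-robin arbiter for a single output resource (all names are hypothetical). A hardware arbiter would compute the same grant combinationally each cycle.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Round-robin arbiter: the most recent winner gets the lowest priority next time."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1  # so that requester 0 has the highest priority initially

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if nobody requests."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # rotate priority past the winner
                return idx
        return None


# Four input ports competing for one output port over four cycles
arb = RoundRobinArbiter(4)
print([arb.grant([True, False, True, True]) for _ in range(4)])  # [0, 2, 3, 0]
```

Because priority rotates past each winner, every persistent requester is served within at most n grants. This is the starvation-freedom property that makes round robin attractive compared with fixed priority.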

• Switch Traversal Unit: After the VA and SA stages are completed, flits are read from the buffers, traverse the crossbar, and are finally forwarded to the next router on the path.

#### 2.3.1.2 Network Interface (NI)

The network interface is the communication interface between the processing element and the router. The core is the source of injected traffic, and the NI splits each outgoing message into flits. Conversely, the NI combines flits arriving from the router port(s) into messages before delivering them to the processing element.

#### 2.3.1.3 Link

A link is a physical point-to-point communication channel, and in a NoC topology, routers are connected by links. The failure of a link may disrupt communication or even lead to deadlock. Noise-free and error-free links are therefore desired for reliability.

#### 2.3.2 Message Format

Figure 2.12 depicts the representation of data at different NoC layers. A message originates at the application layer and is then divided into packets, which may be fixed or variable in length. In NoCs, packets generally have the same size and format. Packets are further divided into small units called "flits" (flow control units). Each packet becomes a sequence of one head flit, multiple data (body) flits and one tail flit. The head flit carries the destination tile and route information, the data flits of the same packet follow the head flit, and the tail flit marks the end of the packet. Dividing packets into flits reduces the need for large buffers, which saves router power and area. In some designs, a flit is further divided into phits (physical units), which are sent directly over the physical link [53, 59, 60].

FIGURE 2.12: NoC Message Format at Different Layers
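The head/body/tail framing, together with the NI's split and reassembly roles (Section 2.3.1.2), can be sketched in Python as follows. The 4-byte flit payload, the empty head-flit payload and the `Flit` fields are assumptions for illustration. Real designs choose their own flit width and header fields.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flit:
    kind: str                        # "HEAD", "BODY" or "TAIL"
    packet_id: int
    dst: Optional[Tuple[int, int]]   # destination tile; carried only by the head flit
    payload: bytes


def packetize(message: bytes, dst: Tuple[int, int], packet_id: int,
              flit_bytes: int = 4) -> List[Flit]:
    """Split a message into one head flit, zero or more body flits and one tail flit."""
    chunks = [message[i:i + flit_bytes] for i in range(0, len(message), flit_bytes)] or [b""]
    flits = [Flit("HEAD", packet_id, dst, b"")]  # head: routing information only
    flits += [Flit("BODY", packet_id, None, c) for c in chunks[:-1]]
    flits.append(Flit("TAIL", packet_id, None, chunks[-1]))  # tail: last chunk, marks the end
    return flits


def reassemble(flits: List[Flit]) -> bytes:
    """What the destination NI does: check the framing and concatenate the payloads."""
    if flits[0].kind != "HEAD" or flits[-1].kind != "TAIL":
        raise ValueError("packet must start with a head flit and end with a tail flit")
    return b"".join(f.payload for f in flits[1:])


flits = packetize(b"hello, NoC!", dst=(2, 3), packet_id=7)
print([f.kind for f in flits])  # ['HEAD', 'BODY', 'BODY', 'TAIL']
print(reassemble(flits))        # b'hello, NoC!'
```

Only the head flit carries the destination here, reflecting the fact that in wormhole-style routers the body and tail flits simply follow the path reserved by the head.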

#### 2.3.3 Topology and Routing:

Topology and routing represent the macro architectural view of an interconnection system at higher abstraction level. The on-chip network topology determines the following properties of the network:

- Physical Layout
- Connection between the nodes and channels
- Wire length per hop, total wire length and metal cost
- Maximum number of hops a message may have to traverse
- Network latency
- Energy consumption per hop
- Path diversity



FIGURE 2.13: Ring Topology



FIGURE 2.14: Mesh Topology



FIGURE 2.15: Torus Topology

Figures 2.13, 2.14 and 2.15 show the ring, mesh and torus topologies, respectively. NoC topology metrics can be broadly categorized as (1) traffic independent and (2) traffic dependent. These metrics are briefly described below.

#### 1. Traffic Independent Metrics:

• *Degree:* the maximum number of links at any node. A higher degree indicates more connections and possibly higher path diversity. However, wire placement and routing cost increase with the degree, and higher-degree nodes incur area and power overhead.

The ring topology in Figure 2.13 has degree two, and the torus topology in Figure 2.15 has degree four. A 2D mesh also has degree four, but the degree of an individual router varies from 2 to 4: interior routers have degree 4, boundary routers have degree 3, and corner routers have degree 2.

- *Bisection Bandwidth:* the maximum amount of data that can be transferred from one side of the network to the other when the network is partitioned into two equal parts by a cut. It determines the maximum wiring requirement. This metric is more important for off-chip networks than for on-chip networks [15, 59, 60].
- *Diameter:* the maximum shortest-path distance between any two nodes in a network. A fully connected network has diameter one because each node is directly connected to every other node. The diameter provides a rough estimate of network latency [15, 17, 59, 60]. The ring, mesh and torus topologies shown have diameters of 4, 4 and 2, respectively.

#### 2. Traffic dependent Metrics:

- *Hop count:* the number of links traversed by a message en route from the source to the destination. The average hop count of a topology is the average over all source-destination pairs [15, 17, 59, 60], and this metric is indicative of network latency.
- *Path diversity:* the maximum number of parallel paths the topology supports for a source-destination pair. Higher path diversity can be used to balance traffic and power dissipation and to prevent hotspots. Path diversity also improves fault tolerance by giving the routing algorithm alternative paths for flit transfer. The ring topology exhibits a path diversity of only 2 [15, 17, 59, 60].
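Degree, diameter and average hop count can be computed for any topology by running a breadth-first search over its router graph. The Python sketch below does this for nine-node networks (a nine-node ring and a 3 × 3 mesh and torus), sizes at which the diameters stated above (4, 4 and 2) hold. The node counts of Figures 2.13 to 2.15 are an assumption here. The average hop count is taken over all ordered pairs of distinct nodes, which corresponds to uniform random traffic.

```python
from collections import deque
from itertools import product

DIRS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def ring(n: int) -> dict:
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def mesh(k: int) -> dict:
    return {(x, y): {(x + dx, y + dy) for dx, dy in DIRS
                     if 0 <= x + dx < k and 0 <= y + dy < k}
            for x, y in product(range(k), repeat=2)}


def torus(k: int) -> dict:
    # Wrap-around links join the two edges of every row and column (needs k >= 3).
    return {(x, y): {((x + dx) % k, (y + dy) % k) for dx, dy in DIRS}
            for x, y in product(range(k), repeat=2)}


def hop_distances(g: dict, src) -> dict:
    """Shortest-path hop counts from src, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def metrics(g: dict) -> dict:
    hops = [d for s in g for t, d in hop_distances(g, s).items() if t != s]
    return {
        "degree": max(len(nbrs) for nbrs in g.values()),
        "min_degree": min(len(nbrs) for nbrs in g.values()),
        "diameter": max(hops),
        "avg_hops": sum(hops) / len(hops),
    }


for name, g in (("ring", ring(9)), ("mesh", mesh(3)), ("torus", torus(3))):
    print(name, metrics(g))
```

The mesh entry also reports the minimum router degree of 2 (corner routers). The torus's wrap-around links lower both its diameter and its average hop count compared with the mesh of the same size.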

#### 2.3.4 Flow Control and Switching Technique:

Flow control and switching are tightly coupled. Switching is responsible for allocating network resources such as buffers and channels. Flow control regulates the rate at which packets access the buffers, and it provides the rules and mechanisms that reduce information loss at high traffic loads. It synchronizes the packet flow between source and destination by permitting or stopping packet transfers based on network congestion [59, 60]. A back-pressure mechanism is commonly employed for flow control. When the buffers at a router reach a predefined threshold, a congestion signal is sent to the neighbor router in that direction, and once this signal reaches a source node, the source reduces its injection rate. Switching methods are categorized based on the structure of the information they transfer. The most widely used flow control mechanisms are ACK/NACK, Stop-and-Go and credit-based flow control [15, 17, 59, 60].
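Of the mechanisms listed, credit-based flow control is easy to sketch. The upstream router keeps a count of the free buffer slots at the downstream input port and never sends a flit it has no room for, so no flit is dropped. The class below is a minimal per-link Python model under that assumption (one credit per flit slot, single VC). All names are hypothetical.

```python
class CreditChannel:
    """Upstream end of one link under credit-based flow control (illustrative sketch).

    The sender starts with one credit per free flit slot in the downstream input
    buffer, spends a credit for every flit it sends, and regains a credit whenever
    the downstream router forwards a flit and frees a slot.
    """

    def __init__(self, downstream_slots: int):
        self.max_credits = downstream_slots
        self.credits = downstream_slots

    def can_send(self) -> bool:
        return self.credits > 0

    def send_flit(self) -> None:
        if not self.can_send():
            raise RuntimeError("no credits: flit must wait upstream (back-pressure)")
        self.credits -= 1

    def credit_returned(self) -> None:
        if self.credits == self.max_credits:
            raise RuntimeError("credit returned for a slot that was never used")
        self.credits += 1


link = CreditChannel(downstream_slots=2)
link.send_flit()
link.send_flit()
print(link.can_send())  # False: downstream buffer is full, so the sender stalls
link.credit_returned()  # downstream router forwarded one flit
print(link.can_send())  # True
```

The stall in the usage example is the back-pressure described above. When credits run out, the flit waits upstream, and the congestion propagates hop by hop toward the source.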

1. Circuit Switching: A dedicated path (circuit) is established between the source and the destination before the actual data is sent. The circuit is set up by sending a small message containing the destination address and control information. On receiving it, the destination sends an acknowledgment back to the source, and only then does the source initiate the message transfer. Circuit switching provides the maximum bandwidth, and since all required resources are preallocated, no buffering is needed. It is suitable when the transmission time is much longer than the circuit establishment time.
2. Packet Switching: In packet switching, a long message is divided into smaller packets. No path establishment or resource reservation is required beforehand, so the link set-up time is zero. Each packet is independent of the others and may follow a different route to the destination. This allows packets to be transferred in parallel and reduces message latency, but the packets must then be resequenced to obtain the original message. The possibility of contention at intermediate nodes and of network congestion necessitates buffering at the nodes. In some cases, the packet transmission delay may be higher than that of circuit switching [15, 60]. Packet switching improves link utilization and can be implemented as follows:

• Store and Forward Switching (SAF): A node forwards a packet to the next node only after the entire packet has been received, so each node must have a buffer capacity equal to the total packet length. The buffering increases waiting time, which makes SAF unsuitable for time-critical applications.



FIGURE 2.16: Store-and Forward switching

Figure 2.16 illustrates the traversal of a packet using SAF switching. The packet consists of four flits: A, B, C and D. Transmitting the packet over each hop takes four cycles, so the latency is four cycles per hop.

• Virtual Cut Through switching (VCT): As soon as the packet header is received, its route is computed by the routing algorithm, and the packet is forwarded to the next hop without waiting for the rest of the packet to arrive. Compared to SAF, latency is reduced, although each node must still be able to buffer an entire packet when the header is blocked [17, 59, 60].



FIGURE 2.17: Virtual Cut Through Switching

Figure 2.17 shows the progression of a 4-flit packet in VCT switching.
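The latency difference between SAF and cut-through switching can be quantified with a simple zero-load model. The model assumes that each flit takes one cycle to cross a link, that routers add no pipeline delay, and that there is no contention. Under these assumptions, a 4-flit packet needs four cycles per hop under SAF, as in Figure 2.16. The Python sketch below (function names hypothetical) compares the two cases.

```python
def saf_latency(hops: int, flits: int) -> int:
    """Store-and-forward: every router waits for the whole packet before forwarding it."""
    return hops * flits


def cut_through_latency(hops: int, flits: int) -> int:
    """VCT (and contention-free wormhole): the head flit pipelines across the hops,
    and the remaining flits follow one per cycle."""
    return hops + flits - 1


for hops in (1, 3, 6):
    print(hops, saf_latency(hops, 4), cut_through_latency(hops, 4))
# 1 4 4
# 3 12 6
# 6 24 9
```

Under SAF, the packet length multiplies the hop count, while under cut-through it is only added once. The gap therefore widens with distance, and the two coincide for a single hop.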

• Wormhole Switching (WH): As soon as the head flit is received at a node, it is routed to the next node. This is similar to VCT. The only difference is that when no buffer space is available at the next node, the flits are frozen in situ, so the packet may be spread across multiple routers. In VCT, by contrast, flits keep advancing up to the router where the head flit is blocked. Consequently, buffer space in VCT must be a multiple of the packet size, whereas in WH it need only be a multiple of the flit size. However, WH can cause HoL (Head-of-Line) blocking, in which a blocked flit of one packet blocks the flits of other packets queued behind it. This is avoided by adding VCs (virtual channels), i.e., multiple buffer queues at each input port [15, 17, 59, 60].



FIGURE 2.18: Wormhole Switching

Figure 2.18 illustrates flit traversal under wormhole switching. Here, each node has buffer space for only two flits. Once the first two flits, A and B, have been forwarded and stored at the next node, the remaining body flit (C) and tail flit (D) stall at node 0. The dark grey color indicates the idle/stalled state of the packet. Once the head flit moves on to node 2, buffer space is freed, and the subsequent body and tail flits move forward.

• *Virtual Channel:* Figure 2.19 depicts head-of-line (HOL) blocking and how a VC-based buffer organization handles it.

FIGURE 2.19: HOL Blocking and VC Buffer Organization

### 2.4 Reliability of NoC

In accordance with Moore's law [13], each technology generation is expected to scale transistor dimensions down by about 30%, improve circuit speed by about 1.4x, and double the transistor density. Technology scaling is driven by the speed, power consumption, design complexity and product cost requirements of advanced applications.
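The 30%, 1.4x and 2x figures follow from one another under classical constant-field scaling, in which gate delay shrinks in proportion to the linear dimension. The arithmetic is sketched below. The symbols $s$ (linear scaling factor), $A$ (area per transistor), $\rho$ (transistor density), $\tau$ (gate delay) and $f$ (operating frequency) are introduced here for illustration.

$$
\begin{aligned}
s &= 0.7 \quad \text{(linear scaling factor per generation)} \\
A_{new} &= s^2 A_{old} \approx 0.49\,A_{old} \;\Rightarrow\; \frac{\rho_{new}}{\rho_{old}} = \frac{1}{s^2} \approx 2 \\
\tau_{new} &\approx s\,\tau_{old} \;\Rightarrow\; \frac{f_{new}}{f_{old}} \approx \frac{1}{s} \approx 1.4
\end{aligned}
$$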

Operating frequencies and voltages are being reduced, which makes circuits more susceptible to permanent and transient failures caused by environmental and intrinsic factors. These factors degrade performance and may corrupt the functionality of a circuit. It is therefore necessary to analyze their impact on circuit reliability and to provide methods to mitigate reliability failures.

#### 2.4.1 Reliability

Reliability describes the capability of a system to carry out its required functions while withstanding unexpected conditions, and to remain operational for a long time. At deep sub-micron technology nodes, CMOS reliability issues can be spatial and/or temporal. Spatial unreliability manifests as fabrication errors that can be detected after production, so faulty chips can be discarded. These errors can be random or systematic and may cause yield loss. Aging can also lead to spatial unreliability through systematic degradation of circuit parameters. Temporal effects are time-varying and depend on operating conditions such as the operating voltage, temperature, switching activity, and the presence and activity of neighboring circuits.



FIGURE 2.20: Reliability Categorization

The spatial and temporal effects on reliability are detailed below.

1. **Spatial Unreliability:** The main source of spatial unreliability is process variability, which increases as nanometer technology scales. Process variability causes parametric variation among identical transistors within a circuit or a small neighborhood, such as variations in channel length and doping density between devices. These variations also differ from die to die and from wafer to wafer. Systematic variability is caused by optical proximity correction and layout-induced strain, while random variability results from random dopant fluctuation (RDF), line edge and width roughness (LER and LWR), fixed charges in the gate dielectric, and interface roughness [64].

2. **Temporal Unreliability:** Temporal unreliability effects are observed in an operational circuit under certain conditions, such as a given temperature, workload, and time elapsed since the last reboot. Under these conditions, the circuit may experience temporary or permanent malfunctions. Transient effects are temporary and disappear once the factors causing them are no longer active. Temporal unreliability is categorized as follows:
   - Aging Effects: Aging degrades transistor performance and ultimately limits the lifetime of a circuit.

     Circuit aging results in a time-dependent variation of the physical parameters, i.e., the electrical properties, of an IC. HCI (Hot Carrier Injection), TDDB (Time Dependent Dielectric Breakdown), BTI (Bias Temperature Instability) and EM (Electromigration) are the prominent physical manifestations of aging [64, 65].
   - Transient Effects: Transient effects change the quality of a signal, i.e., its signal integrity (SI). A good-quality signal guarantees reliable, error-free and fast data transfer within and between modules. Signals can be distorted by noise and interference, which may cause the circuit to malfunction. Functioning is restored once the noise or interference causing the transient error is removed. Figure 2.20 shows the categories of effects that make a device unreliable [66].

#### 2.4.2 Life-Time Failure Mechanism

Figure 2.20 is a taxonomy of the effects that make systems and chips unreliable, and the effects considered in this thesis are highlighted in red. In this thesis, we explore aging-induced effects on the network-on-chip.

Network-on-Chip overcomes the limitations of traditional communication constructs such as buses. However, the rapid shrinking of transistor size and the consequent increase in transistor density have led to higher power density, resulting in thermal and reliability issues.

These issues may lead to lifetime failures such as Time Dependent Dielectric Breakdown (TDDB), Negative Bias Temperature Instability (NBTI), Electromigration (EM), Stress Migration (SM) and Thermal Cycling (TC) [64–66]. These are the leading causes of reliability degradation, so it is important to estimate reliability and find ways to enhance it.

The lifetime failure mechanisms responsible for system wearout that are modeled below are TDDB, NBTI, SM and TC.

(a) Time Dependent Dielectric Breakdown (TDDB): TDDB is caused by the formation of a conduction path through the gate oxide, between the gate and the substrate, due to tunneling current. Temperature rise caused by thermal issues such as regional temperature differentials and local hotspots accelerates TDDB. The MTTF (Mean Time To Failure) model of TDDB is given by Equation 2.4.

$$MTTF_{TDDB} \propto \left(\frac{1}{V}\right)^{a-bT} \times e^{\frac{X + \frac{Y}{T} + ZT}{kT}} \tag{2.4}$$

where V is the supply voltage, T is the absolute temperature, k is Boltzmann's constant, and a, b, X, Y and Z are model fitting parameters with the values $a = 78$, $b = -0.081$, $X = 0.759$ eV, $Y = -66.8$ eV·K and $Z = -8.37 \times 10^{-4}$ eV/K, as given in [67].
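Because Equation 2.4 is a proportionality, it is useful mainly for comparing operating points. The Python sketch below evaluates it with the fitting parameters quoted above. It assumes that V is in volts, that T is in kelvin, and that k is taken in eV/K to match the eV units of X, Y and Z. Only ratios between two calls are meaningful.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

# Fitting parameters of Equation 2.4 as quoted from [67]
A_FIT, B_FIT = 78.0, -0.081
X, Y, Z = 0.759, -66.8, -8.37e-4  # eV, eV*K, eV/K


def mttf_tddb(v: float, t: float) -> float:
    """Unnormalized MTTF_TDDB of Equation 2.4 for supply voltage v (V) and temperature t (K)."""
    return (1.0 / v) ** (A_FIT - B_FIT * t) * math.exp((X + Y / t + Z * t) / (K_B * t))


# Relative TDDB lifetime at 330 K versus 360 K, both at 1.0 V
print(mttf_tddb(1.0, 330.0) / mttf_tddb(1.0, 360.0))  # about 2.97
```

With these parameters, a 30 K rise in temperature shortens the TDDB lifetime by roughly a factor of three. Lowering the supply voltage lengthens it.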

(b) Negative Bias Temperature Instability (NBTI): NBTI affects PMOS transistors. It is the instability of PMOS transistor parameters such as threshold voltage, transconductance and saturation current under negative bias and high temperature, and it exhibits a logarithmic dependence on time. The NBTI lifetime failure model is given by Equation 2.5.

$$MTTF_{NBTI} \propto \left[ \left( \ln\left(\frac{A}{1+2e^{\frac{B}{kT}}}\right) - \ln\left(\frac{A}{1+2e^{\frac{B}{kT}}} - C\right) \right) \times \frac{T}{e^{\frac{D}{kT}}} \right]^{\frac{1}{\beta}} \tag{2.5}$$

where A, B, C, D and $\beta$ are model fitting parameters, T is the absolute temperature and k is Boltzmann's constant. In this work, the values $A = 1.6328$, $B = 0.07377$, $C = 0.01$, $D = -0.06852$ and $\beta = 0.3$ are used, as per [68].
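Equation 2.5 can be evaluated the same way. The Python sketch below uses the fitting parameters quoted above. It assumes that B and D are in eV (the text does not state their units), and, as with TDDB, only ratios between calls are meaningful. The model is defined only where the argument of the second logarithm stays positive, so the function checks this condition.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K
# Fitting parameters of Equation 2.5 as quoted from [68]; B and D assumed to be in eV
A, B, C, D, BETA = 1.6328, 0.07377, 0.01, -0.06852, 0.3


def mttf_nbti(t: float) -> float:
    """Unnormalized MTTF_NBTI of Equation 2.5 at absolute temperature t (K)."""
    x = A / (1.0 + 2.0 * math.exp(B / (K_B * t)))
    if x <= C:
        raise ValueError("model undefined at this temperature (x - C <= 0)")
    bracket = (math.log(x) - math.log(x - C)) * t / math.exp(D / (K_B * t))
    return bracket ** (1.0 / BETA)


# Relative NBTI lifetime at 330 K versus 360 K
print(mttf_nbti(330.0) / mttf_nbti(360.0))  # greater than 1: hotter means shorter life
```

Both factors inside the bracket shrink as T rises, and the exponent $1/\beta \approx 3.3$ amplifies this decrease in the resulting lifetime.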

(c) Stress Migration (SM): This physical phenomenon is the mass transport of metal atoms due to mechanical stress caused by the thermal mismatch between metal and dielectric materials. This stress is proportional to the temperature difference $(T_0 - T)$, where T is the operating temperature of the device and $T_0$ is the stress-free temperature (the temperature at which metal deposition starts). The MTTF model of stress migration is given by Equation 2.6.

$$MTTF_{SM} \propto (T_0 - T)^{-s} \times e^{\frac{E_a}{kT}} \tag{2.6}$$

where s and $E_a$ (activation energy) are material-dependent constants, k is Boltzmann's constant, and $T_0$ is the stress-free absolute temperature, which we assume to be 500 K as per [69].

(d) Thermal Cycling (TC): Normal fluctuations in power cause temperature cycles, and each cycle causes incremental damage that ultimately leads to permanent system failure. Thermal cycling damage is most pronounced at device interfaces (i.e., at the joints). It is modeled using the Coffin-Manson equation [69], and the MTTF of thermal cycling is given by Equation 2.7.

$$MTTF_{TC} \propto \left(\frac{1}{T - T_{ambient}}\right)^q \tag{2.7}$$

where q is the material-dependent Coffin-Manson exponent, T is the operating temperature of the system, and $T_{ambient}$ is the ambient temperature.
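Equations 2.6 and 2.7 can be sketched in Python in the same relative form. The text does not list values for the material constants s, $E_a$ and q, so the caller must supply them. The demo values in the usage lines are placeholders chosen only to exercise the functions and are not calibrated data. $T_0 = 500$ K follows the assumption stated above.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def mttf_sm(t: float, s: float, ea: float, t0: float = 500.0) -> float:
    """Unnormalized MTTF_SM of Equation 2.6; t and t0 in K, ea in eV, valid for t < t0."""
    if t >= t0:
        raise ValueError("operating temperature must be below the stress-free temperature")
    return (t0 - t) ** (-s) * math.exp(ea / (K_B * t))


def mttf_tc(t: float, q: float, t_ambient: float) -> float:
    """Unnormalized MTTF_TC of Equation 2.7 (Coffin-Manson); valid for t > t_ambient."""
    if t <= t_ambient:
        raise ValueError("operating temperature must exceed the ambient temperature")
    return (1.0 / (t - t_ambient)) ** q


# Placeholder constants, used only to exercise the functions (not material data):
S_DEMO, EA_DEMO, Q_DEMO = 2.0, 1.0, 2.0
print(mttf_sm(330.0, S_DEMO, EA_DEMO) / mttf_sm(360.0, S_DEMO, EA_DEMO))  # > 1
print(mttf_tc(330.0, Q_DEMO, 300.0) / mttf_tc(360.0, Q_DEMO, 300.0))      # (60/30)**2 = 4
```

For TC, the ratio depends only on the temperature swings above ambient. Doubling the swing from 30 K to 60 K cuts the lifetime by a factor of $2^q$.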

Any circuit, including a NoC, dissipates power that is converted into heat, raising its temperature. This temperature rise occurs in any operating transistor or circuit and is responsible for thermally induced variation. Therefore, power and temperature are the key parameters that drive the failure mechanisms of a circuit and affect its reliability.

Aggressive scaling of integration and shrinking transistor sizes are used to achieve higher performance and parallelism. However, these technology trends bring power, thermal, and reliability issues. The drastic decrease in transistor size increases the number of transistors in the same chip area. The higher transistor density increases layout complexity, and spatial and temporal variability effects degrade the reliability of a circuit. Higher transistor density, together with the resulting increase in heat from dynamic power consumption, raises power density and temperature. The rising temperature shifts transistor properties and causes heat-related reliability degradation. Aging is responsible for the gradual shift in gate and path delays over time.

In the deep sub-micron era, reliability is a crucial parameter in circuit and system design. Figure 2.21 shows how the failure rate of a circuit changes over time. This bath-tub curve [70] has three phases.



FIGURE 2.21: The Empirical Bath-tub Curve Representing Failure Rate With Time

- Mortality Phase: In this phase, circuits fail shortly after production due to manufacturing defects, fabrication errors, solid effects, mask defects, etc. in the IC.
- Useful Life: During the useful (operational) life, the failure rate of the circuit is almost constant. In this phase, the circuit suffers from random defects and soft errors.
- Wearout Phase: In this phase, the circuit wears out due to aging effects. The curve in Figure 2.21 shows the sharp increase in the failure rate due to wearout.
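A common way to reproduce the bath-tub shape (a modeling choice not taken from this thesis) is to add three Weibull hazard functions: a shape parameter below 1 gives the decreasing early-failure rate, a shape of exactly 1 gives the constant useful-life rate, and a shape above 1 gives the rising wear-out rate. All parameters in the sketch are hypothetical.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t_hours: float) -> float:
    """Sum of three competing Weibull hazards (hypothetical parameters)."""
    infant = weibull_hazard(t_hours, shape=0.5, scale=1.0e4)   # decreasing
    useful = weibull_hazard(t_hours, shape=1.0, scale=1.0e5)   # constant
    wearout = weibull_hazard(t_hours, shape=5.0, scale=1.0e5)  # increasing
    return infant + useful + wearout


for t in (10.0, 1.0e3, 3.0e4, 1.0e5, 2.0e5):
    print(f"t = {t:>9.0f} h  h(t) = {bathtub_hazard(t):.2e} per hour")
```

The printed rate first falls, flattens, and then rises sharply, mirroring the three phases of Figure 2.21.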

# 2.5 Aging Effects

Aging is mainly the result of time-dependent failure mechanisms. It changes the electrical characteristics of the transistors and shifts the threshold voltage, which affects the timing characteristics of the circuit.

#### 2.5.1 Impact of Technology Scaling on Aging

Before the nano-CMOS era, the supply voltage and the transistor channel length were both scaled by the same factor as the transistor dimensions to maintain a constant electric field across the channel [71]. Dennard scaling [4] cannot be maintained in the multicore era because $V_{dd}$ can no longer be scaled down as it approaches $V_{th}$.

Dennard scaling can be expressed by the following equation:

$$E_{new} = \frac{V_{dd} \cdot S_V}{L \cdot S_L} \tag{2.8}$$

where $V_{dd}$ is the supply voltage, $S_V$ is the supply-voltage scaling factor, $S_L$ is the channel-length scaling factor, and $L$ and $E$ denote the channel length and the electric field across the channel, respectively.

#### Before nano-CMOS era:

$S_L = S_V$ and $E_{new} = E_{old}$, i.e., the electric field remains almost constant.

#### After nano-CMOS era:

$S_L < S_V$ and $E_{new} > E_{old}$, i.e., the new electric field across the channel is stronger, and the same holds for the field across the gate, i.e., $E_{new}(Gate) > E_{old}(Gate)$.
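Equation 2.8 reduces to the ratio $E_{new}/E_{old} = S_V/S_L$. The sketch below treats $S_V$ and $S_L$ as multiplicative factors applied to $V_{dd}$ and $L$ in one technology generation (values below 1 mean shrinking); the supply voltage, channel length, and scaling factors are hypothetical, chosen only to contrast the two regimes.

```python
def scaled_field(v_dd: float, length: float, s_v: float, s_l: float) -> float:
    """Electric field after scaling, Equation 2.8: (V_dd * S_V) / (L * S_L)."""
    return (v_dd * s_v) / (length * s_l)


V_DD, L_CH = 1.0, 45e-9  # hypothetical supply voltage (V) and channel length (m)
e_old = V_DD / L_CH

# Classical (Dennard) scaling: voltage and length shrink by the same factor.
dennard = scaled_field(V_DD, L_CH, s_v=0.7, s_l=0.7) / e_old
# Post-Dennard: length shrinks by 0.7 but V_dd only by 0.9 (it is close to V_th).
post = scaled_field(V_DD, L_CH, s_v=0.9, s_l=0.7) / e_old
print(f"E_new/E_old: Dennard = {dennard:.3f}, post-Dennard = {post:.3f}")
```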

Over time, the stronger electric field across the channel produces hot carrier injection (HCI), and the field across the gate induces bias temperature instability (BTI) effects. In the deep sub-micron era, technology scaling also increases leakage current, and the resulting high power density worsens the impact of aging on the circuit.

#### 2.5.2 Impact of Temperature on Reliability

Deep sub-micron technology increases transistor density by packing more and more transistors into the same chip area. The resulting increase in power density is the primary reason for the increase in temperature. Thermal Design Power (TDP) defines the constraint on the power consumption of a chip; power dissipation beyond the TDP limit may damage the chip. To prevent TDP violations, parts of the chip that are not in use are switched off and/or non-critical operations are carried out at reduced frequency. Such inactive and sub-optimally active regions are referred to as dark and dim silicon, respectively [72–79]. Another factor contributing to degradation is leakage current (or leakage power), which is directly related to junction temperature. In a many-core system, not all cores operate at the same frequency: some operate at a lower frequency or may be switched off due to power limitations. These cores are called dark and dim cores, while the power budget that keeps the chip within its thermal constraints is called Thermal Safe Power (TSP) [80]. To maintain the TDP, Dynamic Thermal Management (DTM) techniques are used, which increases the cooling cost of the chip. Aging mostly affects the delay and threshold voltage of the device. A detailed discussion of aging-induced delay and $V_{th}$ variation is presented in Chapter 5.

FIGURE 2.22: CMOS Inverter Circuit

FIGURE 2.23: CMOS Inverter Circuit Power Representation

#### 2.5.3 Power Temperature Relationship: Temperature Modeling

Transistor switching depends on the voltage difference between gate and source. A logic gate is a higher-level abstraction of the underlying transistor circuit that implements a Boolean function. The CMOS inverter consists of a PMOS and an NMOS transistor connected as shown in Figure 2.22. Figure 2.23 shows the different currents associated with this gate implementation, such as sub-threshold, short-circuit, and gate leakage currents.

In this CMOS circuit, $V_{in}$ is the input voltage and $V_{out}$ is the output voltage. Suppose $V_{out}$ is initially zero. When the input switches from high to low, $V_{out}$ switches from low to high. During the low-to-high transition at the output node, the load capacitance $C_l$ charges from 0 to $V_{dd}$ through the charging path shown in Figure 2.24. During charging, the PMOS is ON and the NMOS is OFF, because the NMOS gate-to-source voltage is zero; the PMOS can therefore be modeled as a resistor $R_p$. During discharging (Figure 2.25), the NMOS is ON and can be modeled as a resistor $R_n$.

FIGURE 2.24: Charging of Load Capacitance

FIGURE 2.25: Discharging of Load Capacitance

The total energy drawn from the supply during a low-to-high transition is $C_l V_{dd}^2$. The energy actually stored in the capacitor $C_l$ is $\frac{1}{2} C_l V_{dd}^2$; the other half is dissipated as heat in $R_p$. During the high-to-low transition, the stored energy is dissipated as heat in $R_n$. Other sources of heat are gate leakage and short-circuit currents.

The power consumed by an IC is dissipated as heat, which must be transferred to the environment to be removed. Power $P$, the rate of energy consumption, is related to the energy $E$ consumed over a period of time $T$ as:

$$P = \frac{E}{T} \tag{2.9}$$
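The energy bookkeeping above, combined with Equation 2.9 for a clock period $T = 1/f$, gives the familiar dynamic-power estimate $P_{dyn} = \alpha C_l V_{dd}^2 f$, since each 0-to-1 output transition draws $C_l V_{dd}^2$ from the supply. The switching factor $\alpha$ (probability of a 0-to-1 output transition per cycle) and the numeric values are assumptions introduced for illustration; the estimate ignores the leakage and short-circuit components mentioned above.

```python
def transition_energy(c_load: float, v_dd: float) -> tuple[float, float, float]:
    """Energy (J) for one low-to-high output transition of a CMOS inverter."""
    drawn = c_load * v_dd ** 2         # taken from the supply
    stored = 0.5 * c_load * v_dd ** 2  # ends up on the load capacitance
    dissipated = drawn - stored        # lost as heat in the PMOS (R_p)
    return drawn, stored, dissipated


def dynamic_power(c_load: float, v_dd: float, f_clk: float, alpha: float) -> float:
    """Average switching power (W): energy per cycle divided by the period 1/f."""
    energy_per_cycle = alpha * c_load * v_dd ** 2
    return energy_per_cycle * f_clk


# Hypothetical values: 10 fF load, 1.0 V supply, 1 GHz clock, alpha = 0.1
drawn, stored, heat = transition_energy(10e-15, 1.0)
p = dynamic_power(10e-15, 1.0, 1e9, 0.1)
print(f"drawn={drawn:.2e} J, stored={stored:.2e} J, heat in R_p={heat:.2e} J")
print(f"P_dyn={p:.2e} W")  # about 1 microwatt
```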

Thermal Modeling: Heat flows from high to low temperature. Consider a thermal conductor with thermal conductivity $\kappa$, cross-sectional area $A$, and length $L$, whose ends are at temperatures $T_1$ and $T_2$, and let $Q$ be the heat transfer rate. The heat flux $q$ and thermal resistance $R_{th}$ can be expressed as [81] [82]:

$$q = \frac{Q}{A} \tag{2.10}$$

$$Q = \kappa A \frac{T_1 - T_2}{L} \tag{2.11}$$

$$R_{th} = \frac{T_1 - T_2}{Q} \tag{2.12}$$

$$R_{th} = \frac{T_1 - T_2}{Q} = \frac{1}{\kappa} \times \frac{L}{A} \tag{2.13}$$

By analogy, Ohm's law gives the electrical resistance $R$ of a conductor with resistivity $\rho$ as:

$$R = \frac{V_1 - V_2}{I} = \rho \times \frac{L}{A} \tag{2.14}$$
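The conduction relation for $R_{th}$ can be checked numerically. The sketch uses the sign convention of Equation 2.13 (heat flows from $T_1$ to $T_2$, so $R_{th} > 0$ when $T_1 > T_2$). The die geometry and the thermal conductivity of silicon (about 150 W/(m·K) near room temperature) are illustrative assumptions.

```python
def thermal_resistance(length_m: float, area_m2: float, kappa: float) -> float:
    """R_th = L / (kappa * A) in K/W (Equation 2.13)."""
    return length_m / (kappa * area_m2)


def heat_rate(t1: float, t2: float, length_m: float, area_m2: float, kappa: float) -> float:
    """Conducted heat rate Q in W, positive when heat flows from T1 to T2."""
    return kappa * area_m2 * (t1 - t2) / length_m


# Hypothetical die: 0.5 mm thick silicon, 1 cm^2 area, kappa ~ 150 W/(m K)
L_DIE, A_DIE, K_SI = 0.5e-3, 1e-4, 150.0
r_th = thermal_resistance(L_DIE, A_DIE, K_SI)
q = heat_rate(350.0, 349.0, L_DIE, A_DIE, K_SI)  # 1 K across the die
print(f"R_th = {r_th:.4f} K/W, Q = {q:.1f} W")
print(f"check (T1 - T2) / Q = {(350.0 - 349.0) / q:.4f} K/W")
```

Exactly as in the electrical analogy, a 1 K drop across this thin, wide conductor already carries about 30 W.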

TABLE 2.3: Thermal and Electrical Properties [2]

| Thermal quantity               | Unit | Electrical quantity       | Unit |
|--------------------------------|------|---------------------------|------|
| Q, Heat transfer rate, power   | W    | I, Current                | A    |
| T, Temperature difference      | K    | V, Voltage difference     | V    |
| $R_{th}$, Thermal resistance   | K/W  | R, Electrical resistance  | Ω    |
| $C_{th}$, Thermal capacitance  | J/K  | C, Electrical capacitance | F    |

Table 2.3 shows the analogy between thermal and electrical quantities. To ensure that a circuit is within its TDP constraint, its power consumption and the corresponding temperature must satisfy the TDP values at the respective operating frequency. The junction temperature $T_j$ (the temperature at the active silicon surface) should not exceed the specified temperature $T_{j0}$ based on the TDP. Hence, the thermal resistance from the active surface to the ambient, at temperature $T_a$, should be at most:

$$R_{th} \le \frac{T_{j0} - T_a}{TDP} \tag{2.15}$$
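The TDP condition above gives the largest junction-to-ambient thermal resistance that a cooling solution may have. In the sketch below, the limits $T_{j0} = 100$ °C, $T_a = 45$ °C, and TDP = 95 W are hypothetical; because only temperature differences enter, working in °C is equivalent to working in K.

```python
def max_thermal_resistance(t_j0: float, t_a: float, tdp_w: float) -> float:
    """Upper bound on junction-to-ambient R_th in K/W: (T_j0 - T_a) / TDP."""
    return (t_j0 - t_a) / tdp_w


def junction_temperature(power_w: float, r_th: float, t_a: float) -> float:
    """Steady-state junction temperature T_j = T_a + P * R_th."""
    return t_a + power_w * r_th


r_max = max_thermal_resistance(t_j0=100.0, t_a=45.0, tdp_w=95.0)
print(f"R_th must not exceed {r_max:.3f} K/W")
# At exactly TDP, a package with r_max brings the junction to T_j0.
print(f"T_j at TDP = {junction_temperature(95.0, r_max, 45.0):.1f} C")
```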

# 2.6 State-of-the-art

This section describes state-of-the-art NoC reliability enhancement techniques at different levels. NoC reliability can be viewed from macro and micro perspectives: routing- and topology-based reliability analysis and enhancement techniques constitute the macro view, while architecture-based solutions form the micro view.

FIGURE 2.26: Reliability at Different Level and View

• **RAMP** [83]: RAMP stands for "Reliability-Aware MicroProcessor". It is an architecture-level model that uses state-of-the-art device models of wearout and dynamically tracks the lifetime of the microprocessor based on application behavior. The failure models used in the RAMP tool [84] assume steady-state operation at a specific temperature and load to estimate reliability in terms of MTTF.

RAMP uses the sum-of-failure-rates (SOFR) approach, which assumes uniform device vulnerability to errors and a uniform (constant) failure rate. However, aging changes the failure rate gradually, so it should not be assumed constant. In [69], the authors presented a modified RAMP that uses Monte Carlo simulation with the lognormal distribution, whose failure rate is non-monotonic, for reliability analysis. The reliability metric employed by RAMP is MTTF (Mean Time to Failure), the expected time for which the system operates without failure.

• J.Shin *et al* [85]: Shin's work addresses lifetime reliability at the architecture level. Their lifetime failure models employ the concept of FITs of Reference Circuit (FORC), which avoids dealing with technology-specific details of the circuit. The FORC-based approach is used to evaluate the performance-reliability trade-off for microprocessor chips.

- Li-Shiuan Peh *et al* [86]: Peh's work presents a fault modeling tool that captures runtime PV-induced faults at system level for NoCs. The authors designed a circuit-level fault model and integrated it with the system-level NoC simulator Garnet and the HotSpot thermal simulator. Process parameters such as threshold voltage, channel length and width, and oxide thickness were considered in this approach.
- Yun Xiang *et al* [87]: In this work, system-level reliability is calculated from device-level models: component-level temporal failure models are combined using Monte Carlo simulation at system level. Their model targets component failures induced by intra-die and inter-die variations. The rainflow cycle counting algorithm is used to model thermal cycling.
- Ayse Kivilcim *et al* [88]: This work addresses the effects of power and temperature on reliability. They proposed two-level reliability modeling based on statistical and cycle-accurate simulation. They discuss the long-term trade-off between power management and reliability, and short-term changes in the system failure rate when power management or scheduling policies vary. Dynamic Power Management (DPM) strategies, traffic workload distribution schemes, and thermal modeling are used to optimize the failure rate, i.e., reliability.
- Zhenyu et al [89]: This work estimates reliability in terms of MTTF based on temperature-dependent permanent faults. It addresses temperature-aware optimization of the thermal profile for application-specific MPSoC reliability. The domain-specific optimization algorithm improves MPSoC reliability (MTTF) by 85% with less than 5% area overhead.
- James P.G. Sterbenz *et al* [90]: This work presents an architectural framework for resilience and survivability in communication networks and provides a survey of failures and resilience.
- A.Dalirsani *et al* [91]: In this work, the authors proposed an analytical model for the reliability factor of NoC-based SoCs, where the reliability factor is the probability that a NoC fault can be recovered without affecting system functionality. They classify switch faults in the NoC and also consider transient faults in their analysis.

- Ababei, Cristinel *et al* [92]: Ababei *et al* address energy-consumption- and reliability-aware application mapping for NoCs. They used a branch-and-bound algorithm to find the trade-off between energy consumption and reliability. They compared their approach, which uses updated adaptive routing tables, with simulated annealing algorithms; it incurs 5% hardware overhead.
- Abella et al [93]: Abella et al proposed the NBTI-aware processor called *Penelope*. They introduced techniques for NBTI mitigation for both combinational and storage blocks. They show that the guard-band reductions lie between 12.6% and 18% for the different blocks without impacting any critical path.
- Zhiliang Qian *et al* [94]: Here, authors have proposed a thermal-aware application-specific routing algorithm for Network-on-chip. Authors used the traffic information of an application and developed the routing scheme based on the optimal distribution ratio of the communication traffic among the set of candidate paths. The peak energy reduction can be as high as 16.6% for both synthetic and industry benchmarks.
- J.Henkel *et al* [95]: This work proposed a thermal-aware agent-based power economy (TAPE) for many-core architectures. In this scheme, an agent-based power distribution approach proactively balances the power consumption of many-core processors. The authors achieve an 11.23% decrease in peak temperature compared with design approaches that have no thermal management.
- Feiyang Liu *et al* [96]: Liu *et al* used a dynamic thermal-balance routing (DTBR) algorithm for NoC. DTBR is a minimal adaptive routing algorithm based on an architectural thermal model, and a thermal-aware router is proposed to implement it. DTBR distributes temperature uniformly and hence reduces the hotspot temperature by 20% across different traffic patterns.
- Hanumaiah *et al* [97]: Thermal-aware techniques such as dynamic voltage and frequency scaling (DVFS) can also be used for thermal management. Task migration is a primary thermal-management technique in many-core processors, and it can be coupled with DVFS for better thermal management. In [97], the authors proposed a formulation of, and an efficient solution to, the problem of maximizing the performance-per-watt (PPW) of a heterogeneous multi-core processor. The control variables include the supply voltage and clock frequency of each core, the allocation of tasks to cores, and the fan speed. This scheme improves PPW by 37% for simple benchmarks.

- Alexandre Yamamoto *et al* [67]: This work performs reliability estimation based on a Monte Carlo algorithm and addresses dynamic reliability management for CMPs with NoC.
- Seyab Khan [98]: Khan proposed an analytical model for NBTI-induced delay under temperature variation. The delay is modeled on the threshold voltage shift and hole mobility degradation. The analysis is performed at different temperatures for different technologies. At 45 nm technology, the impact of NBTI on PMOS transistor delay increases by 11.5%.
- Hemanta Kumar Mondal [99, 100]: Mondal proposed low-power and performance-aware wireless NoC architectures and techniques.
- Namita Sharma and Prashant Agrawal [101–103]: Sharma *et al* proposed low-power, energy-efficient micro-architectures and techniques for memories, compilers, and links.
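The SOFR approach used by RAMP, described above, treats each failure mechanism as having a constant failure rate, so the rates (commonly expressed in FITs, failures per $10^9$ device-hours) simply add and the system MTTF is the reciprocal of the total. The sketch below shows this arithmetic; the per-mechanism FIT values are hypothetical and are not taken from RAMP.

```python
HOURS_PER_YEAR = 24 * 365


def sofr_mttf_hours(fit_by_mechanism: dict[str, float]) -> float:
    """System MTTF in hours when constant failure rates (in FIT) are summed."""
    total_fit = sum(fit_by_mechanism.values())  # failures per 1e9 hours
    return 1e9 / total_fit


fits = {"EM": 1000.0, "SM": 1000.0, "TDDB": 1000.0, "TC": 1000.0}  # hypothetical
mttf_h = sofr_mttf_hours(fits)
print(f"MTTF = {mttf_h:.0f} h = {mttf_h / HOURS_PER_YEAR:.1f} years")
```

The constant-rate assumption is what makes this a one-line sum; it is also the assumption that the lognormal, Monte Carlo variant in [69] relaxes.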

# 2.7 NoC Reliability Representation at Different-Level

The authors of [104] proposed the Gajski-Kuhn chart, called the "Y-chart", for representing different perspectives in VLSI hardware design. The Y-chart, shown in Figure 2.27, depicts three aspects of every design along three axes: the physical, structural, and behavioural domains. Each circle represents an abstraction level, and the level increases as we move away from the origin; the outermost circle represents the highest, i.e., system-level, abstraction. An improved version, called Y-chart 2.0, was proposed by Wolfgang Nebel [105].

Y-chart 2.0 is an EDA coordinate system for extra-functional properties. It represents the effect of performance (time), power, and temperature on system reliability.



FIGURE 2.27: Y-chart Source: Embedded System Design Modeling, Synthesis and Verification by D.Gajski *et al*, Springer 2009



FIGURE 2.28: Y-chart 2.0 Source: A Talk by Prof. Dr. Wolfgang Nebel on New Modeling Concept for Complex Embedded System at IITD and MNIT, Jaipur, Feb 2014

Routers and links are the primary components of a NoC. A classical five-stage pipelined NoC router contains combinational logic structures (e.g., VCA logic) and sequential structures (e.g., virtual channel buffers). NBTI aging affects these structural components of NoC. Fu *et al* [36] presented NoC-structure-based NBTI analysis and a mitigation mechanism. Fu *et al* [36] and Kodi *et al* [37] focused on router aging, while Duato *et al* [38] consider aging degradation of both routers and links in NoC. Other work addresses aging-aware NoC routing. Bhardwaj *et al* [39] proposed two routing algorithms, a congestion-oblivious one based on Mixed Integer Linear Programming (MILP) and a congestion-aware adaptive one, to reduce aging-induced power-performance overheads and enhance system robustness. Wang *et al* [40] show through experiments and analysis that a dynamic-programming-based lifetime-aware routing algorithm can improve the reliability of a NoC.

Kim *et al* [27] evaluated the impact of workload on router activity factor and duty cycle at RTL. They proposed an HCI- and NBTI-aware router microarchitecture using an exercise-mode architecture technique. Ancajas *et al* [41] improve the lifetime of a NoC router by balancing switching activity across the circuit; they analyzed the crossbar circuit and performed HSPICE simulation of the HCI degradation effect. The proactive aging management scheme in [106] for heterogeneous NoCs employs an NBTI wear-out monitoring system and a criticality-driven routing approach to restrict aging degradation. Compared with the state-of-the-art BRAR (Buffered-Router Aware Routing), the scheme achieves 53%, 38%, and 29% improvements in system performance, network latency, and Energy Delay Product per Flit (EDPPF) overheads, respectively.

Reliability can be represented in different terms at different levels of abstraction. Figure 2.29 shows the representation of reliability from physical level to system level. At system level, reliability can be represented in terms of the sum of failure rates or MTTF. The failure mechanisms NBTI and HCI depend on the duty cycle and the switching activity factor, respectively; these parameters can be generated at RTL and logic level. At circuit level, stress can be viewed in terms of the ON/OFF conditions of the PMOS and NMOS transistors, while device-level aging can be modeled independently. Reliability at device level considers each failure mechanism individually.
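The RTL-level quantities mentioned above can be extracted from a simulated signal trace. The sketch below is a hypothetical helper, not part of any tool named in this thesis: it treats a PMOS as NBTI-stressed whenever its gate input is logic 0 (as for the pull-up of an inverter whose source is at $V_{dd}$), and it counts toggles between consecutive cycles as the switching activity that drives HCI.

```python
def pmos_duty_cycle(gate_trace: list[int]) -> float:
    """Fraction of cycles in which the PMOS gate is at logic 0 (NBTI stress)."""
    return gate_trace.count(0) / len(gate_trace)


def switching_activity(trace: list[int]) -> float:
    """Average number of 0->1 or 1->0 toggles per cycle boundary (HCI driver)."""
    toggles = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
    return toggles / (len(trace) - 1)


trace = [0, 0, 1, 1, 0, 1, 0, 0]  # hypothetical per-cycle logic values
print(f"duty cycle = {pmos_duty_cycle(trace):.3f}")
print(f"switching activity = {switching_activity(trace):.3f}")
```

Both metrics are per-signal; a real flow would compute them for every gate input from the RTL workload simulation and pass them to the device-level aging models.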



FIGURE 2.29: Reliability Representation at Different Level of Abstraction



FIGURE 2.30: NoC-Reliability Representation at Different Level

Figure 2.30 shows NoC design constraints such as performance, power, area, temperature, and reliability, the techniques used to improve the design, and the effect of each technique on each parameter, with the aim of achieving highly reliable designs.

Figure 2.30 shows that higher performance and throughput can be achieved by increasing the frequency as well as the number of VC buffers. This strategy, however, increases the power and temperature of the NoC and adversely impacts its reliability. Congestion, traffic load, and NoC architecture affect NoC performance. Congestion-aware routing algorithms, pipelined router architectures, and VC buffer organization and capacity can improve performance, but at the cost of power and temperature overheads, which reduce NoC reliability. Traffic congestion control may decrease dynamic power consumption and thermal hotspots, improving reliability.

Switching activity is another key factor affecting power consumption. Activity parameters can be controlled or optimized at the RTL and gate level using clock gating, power gating, and other power-aware router architectures.

Reducing hotspots improves temperature-related reliability. Dynamic Thermal Management (DTM), task migration from hotspot nodes, clock gating, power gating, and temperature-aware routing algorithms are the most useful techniques for temperature-aware design.

In summary, reliability can be improved through congestion-aware, power-aware, and thermal-aware techniques, each using its respective improvement methods.

# 2.8 Summary

In this chapter, we describe the importance of communication architectures for high-performance computing. NoC interconnection emerges as a solution for many-core (CMP and MPSoC) advanced computing systems. We provide the essential details about the parameters for designing an interconnection network, such as topology, routing, switching mechanism, and flow control technique, based on different message formats such as packets and flits. This chapter presents the fundamental components of a router microarchitecture, such as VCBs, route computation logic, switch and VC allocators, arbiters, and the crossbar. Additionally, we describe the drawbacks of increasing the scale of integration with technology advancement.

We describe the effect of spatial and temporal variability on device performance and lifespan. We have chosen aging effects as the essential subject for analysis and improvement; NBTI and HCI are the physical phenomena that we consider for our analysis and mitigation. In the next chapter, we describe the implementation of the system-level power, thermal, and reliability framework "HiPER-NIRGAM" for NoC.

# Chapter 3

# System-Level: HiPER-NIRGAM Framework for NoC Reliability Estimation

After an extensive literature review on NoC reliability, presented in the previous chapter, we identified the following research gaps, which motivated us to develop a system-level framework.

- 1. The Y-chart proposed by Gajski [104] is well suited to representing abstraction at various levels in VLSI design. Errors at lower abstraction levels manifest as (un)reliability at higher levels. The Y-chart itself does not detail a reliability model at any abstraction level, but it motivates us to identify how the concept of reliability can be added at different abstractions. In this chapter, we address the reliability of NoC and its components at the highest abstraction level, i.e., the system level.

In Gajski's Y-chart [104], the system level offers the least accuracy but the fastest estimation of design metrics.

- 2. According to our literature survey, the available tools that address reliability are RAMP and its modified version [69, 83] and REST [67]. A new analysis framework is needed because:
  - RAMP [69, 83] addresses the reliability of a microprocessor only; it is not customized for estimating NoC reliability.
  - RAMP assumes uniform device vulnerability to errors and a uniform failure probability to calculate the SOFR (sum of failure rates). With aging, the failure rate changes, and RAMP does not take this aging-induced variation into account.
  - REST [67] addresses only the TDDB and NBTI failure mechanisms; it does not consider the SM and TC failure mechanisms.

We compare features of existing tools in Table 3.1.

| Features and Metrics       | RAMP 1.0      | RAMP 2.0               | REST        |
|----------------------------|---------------|------------------------|-------------|
| Performance Metric         | ×             | ×                      | ×           |
| Power Metric               | ×             | ×                      | ×           |
| Thermal Metric             | ×             | ×                      | ×           |
| NoC Reliability            | ×             | ×                      | ✓           |
| Microprocessor Reliability | ✓             | ✓                      | ✓           |
| Reliability Models         | SM, TC, TDDB, | SM, TC, TDDB, NBTI, EM | TDDB, NBTI  |
| Distribution Scheme        | Exponential   | Lognormal              | Weibull     |
| Simulation Scheme          | SOFR          | Monte Carlo            | Monte Carlo |

TABLE 3.1: Comparison Among Different Frameworks

#### The objectives for system-level reliability work:

- The primary objective is to define a unified reliability evaluation scheme at architecture level for NoC.
- Extend SOFR by removing the constraint of a constant failure rate.
- Include the SM and TC failure models, in addition to TDDB and NBTI, for reliability estimation.
- Develop a single framework that addresses performance, power, thermal, and reliability estimation for NoC.
- Perform a critical reliability analysis of NoC and its components.
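One way to remove SOFR's constant-failure-rate constraint, in the spirit of the Monte Carlo approach of [69], is to draw a lognormal time to failure for every mechanism, take the earliest one as the failure time of a series system, and average over many trials. The sketch below is not the HiPER-NIRGAM implementation; the shape parameter $\sigma = 0.5$ and the per-mechanism MTTFs are hypothetical, and each lognormal is parameterized so that its mean equals the given MTTF.

```python
import math
import random


def lognormal_mu(mean: float, sigma: float) -> float:
    """Location parameter mu such that lognormal(mu, sigma) has the given mean."""
    return math.log(mean) - 0.5 * sigma ** 2


def monte_carlo_mttf(mttfs: list[float], sigma: float = 0.5,
                     trials: int = 20_000, seed: int = 1) -> float:
    """Series-system MTTF: mean over trials of the earliest mechanism failure."""
    rng = random.Random(seed)
    mus = [lognormal_mu(m, sigma) for m in mttfs]
    total = 0.0
    for _ in range(trials):
        total += min(rng.lognormvariate(mu, sigma) for mu in mus)
    return total / trials


def sofr_mttf(mttfs: list[float]) -> float:
    """Constant-failure-rate (SOFR) MTTF: reciprocal of the summed rates."""
    return 1.0 / sum(1.0 / m for m in mttfs)


mechanism_mttfs = [30.0, 30.0, 30.0, 30.0]  # hypothetical years for SM, TC, TDDB, NBTI
print(f"SOFR: {sofr_mttf(mechanism_mttfs):.1f} years")
print(f"Monte Carlo (lognormal): {monte_carlo_mttf(mechanism_mttfs):.1f} years")
```

With these inputs the Monte Carlo estimate is noticeably larger than the SOFR value, because a lognormal lifetime with a small $\sigma$ is much less likely to fail early than the exponential lifetime implied by a constant failure rate.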

# 3.1 Proposed Framework: HiPER-NIRGAM

HiPER-NIRGAM is a framework for performance, power, thermal and reliability estimation for 2D mesh NoCs.

The HiPER-NIRGAM simulator framework employs different tools as part of its toolchain. Each tool is used for computing values of a given evaluation metric.

The HiPER-NIRGAM tool has the following features:

- 1. Python-based Graphical User Interface (GUI): the GUI allows users to configure NoC parameters such as the number of tiles, mesh size (rows and columns), routing algorithm, and traffic pattern.
- 2. Cycle-accurate simulator NIRGAM for NoC performance modeling.
- 3. Power modeling using the ORION and McPAT simulators: the framework is integrated with ORION and McPAT for computing the power consumption of various NoC components.
- 4. Floor-planning of NoC and router.
- 5. Thermal modeling using HotSpot simulator: Power dissipation leads to a temperature rise. Hotspot simulator provides information on thermal profile of the chip by computation of temperature variations across various parts of the chip.
- 6. Reliability estimation based on lifetime failure mechanism(s).

The HiPER-NIRGAM framework is shown in Figure 3.1. This framework has been employed for the work presented in this thesis. It is capable of computing all metrics related to performance, power, thermal, and reliability estimation for network-on-chip. In the following paragraphs, we discuss the various tools that constitute our framework.



FIGURE 3.1: HiPER-NIRGAM: A Framework for NoC Performance, Power, Thermal and Reliability Estimation

#### 3.1.1 NIRGAM: NoC Interconnect Routing and Application Modeling

NIRGAM [42, 107] is a cycle-accurate NoC performance simulator. The key performance metrics employed for evaluation are latency and throughput. Latency is computed as average latency per packet and average latency per flit; throughput is also averaged over a number of cycles. NIRGAM is extensible: new routing algorithms, application traffic models, and even new topologies can be added. The simulator has a modular design and is coded in SystemC.
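To make these metrics concrete, the sketch below computes them from a list of delivered flits. The record format and the exact definitions (packet latency from head-flit injection to tail-flit ejection, flit latency measured per flit, throughput as delivered flits per cycle) are assumptions for illustration and may differ from NIRGAM's own bookkeeping.

```python
from statistics import mean


def latency_stats(flits: list[tuple[int, int, int]],
                  sim_cycles: int) -> tuple[float, float, float]:
    """flits: (packet_id, inject_cycle, eject_cycle) for every delivered flit."""
    per_flit = mean(eject - inject for _, inject, eject in flits)
    first_inject: dict[int, int] = {}
    last_eject: dict[int, int] = {}
    for pid, inject, eject in flits:
        first_inject[pid] = min(inject, first_inject.get(pid, inject))
        last_eject[pid] = max(eject, last_eject.get(pid, eject))
    per_packet = mean(last_eject[p] - first_inject[p] for p in first_inject)
    throughput = len(flits) / sim_cycles  # flits delivered per cycle
    return per_packet, per_flit, throughput


# Hypothetical trace: a 3-flit packet and a 2-flit packet in a 20-cycle run
flits = [(0, 0, 5), (0, 1, 6), (0, 2, 7), (1, 10, 14), (1, 11, 15)]
pp, pf, tp = latency_stats(flits, sim_cycles=20)
print(f"avg packet latency = {pp} cycles, avg flit latency = {pf} cycles, "
      f"throughput = {tp} flits/cycle")
```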

NIRGAM supports NoC configuration parameters such as topology, routing logic, switching technique, number of virtual channels, buffer size, flit size, application modeling, and clock frequency. NIRGAM 2.0 supports 2D topologies (Mesh, Torus), and version 3.0 supports the 3D mesh topology. It supports wormhole switching. NIRGAM allows the user to configure the number of virtual channels per physical link. The number of FIFO buffers in an input channel can also be specified. Each buffer is the size of one flit.

NIRGAM supports a tile-based architecture. Different applications can be attached to different tiles. Traffic generated at a given tile can be Variable Bit Rate (VBR), Constant Bit Rate (CBR), multimedia or bursty traffic. Packet injection rate is specified in terms of load percentage (percentage of channel bandwidth to be used). Table 3.2 lists values of some of the configuration parameters of NIRGAM.



FIGURE 3.2: NIRGAM NoC Simulator

The working of NIRGAM simulator is shown in Figure 3.2. NIRGAM is open source software. Its source code tree consists of two directories – core and router.

| Parameter Name | Value                              | Description                                                   |
|----------------|------------------------------------|---------------------------------------------------------------|
| TOPOLOGY       | MESH, TORUS                        | Defines 2-dimensional Mesh and Torus topology                 |
| NUM_ROWS       | Positive integer                   | Defines number of rows in the selected topology               |
| NUM_COLS       | Positive integer                   | Defines number of columns in the selected topology            |
| RT_ALGO        | XY, OE, SOURCE                     | XY, Odd-Even and Source routing algorithms                    |
| NUM_BUFS       | Any natural number <= MAX_NUM_BUFS | Number of buffers in FIFO input channel; default value is 16  |
| CLK_FREQ       | Floating point value               | Defines clock frequency in GHz; default value is 1 GHz        |
| SIM_NUM        | Any natural number                 | Defines clock cycles for which simulation runs                |

**TABLE 3.2:** NIRGAM Configuration Parameters

- 1. The core directory contains the modules implementing router components such as the input channel, output channel, Virtual Channel Allocator (VCA), Switch Allocator (SA), and Crossbar (X-bar).
- 2. The router directory contains code implementing the various routing algorithms.

#### 3.1.2 NoC Power Modeling

Technology scaling shrinks the size of a transistor. This allows more and more transistors to be packed into the same chip area and increases the power density of a chip. Every chip has a thermal design power (TDP) budget. Higher power dissipation generates more heat, which in turn changes device properties. An uncontrolled temperature rise may ultimately burn and damage the chip. The temperature rise may also be uneven across the chip, since different components consume and dissipate different amounts of power. It is therefore necessary to perform power analysis and build a power model for every component of a system. To this end, researchers [108–110] have designed power simulators for NoC that compute the power consumption of NoC components. Some of the more popular simulators are listed below:

- 1. ORION is an open-source power-performance simulator for interconnection networks [108, 111]. Along with power, it also provides area estimates for the various components.

- 2. McPAT (Multicore Power, Area, and Timing) [109, 112]: McPAT is an integrated power, area, and timing modeling framework for multicore and many-core architectures. It supports 90 nm to 22 nm technologies for calculating area and power. It supports in-order and out-of-order processor cores, NoC, shared memory (caches) and protocols, and multiple clock domains. McPAT interconnect power modeling is based on the number of VCs, the number of stages in the pipelined router design, duty cycles, and voltage, frequency, and technology parameters.
- 3. DSENT (Design Space Exploration for Network Tool) [110, 113]. DSENT is a modeling tool that connects emerging photonics with electronics for opto-electronic NoC. It integrates timing, area, and power models that are accurate within 20% for the deep sub-100 nm regime.



FIGURE 3.3: NIRGAM Integrated with Power Simulators

#### 3.1.2.1 NIRGAM Integration With Power Models

All the tools provide the static (leakage) and dynamic power of each component. ORION 2.0 [114] models FIFO buffer, arbiter, crossbar, clock and link power based on the router micro-architecture. For power and area computation, router micro-architectural specifications such as the number of ports and the number of virtual channels, along with voltage and frequency, are passed as parameters to ORION [108, 111], which computes the static and dynamic power of each NoC component. Initially, NIRGAM was integrated with ORION 2.0; with the release of the more accurate ORION 3.0, we have also integrated this latest version. Figure 3.3 conceptualizes the integrated model.

We have chosen the ORION 3.0 [115] tool for NoC power and area estimation. ORION 3.0 [115] is a dedicated power-area simulator for NoCs, while other available simulators such as McPAT [109, 112] and Wattch [116] are more focused on microprocessors. ORION 3.0 achieves average estimation errors of no more than 9.8% when compared to actual implementations, whereas McPAT power modeling shows errors of more than 20% [112, 115].

#### **ORION 2.0 versus ORION 3.0**

- 1. ORION 2.0 is based on circuit-level templates, which model a specific logic structure for implementing each router component. However, there is a significant mismatch between the actual RTL code for the corresponding router component blocks and the logic structure assumed in the ORION 2.0 templates.
- 2. ORION 3.0 fundamentally differs from the earlier version as the estimation models are derived from actual post-P&R (Placement and Routing) layout data that correspond to the actual RTL generator and respective target library.
- 3. The ORION 3.0 model is based on calculating the instances (gate count) of each component. According to the ORION 3.0 modeling, this instance-based approach yields more accurate power and area estimates.
- 4. At the router micro-architectural level, ORION supports:
  - Technology: 32 nm to 90 nm.
  - Transistor type: HVT (High threshold Voltage), NVT (Normal threshold Voltage) and LVT (Low threshold Voltage) cells.
  - $V_{dd}$: 0.8 V to 1.2 V, depending on technology and transistor type.
  - Operating frequency: depending on transistor type.
  - Number of input buffers.
  - Buffer sharing mode: setting the variable PARM\_in\_share\_buf to 1 means that input virtual channels physically share buffers.
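The instance-based idea in point 3 can be illustrated with a small Python sketch: a component's power is taken as the sum, over cell types, of instance count times per-instance power. This only mirrors the idea described for ORION 3.0; the cell names, instance counts and per-instance power values are hypothetical and are not ORION's data or API.

```python
from typing import Dict


def component_power(instances: Dict[str, int], per_instance_uw: Dict[str, float]) -> float:
    """Power (microwatts) of a component from its cell instance counts."""
    return sum(count * per_instance_uw[cell] for cell, count in instances.items())


# Hypothetical per-instance power (uW) for a few standard cells.
PER_INSTANCE_UW = {"dff": 1.5, "mux2": 0.6, "nand2": 0.2}

# Hypothetical instance counts for one input buffer and one crossbar.
buffer_instances = {"dff": 1280, "mux2": 320, "nand2": 150}
crossbar_instances = {"mux2": 640, "nand2": 200}

if __name__ == "__main__":
    for name, inst in [("buffer", buffer_instances), ("crossbar", crossbar_instances)]:
        print(name, round(component_power(inst, PER_INSTANCE_UW), 1), "uW")
```

In this style of model, a larger buffer shows up directly as more flip-flop instances, which is one reason buffer parameters weigh heavily in the power estimate.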

The NIRGAM simulator contains a config directory with a file named *nirgam.config*, which configures the basic NoC parameters such as topology and the number of rows and columns. The files constant.h and extern.h contain the flit size, number of VCs and buffer size. These parameters are passed to ORION for power and area computation. We have chosen ORION 3.0 over ORION 2.0 since the former estimates power more accurately [115]. Details of ORION 2.0, ORION 3.0 and their integration with NIRGAM are addressed in Appendix-B.

### 3.1.3 Thermal modeling

In the HiPER-NIRGAM toolchain, we use HotSpot 6.0 [81] for thermal modeling of the network-on-chip. HotSpot is a compact thermal model used at the early design stage to explore spatial and temporal temperature variations. The total energy drawn from the power supply when charging a load capacitance is $C_L V_{dd}^2$, but the energy actually stored in the capacitor is $\frac{1}{2}C_L V_{dd}^2$; the other half is dissipated as heat. Active switching power and leakage power are responsible for heat generation, and the heat is generated in the active silicon.
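The energy split above can be checked numerically. The Python sketch below computes the energy drawn, stored and dissipated for one $0 \to V_{dd}$ charging transition; the load capacitance and supply voltage are hypothetical example values. The stored half is itself dissipated later, in the pull-down network, when the node discharges.

```python
def charging_energies(c_load_f: float, vdd: float):
    """Energies (J) for one full 0 -> Vdd charge of a load capacitance."""
    drawn = c_load_f * vdd ** 2          # energy taken from the supply
    stored = 0.5 * c_load_f * vdd ** 2   # energy left on the capacitor
    dissipated = drawn - stored          # energy turned into heat while charging
    return drawn, stored, dissipated


if __name__ == "__main__":
    # Hypothetical 10 fF load switched at 1.0 V.
    drawn, stored, heat = charging_energies(10e-15, 1.0)
    print(drawn, stored, heat)
```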



FIGURE 3.4: Floorplan Generation

#### 3.1.3.1 NIRGAM Integration With Thermal Models

HotSpot is based on an equivalent circuit of thermal resistances and capacitances that corresponds to the micro-architecture blocks and captures the essential aspects of the thermal package. It estimates the temperature of each block in the floorplan by constructing and solving an equivalent matrix. It is compatible with NoC power models such as ORION and McPAT. The typical layers of the HotSpot model are the heat sink, heat spreader, thermal interface material (thermal paste), silicon substrate, on-chip interconnect layers, ceramic packaging substrate, etc. It does not require a detailed design or synthesis description.
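The equivalent-circuit idea can be illustrated, in heavily simplified form, with a steady-state solve of a small thermal resistance network. In the Python sketch below, each block has a vertical conductance to ambient and lateral conductances to its neighbours, and the temperature rise $\Delta T$ satisfies $G\,\Delta T = P$. This is not HotSpot's code: it ignores the package layers and thermal capacitances, and the conductances, ambient temperature and power values are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def block_temperatures(power_w: Sequence[float], g_vertical: float,
                       g_lateral: float, t_ambient_k: float) -> List[float]:
    """Steady-state temperatures of blocks placed in a row (W/K conductances)."""
    n = len(power_w)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g[i][i] += g_vertical                  # path to ambient through the package
        for j in (i - 1, i + 1):
            if 0 <= j < n:                     # lateral coupling to neighbours
                g[i][i] += g_lateral
                g[i][j] -= g_lateral
    rise = solve(g, list(power_w))
    return [t_ambient_k + dt for dt in rise]


if __name__ == "__main__":
    # Three hypothetical router blocks; the middle one dissipates the most power.
    temps = block_temperatures([0.5, 2.0, 0.5], g_vertical=1.0, g_lateral=0.5, t_ambient_k=318.15)
    print([round(t, 2) for t in temps])
```

The high-power block ends up hottest, and lateral conduction also warms its neighbours, which is the mechanism behind the spatial hotspots discussed in this chapter.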

The HotSpot model has been integrated with the thermal-aware floorplanning tool *hotfloorplan*, which was developed to reduce heat dissipation from the design. The *hotfloorplan* tool uses an objective function based on area and power to generate a floorplan. Figure 3.4 shows the input and output files of the *hotfloorplan* tool [81].

The HotSpot model is used to explore the floorplan using the average power and the floorplan description of the design. We used the default HotSpot configuration for steady-state temperature and thermal profile generation. The models used for simulation are the grid and block level models. The main "hotspot" tool is used to create the thermal profile of the design: the temperature traces are mapped onto the floorplan of the design, and the resulting thermal profile represents the temperature differences across the components of the design. The block diagram in Figure 3.5 shows the generation of the temperature profile using the default HotSpot configuration file hotspot.config.

### 3.1.3.2 NoC Thermal Analysis

We consider a mesh topology with five rows and five columns for our experiments. The essential inputs for developing the mesh floorplan are the mesh floorplan description file and the average power of each router. The mesh floorplan description file contains the connectivity information, i.e., the connections between routers, the area of each router, and the minimum and maximum aspect ratios. The area and power of each router are obtained through the ORION NoC power models, so the created floorplan is power-aware. The generated floorplan file (.flp), the per-cycle power trace of each router (mesh.ptrace) and the HotSpot configuration file are used to create the mesh temperature traces (mesh.ttrace). These temperature traces are spatially mapped onto the mesh floorplan to generate the thermal profile of the mesh, which shows the temperature differences between routers. The most heavily used router is the thermal hotspot and appears in red in the thermal profile.
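The floorplan and power-trace inputs described above are plain text. The Python sketch below generates them for an R × C mesh of equally sized routers, assuming HotSpot's usual formats: a `.flp` line per unit giving name, width, height, left-x and bottom-y in metres, and a `.ptrace` file whose first line lists unit names followed by one line of powers (W) per interval. The unit names, router dimensions and power values are hypothetical, and the format should be checked against the HotSpot release in use.

```python
from typing import List, Sequence


def router_names(rows: int, cols: int) -> List[str]:
    """Hypothetical unit names, one per router, in row-major order."""
    return [f"router_{r}_{c}" for r in range(rows) for c in range(cols)]


def mesh_flp(rows: int, cols: int, width_m: float, height_m: float) -> str:
    """Floorplan text: name  width  height  left-x  bottom-y (metres)."""
    lines = []
    for r in range(rows):
        for c in range(cols):
            left = c * width_m
            bottom = (rows - 1 - r) * height_m  # row 0 is placed at the top
            lines.append(f"router_{r}_{c}\t{width_m:.6f}\t{height_m:.6f}\t{left:.6f}\t{bottom:.6f}")
    return "\n".join(lines) + "\n"


def mesh_ptrace(names: Sequence[str], samples: Sequence[Sequence[float]]) -> str:
    """Power trace text: header of unit names, then one row of watts per interval."""
    out = ["\t".join(names)]
    for sample in samples:
        if len(sample) != len(names):
            raise ValueError("each sample needs one power value per unit")
        out.append("\t".join(f"{p:.6f}" for p in sample))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    names = router_names(5, 5)
    flp = mesh_flp(5, 5, 1e-3, 1e-3)             # hypothetical 1 mm x 1 mm routers
    ptrace = mesh_ptrace(names, [[0.05] * 25])   # hypothetical 50 mW per router
    print(flp.splitlines()[0])
    print(ptrace.splitlines()[0].split("\t")[:3])
```

In practice the power rows would come from the NIRGAM-ORION power traces rather than a constant value.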



FIGURE 3.5: Hotspot Generation(HotSpot tool)

#### 3.1.3.3 Router Micro-architecture Thermal Analysis

A four-stage pipelined router architecture comprises input channels (virtual channel buffers), routing computation, virtual channel and switch allocation units, arbiters and a crossbar. Synthesis and in-depth analysis show that the virtual channel buffers consume more than 40% of the area and power of a router. The virtual channel allocation (VCA), switch allocation (SA) and crossbar units determine the critical path of the pipelined router. Most of the switching activity occurs in the crossbar unit, so it is a significant source of dynamic power. The complexity of the allocation and arbiter units grows with the number of VCs and the buffer depth.

The router floorplan description file *router\_floorplan.desc* contains the area and aspect ratio of each component. The average power, power traces and area of each component are obtained through the *NIRGAM-ORION* integration. The router floorplan *router.flp* is created using the *hotfloorplan* tool. The router power trace file *router.ptrace*, the router floorplan *router.flp* and the *hotspot.config* file are used to create the router temperature traces *router.ttrace*. The thermal profile of the router shows that the VCBs (Virtual Channel Buffers) are thermal hotspots, as they consume more power and generate more heat. Buffer read and write activities generate more dynamic power, which in turn results in an increase in temperature.



FIGURE 3.6: Router Floorplan

### 3.1.4 Reliability Modeling

HotSpot thermal profiles are used for reliability analysis. The continuous downscaling of CMOS technologies has made thermal issues, process variation, lifetime failures, etc. quite significant. These problems may lead to lifetime failures such as Time Dependent Dielectric Breakdown (TDDB), Negative Bias Temperature Instability (NBTI), Electromigration (EM), Stress Migration and Thermal Cycling. These are the leading causes of reliability degradation, so it is essential to estimate reliability and find ways to enhance it. In this system-level work, we estimate reliability in terms of the Mean Time to Failure (MTTF), which is the estimated lifetime of the system in good condition.

NoC reliability is calculated using the Reliability Estimation Tool (REST) [67]. This tool evaluates reliability based on Monte Carlo simulations and uses the Weibull distribution to generate time-to-failure instances. Previous work on reliability estimation employed the Sum of Failure Rates (SOFR) model, which assumes a constant failure rate; this is unlikely in real scenarios. The design flow of reliability estimation is shown in Figure 3.7.
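For contrast with the Monte Carlo approach, the SOFR model reduces to a one-line formula: component failure rates $1/MTTF_i$ add up, and the system MTTF is the reciprocal of the sum. The Python sketch below implements that formula; the 25 router MTTF values in the example are hypothetical.

```python
from typing import Iterable


def sofr_mttf(component_mttfs: Iterable[float]) -> float:
    """System MTTF under SOFR: the constant failure rates 1/MTTF of all
    components are summed, and the system MTTF is the reciprocal."""
    total_rate = sum(1.0 / m for m in component_mttfs)
    return 1.0 / total_rate


if __name__ == "__main__":
    # Hypothetical 5 x 5 mesh with every router at a calibrated MTTF of 30.
    print(round(sofr_mttf([30.0] * 25), 3))
```

The formula is exact only for exponential (constant-rate) lifetimes. With Weibull lifetimes and a shape parameter other than 1, the failure rate changes over time and this closed form no longer applies, which is why sampling-based estimation is used instead.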



FIGURE 3.7: Reliability Estimation (REST Tool)

# 3.2 Experimental Set-up and Result Analysis

We used the proposed HiPER-NIRGAM framework as our experimental set-up to find the impact of power and temperature on reliability at system level. NIRGAM is integrated with the ORION 2.0 and 3.0 NoC power modeling tools, and our experiments use both versions for technology nodes of 65 nm and below. These simulators provide power traces and values based on the NoC architecture configured in NIRGAM.

The NIRGAM-ORION integration is configured according to the power-performance configuration file listed in Table 3.3, which contains the technology node (nm), flit size, VC count and input buffer capacity. Our experiments use the parameter values given in Table 3.3. The switching activity depends on the traffic behavior, the target simulation cycles and the operating frequency of the NoC architecture.

| Parameter Name                | Value                 |
|-------------------------------|-----------------------|
| Topology                      | Mesh                  |
| Number of Rows                | 5                     |
| Number of Columns             | 5                     |
| Routing Algorithm             | XY                    |
| Number of simulation cycles   | 15000                 |
| Warm up Cycles                | 100                   |
| Number of Target clock cycles | 14900                 |
| Clock Frequency               | $1 \mathrm{GHz}$      |
| Application Traffic           | CBR                   |
| Flit Size                     | 5 Byte                |
| Packet Size                   | 8 Byte                |
| Number of VCs                 | 6                     |
| Buffer Size                   | 32 Byte               |
| Traffic Load                  | 100%                  |
| Flit Interval                 | 2 Clock Cycles        |

TABLE 3.3: NIRGAM NoC Architectural Configuration Parameters Taken for the Experiment

#### 3.2.1 Power Results and Analysis

We have calculated the power of the NoC architecture with ORION 2.0 and 3.0. Our NoC power experiments use LVT (low $V_{th}$, high-performance) transistor cells, whose supply voltage is 1.2 V. We vary the mesh size while keeping the other architectural parameters the same for both versions of ORION; the topology ranges from a $2 \times 2$ to a $5 \times 5$ 2D mesh. We have generated the static, dynamic and total power for 65 nm and 45 nm technology using the NIRGAM parameters given in Table 3.3.

As stated earlier, we have chosen ORION 3.0 as it is more accurate in power and area estimation [115]. Here we present results of ORION 2.0 to illustrate this point. As can be seen from Figure 3.8 and Figure 3.9, ORION 2.0 overestimates the power consumption. In subsequent studies, we use results from the ORION 3.0 simulator only.

Figure 3.8 and Figure 3.9 compare ORION 2.0 and ORION 3.0 across topologies and technology nodes. NoC power increases with mesh size: the $5 \times 5$ topology has the maximum power and the $2 \times 2$ topology the minimum. This is due to the increase in the number of routers in the mesh, i.e., more hardware resources in the NoC. Figure 3.8 shows that, for ORION 2.0, the power difference between these topologies is very small, even as the topology size increases. Technology-based power variations are also



FIGURE 3.8: Topology Power Based on ORION 2.0



FIGURE 3.9: Topology Power Based on ORION 3.0

not accurate: at the lower voltage level of 45 nm, ORION 2.0 estimates more power than at 65 nm. These results show that ORION 2.0 does not model power accurately.

Figure 3.9 shows the power estimation of the NoC for different technology nodes and mesh sizes, with the following measurable outcomes.

- Power estimated by ORION 3.0 for the same mesh size is lower than that estimated by ORION 2.0. This is due to the tristate crossbar model and the active-instance-based calculation in ORION 3.0.

- ORION 3.0 shows a significant difference in power as mesh size increases, which is not reflected in ORION 2.0.
- ORION 3.0 correctly reflects technology-based power estimation: at the lower operating voltage of 45 nm technology, it shows less power than at 65 nm. The opposite trend appears in ORION 2.0.

Our power estimation analysis shows that ORION 3.0 is more accurate than ORION 2.0, in agreement with the ORION 3.0 literature [115]. For later experiments, we have chosen ORION 3.0 for estimating the power consumption of the NoC.



FIGURE 3.10: Power Models Comparative analysis based on Flit Size

Figure 3.10, Figure 3.11 and Figure 3.12 show that power increases with flit size, buffer size and number of VCs, respectively, while the other parameters are held constant. An increase in any of these parameters increases the memory size and hence the power consumption. The physical input buffer size depends on the flit size and the number of VCs, and grows as either of them increases; a larger buffer implies more power consumption.
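The dependence of buffer storage on these parameters is a simple product, as the Python sketch below shows for hypothetical settings (6 VCs, 4-flit-deep buffers, 5-byte flits). Treating buffer power as growing with the amount of storage is a first-order assumption for illustration, not a result taken from ORION.

```python
def buffer_bits(num_vcs: int, depth_flits: int, flit_bytes: int) -> int:
    """Total input-buffer storage of one port, in bits."""
    return num_vcs * depth_flits * flit_bytes * 8


def relative_storage(base: tuple, scaled: tuple) -> float:
    """Storage ratio between two (VCs, depth in flits, flit bytes) settings."""
    return buffer_bits(*scaled) / buffer_bits(*base)


if __name__ == "__main__":
    base = (6, 4, 5)  # hypothetical: 6 VCs, 4 flits deep, 5-byte flits
    for label, cfg in [("2x flit size", (6, 4, 10)),
                       ("2x buffer depth", (6, 8, 5)),
                       ("2x VCs", (12, 4, 5))]:
        print(label, relative_storage(base, cfg))
```

Doubling any one of the three parameters doubles the storage, which is consistent with the monotonic trends in the three figures.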

### 3.2.2 Thermal Hotspot Results

The power trace and average power of a given NoC configuration have a major impact on the thermal profile, as mentioned in the framework simulation. We use



FIGURE 3.11: Power Models Comparative analysis based on Buffer Size



FIGURE 3.12: Power Models Comparative analysis based on Number of Virtual Channels

different configurations to analyze local thermal hotspots caused by buffer size and frequency. We set up our experiments with buffer sizes of 16, 32 and 64 bytes per VC at different frequencies and obtained the temperature variations across the chip. HotSpot configuration parameters are specified in Table 3.4.

Configuration 1: The 2D mesh dimensions are 5 rows and 5 columns, x-y routing,

| Parameter Name                                 | Value                  |
|------------------------------------------------|------------------------|
| Chip Configurations                            |                        |
| Chip thickness (m)                             | 0.00015                |
| Silicon thermal conductivity (W/(m·K))         | 100.0                  |
| Silicon specific heat (J/(m³·K))               | $1.75 \times 10^{6}$   |
| Temperature threshold for DTM (K)              | 354.95                 |
| Heat Sink Specifications                       |                        |
| Convection capacitance (J/K)                   | 140.4                  |
| Convection resistance (K/W)                    | 0.1                    |
| Heat sink side (m)                             | 0.06                   |
| Heat sink thermal conductivity (W/(m·K))       | 400.0                  |
| Heat sink specific heat (J/(m³·K))             | $3.55 \times 10^{6}$   |
| Heat Spreader Specifications                   |                        |
| Spreader side (m)                              | 140.4                  |
| Spreader thickness (m)                         | 0.001                  |
| Heat spreader thermal conductivity (W/(m·K))   | 400.0                  |
| Heat spreader specific heat (J/(m³·K))         | $3.55 \times 10^{6}$   |
| Interface Material Specifications              |                        |
| Interface material thickness (m)               | $2.0 \times 10^{-5}$   |
| Interface material thermal conductivity (W/(m·K)) | 4.0                 |
| Interface material specific heat (J/(m³·K))    | $4.0 \times 10^{6}$    |

TABLE 3.4: Hotspot Configuration Parameters

constant-bit-rate (CBR) traffic, a flit size of 10 bytes, a frequency of 1 GHz, 10 active nodes with random destinations, and a 22 nm technology node. Figure 3.13 shows its thermal profile. Routers [1][2] and [2][2] are the thermal hotspot nodes (shown in red), with a temperature of 321.37 K, higher than the other nodes. This is due to the greater activity on these particular routers. A closer look at the thermal profile shows that most of the red and yellow regions lie along the x and y directions, which is a consequence of the XY routing logic.



FIGURE 3.13: Hotspot Profile (22 nm and 1 GHz frequency)

Configuration 2: The 2D mesh dimensions are 5 rows and 5 columns, x-y routing,

CBR traffic, 10 active nodes, random destinations, 10000 simulation cycles, a flit size of 10 bytes, a frequency of 2 GHz and a 32 nm technology node. The min-max temperature difference rises with buffer size: it is 0.66 K, 0.88 K and 1.34 K for buffer sizes of 16, 32 and 64 bytes, respectively, with the other parameters kept the same. Figure 3.14 shows the thermal profile for a buffer size of 64 bytes at 2 GHz.
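The two quantities used in this analysis, the hottest router and the min-max temperature spread, are easy to extract once per-router steady-state temperatures are available. The Python sketch below assumes the temperatures have already been parsed into a dictionary (parsing is omitted because the exact output layout depends on the HotSpot version and options); the router names and temperatures in the example are hypothetical.

```python
from typing import Dict, Tuple


def hotspot_summary(temps_k: Dict[str, float]) -> Tuple[str, float, float]:
    """Return (hottest unit, its temperature, max - min spread), in kelvin."""
    hottest = max(temps_k, key=temps_k.get)
    spread = max(temps_k.values()) - min(temps_k.values())
    return hottest, temps_k[hottest], spread


if __name__ == "__main__":
    # Hypothetical steady-state temperatures for four routers.
    temps = {"router_0_0": 320.10, "router_1_2": 321.37,
             "router_2_2": 321.30, "router_4_4": 320.05}
    name, t_max, spread = hotspot_summary(temps)
    print(name, t_max, round(spread, 2))
```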



FIGURE 3.14: Hotspot Profile(32 nm, 2GHz, buffer Size=64 Bytes)

Configuration 1 and Configuration 2 are implemented at 22 nm and 32 nm, respectively. Comparing the two, Configuration 2 has a higher frequency, while the remaining NoC parameters are the same. The impact of technology is that Configuration 1, at 22 nm and a lower frequency (1 GHz), has a min-max temperature variation of 2.12 K, which is larger than that of Configuration 2 at 2 GHz (1.34 K). This is due to the increase in power density as technology is scaled down.

**Configuration 3:** The 2D mesh dimensions are 5 rows and 5 columns, with 32 nm technology, XY routing, CBR traffic, 10 active nodes, random destinations, 10000 simulation cycles and a flit size of about 10 bytes. The local hotspot intensifies as buffer size and frequency increase. The temperature difference is approximately the same for a buffer size of 64 bytes at 2 GHz and a buffer size of 32 bytes at 4 GHz. Both frequency and buffer size are dominant parameters in thermal-aware NoC design. Figure 3.15 shows the thermal profile for a buffer size of 32 bytes at 4 GHz.

The min-max temperature results in Figure 3.16 show that any increase in buffer size increases the local thermal hotspot in a NoC chip. Increases in buffer size at higher frequencies (1 GHz to 4 GHz) have a greater impact on the size of local thermal



FIGURE 3.15: Hotspot Profile(32 nm and 4 GHz , buffer size 32 Bytes)



FIGURE 3.16: Min-Max temperature at different buffer size with various frequencies

hotspot creation in a chip. The hotspot grows because of the change in dynamic power. All the results shown in Figure 3.16 are obtained at 32 nm technology.

### 3.2.3 Reliability Results Analysis

A localized change in temperature creates a local hotspot, which has a serious impact on NoC reliability. We have calculated reliability based on the TDDB and NBTI lifetime failure mechanisms. The simulation framework employs Equations 2.4 and 2.5 in the reliability estimation tool [67] and calculates their combined effect as a relative MTTF value. The REST tool calibrates the MTTF of each failure mechanism to a standard MTTF value of 30, i.e., 30 is taken as the MTTF of a standard 1 mm × 1 mm block at the default temperature. This assumes that the MTTF of all failure models is the same at that reference point (ceteris paribus), which is most likely not true. The argument that best addresses this concern is that REST always compares MTTFs relative to one another rather than working with absolute values.

For NBTI calibration, we have used Equation 3.1.

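$$NBTI_{\text{calibration}} = \frac{MTTF_{\text{std}}}{MTTF_{\text{NBTI}}\left(T_{\text{std}}\right)} \tag{3.1}$$

where $MTTF_{\text{NBTI}}(T_{\text{std}})$ is the NBTI MTTF calculated at the standard temperature.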

All subsequent figures show MTTF values calibrated with respect to the default value.
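The calibration in Equation 3.1 is a single scale factor: it makes the raw model return the standard value of 30 at the standard temperature, so that every other temperature is reported relative to that reference. The Python sketch below shows this mechanism with a hypothetical Arrhenius-type stand-in for the raw model; it is not the NBTI model of Equations 2.4 and 2.5, and the standard temperature and activation energy are assumed example values.

```python
import math
from typing import Callable

MTTF_STD = 30.0  # standard MTTF value used by REST for calibration


def calibration_factor(raw_mttf: Callable[[float], float], t_std_k: float) -> float:
    """Equation 3.1: scale so that the model returns MTTF_STD at T_std."""
    return MTTF_STD / raw_mttf(t_std_k)


def calibrated_mttf(raw_mttf: Callable[[float], float], t_k: float, factor: float) -> float:
    """Raw model output rescaled by the calibration factor."""
    return factor * raw_mttf(t_k)


def placeholder_raw_mttf(t_k: float, ea_ev: float = 0.5) -> float:
    """Hypothetical Arrhenius-type stand-in; NOT the NBTI model used by REST."""
    k_b_ev = 8.617333e-5  # Boltzmann constant in eV/K
    return math.exp(ea_ev / (k_b_ev * t_k))


if __name__ == "__main__":
    t_std = 345.0  # hypothetical standard temperature in kelvin
    k = calibration_factor(placeholder_raw_mttf, t_std)
    for t in (330.0, 345.0, 360.0):
        print(t, round(calibrated_mttf(placeholder_raw_mttf, t, k), 2))
```

Because only the ratio to the reference matters, the absolute scale of the raw model cancels out, which matches the argument that REST compares MTTFs rather than using absolute values.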

Reliability analysis has been done with the REST tool, which uses Monte Carlo simulation based on the MIN-MAX theorem for accurate MTTF calculation of the NoC router. We have run the Monte Carlo simulation for $10^5$ iterations. The Weibull distribution has been used to generate the time-to-failure instances, with shape parameter $\beta = 1.64$.
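The Monte Carlo step can be sketched in Python as follows, under the simplifying assumption that the NoC fails at the first router failure (a series system): each iteration draws one Weibull time to failure per router and keeps the minimum, and the system MTTF is the average of these minima. The Weibull scale is derived from each router's MTTF via $\alpha = MTTF/\Gamma(1 + 1/\beta)$, and sampling uses Python's `random.Random.weibullvariate(alpha, beta)`. This is an illustrative sketch, not REST's implementation.

```python
import math
import random
from typing import Sequence


def weibull_scale(mttf: float, beta: float) -> float:
    """Scale parameter alpha of a Weibull distribution whose mean is mttf."""
    return mttf / math.gamma(1.0 + 1.0 / beta)


def mc_system_mttf(component_mttfs: Sequence[float], beta: float = 1.64,
                   iterations: int = 100_000, seed: int = 1) -> float:
    """Average over iterations of the earliest component failure time."""
    rng = random.Random(seed)
    scales = [weibull_scale(m, beta) for m in component_mttfs]
    total = 0.0
    for _ in range(iterations):
        # Series system: the first router failure ends the system lifetime.
        total += min(rng.weibullvariate(a, beta) for a in scales)
    return total / iterations


if __name__ == "__main__":
    # Hypothetical 5 x 5 mesh with every router calibrated to MTTF = 30.
    estimate = mc_system_mttf([30.0] * 25, iterations=20_000)
    # For identical Weibull components the exact answer is MTTF * n**(-1/beta).
    print(round(estimate, 2), round(30.0 * 25 ** (-1 / 1.64), 2))
```

The closed-form check in the example holds because the minimum of $n$ identical Weibull variables with shape $\beta$ is again Weibull with scale $\alpha n^{-1/\beta}$; with unequal router MTTFs, as produced by a non-uniform thermal profile, only the sampling approach applies.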

Figure 3.18 shows that an increase in buffer size and frequency reduces the MTTF value, i.e., decreases the reliability of a NoC. Figure 3.16 shows the maximum temperature fluctuation at higher frequencies and buffer sizes, and this temperature variation is reflected in the decrease in MTTF, i.e., in reliability. The results show that choosing an appropriate buffer size has a major impact on the thermal-aware reliability design of a NoC.



FIGURE 3.17: MTTF values of Different Topology Size and nm Technology



FIGURE 3.18: MTTF values with different Buffer Size

We examined the impact of different frequencies at the same buffer size, with the other parameters at the default NIRGAM and HotSpot configuration. The results show that the MTTF value decreases with increasing frequency, i.e., chip reliability is reduced because of nonuniform variation in temperature. The power density variation caused by dynamic power consumption leads to a change in temperature. Figure 3.19 shows the impact of frequency for the two power models; in both, higher frequency decreases NoC reliability.

Experimental analysis exhibits the following outcomes:

• MTTF values at 65 nm technology are higher than those at 45 nm technology for different NoC sizes (refer to Figure 3.17). As the technology node shrinks, the



FIGURE 3.19: MTTF values at different frequencies

NoC reliability decreases as well, as shown by the decrease in MTTF values in Figure 3.17.

• An increase in mesh size (more nodes in the NoC within the same chip area) implies an increase in the power density of the chip and, hence, a decrease in the reliability of the NoC.

The HiPER-NIRGAM tool provides performance, power, thermal and reliability estimation for 2D NoCs. The results indicate that buffers are the most power-hungry component of a NoC router. Since higher power consumption increases heat dissipation and temperature, which in turn affect reliability, any solution aimed at enhancing reliability needs to reduce the peak power consumption. Our proposal therefore targets buffers, as they are a major source of power consumption; reducing buffer power is a significant factor that may provide a reasonable improvement in reliability.

## 3.3 Inferences

In this chapter we presented our analysis framework *HiPER-NIRGAM*, a toolchain that integrates NoC performance, power, temperature and reliability estimation.

The ORION [111, 114] power models are integrated to calculate the average power, power traces and area of NoC components at nanometer technology nodes. HotSpot [2] provides the thermal profile in terms of temperature variations across an IC. The network traffic (CBR, multimedia, bursty, hotspot traffic), buffer size, frequency and technology node of the design play a major role in heat dissipation. The thermal model of heat dissipation can be expressed in terms of thermal capacitance (J/K), temperature difference (K), thermal resistance (K/W) and heat transfer rate (W).
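The four quantities listed above combine in the simplest lumped model: a single thermal node with resistance $R$ and capacitance $C$ driven by a power step $P$ rises by $\Delta T(t) = P R \left(1 - e^{-t/RC}\right)$. The Python sketch below evaluates this using the convection resistance (0.1 K/W) and capacitance (140.4 J/K) from Table 3.4, giving a time constant of about 14 s. The 50 W power value is hypothetical, and a single node is a deliberate simplification of HotSpot's multi-layer network.

```python
import math

R_CONVEC = 0.1    # K/W, convection resistance (Table 3.4)
C_CONVEC = 140.4  # J/K, convection capacitance (Table 3.4)


def temperature_rise(power_w: float, t_s: float,
                     r_k_per_w: float = R_CONVEC, c_j_per_k: float = C_CONVEC) -> float:
    """Rise above ambient (K) of one lumped RC node after a power step."""
    tau = r_k_per_w * c_j_per_k  # thermal time constant in seconds
    return power_w * r_k_per_w * (1.0 - math.exp(-t_s / tau))


if __name__ == "__main__":
    p = 50.0  # hypothetical total chip power in watts
    for t in (1.0, 14.04, 60.0, 600.0):
        print(f"t = {t:6.2f} s -> dT = {temperature_rise(p, t):.2f} K")
```

At steady state the rise is simply $P R$, so halving the power halves the temperature rise in this model.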

Figure 3.20 shows the thermal profile of a $4 \times 4$ 2D mesh NoC, in which router[0][1], router[1][1] and router[2][1] are hotspot nodes for the specified traffic application, active source-destination pairs, frequency, buffer depth and technology node. Figure 3.21 shows the "Thermal Failure Profile" of the same $4 \times 4$ 2D mesh NoC, which shows that the hotspot nodes have the maximum % failure rate.



FIGURE 3.20: 2D Mesh Thermal Profile

FIGURE 3.21: Thermal Hotspot Failure Profile

An analysis of the thermal profile of our router micro-architecture confirms that virtual channel buffers are the primary hotspot component in the router. The power of the buffers depends on the number of reads and writes in the buffers, which are called buffer *activities*. Buffer power varies with the number of VCs used, the flit width, the buffer depth, the router frequency and the technology node of the cells. In the next chapter, we present mitigation against reliability degradation, targeting reliability improvement through a reduction in buffer power consumption.

# Chapter 4

# System Level NoC Reliability Enhancement

In the previous chapter, we conducted experiments to identify the components that are responsible for heat dissipation and are most likely to fail or degrade reliability. Buffers were the most heat-dissipating components. In this chapter, we explore how buffer power consumption can be reduced and the impact of this reduction on reliability.

In our "HiPER-NIRGAM" framework, the REST tool has been incorporated for reliability estimation. The tool now includes new Stress Migration (SM) and Thermal Cycling (TC) lifetime failure models. It uses the Monte Carlo method to estimate the MTTF of an SoC based on the chip characteristics (floorplan, voltage, temperature). REST takes at least two inputs to calculate SoC reliability: the floorplan (area information) and the temperature traces of each component.

The initial system-level reliability estimation discussed in the previous chapter led to the following outcomes:

- The node with the maximum power consumption and heat dissipation is a "Hotspot" node.
- The hotspot node has maximum % failure rate.
- Thermal profile depends on floorplanning of components.

- Power of NoC components depends on their micro-architecture and configuration parameters.
- The buffers consume more than 30% power of a router.
- The buffers are hotspot component of a router, hence subject to maximum failure rate.

While devising a mitigation technique to improve reliability, the following possibilities can be considered.

- Reduce the thermal hotspot, i.e., maximum temperature difference to enhance the reliability of NoC.
- Decrease the power consumption of the NoC so that hotspots can be avoided.
- Avoid the traffic congestion to reduce dynamic power.
- Explore low-power techniques to reduce power consumption and hotspot in NoC.
- Buffers are the primary factor decreasing NoC reliability; aim to reduce the power consumption of buffers.

We have proposed the following *contributions* at system level to enhance the reliability of NoC.

- 1. Improving NoC reliability using multi-$V_T$ routers.
- 2. Enhancing router reliability using HVT (High threshold Voltage) buffers.
- 3. Reducing hotspots and improving reliability through power-aware floorplanning.

# 4.1 Proposed Architecture and Techniques for Improving NoC Reliability

Our experimental analysis using the framework showed that power is the major challenge in enhancing NoC reliability. We propose two architectures:

- Using HVT (High Threshold Voltage) router instead of LVT (Low Threshold Voltage) router.
- Replacing the LVT buffer by HVT buffers in router micro-architecture.



Leakage power is the dominant component of power dissipation. LVT cells have a low threshold voltage and high speed, but more leakage power. HVT cells have a higher threshold voltage and lower leakage power, but operate at a lower frequency. Since buffers consume more than 30% of the router power, we replace the LVT buffers with lower-leakage HVT buffers. Our experimental analysis shows a significant improvement in router reliability: this replacement reduces router power and temperature, and hence reduces hotspot creation. Consequently, the reliability of the router is enhanced. We call this mechanism "Fine-Grained" reliability enhancement.

Further, we extend this concept to the routers of the mesh network by designing the entire router with HVT cells. This router is called the "HVT Router", and the technique is called "Coarse-Grained" reliability enhancement. The HVT router reduces the overall power consumption of the NoC. This power advantage reduces the chip temperature and hence significantly improves the mean time to failure. Both the fine-grained and coarse-grained HVT techniques enhance NoC reliability.

# 4.2 Experimental Setup, Results and Analysis

Following [67], we use Monte Carlo (MC) simulation and the Weibull distribution scheme [117] to estimate the MTTF of the NoC micro-architecture and its components. The MC simulation and distribution schemes are described in detail in Appendix D.

Random numbers are generated during the MC simulation. Our modified REST tool runs $10^5$ iterations with the min-max algorithm to obtain the minimum MTTF value corresponding to the NoC design.

We have modified the REST tool to employ the Weibull distribution instead of the lognormal distribution. We have also added the TC and SM lifetime failure models alongside the already available NBTI and HCI models. We have taken the shape parameter $\beta = 1.64$ as per [67, 83].

| Parameter Name               | Value                 |
|------------------------------|-----------------------|
| Number of Iterations         | 100,000               |
| Shape parameter ($\beta$)    | 1.64                  |
| Earlier REST Tool            | NBTI, TDDB            |
| New Modified REST Tool       | NBTI, TDDB, TC, SM    |
| Earlier Distribution Scheme  | Lognormal             |
| Modified Distribution Scheme | Lognormal and Weibull |

TABLE 4.1: Reliability Estimation Configuration Parameters

### 4.2.1 Generation of Samples From Lifetime Distribution

Every iteration requires a random failure instance for each router, and a distribution scheme is used to generate these failure instances. The lognormal and Weibull distributions are the most popular lifetime distribution schemes. Equation 4.1 and Equation 4.2 are used to generate samples from the Weibull distribution [67].

$$f(x) = e^{-\left(\frac{x}{\alpha}\right)^{\beta}} \tag{4.1}$$

$$x_{samples} = \alpha \left[-\ln(1-\mu)\right]^{\frac{1}{\beta}} \tag{4.2}$$

Random numbers are generated in the interval $[0,1]$, with $\mu = \operatorname{rand}(0,1)$ [67]. Here, $\alpha$ and $\beta$ are the scale and shape parameters, respectively. Equation 4.3 shows how the MTTF is related to the scale parameter.

$$\alpha = \frac{MTTF}{\Gamma\left(1 + \frac{1}{\beta}\right)} \tag{4.3}$$
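Equations 4.2 and 4.3 together give a complete inverse-transform sampler, sketched below in Python: the scale follows from the MTTF through the Weibull mean $\alpha\,\Gamma(1 + 1/\beta)$, and each sample maps a uniform $\mu$ through the inverted survival function. The sample mean should come out close to the chosen MTTF; the MTTF of 30, $\beta = 1.64$ and the seed are example values, and this is an illustrative sketch rather than REST's code.

```python
import math
import random


def weibull_scale(mttf: float, beta: float) -> float:
    """Equation 4.3: scale parameter from the MTTF (mean of the Weibull)."""
    return mttf / math.gamma(1.0 + 1.0 / beta)


def weibull_sample(alpha: float, beta: float, rng: random.Random) -> float:
    """Equation 4.2: inverse-transform sample with mu drawn from [0, 1)."""
    mu = rng.random()
    return alpha * (-math.log(1.0 - mu)) ** (1.0 / beta)


if __name__ == "__main__":
    rng = random.Random(42)
    beta, mttf = 1.64, 30.0
    alpha = weibull_scale(mttf, beta)
    samples = [weibull_sample(alpha, beta, rng) for _ in range(100_000)]
    print(round(alpha, 3), round(sum(samples) / len(samples), 2))  # mean close to 30
```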

Equation 4.4 expresses the relation between MTTF and the scale parameter for the lognormal distribution.

$$\alpha = \ln(MTTF) - \frac{\beta^2}{2} \tag{4.4}$$

Equation 4.5 gives the random samples for the lognormal distribution as per [69].

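$$x_{samples} = e^{\left[\alpha - \beta\sqrt{2}\,\operatorname{erf}^{-1}(2\mu - 1)\right]} \tag{4.5}$$

Here, $\alpha$ and $\beta$ are the location (log-mean) and shape (log-standard-deviation) parameters of the lognormal distribution, respectively.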
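A lognormal counterpart can be sketched in Python in the same way, treating $\alpha$ as the log-domain location given by Equation 4.4. The term $\sqrt{2}\,\operatorname{erf}^{-1}(2\mu - 1)$ is exactly the standard normal quantile of $\mu$, so the sketch computes it with `statistics.NormalDist().inv_cdf`. The sign in front of the quantile does not change the distribution because $\mu$ is uniform. The MTTF of 30, $\beta = 0.5$ and the seed are example values, not parameters taken from REST.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def lognormal_location(mttf: float, beta: float) -> float:
    """Equation 4.4: log-domain location so that the mean equals the MTTF."""
    return math.log(mttf) - beta ** 2 / 2.0


def lognormal_sample(alpha: float, beta: float, rng: random.Random) -> float:
    """Inverse-transform sample; sqrt(2) * erfinv(2*mu - 1) equals the
    standard normal quantile of mu, computed here with NormalDist.inv_cdf."""
    mu = rng.random()
    while mu == 0.0:  # inv_cdf is undefined at 0
        mu = rng.random()
    return math.exp(alpha - beta * _STD_NORMAL.inv_cdf(mu))


if __name__ == "__main__":
    rng = random.Random(7)
    beta, mttf = 0.5, 30.0
    alpha = lognormal_location(mttf, beta)
    samples = [lognormal_sample(alpha, beta, rng) for _ in range(100_000)]
    print(round(sum(samples) / len(samples), 2))  # close to 30
```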

### 4.2.2 Modified Reliability Estimation Tool and Reliability Result Analysis

In many-core architectures, the power and performance of the NoC and its components play a significant role in reliability estimation, because temperature-dependent lifetime failure mechanisms affect device reliability. The performance of a transistor is a function of its threshold voltage $V_{th}$: High Threshold Voltage (HVT) transistors operate at lower frequencies than Low Threshold Voltage (LVT) transistors, while LVT cells suffer from higher leakage power than HVT cells.

In the following paragraphs, we present our reliability analysis results for all three cell types (HVT, NVT and LVT). We extracted the parameters of each cell type from the Orion tool; they are listed in Table 4.2.

| Type of cell                         | Supply voltage (V) | Operating Frequency (GHz) |
|--------------------------------------|--------------------|---------------------------|
| HVT (High Threshold Voltage Cell)    | 0.8                | 0.2                       |
| NVT (Nominal Threshold Voltage Cell) | 1.0                | 1.0                       |
| LVT (Low Threshold Voltage Cell)     | 1.2                | 3.0                       |

TABLE 4.2: HVT, NVT and LVT Cell Information

Figure 4.3 compares the power of HVT-, NVT- and LVT-based router NoCs. The LVT-based NoC consumes the most power because it operates at a higher frequency and supply voltage. The HVT-based NoC consumes the least of the three, less than 5% of the power of the LVT-based NoC.
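The "less than 5%" figure is consistent with a first-order check using the supply voltages and frequencies of Table 4.2. The sketch below assumes that dynamic power scales as $V_{dd}^2 f$ with equal load capacitance and switching activity for all three cell types, and it ignores leakage; it is a plausibility check, not a reproduction of the Orion results.

```python
# Supply voltage (V) and operating frequency (GHz) from Table 4.2
CELLS = {"HVT": (0.8, 0.2), "NVT": (1.0, 1.0), "LVT": (1.2, 3.0)}


def relative_dynamic_power(vdd: float, f_ghz: float) -> float:
    """First-order dynamic power ~ Vdd^2 * f, assuming equal C_load and activity."""
    return vdd ** 2 * f_ghz


def ratios_to_lvt() -> dict:
    """Dynamic power of each cell type as a fraction of the LVT value."""
    lvt = relative_dynamic_power(*CELLS["LVT"])
    return {name: relative_dynamic_power(v, f) / lvt for name, (v, f) in CELLS.items()}


if __name__ == "__main__":
    for name, ratio in ratios_to_lvt().items():
        print(f"{name}: {100 * ratio:.2f}% of LVT dynamic power")
```

This prints HVT: 2.96%, NVT: 23.15% and LVT: 100.00%. Since LVT cells also leak more, the full power gap is expected to be at least this large.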



FIGURE 4.3: Power of Different  $V_{th}$  Cell NoC

A non-uniform temperature rise caused by local hotspots significantly affects the reliability of a NoC design. We compare the previously designed lifetime failure models for the NoC (old REST model) with a model that additionally integrates (1) Stress Migration and (2) Thermal Cycle failure models (new REST model).

Figure 4.4 shows how the MTTF changes with buffer size at different frequencies using the old REST model with the NBTI and TDDB failure mechanisms. The MTTF decreases as buffer size and frequency increase; a lower MTTF indicates a higher failure rate. Figure 4.5 shows that the MTTF values estimated with our modified REST model, using all four failure mechanisms, are lower than those in Figure 4.4.



FIGURE 4.4: MTTF Values at Different Buffer Size With NBTI and TDDB Old REST Model



FIGURE 4.5: MTTF Values at Different Buffer Size With NBTI, TDDB, SM and TC

Figure 4.6 compares the MTTF values at different buffer sizes for the old and modified REST models at a constant frequency of 1 GHz. We implemented the NoC routers at various technology nodes for the reliability analysis. The results indicate that the old REST model appears to overestimate the MTTF. The modified REST model estimates reliability more accurately by combining the NBTI, TDDB, SM and TC failure mechanisms with the Min-Max algorithm.
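The way extra failure mechanisms lower the estimate can be illustrated with a small Monte Carlo sketch. It reflects our reading of the Min-Max idea rather than the REST code: in each iteration, a router fails at the earliest of its mechanism failures (MIN over mechanisms), the NoC is treated as a series system that fails with its first router (MIN over routers), and a MAX would only be used for redundant parts, of which none are assumed. All per-mechanism MTTF values are hypothetical.

```python
import math
import random


def weibull_ttf(mttf: float, beta: float, rng: random.Random) -> float:
    """One Weibull time-to-failure sample whose mean is mttf (Eqs. 4.2 and 4.3)."""
    alpha = mttf / math.gamma(1.0 + 1.0 / beta)
    return alpha * (-math.log(1.0 - rng.random())) ** (1.0 / beta)


def monte_carlo_mttf(routers, beta=1.64, iterations=10_000, seed=0):
    """Estimate NoC MTTF from per-router, per-mechanism MTTFs.

    routers: list of dicts, one per router, e.g. {"NBTI": 30.0, "TDDB": 40.0}.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        # MIN over mechanisms: a router fails at its earliest mechanism failure.
        router_ttfs = [min(weibull_ttf(m, beta, rng) for m in mechs.values())
                       for mechs in routers]
        # MIN over routers: the NoC is treated as a series system (no redundancy).
        total += min(router_ttfs)
    return total / iterations


if __name__ == "__main__":
    old_model = [{"NBTI": 30.0, "TDDB": 40.0}] * 4   # hypothetical 2x2 mesh
    new_model = [{"NBTI": 30.0, "TDDB": 40.0, "SM": 50.0, "TC": 60.0}] * 4
    print(f"old: {monte_carlo_mttf(old_model):.2f}, new: {monte_carlo_mttf(new_model):.2f}")
```

Adding SM and TC can only make each iteration's minimum earlier or equal, which is why the four-mechanism estimate is lower than the two-mechanism one.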

FIGURE 4.6: MTTF Values at Different Buffer Size With old and new REST Model at Constant Frequency

Figure 4.7 and Figure 4.8 show the effect of the lognormal and Weibull distribution schemes on the LVT and proposed HVT cell-based router NoCs. From these results, the Weibull distribution scheme appears to be more accurate than the lognormal distribution, and the HVT-based router NoC is more reliable than the LVT-based router NoC.



FIGURE 4.7: Lognormal Distribution



FIGURE 4.8: Weibull Distribution

### 4.2.3 Power-aware Floorplan Based Reliability Enhancement

The thermal modeling of the NoC depends on the NoC floorplan (.flp), power traces (.ptrace), average power (.p) and the HotSpot configuration (hot.config). Figure 4.9 shows the thermal profile of a mesh without a power- and thermal-aware floorplan, while Figure 4.10 shows the same for a power- and thermal-aware floorplan. The hotspot temperature is lower with the power- and thermal-aware floorplan. The HVT cell-based thermal profile, shown in Figure 4.11, has the lowest maximum temperature of the three cases. Since reliability depends on the floorplan and temperature traces, the HVT cell-based router mesh has the highest reliability.
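The floorplan and power-trace inputs named above are plain text files and can be generated programmatically. The sketch below writes a mesh of router tiles in HotSpot's floorplan line format (unit name, width, height, left-x, bottom-y, in metres, with `#` comments) and a power trace (a tab-separated header of unit names followed by one row of power values per interval), as we understand those formats. The unit names `router_r_c`, the 1 mm tile size and the 50 mW per-router power are hypothetical; the `.p` file and `hot.config` are not covered.

```python
def mesh_floorplan(rows: int, cols: int, tile_w: float, tile_h: float) -> str:
    """HotSpot-style .flp text: name, width, height, left-x, bottom-y (metres)."""
    lines = ["# <unit-name>\t<width>\t<height>\t<left-x>\t<bottom-y>"]
    for r in range(rows):
        for c in range(cols):
            lines.append(f"router_{r}_{c}\t{tile_w:.6e}\t{tile_h:.6e}\t"
                         f"{c * tile_w:.6e}\t{r * tile_h:.6e}")
    return "\n".join(lines) + "\n"


def power_trace(units, samples) -> str:
    """HotSpot-style .ptrace text: header of unit names, then one row of watts per interval."""
    rows = ["\t".join(units)]
    rows += ["\t".join(f"{p:.6f}" for p in sample) for sample in samples]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    flp = mesh_floorplan(4, 4, 1e-3, 1e-3)   # hypothetical 1 mm x 1 mm router tiles
    units = [line.split("\t")[0] for line in flp.splitlines() if not line.startswith("#")]
    print(flp)
    print(power_trace(units, [[0.05] * len(units)]))  # hypothetical 50 mW per router
```

A power-aware floorplan would then change the placement (left-x, bottom-y) of high-power units so that they are not adjacent, while the power trace stays the same.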




FIGURE 4.9: Thermal Profile Without a Power-Aware Floorplan



FIGURE 4.10: A Power-Aware Floorplan Thermal Profile



FIGURE 4.11: An HVT Based Buffer Thermal Profile



FIGURE 4.12: MTTF of All Proposed Techniques and Architectures. Here, (A) refers to the LVT router-based NoC, (B) to the HVT router-based NoC, (C) to the power- and thermal-aware floorplanned NoC with LVT routers, and (D) to the NoC based on a power- and thermal-aware floorplan integrated with HVT routers.

Figure 4.12 compares the MTTF values of the proposed techniques and architectures, using the following notation:

- A: LVT-based router mesh
- B: HVT-based router mesh with a simple floorplan
- C: LVT router mesh with a power- and thermal-aware floorplan
- D: HVT router mesh with a power- and thermal-aware floorplan

The results show that the proposed architecture and technique D increases the MTTF by 8.89% compared with B and by 17.65% compared with C, and that the proposed architecture B increases the MTTF by 8.09% compared with C. Hence, combining HVT routers with a power- and thermal-aware floorplan improves the reliability of the NoC.

# 4.3 Conclusions

To increase the accuracy of system-level reliability estimation for NoC architectures, accurate power and thermal modeling is required.

Our proposed HVT cell-based architecture leads to the following inferences:

- Including the SM and TC failure mechanisms provides more accurate MTTF estimation. Incorporating them into the modified REST model used in this thesis gives better estimates than other approaches.
- HVT buffer-based routers are more reliable than LVT buffer-based routers.
- A thermal-aware floorplan can boost NoC reliability without any added hardware (area) cost.
- Combining HVT cell-based routers with a thermal-aware floorplan reduces thermal hotspot creation and hence increases NoC reliability.
- The proposed technique and architecture incur no area overhead.
- With the power-aware floorplan, the temperature difference (thermal hotspot) is reduced from 6.88 to 1.6. The experimental results show that the proposed technique D improves the MTTF by 8.89% compared with technique B and by 17.65% compared with technique C.

# Chapter 5

# RTL-Level: NoC Reliability Enhancement by Reducing Activity Factor

Parallel processing is the computer architect's main hope for enabling high-performance computing on many-core architectures, and the NoC has emerged as the backbone communication architecture for many-core systems. With the rapid scaling of transistor feature sizes, power and its induced effects, together with rising leakage power, pose challenges for IC designers. Because the Network-on-Chip is a subsystem of the many-core architecture, its power consumption affects the temperature and reliability of the overall system. Therefore, low-power NoCs are becoming popular in multi-core architectures.

A simple NoC architecture consists of routers, links between the routers, and a Network Interface (NI) connecting each core to a router. Each router has five input ports: four for the four directions (E, W, N and S) and one for the local core attached to it. The router micro-architecture consists of the virtual channel allocator (VCA), switch allocator (SA), crossbar, input buffers and routing unit. The NoC in the MIT RAW chip consumes more than 28% of the total power budget. The input channel buffers account for more than 20% of the NoC area and power; hence, low-power NoC architectures and techniques are desired for power-efficient many-core systems. In this chapter, we propose a clock gating technique to reduce the power consumption of the input channel buffers.



To determine the power dissipation of each NoC router component, we implemented a 4-stage pipelined NoC router micro-architecture in Verilog and synthesized it with Synopsys DC using 32 nm technology. This synthesis lets us examine the leakage and dynamic power consumed by each router component individually. In our synthesized design, each input port of a router has two virtual channel buffers, and each VC buffer is 4 flits deep. The dynamic and leakage power consumption of the different NoC router components are shown in Figure 5.1 and Figure 5.2, respectively. These results show that buffers are the dominant consumers of both dynamic and leakage power in the NoC router.

As our previous analysis showed, increased power affects reliability: higher power dissipation decreases NoC reliability. Our synthesis results indicate that buffers are responsible for more than 50% of the dynamic power and more than 75% of the leakage power of a router. This observation motivates our primary objective: enhancing router reliability by reducing power consumption in the buffers.

The dynamic power depends on the load capacitance $C_{load}$, frequency $f$, supply voltage $V_{dd}$ and switching activity $\alpha_{SA}$, following the standard relation $P_{dyn} = \alpha_{SA} C_{load} V_{dd}^2 f$. Hot Carrier Injection (HCI) is a wear-out mechanism. The primary source of hot carriers is heating inside the MOSFET channel during circuit operation. These hot carriers gain sufficient kinetic energy to be injected into the gate oxide. As a result, MOSFET characteristics such as the threshold voltage shift, degrading device performance. The HCI effect is observed during dynamic transitions (when current flows through the device).

In the CMOS inverter shown in Figure 5.3, the PMOS transistor suffers HCI degradation while $C_L$ is charging, and the NMOS transistor suffers HCI degradation while $C_L$ discharges to ground. Each transistor therefore degrades due to HCI during half of a switching period.



FIGURE 5.3: CMOS Inverter HCI Effect

Time to Failure (TTF) due to HCI degradation is given by Equation 5.1 [118, 119].

$$TTF_{HCI}(T,\alpha_{SA})\big|_{AC} = A_{HCI} \times \frac{1}{d_g f \alpha_{SA}} \times (I_{sub})^{N'} \times e^{\frac{Ea_{HCI}}{kT}} \tag{5.1}$$

Where  $d_g$  is the transition delay,  $\alpha_{SA}$  is the switching activity, and f is the clock frequency,  $I_{sub}$  is the substrate current under stress  $V_G = V_D$ , T is the run time temperature, k is the Boltzmann's constant, N' is the technology-related exponent and  $A_{HCI}$  is a fitting constant. The equation shows that the TTF due to HCI is inversely related to the switching activity  $\alpha_{SA}$  and frequency f. Frequent switching not only increases the dynamic power consumption, but also speeds up the aging effect.
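The practical consequence of Equation 5.1 for activity reduction can be written as a ratio, so that the unknown fitting constant $A_{HCI}$ cancels. The sketch assumes that $d_g$, $I_{sub}$ and $N'$ are unchanged between the two operating points; $Ea_{HCI}$ is left as a parameter because its value (and sign) is technology-dependent, and the example values are hypothetical. The companion dynamic-power ratio uses $P_{dyn} = \alpha_{SA} C_{load} V_{dd}^2 f$ with $C_{load}$ and $V_{dd}$ held fixed.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def hci_ttf_ratio(alpha_new, alpha_old, f_new, f_old, t_new_k, t_old_k, ea_ev):
    """TTF_new / TTF_old from Eq. 5.1, with A_HCI, d_g, I_sub and N' held fixed."""
    activity_term = (alpha_old * f_old) / (alpha_new * f_new)
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_new_k - 1.0 / t_old_k))
    return activity_term * thermal_term


def dynamic_power_ratio(alpha_new, alpha_old, f_new, f_old):
    """P_new / P_old for P_dyn = alpha_SA * C_load * Vdd^2 * f, with C_load and Vdd fixed."""
    return (alpha_new * f_new) / (alpha_old * f_old)


if __name__ == "__main__":
    # Hypothetical case: halve the switching activity at fixed frequency and temperature.
    print(hci_ttf_ratio(0.25, 0.5, 1.0, 1.0, 350.0, 350.0, ea_ev=0.1))
    print(dynamic_power_ratio(0.25, 0.5, 1.0, 1.0))
```

At constant temperature, halving $\alpha_{SA}$ doubles the HCI time to failure (prints 2.0) and halves dynamic power (prints 0.5), which is the motivation for reducing buffer activity.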

### 5.1 Related Work

Michele Petracca and Luca Carloni [120] have analyzed the effects of clock gating on the semi-custom design of a Network-on-Chip router. They applied clock gating to the FIFO buffers and reduced the activity factor of the input data. Using available tools, they measured the power at different injection rates and congestion probabilities. They also found that, with clock gating, the most power-consuming FIFO buffers remain disabled most of the time, because they no longer sample their inputs on every clock cycle. In their design, clock gating did not introduce any area overhead.

Trong-Yen Lee and Chi-Han Huang [121] proposed the Smart Power Saving (SPS) architecture, which uses clock gating to build a power-optimized design. The SPS architecture achieved lower power consumption and reduced area in the virtual channels of the NoC. They also compared their results with previously proposed architectures.

J. V. Bruch and C. A. Zeferino [122] employed clock gating and a data-encoding technique based on the Bus-Invert approach to improve the energy efficiency of the NoC. They performed SystemC-based simulation and synthesis and reduced the switching activity of the circuit.

Kornaros *et al.* [123] developed a power-efficient MPSoC (Multiprocessor System-on-Chip) architecture with power and thermal sensors at every node, enabling real-time workload prediction. Several power management techniques, such as dynamic frequency scaling, are employed in the architecture to control network power. They also applied dynamic clock management, which regulates the clock frequency of the processors. Using these prediction techniques, they controlled the power and temperature of every active node.

In [124], R. Bondade *et al.* introduced a self-reconfigurable channel data buffering scheme for NoCs. They developed a power-efficient design with higher data throughput, using system-level design approaches to optimize network power. During network congestion, the design applies adaptive flow control by reconfiguring the channel buffers. They achieved up to 58% power savings over conventional NoC designs.

D. Zoni *et al.* [125] controlled temperature at run time using a Dynamic Thermal Management (DTM) approach. They addressed the performance and temperature of NoCs in multi-core architectures at run time rather than through design-time optimizations, and used clock gating to maintain core performance.

Dynamic Voltage and Frequency Scaling (DVFS) is the most frequently used technique for NoC power optimization in multi-core architectures. E. Talpes *et al.* [126] developed a design framework for multiple-clock processors that adapts the frequency to the application running at a given time. They also performed a comparative analysis of different power management strategies, comparing GALS (Globally Asynchronous, Locally Synchronous) processors with Dynamic Voltage Scaling (DVS), and concluded that DVS improves performance.

Huang *et al.* [127] capture end-to-end traffic using a small table in the network interface (NI) of each core. This application-driven traffic pattern table (ATPT) is used to predict the traffic injected into the NoC, and the predicted link utilization is used to adjust the voltage and frequency levels of the circuit. The authors used direct-set DVFS (DS-DVFS), latency-aware DVFS (LA-DVFS) and power-aware DVFS (PA-DVFS) to optimize link power in the network.

D. Atienza and E. Martinez [128] control the power dissipation of the circuit through global temperature monitoring with on-chip thermal sensors. They adjust the voltage and frequency of the processors in the MPSoC at run time according to each processor's requirements, using DVFS to manage the power dissipation of the circuit.

W. Jang *et al.* [129] minimized the power consumption of many-core and NoC architectures using the Voltage-Frequency Island (VFI) technique. They applied various DVFS-aware and VFI-aware control schemes to the NoC and reduced its power consumption.

S. Vrudhula *et al.* [97] optimized NoC power by enabling a system-level DVFS approach. Feng Wang *et al.* [130] proposed the flexible virtual channel concept, using power gating to reduce NoC power. K. Latif *et al.* [131] used a virtual channel sharing scheme to reduce the power of input channel buffers. M. R. Casu and P. Giaccone [132] compared rate-based and delay-based control for DVFS in NoCs to reduce NoC power.

Hang-Sheng Wang *et al.* [111], have used architectural level parameterized power models for different components (FIFO, crossbar, and arbiter) of a router for estimating the NoC power consumption.

A. B. Kahng *et al.* [114] introduced an H-tree clock distribution style and improved its leakage and clock power modeling. A. B. Kahng *et al.* [115] improved the power modeling of NoC routers by calculating power with instance-based modeling. After an extensive literature review, we draw the following inferences and identify several research gaps.

- 1. An increase in power is a primary cause of reduced reliability; hence, low-power architectures and techniques are required.
- 2. System-level contributions work at a higher level of abstraction and are therefore less accurate than RTL-level analysis.
- 3. ORION [111, 114, 115] does not capture all micro-architectural changes of a NoC router implementation, so the power consumption it reports in *HiPER-NIRGAM* cannot be considered sufficiently accurate. Hence, precise power calculation requires an RTL-level implementation.
- 4. All previously published works employ power reduction techniques at the architecture level and do not consider switching activity and/or reliability.
- 5. S. Vrudhula *et al.* [97] used DVFS at the system level, but the authors do not address the actual area and delay overheads of the technique when implemented for a NoC.
- 6. Kim *et al.* [27] used only the VC and SW allocation units for RTL analysis and proposed an exercise mode that is more effective for NBTI than for HCI. A simple and more effective mechanism that reduces the switching activity and dynamic power of the NoC is therefore required.

# 5.2 Clock Gating



FIGURE 5.4: Basic Clock Gating Circuit

Clock gating is a widespread technique for reducing dynamic power dissipation in synchronous circuits. Figure 5.4 shows the basic clock gating circuit. Clock gating stops the clock to portions of the circuit that do not need to change state, since clock switching consumes a significant amount of power. It uses an enable condition as an input to gate the clock. However, it introduces overhead in the implementation of the clock tree network.

Clock gating has no effect on leakage power dissipation, but it is highly effective in controlling circuit power consumption at run time. The gated clock signal is derived by ANDing the clock with the wire enable signal. The latch in the clock gating circuit is necessary to avoid glitches at the output of the AND gate, because it holds the enable stable while the clock is high.
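Why the latch matters can be shown with a small sampled-waveform model. This is an illustrative sketch, not the synthesized circuit: each list element is one time sample, the latch is assumed to be transparent while the clock is low and to hold its value while the clock is high, and the enable waveform is a hypothetical one that changes during the high phase. A pulse shorter than the clock's high phase is a glitch.

```python
def gate_clock(clk_wave, en_wave, use_latch=True):
    """Sample-by-sample model of a clock gate: gclk = clk AND enable.

    With use_latch=True the enable passes through a latch that is transparent
    while clk is low and holds its value while clk is high.
    """
    latched = 0
    gclk = []
    for clk, en in zip(clk_wave, en_wave):
        if clk == 0:
            latched = en            # transparent phase: follow the enable
        enable = latched if use_latch else en
        gclk.append(clk & enable)
    return gclk


def pulse_widths(wave):
    """Lengths (in samples) of each run of 1s in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


if __name__ == "__main__":
    clk = [0, 0, 1, 1] * 3                          # 4 samples per clock period
    en = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1]       # enable changes while clk is high
    print("AND only :", pulse_widths(gate_clock(clk, en, use_latch=False)))
    print("latch+AND:", pulse_widths(gate_clock(clk, en, use_latch=True)))
```

The AND-only gate produces pulses of widths [1, 2, 2], where the 1-sample pulse is a glitch; with the latch, the output is [2], i.e. only a full-width clock pulse after the enable was stable during the low phase.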

# 5.3 Proposed Work: Routing Decision Based Clock Gating On Input Buffers

High-performance computing has opened a new era of interconnection networks. Power and thermal issues arising from technology scaling force researchers to analyze computation (core) power and communication (NoC) power separately. The NoC consumes a significant amount of power in multi-core architectures; consequently, NoC power optimization is receiving increasing attention from design engineers seeking high system performance. In this work, we propose the following clock gating techniques.



FIGURE 5.5: Input Port Clock Gating

- 1. First, we employ a routing-logic-enabled clock gating technique at the input channel buffers, as shown in Figure 5.5. In this technique, a basic clock gating circuit is added to each input channel. The clock and wire enable signals are inputs to a latch, and the latch output and the clock signal are inputs to an AND gate that generates the gated clock for the input channel buffer. The routing logic selects the routing path and asserts one of the five wire enable signals, one per direction port. At any clock instant, only one of these signals is at logic 1 and the others are at logic 0. Only the buffer at the port receiving logic 1 remains connected to the active clock, while the others are disabled. Because disabled input channels have no switching activity, this saves buffer power. Figure 5.6 shows the clock gating circuit at an input channel with VC buffers.
- 2. We also applied clock gating to FPGA slices and Block RAM (BRAM) after deploying our design on a Kintex-7 FPGA board. Our NoC design contains routing-logic-enabled clock gating at the input channel buffers. Here, the routing decision is analyzed to detect source registers that do not contribute to routing in each clock cycle, and clock gating signals suppress their needless switching activity. The number of registers controlled by each intelligent clock gating enable depends on the level of granularity; Xilinx's intelligent clock gating supports a granularity of up to eight registers. Because intelligent clock gating at the slice level does not alter the logic, timing is only minimally affected.
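The one-hot property of the routing-derived enables in step 1 can be sketched as follows. The sketch assumes dimension-ordered X-Y routing (the routing used in the experiments of Section 5.4), a hypothetical port order (L, E, W, N, S) and a coordinate convention in which y grows northward; the text does not specify these details, and the sketch does not model which physical buffer each enable is wired to.

```python
PORTS = ("L", "E", "W", "N", "S")   # hypothetical port order: local + four directions


def xy_output_port(cur, dst):
    """Dimension-ordered X-Y routing: resolve X first, then Y (y assumed to grow northward)."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"


def port_enables(cur, dst):
    """One-hot wire-enable vector: only the port chosen by the routing logic is 1."""
    selected = xy_output_port(cur, dst)
    return {port: int(port == selected) for port in PORTS}


def gated_clocks(clk, enables):
    """Per-port gated clock for one sample: clk AND enable."""
    return {port: clk & en for port, en in enables.items()}


if __name__ == "__main__":
    en = port_enables(cur=(0, 0), dst=(1, 1))
    print(en)                      # only "E" is enabled: X is resolved first
    print(gated_clocks(1, en))
```

Because exactly one enable is 1 at any time, four of the five gated clocks stay at 0, which is where the buffer power saving comes from.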



FIGURE 5.6: Input channel with VC Buffers Clock Gating

Clock gating is mainly applied to the FIFO buffers. The buffers are built from flip-flop-based registers that store incoming flits in the input channel. They are synchronous circuits whose registers are clocked on every cycle, whether or not a write occurs. Consequently, every register adds capacitive load to the clock network, which increases power. Clock gating is therefore applied at the input channel buffers to avoid unnecessary charging and discharging of this capacitance. In this approach, the clock is enabled only when the wire enable signal arrives at the input channel.

## 5.4 Result Analysis

Our experimental NoC router architecture uses five input channels, each consisting of four virtual channels. The virtual channels are used to avoid deadlock and increase router throughput. In our design, the FIFO buffer depth is a multiple of the flit size; for example, if the flit size is 8 bits and the buffer depth is 4 bytes, the depth is 4 times the flit size. The total buffer size per port is the product of the buffer depth and the number of virtual channels.
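The sizing rule above can be written out explicitly. In this sketch, the 8-bit flit example follows the text, while applying a depth of 4 flits to the 32-bit, 4-VC, 5-port configuration of the experiments is an assumption made for illustration; the function names are hypothetical.

```python
def buffer_bits_per_vc(depth_flits: int, flit_bits: int) -> int:
    """Storage of one VC FIFO: depth (in flits) x flit width."""
    return depth_flits * flit_bits


def buffer_bits_per_port(num_vcs: int, depth_flits: int, flit_bits: int) -> int:
    """Total buffer per input port = number of VCs x VC buffer size."""
    return num_vcs * buffer_bits_per_vc(depth_flits, flit_bits)


def buffer_bits_per_router(num_ports: int, num_vcs: int, depth_flits: int, flit_bits: int) -> int:
    """Total input buffering of a router across all ports."""
    return num_ports * buffer_bits_per_port(num_vcs, depth_flits, flit_bits)


if __name__ == "__main__":
    print(buffer_bits_per_vc(4, 8) // 8, "bytes per VC (8-bit flits, 4 flits deep)")
    print(buffer_bits_per_router(5, 4, 4, 32), "bits per router (5 ports, 4 VCs, 4 x 32-bit flits)")
```

This gives 4 bytes per VC for the 8-bit example and 2560 bits of input buffering per router under the assumed 32-bit configuration, all of which would otherwise be clocked every cycle.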

In this section, we discuss the power, area and timing results obtained after applying clock gating to the NoC. We used a Xilinx XC7K325T FPGA for our experiments and Xilinx ISE 14.7 for synthesis and simulation. Power results were obtained with the Xilinx XPower Analyzer tool. For simulation, we used a 2x2 mesh with X-Y routing; each input channel consists of four virtual channels that store the incoming flits at the input port. Packets are transported as flits, and each flit is 32 bits wide.



FIGURE 5.7: Power results without clock gating at different clock frequencies



FIGURE 5.8: Power results with clock gating at different clock frequencies

TABLE 5.1: NoC power (mW) comparison using optimization technique

| Optimization technique | Static power | Dynamic power | Total power |
|------------------------|--------------|---------------|-------------|
| Without CG             | 158.35       | 161.18        | 319.52      |
| I/P port and VC CG     | 158.28       | 157.05        | 315.33      |
| Xilinx CG              | 157.97       | 98.60         | 256.57      |

We simulated and synthesized our Verilog NoC design, including the network interface module, on a Kintex-7 evaluation board. We calculated the power of the NoC design using the Xilinx XPower Analyzer tool with a switching activity file. The static, dynamic and total power of the NoC are shown in Table 5.1, where CG denotes clock gating, VC virtual channel and BRAM block RAM. The results show that clock gating reduces dynamic power because it disables needless switching activity.

TABLE 5.2: Comparison on the basis of FIFO power (mW) optimization using clock gating

| Power optimization technique   | FIFO power | Logic power |
|--------------------------------|------------|-------------|
| Without clock gating           | 71.00      | 29.65       |
| Input port and VC clock gating | 63.40      | 21.19       |
| Xilinx clock gating            | 17.72      | 27.68       |

Table 5.2 compares the FIFO buffer power and logic power of the implemented design with and without clock gating. Both are reduced when clock gating is applied.


FIGURE 5.9: Power results without clock gating



FIGURE 5.10: Power results with clock gating

TABLE 5.3: Comparison of area utilization with and without clock gating

| Resource type     | Available | Utilization (with CG) | Utilization (without CG) |
|-------------------|-----------|-----------------------|--------------------------|
| Slice LUTs        | 203800    | 6336                  | 7730                     |
| Occupied slices   | 50950     | 2604                  | 3208                     |
| LUT-FF pairs used | 203800    | 7029                  | 8434                     |
| Bonded IOBs       | 500       | 305                   | 305                      |

TABLE 5.4: Timing analysis with and without clock gating

| Property                                      | Without clock gating | With clock gating |
|-----------------------------------------------|----------------------|-------------------|
| Minimum period (ns)                           | 2.864                | 3.110             |
| Minimum input arrival time before clock (ns)  | 1.962                | 1.941             |
| Maximum output required time after clock (ns) | 2.721                | 2.721             |

Table 5.3 shows the effect of clock gating on area utilization: the numbers of slice LUTs and occupied slices are reduced when clock gating is applied. Table 5.4 shows the timing analysis with and without clock gating; the minimum clock period increases slightly, from 2.864 ns to 3.110 ns, due to the clock gating logic. The path delay from the router to the gated clock (router/IP/gated\_clk) is 1.002 ns, while the path delay from the router to the crossbar (router/switch\_allocation/ack\_direction to crossbar/multiplexer) is 2.864 ns. As technology advances, devices are increasingly required to operate at higher clock frequencies, and clock frequency is a primary driver of dynamic power consumption in multi-core architectures. Figure 5.9 clearly shows that dynamic power consumption increases with the operating frequency of the circuit.
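The timing cost of clock gating can be expressed as a maximum clock frequency using $f_{max} = 1/T_{min}$. The sketch below only converts the minimum periods reported in Table 5.4; the function name is hypothetical.

```python
def fmax_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency (MHz) for a given minimum period (ns)."""
    return 1000.0 / min_period_ns


if __name__ == "__main__":
    without_cg, with_cg = 2.864, 3.110  # minimum periods from Table 5.4 (ns)
    print(f"Fmax without CG: {fmax_mhz(without_cg):.2f} MHz")
    print(f"Fmax with CG:    {fmax_mhz(with_cg):.2f} MHz")
    print(f"Period increase: {100 * (with_cg - without_cg) / without_cg:.2f}%")
```

The maximum frequency drops from about 349.16 MHz to 321.54 MHz, corresponding to an 8.59% longer minimum period.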

Clock gating has a significant effect on logic power, signal power, input-output power and FIFO power, as evident in Figures 5.7 and 5.8. Buffer power consumption, shown in Figure 5.9, is the main concern in the NoC architecture. We therefore reduced buffer power by applying clock gating at the input channels, as shown in Figures 5.9 and 5.10. Routing-logic-enabled clock gating of the input channel buffers saved 10.70% of FIFO buffer power, and intelligent clock gating (on slices and block RAM) reduced dynamic power by a further 37%.
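The reported percentages can be reproduced directly from Tables 5.1 and 5.2. In the sketch below, the 10.70% figure is the FIFO power reduction of port/VC clock gating, the 37% figure is the dynamic power reduction of Xilinx clock gating relative to port/VC clock gating, and the reduction relative to the design without clock gating is also computed for dynamic and total power. The dictionary names are hypothetical.

```python
# Power values in mW from Table 5.1 (static, dynamic, total) and Table 5.2 (FIFO)
TABLE_5_1 = {
    "Without CG": (158.35, 161.18, 319.52),
    "I/P port and VC CG": (158.28, 157.05, 315.33),
    "Xilinx CG": (157.97, 98.60, 256.57),
}
FIFO_POWER = {"Without CG": 71.00, "I/P port and VC CG": 63.40, "Xilinx CG": 17.72}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    base, port, xil = (TABLE_5_1[k] for k in ("Without CG", "I/P port and VC CG", "Xilinx CG"))
    fifo = reduction_pct(FIFO_POWER["Without CG"], FIFO_POWER["I/P port and VC CG"])
    print(f"FIFO, port/VC CG vs none:   {fifo:.2f}%")
    print(f"Dynamic, Xilinx vs none:    {reduction_pct(base[1], xil[1]):.2f}%")
    print(f"Dynamic, Xilinx vs port CG: {reduction_pct(port[1], xil[1]):.2f}%")
    print(f"Total, Xilinx vs none:      {reduction_pct(base[2], xil[2]):.2f}%")
```

This yields 10.70%, 38.83%, 37.22% and 19.70%, respectively, so the reduction of approximately 39% refers to dynamic power, while total NoC power falls by about 19.7%.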

### 5.5 Conclusions

Various architectures and techniques have been designed to reduce the power of interconnection networks. Clock gating is one of the most popular techniques for reducing the dynamic power of NoC architectures. Our approach applies clock gating at the input channel and VC FIFO buffers. The results show significant improvements in dynamic power consumption after applying clock gating. Our work reduced the power of the input channel buffers of the 2D NoC by 10.70% using routing-logic-enabled clock gating. We also observed a dynamic power reduction of approximately 39% when applying Xilinx intelligent clock gating to the NoC. In terms of area, clock gating reduces LUT utilization on the FPGA, thus saving area.

# Chapter 6

# Mitigating NBTI Stress in NoC Router

In recent years, industry and researchers have been adding more cores per chip as integration increases. The sharp growth in the number of cores in a System-on-Chip (SoC) poses challenges for efficient and reliable communication among the cores. Its scalability, fault tolerance and higher communication bandwidth have made the Network-on-Chip (NoC) a more popular communication architecture than the conventional bus. However, the rapid shrinking of CMOS design margins in deep submicron technology has made aging mechanisms such as Negative Bias Temperature Instability (NBTI) a prime concern in NoC design. In this chapter, we propose a novel pre-silicon, NBTI-stress-aware, circuit-to-system-level solution for a reliable NoC router. To achieve this reliability objective, we develop an NBTI aging-aware timing analysis framework based on real workload stress. Our proposal is based on identifying the paths and cells most stressed by NBTI-induced effects. We develop an aging-aware cell library and use the multi-$V_{dd}$ technique to mitigate NBTI-induced delay effects.

Based on extensive experimental analysis, we observed a 66.67% increase in delay over five years due to NBTI-induced aging under real workload stress. Our proposed NBTI stress mitigation algorithm and technique reduce aging-induced delay effects by 36%, thereby enhancing router performance under stress.

Shrinking transistor dimensions in deep sub-micron technology and high transistor density inevitably increase the vulnerability of transistors to permanent and transient faults, and such highly condensed designs amplify reliability issues. Aging causes transistor parameters to deviate from design specifications and degrades circuit performance. The change in path delay due to aging creates timing uncertainty in the circuit, which may lead to operational failure after some time.

Lifetime failure mechanisms such as NBTI, Time-Dependent Dielectric Breakdown (TDDB), Hot Carrier Injection (HCI), Electromigration (EM), Stress Migration (SM) and Thermal Cycling (TC) are the primary causes of reliability degradation. NBTI and HCI are operational-stress-induced wear-out mechanisms in deep sub-micron technology. NBTI is a recoverable device aging mechanism. NBTI stress depends strongly on the duty cycle and causes a shift in the threshold voltage of a device, which in turn degrades path delay. Therefore, reliability analysis is performed at the pre-silicon stage to detect path delay degradation due to aging.

The remainder of this chapter is organized as follows. Section 6.1 discusses previous research on the aging analysis of NoCs and their components at different levels of abstraction. Section 6.2 discusses the lifetime wear-out mechanisms, their modeling and their impact on NoCs. Section 6.3 elaborates the proposed NBTI-based critical path delay degradation analysis and the mitigation mechanism for improving NoC performance. Section 6.4 presents the experimental setup and result analysis. Finally, Section 6.5 draws conclusions.

### 6.1 Related Work

Very limited work has been done on the aging of on-chip interconnection networks; most existing works address system-level reliability failures of on-chip networks.

The RAMP [83] tool was used to calculate the reliability of microprocessors; its newer version, RAMP 2.0, added lognormal distributions and Monte Carlo simulation. The REST [67] tool is based on the sum of failure rates (SOFR) with uniform failure probability. Both RAMP and REST calculate mean time to failure (MTTF) values for reliability analysis; REST is more accurate because it uses the Weibull distribution with Monte Carlo simulations. Sharma *et al.* [133] proposed HiPER-NIRGAM, a reliability analysis framework for network-on-chip built on the NoC simulator NIRGAM [107]. Sharma *et al.* [134], [135] analyzed NoC reliability based on the NBTI, TDDB, SM and TC lifetime failure mechanisms. Their analysis varies NoC parameters such as the number of virtual channels, buffer size, frequency, and topology type and size, for routers built with HVT (high threshold voltage), NVT (normal/standard threshold voltage) and LVT (low threshold voltage) cells. Another approach used by researchers is to propose reliability-aware routing algorithms; notable works include congestion-oblivious and congestion-aware routing [39] and lifetime-aware algorithms [40].
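For reference, the SOFR model mentioned above combines per-mechanism lifetimes under two standard premises: each mechanism has a constant failure rate $\lambda_i = 1/MTTF_i$, and any single failure fails the system, so $\lambda_{sys} = \sum_i \lambda_i$ and $MTTF_{sys} = 1/\lambda_{sys}$. The sketch below illustrates only that combination rule; the per-mechanism MTTF values are hypothetical and are not taken from RAMP, REST or this thesis.

```python
def sofr_mttf(component_mttfs: list) -> float:
    """System MTTF under the sum-of-failure-rates (SOFR) model.

    Assumes each component/mechanism fails at a constant rate
    lambda_i = 1 / MTTF_i and that any single failure fails the system,
    so the system failure rate is the sum of the individual rates.
    """
    if not component_mttfs or any(m <= 0 for m in component_mttfs):
        raise ValueError("MTTF values must be positive")
    total_rate = sum(1.0 / m for m in component_mttfs)
    return 1.0 / total_rate


# Hypothetical per-mechanism MTTFs (years) for one router, e.g. NBTI, TDDB, EM, TC.
print(sofr_mttf([30.0, 60.0, 20.0, 60.0]))  # about 8.57 years
```

The combined MTTF (60/7, about 8.57 years) is lower than the smallest individual MTTF, which is why SOFR-based estimates are dominated by the weakest mechanism.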

Kim *et al.* [27] evaluated the impact of workload on router activity factor and duty cycle at the RTL level. Ancajas *et al.* [41] improved the lifetime of a NoC router by balancing its switching activities, using HSPICE simulation of the HCI degradation effect. A proactive aging management scheme that monitors NBTI wear-out in heterogeneous NoCs was presented in [106]. Compared with the state-of-the-art BRAR (Buffered-Router Aware Routing), their schemes achieve 53%, 38% and 29% improvements in system performance, network latency and Energy Delay Product per Flit (EDPPF), respectively.

The following research gaps were identified from the literature review:

- RAMP [83] and REST [67] calculate the sum of the failure rates of components. Their target reliability metric is MTTF, which operates at the system level. MTTF does not capture the exact delay degradation in a NoC, which depends on its critical path, and the critical path cannot be addressed at the system level. Therefore, we take critical path degradation analysis as an objective.
- Ancajas *et al.* [41] considered HCI delay degradation in NoC but did not consider NBTI-based delay degradation. Since NBTI is the most prominent aging effect observed in PMOS transistors, this chapter focuses on analyzing and mitigating this effect for NoC.
- Buffered-router aware routing is used to calculate reliability in terms of energy-delay product per flit (EDPPF), which does not address aging-based critical path degradation. Moreover, this work operates at the system level and does not consider transistor-level NBTI delay degradation.

Based on the above gaps, the objectives of this chapter are:

- 1. To analyze potential critical path (PCP) based delay degradation, since aging may cause the current critical path to no longer be the critical path.
- 2. To identify the most stressed (most aged) cell, gate or flip-flop, in the PCP due to the NBTI effect.
- 3. To mitigate NBTI-induced delay degradation in NoC.

The main contributions toward these objectives are:

- 1. Aging-aware multi-$V_{dd}$ cell library characterization at 45 nm technology.
- 2. An algorithm to find the *Most Stressed* path and gate due to NBTI-induced effects.
- 3. A timing analysis framework based on the actual stress caused by the PARSEC 2.0 benchmark suite.
- 4. A mitigation technique and algorithm that use the minimum number of multi-$V_{dd}$ cells to reduce aging effects.

## 6.2 Negative bias temperature instability (NBTI)

NBTI is the most prominent aging effect observed in PMOS transistors fabricated with deep sub-micron technology. NBTI degrades transistor robustness by shifting its primary parameters, and this variation impacts gate performance and circuit lifetime. NBTI affects a transistor when it is under negative bias stress ($V_{gs} = -V_{dd}$). The effect arises from the breaking of Si-H bonds at the interface between the substrate and the gate oxide, which frees an $H^+$ ion [136]; the drift direction is away from the $Si/SiO_2$ interface. The dangling Si bonds left behind degrade transistor performance, and the effect worsens with increasing temperature. The charged interface traps increase the absolute threshold voltage $|V_{th}|$ and degrade transistor performance. Equation 6.1 gives the threshold voltage drift.

$$\Delta V_{th} = A \times \exp\left(\frac{E_a}{k_B \times T}\right) \times V_{gs}^b \times t_{stress}^n \times \left(\frac{1+C}{W}\right)$$
(6.1)

where $V_{gs}$ is the gate-source voltage, $T$ is the operating temperature, $t_{stress}$ is the time the transistor spends in NBTI stress mode, and $W$ is the transistor width; the drift depends on all four. $A$, $E_a$, $k_B$, $b$, $n$ and $C$ are constants [137], with $k_B$ being the Boltzmann constant.
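Equation 6.1 can be evaluated directly once its fitted constants are known. The sketch below transcribes the formula term by term as written above. It assumes $E_a$ in eV and $T$ in kelvin, and it uses $|V_{gs}|$ so that a fractional exponent $b$ stays real. The numerical values of $A$, $E_a$, $b$, $n$, $C$ and $W$ in the usage lines are hypothetical placeholders, not the constants of [137].

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K (assumes E_a is given in eV)


def delta_vth(A, Ea, T, Vgs, b, t_stress, n, C, W):
    """Threshold-voltage drift of Eq. 6.1, transcribed term by term.

    Under NBTI stress Vgs = -Vdd; the magnitude |Vgs| is used so that a
    fractional exponent b gives a real number (assumption of this sketch).
    """
    return (A
            * math.exp(Ea / (K_B * T))
            * abs(Vgs) ** b
            * t_stress ** n
            * ((1 + C) / W))


# Hypothetical constants, for illustration only (not fitted values from [137]).
params = dict(A=1e-3, Ea=0.1, T=358.0, Vgs=-1.0, b=3.0, n=1 / 6, C=0.5, W=1.0)
one_year = delta_vth(t_stress=1.0, **params)
five_years = delta_vth(t_stress=5.0, **params)
print(five_years / one_year)  # equals 5 ** (1/6), about 1.31
```

Because of the $t_{stress}^n$ factor, the drift grows sub-linearly with stress time: with the illustrative $n = 1/6$, five years of stress give roughly 1.31 times the one-year shift, independent of the other constants.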

## 6.3 Proposed work: NBTI Aging-Aware Delay Modeling



FIGURE 6.1: Network On Chip Architecture

A NoC router contains components such as input channel buffers, routing logic (RL), an arbiter, a virtual channel allocator (VCA), a crossbar and switch traversal; Figure 6.1 shows all these components. The clock synchronizes these components, and the path delay of each component depends on the clock and on the component design. Lifetime failure mechanisms increase the delay of each transistor, which can lead to synchronization failure among components. The routing logic decides the route of flits based on the routing algorithm, and the VC allocator and arbiter act on the RL decision. A delay in the routing logic due to NBTI stress can therefore break the synchronization between these router pipeline stages and increase router latency. To extend the lifetime and fault-free execution of a NoC, wear-out effects must be analyzed before silicon fabrication. NBTI stress shifts the threshold voltage of transistors and hence degrades the critical path delay. In a NoC router, the VCA and switch allocator are the primary components that determine the maximum operating frequency, because these pipeline stages contain the worst paths of the router.

This chapter focuses on accurate NBTI-based delay degradation analysis and a mitigation mechanism for the router. To this end, we propose two frameworks: one for calculating the NBTI-stress-based delay degradation and another for reducing the NBTI-induced delay. Figure 6.2 shows the NBTI-based aging delay analysis framework, and Figure 6.3 shows the work-flow of the NBTI mitigation technique.

#### 6.3.1 NBTI Based Aging Modeling and Analysis for NoC

For NBTI-based delay analysis, we first synthesize the router micro-architecture at gate level. We then analyze the set of worst delay paths (paths whose delay lies within 20% of the longest path delay) of each component and of the router in the specified technology library (45 nm) using the Synopsys Design Vision [138] and Prime Time [138] tools. Each path consists of gates connected from a primary input to a primary output. Our main focus is to find the NBTI-induced aging delay of each combinational and sequential element, called a *cell*, on the path. Figure 6.2 shows the framework of NBTI-based delay degradation analysis.

Aging Analysis Mechanism: First, we synthesized the 5-stage pipeline router (Verilog code) using Synopsys Design Compiler [138] at 45 nm technology. Design Compiler generates the netlist and the RTL technology schematic of the router micro-architecture. Synopsys Prime Time is used for critical path analysis. All paths with delays $\geq 80\%$ of the longest path delay are considered *Potential Critical Paths*. Next, we created a lookup table for all the cells present in the *Potential Critical Paths*. Further, we applied the MOSRA (MOS Reliability Analysis) [139] model for NBTI using the Synopsys HSPICE [140] tool to calculate the shift in threshold voltage $V_{th}$ of each cell.
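The $\geq 80\%$ selection rule for *Potential Critical Paths* reduces to a simple filter over path delays. The sketch assumes the path delays have already been extracted from the static timing report into a dictionary; the function name, path names and delay values are hypothetical.

```python
def potential_critical_paths(path_delays: dict, ratio: float = 0.8) -> list:
    """Return the paths whose delay is at least `ratio` times the longest
    path delay (the >= 80% rule), ordered from longest to shortest."""
    longest = max(path_delays.values())
    pcp = [p for p, d in path_delays.items() if d >= ratio * longest]
    return sorted(pcp, key=path_delays.get, reverse=True)


# Hypothetical path delays in ns, as if exported from a timing report.
delays = {"P1": 1.00, "P2": 0.85, "P3": 0.50, "P4": 0.80}
print(potential_critical_paths(delays))  # ['P1', 'P2', 'P4']
```

Keeping every path within the margin, rather than only the current worst one, is what allows a path such as P4 to be tracked in case aging later makes it critical.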



FIGURE 6.2: NBTI based aging analysis



FIGURE 6.3: Multi $V_{dd}^\prime$  Mitigation technique to reduce the NBTI stress

We also compute the delay of each cell. The difference between the cell delays with and without the MOSRA model applied gives the increase in delay due to the NBTI effect.

To mitigate the NBTI effect, we performed the same MOSRA analysis at different $V_{dd}$ values on all cells and calculated the change in delay.

| Symbols | Definition |
|---|---|
| $p_i$ | Path delay |
| $CP$ | Critical path |
| $g_i$ | $i^{\text{th}}$ logic gate |
| $G$ | Set of gates |
| $d_i$ | Delay of $i^{\text{th}}$ gate |
| $d$ | Set of gate delays |
| $\triangle d_i$ | Change in gate delay, i.e., $(d'_i - d_i)$ |
| $\triangle d$ | Set of changes in gate delay |
| $D_{cp}$ | Critical path delay of routing logic |
| $D'_{cp}$ | Critical path delay after stress |
| $V_{dd}$ | Source voltage |
| $V'_{dd}$ | Increased source voltage |
| $d'_i$ | Delay of $i^{\text{th}}$ gate after stress |
| $V_{th}$ | Threshold voltage |
| $V_{th}^{\prime}$ | Threshold voltage after stress |
| $(V_{th})_i$ | Threshold voltage of $i^{\text{th}}$ gate |
| $\triangle Sd$ | Set $\triangle d$ in descending order |
| $Pos_i$ | Position of gate in $\triangle Sd$ with reference to set $G$ |
| $CP_{opt}$ | Optimized critical path after mitigation |

TABLE 6.1: Symbols used in NBTI based Delay Modeling of NoC

The architecture $A$ of a system $S$ comprises its components $Comp$ and design $Des$, and can be abstracted as $A(S, Comp, Des)$. In our scenario, the architecture of a router $R$ contains components such as RL, VCA, SA and the crossbar. The design of router $R$ can be encapsulated as a set $P$, where each element $P_k \in P$, $k \in [1, \dots, n]$, is one path from an input to an output of the router.

$$P = \{P_1, P_2, P_3, P_4, \dots, P_{n-1}, P_n\}$$
(6.2)

Each path $P_k \in P$, $k \in [1, \dots, n]$, contains a number of gates and flip-flops, and its delay is the sum of the delays of all its cells (gates and flip-flops). The set $G$ represents the collection of cells present in a path and can be written as

$$G = \{g_1, g_2, g_3, g_4, \dots, g_{n-1}, g_n\}$$
(6.3)

and the corresponding set of cell delays as

$$d = \{d_1, d_2, d_3, d_4, \dots, d_{n-1}, d_n\}$$
(6.4)

Suppose path $P_i$ has delay $D_i$ and contains $N$ cells; then $D_i$ is defined as

$$D_i = \sum_{j=1}^{N} d_{g_j} \tag{6.5}$$

where $d_{g_j}$ is the delay of the $j^{th}$ cell/gate on the path.

The corresponding set of path delays $D$ can be defined as

$$D = \{D_1, D_2, D_3, D_4, \dots, D_{n-1}, D_n\}$$
(6.6)

The critical path $C_P$ of the design is the path with the longest delay in the set $D$, so that

$$D(C_P) = max\{D_1, D_2, D_3, D_4, \dots, D_{n-1}, D_n\}$$
(6.7)
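Equations 6.5 to 6.7 amount to a sum per path followed by a maximum over paths. A minimal Python sketch, using hypothetical cell delays in ps and hypothetical helper names:

```python
def path_delay(cell_delays: list) -> float:
    """Eq. 6.5: the delay of a path is the sum of the delays of its cells."""
    return sum(cell_delays)


def critical_path(paths: dict) -> tuple:
    """Eqs. 6.6-6.7: build the set D of path delays and return the
    name and delay of the longest path."""
    delays = {name: path_delay(cells) for name, cells in paths.items()}
    worst = max(delays, key=delays.get)
    return worst, delays[worst]


# Hypothetical cell delays (ps) for three router paths.
paths = {"P1": [10, 20, 30], "P2": [25, 25], "P3": [5, 5, 5, 5]}
print(critical_path(paths))  # ('P1', 60)
```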

The critical path $C_P$ is our primary concern for aging analysis. Suppose $P_i$ and $P_j$ are two paths and, without NBTI stress, the delay of the critical path $P_i$ exceeds that of $P_j$. After some time under NBTI stress, $P_i$ may no longer be the critical path, because the delay of $P_j$ may exceed that of $P_i$. A Potential Critical Path (PCP) is a path that can become the critical path after some time due to stress. In our notation, the gate/cell delay after NBTI stress is $d'_i$, and the shift in cell delay due to stress is $\Delta d_i = (d'_i - d_i)$. The set of shifts in cell delay due to stress, $\Delta d$, is:

$$\triangle d = \{ \triangle d_1, \triangle d_2, \triangle d_3, \triangle d_4, \dots, \triangle d_{n-1}, \triangle d_n \}$$
(6.8)

Because the worst path may change, the critical path delay must be recomputed after stress. The corresponding set of path delays after stress, $D'$, is defined as

$$D' = \{D'_1, D'_2, D'_3, D'_4, \dots, D'_{n-1}, D'_n\}$$
(6.9)

The critical path after stress can be expressed as

$$D'(C_P) = max\{D'_1, D'_2, D'_3, D'_4, \dots, D'_{n-1}, D'_n\}$$
(6.10)
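The scenario described above, in which the critical path changes after stress, can be reproduced with Equations 6.8 to 6.10. In this sketch the delays and shifts are hypothetical values (in ps) chosen so that $P_i$ is critical before stress and $P_j$ after it; the helper names are also hypothetical.

```python
def delays_after_stress(fresh: dict, shift: dict) -> dict:
    """Eq. 6.9: D'_k = sum over the cells of P_k of (d_i + delta d_i)."""
    return {p: sum(d + dd for d, dd in zip(fresh[p], shift[p])) for p in fresh}


def longest(delays: dict) -> str:
    """Eqs. 6.7 / 6.10: name of the path with the maximum delay."""
    return max(delays, key=delays.get)


# Hypothetical delays (ps): P_i is critical when fresh, but P_j's cells age more.
fresh = {"P_i": [30, 30], "P_j": [25, 28]}
shift = {"P_i": [1, 0], "P_j": [4, 5]}  # Eq. 6.8: delta d for each cell

before = {p: sum(c) for p, c in fresh.items()}
after = delays_after_stress(fresh, shift)
print(longest(before), before)  # P_i {'P_i': 60, 'P_j': 53}
print(longest(after), after)    # P_j {'P_i': 61, 'P_j': 62}
```

$P_j$ is a Potential Critical Path: it is not critical when fresh, yet it becomes critical after stress, which is why the analysis tracks the whole PCP set rather than only the initial critical path.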

The parameters affected by NBTI stress are threshold voltage and delay. The shift in threshold voltage due to stress can be expressed as

$$V_{th}' = V_{th} + |\bigtriangleup V_{th}| \tag{6.11}$$

where $V'_{th}$ is the increased threshold voltage of the aged circuit and $|\triangle V_{th}|$ is the shift in $V_{th}$ due to aging. The resulting increase in delay slows down the circuit. In the next subsection, we discuss the mitigation modeling of NBTI stress.

#### 6.3.2 Mitigation Modeling to reduce the aging effect

Pre-silicon, design-time techniques are used to overcome the delay introduced by aging. Multi-$V_{dd}$ is a technique to speed up an aged circuit. Suppose that $\Delta V_{dd}$ is the amount by which the supply voltage is increased; the new supply voltage $V'_{dd}$ is then

$$V'_{dd} = V_{dd} + \Delta V_{dd} \tag{6.12}$$

The delays of all cells at this new supply voltage are computed with and without NBTI stress. We characterize the cell library based on the following parameters:

- 1. Multi  $V_{dd}$  operating voltage
- 2. NBTI based real workload stress
- 3. Stress duration in years

We replace the cell most influenced (stressed) by NBTI with a new cell from the newly characterized library that operates at a higher supply voltage. We propose an algorithm to find the cell most influenced by stress, which we call the "*Most Stressed*" cell.

#### 6.3.3 Proposed Algorithms

In this subsection, we present our proposed algorithms to find the *Most Stressed* cell. Algorithm 2 computes the shifts in threshold voltage and delay due to NBTI stress; to obtain these stress-induced shifts, we used Equation 6.1 and the Synopsys HSPICE tool for MOS Reliability Analysis (MOSRA). Algorithm 3 calculates the new threshold voltage and delay under the mitigation technique by applying the new source voltage $V'_{dd}$ to characterize the stressed gates. Algorithm 1 calls the procedure STRESS\_CALCULATION$(G, d_i, V_{th}, V_{dd})$ from Algorithm 2 to find the threshold voltage and delay after stress. The new critical path delay $D'_{cp}$ is the delay of the path with the maximum delay after stress. In steps 5 to 8, the shifts $\triangle$ in threshold voltage and delay are calculated. These shifts are then sorted in descending order, and $\triangle Sd$ denotes this sorted set of delay shifts. Because the order of elements in $G$ and $d$ may differ from that in $\triangle Sd$, we use a position variable $POS_i$ to map positions among these sets. The first element of $\triangle Sd$ corresponds to the *Most Stressed* cell, which is replaced by a new cell operating at the increased source voltage. The new cell has delay $d_{new}$, which is less than the stress-induced delay. After this replacement, we identify the next *Most Stressed* cell and replace it with a new cell characterized at $V'_{dd}$. This replacement continues until the new critical path delay $D'_{cp}$ comes within a certain threshold of the critical path delay without stress; in the worst case, all cells are characterized and replaced. The purpose of targeting the *Most Stressed* gates is that only the required number of multi-$V_{dd}$ cells need to be changed to mitigate the aging effect. The variable $ocpd_{am}$ represents the optimized critical path delay after mitigation.
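The greedy replacement loop described above can be sketched in Python as follows. This is a simplified illustration, not the thesis implementation. The stopping rule follows the prose description ("within a certain threshold" of the unstressed delay), modeled by a hypothetical `tolerance` parameter. `mitigated[i]` stands for the stressed delay of the replacement cell at $V'_{dd}$, i.e., the value Algorithm 3 would return. All delay values are hypothetical.

```python
def mitigate_most_stressed(fresh, stressed, mitigated, tolerance=0.0):
    """Greedy Most-Stressed-first replacement on one critical path.

    fresh[i]     : delay d_i of cell i without NBTI stress
    stressed[i]  : delay d'_i of cell i after stress at the nominal V_dd
    mitigated[i] : stressed delay of the replacement cell at V'_dd
    Cells are replaced in descending order of delta d_i until the path
    delay is within `tolerance` (a fraction) of the unstressed delay D_cp.
    In the worst case every cell is replaced.
    Returns (indices of replaced cells, mitigated path delay).
    """
    d_cp = sum(fresh)            # critical path delay without stress
    current = list(stressed)     # start from the aged path
    # Sort indices (not values) by delay shift; this plays the role of POS_i.
    order = sorted(range(len(fresh)),
                   key=lambda i: stressed[i] - fresh[i], reverse=True)
    replaced = []
    for i in order:
        if sum(current) <= d_cp * (1.0 + tolerance):
            break                # aging delay sufficiently compensated
        current[i] = mitigated[i]
        replaced.append(i)
    return replaced, sum(current)


# Hypothetical delays (ps) for a three-cell critical path.
print(mitigate_most_stressed([10, 10, 10], [12, 18, 11], [9, 9, 9]))  # ([1, 0], 29)
```

Sorting cell indices by their delay shift, instead of sorting the shift values and then searching for each value's position, performs the $POS_i$ mapping in one step and avoids mapping two cells with equal shifts to the same position.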

Algorithm 1 Critical Path Delay After NBTI Stress Mitigation (OCPD-A-NBTI-SM) and "*Most Stressed*" Cell

**Require:** Path delay $(p_i)$, logic gate $(g_i)$, gate delay $(d_i)$, threshold voltage $(V_{th})$, $G$, source voltage $V_{dd}$.
**Ensure:** $D'_{cp}$, set of *Most Stressed* gates
1: **procedure** OCP\_DELAYAFTERSTRESS$(G, d, V_{dd})$
2:   $D_{cp} = \sum_{i=1}^{n} d_i, \ \forall g_i \in CP$ ▷ Critical path delay without NBTI stress
3:   $d'_i =$ STRESS\_CALCULATION$(G, d_i, V_{th}, V_{dd})$ ▷ Delay of $i^{th}$ cell after NBTI stress
4:   $D'_{cp} = \sum_{i=1}^{n} d'_i, \ \forall g_i \in CP$ ▷ New critical path delay after NBTI stress
5:   **for** $i = 1$ **to** $n$ **do**
6:     $\triangle d_i \leftarrow d'_i - d_i$ ▷ Delay difference with and without NBTI stress for $i^{th}$ cell
7:     $\triangle (V_{th})_i \leftarrow (V'_{th})_i - (V_{th})_i$
8:   **end for**
9:   $\triangle Sd = \text{SORT}(\triangle d)_{descending}$ ▷ Descending order of delay difference
10:   **for** $i = 1$ **to** $n$ **do**
11:     **for** $j = 1$ **to** $n$ **do**
12:       **if** $\triangle Sd_i = \triangle d_j$ **then**
13:         $POS_i = j$ and **break** ▷ Position of *Most Stressed* cell
14:       **end if**
15:     **end for**
16:   **end for**
17:   **for** $i = 1$ **to** $n$ **do**
18:     $d_{new} =$ MITIGATE\_DELAY$(g_{POS[i]}, V_{th}, V'_{dd})$
19:     $d'_{POS[i]} = d_{new}$ ▷ Replacement of *Most Stressed* gate by new cell
20:     $D'_{cp} = \sum_{k=1}^{n} d'_k, \ \forall g_k \in CP$
21:     **if** $D_{cp} \leq D'_{cp}$ **then**
22:       $count = i$
23:       $ocpd_{am} = D'_{cp}$ ▷ Optimal critical path delay after mitigation
24:     **else**
25:       **break**
26:     **end if**
27:   **end for**
28:   **for** $i = 1$ **to** $count$ **do**
29:     **print** critical gates, i.e., $g_{POS[i]}$
30:   **end for**
31:   **print** mitigated delay, i.e., $D'_{cp}$
32: **end procedure**

Algorithm 2 Stress Calculation Function for Each Gate Present in the Critical Path

**Require:** Device technology parameters, $V_{th}$, $V_{dd}$, $G$
**Ensure:** Delay $d'_i$ and threshold voltage $V'_{th}$ after stress
1: **procedure** STRESS\_CALCULATION$(G, d_i, V_{th}, V_{dd})$
2:   **for** $i = 1$ **to** $n$ **do**
3:     $d'_i = d_i +$ NBTI-induced delay
4:     $V'_{th} = V_{th} +$ NBTI-induced threshold voltage shift
5:   **end for**
6:   **return** $d'_i$
7: **end procedure**

Algorithm 3 Delay Mitigation using Multi-$V_{dd}$ Technique

**Require:** Logic gate $(g_i)$, gate delay $(d_i)$, threshold voltage $(V_{th})$, change in gate delay $(\triangle d_i)$, change in critical path delay $(\triangle D_{cp})$, new increased source voltage $(V'_{dd})$, $G$.
**Ensure:** Delay $d'_i$ of gate $g_i$ after stress at $V'_{dd}$
1: **procedure** MITIGATE\_DELAY$(g_i, V_{th}, V'_{dd})$
2:   $d'_i =$ STRESS\_CALCULATION$(G, d_i, V_{th}, V'_{dd})$ ▷ Stress analysis at the increased source voltage
3:   **return** $d'_i$
4: **end procedure**

#### 6.4 Experimental setup and Result Analysis

This section describes the experimental details of our proposed work on NBTI-based reliability analysis of a NoC router and the mitigation technique to reduce the aging effect at pre-silicon design time. For this purpose, we propose the aging-aware timing framework shown in Figure 6.4 for NBTI-stress-based aging analysis and mitigation of NoC components. Propagation delay is the parameter used to compare the reliability results. For the experimental setup, we first used our own NoC router micro-architecture written in Verilog. The simulation is carried out using Synopsys VCS to generate the switching activity file (.saf) and the value change dump (.vcd) file. The VCD file records the waveform of each signal during simulation, from which the duty cycle is obtained. The duty cycle is the main source of variation in NBTI stress in a circuit [27, 41]. The router and its components are synthesized at gate level using Synopsys Design Compiler at 45 nm technology. For detailed timing analysis, we used the Synopsys Prime Time tool. We used delay calculation along paths to construct the PCP set of the router, from which we determine the path with the highest delay. We create an abstract library of these cells at the specified technology and different operating voltages using the Synopsys HSPICE tool on the circuit-level netlist.

FIGURE 6.4: Aging-aware Timing Framework

#### 6.4.1 Aging Analysis Methodology

This section describes the aging analysis methods. The lifetime degradation due to NBTI stress depends on the duty cycle of the input signals: the lower the duty cycle, the lower the NBTI stress on the circuit. We considered the different cells present in the potential critical paths, varied the duty cycle at the cell inputs, and observed the NBTI stress using MOSRA analysis for different time periods. We calculated the change in threshold voltage of each cell and, based on this, characterized the delay cell library under different operating conditions such as source voltage, stress time and duty cycle.
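The duty cycle used in this analysis is taken from the simulation VCD. As a minimal sketch, the fraction of simulated time a single-bit signal spends in the stressed state can be read from the value-change section of a VCD file. The sketch assumes that logic 0 at a PMOS gate input is the stressed state; the `stress_value` parameter and the function name are hypothetical. It handles only `#<time>` stamps and scalar changes such as `0!`; a production flow would use a full VCD parser.

```python
def duty_cycle(vcd_text: str, ident: str, end_time: int,
               stress_value: str = "0") -> float:
    """Fraction of [0, end_time) during which scalar VCD signal `ident`
    equals `stress_value`. Vector ('b...'), real ('r...') and header
    tokens are ignored."""
    body = vcd_text.split("$enddefinitions", 1)[1]   # skip the header
    now, value, stressed = 0, None, 0
    for tok in body.split():
        if tok.startswith("#"):                      # new timestamp
            t = int(tok[1:])
            if value == stress_value:                # close stressed interval
                stressed += t - now
            now = t
        elif tok[0] in "01xzXZ" and tok[1:] == ident:  # scalar value change
            value = tok[0].lower()
    if value == stress_value:                        # last interval to end_time
        stressed += end_time - now
    return stressed / end_time


VCD = """$timescale 1ns $end
$scope module tb $end
$var wire 1 ! a $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
$end
#30
1!
#100
"""
print(duty_cycle(VCD, "!", 100))  # 0.3
```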



FIGURE 6.5: CMOS NBTI EFFECT

Figures 6.6 and 6.7 show the shift in threshold voltage at different operating voltages for 45 nm and 65 nm technology, respectively. The threshold voltage shift increases with aging, and the highest shift is observed at year five: as a circuit ages, the cumulative time it spends under stress grows, which continually shifts its threshold voltage. These results indicate that long-term stress is responsible for the increase in $\Delta V_{th}$.

For example, the CMOS inverter shown in Figure 6.5 was tested in 45 nm technology at a source voltage of 0.8 volt for aging analysis. When the inverter input is low, the gate-to-source voltage $V_{gs}$ of the PMOS transistor equals $-V_{dd}$, i.e., the PMOS transistor is negatively biased. The negative voltage applied between the gate and source terminals acts as a catalyst, increasing the rate of trap generation at the interface between the dielectric and the silicon. The impact of these traps on the PMOS threshold voltage depends on how long the stress is applied. In this experiment, we applied stress for durations of up to five years to analyze the reliability of the device.

FIGURE 6.6: Shift in Threshold Voltage $V_{th}$ per year in Volt at 45 nm Technology

FIGURE 6.7: Shift in Threshold Voltage $V_{th}$ per year in Volt at 65 nm Technology

Both high-to-low and low-to-high input transitions were applied over a five-year duration. Synopsys HSPICE with MOSRA analysis reports the yearly shift in threshold voltage and delay of the inverter. Treating the inverter as a *Most Stressed Cell*, we characterized its threshold voltage and delay at increased source voltages from 0.8 to 1.8 volt. The results of these experiments are shown in Figures 6.8 and 6.9.

Result Inferences: Figure 6.8 shows the change in $\Delta V_{th}$ of an inverter at different operating voltages under NBTI stress. These results indicate that long-duration stress causes a larger threshold voltage shift at higher operating voltages.

#### 6.4.2 Cell Library Characterization

The aging-aware cell library is characterized for different source voltages. Tables 6.2–6.5 show the NBTI-based delay characterization of some library cells at different $V_{dd}$ values; the $\Delta d$ of each cell is calculated for five years of stress. These tables are looked up to replace a stressed cell $c_1$ with another cell $c_2$ that operates at a higher $V_{dd}$ and whose delay is not more than that of $c_1$ without stress. The experiments show that the XNOR gate is the most profoundly affected, because more PMOS transistors are under stress in it than in the other gates; the table statistics identify XNOR as the *Most Stressed* cell among all cells.

FIGURE 6.8: Change in $\triangle V_{th}$ of Inverter cell under NBTI stress

FIGURE 6.9: Change in delay of Inverter cell under NBTI stress
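The table lookup described above can be sketched as a search over the characterized library. The XNOR entries below are copied from Tables 6.2–6.5 (delays in seconds). Choosing the lowest qualifying $V_{dd}$, to limit the power cost of the higher supply, is an assumption of this sketch, and `pick_replacement_vdd` is a hypothetical helper.

```python
from __future__ import annotations

# XNOR from Tables 6.2-6.5: V_dd -> (delay without stress, delay after five-year stress), s.
XNOR = {
    1.0: (2.73e-09, 5.03e-09),
    1.2: (1.08e-10, 2.10e-10),
    1.4: (4.65e-10, 8.76e-10),
    1.6: (2.20e-10, 3.88e-10),
}


def pick_replacement_vdd(lib: dict[float, tuple[float, float]],
                         nominal_vdd: float) -> float | None:
    """Return the lowest V_dd above nominal whose *stressed* delay does not
    exceed the *unstressed* delay at the nominal V_dd, or None if none does."""
    target = lib[nominal_vdd][0]
    for vdd in sorted(v for v in lib if v > nominal_vdd):
        if lib[vdd][1] <= target:
            return vdd
    return None


print(pick_replacement_vdd(XNOR, 1.0))  # 1.2
```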

| Cell Name | Delay without NBTI Stress (s) | Delay with NBTI Stress (s) | Delay Increase (ns) |
|-----------|-------------------------------|----------------------------|---------------------|
| OR        | 3.70E-11                      | 3.96E-11                   | 0.0026009           |
| INV       | 8.06E-12                      | 7.41E-12                   | 0.00065             |
| NAND      | 2.10E-12                      | 2.21E-11                   | 0.0200031           |
| MUX       | 3.25E-10                      | 5.50E-10                   | 0.22499             |
| DFF       | 6.19E-11                      | 1.51E-10                   | 0.089101            |
| XNOR      | 2.73E-09                      | 5.03E-09                   | 2.3                 |

TABLE 6.2: Cell Library Delay Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.0 volt

| Cell Name | Delay without NBTI Stress (s) | Delay with NBTI Stress (s) | Delay Increase (ns) |
|-----------|-------------------------------|----------------------------|---------------------|
| OR        | 2.81E-11                      | 2.99E-11                   | 0.00173             |
| INV       | 5.75E-12                      | 5.95E-12                   | 0.0002              |
| NAND      | 1.65E-11                      | 1.72E-11                   | 0.00071             |
| MUX       | 5.01E-10                      | 5.01E-10                   | 0.0002              |
| DFF       | 6.31E-11                      | 6.50E-11                   | 0.00193             |
| XNOR      | 1.08E-10                      | 2.10E-10                   | 0.102               |

TABLE 6.3: Cell Library Delay Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.2 volt

| Cell Name | Delay without NBTI Stress (s) | Delay with NBTI Stress (s) | Delay Increase (ns) |
|-----------|-------------------------------|----------------------------|---------------------|
| OR        | 2.29E-11                      | 2.41E-11                   | 0.0012              |
| INV       | 4.93E-12                      | 5.07E-12                   | 0.00014             |
| NAND      | 1.39E-11                      | 1.44E-11                   | 0.0005              |
| MUX       | 9.55E-11                      | 1.14E-10                   | 0.0185              |
| DFF       | 5.35E-11                      | 5.54E-11                   | 0.0019              |
| XNOR      | 4.65E-10                      | 8.76E-10                   | 0.411               |

TABLE 6.4: Cell Library Delay Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.4 volt

| Cell Name | Delay without NBTI Stress (s) | Delay with NBTI Stress (s) | Delay Increase (ns) |
|-----------|-------------------------------|----------------------------|---------------------|
| OR        | 2.07E-11                      | 2.17E-11                   | 0.00093             |
| INV       | 4.55E-12                      | 4.64E-12                   | 0.000089            |
| NAND      | 1.24E-11                      | 1.28E-11                   | 0.00044             |
| MUX       | 5.01E-10                      | 5.15E-10                   | 0.0141              |
| DFF       | 3.66E-11                      | 4.91E-11                   | 0.01243             |
| XNOR      | 2.20E-10                      | 3.88E-10                   | 0.1678              |

TABLE 6.5: Cell Library Delay Characterization at 45 nm Technology for Five Years of NBTI Stress at 1.6 volt

#### 6.4.3 Timing Analysis

Our main objective is to calculate the NBTI stress on the router micro-architecture. We used the PARSEC [141] benchmarks running on gem5 [142] to collect traces of real application workloads, and simulated the router using the Synopsys VCS [138] tool with the traffic traces generated from PARSEC [41, 141]. We generate traffic-based VCD and standard delay format (.sdf) files to analyze the NBTI stress. We extract the duty cycle of each cell from the VCD and multiply it by the corresponding NBTI stress delay, so that the delay of a cell under real traffic load is represented as

$$Cell_{Delay} = Duty_{Cycle} \times Delay_{NBTIStress}$$
(6.13)
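Equation 6.13 weights each cell's NBTI-stressed delay by the duty cycle extracted from the workload VCD. A direct transcription, with the per-cell values summed over one path in the manner of Equation 6.5, might look like this. The stressed delays are the OR and DFF entries of Table 6.2, the duty cycles are hypothetical, and the helper names are hypothetical.

```python
from __future__ import annotations


def workload_cell_delay(duty_cycle: float, nbti_stress_delay: float) -> float:
    """Eq. 6.13: Cell_Delay = Duty_Cycle x Delay_NBTIStress."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return duty_cycle * nbti_stress_delay


def workload_path_delay(cells: list[tuple[float, float]]) -> float:
    """Sum Eq. 6.13 over the (duty cycle, stressed delay) pairs of one path."""
    return sum(workload_cell_delay(dc, d) for dc, d in cells)


# Stressed delays (s) of OR and DFF from Table 6.2; duty cycles are hypothetical.
path = [(0.5, 3.96e-11), (0.25, 1.51e-10)]
print(workload_path_delay(path))
```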

The PCP set is the collection of critical paths with slack less than 20%; we took the worst 500 of these paths for analysis. Figure 6.10 shows the number of paths and their slack values. The router has a best (maximum) slack of 0.04 ns and a worst slack of zero. Equation 6.13 is used to calculate the actual path delay after stress under a real traffic trace, i.e., the PARSEC 2.0 workload. We determined the change in the PCP and obtained the worst delay after stress. The experiment shows that the worst delay before and after stress is 0.96 and 1.6, respectively. As can be seen from Figure 6.12, the *vips* benchmark shows the maximum increase in delay, 66.67%, due to NBTI stress. We analyze the *Most Stressed Path* and *Gate* within the PCP offline. Our experiment shows that the XNOR gate is the most affected by stress. Following our algorithms, we replace the *Most Stressed* cells with cells operating at $V_{dd}$ values increased from 0.8 to 1.2 volt and recalculate the critical path delay. The *swaptions* benchmark in Figure 6.12 shows the maximum reduction in delay, 36.7%, after applying the mitigation technique.

FIGURE 6.10: Set of critical paths and their slack values

FIGURE 6.11: Delay statistics before and after NBTI stress and with the mitigation technique

FIGURE 6.12: Percentage increase in delay after stress and percentage reduction in delay after multi-$V_{dd}$ mitigation
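The 66.67% figure follows directly from the worst delays before and after stress reported above (0.96 and 1.6):

$$\frac{1.6 - 0.96}{0.96} \times 100\% = \frac{0.64}{0.96} \times 100\% \approx 66.67\%$$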

The multi-$V_{dd}$ technique increases the power consumption of the router and its components. An alternative is to use multi-threshold cells: we replace the most affected gates with LVT cells to minimize delay degradation.

The results of the multi-$V_{th}$ and multi-$V_{dd}$ approaches are compared in Figure 6.13. The results show that multi-$V_{th}$ cells are more appropriate for the power-delay metric. However, LVT cells, although faster because of their low threshold voltage, are more vulnerable to NBTI-induced aging delay. Hence, we also used transistor resizing of the most stressed gate to mitigate the aging effect. Although this approach has been proposed before, to the best of our knowledge we are the first to apply it to NoC to mitigate the delay effect on the router micro-architecture. Figure 6.13 compares the mitigated delay of all approaches. For the best trade-off among area, power and future probability of degradation, resizing is the best approach.

FIGURE 6.13: Comparison of different mitigation approaches

### 6.5 Conclusions

In this chapter, we have proposed a novel pre-silicon solution to reduce NBTI-induced aging effects on a NoC router. We analyzed the NBTI-induced delay based on the actual stress caused by the PARSEC 2.0 benchmark suite. We developed a novel PCP timing analysis framework based on NBTI stress, called the *Aging-aware Timing Framework*, and novel algorithms to find the most stressed paths and cells. Our proposed mitigation technique and algorithm are based on replacing stressed cells with cells operating at a higher supply voltage to counteract the aging-induced shift in threshold voltage. Our experiments suggest that NBTI stress increases delay by up to 66.67%, and our proposed mitigation technique and algorithm significantly reduce this delay shift (by up to 36%). Increasing the size of the most stressed cell offers the best aging-induced delay mitigation for the NoC router, with low area and power overheads compared with (1) changing the supply voltage and (2) multi-$V_{th}$ cells.

## Chapter 7

# **Conclusions and Future Scope**

As per Moore's law, the density of transistors in semiconductor ICs roughly doubles every 18 to 24 months. Sustaining this trend has subjected transistor feature sizes to aggressive technology scaling, and the trend is likely to continue for the next generations of integration. Smaller feature sizes, together with increasing complexity and integration density, adversely affect the reliability of NoCs, which is becoming a major concern for system designers. Aging-induced degradation may cause loss of synchronization and/or component failure, which in turn may lead to failure of the entire system.

This chapter summarizes the conclusions drawn from this research work and presents future research directions. The primary aim of this thesis is to analyze thermally induced reliability degradation of NoCs from the device level to the system level. Reliability analysis with our system-level framework, HiPER-NIRGAM, shows that power density/temperature is the primary cause of reduced NoC reliability. The analysis also shows that buffers are one of the primary factors contributing to power consumption and affecting reliability. Our proposed HVT cell-based micro-architecture and thermal-aware floorplan technique increase the reliability of the NoC. Our results indicate that the HVT cell-based NoC router reduces the hotspot temperature, with a resultant improvement in reliability. The thermal-aware floorplan also reduces the thermal hotspot of the NoC without area overhead. Hence, combining the two is more effective in improving the lifetime of the NoC. The routing-logic-enabled clock gating technique reduces activity in the buffers, which lowers dynamic power and enhances the reliability of the NoC. The proposed aging-aware framework and algorithms enable pre-Silicon NBTI-aware design of NoCs.

Following is the summary of conclusions from this thesis:

- NoC is the communication backbone of high-performance computing architectures such as CMPs and MPSoCs. At the same time, advances in semiconductor fabrication technology have shrunk transistor sizes and increased power, thermal and reliability issues. NoCs and their constituent routers face the same problems. Hence, quick reliability estimation at a higher level of abstraction is required.
- We have developed the "HiPER-NIRGAM" framework for performance, power, thermal and reliability estimation at system-level. This framework considers all NoC parameters – architecture, technology information etc. – for thermal-aware reliability estimation for 2D NoCs.
- The analysis using the proposed framework shows that power is the primary factor affecting reliability. An increase in power consumption leads to a larger rise in temperature, creating local thermal hotspots, and the increase in temperature reduces the reliability of NoCs.
- We proposed system-level reliability enhancement using a power-aware router architecture:
  - Our power, temperature and reliability analysis shows that the buffer is the main victim of increased power and temperature, and its reliability is the most affected. Our HVT cell-based router and buffer architecture shows a significant reduction in power and thermal hotspots.
  - The power-aware floorplan reduces the temperature difference (thermal hotspot) from 6.88 K to 1.6 K.
  - The experimental results show that the proposed HVT technique with the power/thermal-aware floorplan improves the MTTF by 17.65% compared to the basic NoC architecture.
  - The proposed architecture mitigates the failure mechanisms and enhances the reliability of the NoC and its routers.

- The proposed power-aware floorplan mechanism reduces thermal hotspots in the NoC and hence increases reliability by delaying the onset of failure mechanisms.
- We have added reliability estimation for the SM and TC failure mechanisms with both Weibull and lognormal distributions. The MTTF of NoCs is estimated using the Weibull distribution with all four failure mechanisms (NBTI, TDDB, SM and TC).
- NoC buffers are a major source of power consumption, so our routing-logic-based clock gating of the VC buffers reduces buffer power consumption. One level further down, clock gating is applied to the individual VCs to reduce activity in the buffers. This improves reliability by reducing switching activity. We have verified this on a Kintex-7 FPGA board and used smart clock gating at the slice level to reduce switching power. All three configurations at the RTL level show a significant improvement in the power consumption of NoCs.
- Our clock gating techniques reduce the power of the input channel buffers of the 2D NoC by 10.70%. We also observed a significant power reduction of approximately 39% when applying Xilinx intelligent clock gating to the NoC.
- Our device-level work addresses aging-aware NoC router micro-architecture by modeling the device parameter shift due to the NBTI effect, which becomes prominent with aging.
  - The aging-induced delay degradation analysis is based on a real workload (PARSEC 2.0) and the lifetime. It shows that the worst path delay of the router increases significantly because NBTI aging increases the gate delay of cells. One possible solution is to reduce the operating frequency, but this would result in a loss of performance.
  - We have developed a novel "aging-aware timing framework" that creates an aging-aware cell library based on real workload stress. The aging-aware library is also used to calculate the aging-induced path delay.
  - The proposed "Most Stressed Cell" algorithm can identify stressed cells at the pre-Silicon design stage. These cells can then be replaced with cells operated at higher supply voltages, cells with larger transistor sizes or multi-$V_{th}$ cells. Each of these approaches is suited to mitigating the NBTI-induced delay but comes with area and power overheads.

  - Resizing, i.e., increasing the size of the "Most Stressed Cell", is the best technique for future reliability gain and for mitigating delay degradation, with very low area overhead.
  - Our proposed mitigation technique and algorithm show a significant reduction (up to 36%) in the NBTI-induced delay shift.
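The MTTF estimation summarized above combines the Weibull model with four failure mechanisms. A single Weibull mechanism with scale $\eta$ and shape $\beta$ has $\text{MTTF} = \eta\,\Gamma(1 + 1/\beta)$. The sketch below combines several mechanisms under the following assumptions:

- The mechanisms act as independent competing risks, so the component fails at the first failure of any mechanism.
- The combined MTTF is then the integral of the product of the per-mechanism survival functions.

This is not necessarily how HiPER-NIRGAM combines the mechanisms, and the $(\eta, \beta)$ values are hypothetical.

```python
import math


def weibull_mttf(eta, beta):
    """MTTF of a single Weibull mechanism: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def system_reliability(t, mechanisms):
    """Competing-risk (series) model: R_sys(t) = prod_i exp(-(t/eta_i)**beta_i)."""
    return math.exp(-sum((t / eta) ** beta for eta, beta in mechanisms))


def system_mttf(mechanisms, steps=20_000):
    """MTTF = integral of R_sys(t) from 0 to infinity (trapezoidal rule).

    The integral is truncated at 10x the smallest scale parameter, which is
    accurate only when every shape parameter is >= 1 (checked below).
    """
    if any(beta < 1.0 for _, beta in mechanisms):
        raise ValueError("this sketch assumes shape parameters >= 1")
    t_max = 10.0 * min(eta for eta, _ in mechanisms)
    h = t_max / steps
    total = 0.5 * (system_reliability(0.0, mechanisms)
                   + system_reliability(t_max, mechanisms))
    for k in range(1, steps):
        total += system_reliability(k * h, mechanisms)
    return total * h


if __name__ == "__main__":
    # Hypothetical (eta in years, beta) per mechanism; NOT results from this thesis.
    mechanisms = {"NBTI": (30.0, 2.0), "TDDB": (40.0, 1.5),
                  "SM": (50.0, 2.0), "TC": (35.0, 2.5)}
    for name, (eta, beta) in mechanisms.items():
        print(f"{name}: MTTF = {weibull_mttf(eta, beta):.1f} years")
    print(f"combined MTTF = {system_mttf(list(mechanisms.values())):.1f} years")
```

When all mechanisms share one shape $\beta$, the combination has a closed form. It is again Weibull, with $\eta_{sys} = (\sum_i \eta_i^{-\beta})^{-1/\beta}$. This case is a handy check on the numerical integration.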

Based on the conclusions of this thesis, we identify a few research problems on the reliability of NoC architectures that could extend this work in the future:

- The system-level NoC reliability enhancement contributions in this thesis are based on pre-Silicon solutions. The run-time reliability enhancement techniques can be explored to increase the reliability of NoC.
- In this thesis, only switching activity was considered when improving the reliability of NoC at the RTL level. The effect of duty cycle was not taken into account and can be explored in future work.
- The aging-aware framework focuses only on the NBTI effect. Other aging phenomena can also be taken into account when estimating the shift in device parameters due to aging.

# Appendix: A

### HiPER-NIRGAM TOOL GUI AND SETUP

HiPER-NIRGAM is a framework for 2D NoCs that provides the performance, power, thermal and reliability estimation.

#### HiPER-NIRGAM installation steps:

1. If the SystemC bundled with HiPER-NIRGAM does not work, first install SystemC separately and then set its path in `Makefile.defs`, for example:

   `SYSTEMC = /home/ashish/systemc-2.2.0`

   This step is mostly needed on 64-bit Linux (linux64) to resolve SystemC installation issues.
2. If you are using a 64-bit OS, set `ARCH = linux64` in the `Makefile`.
3. Run `make clean`, then `make`.
4. Check whether `./nirgam` runs. If it fails with a segmentation fault and core dump, make the following changes to the `.bashrc` file:
   - Go to the home directory and run `gedit ~/.bashrc` to open the `.bashrc` file.
   - Add `export SC_SIGNAL_WRITE_CHECK=DISABLE` (no spaces around `=`).
   - Add `ulimit -s unlimited`.
5. Close the previous terminal and open a new terminal in the *HiPER-NIRGAM* folder. Run `make clean` followed by `make`, then run `./nirgam`, which should now work.
6. Go to *HiPER-NIRGAM/core/HOTSPOT*, run `make`, and test the build with:
   - `./hotfloorplan`
   - `./hotspot`
7. Go to *HiPER-NIRGAM/core/REST*, run `make`, and test with `./REST`.
8. To use the GUI version, go to *HiPER-NIRGAM/gui* and run `python nirgam.py` in a terminal. This opens the *HiPER-NIRGAM* GUI.

The directory structure is shown in Figure A.1. The *core* directory contains the ORION, McPAT, HotSpot and REST tools. The *config* directory contains the NIRGAM configuration file. The *application* directory contains the traffic and per-tile modeling parameters. The *gui* directory provides the graphical user interface.



FIGURE A.1: Directory Structure in HiPER-NIRGAM

A new project can be created as shown in Figure A.2. The project name and project file (name.xml) are added as shown in Figures A.3 and A.4. The NIRGAM configuration parameters, such as topology and routing algorithm, can be set and saved to a file as shown in Figure A.5. The number of simulation cycles, the frequency, the routing algorithm and failed-link information are also inputs to this configuration. The configuration interface offers two options for power calculation: ORION or McPAT.
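The parameters set in Figure A.5 are saved to the NIRGAM configuration file. The sketch below is illustrative only and rests on several assumptions:

- The file holds one whitespace-separated `KEY VALUE` pair per line.
- The helper names are hypothetical.
- The sample values are read from the Figure A.5 screenshot.

Check the format against the actual file in the *config* directory before relying on it.

```python
def parse_nirgam_config(text):
    """Parse one 'KEY VALUE' pair per line into a dict (assumed file format)."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'KEY VALUE', got {raw!r}")
        key, value = parts
        # Store integer-looking values as int, everything else as str.
        params[key] = int(value) if value.isdigit() else value
    return params


def num_tiles(params):
    """Number of tiles in a 2D NoC: NUM_ROWS x NUM_COLS (both must be > 0)."""
    rows, cols = params["NUM_ROWS"], params["NUM_COLS"]
    if not (isinstance(rows, int) and isinstance(cols, int) and rows > 0 and cols > 0):
        raise ValueError("NUM_ROWS and NUM_COLS must be positive integers")
    return rows * cols


# Values as shown in Figure A.5; the file layout itself is an assumption.
SAMPLE = """\
TOPOLOGY MESH
NUM_ROWS 5
NUM_COLS 5
RT_ALGO XY
SIM_NUM 500
TG_NUM 400
"""

if __name__ == "__main__":
    cfg = parse_nirgam_config(SAMPLE)
    print(cfg["TOPOLOGY"], cfg["RT_ALGO"], num_tiles(cfg), "tiles")
```

The parser fails loudly on malformed lines rather than silently skipping them. A typo in a simulation parameter would otherwise go unnoticed until the results look wrong.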


FIGURE A.2: Adding a New Project in HiPER-NIRGAM



FIGURE A.3: Adding a Project Name in HiPER-NIRGAM

The application configuration is shown in Figure A.6. Here the user can attach a different type of traffic to each tile. The user can then run *NIRGAM* from the toolbox shown in the upper window. Figure A.7 shows a snapshot of the execution of *NIRGAM*. The resulting latency, throughput and power figures are available as charts (see Figure A.8). Figure A.9 shows the hotfloorplan and hotspot tools, which produce the temperature traces and locate local hotspot nodes in the mesh. Red indicates higher temperature and the formation of a hotspot.
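Locating a local hotspot node from the temperature traces amounts to finding a tile that is hotter than all of its mesh neighbours. The sketch below assumes that the per-tile steady-state temperatures have already been extracted from the HotSpot output into a row-major grid. The parsing step is not shown, and the function names and sample values are hypothetical. It also reports the hottest-to-coolest temperature difference, the metric used in the floorplan results of Chapter 7.

```python
def local_hotspots(temps):
    """Return (row, col) of tiles strictly hotter than every mesh neighbour.

    temps is a row-major grid of per-tile temperatures (e.g. in kelvin).
    """
    rows, cols = len(temps), len(temps[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            # North, south, west and east neighbours that exist in the mesh.
            neighbours = [temps[rr][cc]
                          for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= rr < rows and 0 <= cc < cols]
            if all(temps[r][c] > t for t in neighbours):
                spots.append((r, c))
    return spots


def temperature_difference(temps):
    """Spread between the hottest and the coolest tile."""
    flat = [t for row in temps for t in row]
    return max(flat) - min(flat)


if __name__ == "__main__":
    # Hypothetical 3x3 steady-state temperatures (K); not thesis results.
    grid = [[340.0, 345.0, 341.0],
            [342.0, 350.0, 343.0],
            [339.0, 344.0, 338.0]]
    print("local hotspots:", local_hotspots(grid))
    print("temperature difference:", temperature_difference(grid), "K")
```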


FIGURE A.4: Adding a Project Name and File Under The Project in HiPER-NIRGAM

*[Screenshot: nirgam.config tab of project ashish.xml with TOPOLOGY = MESH, NUM_ROWS = 5, NUM_COLS = 5, RT_ALGO = XY, SIM_NUM = 500, TG_NUM = 400, a Link Fail option, and the buttons "Attach Application (Use McPat)" and "Attach Application (Use Orion)"; the toolbox lists McPAT, ORION, NIRGAM, HOTSPOT and REST.]*

FIGURE A.5: NoC Parameters configuration in HiPER-NIRGAM

Figure A.10 shows the reliability estimation for NoCs. The right side of the GUI shows the progress of each execution step of the tool.

*HiPER-NIRGAM* is a system-level framework for NoCs that can address routing, application, technology, performance, power, thermal and reliability aspects.


FIGURE A.6: NoC Application and traffic configuration in HiPER-NIRGAM

                                                            | Creating New Project           |                      |
|           | Attaching application Bursty.so                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                            |                                |                      |
|           | Creating tile /<br>Attaching andication Burgty so                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                            |                                |                      |
|           | Creating tipe is 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                            |                                |                      |
|           | Attaching application Sink.so                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                            |                                |                      |
|           | Creating tile 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                            |                                |                      |
|           | Creating Up to 10 |                                |                      |
|           | Attaching application Sink.so                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                            |                                |                      |
|           | Creating tile 11                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                            | MCPAT                          | ORION                |
|           | According approaction sinks of Creation till 12                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                            |                                |                      |
|           | Attaching application Sink.so                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                            |                                |                      |
|           | Creating tile 13                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                            |                                |                      |
|           | Actioning application sink so<br>Creation tile 14                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                            |                                |                      |
| TeV       | Attaching application Sink.so                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                            |                                |                      |
| TEV       | Creating tile 15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                            |                                |                      |
| <u> </u>  | Actacing application sink so<br>Creation tile 16                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                            |                                |                      |
|           | Attaching application Sink.so |  |  |
|           | Creating tile 17 |  |  |
|           | Attaching application Sink.so |  |  |
|           | Creating tile 18 |  |  |
|           | Attaching application Sink.so |  |  |
|           | Creating tile 19 |  |  |
|           | Attaching application Sink.so |  |  |
|           | Network setup! |  |  |
|           | Start NIRGAM simulation! |  |  |
|           | tile ? core received 7 packets |  |  |
|           | tile 2 core received 4 packets |  |  |
|           | tile 3 core received 6 packets |  |  |
|           | tile ? core received ? packets |  |  |
|           | tile 6 core received 4 packets |  |  |
|           | tile 7 core received 8 packets |  |  |
|           | tile ? core received ? packets |  |  |
|           | tile 10 core received 5 packets |  |  |
|           | tile 11 core received 9 packets |  |  |
                                                            |                                |                      |
|           | tile 12 core received o packets                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                            |                                |                      |
| 10000     | tile 14 core received 12 packets                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                            |                                |                      |
|           | Itile 15 core received 15 packets                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                            | REST                           |                      |
| Per l     | Lite To Core received backets tile To Core received Backets                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                            |                                |                      |
| Presente. | tile 18 core received 13 packets                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                            |                                |                      |
| 0         | tile 19 core received 8 packetsSystemC: simulation stopped by user.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                            |                                |                      |
|           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                            |                                |                      |
|           | You entered: ashish                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                            |                                |                      |

FIGURE A.7: NIRGAM Execution in HiPER-NIRGAM



FIGURE A.8: Result in Form of Graph in HiPER-NIRGAM



FIGURE A.9: Hotspot in HiPER-NIRGAM



FIGURE A.10: Reliability Estimation in HiPER-NIRGAM

# Appendix: B

## Network-on-Chip Power Model Integrated with NIRGAM

Network-on-Chip communication architecture has emerged as a solution for communication among a large number of cores. The NoC consumes a significant share of the power of a many-core system, so a system-level simulator is required to estimate NoC power consumption. ORION [108, 111, 114, 115] estimates the area and power of NoC components.

Architects and designers need early design-space exploration. ORION meets this requirement and provides accurate NoC power and area estimates. ORION is a dedicated power simulator for Network-on-Chip, whereas simulators such as Wattch [116] and McPAT [109, 112] focus on the microprocessor: they model microprocessor power in detail, while ORION models NoC power consumption in detail.

ORION models the basic NoC components, such as routers and links. It can compute the power consumption of each router component, such as the input buffers, virtual channels, route computation unit, arbiter, allocator and crossbar. Therefore, if any component, algorithm or technique is changed in NIRGAM or any other NoC simulator, its power can be calculated using the ORION simulator.
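The per-component view can be pictured with the minimal C++ sketch below. It assumes that a router's power is simply the sum of its component powers and uses a hypothetical `ComponentPower` map keyed by component name; the actual values would come from ORION, and nothing here reproduces ORION's own classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-component power breakdown of one router (milliwatts),
// e.g. {"input_buffer", ...}, {"crossbar", ...}, {"arbiter", ...}.
using ComponentPower = std::map<std::string, double>;

// Router power taken as the sum of its component powers (assumption).
double routerPowerMw(const ComponentPower& components) {
    double total = 0.0;
    for (const auto& entry : components) {
        assert(entry.second >= 0.0);  // power values must be non-negative
        total += entry.second;
    }
    return total;
}

// Swapping in a new design for one component (say, a different arbiter)
// changes only that entry; the other components keep their estimates.
ComponentPower replaceComponent(ComponentPower components,
                                const std::string& name, double powerMw) {
    components[name] = powerMw;
    return components;
}
```

Because `replaceComponent` takes the map by value, the original breakdown is left untouched, so the baseline and modified routers can be compared side by side.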

#### Differences between ORION 2.0 and ORION 3.0 and the capabilities of these tools

1. ORION 2.0 is based on circuit-level templates, each of which models a specific logic structure for implementing a router component. However, there is often a significant mismatch between the actual RTL code of the corresponding router component blocks and the logic structure assumed in the ORION 2.0 template.
2. Logic template-based approaches inherently cannot capture the differences among RTL generators or the effects of the design tools.
3. ORION 3.0 fundamentally differs from the earlier version in that its estimation models are derived from actual post-P&R layout area and power data that correspond to the RTL generator and the target library.

ORION 3.0 incorporates two modeling approaches.

1. Parametric Modeling
   - The first approach is based on parametric modeling. In contrast to ORION 2.0, the parametric models are not based on predefined logic templates.
   - Instead, for each component block in the router RTL, appropriate parametric models are derived from the post-synthesis netlist by observing how the instance count changes with microarchitectural and implementation parameters.
   - Using these parametric models, automatic parametric regression analysis by means of least-squares regression (LSQR) against actual post-P&R area and power data is performed to refine the models, which yields highly accurate area and power estimates across multiple router RTLs, microarchitectures and implementation parameters.
   - This approach relies on a one-time characterization of post-synthesis data to derive the parametric models of the component blocks, and on the automatic fitting of these models to post-P&R data using parametric regression.
2. Non-Parametric Modeling
   - These estimation models are also derived from the actual post-P&R layout area and power data that correspond to the actual RTL generator and the actual target cell library.
   - Non-parametric modeling is used to derive accurate surrogate models from a sample set of post-placement-and-routing results.
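The regression step of the parametric approach can be illustrated with a deliberately simplified C++ sketch: an ordinary least-squares fit of a one-parameter linear model y = a + b·x, where x might be, say, the number of virtual channels and y the measured router power. This is a closed-form stand-in under stated assumptions; ORION 3.0 fits multi-parameter models to post-P&R data, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coefficients of a one-parameter linear model  y = a + b * x.
struct LinearModel {
    double a;  // intercept
    double b;  // slope
};

// Ordinary least-squares fit: b = Sxy / Sxx, a = meanY - b * meanX.
LinearModel fitLeastSquares(const std::vector<double>& x,
                            const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    assert(sxx > 0.0);  // x must not be constant
    const double b = sxy / sxx;
    return {meanY - b * meanX, b};
}

// Evaluates the fitted model at a new parameter value.
double predict(const LinearModel& m, double x) { return m.a + m.b * x; }
```

Once fitted on a handful of characterized design points, `predict` gives an estimate for parameter values that were never run through placement and routing, which is the practical payoff of the parametric approach.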

- ORION 3.0 uses metamodeling techniques for automatic model generation, such as Multivariate Adaptive Regression Splines (MARS), Radial Basis Functions (RBF), Kriging (KG) and Support Vector Machine Regression.
- This modeling does not require the architect to know implementation details. The ORION 2.0 models for area, power and gate count show large errors, up to 110% on average, versus the actual implementation, whereas ORION 3.0 estimation errors are no more than 9.8% across microarchitecture and implementation parameters.
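Error figures such as these are typically percentage deviations of the estimate from the implemented value. The text does not state which metric underlies the 110% and 9.8% figures, so the mean absolute percentage error in this short C++ sketch is an assumption used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute percentage error of model estimates against reference
// values (e.g. estimated vs. post-P&R router power). Assumed metric.
double meanPercentError(const std::vector<double>& estimated,
                        const std::vector<double>& actual) {
    assert(estimated.size() == actual.size() && !actual.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        assert(actual[i] != 0.0);  // reference values must be non-zero
        sum += std::fabs(estimated[i] - actual[i]) / std::fabs(actual[i]);
    }
    return 100.0 * sum / static_cast<double>(actual.size());
}
```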

We have used the latest version, ORION 3.0, in this thesis because it is more accurate than ORION 2.0.

#### NIRGAM Integration with ORION

Figure 3.13 shows the basic block diagram of NIRGAM integrated with ORION. The integrated simulator has the following primary inputs:

1. Mesh size: the number of NoC routers in the mesh topology.
2. Topology: the type of topology, such as mesh or torus. In this thesis, only the mesh topology is considered in the experiments.
3. Number of virtual channels
4. Flit size
5. Packet size
6. Clock frequency
7. Routing algorithm
8. Traffic pattern
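These inputs can be gathered into a single configuration record before being handed to the power model. The C++ sketch below is hypothetical: the struct, field names and default values are illustrative and are not NIRGAM's or ORION's actual identifiers. It also assumes packet size is given in bits, so that the number of flits per packet is the packet size divided by the flit size, rounded up.

```cpp
#include <cassert>
#include <string>

enum class Topology { Mesh, Torus };

// Hypothetical container for the primary inputs of the integrated
// NIRGAM + ORION simulator; names and defaults are illustrative only.
struct NocSimParams {
    int rows = 4;
    int cols = 4;
    Topology topology = Topology::Mesh;  // only mesh is used in this thesis
    int virtualChannels = 4;
    int flitBits = 32;
    int packetBits = 256;                // assumed to be given in bits
    double clockHz = 1.0e9;
    std::string routing = "XY";
    std::string traffic = "uniform";
};

// Number of routers in a rows x cols mesh.
int routerCount(const NocSimParams& p) { return p.rows * p.cols; }

// Flits needed to carry one packet, rounded up.
int flitsPerPacket(const NocSimParams& p) {
    assert(p.flitBits > 0 && p.packetBits > 0);
    return (p.packetBits + p.flitBits - 1) / p.flitBits;
}
```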

These parameters are then passed to the ORION simulator, which has the following capabilities:

- It can model NoC (router) power at different technology nodes.
- It provides the leakage power and dynamic power of the NoC and each of its components, based on the frequency and all other parameters passed to it.
- The dynamic power of the buffers is calculated from the number of read and write operations performed by the traffic flits passing through the buffers.
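The activity-based buffer power calculation can be sketched as energy divided by time: the read and write counts are multiplied by per-access energies, and the result is divided by the simulated time (cycles divided by clock frequency). In this minimal C++ sketch the per-access energies and leakage are placeholders that would come from ORION; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Activity counts collected from the simulator for one buffer over a run.
struct BufferActivity {
    std::uint64_t reads = 0;
    std::uint64_t writes = 0;
    std::uint64_t cycles = 0;  // simulated cycles covered by the counts
};

// Per-access energies (joules) and leakage power (watts); placeholders
// that would be supplied by the power model.
struct BufferEnergyModel {
    double readEnergyJ;
    double writeEnergyJ;
    double leakageW;
};

// Average dynamic power = switching energy / elapsed time,
// where elapsed time = cycles / clock frequency.
double bufferDynamicPowerW(const BufferActivity& a, const BufferEnergyModel& m,
                           double clockHz) {
    assert(a.cycles > 0 && clockHz > 0.0);
    const double energyJ = static_cast<double>(a.reads) * m.readEnergyJ +
                           static_cast<double>(a.writes) * m.writeEnergyJ;
    const double seconds = static_cast<double>(a.cycles) / clockHz;
    return energyJ / seconds;
}

// Total buffer power = dynamic part + leakage part.
double bufferTotalPowerW(const BufferActivity& a, const BufferEnergyModel& m,
                         double clockHz) {
    return bufferDynamicPowerW(a, m, clockHz) + m.leakageW;
}
```

Leakage is added as a constant here because it does not depend on traffic, whereas the dynamic part scales directly with how many flits are written into and read out of the buffer.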

The NIRGAM simulator contains a config directory with the *nirgam.config* file, which configures the basic NoC parameters such as the topology and the number of rows and columns, while *constant.h* and *extern.h* configure the flit size, the number of VCs and their organization. These configuration parameters are passed to *SIM\_router\_power* and *SIM\_router.cpp*, which are in the *orion* directory. We have implemented code in the *NoC.cpp* file to generate per-cycle power traces for each router and its components. Technology parameters and the parameters of the different threshold-voltage cell types (HVT, NVT and LVT) are configured in the *SIM\_parameters.h* and *orion\_router\_power.cpp* files, which are also in the *orion* directory.
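Per-cycle traces are easiest to reuse when they are written in the layout that the Hotspot thermal tool reads as a power trace (.ptrace): a header line of unit names followed by one line per sample with one power value per unit. The C++ sketch below is a hypothetical helper, not the code in *NoC.cpp*; it assumes one row per simulated cycle and tab-separated values.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes power samples in a Hotspot-style power-trace layout:
// line 1 holds the unit names, and every following line holds one power
// value (watts) per unit for one sample, tab-separated. The unit names
// must match the names used in the floorplan file.
void writePtrace(std::ostream& out, const std::vector<std::string>& units,
                 const std::vector<std::vector<double>>& samples) {
    for (std::size_t i = 0; i < units.size(); ++i) {
        out << (i == 0 ? "" : "\t") << units[i];
    }
    out << '\n';
    for (const auto& row : samples) {
        assert(row.size() == units.size());  // one value per unit
        for (std::size_t i = 0; i < row.size(); ++i) {
            out << (i == 0 ? "" : "\t") << row[i];
        }
        out << '\n';
    }
}

// Convenience wrapper that returns the trace as a string.
std::string ptraceString(const std::vector<std::string>& units,
                         const std::vector<std::vector<double>>& samples) {
    std::ostringstream os;
    writePtrace(os, units, samples);
    return os.str();
}
```

In a simulator, `writePtrace` would be called with an `std::ofstream` opened on the trace file, appending one row of router powers per cycle.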

ORION 3.0 is a power and area simulator for interconnection networks built on top of ORION 2.0. The structure of ORION 3.0 is different from that of ORION 2.0. ORION 3.0 implements a new class structure as well as new models for router micro-architectural blocks. These models are significantly enhanced compared to ORION 2.0, and they are highly accurate with respect to physical implementation.

**Capabilities of ORION Tool:** It is used to calculate the power and area of the NoC router and its components.

The *SIM\_port.h* file contains the router microarchitectural and technology parameters.

#### **Technology Parameters:**

- PARM\_TECH\_POINT: ORION 3.0 supports only the 65nm and 45nm technologies. ORION 2.0 provides updated values for 90nm, 65nm, 45nm and 32nm while keeping the original ORION 1.0 values for 800nm to 110nm.
- PARM\_TRANSISTOR\_TYPE: can be set to HVT, NVT or LVT, where HVT means high threshold voltage (V<sub>T</sub>), NVT means normal V<sub>T</sub> and LVT means low V<sub>T</sub>. LVT corresponds to a high-performance router, NVT to a router with normal performance, and HVT to a low-power, low-performance router.
- PARM\_Vdd: the operating voltage in volts.
- PARM\_Freq: the operating frequency in Hz.

If the router operates at a high frequency, we use LVT and set PARM\_Vdd to a higher value. If the router operates at a low frequency, we use HVT and set PARM\_Vdd to a lower value. If the router operates at a normal frequency, we use NVT and set PARM\_Vdd to a value between the LVT and HVT PARM\_Vdd values. For example, when simulating the router at 65nm, reasonable settings for PARM\_Vdd are 1.2V for LVT and 0.8V for HVT.
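The selection rule above can be written as a small C++ function. The 1.2V and 0.8V values are the 65nm examples from the text; the frequency band limits are caller-supplied because the text gives no numeric boundaries, and taking the NVT supply as the midpoint of the LVT and HVT values is an assumption for "in between". The function and struct names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Threshold-voltage flavour and supply voltage chosen for a router.
struct VtChoice {
    std::string transistorType;  // value for PARM_TRANSISTOR_TYPE
    double vdd;                  // value for PARM_Vdd, in volts
};

// lowBandHz / highBandHz are hypothetical band limits supplied by the user.
VtChoice chooseVt65nm(double freqHz, double lowBandHz, double highBandHz) {
    assert(lowBandHz < highBandHz);
    const double vddLvt = 1.2;  // example LVT setting at 65nm (from the text)
    const double vddHvt = 0.8;  // example HVT setting at 65nm (from the text)
    if (freqHz >= highBandHz) return {"LVT", vddLvt};  // high performance
    if (freqHz <= lowBandHz) return {"HVT", vddHvt};   // low power
    return {"NVT", (vddLvt + vddHvt) / 2.0};           // midpoint assumed
}
```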

#### **Router Parameters:**

- PARM\_v\_class: PARM\_v\_class is the number of message classes in each router.
- PARM\_v\_channel: PARM\_v\_channel is the number of virtual channels within each message class.
- PARM\_in\_buf\_set: the number of input buffers. If PARM\_in\_share\_buf is set to 0, input virtual channels do not physically share buffers, and PARM\_in\_buf\_set is the number of buffers per virtual channel. If PARM\_in\_share\_buf is set to 1, input virtual channels physically share buffers, and PARM\_in\_buf\_set is the total number of shared buffers.
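The effect of PARM\_in\_share\_buf on buffer count can be made concrete with a short C++ sketch. It assumes the count is per input port and that, without sharing, every virtual channel in every message class gets PARM\_in\_buf\_set private buffers; the struct mirrors the parameter names but is not ORION's own data structure.

```cpp
#include <cassert>

// Subset of the router parameters discussed above, as plain values.
struct BufferParams {
    int vClass;       // PARM_v_class
    int vChannel;     // PARM_v_channel
    int inBufSet;     // PARM_in_buf_set
    bool inShareBuf;  // PARM_in_share_buf (0 or 1)
};

// Total input buffers per input port implied by the parameters (assumption).
int inputBuffersPerPort(const BufferParams& p) {
    assert(p.vClass > 0 && p.vChannel > 0 && p.inBufSet > 0);
    if (p.inShareBuf) {
        return p.inBufSet;  // one pool shared by all virtual channels
    }
    return p.vClass * p.vChannel * p.inBufSet;  // private buffers per VC
}
```

For example, one message class with 4 VCs and 4 buffers per VC gives 16 private buffers, the same storage as a shared pool with PARM\_in\_buf\_set set to 16.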

More details of the ORION integration with NIRGAM, including the code with detailed comments, can be downloaded from [107].

# Appendix: C

## Hotspot Tool Integrated with NIRGAM

The creation of a hotspot depends on the power density, i.e. the power per unit area. ORION, the power simulator, produces the following outputs:

- Area of NoC components.
- Power traces, i.e. the power per cycle, of each component.
- Average power of each component.
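Combining these outputs gives the power density that drives hotspot formation: average the per-cycle trace of a unit, then divide by its area. The C++ sketch below illustrates that arithmetic with hypothetical function names; units are watts and square metres, so the density is in W/m<sup>2</sup>.

```cpp
#include <cassert>
#include <vector>

// Average of a per-cycle power trace for one unit (watts).
double averagePowerW(const std::vector<double>& tracePerCycle) {
    assert(!tracePerCycle.empty());
    double sum = 0.0;
    for (double p : tracePerCycle) {
        sum += p;
    }
    return sum / static_cast<double>(tracePerCycle.size());
}

// Power density (W/m^2) = average power / unit area.
double powerDensityWPerM2(double avgPowerW, double areaM2) {
    assert(areaM2 > 0.0);
    return avgPowerW / areaM2;
}
```

Two routers with the same average power but different areas therefore have different power densities, which is why area is reported alongside power.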

The thermal Hotspot model maps the power traces and average power onto the NoC floorplan. Thermal mapping (Hotspot) depends on the following parameters:

- Average power and power traces of each component.
- Floorplan of the NoC/router.
- Thermal properties (Hotspot configuration file, i.e. *hotspot.config*).

Floorplan and Hotspot Creation Steps: Here we show some ORION outputs that serve as inputs to the Hotspot tool.

Figure C.1 shows the basic steps to create the floorplan. Each line specifies the *unit-name*, *area-in-m2*, *min-aspect-ratio*, *max-aspect-ratio* and *rotable* flag of a router. All routers are assumed to be homogeneous, so in Figure C.1 all routers have the same area. The minimum and maximum aspect ratios are set to 1 and 3, respectively, as per the floorplan tool.
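The floorplan description lines described above can be generated programmatically, one per router. The C++ sketch below is hypothetical (the helper names are not from the thesis code); it relies on the default stream formatting, which prints an area such as 2.56561e-07 in the same form as the floorplan file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper producing unit names like "router[0][1]".
std::string routerName(int row, int col) {
    return "router[" + std::to_string(row) + "][" + std::to_string(col) + "]";
}

// One floorplan description line:
// <unit-name> <area-in-m2> <min-aspect-ratio> <max-aspect-ratio> <rotable>
std::string floorplanDescLine(const std::string& unit, double areaM2,
                              double minAspect, double maxAspect,
                              bool rotable) {
    assert(areaM2 > 0.0 && minAspect > 0.0 && minAspect <= maxAspect);
    std::ostringstream os;  // default formatting: 6 significant digits
    os << unit << ' ' << areaM2 << ' ' << minAspect << ' ' << maxAspect << ' '
       << (rotable ? 1 : 0);
    return os.str();
}
```

Looping over all rows and columns of the mesh with the same area and aspect-ratio bounds reproduces the homogeneous-router floorplan assumed here.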

Excerpt of *floorplan.desc* (line format: `<unit-name> <area-in-m2> <min-aspect-ratio> <max-aspect-ratio> <rotable>`):

| Unit name | Area (m<sup>2</sup>) | Min aspect ratio | Max aspect ratio | Rotable |
|-----------|----------------------|------------------|------------------|---------|
| router[0][0] | 2.56561e-07 | 1 | 3 | 1 |
| router[0][1] | 2.56561e-07 | 1 | 3 | 1 |
| router[0][2] | 2.56561e-07 | 1 | 3 | 1 |
| router[0][3] | 2.56561e-07 | 1 | 3 | 1 |
| 2           | router[0][<br>router[1][<br>router[1][               | 4] 2<br>0] 2<br>1] 2<br>2] 2                                                                                                                                                                                                                                                                       | .56561e-0<br>.56561e-0<br>.56561e-0              | 7 1<br>7 1<br>7 1<br>7 1                                                                                                                                                                                                                         | 3<br>3<br>3  | 1<br>1<br>1                           |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
| 9           | router[1][<br>router[1][<br>router[2][               | 3] 2<br>4] 2<br>0] 2                                                                                                                                                                                                                                                                               | .56561e-0<br>.56561e-0<br>.56561e-0              | 7 1<br>7 1<br>7 1<br>7 1                                                                                                                                                                                                                         | 3<br>3<br>3  | 1 1 1                                 |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
| TeX         | router[2][<br>router[2][<br>router[2][<br>router[2][ | 1] 2<br>2] 2<br>3] 2<br>4] 2                                                                                                                                                                                                                                                                       | .56561e-0<br>.56561e-0<br>.56561e-0              | 7 1<br>7 1<br>7 1<br>7 1                                                                                                                                                                                                                         | 3<br>3<br>3  | 1<br>1<br>1                           |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
|             | router[3][<br>router[3][<br>router[3][<br>router[3][ | 0] 2<br>1] 2<br>2] 2<br>3] 2                                                                                                                                                                                                                                                                       | .56561e-0<br>.56561e-0<br>.56561e-0<br>.56561e-0 | 7 1<br>7 1<br>7 1<br>7 1                                                                                                                                                                                                                         | 3<br>3<br>3  | 1<br>1<br>1                           |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
|             | router[3][<br>router[4][<br>router[4][               | 4] 2<br>0] 2<br>1] 2                                                                                                                                                                                                                                                                               | .56561e-0<br>.56561e-0<br>.56561e-0              | 7 1<br>7 1<br>7 1                                                                                                                                                                                                                                | 3<br>3<br>3  | 1 1 1 1                               |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
|             | router[4][<br>router[4][<br># Connecti               | 3] 2<br>4] 2<br>vity in                                                                                                                                                                                                                                                                            | .56561e-0<br>.56561e-0<br>formation              | 7 1<br>7 1                                                                                                                                                                                                                                       | 3            | 1                                     |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
|             | # Line for                                           | mat <un< th=""><th>tti-name&gt;</th><th><un< th=""><th>tt2-name&gt;</th><th><wire_density></wire_density></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></un<></th></un<>                                                                          | tti-name>                                        | <un< th=""><th>tt2-name&gt;</th><th><wire_density></wire_density></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></un<>                                                                           | tt2-name>    | <wire_density></wire_density>         |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
| and a state | router[0][<br>router[0][<br>router[0][               | 0] r<br>1] r<br>2] r<br>3] r                                                                                                                                                                                                                                                                       | outer[0][<br>outer[0][<br>outer[0][              | 1] 1<br>2] 1<br>3] 1<br>4] 1                                                                                                                                                                                                                     |              |                                       |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
|             | router[1][<br>router[1][<br>router[1][               | 0] r<br>1] r<br>2] r                                                                                                                                                                                                                                                                               | outer[1][<br>outer[1][<br>outer[1][              | 1] 1<br>2] 1<br>3] 1                                                                                                                                                                                                                             |              |                                       |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
|             | router[1][<br>router[2][<br>router[2][               | 3] r<br>0] r<br>1] r<br>2] r                                                                                                                                                                                                                                                                       | outer[1][<br>outer[2][<br>outer[2][              | 4] 1<br>1] 1<br>2] 1<br>3] 1                                                                                                                                                                                                                     |              |                                       |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |
| -           | router[2][<br>router[3][<br>router[3][               | 3] r<br>0] r<br>1] r                                                                                                                                                                                                                                                                               | outer[2][<br>outer[3][<br>outer[3][              | 4] 1<br>1] 1<br>2] 1                                                                                                                                                                                                                             |              |                                       |                                                                                                                                                     |               |                                                                                              |           |   |                |            |                    |         |    |

FIGURE C.1: floorplan description file
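The floorplan description in Figure C.1 appears to follow the `.desc` layout of HotSpot's HotFloorplan: one line per functional unit, then a connectivity section. A mesh NoC repeats the same router many times, so a file like this can be generated instead of written by hand. The Python below is a minimal sketch that rests on several assumptions. `mesh_desc` is a hypothetical helper. Every router gets the area, aspect-ratio bounds and rotable flag visible in the figure. The column (vertical) links with wire density 1 are also assumed, because the screenshot only shows links within a row.

```python
def mesh_desc(k, area=2.56561e-07, min_ar=1, max_ar=3, rotable=1, wire_density=1):
    """Build HotFloorplan-style .desc text for a k x k mesh of identical routers.

    Assumption: all routers share one area and aspect-ratio range, and every
    pair of horizontally or vertically adjacent routers is connected with the
    same wire density.
    """
    def name(i, j):
        return f"router[{i}][{j}]"

    lines = ["# Line Format: <unit-name>\t<area-in-m2>\t<min-aspect-ratio>"
             "\t<max-aspect-ratio>\t<rotable>"]
    # One functional-unit line per router.
    for i in range(k):
        for j in range(k):
            lines.append(f"{name(i, j)}\t{area:g}\t{min_ar}\t{max_ar}\t{rotable}")

    lines.append("# Connectivity information")
    lines.append("# Line format: <unit1-name>\t<unit2-name>\t<wire_density>")
    # Links within a row: router[i][j] -- router[i][j+1] (as visible in Figure C.1).
    for i in range(k):
        for j in range(k - 1):
            lines.append(f"{name(i, j)}\t{name(i, j + 1)}\t{wire_density}")
    # Links within a column: router[i][j] -- router[i+1][j] (assumed, not visible).
    for i in range(k - 1):
        for j in range(k):
            lines.append(f"{name(i, j)}\t{name(i + 1, j)}\t{wire_density}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the description for the 5 x 5 mesh shown in the figure.
    print(mesh_desc(5), end="")
```

For a 5 x 5 mesh this produces 25 unit lines and 40 link lines (20 within rows, 20 within columns). The defaults are passed in as parameters, so a heterogeneous floorplan would only need the per-router values changed.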

[Screenshot: power trace (`.ptrace`) file from `~/HiPER-NIRGAM/Version7/power-floorplan-files`, opened in gedit. The header line names the 25 units `router[0][0]` to `router[4][4]`. Each following row gives one power value per router for a sampling interval, with values between about 0.112 and 0.121.]

FIGURE C.2: Power Trace File of NoC
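The power trace in Figure C.2 and the temperature trace `gcc.ttrace` in the next screenshot share a simple layout: a header line of unit names, then one row of values per sampling interval. The Python below is a minimal sketch that reads such a trace and averages each router's values over all intervals. It assumes whitespace-separated columns and no comment lines. `read_trace` and `average_per_unit` are hypothetical names, and the sample data is illustrative rather than taken from the thesis.

```python
import io
from statistics import mean


def read_trace(lines):
    """Parse a HotSpot-style trace (.ptrace or .ttrace).

    The first non-empty line holds whitespace-separated unit names; each later
    non-empty line holds one value per unit for one sampling interval.
    Returns {unit name: [value for each interval]}.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        raise ValueError("empty trace")
    header, data = rows[0], rows[1:]
    # Values are matched to units by column position, so every row must be complete.
    for n, row in enumerate(data, start=1):
        if len(row) != len(header):
            raise ValueError(f"interval {n}: expected {len(header)} values, got {len(row)}")
    return {unit: [float(row[c]) for row in data] for c, unit in enumerate(header)}


def average_per_unit(trace):
    """Mean value of each unit over all intervals (e.g. average router power)."""
    return {unit: mean(values) for unit, values in trace.items()}


if __name__ == "__main__":
    # Illustrative two-router, two-interval trace; the real files list all 25 routers.
    sample = io.StringIO("router[0][0]\trouter[0][1]\n0.11\t0.12\n0.13\t0.12\n")
    for unit, avg in average_per_unit(read_trace(sample)).items():
        print(f"{unit}: {avg:.3f}")
```

Each value is tied to a router only by its column, so the row-length check matters. It rejects a truncated or misaligned row instead of silently shifting readings onto the wrong routers.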

[Screenshot: temperature trace file `gcc.ttrace` from `~/HiPER-NIRGAM/Version7`, opened in gedit. The header line names the units `router[0][0]` to `router[4][4]`. Each following row gives one temperature value per router for a sampling interval, with values between about 48.84 and 49.03 that change very little from row to row.]
|                     | 48.94      | 48.93        | 48.89     | 48.84        |         |        | 10100      | 101.50    |        | 10100  | 101.52 | 10100  | 10100   | 10100   | 10100   |           |          |            |         |          |                         |
|                     | 48.90      | 48.96        | 48.99     | 49.03        | 48.89   | 48.88  | 48.93      | 48.98     | 49.01  | 48.93  | 48.92  | 48.96  | 48.95   | 48.93   | 48.86   | 48.99     | 49.01    | 48.99      | 48.92   | 48.91    | 48.93                   |
|                     | 48.94      | 48.93        | 48.89     | 48.84        |         |        |            |           |        |        |        |        |         |         |         |           |          |            |         |          |                         |
|                     | 48.91      | 48.97        | 48.99     | 49.03        | 48.89   | 48.88  | 48.93      | 48.98     | 49.01  | 48.93  | 48.92  | 48.96  | 48.95   | 48.93   | 48.86   | 48.99     | 49.01    | 48.99      | 48.92   | 48.91    | 48.93                   |
|                     | 48.94      | 48.93        | 48.89     | 48.84        | 40.00   | 40.00  | 40.00      | 40.00     | 40.04  | 40.02  | 40.00  | 40.06  | 40.05   | 40.00   | 40.06   | 40.00     | 40.04    | 40.00      | 40.00   | 40.04    | 40.03                   |
|                     | 48.91      | 48.97        | 48.99     | 49.03        | 48.89   | 48.88  | 48.93      | 48.98     | 49.01  | 48.93  | 48.92  | 48.96  | 48.95   | 48.93   | 48.80   | 48.99     | 49.01    | 48.99      | 48.92   | 48.91    | 48.93                   |
|                     | 48.91      | 48.97        | 48.99     | 49.03        | 48.89   | 48.88  | 48,93      | 48,98     | 49.01  | 48,93  | 48,92  | 48,96  | 48,95   | 48,93   | 48.86   | 48,99     | 49.01    | 48,99      | 48,92   | 48,91    | 48.93                   |
|                     | 48.94      | 48.93        | 48.89     | 48.84        |         |        |            | 10110     |        |        |        |        |         |         |         |           |          |            |         |          |                         |
| 124                 | 48.91      | 48.97        | 48.99     | 49.03        | 48.89   | 48.88  | 48.93      | 48.98     | 49.01  | 48.93  | 48.92  | 48.96  | 48.95   | 48.93   | 48.86   | 48.99     | 49.01    | 48.99      | 48.92   | 48.91    | 48.93                   |
|                     |            |              |           |              |         |        |            |           |        |        |        |        |         |         |         | Plain Tex | kt 🔻 Tab | Width: 8 🔻 | Ln      | 1, Col 1 | <ul> <li>INS</li> </ul> |

FIGURE C.3: Temperature Trace File of NoC

Figure C.2 shows the per-cycle power traces of the NoC routers, while Figure C.3 shows the corresponding temperature traces.

#### Thermal-Aware Floorplan:



FIGURE C.4: Thermal-aware floorplan using Simulated Annealing Algorithm

Figure C.4 shows the flowchart of the simulated-annealing-based thermal-aware floorplanning algorithm.
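The flowchart in Figure C.4 is not reproduced in the text, so the following is only a minimal, generic simulated annealing sketch for thermal-aware placement under stated assumptions. The cost function (the sum of power products of adjacent grid slots, used as a rough proxy for thermal coupling), the swap move, and the cooling parameters `t0`, `alpha`, `moves_per_t` and `t_min` are all illustrative assumptions, and `thermal_cost`/`anneal_floorplan` are hypothetical helpers, not the exact cost or schedule used in this thesis.

```python
import math
import random


def thermal_cost(layout, power, cols):
    """Hypothetical thermal proxy: sum of p_a * p_b over every pair of
    horizontally or vertically adjacent grid slots. layout[s] is the block
    placed in slot s; power[b] is the power of block b."""
    cost = 0.0
    for idx, block in enumerate(layout):
        if idx % cols + 1 < cols:               # right neighbour exists
            cost += power[block] * power[layout[idx + 1]]
        if idx + cols < len(layout):            # neighbour below exists
            cost += power[block] * power[layout[idx + cols]]
    return cost


def anneal_floorplan(power, cols, t0=1.0, alpha=0.95, moves_per_t=100,
                     t_min=1e-3, seed=0):
    """Simulated annealing over block-to-slot assignments on a grid."""
    rng = random.Random(seed)
    layout = list(range(len(power)))            # initial placement: block i in slot i
    cost = thermal_cost(layout, power, cols)
    best, best_cost = layout[:], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.sample(range(len(layout)), 2)    # move: swap two slots
            layout[i], layout[j] = layout[j], layout[i]
            new_cost = thermal_cost(layout, power, cols)
            delta = new_cost - cost
            # Metropolis criterion: always accept improvements, accept
            # worse moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = layout[:], cost
            else:
                layout[i], layout[j] = layout[j], layout[i]  # reject: undo swap
        t *= alpha                                   # geometric cooling schedule
    return best, best_cost
```

For example, `anneal_floorplan([4.0, 4.0, 1.0, 1.0], cols=2)` moves the two hot blocks from their initial side-by-side slots (cost 25.0) onto a diagonal (cost 16.0), so that no two high-power blocks share an edge. A real thermal-aware floorplanner would replace the proxy cost with temperatures obtained from a thermal model such as HotSpot.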

# Appendix: D

### Network-on-Chip REST Model Integrated with NIRGAM

In this thesis, NoC reliability is estimated with the REST tool [67]. Its inputs and outputs are listed below.

- **Input:** temperature traces (*router.ttrace*) and the NoC floorplan (*NoC.flp*)
- **Output:** *MTTF* (Mean Time To Failure)

Following the REST [67] calibration sequence, we use a default MTTF of 30. This calibration assigns MTTF = 30 to a standard 1 mm × 1 mm block at the default temperature. It assumes that, ceteris paribus (all other factors being equal), every failure model yields the same MTTF, which is most likely not true in practice. Consequently, the reliability estimates produced by REST are relative rather than absolute: REST compares MTTFs across configurations instead of predicting absolute lifetimes.

For NBTI, the calibration factor is computed with the following equation:

$$NBTI_{calibration} = \frac{MTTF_{std}}{\mathrm{Calculate\_NBTI\_MTTF}(T_{std})}$$

where $T_{std}$ denotes the standard (default) temperature.

REST applies this calibration to every failure mechanism, in each case relative to the standard MTTF ($MTTF_{std}$).
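A minimal Python sketch of this calibration step is given below. The raw NBTI model is a hypothetical Arrhenius-type placeholder, and its activation energy `ea_ev = 0.5` eV and the standard temperature `temp_std_k = 345` K are assumed values; it is neither the NBTI equation of Chapter 2 nor REST's own code. Any mechanism's uncalibrated model can be passed to the same helpers, mirroring how REST calibrates every mechanism against $MTTF_{std}$.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def raw_nbti_mttf(temp_k, ea_ev=0.5):
    """Hypothetical placeholder for an uncalibrated NBTI lifetime model:
    an Arrhenius term exp(Ea / kT). Not the Chapter 2 NBTI equation."""
    return math.exp(ea_ev / (BOLTZMANN_EV * temp_k))


def calibration_factor(raw_model, mttf_std=30.0, temp_std_k=345.0):
    """NBTI_calibration = MTTF_std / raw_model(T_std)."""
    return mttf_std / raw_model(temp_std_k)


def calibrated_mttf(raw_model, temp_k, factor):
    """Relative MTTF of a block at temp_k after calibration."""
    return factor * raw_model(temp_k)
```

With `factor = calibration_factor(raw_nbti_mttf)`, the calibrated MTTF at the standard temperature is 30 (up to floating-point rounding), and any hotter block receives a smaller value. Only these ratios are meaningful, which is why REST's estimates are relative.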

### **Details of MTTF calculation:**

Figure D.1 shows the framework for reliability estimation using the REST tool. In this framework, the NIRGAM configuration file *nirgam.config* provides all NoC parameters. Based on these parameters, ORION calculates the area and power of each router of the NoC. These outputs are passed to the HotSpot tool together with the floorplan description files *noc.desc* and *router.desc* to create a thermal-aware floorplan. Finally, the temperature traces (*noc.ttrace*) and the floorplan file (*noc.flp*) are input to the REST tool, which reports the normalized MTTF using Equation 7.



FIGURE D.1: REST Tool Framework for Reliability Estimation



FIGURE D.2: Monte Carlo simulation based time to failure evaluation methodology for NoC [67]

Figure D.2 shows the Monte Carlo (MC) simulation framework for reliability estimation. The inputs to HotSpot are the power traces and the area description file of the NoC; its output is the temperature trace of each router (R1 up to the last router of the mesh) or sub-block. Router temperatures depend on the application traffic. These temperatures are substituted into Equations 2.4, 2.5, 2.6 and 2.7 of Chapter 2, which give the MTTF of each sub-block/router; this MTTF is used as the mean of the failure-time probability distribution associated with that sub-block/router. In each Monte Carlo iteration, failure-time samples (instances) are drawn from these distributions.

**Algorithm D.1** Monte Carlo algorithm [67]

**Require:** NoC floorplan and power traces of each router

**Ensure:** MTTF of the NoC

| Line | Statement | Comment |
|------|-----------|---------|
| 1: | **procedure** MTTF(*flp*, *ttrace*, *F*, *ptrace*) | |
| 2: | &emsp;**for** $l = 1$ **to** $F$ **do** | ▷ $F$: number of failure types |
| 3: | &emsp;&emsp;Calculate $MTTF_l$ using the failure equations of NBTI, TDDB, SM and TC | |
| 4: | &emsp;&emsp;**for** $j = 1$ **to** $N$ **do** | ▷ $N = 10^5$ Monte Carlo iterations |
| 5: | &emsp;&emsp;&emsp;$tf^{j}_{min} \leftarrow \infty$ | ▷ Initialization |
| 6: | &emsp;&emsp;&emsp;**for** $k = 1$ **to** $R$ **do** | ▷ $R$: number of sub-blocks/routers |
| 7: | &emsp;&emsp;&emsp;&emsp;$tf_k \leftarrow \mathrm{generate\_instance}(MTTF_l)$ | |
| 8: | &emsp;&emsp;&emsp;&emsp;**if** $tf_k < tf^{j}_{min}$ **then** | |
| 9: | | ▷ MIN-MAX analysis |
| 10: | &emsp;&emsp;&emsp;&emsp;&emsp;$tf^{j}_{min} \leftarrow tf_k$ | |
| 11: | &emsp;&emsp;&emsp;&emsp;**end if** | |
| 12: | &emsp;&emsp;&emsp;**end for** | |
| 13: | &emsp;&emsp;**end for** | |
| 14: | &emsp;&emsp;$tf_l \leftarrow \frac{1}{N}\sum_{j=1}^{N} tf^{j}_{min}$ | |
| 15: | &emsp;**end for** | |
| 16: | &emsp;**return** $tf = \min_{l} \{tf_l\}$ | ▷ Estimate of the MTTF of the whole NoC |

Algorithm D.1, adapted from [67], describes the Monte Carlo simulation. The loop index $l$ runs over the $F = 4$ failure mechanisms: 1 for NBTI, 2 for TDDB, 3 for SM and 4 for TC. For each failure mechanism, $N = 10^5$ Monte Carlo iterations are executed, and each iteration performs the following steps:

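- For each router, draw a failure-time instance from the distribution associated with that router.
- Apply the MIN-MAX analysis: keep the smallest sampled failure time as $tf^{j}_{min}$, since the first router to fail determines when the NoC fails.
- After all $N$ iterations, compute the time to failure of the current failure mechanism as the average of the $tf^{j}_{min}$ values (step 14 of the algorithm).

Finally, the MTTF of the complete NoC is taken as the minimum over all failure mechanisms (step 16).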
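The procedure of Algorithm D.1 can be sketched in Python as follows. This is a minimal illustration, not REST's implementation: the function names are hypothetical, each router may carry its own MTTF per mechanism (since router temperatures differ), and failure times are assumed to follow a Weibull distribution with shape 2 whose mean equals the router's MTTF. The distribution family and shape are assumptions; [67] should be consulted for the distributions REST actually uses.

```python
import math
import random


def weibull_scale_for_mean(mttf, shape):
    """Weibull scale such that the distribution mean equals mttf
    (mean = scale * Gamma(1 + 1/shape))."""
    return mttf / math.gamma(1.0 + 1.0 / shape)


def monte_carlo_mttf(mttf_table, n_iter=10**5, shape=2.0, seed=0):
    """Sketch of Algorithm D.1.

    mttf_table[l][k] is the MTTF of router k under failure mechanism l
    (e.g. l = 0..3 for NBTI, TDDB, SM, TC). Returns the NoC MTTF estimate
    (minimum over mechanisms) and the per-mechanism estimates tf_l.
    """
    rng = random.Random(seed)
    per_mechanism = []
    for router_mttfs in mttf_table:                     # lines 2-15: loop over mechanisms
        scales = [weibull_scale_for_mean(m, shape) for m in router_mttfs]
        total = 0.0
        for _ in range(n_iter):                         # lines 4-13: N MC iterations
            tf_min = math.inf                           # line 5: initialization
            for scale in scales:                        # lines 6-12: loop over routers
                tf_k = rng.weibullvariate(scale, shape) # line 7: sample a failure time
                if tf_k < tf_min:                       # lines 8-11: keep the earliest failure
                    tf_min = tf_k
            total += tf_min
        per_mechanism.append(total / n_iter)            # line 14: average over N
    return min(per_mechanism), per_mechanism            # line 16: NoC MTTF
```

As a sanity check, `monte_carlo_mttf([[30.0] * 4])` returns roughly 15: with a Weibull shape of 2, the minimum of four identically distributed failure times has mean $30/\sqrt{4}$. This shows how the MIN step makes the NoC MTTF lower than the MTTF of any single router.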

# Publications

### Journal

- Ashish Sharma, Manoj Singh Gaur, Lava Bhargava, Vijay Laxmi, "ARNR: Aging-Aware Reliable NoC Router", Microelectronics Reliability, 2019. Communicated.
- Yogendra Gupta, Lava Bhargava, Ashish Sharma, and M. S. Gaur (2019): Hybrid buffers based coarse-grained power gated network on chip router microarchitecture, International Journal of Electronics, DOI: 10.1080/00207217.2019.1644674.
- N. Gupta, A. Sharma, V. Laxmi, M. S. Gaur, M. Zwolinski and R. Bishnoi, "σ<sup>n</sup>LBDR: generic congestion handling routing implementation for two-dimensional mesh network-on-chip", in IET Computers & Digital Techniques, vol. 10, no. 5, pp. 226-232, Sep 2016.
### International Conference
- Ashish Sharma, Manoj Singh Gaur, Lava Bhargava, Vijay Laxmi, and Mark Zwolinski, "HiPER-NIRGAM: A tool chain based framework for modeling thermal-aware reliability estimation in 2d mesh NoCs", in University Booth, DATE, 2015.
- Ashish Sharma, Prachi Upadhyay, Ruby Ansar, Vijay Laxmi, Lava Bhargava, Manoj Singh Gaur, and Mark Zwolinski, "A framework for thermal aware reliability estimation in 2d NoC", in VLSI Design and Test (VDAT), 2015 19th International Symposium on. IEEE, 2015, pp. 1–6.
- Ashish Sharma, Manoj Singh Gaur, Ruby Ansar, Prachi Upadhyay, Manish Singhal, "Characterizing impacts of multi-vt routers on power and reliability of network-on-chip", in Contemporary Computing (IC3), 2015 Eighth International Conference on. IEEE, 2015, pp. 476–480.

- Ashish Sharma, Ruby Ansar, Manoj Singh Gaur, Lava Bhargava, Vijay Laxmi, "Reducing FIFO Buffer Power Using Architectural Alternatives at RTL", in VLSI Design and Test (VDAT), 2016 20th International Symposium on. IEEE, 2016, pp. 1–2.
- Ashish Sharma, Yogendra Gupta, Sonal Yadav, Lava Bhargava, Manoj Singh Gaur, Vijay Laxmi, "A Power, Thermal and Reliability-Aware Network-on-Chip", 2017 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), Bhopal, 2017.
- Sharma A., Tailor M., Bhargava L., Gaur M.S. (2019) 3D-LBDR: Logic-Based Distributed Routing for 3D NoC. In: Rajaram S., Balamurugan N., Gracia Nirmala Rani D., Singh V. (eds) VLSI Design and Test. VDAT 2018. Communications in Computer and Information Science, vol 892. Springer, Singapore.
- M. S. Gaur, V. Laxmi, M. Zwolinski, M. Kumar, N. Gupta and Ashish, "Network-on-chip: Current issues and challenges," VLSI Design and Test (VDAT), 2015 19th International Symposium on, Ahmedabad, 2015, pp. 1-3.
- Niyati Gupta, Manoj Kumar, Ashish Sharma, Manoj Singh Gaur, Vijay Laxmi, Masoud Daneshtalab, and Masoumeh Ebrahimi. 2015. Improved Route Selection Approaches using Q-learning framework for 2D NoCs. In Proceedings of the 3rd International Workshop on Many-core Embedded Systems (MES '15). ACM, New York, NY, USA, 33-40.

# Bibliography

- [1] Evgeny Bolotin, Israel Cidon, Ran Ginosar, and Avinoam Kolodny. Cost considerations in network on chip. *Integr. VLSI J.*, 38(1):19–42, October 2004.
- [2] Wei Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M.R. Stan. Hotspot: a compact thermal modeling methodology for early-stage vlsi design. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 14(5):501-513, May 2006.
- [3] John L. Hennessy and David A. Patterson. Computer Architecture, Fifth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 5th edition, 2011.
- [4] R. H. Dennard, F. H. Gaensslen, H. N. Yu, V. Leo Rideout, E. Bassous, and A. R. LeBlanc. Design of ion-implanted mosfet's with very small physical dimensions. *IEEE Solid-State Circuits Society Newsletter*, 12(1):38–50, Winter 2007.
- [5] Intel Corp. The single-chip cloud computer. Available at. http://www.intel.com/pressroom/archive/releases/2009/20091202comp\_sm.htm.
- [6] Shi Sha, Jiawei Zhou, C. Liu, and Gang Quan. Power and energy analysis on intel single-chip cloud computer system. In 2012 Proceedings of IEEE Southeastcon, pages 1–6, March 2012.
- [7] Intel Corp. Teraflops research chip. Available at. http://download.intel.com/pressroom/kits/Teraflops/Teraflops\_Research\_Chip\_Overview.pdf.
- [8] Tilera Corp. Tilera tile multicore processors. Available at. http://www.tilera.com/products/processors/.
- [9] Adapteva. Epiphany architecture. Available at. http://www.adapteva.com/.
- [10] Karl Rupp. 40 years of microprocessor trend data. Available at. https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/.
- [11] William Harrod. A journey to exascale computing. Available at. https://science.energy.gov/~/media/ascr/ascac/pdf/reports/2013/SC12\_Harrod.pdf.

- [12] Chuck Moore. Data processing in exascale-class computer system. Available at. http://www.lanl.gov/conferences/salishan/salishan2011/3moore.pdf.
- [13] G. E. Moore. The future of integrated electronics. volume 2, 1964.
- [14] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, AFIPS '67 (Spring), pages 483–485, New York, NY, USA, 1967. ACM.
- [15] J. Duato, S. Yalamanchili, and L. M. Ni. Interconnection networks: an engineering approach. IEEE, 1997.
- [16] Tobias Bjerregaard and Shankar Mahadevan. A survey of research and practices of network-on-chip. ACM Comput. Surv., 38(1), June 2006.
- [17] W. J. Dally and B. P. Towles. Principles and practices of interconnection networks. Morgan Kaufmann, 2004.
- [18] L. Benini and G. De Micheli. Networks on chips: a new soc paradigm. Computer, 35(1):70– 78, Jan 2002.
- [19] W. J. Dally and B. Towles. Route packets, not wires: on-chip interconnection networks. In Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232), pages 684-689, 2001.
- [20] J. Henkel, W. Wolf, and S. Chakradhar. On-chip networks: a scalable, communication-centric embedded system design paradigm. In VLSI Design, 2004. Proceedings. 17th International Conference on, pages 845–851, 2004.
- [21] S. Kumar, A. Jantsch, J. P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, and A. Hemani. A network on chip architecture and design methodology. In *Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002*, pages 105–112, 2002.
- [22] S. Rusu, M. Sachdev, C. Svensson, and B. Nauta. Trends and challenges in vlsi technology scaling towards 100 nm. In Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design, pages 16–17, Jan 2002.
- [23] C. Constantinescu. Trends and challenges in vlsi circuit reliability. *IEEE Micro*, 23(4):14–19, July 2003.
- [24] S. Borkar. Microarchitecture and design challenges for gigascale integration. In Microarchitecture, 2004. MICRO-37 2004. 37th International Symposium on, pages 3-3, Dec 2004.

- [25] S. Borkar. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. *IEEE Micro*, 25(6):10–16, Nov 2005.
- [26] ITRS. Itrs 2009 report on reliability. Available at. http://www.itrs2.net/itrs-reports.html.
- [27] Hyungjun Kim, Arseniy Vitkovskiy, Paul V. Gratz, and Vassos Soteriou. Use it or lose it: Wear-out and lifetime in future chip multiprocessors. In *Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture*, MICRO-46, pages 136–147, New York, NY, USA, 2013. ACM.
- [28] Chrysostomos Nicopoulos, Vijaykrishnan Narayanan, and Chita R. Das. Network-on-Chip Architectures: A Holistic Design Exploration. Springer, 2010.
- [29] C.J. Glass and L.M. Ni. The turn model for adaptive routing. In Proceedings, The 19th Annual International Symposium on Computer Architecture, pages 278–287, 1992.
- [30] J. Flich and J. Duato. Logic-based distributed routing for nocs. Computer Architecture Letters, 7(1):13-16, 2008.
- [31] S. Rodrigo, S. Medardoni, J. Flich, D. Bertozzi, and J. Duato. Efficient implementation of distributed routing algorithms for nocs. *Computers & Digital Techniques*, IET, 3(5):460–475, 2009.
- [32] Ge-Ming Chiu. The odd-even turn model for adaptive routing. IEEE Trans. Parallel Distrib. Syst., 11(7):729-738, July 2000.
- [33] L.M. Ni and P.K. McKinley. A survey of wormhole routing techniques in direct networks. Computer, 26(2):62-76, 1993.
- [34] Martin Radetzki, Chaochao Feng, Xueqian Zhao, and Axel Jantsch. Methods for fault tolerance in networks-on-chip. ACM Comput. Surv., 46(1):8:1–8:38, July 2013.
- [35] Martin Radetzki, Chaochao Feng, Xueqian Zhao, and Axel Jantsch. Methods for fault tolerance in networks-on-chip. ACM Comput. Surv., 46(1):8:1–8:38, July 2013.
- [36] X. Fu, T. Li, and J. A. B. Fortes. Architecting reliable multi-core network-on-chip for small scale processing technology. In 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), pages 111–120, June 2010.
- [37] A. K. Kodi, A. Sarathy, A. Louri, and J. Wang. Adaptive inter-router links for low-power, area-efficient and reliable network-on-chip (noc) architectures. In 2009 Asia and South Pacific Design Automation Conference, pages 1–6, Jan 2009.

- [38] C. Hernández, F. Silla, and J. Duato. A methodology for the characterization of process variation in noc links. In 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), pages 685–690, March 2010.
- [39] K. Bhardwaj, K. Chakraborty, and S. Roy. Towards graceful aging degradation in nocs through an adaptive routing algorithm. In *Design Automation Conference (DAC)*, 2012 49th ACM/EDAC/IEEE, pages 382–391, June 2012.
- [40] L. Wang, X. Wang, and T. Mak. Dynamic programming-based lifetime aware adaptive routing algorithm for network-on-chip. In Very Large Scale Integration (VLSI-SoC), 2014 22nd International Conference on, pages 1-6, Oct 2014.
- [41] Dean Michael Ancajas, James McCabe Nickerson, Koushik Chakraborty, and Sanghamitra Roy. Hci-tolerant noc router microarchitecture. In Proceedings of the 50th Annual Design Automation Conference, DAC '13, pages 40:1–40:10, New York, NY, USA, 2013. ACM.
- [42] M. S. Gaur, B. M. Al-Hashimi, V. Laxmi, Navaneeth, Naveen Choudhary, and Lavina Jain. Nirgam: A simulator for noc interconnect routing and applications modeling. In University Booth at DATE 2010, pages 1-2, March 2010.
- [43] Wm. A. Wulf and Sally A. McKee. Hitting the memory wall: Implications of the obvious. SIGARCH Comput. Archit. News, 23(1):20-24, March 1995.
- [44] Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, ISCA '92, pages 124–134, New York, NY, USA, 1992. ACM.
- [45] Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. SIGARCH Comput. Archit. News, 20(2):124–134, April 1992.
- [46] David W. Wall. Limits of instruction-level parallelism. Western Research Laboratory (WRL), Research Report 93/6, 1993.
- [47] Sangyeun Cho, Pen-Chung Yew, and Gyungho Lee. A high-bandwidth memory pipeline for wide issue processors. *IEEE Transactions on Computers*, 50(7):709–723, July 2001.
- [48] David A. Patterson. Future of computer architecture. Berkeley EECS Annual Research Symposium (BEARS), College of Engineering, UC, 2006.
- [49] Intel prescott. Available at. https://en.wikipedia.org/wiki/Pentium\_4#Prescott.
- [50] P. Dadvar and K. Skadron. Potential thermal security risks. In Semiconductor Thermal Measurement and Management IEEE Twenty First Annual IEEE Symposium, 2005., pages 229–234, March 2005.

- [51] Sudeep Pasricha and Nikil Dutt. On-Chip Communication Architectures: System on Chip Interconnect. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
- [52] Umit Y. Ogras and Radu Marculescu. Modeling, Analysis and Optimization of Network-on-Chip Communication Architectures, volume 184 of Lecture Notes in Electrical Engineering. Springer Publishing Company, Incorporated, New York NY, 2013.
- [53] Tim Kogel, Rainer Leupers, and Heinrich Meyr. Integrated System-Level Modeling of Network-on-Chip Enabled Multi-Processor Platforms. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
- [54] Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, and Gunar Schirner. Embedded System Design: Modeling, Synthesis and Verification. Springer Publishing Company, Incorporated, 1st edition, 2009.
- [55] S. Pasricha, N. Dutt, and M. Ben-Romdhane. Constraint-driven bus matrix synthesis for mpsoc. In Asia and South Pacific Conference on Design Automation, 2006., pages 6 pp.-, Jan 2006.
- [56] Boris Grot, Joel Hestness, Stephen W. Keckler, and Onur Mutlu. Kilo-noc: A heterogeneous network-on-chip architecture for scalability and service guarantees. In *Proceedings* of the 38th Annual International Symposium on Computer Architecture, ISCA '11, pages 401–412, New York, NY, USA, 2011. ACM.
- [57] Adapteva. Epiphany architecture reference. Available at. http://www.adapteva.com/docs/e64g401\_datasheet.pdf.
- [58] Open source noc router rtl. Available at. http://nocs.stanford.edu/cgi-bin/trac.cgi/wiki/Resources/Router.
- [59] Natalie Enright Jerger and Li-Shiuan Peh. On-Chip Networks. Morgan Claypool, 2009.
- [60] Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh. On-Chip Networks. Morgan Claypool, 2017.
- [61] Thomas Moscibroda and Onur Mutlu. A case for bufferless routing in on-chip networks. In Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09, pages 196–207, New York, NY, USA, 2009. ACM.
- [62] Thomas Moscibroda and Onur Mutlu. A case for bufferless routing in on-chip networks. SIGARCH Comput. Archit. News, 37(3):196–207, June 2009.
- [63] Gnaneswara Rao Jonna, John Jose, Rachana Radhakrishnan, and Madhu Mutyam. Minimally buffered single-cycle deflection router. In Proceedings of the Conference on Design, Automation & Test in Europe, DATE '14, pages 310:1–310:4, 3001 Leuven, Belgium, 2014. European Design and Automation Association.

- [64] G. Gielen, P. De Wit, E. Maricau, J. Loeckx, J. Martín-Martínez, B. Kaczer, G. Groeseneken, R. Rodríguez, and M. Nafría. Emerging yield and reliability challenges in nanometer cmos technologies. In *Proceedings of the Conference on Design, Automation and Test in Europe*, DATE '08, pages 1322–1327, New York, NY, USA, 2008. ACM.
- [65] Elie Maricau and Georges Gielen. Analog IC Reliability in Nanometer CMOS. Springer Publishing Company, Incorporated, 2013.
- [66] M. L. Fair, C. R. Conklin, S. B. Swaney, P. J. Meaney, W. J. Clarke, L. C. Alves, I. N. Modi, F. Freier, W. Fischer, and N. E. Weber. Reliability, availability, and serviceability (ras) of the ibm eserver z990. *IBM J. Res. Dev.*, 48(3-4):519–534, May 2004.
- [67] Alexandre Yasuo Yamamoto and Cristinel Ababei. Unified reliability estimation and management of noc based chip multiprocessors. *Microprocess. Microsyst.*, 38(1):53–63, February 2014.
- [68] Sufi Zafar, B.H. Lee, J. Stathis, A. Callegari, and Tak Ning. A model for negative bias temperature instability (nbti) in oxide and high κ pfets. In VLSI Technology, 2004. Digest of Technical Papers. 2004 Symposium on, pages 208-209, June 2004.
- [69] Jayanth Srinivasan. Lifetime Reliability Aware Microprocessors. PhD thesis, Champaign, IL, USA, 2006. AAI3223724.
- [70] D. Kececioglu. Reliability Engineering Handbook, Part I and II. Prentice Hall, Englewood Cliffs, New Jersey, 1991.
- [71] S. Borkar. Design challenges of technology scaling. IEEE Micro, 19(4):23-29, Jul 1999.
- [72] Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. In *Proceedings of the 38th Annual International Symposium on Computer Architecture*, ISCA '11, pages 365–376, New York, NY, USA, 2011. ACM.
- [73] Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. SIGARCH Comput. Archit. News, 39(3):365–376, June 2011.
- [74] K. Swaminathan, E. Kultursay, V. Saripalli, V. Narayanan, M. T. Kandemir, and S. Datta. Steep-slope devices: From dark to dim silicon. *IEEE Micro*, 33(5):50–59, Sept 2013.

- [75] Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Conservation cores: Reducing the energy of mature computations. In Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems, ASPLOS XV, pages 205–218, New York, NY, USA, 2010. ACM.
- [76] Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Conservation cores: Reducing the energy of mature computations. SIGARCH Comput. Archit. News, 38(1):205-218, March 2010.
- [77] Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Conservation cores: Reducing the energy of mature computations. SIGPLAN Not., 45(3):205–218, March 2010.
- [78] Amir M. Rahmani, Pasi Liljeberg, Ahmed Hemani, Axel Jantsch, and Hannu Tenhunen (eds.). The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era. Springer International Publishing, 1 edition, 2017.
- [79] Amir M. Rahmani, Pasi Liljeberg, Ahmed Hemani, Axel Jantsch, and Hannu Tenhunen (eds.). Dark vs. Dim Silicon and Near-Threshold Computing. Springer International Publishing, 1 edition, 2017.
- [80] S. Pagani, H. Khdr, J. J. Chen, M. Shafique, M. Li, and J. Henkel. Thermal safe power (tsp): Efficient power budgeting for heterogeneous manycore systems in dark silicon. *IEEE Transactions on Computers*, 66(1):147–162, Jan 2017.
- [81] Hotspot. Available at. http://lava.cs.virginia.edu/HotSpot/.
- [82] Wei Huang. HotSpot—A Chip and Package Compact Thermal Modeling Methodology for VLSI Design. PhD thesis, Charlottesville, Virginia, USA, January 2007.
- [83] Jayanth Srinivasan, Sarita V. Adve, Pradip Bose, and Jude A. Rivers. The case for lifetime reliability-aware microprocessors. SIGARCH Comput. Archit. News, 32(2):276-, March 2004.
- [84] Jayanth Srinivasan. Ramp: reliability-aware microprocessors. Available at. http://rsim.cs.uiuc.edu/ramp/.
- [85] Jeonghee Shin, Victor V. Zyuban, Zhigang Hu, Jude A. Rivers, and Pradip Bose. A framework for architecture-level lifetime reliability modeling. 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), pages 534–543, 2007.

- [86] K. Aisopos, C. H. O. Chen, and L. S. Peh. Enabling system-level modeling of variation-induced faults in networks-on-chips. In 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), pages 930–935, June 2011.
- [87] Y. Xiang, T. Chantem, R. P. Dick, X. S. Hu, and L. Shang. System-level reliability modeling for mpsocs. In 2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 297-306, Oct 2010.
- [88] Ayse Kivilcim Coskun, Tajana Simunic Rosing, Kresimir Mihic, Giovanni De Micheli, and Yusuf Leblebici. Analysis and optimization of mpsoc reliability. *Journal of Low Power Electronics*, 2(1):56–69, 2006.
- [89] Zhenyu Gu, Changyun Zhu, Li Shang, and Robert P. Dick. Application-specific mpsoc reliability optimization. *IEEE Trans. Very Large Scale Integr. Syst.*, 16(5):603–608, May 2008.
- [90] James P. G. Sterbenz, David Hutchison, Egemen K. Çetinkaya, Abdul Jabbar, Justin P. Rohrer, Marcus Schöller, and Paul Smith. Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines. *Comput. Netw.*, 54(8):1245–1265, June 2010.
- [91] A. Dalirsani, M. Hosseinabady, and Z. Navabi. An analytical model for reliability evaluation of noc architectures. In 13th IEEE International On-Line Testing Symposium (IOLTS 2007), pages 49–56, July 2007.
- [92] Cristinel Ababei, Hamed Sajjadi Kia, Om Prakash Yadav, and Jingcao Hu. Energy and reliability oriented mapping for regular networks-on-chip. In Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip, NOCS '11, pages 121–128, New York, NY, USA, 2011. ACM.
- [93] Jaume Abella, Xavier Vera, and Antonio Gonzalez. Penelope: The nbti-aware processor. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 40, pages 85–96, Washington, DC, USA, 2007. IEEE Computer Society.
- [94] Zhiliang Qian and Chi ying Tsui. A thermal-aware application specific routing algorithm for network-on-chip design. In Design Automation Conference (ASP-DAC), 2011 16th Asia and South Pacific, pages 449–454, Jan 2011.
- [95] T. Ebi, M. Faruque, and J. Henkel. Tape: Thermal-aware agent-based power econom multi/many-core architectures. In Computer-Aided Design - Digest of Technical Papers, 2009. ICCAD 2009. IEEE/ACM International Conference on, pages 302-309, Nov 2009.
- [96] Feiyang Liu, Huaxi Gu, and Yintang Yang. DTBR: A dynamic thermal-balance routing algorithm for network-on-chip. *Computers & Electrical Engineering*, 38(2):270–281, 2012.
- [97] V. Hanumaiah and S. Vrudhula. Energy-efficient operation of multicore processors by DVFS, task migration, and active cooling. *Computers, IEEE Transactions on*, 63(2):349–360, Feb 2014.
- [98] Seyab Khan and Said Hamdioui. Temperature dependence of NBTI induced delay. In 2010 IEEE 16th International On-Line Testing Symposium, pages 15–20, July 2010.
- [99] H. K. Mondal, S. H. Gade, S. Kaushik, and S. Deb. Adaptive multi-voltage scaling with utilization prediction for energy-efficient wireless NoC. *IEEE Transactions on Sustainable Computing*, 2(4):382–395, Oct 2017.
- [100] H. K. Mondal, S. H. Gade, R. Kishore, and S. Deb. Adaptive multi-voltage scaling in wireless NoC for high performance low power applications. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1315–1320, March 2016.
- [101] N. Sharma, T. V. Aa, P. Agrawal, P. Raghavan, P. R. Panda, and F. Catthoor. Data memory optimization in LTE downlink. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2610–2614, May 2013.
- [102] N. Sharma, P. R. Panda, M. Li, P. Agrawal, and F. Catthoor. Energy efficient data flow transformation for Givens rotation based QR decomposition. In 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–4, March 2014.
- [103] P. Agrawal, P. Raghavan, M. Hartman, N. Sharma, L. Van der Perre, and F. Catthoor. Early exploration for platform architecture instantiation with multi-mode application partitioning. In 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–8, May 2013.
- [104] Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, and Gunar Schirner. Embedded System Design: Modeling, Synthesis and Verification. Springer Publishing Company, Incorporated, 1st edition, 2009.
- [105] MoRV: Modelling reliability under variability. Available at: https://www.offis.de/en/offis/project/morv.html.
- [106] D. M. Ancajas, K. Chakraborty, and S. Roy. Proactive aging management in heterogeneous NoCs through a criticality-driven routing approach. In 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1032–1037, March 2013.
- [107] NIRGAM. Available at: https://www.arm.ecs.soton.ac.uk/technologies/nirgam/.
- [108] Hang Sheng Wang. ORION 3.0: A power-performance simulator for interconnection networks. Available at: http://vlsicad.ucsd.edu/ORION3/.
- [109] Sheng Li. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. Available at: http://www.hpl.hp.com/research/mcpat/.
- [110] Chen Sun et al. DSENT: Design space exploration of networks tool. Available at: https://sites.google.com/site/mitdsent/.
- [111] Hang-Sheng Wang, Xinping Zhu, Li-Shiuan Peh, and S. Malik. ORION: A power-performance simulator for interconnection networks. In *Microarchitecture*, 2002. (MICRO-35). Proceedings. 35th Annual IEEE/ACM International Symposium on, pages 294–305, 2002.
- [112] Sheng Li, Jung Ho Ahn, R.D. Strong, J.B. Brockman, D.M. Tullsen, and N.P. Jouppi. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In *Microarchitecture*, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on, pages 469–480, Dec 2009.
- [113] Chen Sun, Chia-Hsin Owen Chen, George Kurian, Lan Wei, Jason Miller, Anant Agarwal, Li-Shiuan Peh, and Vladimir Stojanovic. DSENT - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling. In *Proceedings of the 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip*, NOCS '12, pages 201–210, Washington, DC, USA, 2012. IEEE Computer Society.
- [114] A.B. Kahng, Bin Li, Li-Shiuan Peh, and K. Samadi. ORION 2.0: A power-area simulator for interconnection networks. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 20(1):191–196, Jan 2012.
- [115] A.B. Kahng, Bill Lin, and S. Nath. Explicit modeling of control and data for improved NoC router estimation. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pages 392–397, June 2012.
- [116] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In *Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201)*, pages 83–94, June 2000.
- [117] H. Sajjadi-Kia and C. Ababei. A new reliability evaluation methodology with application to lifetime oriented circuit design. *IEEE Transactions on Device and Materials Reliability*, 13(1):192–202, March 2013.
- [118] X. Li, J. Qin, and J. B. Bernstein. Compact modeling of MOSFET wearout mechanisms for circuit-reliability simulation. *IEEE Transactions on Device and Materials Reliability*, 8(1):98–121, March 2008.
- [119] JEDEC Solid State Technology Association. Failure mechanisms and models for semiconductor devices, JEP122G. 2011.
- [120] Luca Carloni and Michele Petracca. The benefits of using clock gating in the design of networks-on-chip. Columbia University Academic Commons. Available at: http://hdl.handle.net/10022/ac:p:10683, 2011.
- [121] Trong-Yen Lee and Chi-Han Huang. Design of smart power-saving architecture for network on chip. VLSI Design, 2014.
- [122] C. A. Zeferino and J. V. Bruch. Evaluation of architectural alternatives to reduce power consumption in a network-on-chip. In 2nd Workshop on Circuits and Systems Design (WCAS 2012), 2012.
- [123] George Kornaros and Dionisios Pnevmatikatos. Dynamic power and thermal management of NoC-based heterogeneous MPSoCs. ACM Trans. Reconfigurable Technol. Syst., 7(1):1:1–1:26, February 2014.
- [124] R. Bondade and D. Ma. Self-reconfigurable channel data buffering scheme and circuit design for adaptive flow control in power-efficient network-on-chips. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 57(11):2890–2903, Nov 2010.
- [125] D. Zoni, S. Corbetta, and W. Fornaciari. Thermal/performance trade-off in network-on-chip architectures. In 2012 International Symposium on System on Chip (SoC), pages 1–8, Oct 2012.
- [126] E. Talpes and D. Marculescu. Toward a multiple clock/voltage island design style for power-aware processors. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 13(5):591–603, May 2005.
- [127] Y.S.-C. Huang, K.C.-K. Chou, and Chung-Ta King. Application-driven end-to-end traffic predictions for low power NoC design. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 21(2):229–238, Feb 2013.
- [128] D. Atienza and E. Martinez. Inducing thermal-awareness in multicore systems using networks-on-chip. In 2009 IEEE Computer Society Annual Symposium on VLSI, pages 187–192, May 2009.
- [129] W. Jang, D. Ding, and D. Z. Pan. Voltage and frequency island optimizations for manycore/networks-on-chip designs. In *The 2010 International Conference on Green Circuits and Systems*, pages 217–220, June 2010.
- [130] Feng Wang, Xiantuo Tang, Qinglin Wang, Zuocheng Xing, and Hengzhu Liu. Flexible virtual channel power-gating for high-throughput and low-power network-on-chip. In Digital System Design (DSD), 2014 17th Euromicro Conference on, pages 504–511, Aug 2014.
- [131] K. Latif, T. Seceleanu, and H. Tenhunen. Power and area efficient design of network-on-chip router through utilization of idle buffers. In Engineering of Computer Based Systems (ECBS), 2010 17th IEEE International Conference and Workshops on, pages 131–138, March 2010.
- [132] M.R. Casu and P. Giaccone. Rate-based vs delay-based control for DVFS in NoC. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pages 1096–1101, March 2015.
- [133] Ashish Sharma, Manoj Singh Gaur, Lava Bhargava, Vijay Laxmi, and Mark Zwolinski. Hiper-NIRGAM: A tool chain based framework for modeling thermal-aware reliability estimation in 2D mesh NoCs. In DATE 2015 University Booth, 2015.
- [134] Ashish Sharma, Prachi Upadhyay, Ruby Ansar, Vijay Laxmi, Lava Bhargava, Manoj Singh Gaur, and Mark Zwolinski. A framework for thermal aware reliability estimation in 2D NoC. In VLSI Design and Test (VDAT), 2015 19th International Symposium on, pages 1–6. IEEE, 2015.
- [135] Ruby Ansar, Prachi Upadhyay, Manish Singhal, Ashish Sharma, and Manoj Singh Gaur. Characterizing impacts of multi-Vt routers on power and reliability of network-on-chip. In Contemporary Computing (IC3), 2015 Eighth International Conference on, pages 476–480. IEEE, 2015.
- [136] M.A. Alam, H. Kufluoglu, D. Varghese, and S. Mahapatra. A comprehensive model for PMOS NBTI degradation: Recent progress. *Microelectronics Reliability*, 47(6):853–862, 2007. Modelling the Negative Bias Temperature Instability.
- [137] Yinghai Lu, Li Shang, Hai Zhou, Hengliang Zhu, Fan Yang, and Xuan Zeng. Statistical reliability analysis under process variation and aging effects. In *Design Automation Conference*, 2009. DAC '09. 46th ACM/IEEE, pages 514–519, July 2009.
- [138] Synopsys tools. Available at: https://www.synopsys.com/.
- [139] B. Tudor, J. Wang, C. Sun, Z. Chen, Z. Liao, R. Tan, W. Liu, and F. Lee. MOSRA: An efficient and versatile MOS aging modeling and reliability analysis solution for 45 nm and below. In 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology, pages 1645–1647, Nov 2010.
- [140] HSPICE. Available at: https://www.synopsys.com/verification/ams-verification/hspice.html.
- [141] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 72–81, Oct 2008.
- [142] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7, August 2011.