5 Combining Reliability, Energy Consumption, and Performance





A. G. Erichsen et al.



[Figure 4: bar charts of Energy, Area, and Exec Cycles (vertical axis from 0.00 to 1.50) for the Homogeneous and ISA-DTMR processor configurations 8-8-8-8, 8-8-8-4, 8-8-4-4, 8-4-4-4, 8-4-4-2, 8-4-2-2, 8-2-2-2, 4-4-4-4, and 2-2-2-2. Panels: (a) Scenario 1, (b) Scenario 2.]

Fig. 4. Energy, area, and performance for each scenario normalized to the 8-8-8-8 processor



and area, providing significant energy and area reductions at a low performance cost and, in some cases, even improving the MWTF together with those savings. In this subsection, we summarize the benefits of applying the ISA-DTMR technique to improve fault tolerance in multicore systems.

Protecting applications with TMR: ISA-DTMR provides a fault tolerance mechanism that masks all single faults in the processor for critical applications, and it can still reduce area and energy consumption by executing the replicas on heterogeneous cores.

Protecting applications with DMR: with the proposed approach, it is also possible to improve the MWTF of DMR-protected applications for almost all processor configurations, with the exception of the POCSAG application on the 8-4-2-2 and 8-2-2-2 processors, as discussed in Sect. 4.2. In the most significant case, the MWTF is improved by more than 50% (x264 on the 8-2-2-2). This means that such applications are able to perform more work before an error is detected and the application is re-executed, resulting in fewer re-executions. Therefore, both the total energy consumption and the performance overhead of the re-executions are reduced.

Executing unprotected applications: even though these applications are not critical for the system, with the proposed technique they are able to perform more work until a failure (up to 35% more) than on the homogeneous 8-issue quad-core processor, which also means that they fail less often. Reliability is thus improved just by choosing a core configuration that best fits the application behavior, instead of running all applications on homogeneous cores. In addition, energy consumption can be reduced by choosing a smaller core without significantly affecting performance for some applications, as in the big.LITTLE approach.



ISA-DTMR: Selective Protection in Configurable Heterogeneous Multicores



5 Conclusion and Future Work



In this work, ISA-DTMR is proposed, exploiting the fact that a number of different microarchitectures that implement the same ISA are available. The results show that this technique can improve fault tolerance while consuming fewer resources than the baseline in both scenarios, with the exception of the 8-4-2-2 and 8-2-2-2 designs. The 8-4-4-2 is the most balanced configuration, improving fault tolerance with a reduction of more than 28% in energy consumption and half the area occupation, compared to the baseline.

This work also shows that it is possible to improve application reliability by correctly choosing each core's issue width. As future work, this technique will be applied to other processor architectures, and more scenarios with different applications will be assessed. In addition, a dynamic scheduler and a dynamic application criticality assessment mechanism will be implemented, and a software-based checker will be assessed and compared to the hardware approach.

Acknowledgement. This work was supported in part by CNPq, FAPERGS, and

CAPES.






Analyzing AXI Streaming Interface for Hardware Acceleration in AP-SoC Under Soft Errors

Fabio Benevenuti(B) and Fernanda Lima Kastensmidt

Instituto de Informática – PGMICRO, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
{fbenevenuti,fglima}@inf.ufrgs.br
http://www.ufrgs.br/pgmicro



Abstract. The focus of this work lies on Xilinx's SRAM-based FPGAs and All Programmable System-on-Chip (AP-SoC) devices, which combine FPGAs and ARM processors and have the AMBA Advanced eXtensible Interface (AXI) as one of their main interfaces. The use of commercial off-the-shelf SRAM-based FPGA devices integrating multi-core processors and custom IP blocks through general-purpose interfaces can help in coping with performance requirements and time-to-market constraints. On the other hand, when considering their application in critical cyber-physical systems, there are reliability issues that must be dealt with. SRAM-based FPGAs are susceptible to soft errors that cause persistent changes in the configuration memory, which accumulate until reconfiguration is performed. Mitigation techniques inside the user-designed IP blocks allow this reconfiguration to be delayed, but the choice of interface between those blocks affects the effectiveness of the mitigation. This work evaluates an IP block generated by Xilinx's High-Level Synthesis (HLS) tools and designed to use the available AXI Streaming interface. The results obtained from fault injection allowed the reliability of the IP block core and of the IP block AXI interface to be evaluated separately, showing that, in this case, the IP block interface can undermine the effort placed in hardening the IP core.

Keywords: Streaming interface · FPGA · Reliability · Fault injection · High-level synthesis

1 Introduction



As remarked in [1], state-of-the-art commercial off-the-shelf (COTS) SRAM-based FPGA devices have been used to leverage high-performance circuits in complex systems and heterogeneous hardware designs. A special case of those COTS devices is Xilinx's All Programmable System-on-Chip (AP-SoC), which combines a general-purpose processing system (PS) with custom programmable logic (PL).

© Springer International Publishing AG, part of Springer Nature 2018

N. Voros et al. (Eds.): ARC 2018, LNCS 10824, pp. 243–254, 2018.

https://doi.org/10.1007/978-3-319-78890-6_20






F. Benevenuti and F. L. Kastensmidt



Being a COTS device, however, it brings some limitations on the choice of communication interfaces between the embedded processor and the programmable logic. In this sense, this work analyzes the case of Xilinx 7 Series FPGAs and AP-SoCs, such as the Xilinx Zynq-7000, which is based on a dual-core ARM Cortex-A9 and has the ARM Advanced Microcontroller Bus Architecture (AMBA) Advanced eXtensible Interface (AXI) as one of its major communication interfaces.

Further, in a scenario of ever-shortening time-to-market, we consider new development methodologies such as high-level synthesis (HLS) and the use of general-purpose COTS reusable blocks for AXI bus interconnects and direct memory access (DMA). These blocks are provided by the AP-SoC manufacturer to be implemented on the programmable logic, supporting the interconnection between the user-designed custom IP blocks and the processing system.

The manufacturer provides a series of soft IP blocks implementing different aspects of the AXI interface for use in the FPGA, including several types of interconnects, arbiters, crossbars, protocol converters, stream routers, and clock rate converters. Some of these functions are also available as hard IP blocks in the processing system of parts such as the Zynq-7000.

Reliability becomes a concern when those high-performance applications on the AP-SoC and its SRAM-based FPGA are also critical systems, as discussed in the following sections.

The main novelty of this work is the decomposition of the reliability of an HLS-generated module into its processing core and its interface components, allowing a deeper analysis of its reliability behavior. The case study is a serial stream system composed of a processor and an accelerator module. The reliability behavior is analyzed according to the number of accumulated upsets, obtained by bitstream fault injection and modeled by a Weibull distribution.

Results show the impact of architectural decisions about interface types and give a better assessment of the reliability improvements due to mitigation techniques implemented at the C language level, which were previously partially masked by the reliability of the interface component.



2 Reliability in SRAM-Based FPGA

2.1 Soft Errors



High-performance computing, aerospace, and critical cyber-physical systems (CPS) in general must consider the effect that glitches, electromagnetic interference, and radiation can have on electronic components.

FPGAs are reconfigurable systems whose functionality is programmed by the user in a configuration memory. In SRAM-based FPGAs, specifically, this configuration memory is implemented as millions of SRAM cells. In the case of combinational logic circuits, for instance, truth tables can be stored in look-up tables (LUT) as part of the configuration memory of the FPGA. Several other resources of the FPGA also make use of this configuration memory, including all the interconnection point switches that route the signals throughout the FPGA fabric.



AXI Streaming Interface in AP-SoC Under Soft Errors






Single event upsets (SEU), also called bit-flips, may change the content of the SRAM configuration memory, effectively changing the user-defined combinational logic or any other relevant configuration and causing errors, failures, and occasionally severe losses. Single event upsets and single event transients (SET) also affect storage elements other than the FPGA's configuration memory, such as the flip-flops (FF) used to implement sequential logic, but in SRAM-based FPGAs the susceptibility of the configuration memory, due to its huge size, is a continuous research subject.

Also, when those upsets occur in user data flip-flops, they may have a temporary effect, limited by how often those registers are captured or updated and by how far the erroneous data propagates along the processing pipeline. On the other hand, when those upsets occur in configuration memory cells, the effect is persistent and cumulative and, unless some type of reprogramming is applied [2], it will eventually lead to a functional failure.

However, a single upset alone does not necessarily create a soft error [3] since, for each specific application of the SRAM-based FPGA, only a relatively small portion of the millions of configuration memory bits conveys useful user-programmed configuration. In general, it is expected that only 5% to 10% of the upsets will cause a functional error [3]; the bits involved are characterized as critical configuration bits [2].

Aside from the critical bits, it is the persistence and accumulation of bit-flips over time that leads to errors and failures. Thus, given a specific application of an SRAM-based FPGA, it is useful to determine the reliability of that application, in terms of both critical bit-flips that lead to immediate failure and accumulated bit-flips, as well as which modules are more sensitive, which mitigation techniques can be applied, and how long the FPGA reprogramming can be delayed.

2.2 Fault Injection



We use fault injection on the SRAM-based FPGA configuration memory to evaluate the overall reliability and also the relative reliability of different modules or IP blocks implemented on the programmable logic.

Compared to radiation exposure experiments, this approach has a lower cost and allows a finer analysis of how each component contributes to the reliability of the whole system. Moreover, when fault injection is applied to the whole system, its results can be related to those of radiation exposure experiments, as seen in Velazco et al. [4] and Tambara et al. [1].

The fault injection engine for Xilinx SRAM-based FPGAs is described by Tonfat et al. [5] and consists of an instantiation of the Xilinx Internal Configuration Access Port (ICAPE2) hard IP and the instrumentation of the FPGA with additional design modules of the fault injection platform.

On Xilinx 7 Series FPGAs and AP-SoCs, the configuration memory is segmented into frames of 3232 bits, or 101 words of 32 bits, grouped in rows, columns, and subcolumns. The frame of 3232 bits is the minimum unit that can be read or written through the ICAPE2 configuration port.






The same configuration frame may convey both configuration for DSP resources, combinational logic look-up tables, and other logic resources, and configuration for signal routing through interconnect point switches and multiplexers.

In this setup, injecting a fault consists of reading a frame, inverting a single bit at a given position, and writing the frame back into the configuration memory. The frame address and bit position where a fault will be injected are defined at the fault injection campaign control station, which runs a script according to the test plan.
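The read, invert, and write-back sequence can be illustrated with a minimal C sketch. It only models the configuration memory as an array in host memory: `config_mem`, `read_frame`, `write_frame`, `inject_fault`, and `NUM_FRAMES` are hypothetical names introduced for illustration, and the bit ordering inside a word is an assumption. Only the frame size of 101 words of 32 bits comes from the text; the real engine accesses frames through ICAPE2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_WORDS 101u                 /* words per frame (from the text) */
#define FRAME_BITS  (FRAME_WORDS * 32u)  /* 3232 bits per frame */
#define NUM_FRAMES  4u                   /* tiny model size, an assumption */

/* In-memory stand-in for the configuration memory (hypothetical model). */
static uint32_t config_mem[NUM_FRAMES][FRAME_WORDS];

/* Hypothetical helper: models reading one whole frame through the port. */
static void read_frame(uint32_t frame, uint32_t buf[FRAME_WORDS]) {
    assert(frame < NUM_FRAMES);
    memcpy(buf, config_mem[frame], sizeof config_mem[frame]);
}

/* Hypothetical helper: models writing one whole frame through the port. */
static void write_frame(uint32_t frame, const uint32_t buf[FRAME_WORDS]) {
    assert(frame < NUM_FRAMES);
    memcpy(config_mem[frame], buf, sizeof config_mem[frame]);
}

/* Inject one fault: read the frame, invert one bit, write the frame back.
   Bit 0 is taken as the least significant bit of word 0 (an assumption). */
void inject_fault(uint32_t frame, uint32_t bit) {
    uint32_t buf[FRAME_WORDS];
    assert(bit < FRAME_BITS);
    read_frame(frame, buf);
    buf[bit / 32u] ^= (uint32_t)1u << (bit % 32u);
    write_frame(frame, buf);
}
```

Because the frame is the minimum access unit, the sketch always moves a full 101-word buffer even though a single bit changes. Calling `inject_fault` a second time with the same arguments restores the original value, which is how an injected fault can be cleaned.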

When searching for critical bits, the script scans the whole region of interest sequentially and exhaustively: it injects one fault, verifies that the design functions properly, cleans the injected fault, and repeats all steps at the next position.
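The exhaustive scan can be sketched in C as follows. This is a toy model under stated assumptions, not the script used in the experiments: the region size, the `region` array, and the `design_ok` check (which simply declares bits 5 and 42 critical) are hypothetical stand-ins for the real configuration memory and the real functional verification of the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_BITS 64u   /* tiny region of interest, an assumption */

static uint8_t region[REGION_BITS / 8u];  /* hypothetical configuration bits */

/* Invert one bit of the modelled region. */
static void flip_bit(size_t bit) {
    assert(bit < REGION_BITS);
    region[bit / 8u] ^= (uint8_t)(1u << (bit % 8u));
}

/* Toy design check (hypothetical): the design works while bits 5 and 42 are 0. */
static bool design_ok(void) {
    return ((region[5 / 8] >> (5 % 8)) & 1u) == 0u &&
           ((region[42 / 8] >> (42 % 8)) & 1u) == 0u;
}

/* Scan every bit in order. Stores up to max_out critical positions in
   critical[] and returns how many were stored. */
size_t scan_critical_bits(size_t critical[], size_t max_out) {
    size_t found = 0;
    for (size_t bit = 0; bit < REGION_BITS; ++bit) {
        flip_bit(bit);            /* inject one fault */
        bool ok = design_ok();    /* verify the design */
        flip_bit(bit);            /* clean the injected fault */
        if (!ok && found < max_out) {
            critical[found++] = bit;
        }
    }
    return found;
}
```

Since each fault is cleaned before the next one is injected, the scan attributes every observed malfunction to exactly one bit position.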

When evaluating the design reliability, the script injects faults randomly over

the region of interest, accumulating instead of cleaning each fault, and repeats

the process until a design malfunction is detected. All the accumulated faults

are then cleaned and a new sequence is started. This is repeated until a sufficient

number of failures is collected.
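The accumulation procedure can be sketched in C along the same lines. Everything specific in this sketch is an assumption made for illustration: the region size, the rule that every 16th bit is critical, the xorshift32 generator used in place of the control station's randomization, and the names `faults_until_failure` and `run_campaign`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REGION_BITS    1024u  /* size of the toy region, an assumption */
#define CRITICAL_EVERY 16u    /* toy rule: every 16th bit is critical */

static uint8_t flipped[REGION_BITS];  /* 1 where an accumulated fault is present */
static uint32_t rng_state = 2463534242u;

/* xorshift32 generator, so the sketch does not depend on the platform rand(). */
static uint32_t next_random(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Toy design check (hypothetical): fails if any critical bit is flipped. */
static bool design_ok(void) {
    for (uint32_t b = 0; b < REGION_BITS; b += CRITICAL_EVERY) {
        if (flipped[b]) return false;
    }
    return true;
}

/* One sequence: accumulate random faults until a malfunction, then clean up.
   Returns the number of faults injected, including the one that failed. */
uint32_t faults_until_failure(void) {
    uint32_t injected = 0;
    do {
        flipped[next_random() % REGION_BITS] ^= 1u;  /* inject, do not clean */
        ++injected;
    } while (design_ok());  /* one design check per injected fault */
    memset(flipped, 0, sizeof flipped);              /* clean all faults */
    return injected;
}

/* Repeat sequences until the requested number of failures is collected. */
void run_campaign(uint32_t samples[], uint32_t failures) {
    for (uint32_t i = 0; i < failures; ++i) {
        samples[i] = faults_until_failure();
    }
}
```

Each entry of `samples` is the number of accumulated bit-flips that one sequence survived, which is the kind of data a reliability curve over accumulated upsets would be fitted to.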

As the upsets in configuration memory are, by nature, persistent, faults need not be injected in every possible state or clock cycle of the whole processing cycle of the design under evaluation. Since persistent faults will still be present in the next and all following processing cycles, it suffices that the design under evaluation executes a single complete processing cycle between injected faults.

The total number of accumulated faults, or bit-flips, is then equal to the number of design processing cycles. The duration of each design processing cycle, in clock cycles or any other time unit, can be related to the exposure time and vulnerability factor, while the total number of accumulated faults, together with the number of errors detected, can be related to the radiation fluence and the known hardware device cross section. These relationships are further discussed by Velazco et al. [4].



3 Hardware Accelerator Interface and Analysis Methodology

3.1 Benchmark Application



This work uses as its benchmark application an out-of-core accelerator for matrix multiplication generated by high-level synthesis (HLS) from C language source code, as presented by dos Santos et al. [6].

Matrix multiplication is a good benchmark because it is based on the multiply-accumulate (MAC) pattern, which is present in several current applications of digital signal processing (DSP), pattern recognition, machine learning, and neural networks. It also mixes the programmable logic and the mathematical (DSP) resources available in the FPGA, and it is a real opportunity to contribute processing power and acceleration to the processor system in reconfigurable computing scenarios.






This matrix multiplication out-of-core accelerator was originally conceived using an AXI-S data streaming interface. This interface allows several user-designed blocks to be chained in a processing pipeline. Since this architecture leads to a serial subsystem whose reliability depends on the product of the reliabilities of its component blocks, it motivates a deeper study of the reliability of each component and of how it compromises the reliability of the whole subsystem.
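The product rule for a serial subsystem can be made concrete with a short C sketch. It assumes that the blocks fail independently, and the function name `serial_reliability` and the numeric values used below are illustrative only, not results from this work.

```c
#include <assert.h>
#include <stddef.h>

/* Reliability of a serial chain: every block must work, so the individual
   reliabilities multiply (independent failures are assumed). */
double serial_reliability(const double r[], size_t n) {
    double total = 1.0;
    for (size_t i = 0; i < n; ++i) {
        assert(r[i] >= 0.0 && r[i] <= 1.0);
        total *= r[i];
    }
    return total;
}
```

For instance, a core with reliability 0.99 chained with an interface of reliability 0.90 yields 0.99 × 0.90 = 0.891, which is lower than the weakest block. This is why a fragile interface can hide the gains obtained by hardening the core.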

3.2 Interface Choice at High Level Synthesis Tool



Xilinx's HLS tool can generate different interface adapters for the hardware IP blocks generated from C language code. Among the interface styles are the AXI bus and AXI-S streaming. The interface to be generated is chosen through specific #pragma directives provided by the HLS tool.
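As an illustration of how such a directive appears in the source, the sketch below marks both arguments of a small function as streaming ports using the `INTERFACE` pragma with the `axis` mode of the Xilinx HLS tools. The function `double_stream` and the length `N` are hypothetical, and the exact directives used for the accelerator in this work are not listed in the text, so this is only an assumed minimal form.

```c
#define N 8   /* stream length, an arbitrary choice for the sketch */

/* Hypothetical top-level function: doubles each sample of an input stream.
   An ordinary C compiler ignores the HLS pragmas (possibly with a warning),
   so the function can also be tested on a host machine. */
void double_stream(const int in[N], int out[N]) {
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    for (int i = 0; i < N; ++i) {
        out[i] = 2 * in[i];
    }
}
```

Keeping the interface choice in pragmas means that the same C function body can be synthesized with a different interface style by changing only the directive.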

The first interface to be evaluated in this work is the AXI-S streaming interface with data transfer assisted by an AXI direct memory access (DMA) IP, also provided by Xilinx, as seen in dos Santos et al. [6] and simplified in Fig. 1. The only difference between the unhardened and hardened data transfers is that the unhardened design has only one AXI DMA block, while the hardened design has three.



Fig. 1. Data transfer interfaces (control signals omitted for clarity).



3.3 Design Hardening Approaches



The benchmark application adopted in this work consists of an out-of-core matrix multiplication accelerator written in C language, from which an IP block is generated using Xilinx's high-level synthesis (HLS) tool.

As this IP block is to be implemented in an SRAM-based FPGA, it becomes more susceptible to radiation-induced upsets, as described previously.

With the use of mitigation techniques, one can postpone the correction of those persistent failures until the moment when it is safe or more economical to correct them. Carmichael [7] proposes the use of triple modular redundancy as such a mitigation technique.

In this work, we compare the unmitigated design, with data transfer through a single input and a single output channel, with mitigated designs, for instance one with the matrix multiplication core mitigated with TMR at the C language level and with redundant data transfer through triple input and triple output AXI-S channels.

This approach presents both advantages and challenges. Since the high-level specification is in C language, an advantage is the possibility of applying to the design several classes of mitigation techniques originally conceived for software. The study and reuse of such techniques is one of the motivations for hardening at the high-level synthesis stage. Meanwhile, a great challenge is how to add mitigations at the C language level that will be preserved by high-level synthesis.

Many mitigations based on error detection codes, redundancy, and algorithm-specific properties are prone to being removed from the final implemented circuit by the high-level synthesis optimizations and by further HDL synthesis optimizations.

3.4 TMR at High Level Synthesis Tool



The use of TMR in matrix multiplication is discussed in more detail by dos Santos et al. [6], but we can summarize that in its implementation at the C language level, as seen by the HLS tool, the modular redundancy and voting are simply additional function calls, as seen in Fig. 2. The only difference between the unhardened and the TMR-mitigated matrix multiplication core is the triplication of the multiply-accumulate instruction and the presence of the voter before the accumulated value is transferred to the result matrix.



void mxmtmr3cgpcore3ch() {
#pragma HLS INLINE off
  /* SIZE stands in for the loop bounds, which were lost in extraction. */
  for (int i = 0; i < SIZE; i++) {
    for (int j = 0; j < SIZE; j++) {
      accum1 = accum2 = accum3 = 0;
      for (int k = 0; k < SIZE; k++) {
        /* triplicated multiply-accumulate, one per replica */
        accum1 += mat_a1[i][k] * mat_b1[k][j];
        accum2 += mat_a2[i][k] * mat_b2[k][j];
        accum3 += mat_a3[i][k] * mat_b3[k][j];
      }
      /* one voter per copy of the result matrix */
      mat_c1[i][j] = voter1(accum1, accum2, accum3);
      mat_c2[i][j] = voter2(accum1, accum2, accum3);
      mat_c3[i][j] = voter3(accum1, accum2, accum3);
    }
  }
}



Fig. 2. Pseudo code for TMR matrix multiplication.
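To make the triplication and voting concrete, the following self-contained C sketch reproduces the same structure in a form that compiles on a host machine. Several details are assumptions not taken from the source: the matrix dimension `SIZE`, the storage of the three replicas in one array, the bitwise majority function used as voter (the text does not show the voter body), and the `fault_mask` argument, which is a testing aid added here to emulate an upset in one replica.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE 3   /* matrix dimension, an assumption for the sketch */

/* Bitwise majority voter: each output bit takes the value held by at
   least two of the three inputs (an assumed voter implementation). */
static int32_t voter(int32_t a, int32_t b, int32_t c) {
    return (a & b) | (a & c) | (b & c);
}

/* TMR matrix multiplication on three copies of the inputs.
   fault_mask is XORed into the first accumulator to emulate a fault;
   it is a testing aid for this sketch, not part of the original design. */
void mxm_tmr(int32_t a[3][SIZE][SIZE], int32_t b[3][SIZE][SIZE],
             int32_t c[3][SIZE][SIZE], int32_t fault_mask) {
    for (int i = 0; i < SIZE; ++i) {
        for (int j = 0; j < SIZE; ++j) {
            int32_t acc[3] = {0, 0, 0};
            for (int k = 0; k < SIZE; ++k) {
                for (int r = 0; r < 3; ++r) {
                    acc[r] += a[r][i][k] * b[r][k][j];  /* one MAC per replica */
                }
            }
            acc[0] ^= fault_mask;                       /* emulated upset */
            for (int r = 0; r < 3; ++r) {               /* one vote per output copy */
                c[r][i][j] = voter(acc[0], acc[1], acc[2]);
            }
        }
    }
}
```

With a nonzero `fault_mask`, the corrupted accumulator is outvoted by the other two, so all three output copies still hold the correct product. Note that an optimizing compiler or synthesis tool may recognize the three replicas as identical and merge them, which is the simplification problem that tool directives are meant to prevent.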



Special care must be taken in the C language coding, including the use of specific #pragma directives provided by the HLS tool, to avoid any undesired code simplification that would eliminate the modular redundancy. Care is also required to avoid undesired simplification and merging at the hardware description language (HDL) synthesis level, which can be achieved with proper directives placed as design constraints.

During the fault injection stage it was found that the design reliability is highly sensitive to the design area; thus, the simpler the better. So, for this work, the C language code for the matrix multiplication core, seen in Fig. 2, is even simpler than that presented by dos Santos et al. [6].

3.5 IP Block Decomposition for Fault Injection



At the current stage of this work, the accelerator IP block depicted in Fig. 1 is where the mitigation technique is applied and is also the region of interest for fault injection. This block is exactly the IP block generated by the HLS tool, which is further detailed in Fig. 3.



Fig. 3. Simplified internal view of the generated HLS IP block.



To evaluate separately the reliability of the matrix multiplication accelerator core, which is mitigated by TMR, and that of the other parts of the IP block, the HLS tool was directed to keep the function body hierarchy and to generate separate HDL specifications for each C language function of interest. This was achieved with the #pragma HLS INLINE off directive, as seen in Fig. 2.

Since each function of interest is a distinct HDL entity, when the IP block is instantiated in the design those HDL entities can be placed arbitrarily on the SRAM-based FPGA floorplan using HDL synthesis constraints.

3.6 Floorplanning and Experimental Procedure



The unhardened and the TMR-mitigated IP blocks generated with the HLS tool were implemented in a Xilinx 7 Series Artix-7 SRAM-based FPGA. This device is fabricated in a 28 nm technology, and reliability metrics, such as the static cross section per bit for neutron upsets, are provided by its manufacturer [3].

Three physical blocks were defined on the device floorplan for placement of the design components. The first is the region of interest for fault injection, shown on the left side of the floorplans in Fig. 4. The second, shown on the right side of the floorplans in Fig. 4, contains the other parts of the IP block, which are kept safely away from fault injection. The third and last physical block, not shown in Fig. 4, contains all the other components of the design not related to the experiment.

While moving the modules on the device physical floorplan leads to a different set of critical bits, it has no significant impact on the shape or scale of the reliability curve. On the other hand, using the same physical block for fault injection has the advantage of ensuring that all modules are analyzed under exactly the same fault intensity.

For fault injection, first only the matrix multiplication accelerator core, as indicated in Fig. 3 and described in Fig. 2, is placed inside the fault injection physical block (Fig. 4a) while the other components are kept safe, and a fault injection campaign is executed. Then all the other components of the IP block, such as the AXI interface adapters and local memory, are placed in the fault injection physical block while the matrix multiplication accelerator core is kept safe (Fig. 4b). Finally, all the IP block components are placed in the fault injection physical block (Fig. 4c).



Fig. 4. Fault injection targets



Fault injection was executed with the help of the fault injection engine presented by Tonfat et al. [5] accumulating faults until a functional failure was

observed. Each fault injected consisted of a single bit flip at the configuration

memory of the SRAM-based FPGA. The fault injection control station randomized the locus of the fault uniformly over the configuration memory address


