6 Performance/Energy Relative to GPU
B. Liebig et al.
Table 6. Single FPGA kernel vs GK210 GPU
for one cell
[cells per second]
A on FPGA
A on GPU 322.3
B on FPGA
B on GPU
C on FPGA
C on GPU
D on FPGA
D on GPU
E on FPGA
E on GPU
expected for a throughput architecture such as a GPU). However, the FPGA is
more energy efficient (in terms of Joules per cell).
Conclusion and Future Work
We presented a new approach for hardware synthesis of larger CellML models
that offers superior latency and energy efficiency compared to CPU and GPU.
Furthermore, our specialized HLS tool significantly exceeds the quality of results
of a state-of-the-art industrial HLS system. The performance and size of the
accelerators created by our approach can be flexibly scaled, achieving significant
speed-ups in most cases even when dedicating just a quarter of a mid-size FPGA
to the accelerator circuit.
To extrapolate the power of our approach beyond the Virtex-7-class devices,
which were introduced in 2010, to current-generation FPGAs, we performed
an initial experiment compiling and mapping model A to a modern XCVU13P-3
UltraScale+ FPGA. Using a total of 16 FP units, we achieved an fmax of 316
MHz, which would yield a speed-up of 8.3x relative to the desktop-class CPU
in single-accelerator performance. As each accelerator requires only 2.9% of that
FPGA's area, an additional speed-up could be achieved by tiling accelerators; e.g.,
8 accelerators implemented in parallel still reach 282 MHz. This huge potential
makes further research on reconfigurable computing for cell simulation highly
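As a rough back-of-the-envelope check of the tiling argument, the figures quoted above can be combined under an idealized assumption: throughput scales linearly with clock frequency and with the number of independent accelerators, ignoring memory bandwidth, host I/O, and power limits. This is an illustrative extrapolation, not a measured result.

```python
# Figures quoted in the text above.
SINGLE_SPEEDUP = 8.3    # one accelerator at 316 MHz vs. the desktop-class CPU
F_SINGLE_MHZ = 316.0    # fmax of a single accelerator
F_TILED_MHZ = 282.0     # fmax with 8 accelerators in parallel
N_TILED = 8             # number of tiled accelerators
AREA_PER_ACCEL = 0.029  # 2.9% of the XCVU13P per accelerator

# Assumption: throughput scales linearly with clock frequency and with the
# number of independent accelerators (an optimistic upper bound).
per_accel_speedup = SINGLE_SPEEDUP * F_TILED_MHZ / F_SINGLE_MHZ
aggregate_speedup = per_accel_speedup * N_TILED
area_fraction = AREA_PER_ACCEL * N_TILED

print(f"per accelerator: {per_accel_speedup:.2f}x")  # derated for the lower clock
print(f"aggregate:       {aggregate_speedup:.1f}x")
print(f"area used:       {area_fraction:.1%}")
```

Under these assumptions, eight tiled accelerators would reach roughly 59x while occupying about 23% of the device; whether that is attainable in practice depends on how fast the host can supply model data.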
Improved HLS for Complex CellML Models
An Intrusive Dynamic Reconfigurable
Cycle-Accurate Debugging System
for Embedded Processors
Habib ul Hasan Khan (✉), Ahmed Kamal, and Diana Goehringer
Technische Universitaet Dresden (TUD), Dresden, Germany
Abstract. This paper presents a debugging system for embedded processors that
uses dynamic partial reconfiguration and is based on a device start and stop (DSAS)
approach. Using this approach, a cycle-accurate debugging system can be
dynamically configured into any embedded-processor-based design at runtime.
The debugging system offers lossless debugging because the design is stopped
during data transfer, so that no data is lost. The data can be transferred over
any available communication interface, such as Ethernet or UART, and can
be viewed with open-source waveform viewers. By using dynamic partial
reconfiguration, the technique enables debugging without re-synthesizing the design.
Keywords: FPGA · Debugging · Simulation · Device start and stop ·
DSAS · Device under test · Dynamic partial reconfiguration
The debugging process of current embedded designs is becoming cumbersome because
of increasing design complexity. Studies have shown that 35 to 45% of the total development
effort is spent on verification, and this fraction is likely to grow. Moreover, these
studies show that debugging accounts for about 60% of the total verification effort.
A major reason is the low internal visibility of FPGA-based designs.
On-chip visibility is normally enhanced by instrumenting the design
before implementation. These instruments, called integrated logic analyzers (ILAs), can be
used to save a predetermined window of a subset of signal data into memory blocks for
offline analysis. However, because of resource limitations, the signals have to be selected
before compilation. Hence, a new set of signals can only be observed after the circuit has been
recompiled, a process that can take hours. Moreover, such trace-based embedded
design solutions operate mainly on the design before place and route (PAR). These tools
instrument the original user circuit with trace buffers and their connections before
mapping, leaving fewer resources available for the original design. Inserting debug
circuitry can alter the placement and routing of the design and can therefore harm the
design in several ways: the embedded design may no longer fit in the FPGA device,
or timing issues may arise because of the debugging circuitry.
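The fixed capture window of a conventional ILA can be illustrated with a small behavioral model. The Python sketch below assumes a simple trigger-plus-pre-trigger scheme; the function name and parameters are hypothetical and do not correspond to any vendor IP. Only `depth` samples around the trigger are retained, and everything outside that window is lost.

```python
from collections import deque


def ila_capture(samples, trigger, depth, pre_trigger):
    """Behavioral model of a trigger-based capture window.

    Keeps a rolling history of `pre_trigger` samples; when `trigger(sample)`
    first returns True, the history, the trigger sample and the following
    samples are stored until `depth` samples have been captured.
    Assumes 0 <= pre_trigger < depth.
    """
    history = deque(maxlen=pre_trigger)  # rolling pre-trigger history
    window = []
    triggered = False
    for sample in samples:
        if not triggered:
            if trigger(sample):
                triggered = True
                window.extend(history)
                window.append(sample)
            else:
                history.append(sample)
        else:
            window.append(sample)
        if triggered and len(window) >= depth:
            break  # memory blocks full: capture ends, later samples are lost
    return window


# 100 cycles of activity, trigger at cycle 50, 8-entry buffer, 3 pre-trigger samples
print(ila_capture(range(100), lambda s: s == 50, depth=8, pre_trigger=3))
```

Observing a longer window or a different signal subset would require changing the instrumentation itself, which is exactly the recompilation cost described above.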
© Springer International Publishing AG, part of Springer Nature 2018
N. Voros et al. (Eds.): ARC 2018, LNCS 10824, pp. 433–445, 2018.
H. H. Khan et al.
With the advent of Dynamic Partial Reconfiguration (DPR), the time-consuming
recompilation step can be avoided. Since reconfiguring an embedded design is
much faster than recompiling it (tens of milliseconds versus minutes to hours),
DPR can considerably speed up the debug cycle.
This paper presents a DPR-based debugging system for embedded processors using
a device start and stop (DSAS) approach. In this methodology, the debugging system
resides in the dynamic partition and is configured at runtime. The debugging
system then starts and stops the Device Under Test (DUT), which forms the static design, and
saves the data to external memory without any debug-window limitation, thereby providing a
continuous, lossless stream of data. Moreover, because the debugging system is configured
for the design under test through DPR at runtime, re-compilation of the design is no longer
required. Furthermore, the debugging data stored on external devices can be viewed with
open-source waveform viewers such as GTKWave.
The rest of the paper is organized as follows. Section 2 presents related work and
provides background information. Section 3 discusses the proposed debugging methodology.
The results are discussed in Sect. 4, and the paper is concluded in Sect. 5.
2 Related Work
Commercial signal-capture tools offered by the two major FPGA vendors, Xilinx’s
ChipScope Pro and Altera’s SignalTap II, are based on embedded logic analyzer IP
that is instantiated into the user circuit during regular compilation. Synopsys offers a
device-neutral product, Identify, with similar functionality. In these tools, the trigger
conditions can be modified at runtime, but the signal sets cannot. Hence, changing the
signals under observation requires FPGA recompilation. Furthermore, instrumentation
is normally done after a failure has been observed, which requires another iteration of the
development process. Another tool, Certus by Mentor, allows pre-instrumentation of a
large set of interesting signals in the FPGA prior to compilation. During debugging, a small
subset of these signals can then be selected for observation. This may give designers more
runtime flexibility than the other tools, but a set of signals must still be preselected for
observation before any information about possible bugs is available.
A design-level scan was proposed to connect memory elements such as
flip-flops (FFs) and embedded RAMs in sequence using FPGA resources.
However, the main drawback of this technique is its high area overhead, because FPGA
resources are used to implement the scan chains in the design. Another work
proposed pre-inserting trace buffers into the design and then performing low-level
bitstream modification with incremental techniques to connect the trace
buffers to the desired signals. However, this technique still requires pre-reservation of
FPGA resources, making them unavailable to the original design. Furthermore, once
the debugging process is complete, the trace buffers need to be removed, which may
alter the placement and routing of the design.
A virtual overlay network has been introduced that multiplexes signals into
trace buffers instantiated in free FPGA resources to avoid unnecessary re-spins.
However, this technique requires spare resources, which are not always available.
A framework called Dynamic Modular Development (DMD) used the Xilinx partial
reconfiguration flow to accelerate the embedded design process by partitioning the
design modules into separate partially reconfigurable regions and automatically merging
modules that no longer need to be modified into the surrounding
static region. Consequently, rapid turnaround times can be achieved by placing
frequently modified modules in separate partially reconfigurable regions.
A bitstream modification technique has also been presented that allows
the bitstream to be modified after the PAR process. The embedded logic analyzer is
instantiated in the design prior to netlisting, and the signals of interest can then be
connected by changing the partial bitstream, reducing the time spent in the PAR process.
However, when the set of traced signals changes, re-routing needs to be performed,
which can significantly affect the design’s time to market. Furthermore, the logic analyzer
needs to be removed from the design after validation, which can alter the behavior
of the validated design. Software-like debug features such as watchpoints and
breakpoints have also been proposed to enhance debug capability in reconfigurable platforms,
but changing the watchpoints or breakpoints required recompiling the design.
Another methodology based on the reconfigurability of FPGAs was proposed
that allows a large number of internal signals to be monitored for an arbitrary number of
clock cycles using only a limited number of external pins, thereby eliminating the need for
repeated iterations of re-synthesis, placement, and routing. A multiplexer
(MUX) is instantiated in the design, with all signals that might need to be traced as its
inputs. Different signals can then be selected by reconfiguring the part of the bitstream
that sets the MUX select signals. The main disadvantage of this methodology is
that the register contents need to be shifted within one clock cycle, which greatly
reduces the maximum frequency (Fmax) of the design.
A design-for-debug infrastructure, namely a distributed reconfigurable fabric, has been
proposed whose components can be distributed widely across the FPGA to
debug a large number of signals. The reconfigurable logic is programmed to implement
various debug paradigms, such as assertions, signal capture, and what-if analysis, which
can accelerate the debugging process. However, the design still needs to be synthesized
and implemented after the debug architecture has been placed, and significant
hardware resources are required. A debugging system based on a programmable logic core
(PLC) has also been introduced; it comprises an access network that the PLC controls to
select the signals to be debugged.
In some intrusive debugging works [16, 17], the clock of the embedded design was
controlled to obtain debugging data. However, these works required breakpoints to stop the
clock, so only the system state very close to a breakpoint could be monitored. Another
intrusive debug approach stops the clock based on monitoring the
occupancy of trace buffers. However, this approach needs a large amount of scarce
FPGA resources (1 MB of RAM) and emulation hardware, and it requires external
intervention for data handling. In our previous work, we introduced a debugging
solution that required only 4 KB of RAM for saving the data, so that even small FPGAs
can be equipped with the debugging system, with an automated data-saving process.
However, that debugging system had to be instantiated before synthesis and PAR,
which in some cases takes a lot of time.
From the above discussion, it is evident that clock management in response to
trace-memory occupancy can be used to obtain a continuous, cycle-accurate stream of
debugging data. This methodology can be augmented with DPR to save the time spent
on the iterative process of synthesis and PAR.
The main contributions of the work are:
• A device start and stop approach, combined with an access network, for complete debugging of microprocessors.
• Use of DPR to attach our debugging system to the embedded design as a reconfigurable module
on demand, reducing the time spent on the iterative
PAR process of traditional debugging solutions.
3 Debugging Methodology
In this section, we describe our methodology for a dynamically partially reconfigurable
debugging system for embedded processors based on a device start and stop (DSAS)
approach. In this methodology, the device under test (DUT) is the static partition and
the debugging system is configured as the dynamically reconfigurable part. When debugging
is not required, the embedded processor keeps performing its task without the debugging
system. Once debugging is required, the debugging system is dynamically configured into
the design using a partial bitstream. It then clocks the DUT in the static partition
and logs data to the trace buffers. Once the trace buffers are full, the
debugging system stops the clock so that no data is lost, saves the data to external
memory during this pause, and then resumes clocking the DUT.
This provides a continuous, lossless stream of data with an effectively unlimited debug
window. Moreover, since the debugging system is added to the DUT
through a partial bitstream, re-implementation of the design is not
required. Furthermore, the debugging data can be sent to a terminal over a UART or
Ethernet interface and saved in a log file on an external device, where it can be
inspected with open-source waveform viewers such as GTKWave. A block diagram
of the debugging methodology is shown in Fig. 1.
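The following Python sketch is a cycle-level behavioral model of the DSAS idea, not the authors' hardware. It assumes the DUT emits one sample per enabled clock cycle and that draining one sample off-chip takes a fixed number of free-running clock cycles (`drain_cycles_per_word` is a hypothetical parameter). For contrast, `free_running_trace` models a hypothetical setup in which the DUT clock is never stopped, so samples produced while the buffer drains are dropped.

```python
def dsas_trace(dut_samples, buffer_depth, drain_cycles_per_word):
    """Device start and stop: stop the DUT clock while the trace buffer drains.

    Returns (log, dut_cycles, wall_cycles): the samples received off-chip,
    the number of cycles in which the DUT clock was enabled, and the elapsed
    cycles of the free-running debug clock.
    """
    log, buffer = [], []
    dut_cycles = wall_cycles = 0
    for sample in dut_samples:
        # DUT clock enabled: the DUT advances one cycle and is sampled.
        buffer.append(sample)
        dut_cycles += 1
        wall_cycles += 1
        if len(buffer) == buffer_depth:
            # Buffer full: DUT clock stopped while the buffer is sent off-chip.
            wall_cycles += len(buffer) * drain_cycles_per_word
            log.extend(buffer)
            buffer.clear()
    # Flush whatever is left at the end of the run.
    wall_cycles += len(buffer) * drain_cycles_per_word
    log.extend(buffer)
    return log, dut_cycles, wall_cycles


def free_running_trace(dut_samples, buffer_depth, drain_cycles_per_word):
    """Contrast case: the DUT keeps running while the buffer drains."""
    log, buffer = [], []
    busy_until = 0  # first cycle at which the buffer accepts data again
    for cycle, sample in enumerate(dut_samples):
        if cycle < busy_until:
            continue  # buffer busy draining: this sample is lost
        buffer.append(sample)
        if len(buffer) == buffer_depth:
            busy_until = cycle + 1 + buffer_depth * drain_cycles_per_word
            log.extend(buffer)
            buffer.clear()
    log.extend(buffer)
    return log


samples = list(range(1000))
log, dut_cycles, wall_cycles = dsas_trace(samples, buffer_depth=64, drain_cycles_per_word=100)
print(log == samples, dut_cycles, wall_cycles)
print(len(free_running_trace(samples, 64, 100)))
```

The DSAS model captures every sample at the cost of longer wall-clock time, while the DUT itself only sees the cycles in which its clock was enabled; the free-running variant keeps real time but, with these parameters, retains only the first buffer load.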
The main benefits of the proposed technique are: debugging of embedded processors
without loss of debugging data; re-utilization of the same FPGA resources for
other applications thanks to DPR; no need for any specific data acquisition
interface (even a UART can be used); and no need for external emulation hardware.
Furthermore, open-source waveform viewers are used, removing the
dependency on proprietary software. Hence, a cost-effective solution is presented.

Fig. 1. Debugging methodology
Device Under Test (DUT)
The debugging solution is generic and can be used for any embedded design. However,
the methodology is ideally suited for complex embedded microprocessors, in which it is
difficult to identify bugs without a continuous stream of lossless data. The
methodology has been validated using two different embedded microprocessors as
DUTs. Each embedded processor is treated as a black box, and all the interfaces
originating from the processor are monitored continuously, providing a complete
picture of the processor's activity. The details of the two processors are given below.
3.1.1. The first embedded processor is the Xilinx MicroBlaze. MicroBlaze is
debugged by connecting its interfaces to the debugging system; AXI interconnects can
also be connected. MicroBlaze is already equipped with a special debugging port
(the Trace port), which can provide debugging data, including internal status information.
The proposed debugging system can be connected to the Trace port for a continuous,
lossless stream of data. The Trace port also provides access to some internal
registers that are not available on other MicroBlaze interfaces. For debugging
MicroBlaze through the Trace port, Lauterbach provided a debugging solution, but it
required external hardware to be connected to the Trace port, which added
cost. With our proposed debugging system, any interface (not limited to the
Trace port) can be debugged without extra hardware cost.
3.1.2. To highlight the usability of our proposed debugging system, we also chose an
embedded processor based on the RISC-V architecture. The microprocessor (ORCA,
developed by VectorBlox) is an open-source core based on the RV32IM architecture.
Software can be compiled with the available RISC-V cross-compiler toolchain. The core
was chosen because of its low hardware utilization, which makes it suitable for small
FPGAs. However, the core has no debugging solution and is therefore hard to debug.
The proposed debugging system can be used for complete debugging of the core.
We used the black-box approach to debug the microprocessor. Hence, all
the exposed interfaces (including AXI interfaces) are connected to the debugging
system for monitoring. The microprocessor fetches instructions from memory,
decodes them, and executes them. Executing an instruction results either in
data being written to memory or in data being used to process the next
instruction. In the first case, the data can be acquired by the debugging system while it
is being written to memory. In the second case, the processed data will eventually be
written to memory. Since our methodology provides a continuous, lossless stream
of data, monitoring the interfaces is sufficient to debug the microprocessor. Internal
registers can also be debugged by making them visible to the
debugging system. One important feature of the processor is that it can be stalled by
writing to a Control and Status Register (CSR 0x800).
When the trace buffer is full, the microprocessor needs to be halted so that the data can
be sent to the terminal without data loss. Halting the microprocessor is necessary for
debugging it at runtime because the data communication link is not fast
enough to keep up with the rate at which debugging data is produced. As mentioned above,
the processor can be stalled by writing to a specific register; however, we chose to halt it
by managing the clock instead. To halt the processor, a
custom clock manager was developed that stops clocking the embedded
processor once the connected trace buffers are full.
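A quick rate comparison shows why halting is needed. The numbers below are illustrative assumptions, not measurements from this paper: a 100 MHz DUT clock, a UART at 115200 baud with 8N1 framing (10 line bits per data byte), and a 4 KiB trace buffer (echoing the 4 KB RAM figure quoted for the earlier solution, used here only as an example). Only the 16 simultaneously traced signals correspond to the configuration described in this section.

```python
# Illustrative assumptions (not measurements from this paper):
SIGNALS = 16               # traced signals per DUT cycle
DUT_CLOCK_HZ = 100e6       # assumed DUT clock frequency
UART_BAUD = 115_200        # assumed UART line rate
BITS_PER_UART_BYTE = 10    # 8N1 framing: start + 8 data + stop bit
BUFFER_BYTES = 4 * 1024    # assumed trace buffer size

produce_rate = SIGNALS / 8 * DUT_CLOCK_HZ        # bytes/s generated by the DUT
drain_rate = UART_BAUD / BITS_PER_UART_BYTE      # bytes/s sent over the UART

fill_time = BUFFER_BYTES / produce_rate          # s until the buffer is full
drain_time = BUFFER_BYTES / drain_rate           # s to empty it over the UART

print(f"generated: {produce_rate / 1e6:.0f} MB/s, drained: {drain_rate / 1e3:.2f} kB/s")
print(f"buffer fills in {fill_time * 1e6:.2f} us, drains in {drain_time * 1e3:.1f} ms")
print(f"pause-to-run ratio: about {produce_rate / drain_rate:.0f}x")
```

Even Gigabit Ethernet (at most 125 MB/s of raw line rate) would not keep up with 200 MB/s under these assumptions, so without stopping the DUT clock most of the trace would be lost.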
Another solution is available for Xilinx FPGAs: the power-down pin
of the Xilinx Clocking Wizard IP can also be used to stop the clock.
Xilinx provides the power-down function to reduce power consumption, but the same
function can be used for debugging without developing any custom IP. However, if
the design contains any logic that is reset when the clock is absent, that specific
logic needs to be removed. Otherwise, it will not be possible to obtain a continuous
stream of debugging data from the embedded processor (Fig. 2).
Fig. 2. Clock management
To keep resource utilization low, our proposed debugging system has been
configured to debug 16 signals simultaneously. However, embedded designs may
contain a large number of nodes that need to be debugged. To provide for
connecting a large number of nodes, a concentration network can be used. A
concentration network has more inputs than outputs. The controlling processor can
select any set of input nodes to appear at the outputs simply by writing the corresponding
parameter to a selection register, without re-synthesizing the block. A previously proposed
concentration network can be used to connect the DUT to the debugging system. The
concentration network increases the observability of the embedded design at the expense
of some additional logic.
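The selection mechanism can be sketched as a behavioral model. This Python class is an idealized, crossbar-like concentrator in which any 16 of N probe inputs can be routed to the outputs; the referenced concentration network may impose topology restrictions that this sketch ignores. The class, its method names, and the selection-register interface are hypothetical.

```python
class ConcentrationNetwork:
    """Idealized model: route any `num_outputs` of `num_inputs` probes to the outputs."""

    def __init__(self, num_inputs, num_outputs=16):
        if num_outputs >= num_inputs:
            raise ValueError("a concentrator has more inputs than outputs")
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.select = list(range(num_outputs))  # reset value of the selection register

    def write_select(self, indices):
        """Software write to the (hypothetical) selection register."""
        indices = list(indices)
        if len(indices) != self.num_outputs:
            raise ValueError("one input index is needed per output")
        if any(not 0 <= i < self.num_inputs for i in indices):
            raise ValueError("input index out of range")
        self.select = indices

    def outputs(self, probe_values):
        """Values seen by the 16-channel debugging system in one cycle."""
        return [probe_values[i] for i in self.select]


net = ConcentrationNetwork(num_inputs=1024)
probes = [n % 2 for n in range(1024)]  # dummy probe values
net.write_select(range(512, 528))      # observe probes 512..527 without re-synthesis
print(net.outputs(probes))
```

Changing the observed signal set is a single register write here, which mirrors the point of the paragraph above: observability grows with the number of inputs while the debugging system itself stays at 16 channels.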
Since the design was verified on the ZedBoard, the ARM processor has been used in the
design as the main controller for data transfer. However, a soft embedded processor can also
be used instead of the ARM processor to make the design independent of any specific
processor. Hence, the debugging approach remains valid not only for Xilinx Zynq SoCs
but also for other FPGA families without an ARM processor. Furthermore, the data can be
transmitted over either an Ethernet interface or a UART (whichever is available). Data
transmission over Ethernet is faster than over a UART and is therefore preferred. However,
since the processor under test is not clocked during transmission in either case, no data is lost.
The data is received in a log file in *.txt format. First, a de-multiplexing operation
is performed, and then the data is converted to the Value Change Dump
(VCD) format. Since the *.txt format is not directly convertible to VCD, an application was
created for this conversion so that the design can be inspected with any open-source
waveform viewer such as GTKWave.
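The exact log format is not specified here, so the following Python sketch assumes a hypothetical format: one line per captured DUT cycle, holding all traced signals as a hexadecimal word in which bit i carries signal i. De-multiplexing then amounts to extracting individual bits, and the VCD writer emits only value changes. Because the DUT clock is stopped during data transfer, the line index can serve directly as the VCD timestamp, so transfer pauses do not show up as gaps in the waveform.

```python
def log_to_vcd(lines, names, timescale="1 ns"):
    """Convert a hypothetical hex-per-line trace log into VCD text.

    Each non-empty line holds one captured DUT cycle as a hex word in which
    bit i is the value of signal names[i]. The line index is the timestamp.
    """
    # Single-character VCD identifier codes ('!', '"', ...): up to 94 signals.
    ids = [chr(33 + i) for i in range(len(names))]
    out = [f"$timescale {timescale} $end", "$scope module dut $end"]
    out += [f"$var wire 1 {ident} {name} $end" for ident, name in zip(ids, names)]
    out += ["$upscope $end", "$enddefinitions $end"]

    previous = [None] * len(names)
    rows = (ln.strip() for ln in lines if ln.strip())
    for cycle, row in enumerate(rows):
        word = int(row, 16)
        changes = []
        for i, ident in enumerate(ids):
            bit = (word >> i) & 1  # de-multiplex: extract signal i
            if bit != previous[i]:
                changes.append(f"{bit}{ident}")
                previous[i] = bit
        if changes:  # VCD records only value changes
            out.append(f"#{cycle}")
            out.extend(changes)
    return "\n".join(out) + "\n"


log = ["0001", "0001", "0003", ""]
print(log_to_vcd(log, ["valid", "ready"]), end="")
```

The signal names `valid` and `ready` are placeholders; in practice they would come from whatever signal set the controller selected for the capture.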
Dynamic Partial Reconfiguration
Dynamic Partial Reconfiguration (DPR) is the ability to reconfigure a portion of the
FPGA at runtime while the rest of the FPGA remains active. DPR offers the flexibility to
change part of the system's hardware components, switching it to another
mode of operation by reusing the same hardware resources on the FPGA, without halting
the rest of the system. In this work, DPR is used to load the proposed
Debugging System (DS) to debug embedded microprocessors at runtime without
having to repeat the FPGA design flow to add the DS to the DUT and re-implement the
whole system on the FPGA. An additional advantage is that the
hardware resources consumed by the DS can be reused for other Reconfigurable Modules
(RMs) at runtime after the debugging phase has ended, as shown in Fig. 3.
Fig. 3. Using DPR to load the Debugging System (DS)
The Xilinx DPR design flow is used for our proposed debugging system. The DPR
design flow requires partitioning the system into a static region and a Reconfigurable
Region (RR). In our case, the static region is the DUT, which does not change
at runtime, and the RR is allocated to the DS or to any RM that will be configured in the
same RR after the debugging phase is over. The Hardware Description
Language (HDL) files of the different constituents of the DUT and the debugging system
were used as input to the DPR design flow. Floorplanning was carried out to ensure
efficient utilization of the hardware resources. The reconfiguration time $t_{reconf}$ is the time
needed to switch to a new operation mode. As $t_{reconf}$ depends on the size of the RR
(Fig. 3), the RR should be sized to host the largest RM while remaining as small as possible.
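To a first approximation, $t_{reconf}$ is the partial bitstream size divided by the throughput of the configuration port, and the bitstream size grows with the number of configuration frames the RR spans. The sketch below uses the 7-series frame size of 101 32-bit words and the theoretical peak of a 32-bit ICAP at 100 MHz (400 MB/s); the frame count is hypothetical, header overhead is ignored unless passed in, and achievable throughput (especially over the Zynq PCAP) is typically lower.

```python
FRAME_BYTES = 101 * 4           # 7-series configuration frame: 101 32-bit words
ICAP_BYTES_PER_S = 4 * 100e6    # 32-bit ICAP at 100 MHz, theoretical peak


def partial_bitstream_bytes(num_frames, overhead_bytes=0):
    """Approximate partial bitstream size for an RR spanning num_frames frames."""
    return num_frames * FRAME_BYTES + overhead_bytes


def reconfiguration_time_s(bitstream_bytes, port_bytes_per_s=ICAP_BYTES_PER_S):
    """t_reconf ~ bitstream size / configuration-port throughput."""
    return bitstream_bytes / port_bytes_per_s


size = partial_bitstream_bytes(2500)  # hypothetical RR of 2500 frames
print(size, f"{reconfiguration_time_s(size) * 1e3:.3f} ms")
```

If only a quarter of the ICAP peak were reached in practice, the same bitstream would take about 10 ms, and halving the RR would roughly halve $t_{reconf}$, which is why the RR should be no larger than the largest RM requires.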
In Fig. 3, the proposed reconfigurable system has three RMs: the DS, a Blank module, and a
Reconfigurable Application (RA) for another application. The RR on the FPGA is
dynamically reconfigured with one of these RMs according to the time slot. A full
configuration mode consists of the DUT together with one of the RMs. The output of the
DPR flow is a partial bitstream file for each RM of the system and a full bitstream file
for each configuration mode.
In the proposed DPR-based debugging system, RMs for another application can be loaded
to reuse the resources allocated to the reconfigurable region
when the DS is not active (Fig. 3). Therefore, the routing between
the DUT in the static region and the DS or any other RM in the RR must change
according to the configuration mode. Hence, to keep the data flow
between the DUT and the loaded RM valid, a reconfigurable re-routing technique should be
used, as shown in Fig. 4. In a previous work, a re-routing technique was presented
to reconfigure the interconnections between the static region and the RR of a DPR design at
runtime.

Fig. 4. Routing between the static region and the RR.
The proposed methodology has been tested on the Digilent ZedBoard, which has an
XC7Z020-484 FPGA. We used Xilinx Vivado 2017.1 for the design process, carried out
on an Intel Core i7-6700 CPU running at 3.4 GHz with 16 GB of RAM. With the debugging
circuitry synthesized as a reconfigurable module, the design process took about 23 min,
compared with 17 min for the traditional flow without DPR. This difference of about
6 min in synthesis time between the two methodologies is modest. The main advantage of
the presented methodology is the capability of dynamic reconfiguration. The DPR-based
debugging system provides the flexibility to load debugging circuitry into the DUT at
runtime without the need to repeat