10 The Future: System Design with Customizable Architectures, Software, and Tools


Application-Specific Customizable Embedded Systems






What characterizes this new breed of ASIPs in the embedded world is that, unlike their predecessors, these ASIPs are created not just to provide flexibility through programmability but also, in large part, to provide an easier implementation alternative to ASICs for their respective application domains. This trend is expected to grow significantly into other domains (and sub-domains, as evidenced by the networking and communication spaces) in the near future.



Review Questions

[Q 1] What are the different ways to customize an embedded system?
[Q 2] Describe the methodologies to customize a single embedded CPU.
[Q 3] How do the CPU extension methodologies vary compared to instruction-set customization?
[Q 4] What are the principles of template-based custom instruction generation?
[Q 5] Describe the techniques to manage and reduce the complexity of the design space for custom instruction generation.
[Q 6] Today, extending the base processor with custom units and generating complex instructions from primitive ones are handled efficiently by research or commercial methodologies. What are the additional challenges in the MPSoC era, and which issues are more acute for customizing heterogeneous multi-core systems?
[Q 7] Given the ASIP categorization of methodologies and techniques, describe Tensilica’s design environment.



Bibliography

[1] 3DSP. http://www.3dsp.com.
[2] Federico Angiolini, Jianjiang Ceng, Rainer Leupers, Federico Ferrari, Cesare Ferri, and Luca Benini. An integrated open framework for heterogeneous MPSoC design space exploration. In DATE’06: Proceedings of the Conference on Design, Automation and Test in Europe, pages 1145–1150, 2006.






Multi-Core Embedded Systems



[3] Jeffrey M. Arnold. The architecture and development flow of the S5 software configurable processor. J. VLSI Signal Process. Syst., 47(1):3–14, 2007.
[4] Arteris. http://www.arteris.com.
[5] John Bainbridge and Steve Furber. Chain: A delay-insensitive chip area interconnect. IEEE Micro, 22(5):16–23, 2002.
[6] P. Banerjee, M. Haldar, A. Nayak, V. Kim, V. Saxena, S. Parkes, D. Bagchi, S. Pal, N. Tripathi, D. Zaretsky, R. Anderson, and J.R. Uribe. Overview of a compiler for synthesizing MATLAB programs onto FPGAs. Trans. on VLSI, 12(3):312–324, 2004.
[7] Francisco Barat, Rudy Lauwereins, and Geert Deconinck. Reconfigurable instruction set processors from a hardware/software perspective. IEEE Trans. Softw. Eng., 28(9):847–862, 2002.
[8] Souvik Basu and Rajat Moona. High level synthesis from Sim-nML processor models. In VLSID’03: Proceedings of the 16th International Conference on VLSI Design, pages 255–260. IEEE Computer Society, 2003.
[9] Partha Biswas, Nikil Dutt, Paolo Ienne, and Laura Pozzi. Automatic identification of application-specific functional units with architecturally visible storage. In DATE’06: Proceedings of the Conference on Design, Automation and Test in Europe, pages 212–217, 2006.
[10] P. Bonzini and L. Pozzi. A retargetable framework for automated discovery of custom instructions. In ASAP’07: Application Specific Systems, Architectures and Processors, pages 334–341. IEEE, 2007.
[11] Lakshmi N. Chakrapani, John Gyllenhaal, Wen-mei W. Hwu, Scott A. Mahlke, Krishna V. Palem, and Rodric M. Rabbah. Trimaran: An infrastructure for research. In Instruction-Level Parallelism. Lecture Notes in Computer Science, 2004.
[12] Karam S. Chatha and Ranga Vemuri. MAGELLAN: multiway hardware-software partitioning and scheduling for latency minimization of hierarchical control-dataflow task graphs. In CODES’01: Proceedings of the Ninth International Symposium on Hardware/Software Codesign, pages 42–47. ACM, 2001.
[13] Nathan Clark, Amir Hormati, Scott Mahlke, and Sami Yehia. Scalable subgraph mapping for acyclic computation accelerators. In CASES’06: Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pages 147–157. ACM, 2006.
[14] Nathan Clark and Hongtao Zhong. Automated custom instruction generation for domain-specific processor acceleration. IEEE Trans. Comput., 54(10):1258–1270, 2005.






[15] Jason Cong, Yiping Fan, Guoling Han, Ashok Jagannathan, Glenn Reinman, and Zhiru Zhang. Instruction set extension with shadow registers for configurable processors. In FPGA’05: Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-programmable Gate Arrays, pages 99–106. ACM, 2005.
[16] Abhinav Das, Jiwei Lu, and Wei-Chung Hsu. Region monitoring for local phase detection in dynamic optimization systems. In CGO’06: Proceedings of the International Symposium on Code Generation and Optimization, pages 124–134. IEEE Computer Society, 2006.
[17] Robert P. Dick and Niraj K. Jha. MOGAC: A multiobjective genetic algorithm for hardware-software cosynthesis of distributed embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17:920–935, 1998.
[18] R. Dimond, O. Mencer, and Wayne Luk. CUSTARD: a customisable threaded FPGA soft processor and tools. International Conference on Field Programmable Logic and Applications, 0:1–6, 2005.
[19] P. Eles, Zebo Peng, K. Kuchcinski, and A. Doboli. System level hardware/software partitioning based on simulated annealing and tabu search. Des. Automat. Embedd. Syst., 2(1):5–32, 1997.
[20] A. Fauth, J. Van Praet, and M. Freericks. Describing instruction set processors using nML. In Proceedings on the European Design and Test Conference, pages 503–507, 1995.
[21] Carlo Galuzzi, Koen Bertels, and Stamatis Vassiliadis. A linear complexity algorithm for the generation of multiple input single output instructions of variable size. LNCS, Embedded Computer Systems: Architectures, Modeling, and Simulation, 4599/2007:283–293, 2007.
[22] David Goodwin and Darin Petkov. Automatic generation of application specific processors. In CASES’03: Proceedings of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pages 137–147. ACM, 2003.
[23] J. Grode, P. V. Knudsen, and J. Madsen. Hardware resource allocation for hardware/software partitioning in the LYCOS system. In DATE’98: Proceedings of the Conference on Design, Automation and Test in Europe, pages 22–27. IEEE Computer Society, 1998.
[24] Sumit Gupta, Rajesh Kumar Gupta, Nikil D. Dutt, and Alexandru Nicolau. Coordinated parallelizing compiler optimizations and high-level synthesis. ACM Trans. Des. Autom. Electron. Syst., 9(4):441–470, 2004.
[25] C. Gyllenhaal, B.R. Rau, and W.W. Hwu. Hmdes version 2.0 specification. In Technical Report, IMPACT-96-3, The IMPACT Research Group. Springer-Verlag, 1996.






[26] George Hadjiyiannis, Silvina Hanono, and Srinivas Devadas. ISDL: an instruction set description language for retargetability. In DAC’97: Proceedings of the 34th Annual Conference on Design Automation, pages 299–302. ACM, 1997.
[27] Ashok Halambi, Peter Grun, Vijay Ganesh, Asheesh Khare, Nikil Dutt, and Alex Nicolau. EXPRESSION: a language for architecture exploration through compiler/simulator retargetability. In DATE’99: Proceedings of the Conference on Design, Automation and Test in Europe, pages 485–490. ACM, 1999.
[28] Scott Hauck, Thomas W. Fry, Matthew M. Hosler, and Jeffrey P. Kao. The Chimaera reconfigurable functional unit. IEEE Trans. Very Large Scale Integr. Syst., 12(2):206–217, 2004.
[29] Jörg Henkel and Rolf Ernst. An approach to automated hardware/software partitioning using a flexible granularity that is driven by high-level estimation techniques. Trans. on Very Large Scale Integration (VLSI) Systems, 9(2):273–289, 2001.
[30] Jörg Henkel and Yanbing Li. Avalanche: an environment for design space exploration and optimization of low-power embedded systems. IEEE Trans. Very Large Scale Integr. Syst., 10(4):454–468, 2002.
[31] Andreas Hoffmann, Tim Kogel, Achim Nohl, Gunnar Braun, Oliver Schliebusch, Oliver Wahlen, Andreas Wieferink, and Heinrich Meyr. A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20:1338–1354, 2001.
[32] Shiwen Hu, Madhavi Valluri, and Lizy Kurian John. Effective management of multiple configurable units using dynamic optimization. ACM Trans. Archit. Code Optim., 3(4):477–501, 2006.
[33] Ing-Jer Huang and Ping-Huei Xie. Application of instruction analysis/scheduling techniques to resource allocation of superscalar processors. IEEE Trans. Very Large Scale Integr. Syst., 10(1):44–54, 2002.
[34] Improv Systems Inc. http://www.improvsys.com.
[35] MIPS Technologies Inc. http://www.mips.com.
[36] Stretch Inc. http://www.stretchinc.com.
[37] ARC International. http://www.arc.com.
[38] Alex Jones, Debabrata Bagchi, Sartajit Pal, Prith Banerjee, and Alok Choudhary. PACT HDL: a compiler targeting ASICs and FPGAs with power and performance optimizations. Kluwer Academic Publishers, Norwell, MA, 2002.






[39] Alex Jones, Raymond Hoare, Dara Kusic, Gayatri Mehta, Josh Fazekas, and John Foster. Reducing power while increasing performance with SuperCISC. Trans. on Embedded Computing Sys., 5(3):658–686, 2006.
[40] Theo Kluter, Philip Brisk, Paolo Ienne, and Edoardo Charbon. Speculative DMA for architecturally visible storage in instruction set extensions. In CODES/ISSS’08: Proceedings of the 6th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pages 243–248. ACM, 2008.
[41] Shinsuke Kobayashi, Yoshinori Takeuchi, Akira Kitajima, and Masaharu Imai. Compiler generation in PEAS-III: an ASIP development system. In SCOPES’01: Workshop on Software and Compilers for Embedded Systems, 2001.
[42] C. Liem, T. May, and P. Paulin. Instruction-set matching and selection for DSP and ASIP code generation. In European Design and Test Conference, EDAC, European Conference on Design Automation, ETC European Test Conference, pages 31–37. IEEE Computer Society, 1994.
[43] CoWare Inc. LISATek. http://www.coware.com.
[44] Jiwei Lu, Howard Chen, Pen-chung Yew, and Wei-chung Hsu. Design and implementation of a lightweight dynamic optimization system. Journal of Instruction-Level Parallelism, 6:2004, 2004.
[45] Roman Lysecky, Greg Stitt, and Frank Vahid. Warp processors. ACM Trans. Des. Autom. Electron. Syst., 11(3):659–681, 2006.
[46] Prabhat Mishra, Mahesh Mamidipaka, and Nikil Dutt. Processor-memory coexploration using an architecture description language. Trans. on Embedded Computing Sys., 3(1):140–162, 2004.
[47] Rajat Moona. Processor models for retargetable tools. In Proceedings of Eleventh IEEE International Workshop on Rapid Systems Prototyping, pages 34–39, 2000.
[48] Fernando Moraes, Ney Calazans, Aline Mello, Leandro Möller, and Luciano Ost. HERMES: an infrastructure for low area overhead packet-switching networks on chip. Integr. VLSI J., 38(1):69–93, 2004.
[49] Ralf Niemann and Peter Marwedel. An algorithm for hardware/software partitioning using mixed integer linear programming. In Proceedings of the Design Automation for Embedded Systems, pages 165–193. Kluwer Academic Publishers, 1997.
[50] Hamid Noori, Farhad Mehdipour, Kazuaki Murakami, Koji Inoue, and Morteza Saheb Zamani. An architecture framework for an adaptive extensible processor. The Journal of Supercomputing, 45(3):313–340, Sep. 2008.






[51] Pierre G. Paulin and Miguel Santana. FlexWare: A retargetable embedded-software development environment. IEEE Des. Test, 19(4):59–69, 2002.
[52] Zebo Peng and Krzysztof Kuchcinski. An algorithm for partitioning of application specific systems. In Proceedings of the European Conference on Design Automation (EDAC), pages 316–321, 1993.
[53] Altera Nios II Processor. http://www.altera.com/products/ip/processors/nios2/ni2-index.html.
[54] Xilinx MicroBlaze Processor. http://www.xilinx.com/products/design_resources/proc_central/microblaze.htm.
[55] G. Quan, X. Hu, and G. Greenwood. Preference-driven hierarchical hardware/software partitioning. In Proceedings of the IEEE/ACM International Conference on Computer Design, pages 652–658, 1999.
[56] Rahul Razdan, Karl S. Brace, and Michael D. Smith. PRISC software acceleration techniques. In ICCS’94: Proceedings of the 1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors, pages 145–149. IEEE Computer Society, 1994.
[57] Robert Schreiber, Shail Aditya, Scott Mahlke, Vinod Kathail, B. Ramakrishna Rau, Darren Cronquist, and Mukund Sivaraman. PICO-NPA: High-level synthesis of nonprogrammable hardware accelerators. J. VLSI Signal Process. Syst., 31(2):127–142, 2002.
[58] Vinoo Srinivasan, Shankar Radhakrishnan, and Ranga Vemuri. Hardware software partitioning with integrated hardware design space exploration. In DATE’98: Proceedings of the Conference on Design, Automation and Test in Europe, pages 28–35, 1998.
[59] S. Stergiou, F. Angiolini, S. Carta, L. Raffo, D. Bertozzi, and G. De Micheli. XPipes Lite: a synthesis oriented design library for networks on chips. In Design, Automation and Test in Europe, 2005, volume 2, pages 1188–1193, 2005.
[60] Fei Sun, Srivaths Ravi, Anand Raghunathan, and Niraj K. Jha. Synthesis of application-specific heterogeneous multiprocessor architectures using extensible processors. In VLSID’05: Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design, pages 551–556. IEEE Computer Society, 2005.
[61] Fei Sun, Srivaths Ravi, Anand Raghunathan, and Niraj K. Jha. Application-specific heterogeneous multiprocessor synthesis using extensible processors. IEEE Trans. Comput., 25(9):1589–1602, 2006.






[62] Target Compiler Technologies. http://www.retarget.com.
[63] Tensilica. http://www.tensilica.com.
[64] Stamatis Vassiliadis, Stephan Wong, and Sorin Cotofana. The MOLEN rho-mu-coded processor. In FPL’01: Proceedings of the 11th International Conference on Field-Programmable Logic and Applications, pages 275–285. Springer-Verlag, 2001.
[65] Girish Venkataramani, Tobias Bjerregaard, Tiberiu Chelcea, and Seth C. Goldstein. Hardware compilation of application-specific memory access interconnect. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 25(5):756–771, 2006.
[66] Scott J. Weber, Matthew W. Moskewicz, Matthias Gries, Christian Sauer, and Kurt Keutzer. Fast cycle-accurate simulation and instruction set generation for constraint-based descriptions of programmable architectures. In CODES+ISSS’04: Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pages 18–23. ACM, 2004.
[67] Lehrstuhl Informatik Xii, Steven Bashford, Ulrich Bieker, Berthold Harking, Rainer Leupers, Peter Marwedel, Andreas Neumann, and Dietmar Voggenauer. The MIMOLA language, version 4.1, 1994.
[68] Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose. Application-specific customization of soft processor microarchitecture. In FPGA’06: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, pages 201–210. ACM, 2006.
[69] Pan Yu and Tulika Mitra. Characterizing embedded applications for instruction-set extensible processors. In DAC’04: Proceedings of the 41st Annual Conference on Design Automation, pages 723–728. ACM, 2004.
[70] Pan Yu and Tulika Mitra. Disjoint pattern enumeration for custom instructions identification. In FPL’07: Field Programmable Logic and Applications, pages 273–278. IEEE, 2007.
[71] Weifeng Zhang, Brad Calder, and Dean M. Tullsen. An event-driven multithreaded dynamic optimization framework. In PACT’05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 87–98. IEEE Computer Society, 2005.
[72] Vladimir D. Zivkovic, Erwin de Kock, Pieter van der Wolf, and Ed Deprettere. Fast and accurate multiprocessor architecture exploration with symbolic programs. In DATE’03: Proceedings of the Conference on Design, Automation and Test in Europe, page 10656. IEEE Computer Society, 2003.



3 Power Optimization in Multi-Core System-on-Chip

Massimo Conti, Simone Orcioni, Giovanni Vece and Stefano Gigli
Università Politecnica delle Marche
Ancona, Italy
{m.conti, s.orcioni, g.vece, s.gigli}@univpm.it



CONTENTS
3.1 Introduction
3.2 Low Power Design
3.2.1 Power Models
3.2.2 Power Analysis Tools
3.3 PKtool
3.3.1 Basic Features
3.3.2 Power Models
3.3.3 Augmented Signals
3.3.4 Power States
3.3.5 Application Examples
3.4 On-Chip Communication Architectures
3.5 NOCEXplore
3.5.1 Analysis
3.6 DPM and DVS in Multi-Core Systems
3.7 Conclusions
Review Questions
Bibliography






3.1 Introduction



In recent years, continuous advances in silicon technology have made it possible to implement complex electronic systems in a single integrated circuit. Systems-on-chips (SoCs) have fueled the explosion of the market for electronic appliances: small mobile devices that provide communication and information capabilities for consumer electronics and industrial automation. These devices require complex electronics and high levels of system integration, and they need to be delivered in a very short time in order to meet their market window.

The design complexity of these systems requires new design methodologies and the development of a seamless design flow that integrates existing and emerging tools. The International Technology Roadmap for Semiconductors (ITRS) and the MEDEA+ Roadmap highlight some key points that electronic design automation companies must consider in order to deal with such design complexity, among them:

• Intellectual Property Reuse
Intellectual property (IP) reuse is becoming critical for efficient system development; the need to shorten the time to market is stimulating reusability of both hardware and software. A good way to keep design costs under control is to minimize the number of new designs that are required each time a new SoC is developed: reuse existing design components where possible.
The development of reusable IPs requires:
– The development of standards, including general constraints and guidelines, as well as executable specifications for intra- and inter-company IP exchange, such as SystemC, XML and UML
– The creation of parameterizable, qualified and validated IPs
– The use of a hierarchical reuse methodology, allowing the reuse of the IPs and of the testbenches at different levels of abstraction
Furthermore, an IP reuse methodology is indispensable when a system is designed in cooperation between different companies, or when the design center is distributed all over the world and consequently the project management is distributed.
A lot of work has been done on the development of standards for IP qualification. The SPIRIT Consortium developed the IP-XACT specification to enable rapid, reliable deployment of IPs into advanced design environments. The Virtual Socket Interface Alliance (VSIA) developed the international standard QIP (Quality Intellectual Property) for measuring IP quality. OpenCores is the world’s largest community for development of open source hardware IPs.






• Low Power Design
The continuous progress of micro- and nanotechnologies has led to growing integration and rising clock frequencies in electronic systems. These combined effects have increased both power density and energy dissipation, with important consequences above all in portable systems. Some design and technology issues related to power efficiency are becoming crucial, in particular power-optimized cell libraries, clock gating and clock tree optimization, and dynamic power management. Emphasis is now moving to the architectural level (software energy optimization), optimum memory hierarchy organization and run-time system management.

• System Level Design Methodologies and On-Chip Communication
The design of complex systems-on-chips and multi-core systems requires the exploration of a large solution space. Current design approaches start with low-level models of components and interconnect them only when most architectural decisions have already been fixed. Multi-core system design methodologies instead perform architecture exploration at a high level, taking into account constraints at this level. Multi-core system design methodologies must select:
– The global communication architecture, which may be a multi-level bus architecture, a network-on-chip (NoC) architecture or a mixed-bus NoC
– Synchronous or asynchronous architectures for local and global communication
– The partitioning of the system specification and the allocation of components, such as software (real-time operating system) or hardware IPs, to execute the resulting partitions
Transaction level modeling (TLM) [39] has been widely used to explore the solution space at system level in a fast and efficient way.

• Design for Testability and Manufacturability
As complexity increases, the time spent on verification and validation grows much faster than the time spent on design, so a designer must consider, among other specifications, the simplification of the test phase in prototyping and in production. Design methodologies that take these aspects into account are:
– Formal verification
– Hierarchical specification and verification, and reuse of test benches at different levels of abstraction
– HW/SW co-verification


