
Design of a Reconfigurable Parallel Nonlinear Boolean Function …


Grain-128 | Filtering function | 17 | 3 | 13
Grain-128 | Feedback function | 20 | 2 | 13
Trivium | Feedback function | 5 | 2 | 4
Pomaranch | Filtering function | 6 | 3 | 9
Decim | Filtering function | 14 | 2 | 92
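The rows above can be roughly cross-checked with the sketch below. For a function in algebraic normal form (a list of AND terms), it counts the distinct variables, the highest degree and the number of terms. It assumes those are the quantities in the three numeric columns, whose headers are not part of this excerpt. The monomials are transcribed from the published Grain-128 and Trivium specifications with the clock offset i dropped (b26 stands for b_{i+26}). The helper names `m` and `anf_stats` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Monomial = FrozenSet[str]  # one AND term, as the set of state bits it multiplies


def m(*names: str) -> Monomial:
    return frozenset(names)


def anf_stats(terms: List[Monomial]) -> Tuple[int, int, int]:
    """Return (distinct variables, highest degree, number of AND terms) of an ANF."""
    variables = set().union(*terms)
    return len(variables), max(len(t) for t in terms), len(terms)


# Trivium: t1 = s66 + s93 + s91*s92 + s171
trivium_t1 = [m("s66"), m("s93"), m("s91", "s92"), m("s171")]

# Grain-128 NFSR feedback (s0 is the LFSR bit added into the NFSR feedback)
grain128_g = [m("s0"), m("b0"), m("b26"), m("b56"), m("b91"), m("b96"),
              m("b3", "b67"), m("b11", "b13"), m("b17", "b18"), m("b27", "b59"),
              m("b40", "b48"), m("b61", "b65"), m("b68", "b84")]

# Grain-128 output: seven linear NFSR bits + h(x) + s93
grain128_z = ([m(f"b{j}") for j in (2, 15, 36, 45, 64, 73, 89)]
              + [m("b12", "s8"), m("s13", "s20"), m("b95", "s42"),
                 m("s60", "s79"), m("b12", "b95", "s95"), m("s93")])

print(anf_stats(trivium_t1))  # (5, 2, 4)
print(anf_stats(grain128_g))  # (20, 2, 13)
print(anf_stats(grain128_z))  # (17, 3, 13)
```

Under this reading, the three functions reproduce the Trivium row and both Grain-128 rows.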

To be used in stream ciphers, nonlinear Boolean functions must meet the corresponding cryptographic requirements. Based on the analysis above, their characteristics can be summarized as follows:

(1) When the number of variables is small, the number of AND terms is large. To keep a nonlinear Boolean function with few variables sufficiently complex, its expression is necessarily complicated. This increases the difficulty of cryptanalysis and improves the security of the cryptographic algorithm.

(2) When the number of variables is large, the number of AND terms is small. When a Boolean function is evaluated, the computation of the high order AND terms is always the bottleneck. To improve the speed of the algorithm while preserving security, the number of AND terms can be kept as small as possible.

(3) High order AND terms and low order AND terms are related by inclusion. Once the input variables are fixed, the high order and low order AND terms necessarily have an inclusion relationship. This relationship can be exploited in the hardware design to improve the processing efficiency of the algorithm.

3 Design of Reconfigurable Hardware of Nonlinear Boolean Function

Following the computational characteristics analyzed above, the reconfigurable nonlinear Boolean function for stream cipher algorithms is designed and realized in three parts:

- An improved ALM (Adaptive Logic Module) realizes the low order AND terms, which make up a large proportion of all terms.
- A tree-like network structure realizes the high order AND terms and the output XOR network.
- The linear part of the nonlinear Boolean function is computed in parallel with the nonlinear part.

The improved ALM circuit is designed on the basis of characteristic (1) of the Boolean function.


S. Yang

The tree-like network is designed on the basis of characteristics (2) and (3). The reconfigurable hardware structure of the nonlinear Boolean function is shown in Fig. 1.

Fig. 1. Reconfigurable hardware structure of nonlinear Boolean function (input data feed a nonlinear part, built from configurable high order AND and low order AND terms with reconfiguration of AND degree and of the network, and a linear part; both feed a reconfigurable XOR network that produces the output of the Boolean function)

3.1 Reconfigurable Design of Low Order AND Terms

A statistical study of published cryptographic algorithms shows that in many stream ciphers the degree of the AND terms does not exceed 10. The design and realization of low order AND terms is therefore of practical significance.

Using a programmable AND-OR array, transforming a nonlinear Boolean function from an arbitrary expression into the standard algebraic form is a complicated process. For example, converting an arbitrary n-input nonlinear Boolean function into the standard algebraic form requires computing 2^n coefficients. As n grows, the computation becomes very complex and the storage occupied by the coefficients grows exponentially. We therefore use LUTs to realize the low order AND terms. An LUT can realize any N-input logic function with small delay, and all of its inputs are logically equivalent. This simplifies the mapping: only the requirements on the input and output terminals need to be considered.
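The conversion cost described above can be illustrated with the binary Möbius transform. This is one standard way of turning an n-input truth table into the coefficients of the standard algebraic form (ANF), and is not necessarily the method the author had in mind. Both the input and the output hold 2^n bits, which is the exponential growth the paragraph refers to. The function names are illustrative.

```python
from typing import Callable, List


def truth_table(f: Callable[[int], int], n: int) -> List[int]:
    """Entry x holds f evaluated on the n input bits packed into the integer x."""
    return [f(x) & 1 for x in range(1 << n)]


def anf_coefficients(tt: List[int]) -> List[int]:
    """Binary Moebius transform: truth table (2^n bits) -> ANF coefficients (2^n bits).

    Coefficient u is 1 exactly when the monomial prod_{i in bits of u} x_i
    appears in the standard algebraic form.
    """
    size = len(tt)
    if size == 0 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")
    a = list(tt)
    step = 1
    while step < size:              # one pass per input variable
        for x in range(size):
            if x & step:            # fold in the entry with this variable cleared
                a[x] ^= a[x ^ step]
        step <<= 1
    return a


# f = x0*x1 + x2 over three variables
tt = truth_table(lambda x: ((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1), 3)
print([u for u, c in enumerate(anf_coefficients(tt)) if c])  # [3, 4]: x0*x1 and x2
```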

However, an LUT is essentially a memory: an N-input LUT requires 2^N storage units. The LUT therefore grows exponentially with the number of inputs, and so does its area. In a practical design, the number of LUT inputs has to be chosen as a reasonable trade-off.
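Below is a minimal behavioural sketch of an N-input LUT. It assumes one-bit cells addressed by the packed input bits. Configuration stores the whole truth table, evaluation is a single memory read, and the cell count doubles with every extra input. The class name `LUT` and its interface are hypothetical.

```python
from typing import Callable, List


class LUT:
    """An N-input lookup table: 2**N one-bit cells addressed by the input bits."""

    def __init__(self, n_inputs: int, func: Callable[..., int]):
        self.n = n_inputs
        # Configuration = the full truth table, one cell per input combination.
        self.cells: List[int] = [
            func(*[(addr >> i) & 1 for i in range(n_inputs)]) & 1
            for addr in range(1 << n_inputs)
        ]

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        addr = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.cells[addr]  # evaluation is a single memory read


lut5 = LUT(5, lambda a, b, c, d, e: (a & b) ^ (c & d & e))
print(lut5(1, 1, 0, 0, 0), len(lut5.cells))  # 1 32
for n in (4, 5, 6, 8, 10):
    print(n, "inputs ->", 1 << n, "cells")
```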


This paper combines the structural characteristics of nonlinear Boolean functions in stream cipher algorithms with the idea of the programmable logic module in FPGA circuits. On that basis, it proposes an improved ALM structure built on a 5-input LUT to realize the low order AND terms. The structure of the improved ALM is shown in Fig. 2.

Fig. 2. The structure of improved ALM (eight 4-bit LUTs, LUT0 to LUT7, addressed by inputs a and b, followed by two MUX trees controlled by c0, d0, e0 and c1, d1, e1 that produce F0(a,b,c0,d0,e0) and F1(a,b,c1,d1,e1))

By changing the configuration information, the improved ALM circuit designed in this paper can realize reconfigurable nonlinear Boolean functions with strong adaptability. Its reconfiguration capability is shown in Table 2.

Table 2. Reconfiguration ability of the improved ALM circuit

Type of function | ALM_Config | Output of function
4 variables | c0 = 0 | ALM_Dataout0 = F40(a,b,d0,e0)
4 variables | c1 = 1 | ALM_Dataout1 = F41(a,b,d1,e1)
5 variables | c0 = c0 | ALM_Dataout0 = F50(a,b,c0,d0,e0)
5 variables | c1 = c1 | ALM_Dataout1 = F51(a,b,c1,d1,e1)

This structure has the following reconfigurable characteristics:


(1) It can realize any single Boolean function of five input variables, for example ALM_Dataout0 = F50(a,b,c0,d0,e0) or ALM_Dataout1 = F51(a,b,c1,d1,e1). In this case the function occupies all of the storage resources.

(2) It can realize two Boolean functions of five input variables simultaneously. This requires the two functions to share two variables and to have the same expression in the other three variables, for example ALM_Dataout0 = F50(a,b,c0,d0,e0) and ALM_Dataout1 = F51(a,b,c1,d1,e1). The two functions then share the storage units.

(3) It can realize two Boolean functions of four input variables. Choosing the corresponding terminals gives the expressions some flexibility, for example ALM_Dataout0 = F40(a,b,d0,e0) and ALM_Dataout1 = F41(a,b,d1,e1). Each function occupies four LUT units exclusively.

(4) If an algorithm requires it, the circuit can be extended into a more adaptable reconfigurable circuit by adding LUT units and MUX stages.
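The following behavioural sketch models the improved ALM as 32 shared configuration bits to make these modes concrete. The bit layout, meaning which LUT each (c, d, e) combination selects and the order of the MUX levels, is an assumption read from Fig. 2 and is not taken from the paper's text. The helpers `config_two_4var` and `config_shared_5var` are hypothetical. The sketch shows two modes:

- Characteristic (3): two independent 4-variable functions with c0 = 0 and c1 = 1.
- Characteristic (2): two 5-variable functions that share a, b and one truth table.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


class ImprovedALM:
    """Behavioural sketch of the improved ALM (assumed bit layout).

    32 configuration bits are held as eight 4-bit LUTs (LUT0..LUT7).
    Inputs (a, b) pick one bit inside every LUT; this stage is shared.
    Each output has its own selectors (c, d, e) choosing one of the 8 LUT bits.
    """

    def __init__(self, config: Sequence[int]):
        if len(config) != 32:
            raise ValueError("the improved ALM holds exactly 32 configuration bits")
        self.config = [bit & 1 for bit in config]

    def _read(self, a: int, b: int, c: int, d: int, e: int) -> int:
        lut = c | (d << 1) | (e << 2)  # hypothetical mapping of (c, d, e) to LUT0..LUT7
        bit = a | (b << 1)             # position inside the selected 4-bit LUT
        return self.config[4 * lut + bit]

    def outputs(self, a, b, c0, d0, e0, c1, d1, e1) -> Tuple[int, int]:
        # ALM_Dataout0 and ALM_Dataout1 share a, b and the same storage.
        return self._read(a, b, c0, d0, e0), self._read(a, b, c1, d1, e1)


def config_two_4var(f40: Callable, f41: Callable) -> List[int]:
    """4-variable mode (c0 = 0, c1 = 1): each function owns four LUTs."""
    cfg = [0] * 32
    for a, b, d, e in product((0, 1), repeat=4):
        bit = a | (b << 1)
        cfg[4 * (0 | (d << 1) | (e << 2)) + bit] = f40(a, b, d, e) & 1
        cfg[4 * (1 | (d << 1) | (e << 2)) + bit] = f41(a, b, d, e) & 1
    return cfg


def config_shared_5var(f5: Callable) -> List[int]:
    """5-variable mode: one truth table serves both outputs (same expression)."""
    cfg = [0] * 32
    for a, b, c, d, e in product((0, 1), repeat=5):
        cfg[4 * (c | (d << 1) | (e << 2)) + (a | (b << 1))] = f5(a, b, c, d, e) & 1
    return cfg


alm = ImprovedALM(config_two_4var(lambda a, b, d, e: a & b & d & e,
                                  lambda a, b, d, e: a ^ b ^ d ^ e))
print(alm.outputs(1, 1, 0, 1, 1, 1, 0, 0))  # (1, 0)
```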

For two five-variable Boolean functions with the same structure, an FPGA implementation needs two 32-bit LUT units and 64 MUX units. Our structure needs only one 32-bit LUT unit and 38 MUX units. The area saving reaches 50% and the delay is unchanged. The design is therefore well suited to nonlinear Boolean functions with few variables and a high repetition rate.
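The figure of 38 MUX units can be reproduced by counting, under two assumptions: every MUX unit in Fig. 2 is a 2:1 multiplexer, and the selection order is as read from the figure. The (a, b) selection inside each of the eight 4-bit LUTs is shared by both outputs, while each output needs its own 8:1 tree driven by (c, d, e). The function below is a hypothetical helper for this count only and does not model the FPGA baseline.

```python
def improved_alm_mux_count(n_luts: int = 8, bits_per_lut: int = 4, n_outputs: int = 2) -> int:
    """Count 2:1 MUX units in the assumed Fig. 2 arrangement."""
    shared = n_luts * (bits_per_lut - 1)  # 4:1 selection by (a, b) in each LUT, shared
    private = n_outputs * (n_luts - 1)    # 8:1 selection by (c, d, e), one tree per output
    return shared + private


print(improved_alm_mux_count())  # 38 = 8 * 3 + 2 * 7
```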

3.2 Reconfigurable Design of High Order AND Terms

Statistical analysis shows that realizing the high order AND terms is the critical path and the main bottleneck of the nonlinear Boolean function. Our design determines the relationships among the AND terms in advance and encodes them in the configuration information. A tree-like structure then generates the high order AND terms according to that configuration.

Fig. 3. The structure of reconfigurable high order AND terms (each input D0 to Dn passes through a selector that chooses the input or the constant “1” under the configuration of AND, followed by a tree of AND gates producing the output data)

The structure of the reconfigurable high order AND terms is shown in Fig. 3. By setting the data selectors, the structure can implement an AND of any set of variables. The inputs may come from the state bits of the shift register. When an input is not an effective variable of the AND term, its data selector passes the constant “1” to the next level under the control of the configuration information. Because the constant “1” does not change the output of an AND gate, it does not block the effective variables from propagating to the next level. In this way any AND of arbitrary variables in the shift register can be realized, and the AND terms within the overall XOR expression can be reconfigured. Under the control of the configuration information, the structure reuses logic resources and delay, which improves resource utilization and computing efficiency.
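Below is a minimal sketch of the selector-plus-AND-tree idea of Fig. 3. It assumes 2-input AND gates and pads odd levels with the constant 1. `use` is a hypothetical name for the per-input configuration bits. The returned depth is the number of AND levels, which grows with the logarithm of the number of inputs rather than linearly.

```python
from typing import List, Sequence, Tuple


def select_stage(data: Sequence[int], use: Sequence[int]) -> List[int]:
    # Each selector passes D_i if it is an effective variable, otherwise constant 1.
    return [d if u else 1 for d, u in zip(data, use)]


def and_tree(bits: Sequence[int]) -> Tuple[int, int]:
    """Reduce bits pairwise with 2-input AND gates; return (result, AND levels)."""
    level = list(bits)
    if not level:
        return 1, 0                   # empty AND is the identity
    depth = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(1)           # pad an odd level with constant 1
        level = [level[i] & level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


def reconfigurable_and(data: Sequence[int], use: Sequence[int]) -> Tuple[int, int]:
    return and_tree(select_stage(data, use))


state = [1, 0, 1, 1, 0, 1, 1, 1]                           # shift-register bits D0..D7
print(reconfigurable_and(state, [1, 0, 1, 1, 0, 0, 1, 0]))  # (1, 3): D0&D2&D3&D6
```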

3.3 Reconfigurable Design of Output Network

The reconfigurable output network produces the final result of the nonlinear Boolean function by XORing all AND terms. Because the number of XOR terms differs from one algorithm to another, a reconfigurable output network design can improve the computing speed of the nonlinear Boolean function. Assume the nonlinear Boolean function has p XOR terms. Traditional implementations treat the p terms as p nodes and cascade them through p-1 XOR gates. The delay of the output network is then one AND gate level plus p-1 XOR gate levels, and the logic resources are p AND gates and p-1 XOR gates. As the number of AND terms increases, the delay grows markedly.

Based on this analysis of the traditional implementation, this paper proposes an optimized implementation based on a tree structure. As shown in Fig. 4, assume the nonlinear Boolean function has p XOR terms. The first level of the tree has p/2 XOR gates, the second level has p/4, and the n-th level has p/2^n. The logic resources remain p AND gates and p-1 XOR gates, but the output delay of the circuit is one AND gate level plus log2 p XOR gate levels.

Fig. 4. The structure of reconfigurable output network (the outputs of the AND terms enter a configurable XOR tree, controlled by the configuration of XOR, that produces the output of XOR)

Compared with the traditional implementation, the reconfigurable tree output network proposed in this paper reduces the delay from p-1 XOR gate levels to log2 p XOR gate levels. The logic resources and configuration information stay the same, and the improvement becomes more pronounced as the number of terms grows.
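The delay comparison can be checked with a small model of both output networks that counts XOR levels and XOR gates, leaving out the shared AND level. The p/2^n description assumes p is a power of two. For other values of p, this sketch assumes an unpaired term is forwarded to the next level, which gives ⌈log2 p⌉ levels and still uses p-1 gates. The function names are illustrative.

```python
from typing import Sequence, Tuple


def xor_chain(terms: Sequence[int]) -> Tuple[int, int, int]:
    """Cascaded XOR: returns (value, XOR levels, XOR gates)."""
    acc, levels = terms[0], 0
    for t in terms[1:]:
        acc ^= t
        levels += 1                   # every gate adds one level of delay
    return acc, levels, len(terms) - 1


def xor_tree(terms: Sequence[int]) -> Tuple[int, int, int]:
    """Tree of 2-input XOR gates: returns (value, XOR levels, XOR gates)."""
    level, levels, gates = list(terms), 0, 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        gates += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])     # an unpaired term passes straight through
        level = nxt
        levels += 1
    return level[0], levels, gates


and_outputs = [1, 0, 1, 1, 0, 0, 1, 0]   # p = 8 AND-term outputs
print(xor_chain(and_outputs))  # (0, 7, 7)
print(xor_tree(and_outputs))   # (0, 3, 7)
```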

4 Performance and Analysis

4.1 Performance of This Design

Based on the analysis above, a prototype was described at RTL in Verilog and synthesized with Quartus II 10.0 from Altera Corporation, and it was verified successfully. The results show that our design can realize nonlinear Boolean functions with an arbitrary number of variables and arbitrary degree for cipher algorithms with up to 80 variables. Table 3 gives the maximum clock frequency and resource occupancy for 40, 60 and 80 variables.


To evaluate performance more accurately, our design was also synthesized in a 0.18 μm CMOS process using Synopsys Design Compiler. The results are shown in Table 4.

Table 3. The performance of reconfigurable nonlinear Boolean function based on FPGA (device: EP2S180F1020I4)

Number of variables | Maximum clock frequency | ALUT
40 | 233 MHz | 172
60 | 158 MHz | 326
80 | 125 MHz | 498

Table 4. The performance of reconfigurable nonlinear Boolean function based on ASIC

Number of variables | Constraint | Combinational area | Non-combinational area | Delay | Slack
40 | 5 ns | 228734 | 6896 | 3.22 ns | +0.87
60 | 5 ns | 447468 | 10032 | 3.89 ns | +0.66
80 | 5 ns | 603218 | 14783 | 4.02 ns | +0.36

4.2 Contrasts with Other Designs

Based on the synthesis results above, we compare the reconfigurable nonlinear Boolean function with CPLD and FPGA structures that can also realize nonlinear Boolean functions. Area and latency are the two critical parameters in the synthesis results, so Fig. 5 and Fig. 6 show the area and latency of the three structures.


Fig. 5. The area comparison with other designs (FPGA_NBF, CPLD_NBF and our design at 40, 60 and 80 bit; area axis 0 to 1,200,000)

Fig. 6. The latency comparison with other designs (FPGA_NBF, CPLD_NBF and our design at 40, 60 and 80 bit; latency axis 0 to 7)

The comparison shows that with 40 variables, the reconfigurable nonlinear Boolean function occupies about 230 thousand gates of area and has a latency of 3.22 ns, a substantial improvement over the other designs. The advantage of our design grows as the number of variables increases.

5 Conclusion

This paper presents a high-speed reconfigurable nonlinear Boolean function that supports arbitrary levels, arbitrary variables and arbitrary forms of the nonlinear functions used in stream cipher algorithms. For the low order AND terms, an optimization scheme based on the LUT structure is proposed, which better matches the structural characteristics of nonlinear functions. For the high order AND terms, an optimization scheme based on a tree network is proposed, and the final output network uses a tree-like structure to improve computing speed. Synthesis, placement and routing of the reconfigurable design have been completed in a 0.18 μm CMOS process. Compared with other designs, the results show that our design has a clear advantage in area and latency.

Acknowledgments. This work was supported in part by the open project foundation of the State Key Laboratory of Cryptology and by the National Natural Science Foundation of China (NSFC) under Grants No. 61202492, No. 61309022 and No. 61309008.


Temporally Adaptive Co-operation Schemes

Jakub Nalepa and Miroslaw Blocho

Abstract Selecting an appropriate co-operation scheme in parallel evolutionary algorithms is an important task that should be undertaken with care. In this paper, we introduce temporally adaptive schemes and apply them in our parallel memetic algorithm for solving the vehicle routing problem with time windows. The experimental results revealed that this approach retrieves better solutions in much shorter time than other co-operation schemes. The analysis is backed up with statistical tests, which gave clear evidence that the results are statistically significant. We report one new world's best solution to the benchmark problem obtained using our adaptive co-operation scheme.

Key words: Parallel algorithm; co-operation; memetic algorithm; VRPTW

1 Introduction

Solving rich vehicle routing problems (VRPs) is a vital research topic due to their practical applications, which include delivery of food, beverages and parcels, bus routing, delivery of cash to ATM terminals, waste collection, and many others. There exists a plethora of variants of rich VRPs reflecting a wide range of real-life scheduling scenarios [6, 19]; they usually combine multiple realistic constraints imposed on feasible solutions. Although exact algorithms retrieve the optimum routing schedules, they are still very difficult to exploit in practice because of their unacceptable execution times for massively large problems. Therefore, approximate algorithms became the mainstream of research and development. These approaches aim at delivering high-quality (though not necessarily optimum) schedules in significantly shorter time. In our recent work [14], we showed that our parallel memetic algorithm (PMA–VRPTW), a hybrid of a genetic algorithm and some local refinement procedures, elaborates very high-quality schedules for the vehicle routing problem with time windows (VRPTW). Although PMA–VRPTW was very efficient, selecting the appropriate co-operation scheme (defining the co-operation topology, frequency and strategies to handle emigrants/immigrants) is extremely challenging and time-consuming, and an improper selection can easily jeopardize the PMA–VRPTW capabilities.

Jakub Nalepa
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland e-mail: jakub.nalepa@polsl.pl

Miroslaw Blocho
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland e-mail: blochom@gmail.com

© Springer International Publishing AG 2017
F. Xhafa et al. (eds.), Advances on P2P, Parallel, Grid, Cloud and Internet Computing, Lecture Notes on Data Engineering and Communications Technologies 1, DOI 10.1007/978-3-319-49109-7_14

1.1 Contribution

We propose two temporally adaptive co-operation schemes in PMA–VRPTW. In these schemes, the master process samples several time points during the execution and monitors the search progress. Based on this analysis, the scheme is dynamically updated to balance the exploration and exploitation of the solution space and to guide the search process as well as possible. We performed experiments on the well-known Gehring and Homberger's benchmark. In this work, we consider all 400-customer tests with wide time windows, large truck capacities and random positions of the customers, which appeared very challenging [14]. The experiments revealed that the new temporally adaptive co-operation schemes retrieve better solutions quickly compared with other co-operation schemes, and the differences are statistically significant. We report one new world's best solution elaborated using the new scheme. Such temporally adaptive strategies for establishing the desired co-operation schemes have not been intensively studied in the literature so far. They may become an immediate answer to problems which require the parallel processes to co-operate efficiently to guide the search towards high-quality solutions quickly.

1.2 Paper Structure

This paper is structured as follows. Section 2 describes the VRPTW. In Section 3, we review the state of the art on the VRPTW. PMA–VRPTW is briefly discussed in Section 4, where we also present the temporally adaptive co-operation schemes, which are the main contribution of this work. Section 5 contains the analysis of the experimental results. Section 6 concludes the paper and outlines future work.
