


3.2 How Many Cores Are Required for the Fastest Performance?



The next problem in implementing the Co-Z approach is the resource requirement if we want to speed up to the extreme: at least how many rounds of big-integer multiplications are required, and thus how many cores are required? By induction, we can see that to evaluate a monomial of degree n, the best approach takes lg n rounds of multiplications. To estimate the requirement, let us analyze formula (2). It is easy to see that the degrees of U, V, X1, X2 and Z in formula (2) with respect to (X1, X2, Z, xP, a, 4b) are 2, 4, 8, 8 and 7, respectively. It is obvious that evaluating U using one big-integer multiplication is optimal. It is good news that X1 = V × fX1(·), where both V and fX1(·) are of degree 4. We may optimize the evaluation procedure of V and fX1(·) (with 2 rounds of big-integer multiplications) and then get the optimal flow to evaluate X1. The fact that Z = UVZ = UZ × V makes the optimization procedure for evaluating Z also depend on that for evaluating V. X2 = U × fX2(·) brings a problem, as fX2(·) is of degree 6: it is impossible to factor fX2(·) as a product of a quadratic polynomial and a quartic polynomial. A second choice is given by

fX2 = X2^2 (X2^2 − 2aZ × Z) + Z^2 [(aZ)^2 − 8bX2 Z]    (3)



We can then evaluate X2 in 3 rounds. To evaluate V and fX1(·), we observe that

V = 4X2 × X2^2 + 4X2Z × aZ + 4bZ × Z^2    (4)

fX1 = 2X1X2(X1 + X2) + aZ × 2Z(X1 + X2) + 4bZ·Z^2 − xP Z·U    (5)



Now we can build a 12-Montgomery-core system to perform Algorithm 1, and the schedule for how the Montgomery cores are used is given in Table 2.

Table 2. Scheduling for Algorithm 1 with 12 cores

R#  List of multiplications
1   U, X2^2, aZ, Z^2, X2Z, 4bZ, xP Z, (X1 + X2)Z, X1X2
2   M1 = U · X2^2, M2 = aZ · Z, M3 = U · Z^2, M4 = (aZ)^2,
    M5 = 4b · X2Z, M6 = X2 · X2^2, M7 = aZ · X2Z, M8 = 4bZ · Z^2,
    M9 = U · Z, M10 = (X1 + X2)Z · aZ, M11 = (X1 + X2) · X1X2, M12 = U · xP Z;
    then fX1 = 2M11 + 2M10 + M8 − M12 and V = 4M6 + 4M7 + M8
3   X1 = V · fX1, Z = M9 · V, X2 = M1 · (X2^2 − 2M2) + M3 · (M4 − 2M5)
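To make the schedule concrete, the following is a small software sketch (ours, not the RTL): it replays the Table 2 rounds with ordinary modular arithmetic and checks the result against formulas (3)-(5). Since formula (2) is not reproduced in this excerpt, U is passed in as an already-computed value, and the prime below is an arbitrary choice for the consistency check.

```python
# Software sketch (not the paper's hardware): replay the Table 2 schedule and
# verify it against formulas (3)-(5).  U is an input here, because formula (2)
# defining it is not shown in this excerpt; P is an arbitrary test prime.
P = 2**255 - 19

def coz_step_scheduled(X1, X2, Z, xP, a, b, U, p=P):
    b4 = 4 * b % p
    # Round 1: degree-2 products (U also belongs to this round but is supplied as input)
    X2sq = X2 * X2 % p;  aZ = a * Z % p;    Zsq = Z * Z % p
    X2Z  = X2 * Z % p;   bZ4 = b4 * Z % p;  xPZ = xP * Z % p
    sZ   = (X1 + X2) * Z % p;  X1X2 = X1 * X2 % p
    # Round 2: the twelve products M1..M12 of Table 2
    M1 = U * X2sq % p;   M2 = aZ * Z % p;    M3 = U * Zsq % p
    M4 = aZ * aZ % p;    M5 = b4 * X2Z % p;  M6 = X2 * X2sq % p
    M7 = aZ * X2Z % p;   M8 = bZ4 * Zsq % p; M9 = U * Z % p
    M10 = sZ * aZ % p;   M11 = (X1 + X2) * X1X2 % p;  M12 = U * xPZ % p
    # Additions after round 2
    fX1 = (2 * M11 + 2 * M10 + M8 - M12) % p
    V = (4 * M6 + 4 * M7 + M8) % p
    # Round 3: final products
    return V * fX1 % p, (M1 * (X2sq - 2 * M2) + M3 * (M4 - 2 * M5)) % p, M9 * V % p

def coz_step_direct(X1, X2, Z, xP, a, b, U, p=P):
    b4 = 4 * b % p
    V = (4 * X2 * X2 * X2 + 4 * X2 * Z * a * Z + b4 * Z * Z * Z) % p          # (4)
    fX1 = (2 * X1 * X2 * (X1 + X2) + a * Z * 2 * Z * (X1 + X2)
           + b4 * Z * Z * Z - xP * Z * U) % p                                  # (5)
    fX2 = (X2 * X2 * (X2 * X2 - 2 * a * Z * Z)
           + Z * Z * (a * Z * a * Z - 2 * b4 * X2 * Z)) % p                    # (3)
    return V * fX1 % p, U * fX2 % p, U * Z * V % p

if __name__ == "__main__":
    import random
    for _ in range(100):
        args = [random.randrange(P) for _ in range(7)]
        assert coz_step_scheduled(*args) == coz_step_direct(*args)
    print("Table 2 schedule is consistent with formulas (3)-(5)")
```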



4 Implementation and Results



We show our results with 5- and 12-Montgomery-core systems. For the 5-core system, the maximum bit sizes are scalable, and we provide the results for 264-bit (for 256-bit fields) and for 528-bit (for 521-bit or 512-bit fields) operations in ECC. The Montgomery reduction cores, which serve as the big-integer multipliers, are of base d = 2^8. The detailed design of the Montgomery reduction cores is shown in the full version. A remark is given here that additional BRAM blocks (named the xP^-1 P pools) are allocated in order to store pre-evaluated values that are frequently used by the Montgomery cores.



Fig. 2. The proposed block diagram



Figure 2 illustrates the hardware architecture of a multi-multiplication-core system. Each Montgomery multiplier takes two inputs (A, B) and generates one output R. When there are 2 or more Montgomery multipliers, a typical choice is to use a MUX/deMUX to collect the outputs of the Montgomery multipliers and to dispatch values from the memory to the specified inputs of the multipliers. This approach costs extra cycles on the MUX/deMUX. A finite state machine, or controller, handles the addresses for the memory pool. There are paths from the input of the whole system to the write-data buses, and the controller can assign some pre-defined direct values to the write-data buses. The data buses do not involve the controller directly, but there are some cases in which we need to check whether the outputs of the large-number arithmetic units become 0. A comparator to zero, whose comparison result is a flag for the controller, is attached to the output bus of each large-number arithmetic unit.
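A behavioural sketch (ours, not the paper's RTL) of this dispatch scheme: a controller walks a per-round schedule, routes operand pairs from the memory pool to the cores, writes each result R back, and latches a zero flag from the comparator on every output bus. The names and the pool contents are illustrative assumptions.

```python
# Behavioural model of the Fig. 2 dispatch scheme (illustrative, not the RTL).
P256 = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF

def montgomery_core(a, b, p=P256):
    # stands in for one Montgomery multiplier with inputs (A, B) and output R
    return a * b % p

def run_round(pool, schedule, num_cores=12):
    """pool: dict name -> value; schedule: list of (dest, srcA, srcB) triples."""
    assert len(schedule) <= num_cores, "more products than cores in one round"
    zero_flags = {}
    for dest, src_a, src_b in schedule:
        r = montgomery_core(pool[src_a], pool[src_b])   # deMUX: operands to a core
        pool[dest] = r                                   # MUX: result back to the pool
        zero_flags[dest] = (r == 0)                      # comparator-to-zero flag
    return zero_flags

# Example: round 1 of Table 2 (U is omitted; its operands come from formula (2))
pool = {"X1": 5, "X2": 7, "Z": 11, "xP": 13, "a": 3, "4b": 20, "X1+X2": 12}
round1 = [("X2^2", "X2", "X2"), ("aZ", "a", "Z"), ("Z^2", "Z", "Z"),
          ("X2Z", "X2", "Z"), ("4bZ", "4b", "Z"), ("xPZ", "xP", "Z"),
          ("(X1+X2)Z", "X1+X2", "Z"), ("X1X2", "X1", "X2")]
flags = run_round(pool, round1)
```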

In this work, a Xilinx Zynq-7000 All Programmable SoC (APSoC) on the Xilinx ZC706 Evaluation Kit is adopted for both the 5-core system and the 12-core system. We also show our results on the resource requirements for multiple 3-core and 5-core ECC engines in one system on the ZC706 board, which imply that building multiple 3-core engines in one system is better even when there are sufficient resources to build a 12-core system. DSP slices are not used: they were reserved for the multimedia functions specified by the original client.

In our system, the frequently used ECC operations include:
1. Re-configurable parameters a, b, p, q = |E|, and the base point G.
2. Scalar multiplication with the scalar k and an element P in E.
3. Group point addition of elements P and Q in the elliptic group E. The classical approach by Cohen et al. [9] is applied.
4. Big-number MUL/ADD/SUB operations modulo the group order q = |E|.
5. Finding the big-number inverse modulo q = |E|. Montgomery inversion [4] is not in our hardware; owing to its simple state machine, adding it should be easy (a software sketch of one common formulation is given below).
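Since the inversion is not implemented in the hardware, the following is only an illustrative software sketch of one standard shift-and-add formulation (Kaliski's two-phase Montgomery inverse), which maps naturally onto a simple state machine; it is not taken from the paper.

```python
# Illustrative sketch, not the authors' design: Kaliski's two-phase method.
# Phase 1 returns a^{-1} * 2^k mod q using only shifts, additions and
# subtractions; phase 2 removes the factor 2^k by repeated halving mod q.
def almost_montgomery_inverse(a, q):
    u, v, r, s, k = q, a, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u >> 1, s << 1
        elif v % 2 == 0:
            v, r = v >> 1, r << 1
        elif u > v:
            u, r, s = (u - v) >> 1, r + s, s << 1
        else:
            v, s, r = (v - u) >> 1, s + r, r << 1
        k += 1
    if r >= q:
        r -= q
    return q - r, k            # = a^{-1} * 2^k mod q

def mod_inverse(a, q):
    x, k = almost_montgomery_inverse(a, q)
    for _ in range(k):         # divide by 2 mod q, k times
        x = x >> 1 if x % 2 == 0 else (x + q) >> 1
    return x

# quick self-check on a small prime group order
assert mod_inverse(3, 7) == 5 and (3 * 5) % 7 == 1
```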






Table 3. Resources used for the 5- and 12-Montgomery-core systems on the ZC706 Kit. Sm,n denotes the m-core design with maximum n-bit compatibility. No DSP slices are used.

Module             | S5,264 (fmax = 83.33 MHz)     | S5,528 (fmax = 62.50 MHz)     | S12,264 (fmax = 45 MHz)
                   | Slice LUT6 / Slice Reg / 18 Kb BRAM in each column group
Mont. mul. (each)  | 2080 / 280 / 0                | 3455 / 545 / 0                | 2052 / 280 / 0
xP^-1 P pool       | 1635 / 559 / 40               | 3226 / 1096 / 75              | 1339 / 559 / 96
Diff. adder        | 247 / 103 / 0                 | 1288 / 103 / 0                | 5132 / 165 / 0
(X, Y, Z) recovery | 1395 / 76 / 0                 | 1968 / 76 / 0                 | 1109 / 74 / 0
Other FSM          | 1630 / 932 / 0                | 2306 / 1727 / 0               | 1613 / 932 / 0
Memory pool        | 7662 / 2673 / 8               | 19554 / 5324 / 15             | 18162 / 6366 / 8
Misc. modules      | 3973 / 1786 / 0               | 656 / 3412 / 0                | 2362 / 1987 / 0
Total              | 26941 / 7529 / 48             | 46269 / 14458 / 90            | 54337 / 13443 / 104

4.1 5-Montgomery-Core System



The 5-Montgomery-core system is implemented on the ZC706 Evaluation Kit, which carries a Z-7045 APSoC equivalent to a Kintex-7 FPGA. There are 218600 LUTs and a dual ARM Cortex-A9 MPCore processor on this APSoC [7], and the protocols (such as ECDH or ECDSA) are implemented on the ARM processor. A parameter setting the maximum compatible bit size is configured in the 5-core system in our design. Here a 264-bit version and a 528-bit version are synthesized, and NIST curves, the Brainpool P512 r1 curve [14], and the SEC P256 k1 curve [15] (a.k.a. the Bitcoin curve) are tested on both systems. The resource requirements and timing performance of the 264-bit and 528-bit versions are given in Tables 3 and 4.

Table 4. Performance of Q = kP in various {5, 12}-core systems on the ZC706 Kit

Elliptic curve         | S5,264 @ 83.33 MHz     | S5,528 @ 62.50 MHz     | S12,264 @ 45 MHz
                       | Cycles / Time (ms)     | Cycles / Time (ms)     | Cycles / Time (ms)
NIST P224              | 95657 / 1.148          | 133085 / 2.129         | 130513 / 2.900
NIST P256              | 109001 / 1.308         | 152429 / 2.439         | 148721 / 3.305
SEC P256 k1 (BitCoin)  | 109001 / 1.308         | 152429 / 2.439         | 148721 / 3.305
NIST P384              | - / -                  | 226432 / 3.623         | - / -
Brainpool P512 r1      | - / -                  | 301421 / 4.823         | - / -
NIST P521              | - / -                  | 306659 / 4.907         | - / -
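As a quick cross-check of Table 4, each reported time is simply the cycle count divided by the clock frequency of the corresponding system:

```python
# Cross-check of Table 4: time per scalar multiplication = cycles / f_max.
def kP_time_ms(cycles, f_mhz):
    return cycles / (f_mhz * 1e6) * 1e3

print(round(kP_time_ms(95657, 83.33), 3))   # NIST P224 on S5,264  -> 1.148
print(round(kP_time_ms(306659, 62.50), 3))  # NIST P521 on S5,528  -> 4.907
print(round(kP_time_ms(148721, 45.00), 3))  # NIST P256 on S12,264 -> 3.305
```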






It should be noted that there are two similar but different sorts of LUTs, so the LUT count of each module only indicates the scale of the module, and it varies a little if the module is placed with a different floor plan.

4.2 12-Montgomery-Core System



A 12-Montgomery-core system is implemented in our design to show the scalability to a customized number of cores. However, we found that we can only implement a 12-core system with a maximum 264-bit size; a 528-bit version can be synthesized, but the routing procedure fails because the routes are too congested. Tables 3 and 4 show the test results.
The MUX/deMUX problem on the memory pool is more severe in the 12-core system, and many Montgomery cores are frequently idle during the computation. It is not practical to use a 12-core system as one ECC engine.

4.3 3-Core vs. 5-Core



Our ECC engine is designed as a custom IP to provide hardware support to the ARM processor in the Zynq-7000. A reasonable idea for the hardware/software co-design is to provide multiple ECC engines in the embedded system. We have run the implementation process to test how many ECC engines of our design can be put in the same system on the ZC706 kit. The resource requirements of the multi-ECC-engine systems are shown in Table 5.

Table 5. Resource usage and effectiveness of scalar multiplication for multi-ECC-engine systems on the ZC706 Kit. f = 40 MHz; the NIST P521 curve is applied for Sn,528 and the NIST P256 curve for Sn,264. Complete results can be found in the full version.

ECC engine (kP time) | Count | Average LUT count | System LUT count             | blocks/(s × kLUT)
S3,528 (7.913 ms)    | 4     | 38585             | 159788                       | 3.163
S3,528               | 5     | -                 | Fail (routes too congested)  | -
S5,528 (7.666 ms)    | 3     | 51797             | 165830                       | 2.360
S5,528               | 4     | -                 | Fail (more than 218600)      | -
S3,264 (2.632 ms)    | 10    | 19596             | 210083                       | 18.085
S3,264               | 11    | -                 | Fail (more than 218600)      | -
S5,264 (2.725 ms)    | 6     | 26930             | 190417                       | 11.563
S5,264               | 7     | -                 | Fail (more than 218600)      | -
S12,264 (3.718 ms)   | 2     | 54332             | 112070                       | 4.7999
S12,264              | 3     | -                 | Fail (partial conflict)      | -



We may use the throughput-resource ratio blocks/(s × kLUT) to evaluate the effectiveness of the systems we have built: the larger the ratio, the more effective the system. In a 5-core ECC engine some multipliers sometimes run dummy operations, so its throughput-resource ratio is much lower; it is more effective to build 3-core engines in the system. We can also see that a 12-core engine system is not effective.
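The last column of Table 5 can be reproduced directly from the other columns; a short sketch of the computation:

```python
# Reproducing the blocks/(s x kLUT) column of Table 5: scalar multiplications
# completed per second by all engines, divided by the system LUT count in kLUT.
# The numbers plugged in below are taken from Table 5.
def throughput_resource_ratio(engines, kp_time_ms, system_luts):
    blocks_per_second = engines / (kp_time_ms / 1000.0)
    return blocks_per_second / (system_luts / 1000.0)

print(f"{throughput_resource_ratio(10, 2.632, 210083):.3f}")  # S3,264  -> 18.085
print(f"{throughput_resource_ratio(6,  2.725, 190417):.3f}")  # S5,264  -> 11.563
print(f"{throughput_resource_ratio(2,  3.718, 112070):.3f}")  # S12,264 -> 4.800
```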



5 Conclusion and Future Work



We have shown the power and the limitation of multiple big-integer multiplication cores in the implementation of the Co-Z ladders for ECC. The numbers suggest that a 3-Montgomery-core system achieves the best throughput-resource ratio. We have also shown that it is possible to build a fast Montgomery ladder using the Co-Z approach with a 12-Montgomery-core system.
The system in our design can be improved in several ways. For the design of the block memory storing the large numbers, the MUX/deMUX approach may be changed: LaForest et al. [18-20] provide a solution that saves clock cycles when reading data from and writing data into the memory, at the cost of duplicated block memory modules. The design of the controller can also be improved. The single finite state machine that constitutes the controller is huge, and it controls the input and output flows for all of the multipliers; it is possible to re-design it as several controllers, each of which controls only one multiplier.
The full version is available at http://precision.moscito.org/by-publ/recent/CoZ-long.pdf.



References

1. Koblitz, N.: Elliptic curve cryptosystems. Math. Comput. 48(177), 203–209 (1987)
2. Miller, V.S.: Use of elliptic curves in cryptography. In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, pp. 417–426. Springer, Heidelberg (1986). doi:10.1007/3-540-39799-X_31
3. Bernstein, D.J., Birkner, P., Joye, M., Lange, T., Peters, C.: Twisted Edwards curves. In: Vaudenay, S. (ed.) AFRICACRYPT 2008. LNCS, vol. 5023, pp. 389–405. Springer, Heidelberg (2008). doi:10.1007/978-3-540-68164-9_26
4. Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math. Comput. 48(177), 243–264 (1987)
5. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
6. Land, I., Kenny, R., Brown, L., Pelt, R.: Shifting from software to hardware for network security. White Paper, Altera, February 2016. https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01261-shifting-from-software-to-hardware-for-network-security.pdf
7. Zynq-7000 All Programmable SoCs Product Tables and Product Selection Guide. Xilinx (2015). http://www.xilinx.com/support/documentation/selection-guides/zynq-7000-product-selection-guide.pdf
8. Hutter, M., Joye, M., Sierra, Y.: Memory-constrained implementations of elliptic curve cryptography in Co-Z coordinate representation. In: Nitaj, A., Pointcheval, D. (eds.) AFRICACRYPT 2011. LNCS, vol. 6737, pp. 170–187. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21969-6_11
9. Cohen, H., Miyaji, A., Ono, T.: Efficient elliptic curve exponentiation using mixed coordinates. In: Ohta, K., Pei, D. (eds.) ASIACRYPT 1998. LNCS, vol. 1514, pp. 51–65. Springer, Heidelberg (1998). doi:10.1007/3-540-49649-1_6
10. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). doi:10.1007/3-540-68697-5_9
11. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302. Springer, Heidelberg (1999)
12. Bernstein, D.J., Lange, T.: Explicit-Formulas Database. https://hyperelliptic.org/EFD/
13. National Institute of Standards and Technology: Digital Signature Standard. FIPS Publication 186-2, February 2000
14. ECC Brainpool: ECC Brainpool standard curves and curve generation. http://www.ecc-brainpool.org/download/Domain-parameters.pdf
15. Certicom Research: SEC 2: Recommended Elliptic Curve Domain Parameters (2000)
16. Kwok, Y.-K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406–471 (1999)
17. Massolino, P.M.C., Batina, L., Chaves, R., Mentens, N.: Low power Montgomery modular multiplication on reconfigurable systems. Cryptology ePrint Archive, Report 2016/280
18. LaForest, C.E., Steffan, J.G.: Efficient multi-ported memories for FPGAs. In: Proceedings of the ACM/SIGDA FPGA, pp. 41–50 (2010)
19. LaForest, C.E., Liu, M.G., Rapati, E.R., Steffan, J.G.: Multi-ported memories for FPGAs via XOR. In: Proceedings of the ACM FPGA, pp. 209–218 (2012)
20. LaForest, C.E., Li, Z., O'Rourke, T., Liu, M.G., Steffan, J.G.: Composing multi-ported memories on FPGAs. ACM Trans. Reconfig. Technol. Syst. 7(3), Article 16 (2014)



Network Security, Privacy, and Authentication



DNSSEC Misconfigurations in Popular Domains

Tianxiang Dai, Haya Shulman, and Michael Waidner

Fraunhofer Institute for Secure Information Technology SIT, Darmstadt, Germany

{tianxiang.dai,haya.shulman,michael.waidner}@sit.fraunhofer.de



Abstract. DNSSEC was designed to protect the Domain Name System (DNS) against DNS cache poisoning and domain hijacking. When widely adopted, DNSSEC is expected to facilitate a multitude of future applications and systems, as well as security mechanisms, that would use the DNS for distribution of security tokens, such as certificates, IP prefix authentication for routing security, and anti-spam mechanisms. Multiple efforts are invested in adopting DNSSEC and in evaluating the challenges towards its deployment.
In this work we perform a study of errors and misconfigurations in signed domains. To that end, we develop a DNSSEC framework and a webpage for reporting the most up-to-date statistics and providing reports on vulnerabilities and misconfigurations. Our tool also supports retrieval of historical data and enables long-term studies and observations of changes in the security landscape of DNS. We make our tool and the collected data available via an online webservice.



1 Introduction



Domain Name System (DNS), [RFC1034, RFC1035], has a key role in the Internet. The correctness and availability of DNS are critical to the security and

functionality of the Internet. Initially designed to translate domain names to IP

addresses, the DNS infrastructure has evolved into a complex ecosystem, and the

complexity of the DNS infrastructure is continuously growing with the increasing range of purposes and client base. DNS is increasingly utilised to facilitate

a wide range of applications and constitutes an important building block in the

design of scalable network infrastructures.

There is a long history of attacks against DNS, most notably, DNS cache

poisoning, [5–7,12,14,17]. DNS cache poisoning attacks are known to be practiced by governments, e.g., for censorship [1] or for surveillance [11], as well as

by cyber criminals. In the course of a DNS cache poisoning attack, the attacker

provides spoofed records in DNS responses, in order to redirect the victims to

incorrect hosts for credential theft, malware distribution, censorship and more.

To mitigate the threat from DNS cache poisoning attacks, the IETF designed and standardised the Domain Name System Security Extensions (DNSSEC) [RFC4033-RFC4035]. Unfortunately DNSSEC requires significant changes to the DNS infrastructure as well as to the protocol, and although proposed and standardised already in 1997, it is still not widely deployed. Studies show that less than 1 % of the domains are signed with DNSSEC [9,19] and about 3 % of the DNS resolvers validate DNSSEC records [3,13]. However, the situation is improving: following the recent ICANN regulation [15], registrars are turning domain signing into an automated task, as the procedures for automated domain signing by registrars and hosting providers are becoming widely supported. Now that DNSSEC is taking off, tools for evaluating problems with signed domains are critical, since they can alert the domain owners as well as the clients to the potential pitfalls. Although tools for studying DNSSEC exist, and we compare them with our tool in Related Work (Sect. 2), our tool detects and reports misconfigurations and cryptographic vulnerabilities through analyses that were not performed prior to our work.

In this work we perform a study of misconfigurations among DNSSEC-signed domains. We first collect a list of popular signed domains, and then measure the different misconfigurations and problems among them. We provide access to our tool through a webpage, which can be accessed at https://dnssec.cad.sit.fraunhofer.de.

Contributions. We designed and implemented a framework, the DNSSEC misconfiguration validation engine, which collects signed domains from multiple sources, analyses the misconfigurations among them, and processes them into reports. Our reports quantify two types of vulnerabilities in signed domains: cryptographic failures (those preventing a DNS resolver from establishing a chain of trust, or domains using vulnerable DNSSEC keys) and transport failures (e.g., lack of support for TCP or EDNS). We use our engine to perform an Internet-wide collection of 1349 Top-Level Domains (TLDs) and the top-1M Alexa (www.alexa.com) domains.

We collected statistics between March and September 2016 with our tool, and report on the current status as well as improvements that we detected over time. Our study indicates that 90 % of TLDs and 1.66 % of Alexa domains are signed. Among signed domains, 0.89 % of TLDs and 19.46 % of Alexa domains cannot establish a chain of trust to the root zone; among those Alexa domains, 85.5 % are Second-Level Domains (SLDs). We also checked domains with a broken chain of trust for the presence of DNSSEC keys in other repositories for DNSSEC key distribution. Of these 19.46 % of the Alexa domains, only 51 have a DLV resource record in dlv.isc.org. Namely, the majority of the signed domains gain no benefit from signing their records, since clients cannot validate the signatures anyway. We find domains with vulnerable DNSSEC keys, some even using an even RSA modulus. In contrast to February 2016, when 3 % of TLDs did not support TCP, all TLDs currently support TCP. However, 12.88 % of Alexa domains have nameservers which still cannot serve DNS responses over TCP.
The reports and statistics can be accessed at https://dnssec.cad.sit.fraunhofer.de.

Organisation. In Sect. 2 we compare our research to related work. In Sect. 3 we describe our DNSSEC configuration validation engine, its components, and the data collection that we performed with it. In Sect. 4 we perform a measurement of signed domains and characterise the causes of the misconfigured signed domains. We conclude this work in Sect. 5.



2 Related Work



The research and operational communities invested significant efforts in generating online services for studying DNS. We review some of the central services.

OARC’s DNS Reply Size Test Server is an online service for testing responses

size of DNS. The clients can use the tool to evaluate the maximum response size

that their network can support. This test is especially critical for adoption of

DNSSEC, since DNSSEC enabled responses typically exceed the standard size

of 512 bytes.

Multiple online services were designed for evaluating the security of port

selection algorithms, most notably porttest.dns-oarc.net; see survey and analysis

in [6]. The tools study the randomness in ports selected by the DNS resolver.

Recently multiple tools were proposed for checking DNSSEC adoption on

zones. For instance, DNSViz, given a domain name, visualises all the keys the

domain has and signatures over DNS records. It also checks that it is possible to

establish a chain of trust from the root to the target domain. SecSpider provides

overall statistics for DNSSEC deployment on zones, by collecting signed DNS

records and keys from the zones.

Our tool complements the existing tools by allowing one to study insecurity or misconfigurations of a given domain, as well as to analyse statistics of the misconfigurations over a given time period and for a set of domains. In contrast to existing tools, which provide an analysis for a given domain that they receive as input, our tool is invoked periodically over the datasets that it uses, analyses the data, and produces reports with statistics. The reports contain misconfigurations on the transport layer, such as support of TCP, as well as on the cryptographic aspects, such as vulnerable keys and a lack of chain of trust. Our tool provides important insights to clients accessing domains as well as to domain owners, and allows researchers to study changes in the security and configurations of domains over time.
Prior studies measuring adoption of DNSSEC investigated validation on the DNS resolvers' side [13], showing that a large fraction of DNS resolvers do not perform correct validation of DNSSEC signatures. Other works investigated obstacles towards the adoption of DNSSEC, suggesting mitigations and alternative mechanisms [8–10].
Our tool provides insights on the status of adoption of DNSSEC among zones and on misconfigurations within signed domains in the DNS hierarchy, as well as on failures of nameservers, such as failures to serve responses over TCP.



3 DNSSEC Adoption/Configuration Framework



In this section we present our framework for collecting and processing domains, illustrated in Fig. 1. In the rest of this section we explain the components of our DNSSEC validation engine, including the data sources and data collection, and the analysis of the data and its processing into reports and an online web page.

Fig. 1. DNSSEC adoption and configuration evaluation framework.

Domains Crawler. We developed a crawler to collect and store DNSSEC-signed domains.
Data Sources. We collected sources of DNSSEC-signed zones that we feed to the database as 'crawling seeds':
(1) the root and Top Level Domain (TLD) zone files - we obtained the root and TLD zone files (e.g., for com, net, org, info) from the Internet Corporation for Assigned Names and Numbers (ICANN). In total we study 1301 TLDs.
(2) we scanned the top-1M popular domains according to Alexa (www.alexa.com).



4 Evaluating Vulnerabilities in DNSSEC Adoption



In this section we provide our measurement of adoption of DNSSEC among

the domains in our dataset, i.e., the Top Level Domains (TLDs) and Second

Level Domains (SLDs) (based on the data sources in Sect. 3), and report on

misconfigurations and vulnerabilities.

Quantifying Signed Domains. We define DNSSEC-signed domains as those

with DNSKEY and RRSIG records. To check for the fraction of signed domains, we

checked for existence of DNSKEY and RRSIG records in our dataset. Our results

show that 90 % of the TLDs and 1.66 % of the SLDs are signed.
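The same signed/unsigned classification can be illustrated on a single domain; the sketch below (ours, not the authors' crawler) uses the dnspython package and the public resolver 8.8.8.8 to test whether DNSKEY and RRSIG records are served and whether an answer can be obtained over TCP. The paper's transport checks query the domains' own nameservers; going through a public resolver here is a simplification.

```python
# Illustrative per-domain checks, assuming dnspython and a reachable,
# DNSSEC-aware public resolver; not the framework used in the paper.
import dns.exception
import dns.message
import dns.query
import dns.rdatatype

RESOLVER = "8.8.8.8"

def is_signed(domain: str) -> bool:
    """Signed in the paper's sense: DNSKEY and RRSIG records are present."""
    query = dns.message.make_query(domain, dns.rdatatype.DNSKEY, want_dnssec=True)
    response = dns.query.tcp(query, RESOLVER, timeout=5)
    types = {rrset.rdtype for rrset in response.answer}
    return dns.rdatatype.DNSKEY in types and dns.rdatatype.RRSIG in types

def answers_over_tcp(domain: str) -> bool:
    """Transport check: can an answer for the domain be fetched over TCP?"""
    query = dns.message.make_query(domain, dns.rdatatype.SOA)
    try:
        dns.query.tcp(query, RESOLVER, timeout=5)
        return True
    except (dns.exception.Timeout, ConnectionError, OSError):
        return False

if __name__ == "__main__":
    for name in ("org", "example.com"):
        print(name, "signed:", is_signed(name), "tcp:", answers_over_tcp(name))
```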

In Fig. 2 we plot the results we collected between March and September 2016.

The upper line indicates the total number of TLDs/SLDs, while the lower line

indicates the number of DNSSEC-signed TLDs/SLDs. In that time interval the

number of new TLDs increased by 250 and we observe roughly the same increase

in the number of signed TLDs. The graph also shows a growth in the number of new


