3 Power Analysis and Probing Attack on Two ANDs Based on the Mean Value (Assumptions (2) or (3))
Side-Channel Attacks on Threshold Implementations Using a Glitch Algebra
– $\bar{u}_2 = 0$ and $\bar{u}_3 = 1$ (probability $\frac14$): $v_1$ has a glitch if and only if $\bar{y}_3 = 1$.
– $\bar{u}_2 = 1$ (probability $\frac12$): $v_1$ has a glitch when $\bar{y}_1 \vee \bar{y}_2 \vee \bar{y}_3 = 1$. For $\bar{y} = 1$, there
is always a glitch. For $\bar{y} = 0$, there is a glitch with probability $\frac34$.
So, if $\bar{y} = 0$, we observe a glitch in $v_1$ with probability $\frac14 \times 0 + \frac14 \times \frac12 + \frac12 \times \frac34 = \frac12$.
If $\bar{y} = 1$, the probability becomes $\frac14 \times 0 + \frac14 \times \frac12 + \frac12 \times 1 = \frac58$. Hence, the mean
value reveals $\bar{y}$. A single sample gives an error probability of $\frac{7}{16}$.
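The arithmetic above can be checked by exhaustive enumeration. The following Python sketch is only an illustration: the function `glitch_v1` encodes the case analysis as read from the text (no glitch when $\bar{u}_2 = \bar{u}_3 = 0$, a glitch equal to $\bar{y}_3$ when $(\bar{u}_2, \bar{u}_3) = (0, 1)$, and $\bar{y}_1 \vee \bar{y}_2 \vee \bar{y}_3$ when $\bar{u}_2 = 1$), and it assumes that the shares and the bits $\bar{u}_2, \bar{u}_3$ are uniform. The names are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def glitch_v1(y1, y2, y3, u2, u3):
    # Case analysis under Assumption (2), as read from the text (an assumption):
    # no glitch if u2 = u3 = 0, y3 if (u2, u3) = (0, 1), y1 OR y2 OR y3 if u2 = 1.
    return int((y3 and (u2 or u3)) or ((y1 or y2) and u2))

def glitch_probability(y):
    # Average over all sharings (y1, y2, y3) of y and all values of (u2, u3).
    hits = total = 0
    for y1, y2, y3, u2, u3 in product((0, 1), repeat=5):
        if y1 ^ y2 ^ y3 != y:
            continue
        total += 1
        hits += glitch_v1(y1, y2, y3, u2, u3)
    return Fraction(hits, total)

p0 = glitch_probability(0)
p1 = glitch_probability(1)
# Best single-sample guess: error = 1/2 - (statistical distance) / 2.
error = Fraction(1, 2) - abs(p1 - p0) / 2
print(p0, p1, error)  # 1/2 5/8 7/16
```

The two probabilities differ by $\frac18$, which is why the mean of repeated observations separates the two hypotheses.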
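Now, under Assumption (3), we have
$$\begin{aligned}
\mathrm{glitch}(z_1) &= 0 \\
\mathrm{glitch}(z_2) &= \bar{y}_3 \\
\mathrm{glitch}(z_3) &= \bar{y}_1 \oplus \bar{y}_2 \\
\mathrm{glitch}(v_1) &= \bar{y}_3(\bar{u}_2 \oplus \bar{u}_3) \oplus (\bar{y}_1 \oplus \bar{y}_2)\bar{u}_2
\end{aligned}$$
so we can try to probe $v_1$ again. With probability $\frac14$, we have $\bar{u}_2 \oplus \bar{u}_3 = \bar{u}_2 = 1$
so $\mathrm{glitch}(v_1) = \bar{y}$. Otherwise, $\mathrm{glitch}(v_1)$ is uniformly distributed. So, for $\bar{y} = 0$,
$E(\mathrm{glitch}(v_1)) = \frac38$ and for $\bar{y} = 1$, $E(\mathrm{glitch}(v_1)) = \frac58$. Again, $\bar{y}$ leaks from the
mean value. A single sample gives an error probability of $\frac38$.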
The attack with noisy values is hardly more complicated than for n = 1.
Note that [7] does not claim any security on the composition of two AND
gates. But this attack clearly shows the limitation of this approach.
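5 Implementation with n = 4
Assuming that $(x_1, x_2, x_3, x_4)$ shares $x$, $(y_1, y_2, y_3, y_4)$ shares $y$, and $(z_1, z_2, z_3, z_4)$
shares $z$, Nikova, Rechberger, and Rijmen [7] propose
$$\begin{aligned}
z_1 &= ((x_3 \oplus x_4) \wedge (y_2 \oplus y_3)) \oplus y_2 \oplus y_3 \oplus y_4 \oplus x_2 \oplus x_3 \oplus x_4 \\
z_2 &= ((x_1 \oplus x_3) \wedge (y_1 \oplus y_4)) \oplus y_1 \oplus y_3 \oplus y_4 \oplus x_1 \oplus x_3 \oplus x_4 \\
z_3 &= ((x_2 \oplus x_4) \wedge (y_1 \oplus y_4)) \oplus y_2 \oplus x_2 \\
z_4 &= ((x_1 \oplus x_2) \wedge (y_2 \oplus y_3)) \oplus y_1 \oplus x_1
\end{aligned}$$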
It was proposed as an improvement to the n = 3 scheme as it makes all zi
shares balanced. This property is called uniformity in [8]. It was used to address
composition. So, we look again at the composition of two AND circuits.
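As a sanity check of the four-share formulas, the following Python sketch enumerates all 256 input sharings and tests two properties: that the output shares XOR to $x \wedge y$, and that each output share taken on its own is balanced (equal to 1 for exactly half of the inputs). The helper names `and_n4` and `xor_all` are hypothetical, and the balance test is only the per-share property mentioned in the text, not a full uniformity proof.

```python
from itertools import product

def xor_all(bits):
    # XOR of all bits in a tuple.
    result = 0
    for bit in bits:
        result ^= bit
    return result

def and_n4(x, y):
    # x and y are 4-tuples holding shares 1..4 of x and y.
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    z1 = ((x3 ^ x4) & (y2 ^ y3)) ^ y2 ^ y3 ^ y4 ^ x2 ^ x3 ^ x4
    z2 = ((x1 ^ x3) & (y1 ^ y4)) ^ y1 ^ y3 ^ y4 ^ x1 ^ x3 ^ x4
    z3 = ((x2 ^ x4) & (y1 ^ y4)) ^ y2 ^ x2
    z4 = ((x1 ^ x2) & (y2 ^ y3)) ^ y1 ^ x1
    return z1, z2, z3, z4

correct = True
ones_per_share = [0, 0, 0, 0]
for bits in product((0, 1), repeat=8):
    x, y = bits[:4], bits[4:]
    z = and_n4(x, y)
    # The shares of z must recombine to x AND y.
    correct = correct and (xor_all(z) == (xor_all(x) & xor_all(y)))
    for i in range(4):
        ones_per_share[i] += z[i]

print(correct, ones_per_share)
```

Each share contains at least one input bit that appears only linearly, which is why every count comes out as 128 of 256.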
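Again, we assume $\mathrm{glitch}(x_1) = 1$ and $\mathrm{glitch}(x_2) = \mathrm{glitch}(x_3) = \mathrm{glitch}(x_4) = \mathrm{glitch}(y_i) = 0$
for $i = 1, \dots, 4$. So, $\mathrm{glitch}(z_1) = 0$, $\mathrm{glitch}(z_2) = (\bar{y}_1 \oplus \bar{y}_4) + 1$, $\mathrm{glitch}(z_3) = 0$,
and $\mathrm{glitch}(z_4) = (\bar{y}_2 \oplus \bar{y}_3) + 1$ with Assumption (1).
We compute $v = z \wedge u = (x \wedge y) \wedge u$ using the threshold implementation with
$$\begin{aligned}
v_1 &= ((z_3 \oplus z_4) \wedge (u_2 \oplus u_3)) \oplus u_2 \oplus u_3 \oplus u_4 \oplus z_2 \oplus z_3 \oplus z_4 \\
v_2 &= ((z_1 \oplus z_3) \wedge (u_1 \oplus u_4)) \oplus u_1 \oplus u_3 \oplus u_4 \oplus z_1 \oplus z_3 \oplus z_4 \\
v_3 &= ((z_2 \oplus z_4) \wedge (u_1 \oplus u_4)) \oplus u_2 \oplus z_2 \\
v_4 &= ((z_1 \oplus z_2) \wedge (u_2 \oplus u_3)) \oplus u_1 \oplus z_1
\end{aligned}$$
So, we have
$$\begin{aligned}
\mathrm{glitch}(v_1) &= ((\bar{y}_2 \oplus \bar{y}_3) + 1)(\bar{u}_2 \oplus \bar{u}_3) + (\bar{y}_1 \oplus \bar{y}_4) + (\bar{y}_2 \oplus \bar{y}_3) + 2 \\
\mathrm{glitch}(v_2) &= (\bar{y}_2 \oplus \bar{y}_3) + 1 \\
\mathrm{glitch}(v_3) &= ((\bar{y}_1 \oplus \bar{y}_4) + (\bar{y}_2 \oplus \bar{y}_3) + 2)(\bar{u}_1 \oplus \bar{u}_4) + (\bar{y}_1 \oplus \bar{y}_4) + 1 \\
\mathrm{glitch}(v_4) &= ((\bar{y}_1 \oplus \bar{y}_4) + 1)(\bar{u}_2 \oplus \bar{u}_3)
\end{aligned}$$
Hence, we can just probe $v_1$ and see if it has a glitch. With probability $\frac12$, we
have $\bar{u}_2 \neq \bar{u}_3$ so $\mathrm{glitch}(v_1) = 2(\bar{y}_2 \oplus \bar{y}_3) + (\bar{y}_1 \oplus \bar{y}_4) + 3$. In other cases, we have
$\mathrm{glitch}(v_1) = (\bar{y}_1 \oplus \bar{y}_4) + (\bar{y}_2 \oplus \bar{y}_3) + 2$ which is uniformly distributed. So, by
repeating enough times, the majority of $\mathrm{glitch}(v_1)$ is $\bar{y}$ with high probability.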
The attack with noisy values is hardly more complicated than for n = 1.
Computations with Assumptions (2) or (3) are similar.
Note that [7] does not claim any security on the composition of two AND
gates. However, the n = 4 implementation was made to produce a balanced
sharing of the output to address composability through pipelining, meaning by
adding a layer of registers between the circuits we want to compose. Here, we
consider the composition of two AND gates without pipelining. Indeed, we certainly do not want to add registers in between two single gates! But our attack
shows that the entire layer of circuit that we want to compose through pipelining
must be analyzed as a whole, since single gates clearly do not compose well.
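6 Higher-Order Threshold Implementation with n = 5
In [1], Bilgin et al. propose an example of higher-order threshold implementation.
Equation (1) in [1] implements $\bar{y} = 1 \oplus \bar{a} \oplus \bar{b}\bar{c}$. To obtain the implementation of
an AND gate, we just remove the 1 and the $a$ terms and obtain
$$\begin{aligned}
y_1 &= (b_2 \wedge c_2) \oplus (b_1 \wedge c_2) \oplus (b_2 \wedge c_1) \\
y_2 &= (b_3 \wedge c_3) \oplus (b_1 \wedge c_3) \oplus (b_3 \wedge c_1) \\
y_3 &= (b_4 \wedge c_4) \oplus (b_1 \wedge c_4) \oplus (b_4 \wedge c_1) \\
y_4 &= (b_1 \wedge c_1) \oplus (b_1 \wedge c_5) \oplus (b_5 \wedge c_1) \\
y_5 &= (b_2 \wedge c_3) \oplus (b_3 \wedge c_2) \\
y_6 &= (b_2 \wedge c_4) \oplus (b_4 \wedge c_2) \\
y_7 &= (b_5 \wedge c_5) \oplus (b_2 \wedge c_5) \oplus (b_5 \wedge c_2) \\
y_8 &= (b_3 \wedge c_4) \oplus (b_4 \wedge c_3) \\
y_9 &= (b_3 \wedge c_5) \oplus (b_5 \wedge c_3) \\
y_{10} &= (b_4 \wedge c_5) \oplus (b_5 \wedge c_4)
\end{aligned}$$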
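Then, Eq. (2) in [1] decreases the number of shares to 5 by
$$\begin{aligned}
z_1 &= (b_2 \wedge c_2) \oplus (b_1 \wedge c_2) \oplus (b_2 \wedge c_1) \\
z_2 &= (b_3 \wedge c_3) \oplus (b_1 \wedge c_3) \oplus (b_3 \wedge c_1) \\
z_3 &= (b_4 \wedge c_4) \oplus (b_1 \wedge c_4) \oplus (b_4 \wedge c_1) \\
z_4 &= (b_1 \wedge c_1) \oplus (b_1 \wedge c_5) \oplus (b_5 \wedge c_1) \\
z_5 &= (b_2 \wedge c_3) \oplus (b_3 \wedge c_2) \oplus (b_2 \wedge c_4) \oplus (b_4 \wedge c_2) \oplus (b_5 \wedge c_5) \oplus (b_2 \wedge c_5) \oplus (b_5 \wedge c_2) \\
&\quad \oplus (b_3 \wedge c_4) \oplus (b_4 \wedge c_3) \oplus (b_3 \wedge c_5) \oplus (b_5 \wedge c_3) \oplus (b_4 \wedge c_5) \oplus (b_5 \wedge c_4)
\end{aligned}$$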
This 2nd order implementation is supposed to resist probing attacks with two
probes. Normally, the transformation of $(y_1, \dots, y_{10})$ to $(z_1, \dots, z_5)$ by $z_i = y_i$ for
$i < 5$ and $z_5 = y_5 \oplus \cdots \oplus y_{10}$ must be done with intermediate registers to avoid
the propagation of glitches. We wonder what happens without these registers.
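Before looking at glitches, it is worth confirming that the five compressed shares really recombine to the AND of the two shared inputs. The Python sketch below is an illustration only; the function name `and_n5` is hypothetical, and the formulas are those of the compressed sharing $(z_1, \dots, z_5)$ with the $1$ and $a$ terms removed.

```python
from itertools import product

def and_n5(b, c):
    # b and c are 5-tuples holding shares 1..5 of the two inputs.
    b1, b2, b3, b4, b5 = b
    c1, c2, c3, c4, c5 = c
    z1 = (b2 & c2) ^ (b1 & c2) ^ (b2 & c1)
    z2 = (b3 & c3) ^ (b1 & c3) ^ (b3 & c1)
    z3 = (b4 & c4) ^ (b1 & c4) ^ (b4 & c1)
    z4 = (b1 & c1) ^ (b1 & c5) ^ (b5 & c1)
    # z5 is the XOR of y5, ..., y10 from the ten-share version.
    z5 = ((b2 & c3) ^ (b3 & c2) ^ (b2 & c4) ^ (b4 & c2)
          ^ (b5 & c5) ^ (b2 & c5) ^ (b5 & c2)
          ^ (b3 & c4) ^ (b4 & c3) ^ (b3 & c5) ^ (b5 & c3)
          ^ (b4 & c5) ^ (b5 & c4))
    return z1, z2, z3, z4, z5

def parity(bits):
    # XOR of all bits in a tuple.
    return sum(bits) % 2

mismatches = 0
for bits in product((0, 1), repeat=10):
    b, c = bits[:5], bits[5:]
    if parity(and_n5(b, c)) != (parity(b) & parity(c)):
        mismatches += 1

print(mismatches)  # 0
```

The check succeeds because the 25 products $b_i \wedge c_j$ each appear exactly once across the five shares.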
Let us consider an attack probing $z_4$ and $z_5$. If there is a glitch in $b_5$ and in no
other input share, we have $\mathrm{glitch}(z_4) = \bar{c}_1$ and
$$\mathrm{glitch}(z_5) = \mathrm{glitch}((b_5 \wedge c_2) \oplus (b_5 \wedge c_3) \oplus (b_5 \wedge c_4) \oplus (b_5 \wedge c_5))$$
With Assumption (1), this is $\mathrm{glitch}(z_5) = \bar{c}_2 + \bar{c}_3 + \bar{c}_4 + \bar{c}_5$. With Assumption (2),
this is $\mathrm{glitch}(z_5) = \max(\bar{c}_2, \bar{c}_3, \bar{c}_4, \bar{c}_5)$. With Assumption (3), this is
$\mathrm{glitch}(z_5) = \bar{c}_2 \oplus \bar{c}_3 \oplus \bar{c}_4 \oplus \bar{c}_5$. So, we obtain the distributions of
$(\mathrm{glitch}(z_4), \mathrm{glitch}(z_5))$ shown in Table 2. As we can see, the mean and the variance
do not leak (as intended). However, the distributions are quite far apart.
Table 2. Distribution of $(\mathrm{glitch}(z_4), \mathrm{glitch}(z_5))$ for a glitch in $b_5$ in the 2nd order threshold implementation
$\bar{c}$   $\bar{c}_1\bar{c}_2\bar{c}_3\bar{c}_4\bar{c}_5$   A. (1)       A. (2)            A. (3)
0     00000       (0, 0)       (0, 0)            (0, 0)
0     00011       (0, 2)       (0, 1)            (0, 0)
0     00101       (0, 2)       (0, 1)            (0, 0)
0     01001       (0, 2)       (0, 1)            (0, 0)
0     00110       (0, 2)       (0, 1)            (0, 0)
0     01010       (0, 2)       (0, 1)            (0, 0)
0     01100       (0, 2)       (0, 1)            (0, 0)
0     01111       (0, 4)       (0, 1)            (0, 0)
0     10001       (1, 1)       (1, 1)            (1, 1)
0     10010       (1, 1)       (1, 1)            (1, 1)
0     10100       (1, 1)       (1, 1)            (1, 1)
0     11000       (1, 1)       (1, 1)            (1, 1)
0     10111       (1, 3)       (1, 1)            (1, 1)
0     11011       (1, 3)       (1, 1)            (1, 1)
0     11101       (1, 3)       (1, 1)            (1, 1)
0     11110       (1, 3)       (1, 1)            (1, 1)
      mean        (1/2, 2)     (1/2, 15/16)      (1/2, 1/2)
      variance    (1/4, 1)     (1/4, 15/256)     (1/4, 1/4)
1     00001       (0, 1)       (0, 1)            (0, 1)
1     00010       (0, 1)       (0, 1)            (0, 1)
1     00100       (0, 1)       (0, 1)            (0, 1)
1     01000       (0, 1)       (0, 1)            (0, 1)
1     00111       (0, 3)       (0, 1)            (0, 1)
1     01011       (0, 3)       (0, 1)            (0, 1)
1     01101       (0, 3)       (0, 1)            (0, 1)
1     01110       (0, 3)       (0, 1)            (0, 1)
1     10000       (1, 0)       (1, 0)            (1, 0)
1     10011       (1, 2)       (1, 1)            (1, 0)
1     10101       (1, 2)       (1, 1)            (1, 0)
1     11001       (1, 2)       (1, 1)            (1, 0)
1     10110       (1, 2)       (1, 1)            (1, 0)
1     11010       (1, 2)       (1, 1)            (1, 0)
1     11100       (1, 2)       (1, 1)            (1, 0)
1     11111       (1, 4)       (1, 1)            (1, 0)
      mean        (1/2, 2)     (1/2, 15/16)      (1/2, 1/2)
      variance    (1/4, 1)     (1/4, 15/256)     (1/4, 1/4)
Indeed, for Assumption (3), we have c¯ = glitch(z4 ) ⊕ glitch(z5 ) so it is clear
that c¯ leaks. For Assumption (1), we have c¯ = glitch(z4 ) ⊕ (glitch(z5 ) mod 2) so
it is clear that c¯ leaks as well. For Assumption (2), the distributions are
Distribution                                          (0, 0)   (0, 1)   (1, 0)   (1, 1)
$(\mathrm{glitch}(z_4), \mathrm{glitch}(z_5)) \mid \bar{c} = 0$       1/16     7/16     0/16     8/16
$(\mathrm{glitch}(z_4), \mathrm{glitch}(z_5)) \mid \bar{c} = 1$       0/16     8/16     1/16     7/16
so the statistical distance is $\frac18$. This means that from a single value we can
deduce $\bar{c}$ with an error probability of $P_e = \frac12 - \frac{1}{16}$. Of course, this amplifies like
in (4) using more samples. Hence, two probes leak quite a lot. So, we clearly see
that omitting the extra registers, which are needed to keep the number of shares from
inflating, makes the implementation from [1] insecure.
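The distributions and the statistical distance can be recomputed by enumerating the 32 sharings of $\bar{c}$. The Python sketch below is an illustration under the three glitch models stated in the text (sum, max, and XOR for Assumptions (1), (2), and (3)), with uniformly distributed shares; the names `MODELS`, `distribution`, and `statistical_distance` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# glitch(z5) as a function of (c2, c3, c4, c5) under each assumption.
MODELS = {
    1: sum,                             # Assumption (1): glitches add up
    2: max,                             # Assumption (2): glitches saturate
    3: lambda bits: sum(bits) % 2,      # Assumption (3): glitches cancel in pairs
}

def distribution(assumption, c):
    # Distribution of (glitch(z4), glitch(z5)) over all sharings of the bit c.
    combine = MODELS[assumption]
    counts = Counter()
    for shares in product((0, 1), repeat=5):
        if sum(shares) % 2 != c:
            continue
        counts[(shares[0], combine(shares[1:]))] += 1
    total = sum(counts.values())
    return {value: Fraction(n, total) for value, n in counts.items()}

def statistical_distance(p, q):
    # Half the L1 distance between two finite distributions.
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

distances = {a: statistical_distance(distribution(a, 0), distribution(a, 1))
             for a in MODELS}
error_a2 = Fraction(1, 2) - distances[2] / 2
print(distances, error_a2)
```

Under Assumptions (1) and (3) the two distributions have disjoint supports, so a single observation already determines $\bar{c}$; under Assumption (2) the distance is $\frac18$ and the single-sample error is $\frac{7}{16}$.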
S. Vaudenay
7 Conclusion
We have shown that the threshold implementations are quite weak against many
simple attacks: distinguishers based on non-linear functions on the power traces
(as simple as a threshold function or a power function), multiple probes, and
linear distinguishers for a cascade of circuits. Although they do not contradict the
results by their authors, these attacks show severe limitations on this approach.
We have seen that, compared to the attack on the AND gate with no protection, the threshold implementation proposals only amplify the
noise of the side-channel attack by a constant factor. Therefore, we believe that
there is no satisfactory protection for attacks based on glitches.
References
1. Bilgin, B., Gierlichs, B., Nikova, S., Nikov, V., Rijmen, V.: Higher-order threshold
implementations. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol.
8874, pp. 326–343. Springer, Heidelberg (2014). doi:10.1007/978-3-662-45608-8_18
2. Chari, S., Jutla, C.S., Rao, J.R., Rohatgi, P.: Towards sound approaches to counteract
power-analysis attacks. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol.
1666, pp. 398–412. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_26
3. Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on
the sum of observations. Ann. Math. Stat. 23(4), 493–507 (1952)
4. Duc, A., Dziembowski, S., Faust, S.: Unifying leakage models: from probing
attacks to noisy leakage. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT
2014. LNCS, vol. 8441, pp. 423–440. Springer, Heidelberg (2014).
doi:10.1007/978-3-642-55220-5_24
5. Mangard, S., Popp, T., Gammel, B.M.: Side-channel leakage of masked CMOS
gates. In: Menezes, A. (ed.) CT-RSA 2005. LNCS, vol. 3376, pp. 351–365. Springer,
Heidelberg (2005). doi:10.1007/978-3-540-30574-3_24
6. Moradi, A.: Statistical tools flavor side-channel collision attacks. In: Pointcheval,
D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 428–445.
Springer, Heidelberg (2012). doi:10.1007/978-3-642-29011-4_26
7. Nikova, S., Rechberger, C., Rijmen, V.: Threshold implementations against side-channel
attacks and glitches. In: Ning, P., Qing, S., Li, N. (eds.) ICICS 2006. LNCS,
vol. 4307, pp. 529–545. Springer, Heidelberg (2006). doi:10.1007/11935308_38
8. Nikova, S., Rijmen, V., Schläffer, M.: Secure hardware implementation of nonlinear
functions in the presence of glitches. J. Cryptology 24, 292–321 (2011)
9. Reparaz, O., Bilgin, B., Nikova, S., Gierlichs, B., Verbauwhede, I.: Consolidating
masking schemes. In: Gennaro, R., Robshaw, M. (eds.) CRYPTO 2015. LNCS, vol.
9215, pp. 764–783. Springer, Heidelberg (2015). doi:10.1007/978-3-662-47989-6_37
10. Standaert, F.-X., Veyrat-Charvillon, N., Oswald, E., Gierlichs, B., Medwed, M.,
Kasper, M., Mangard, S.: The world is not enough: another look on second-order
DPA. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 112–129. Springer,
Heidelberg (2010). doi:10.1007/978-3-642-17373-8_7
11. Trichina, E., Korkishko, T., Lee, K.H.: Small size, low power, side channel-immune
AES coprocessor: design and synthesis results. In: Dobbertin, H., Rijmen, V., Sowa,
A. (eds.) AES 2004. LNCS, vol. 3373, pp. 113–127. Springer, Heidelberg (2005).
doi:10.1007/11506447_10
Diversity Within the Rijndael Design Principles
for Resistance to Differential Power Analysis
Merrielle Spain¹ and Mayank Varia²
¹ MIT Lincoln Laboratory, Lexington, USA
merrielle.spain@ll.mit.edu
² Boston University, Boston, USA
varia@bu.edu
Abstract. The winner of the Advanced Encryption Standard (AES)
competition, Rijndael, strongly resists mathematical cryptanalysis. However, side channel attacks such as differential power analysis and template
attacks break many AES implementations.
We propose a cheap and effective countermeasure that exploits the
diversity of algorithms consistent with Rijndael's general design philosophy. The secrecy of the algorithm settings acts as a second key that the
adversary must learn to mount popular side channel attacks. Furthermore, because they satisfy Rijndael's security arguments, these algorithms resist cryptanalytic attacks.
Concretely, we design a 72-bit space of SubBytes variants and a 36-bit
space of ShiftRows variants. We investigate the mathematical strength
provided by these variants, generate them in SageMath, and study their
impact on differential power analysis and template attacks against field-programmable
gate arrays (FPGAs) by analyzing power traces from the
DPA Contest v2 public dataset.
Keywords: Side channel attack · Side channel countermeasure · Guessing entropy · Differential power analysis · Template attack · Hamming
weight · Advanced Encryption Standard · Rijndael · FPGA
This work is sponsored by the Office of Naval Research under Air Force Contract
FA8721-05-C-002. Opinions, interpretations, conclusions and recommendations are
those of the authors and are not necessarily endorsed by the United States Government.
M. Varia: Research performed while consulting at MIT Lincoln Laboratory.
© Springer International Publishing AG 2016
S. Foresti and G. Persiano (Eds.): CANS 2016, LNCS 10052, pp. 71–87, 2016.
DOI: 10.1007/978-3-319-48965-0_5
1 Introduction
Differential power analysis (DPA) [1] and template attacks [2] can quickly
break secure, correctly implemented cryptographic algorithms [3]. They harness
information leaked by the physical implementation of a cryptosystem, which lies outside
the scope of cryptographic models, provable security claims, and mathematical cryptanalysis. Researchers have proposed countermeasures to side channel
attacks ranging from isolating the device to masking the signal [4,5]. However,
these approaches have drawbacks, especially for lightweight and mobile security.
We leverage work from the Advanced Encryption Standard (AES) [6] process
to argue cryptanalytic security, while deriving side channel resilience from diversity available within the design principles of the winner Rijndael. NIST's burdensome competition only certified a single algorithm for standardization, even
though Rijndael's security arguments cover a range of settings.
We explore the space of Rijndael variants that stay within these security
arguments to maintain optimal cryptanalytic security. These "tunable knobs"
increase resistance to DPA and template attacks by introducing a second source
of entropy. Additionally, as with Clavier et al. [7], our method complements
masking and shuffling techniques.
1.1 Prior Work
Barkan and Biham explored dual ciphers of AES [8], which are variant ciphers
whose plaintexts, ciphertexts, and keys can be mapped to those of AES via
invertible transformations. Initial works showed 240 duals of AES that arise
from the choice of 30 irreducible polynomials of degree 8 in GF(2)[x] and 8
choices of the primitive root of this polynomial. Rostovtsev and Shemyakina [9]
further propose that each of the 16 SubBytes operations could be different.
Kerckhoffs's principle notwithstanding, one might hope that choosing a random variant on the fly could obfuscate the AES circuitry. Indeed, several works
have designed and implemented modular FPGAs that can choose on the fly
between the 240 duals, either for performance reasons [10] or in hope of improving security [11]. However, Moradi and Mischke [12] demonstrated that a single,
reconfigurable chip implementing the AES duals (without LUTs) is insecure
because power side channels can leak the variant choice. Moreover, even while
subsequent works have discovered up to 61,200 AES duals [13,14], the space of
duals remains small enough to brute force.
To overcome this limitation, other prior work seeks to design a large corpus of
variants based on Rijndael, without connecting mathematical security to that of
the standard. Jing et al. [15] initiate this line of research by proposing variations
of SubBytes and MixColumns; these results have since been superseded by other
works. Jing et al. [16] extensively analyze the space of SubBytes variants possible
through the use of different affine transformations. Several works propose varying
the 4 row shift offsets in the ShiftRows operation [7,16]. Finally, a few works
find alternate MixColumns matrices with higher multiplicative order [17,18].
1.2 Our Contributions
This paper proposes a moving target defense against a side channel attacker. We
contribute the first work that simultaneously:
1. Generates variants that maintain both the design and mathematical strength
of AES.
2. Leverages the variation in round function components for improved resistance
to differential power analysis (DPA) and template attacks.
By contrast, prior work either abandons the structure of AES, weakens its cryptanalytic strength, or fails to justify improved side channel resistance. Jing et al.
claim that variation increases strength against attacks, but fail to specify any
attacks [16]. Furthermore, they allow fixed points in SubBytes, which reduces
cryptanalytic strength. They also fail to identify redundancy between components or quantify the security provided.
Section 2 describes the structure of our Rijndael variants and calculates the
number of unique variants. Section 3 determines the implementation cost of
our scheme. Section 4 demonstrates that our variants retain the design principles necessary to argue for their resistance to common cryptanalytic attacks;
we also provide open-source SageMath code that automatically produces variants and tests them against cryptanalytic metrics
(https://github.com/mit-ll/Diversity-Within-Rijndael). Section 5 argues that our variants' diversity impedes
DPA and template attacks; we augment these claims with analysis of the DPA
Contest v2 dataset [3].
1.3 Envisioned Usage
As side channel resistance depends on usage, this work focuses on Rijndael
variants implemented on field-programmable gate arrays (FPGAs). More concretely, we envision each FPGA being hardcoded with a single variant.¹ This
technique is simpler and more performant (in runtime and chip size) than prior
work [11,12,16] that envisioned a single FPGA that can change variants on
the fly.
We stress the compatibility of this approach with Kerckhoffs's principle. Our
approach treats pieces of the round structure internals as a second component
of the key. While a particular variant is fixed at compilation time, this choice
can be altered by reprogramming the device or obtaining a new one.
In some scenarios, altering an algorithm costs more than altering a key; in
those cases, key evolution [20] could make a better side-channel deterrent. Our
techniques suit an environment where:
– Varying the algorithm costs no more than varying the key.
– Continuous rekeying costs too much, in computation or communication, or insufficient robustness can harm the availability of communication.
– A block cipher must remain robust against side channel attacks for a long time.
One such scenario involves military communication devices that require high
availability, are difficult to adjust in the field, and are reconfigured easily back
at home.
¹ For instance, one can modify Manteena's implementation of AES in VHDL [19,
Appendix D] to produce different, static mappings of byte values in SubBytes, mappings of byte locations in ShiftRows, and matrix constants in MixColumns.
2 The Design of Our Rijndael Variants
AES [6] operates on a 16-byte state organized into a 4 × 4 matrix of bytes. It
performs several rounds that comprise four algorithms: SubBytes, ShiftRows,
MixColumns, and AddRoundKey. The round function satisfies the two primary
concepts for designing ciphers from Claude Shannon: confusion and diffusion [21].
Confusion states that the effects should be key-dependent and hard to predict.
In AES, AddRoundKey provides key-dependence and SubBytes provides nonlinearity. Diffusion states that a minor change in the input should disperse to
many output locations. In AES, MixColumns provides local diffusion within a
column and ShiftRows spreads the diffusion globally. The synergy between AES
components produces strength beyond Shannon's original vision: its wide trail
strategy [22] permits strong claims of AES' resistance to differential and linear
cryptanalytic attacks.
[Figure 1: diagram of the last round, labeled SubBytes (72 bits), intermediate value, ShiftRows (36 bits), AddRoundKey, and ciphertext.]
Fig. 1. Schema of last round of AES, simplified to four bytes in two columns. SubBytes
and AddRoundKey act on each byte independently, and ShiftRows disperses bytes to
different columns (dashed regions) without altering values.
Figure 1 shows a simplified last round of AES along with our theoretical
estimates of the variety possible within components. First, we describe the four
round function operations in AES and our variants of these operations. Second,
we calculate how the entropies of our variations combine.
2.1 SubBytes
In AES, SubBytes is a fixed nonlinear permutation that independently replaces
each byte of the input with a different value. It provides limited confusion at low
cost. Concretely, SubBytes concatenates three steps:
1. Inversion $f_p(x) = x^{-1}$ over the finite field $\mathrm{GF}(256) = \mathrm{GF}(2)[x]/(p(x))$, where
$p(x) = x^8 + x^4 + x^3 + x + 1$.
2. Linear transformation $g(x) = Ax$ over the vector space $\mathrm{GF}(2)^8$.
3. Addition² of a constant $h(x) = x + b$ in $\mathrm{GF}(2)^8$.
AES' security relies on three properties of SubBytes. First, the function has high
algebraic complexity when viewed in a single mathematical space [23]. Second,
SubBytes must be highly nonlinear: possessing low linear biases and difference
propagations. Third, SubBytes cannot have any fixed or anti-fixed points.
Our variations follow Jing et al.'s procedure to preserve the first two properties [16]. In the inversion step, we choose the modulus $p$ from any of the 30
irreducible polynomials of degree 8 over GF(2).³ In the linear transformation,
we pick an invertible matrix $A$ (i.e., having linearly independent rows) from the
$\prod_{i=0}^{7} (256 - 2^i) \approx 1.16 \times 2^{62}$ such choices.
Unlike Jing et al. [16], our variations also preserve the third property by
restricting $b \in \mathrm{GF}(2)^8$ to choices that avoid any (anti-)fixed points in the completed SubBytes permutation. We approximate the fraction of choices that meet
this constraint by replacing $f$ and $g$ with a truly random function $R$. In this case,
$\Pr_R[R(x) + b \text{ has no (anti-)fixed points}] = (254/256)^{256} \approx 0.134$. Our empirical
analysis with 50 million randomly-sampled choices shows that the fraction of
valid $b$ is 0.135, close to our theoretical estimate. Hence, there are slightly more
than 5 bits of entropy in the choice of the constant $b$.
Finally, we observe that the three steps contribute independent sources of
entropy. That is, for all pairs of inverse functions $f_p$ and $f_{p'}$, linear transformations $g$ and $g'$, and constant addition steps $h$ and $h'$, $h \circ g \circ f_p \neq h' \circ g' \circ f_{p'}$
unless the pairs are identical. This statement follows by rearranging the above
inequality to $(h' \circ g')^{-1} \circ h \circ g \neq f_{p'} \circ (f_p)^{-1}$ and empirically verifying that the
right side is nonaffine for $p \neq p'$ whereas the left side is affine.
In total, our design yields more than $2^{72}$ variants of SubBytes.⁴ We will show
in Sect. 4 that the variants retain Rijndael's resistance toward mathematical
cryptanalysis.
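The three entropy contributions can be added up numerically. The Python sketch below is only a back-of-the-envelope check of the figures quoted above: it uses the 30 moduli, the count of invertible 8 × 8 binary matrices, and the random-function estimate $(254/256)^{256}$ for the fraction of valid constants $b$. The variable names are hypothetical, and the estimate for $b$ is the heuristic from the text rather than an exact count.

```python
import math

# Step 1: choice of an irreducible degree-8 modulus over GF(2).
moduli = 30

# Step 2: number of invertible 8x8 matrices over GF(2).
matrices = math.prod(256 - 2 ** i for i in range(8))

# Step 3: heuristic fraction of constants b without (anti-)fixed points,
# obtained by modeling the rest of SubBytes as a random function.
fraction_valid_b = (254 / 256) ** 256
valid_constants = 256 * fraction_valid_b

bits_moduli = math.log2(moduli)
bits_matrices = math.log2(matrices)
bits_constants = math.log2(valid_constants)
total_bits = bits_moduli + bits_matrices + bits_constants

print(round(matrices / 2 ** 62, 2))    # multiple of 2^62
print(round(fraction_valid_b, 3))      # fraction of valid b
print(round(total_bits, 1))            # total entropy in bits
```

The sum lands slightly above 72 bits, with almost all of it coming from the choice of the matrix $A$.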
2.2 ShiftRows
AES' ShiftRows operation transposes the 16 bytes of state by shifting each
row of the state matrix cyclically to the left by a fixed number of bytes.
ShiftRows contributes to the wide trail strategy due to its diffusion optimality: it maps the 4 bytes within each column of the round state to 4 different
columns [22, Definition 9.4.1]. We describe three, increasingly large, families of
ShiftRows variants that maintain diffusion optimality.
² Our variants perform XOR, just as AES does. By contrast, Rijmen and Oswald [13]
create variants that preserve AES' original SubBytes functionality, at the cost of
replacing XOR with a (slower and leakier) table lookup.
³ It is also possible to choose the primitive root of the polynomial used to represent
elements of GF(256) [9]. This yields 3 bits of entropy independent of the affine
transformation. However, SageMath encapsulates its choice of primitive root, so our
work skips this extra flexibility.
⁴ We remark that Jing et al.'s calculation of this value [16] is inaccurate by a multiplicative factor of 7. Coincidentally, this 1/7 error closely matches the omitted 13.5 %
throughput of SubBytes lacking fixed points.
Cyclic preserving. Permutations in this family maintain AES ShiftRows' cyclic
nature. Previously considered [7,16], these variants choose different cyclic offsets
for each row of ShiftRows. This family contains 4! = 24 variants.
[Figure 2 panels: (a) Input, (b) Row preserving, (c) Transpose, (d) Our construction]
Fig. 2. Depiction of the action of ShiftRows. The input (a) is colored by column, and
two outputs are displayed for transpositions that are row preserving (b) and not (d).
Row preserving. A higher entropy variation breaks the cyclic property of
ShiftRows, but keeps each byte in its original row. The first row has 4! permutations. In the second row, there are 3 choices for the location of the white
block consistent with diffusion optimality, and 3 locations for the block of the
color above the white block (black, in the case of Fig. 2b). Let E denote the event
that this block is placed directly under the white block of row 1, as is the case
in Fig. 2b. In the third row, there are 2 locations for the white block. Afterward,
there exist 2 choices to complete the ShiftRows variant if event E occurred and
1 choice otherwise. In total, this procedure yields $4! \cdot 3 \cdot (1 \cdot 4 + 2 \cdot 2) = 576 = (4!)^2$
variants.
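Both counts can be confirmed by brute force. The Python sketch below models a row-preserving variant as four column permutations, one per row, and counts those for which the 4 bytes of every input column land in 4 different output columns; it then repeats the count for cyclic shifts only. The representation and the names (`PERMS`, `is_diffusion_optimal`, `cyclic_row`) are assumptions made for this illustration.

```python
from itertools import permutations, product

PERMS = list(permutations(range(4)))

def is_diffusion_optimal(rows):
    # rows[r][j] is the output column of the byte at input row r, column j.
    # Every input column j must be spread over 4 different output columns.
    return all(len({rows[r][j] for r in range(4)}) == 4 for j in range(4))

def cyclic_row(offset):
    # Cyclic left shift by `offset`: the byte in column j moves to column j - offset.
    return tuple((j - offset) % 4 for j in range(4))

# Row preserving: any permutation within each of the 4 rows.
row_preserving_count = sum(
    1 for rows in product(PERMS, repeat=4) if is_diffusion_optimal(rows)
)

# Cyclic preserving: each row is a cyclic shift by some offset.
cyclic_count = sum(
    1 for offsets in product(range(4), repeat=4)
    if is_diffusion_optimal([cyclic_row(o) for o in offsets])
)

print(cyclic_count, row_preserving_count)  # 24 576
```

In this model a diffusion-optimal row-preserving variant is exactly a 4 × 4 Latin square, which gives another way to see the count of 576.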
Our construction. We stress the irrelevance of row preservation to diffusion
optimality. We propose a $(4!)^8$ family of diffusion-optimal byte transpositions
that we construct in three steps.
1. Transpose the 4 × 4 input matrix to satisfy diffusion optimality (Fig. 2c).
2. Independently shuffle the entries within each row (Fig. 2c).
3. Independently shuffle the entries within each column (Fig. 2d).
This construction independently chooses 8 permutations: 4 on the rows in
Step 2 and 4 on the columns in Step 3. All choices are distinct and maintain
diffusion optimality. Hence, our construction yields $(4!)^8 \approx 1.60 \times 2^{36}$ variants.
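The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SageMath code: the function names are hypothetical, the shuffles use Python's `random` module with a fixed seed purely for reproducibility, and a real deployment would need a cryptographically secure source of randomness.

```python
import random

def shiftrows_variant(rng):
    # state[r][c] holds the (row, column) of the input byte now at position (r, c).
    state = [[(r, c) for c in range(4)] for r in range(4)]
    # Step 1: transpose, so that each input column becomes a row.
    state = [[state[c][r] for c in range(4)] for r in range(4)]
    # Step 2: independently shuffle the entries within each row.
    for row in state:
        rng.shuffle(row)
    # Step 3: independently shuffle the entries within each column.
    for c in range(4):
        column = [state[r][c] for r in range(4)]
        rng.shuffle(column)
        for r in range(4):
            state[r][c] = column[r]
    return state

def is_diffusion_optimal(state):
    # The 4 bytes of every input column must occupy 4 different output columns.
    for j in range(4):
        out_columns = {c for r in range(4) for c in range(4) if state[r][c][1] == j}
        if len(out_columns) != 4:
            return False
    return True

rng = random.Random(2016)  # fixed seed, for reproducibility of the sketch only
samples = [shiftrows_variant(rng) for _ in range(1000)]
all_optimal = all(is_diffusion_optimal(s) for s in samples)
family_size = 24 ** 8
print(all_optimal, family_size, round(family_size / 2 ** 36, 2))
```

Step 2 keeps the bytes of one input column in a single row but in distinct columns, and Step 3 only moves bytes vertically, so the column spread survives both shuffles.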
2.3 MixColumns
AES' MixColumns operation separately multiplies the 4 bytes in each column
of the state by a fixed, invertible, circulant 4 × 4 matrix over the field GF(256)
(using the same representation as described in AES' SubBytes). Specifically,
the matrix in AES uses the following coefficients in the first column: $c_0 = 02$,
$c_1 = 01$, $c_2 = 01$, and $c_3 = 03$. The last round of AES omits MixColumns.
The need for the MixColumns matrix to have differential and linear branch
numbers of 5 governs the choice of the constants. Grosek and Zajac [18] determined the satisfactory choices: For the matrix to be invertible,
$\sum_i c_i \neq 0$. To follow the wide trail strategy, $c_i \neq 0$, $c_i \neq c_{i+2}$, $c_i c_{i+1} \neq c_{i+2} c_{i+3}$, and $c_i^2 \neq c_{i+1} c_{i-1}$
for all $i$, considering indices mod 4. Most settings satisfy these constraints, so
around 32 bits of entropy exist in the design of MixColumns.
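As an illustration, the constraints can be tested on the AES constants. The Python sketch below reads the constraints as the inequalities $\sum_i c_i \neq 0$, $c_i \neq 0$, $c_i \neq c_{i+2}$, $c_i c_{i+1} \neq c_{i+2} c_{i+3}$, and $c_i^2 \neq c_{i+1} c_{i-1}$ (indices mod 4), which is an assumption about the intended conditions, and it uses the AES modulus $x^8 + x^4 + x^3 + x + 1$ (0x11B) for the field arithmetic. The function names are hypothetical.

```python
def gf_mul(a, b, modulus=0x11B):
    # Multiplication in GF(256) with the AES modulus x^8 + x^4 + x^3 + x + 1.
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= modulus
        b >>= 1
    return result

def satisfies_constraints(c):
    # c = (c0, c1, c2, c3): first column of the circulant MixColumns matrix.
    if c[0] ^ c[1] ^ c[2] ^ c[3] == 0:           # invertibility: sum of c_i nonzero
        return False
    for i in range(4):
        ci, n1, n2, n3 = c[i], c[(i + 1) % 4], c[(i + 2) % 4], c[(i + 3) % 4]
        if ci == 0 or ci == n2:                  # c_i != 0 and c_i != c_{i+2}
            return False
        if gf_mul(ci, n1) == gf_mul(n2, n3):     # c_i c_{i+1} != c_{i+2} c_{i+3}
            return False
        if gf_mul(ci, ci) == gf_mul(n1, n3):     # c_i^2 != c_{i+1} c_{i-1}
            return False
    return True

aes_ok = satisfies_constraints((0x02, 0x01, 0x01, 0x03))
degenerate_ok = satisfies_constraints((0x01, 0x01, 0x01, 0x01))
print(aes_ok, degenerate_ok)  # True False
```

The all-ones column fails already at the invertibility test, since its entries XOR to zero.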
2.4 AddRoundKey
AES XORs each state byte with a round key byte, itself a ﬁxed function of the
AES key. This operation concludes each round, and an extra AddRoundKey
precedes the ﬁrst round. The key schedule’s design provides three important
security properties: round-dependent constants break symmetry to prevent slide
attacks, SubBytes provides confusion to thwart related-key attacks, and a diffusive structure resists partial-key attacks [22]. Furthermore, the simplicity of
AddRoundKey’s XOR operation facilitates the wide trail strategy arguments
that decompose the cryptanalytic strength of AES to a function of the strength
of its parts. Hence, our variants keep AddRoundKey’s structure intact in order
to retain the security properties of AES.
We note that SubBytes' usage inside key expansion induces a tradeoff. If we
use the standard AES SubBytes inside the key schedule, then our variants require
larger chip area to store two different SubBytes permutations. On the other hand,
using our SubBytes variant inside the key expansion reduces key agility; the
expanded key must be recomputed whenever the SubBytes variant changes. In
this work, we choose to maintain AES’ AddRoundKey entirely. Hence, updating
the key would be identical to AES.
2.5 Total Entropy Provided by Our Variants
Determining the total entropy of our variants requires measuring redundancy
between components. The variations of SubBytes and ShiftRows are independent
by design: one function changes byte values and the other changes byte positions.
Hence, we sum the entropies of SubBytes and ShiftRows to arrive at a total of
more than $2^{108}$ variants.
Although we described variations of MixColumns, we exclude them from our
design for two reasons. First, care must be taken to avoid dependences on the
previous variants: for instance, applying a scalar multiplication or cyclic rotation
to the MixColumns matrix is redundant with the variations to SubBytes and
ShiftRows, respectively [16]. Second, varying MixColumns fails to affect many
side channel attacks because the final round omits MixColumns.
Similarly, it may appear tempting to vary the round constants in AddRoundKey. However, changing the round constants fails to introduce new entropy over
variations of SubBytes' modulus $p$, SubBytes' affine transformation $A$ and $b$,
and MixColumns' circulant matrix entries $c_0$ through $c_3$ [8].