2 Estimation of the triplet structure invariant via its first representation: the P1 and the P1̄ case


is to study the joint probability distribution

$$P(E_{h_1}, E_{h_2}, E_{h_3}) \equiv P(R_{h_1}, R_{h_2}, R_{h_3}, \phi_{h_1}, \phi_{h_2}, \phi_{h_3}). \tag{5.2}$$

According to Section 4.1 we must first calculate the characteristic function

C and then, by Fourier inversion, recover the distribution (5.2). Because of

the importance of the triplet invariant, we report the necessary calculations in

Appendix 5.A. The resulting distribution is

$$P(R_1, R_2, R_3, \phi_1, \phi_2, \phi_3) = \frac{R_1 R_2 R_3}{\pi^3} \exp\left[-R_1^2 - R_2^2 - R_3^2 + C\cos(\phi_1 + \phi_2 + \phi_3)\right], \tag{5.3}$$

where R1 , R2 , R3 , φ1 , φ2 , φ3 stand for Rh1 , Rh2 , Rh3 , φh1 , φh2 , φh3 , respectively,

$$C = \frac{2 R_1 R_2 R_3}{\sqrt{N_{eq}}}, \tag{5.4}$$

$$\sqrt{N_{eq}} = \sigma_2^{3/2}/\sigma_3, \tag{5.5}$$

and

$$\sigma_n = \sum_{j=1}^{N} Z_j^n.$$

N is the number of atoms in the unit cell and Zj is the atomic number of the jth atom. If all of the atoms are of the same species (and have similar thermal displacement), then Neq ≡ N and

$$C = \frac{2 R_1 R_2 R_3}{\sqrt{N}}.$$

The simultaneous presence of heavy and light atoms in the unit cell makes

Neq < N (see Section 5.3).
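As a rough numerical illustration of (5.5), the sketch below evaluates Neq = σ2³/σ3² from a unit-cell composition. The function name `n_eq` and the dictionary layout (atomic number mapped to atom count) are assumptions made for this example only. The JAMILAS composition is the one quoted later in this section; with the hydrogen atoms included the value comes out close to the Neq = 55 reported there, against N = 100 non-hydrogen atoms.

```python
def n_eq(composition):
    """Equivalent number of atoms, Neq = sigma2**3 / sigma3**2.

    `composition` maps atomic number Z -> number of such atoms in the
    unit cell (a hypothetical input format chosen for this sketch).
    """
    sigma2 = sum(count * z ** 2 for z, count in composition.items())
    sigma3 = sum(count * z ** 3 for z, count in composition.items())
    return sigma2 ** 3 / sigma3 ** 2


# Equal-atom structure: 100 carbon atoms, so Neq should equal N.
equal_atoms = {6: 100}

# JAMILAS, K4 C64 H68 N8 O20 S4 (composition taken from the text).
jamilas = {19: 4, 6: 64, 1: 68, 7: 8, 8: 20, 16: 4}

print("equal atoms:", n_eq(equal_atoms))
print("JAMILAS    :", round(n_eq(jamilas), 1))
```

Because σ3 weights heavy atoms more strongly than σ2, a few heavy atoms are enough to pull Neq well below N.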

From (5.3) the conditional distribution P(Φ | Rh1, Rh2, Rh3) of the triplet invariant Φ = φ1 + φ2 + φ3 may be obtained (abbreviated to P(Φ); Cochran, 1955):

$$P(\Phi) = [2\pi I_0(C)]^{-1} \exp(C\cos\Phi), \tag{5.6}$$

which may also be written as

$$P(\Phi) = M(\Phi; 0, C),$$

where

$$M(\Phi; \theta, C) = [2\pi I_0(C)]^{-1} \exp[C\cos(\Phi - \theta)]$$

is the von Mises distribution for the variable Φ, centred at θ, with concentration parameter equal to C.

Equation (5.6) is plotted in Fig. 5.1, from which we observe:

(i) I0 is the modified Bessel function of order 0 (see Appendix M.E). The factor [2π I0(C)]^−1 acts as a scaling factor, ensuring that ∫_0^{2π} P(Φ) dΦ = 1.

(ii) Equation (5.6) has its maximum at Φ = 0 (where cos Φ = 1). It may be concluded that the expected value of Φ is always zero.

(iii) The sharpest curves are obtained in correspondence with the largest values of C. Thus the statistical indication Φ ≈ 0 is reliable only if C is sufficiently large. This condition is satisfied if all three Rs are sufficiently large and N is sufficiently small.

Fig. 5.1 The Cochran distribution P(Φ) for a triplet phase invariant, for different values of the parameter C (curves for C = 0, 1, 2, 4, 6; Φ from −180° to 180°).

(iv) If at least one of the Rs is zero, then P(Φ) = (2π)^−1: no phase indication is obtained.
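The observations above can be checked numerically. The following is a minimal sketch, not taken from the book: it evaluates the Cochran distribution (5.6) with I0 computed from its power series, and verifies the normalization by a trapezoidal rule. The helper names `bessel_i0`, `cochran_pdf` and `trapezoid` are assumptions introduced here.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series."""
    quarter_x2 = (x / 2.0) ** 2
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= quarter_x2 / ((k + 1) ** 2)
    return total


def cochran_pdf(phi, c):
    """Cochran (von Mises) density of the triplet phase, phi in radians."""
    return math.exp(c * math.cos(phi)) / (2.0 * math.pi * bessel_i0(c))


def trapezoid(f, a, b, n=2000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


# Peak height at phi = 0 and total area for the C values of Fig. 5.1.
for c in (0.0, 1.0, 2.0, 4.0, 6.0):
    area = trapezoid(lambda phi: cochran_pdf(phi, c), -math.pi, math.pi)
    print(f"C = {c}: P(0) = {cochran_pdf(0.0, c):.3f}, area = {area:.6f}")
```

For C = 0 the density reduces to the flat value 1/(2π), which is the situation described in point (iv).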

The statement Φ ≈ 0 is a statistical expectation; it does not mean that Φ must be zero. To better understand this point, let us calculate the following mean values:

$$\langle\cos\Phi\rangle = \int_0^{2\pi} \cos\Phi \, P(\Phi)\, d\Phi = D_1(C), \tag{5.7}$$

$$\langle\sin\Phi\rangle = \int_0^{2\pi} \sin\Phi \, P(\Phi)\, d\Phi = 0, \tag{5.8}$$

where D1(C) = I1(C)/I0(C) is the ratio of the modified Bessel functions of order one and zero, respectively (see Fig. 5.2). According to (5.7), the average value of cos Φ is smaller than 1, and is close to 1 only if C is large. However, as for any statistical indication, cos Φ may actually be negative for an individual triplet, even if C is positive and large.
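To make the last point concrete, the sketch below computes D1(C) from the Bessel series and integrates (5.6) over the half-circle where cos Φ < 0. It is an illustration under stated assumptions: the function names are hypothetical and the midpoint rule is simply a convenient choice of quadrature.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x) (integer order) from its series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = 0.0
    for k in range(terms):
        total += term
        term *= half * half / ((k + 1) * (k + 1 + order))
    return total


def d1(c):
    """Expected triplet cosine, <cos Phi> = I1(C) / I0(C), equation (5.7)."""
    return bessel_i(1, c) / bessel_i(0, c)


def prob_negative_cosine(c, n=4000):
    """Probability that cos Phi < 0, i.e. Phi between pi/2 and 3*pi/2."""
    norm = 2.0 * math.pi * bessel_i(0, c)
    h = math.pi / n
    total = 0.0
    for i in range(n):
        phi = math.pi / 2.0 + (i + 0.5) * h  # midpoint of each sub-interval
        total += math.exp(c * math.cos(phi)) / norm
    return total * h


for c in (0.5, 1.0, 2.0, 4.0, 6.0):
    print(f"C = {c}: <cos> = {d1(c):.3f}, "
          f"P(cos < 0) = {prob_negative_cosine(c):.4f}")
```

The probability of a negative cosine shrinks as C grows but never reaches zero, which is why even strong triplets can occasionally be wrong.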

Fig. 5.2 The functions D1(x) and D2(x), plotted for x from 0 to 10.

According to (5.8), the Cochran relationship is unable to fix the enantiomorph. Thus, if <cos Φ> = cos q, equation (5.8) says that +q and −q have equal probability, and consistently gives <Φ> = 0 and <sin Φ> = 0.

A last remark may be useful for readers not familiar with the von Mises

distribution. For circular variables, it plays a role similar to that played by the

Gaussian function for linear variables. In particular, the von Mises distribution

is marked by maximum likelihood and by maximum entropy characterization

(Mardia, 1972). While the normal distribution along a line has useful mathematical and statistical properties, this is not true for a normal distribution along

a circle (i.e. in the case of directional data). Indeed, in the theory of circular

variables, a normal distribution (as well as the most significant distributions

on a line, e.g. Cauchy, Poisson, etc.) is wrapped around the circumference of a

circle of unit radius, thus producing the so-called wrapped distribution.

Let us now consider the P1̄ case (Cochran and Woolfson, 1955): E1, E2, E3 are now real numbers, and according to Appendix 5.A the following joint probability distribution is obtained:

$$P(E_1, E_2, E_3) = \frac{1}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}(E_1^2 + E_2^2 + E_3^2) + \frac{E_1 E_2 E_3}{\sqrt{N_{eq}}}\right].$$

In this case the phase problem reduces to a sign problem. The probability that the sign of E1E2E3 is plus is given (but for a scaling term) by

$$P_+ \approx \exp\left(+\frac{R_1 R_2 R_3}{\sqrt{N_{eq}}}\right),$$

and the probability that it is minus is given (but for a scaling term) by

$$P_- \approx \exp\left(-\frac{R_1 R_2 R_3}{\sqrt{N_{eq}}}\right).$$

Since the two probabilities must sum to 1, the rescaled value of the positive sign probability is

$$P^+ = (1 + P_-/P_+)^{-1} = \left[1 + \exp\left(-\frac{2 R_1 R_2 R_3}{\sqrt{N_{eq}}}\right)\right]^{-1}. \tag{5.9}$$

Since

$$(1 + e^{-2x})^{-1} = e^{x}/(e^{x} + e^{-x}) = \frac{1}{2} + \frac{1}{2}\tanh x,$$

from (5.9),

$$P^+ = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{R_1 R_2 R_3}{\sqrt{N_{eq}}}\right) \tag{5.10}$$

is obtained (see Fig. 5.3). As for the acentric case we notice:

(i) P+ is always larger than 1/2, unless at least one of R1, R2, R3 vanishes.

(ii) The reliability of the sign indication is large only for large values of R1R2R3/√Neq.

(iii) The efficiency of (5.10) decays with the size of the structure.
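The three remarks can be illustrated with a few lines of code. This is a minimal sketch of equation (5.10); the function name `p_plus` and the sample magnitudes are assumptions chosen only for illustration.

```python
import math


def p_plus(r1, r2, r3, n_eq):
    """Probability that the triplet sign is positive, equation (5.10)."""
    x = r1 * r2 * r3 / math.sqrt(n_eq)
    return 0.5 + 0.5 * math.tanh(x)


# Same normalized magnitudes, increasing structure size.
for n_eq in (20, 200, 2000):
    print(f"Neq = {n_eq}: P+ = {p_plus(2.0, 2.0, 2.0, n_eq):.3f}")

# A vanishing magnitude gives no sign indication at all.
print("one R = 0:", p_plus(0.0, 2.5, 2.5, 20))
```

The tanh form and the exponential form (5.9) are algebraically identical, so either may be used in practice.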

Triplet estimation in space groups with symmetry higher than triclinic is

described briefly in Appendix 5.B.

Fig. 5.3 Centrosymmetric space groups. P+(X) is the probability that the triplet sign is positive, according to equation (5.10), where X = R1R2R3/√Neq (plotted for X from 0 to 8). P+ is always equal to or larger than 1/2.

The relationships (5.6) and (5.10) have been obtained by making use of two

basic assumptions: the structure is composed of discrete atoms (atomicity postulate) and the electron density is everywhere real and positive (positivity

postulate). For X-rays, positivity and atomicity are implicit in the positivity

of the atomic scattering factor f . It is, however, worthwhile noticing that when

triplets are to be estimated for neutron diffraction data (see Chapter 11), the

positivity postulate may be violated and relations (5.6) and (5.10) are no longer

valid. In an analogous way dispersion effects could introduce complex scattering factors (fj = f′j + i f″j): in this case also, the probabilistic theory for triplet

estimation should be reformulated (Hauptman, 1982a,b; Giacovazzo, 1983b;

see Chapter 15).

In this section we focus our attention only on X-ray data: we wish to enquire

about the range of structural complexity inside which equations (5.6) and

(5.10) may be usefully applied. Since <R²> = 1 by definition, the R values do not change their order of magnitude, no matter how complex the structure is. Therefore, the only parameter in C which changes size with structural complexity is 2/√Neq: this parameter influences the average efficiency of the

triplet relationships. In more detail:

1. For crystal structures where non-hydrogen atoms are nearly equal, Neq is

almost equal to the number of non-hydrogen atoms in the unit cell (this

is only valid for X-ray data). Therefore, hydrogen atoms could even be

omitted from the calculation of Neq .

2. N > Neq when heavy and light atoms coexist in the unit cell. The difference

becomes large with increasing values of the ratio

atomic number of heavy atom:atomic number of light atoms.

For example, JAMILAS [K4 C64 H68 N8 O20 S4 , space group P1] is a small

structure with N = 100 non-hydrogen atoms in the unit cell; the corresponding value of Neq is 55. The above result indicates that crystal structures

with a large number of light atoms and a few heavy atoms are more easily

solvable by direct methods than structures of the same size but without heavy atoms.

3. For unit cells with a large number of atoms, C is small for most of the triplets; correspondingly, extremely broad probability distributions (5.6) are expected. The consequence is that few triplet phases are really close to zero; the majority are dispersed in the interval (0, 2π). If the structure size is small, a high percentage of triplet phases will be close to zero.

Table 5.1 Schwarz [C46 H70 O27, P1]: statistical results on triplet estimates (Cochran formula). nr is the number of triplets with Cochran parameter C > THR, <|Φ|> is the corresponding average value of |Φ| (in degrees), and % is the percentage of triplets with positive value of cos Φ

THR    nr      <|Φ|> (°)    %
0.4    5117    41           90
1.2    4572    40           91
2.0    1552    30           96
2.6    570     27           98
3.8    81      19           100

Table 5.1 shows some statistical calculations for the Schwarz [C46 H70 O27 ,

space group P1] structure, showing how Φ is distributed versus C. The table entries may be interpreted as follows:

(i) There are 81 triplets for which C > 3.8; for these, the average value of |Φ| is 19° (in this case the condition C > 3.8 selects triplets with phase really close to zero), and cos Φ is always positive.

(ii) There are 570 triplets with C > 2.6; for these, the average value of |Φ| is 27°.

Data in Table 5.1 may be usefully compared with data in Table 5.2, where we

show similar statistics for a small protein (1e8a; space group R3; 182 residues, corresponding to 1472 non-hydrogen atoms in the asymmetric unit; data resolution 1.95 Å). Only 92 triplets reach a C value larger than 0.5, and the percentage of triplets which deviate from the Cochran expectation Φ ≈ 0 is very high.

Table 5.2 1e8a. Statistical results on triplet estimates (Cochran formula). nr is the number of triplets with Cochran parameter C > THR, <|Φ|> is the corresponding average value of |Φ| (in degrees), and % is the percentage of triplets with positive value of cos Φ

THR    nr        <|Φ|> (°)    %
0.1    300000    86           54
0.2    79494     84           55
0.3    7355      83           56
0.4    759       78           59
0.5    92        78           59


Apparently, the structural complexity does not allow selection of reliable triplet

invariants, with obvious consequences in the phasing steps.
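The contrast between the two tables follows directly from the 2/√Neq factor in (5.4). The sketch below is illustrative only: the magnitudes R = 2 and the values of Neq are arbitrary choices, not data from Tables 5.1 or 5.2, and the function names are assumptions.

```python
import math


def cochran_c(r1, r2, r3, n_eq):
    """Concentration parameter of the Cochran distribution, equation (5.4)."""
    return 2.0 * r1 * r2 * r3 / math.sqrt(n_eq)


def product_needed(c_target, n_eq):
    """Product R1*R2*R3 required to reach a given C for a given Neq."""
    return c_target * math.sqrt(n_eq) / 2.0


# The same strong reflections (R = 2 each) in structures of growing size.
for n_eq in (50, 500, 5000):
    print(f"Neq = {n_eq}: C = {cochran_c(2.0, 2.0, 2.0, n_eq):.2f}, "
          f"R1R2R3 needed for C = 2: {product_needed(2.0, n_eq):.1f}")
```

Each hundredfold increase in Neq divides C by ten, so for large structures almost no triplet reaches a usefully large C.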

5.4 The estimation of triplet phases via their second representation

The Cochran formula (5.6) estimates triplet phases (5.1) by exploiting only the information contained in three diffraction moduli; any Φ is expected to be close to zero (modulo 2π), and there is no chance of recognizing bad triplets (i.e. triplet phases close to ±π/2 or with negative cosine values). Recognizing them is of paramount importance to the efficiency of the phasing process. We will see in Chapter 6 that the occurrence of a relatively large number of bad triplets in the phasing process can lead to its failure. Conversely, the probability of finding the correct set of phases is enhanced if bad triplets are recognized; they should be excluded from the structure solving process or actively used in a correct manner.

The representation theory, described in Chapter 4, indicates how information contained in all of the reciprocal space may be used to improve the Cochran estimates. In accordance with Section 4.2, the second representation of Φ is a collection of special quintets,

$$\{\psi\}_2 = \{\Phi + \phi_{k} - \phi_{k}\}, \tag{5.11}$$

where k is a free vector in reciprocal space. The basis magnitudes of any ψ2 are

$$R_{h_1}, R_{h_2}, R_{h_3}, R_{k}$$

and the cross-magnitudes are

$$R_{h_1 \pm k}, R_{h_2 \pm k}, R_{h_3 \pm k}.$$

The collection of the basis and cross-magnitudes of the various quintets ψ2 is denoted {B}2, and is called the second phasing shell of Φ:

$$\{B\}_2 = \{R_{h_1}, R_{h_2}, R_{h_3}, R_{k}, R_{h_1 \pm k}, R_{h_2 \pm k}, R_{h_3 \pm k}\}.$$

These results suggest, for P1 and P1̄, a study of the ten-variate probability distribution

$$P(E_{h_1}, E_{h_2}, E_{h_3}, E_{k}, E_{h_1+k}, E_{h_2+k}, E_{h_3+k}, E_{h_1-k}, E_{h_2-k}, E_{h_3-k}), \tag{5.12}$$

from which the conclusive conditional distribution,

$$P(\Phi \mid 10\ \text{moduli}), \tag{5.13}$$

is obtained. Equations (5.12) and (5.13) may be calculated by means of the techniques described in Chapter 4. Since k is a free vector, a formula can be found which provides the conditional probability distribution of Φ given the basis and cross-moduli of any quintet ψ2. We will denote such a probability P10(Φ), in order to emphasize the fact that the formula explores the reciprocal space by means of a ten-node figure. Three nodes (i.e. h1, h2, h3) are fixed while k varies; the remaining seven nodes sweep out reciprocal space.

Fig. 5.4 P10(Φ) according to equation (5.14), for G = 3 (continuous line) and G = −2 (dashed line); Φ from −180° to 180°.

The final probabilistic formula (Cascarano et al., 1984; Burla et al., 1989a) is of a von Mises type, and may be written as

$$P_{10}(\Phi) = [2\pi I_0(G)]^{-1} \exp(G\cos\Phi), \tag{5.14}$$

where G is a concentration parameter which depends on hundreds or thousands of magnitudes, and may be positive or negative. If G > 0, the expected value of Φ is zero; if G < 0, the expected value of Φ is π. Unlike the Cochran relationship, P10(Φ) is therefore able to identify negative triplet cosines. Two distributions (5.14), one corresponding to a positive and the other to a negative value of G, are shown in Fig. 5.4: it is evident that, when G < 0, the value of Φ is probably closer to π than to 0.

For centrosymmetric (cs.) space groups the triplet sign may be estimated by equation (5.15),

$$P^+ = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{G}{2}\right) \tag{5.15}$$

as a substitute for equation (5.10). Since G may also be negative, positive and negative triplets may be identified. Correspondingly, Fig. 5.3 may be generalized into Fig. 5.5, allowing values of P+ smaller than 1/2.
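The behaviour of (5.14) and (5.15) for both signs of G can be sketched as follows. G is treated here as a given number; its actual expression is the one in Appendix 5.C and is not reproduced. The function names are assumptions made for this example.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series (even in x)."""
    quarter_x2 = (x / 2.0) ** 2
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= quarter_x2 / ((k + 1) ** 2)
    return total


def p10_pdf(phi, g):
    """P10 density of the triplet phase, equation (5.14); phi in radians."""
    return math.exp(g * math.cos(phi)) / (2.0 * math.pi * bessel_i0(g))


def expected_phase(g):
    """Most probable triplet phase: 0 for G > 0, pi for G < 0."""
    if g == 0:
        return None  # flat distribution, no indication
    return 0.0 if g > 0 else math.pi


def p_plus_cs(g):
    """Centrosymmetric sign probability, equation (5.15)."""
    return 0.5 + 0.5 * math.tanh(g / 2.0)


# The two G values used in Fig. 5.4.
for g in (3.0, -2.0):
    print(f"G = {g}: P10(0) = {p10_pdf(0.0, g):.3f}, "
          f"P10(pi) = {p10_pdf(math.pi, g):.3f}, "
          f"expected phase = {expected_phase(g)}, P+ = {p_plus_cs(g):.3f}")
```

Since I0 is an even function, the normalization depends only on |G|, while the sign of G decides whether the peak sits at 0 or at π.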

For the interested reader, a formal expression of G, including symmetry

effects, is given in Appendix 5.C, where we also compare the efficiencies of

the Cochran and the P10 formulas. Because of its superiority, the P10 formula has been fully integrated in the SIR suite of phasing programs starting from SIR88 (Burla et al., 1989a).

Fig. 5.5 P+ in accordance with equation (5.15), plotted for X from −6 to 6. P+ is larger or smaller than 1/2, according to whether G is positive or negative.

5.5 Introduction to quartets

Four phases are said to form the quartet invariant,

$$\Phi = \phi_{h_1} + \phi_{h_2} + \phi_{h_3} + \phi_{h_4},$$

if

$$h_1 + h_2 + h_3 + h_4 = 0.$$

Hauptman and Karle (1953) and Simerska (1956), independently, suggested that Φ would be approximately zero for large values of Rh1 Rh2 Rh3 Rh4.

The use of quartets in direct procedures for phase solution was first introduced by Schenk (1973a,b, 1974), who, from semi-empirical observations

on the moduli Rh1 +h2 , Rh1 +h3 , Rh2 +h3 , derived useful conditions for improving

estimation of the relation Φ ≈ 0. Probabilistic theories for quartet estimation from the first phasing shell were, independently, described for P1 by
Hauptman (1975a,b) and by Giacovazzo (1976b,c). Theories for P1̄ were given

by Giacovazzo (1975a, 1976a), Green and Hauptman (1976), and Hauptman

and Green (1976). A general probabilistic theory of quartets valid in all space

groups was given by Giacovazzo (1976d).

Both Hauptman’s and Giacovazzo’s approaches use the first phasing shell,

Rh1 , Rh2 , Rh3 , Rh4 , Rh1 +h2 , Rh1 +h3 , Rh2 +h3 , to estimate quartets; mainly, they

differ because the second author has used the Gram–Charlier expansion of

the characteristic function (see Appendix 4.A). For brevity we will use the

following notation:

Ri = Rhi , φi = φhi for i = 1, . . . , 4,

R5 = Rh1 +h2 , R6 = Rh1 +h3 , R7 = Rh2 +h3 ,

φ5 = φh1 +h2 , φ6 = φh1 +h3 , φ7 = φh2 +h3 .

5.6 The estimation of quartet invariants in P1 and P1̄ via their first representation: Hauptman approach

Hauptman derived in P1 the following conditional distribution:

$$P(\Phi \mid R_1, \ldots, R_7) \approx \frac{1}{L} \exp(-4C\cos\Phi)\, I_0(R_5 Z_5)\, I_0(R_6 Z_6)\, I_0(R_7 Z_7), \tag{5.16}$$

where I0(x) is the modified Bessel function of order zero,

$$C = R_1 R_2 R_3 R_4 / N, \tag{5.17}$$

$$Z_5 = \frac{2}{\sqrt{N}} (R_1^2 R_2^2 + R_3^2 R_4^2 + 2NC\cos\Phi)^{1/2}, \tag{5.18a}$$

$$Z_6 = \frac{2}{\sqrt{N}} (R_1^2 R_3^2 + R_2^2 R_4^2 + 2NC\cos\Phi)^{1/2}, \tag{5.18b}$$

$$Z_7 = \frac{2}{\sqrt{N}} (R_2^2 R_3^2 + R_1^2 R_4^2 + 2NC\cos\Phi)^{1/2}. \tag{5.18c}$$

As for the triplet invariants, distribution (5.16) depends on cos Φ; therefore only cos Φ may be estimated, it being impossible to distinguish between +Φ and −Φ (or, in other words, to distinguish between the two enantiomorphs).

Since L, the scaling factor, has a rather complicated expression, one might use numerical methods for calculating:

1. the scaling factor L, via the condition

$$\int_0^{\pi} P(\Phi)\, d\Phi = 1;$$

2. the mode Φm of P(Φ);

3. the mean value, given by

$$\langle\Phi\rangle = \int_0^{\pi} \Phi\, P(\Phi)\, d\Phi;$$

4. the variance, V, as given by

$$V = \int_0^{\pi} (\Phi - \langle\Phi\rangle)^2 P(\Phi)\, d\Phi;$$

5. σ = √V.

Fig. 5.6 panel annotations: R1 = 2.27, R2 = 3.01, R3 = 2.49, R4 = 2.16, R5 = 1.85, R6 = 2.84, R7 = 1.90.
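The five numerical steps listed above can be sketched as follows, assuming equations (5.16) to (5.18) in the form given in this section and a simple trapezoidal rule on [0, π]. The function names, the grid size and the choice of quadrature are assumptions made for this example. The two sets of magnitudes are those annotated in Figs 5.6 and 5.7 (N = 47).

```python
import math


def bessel_i0(x, terms=80):
    """Modified Bessel function I0(x) from its power series."""
    quarter_x2 = (x / 2.0) ** 2
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= quarter_x2 / ((k + 1) ** 2)
    return total


def hauptman_unnormalized(phi, r, n):
    """Right-hand side of (5.16) without the 1/L factor; r = (R1, ..., R7)."""
    r1, r2, r3, r4, r5, r6, r7 = r
    c = r1 * r2 * r3 * r4 / n                      # equation (5.17)

    def z(a, b):                                   # equations (5.18a-c)
        arg = a * a + b * b + 2.0 * n * c * math.cos(phi)
        return 2.0 / math.sqrt(n) * math.sqrt(max(arg, 0.0))

    z5, z6, z7 = z(r1 * r2, r3 * r4), z(r1 * r3, r2 * r4), z(r2 * r3, r1 * r4)
    return (math.exp(-4.0 * c * math.cos(phi))
            * bessel_i0(r5 * z5) * bessel_i0(r6 * z6) * bessel_i0(r7 * z7))


def quartet_statistics(r, n, steps=720):
    """Return (L, mode, mean, sigma) of (5.16) on [0, pi], angles in radians."""
    h = math.pi / steps
    grid = [i * h for i in range(steps + 1)]
    weights = [hauptman_unnormalized(phi, r, n) for phi in grid]

    def trapezoid(values):
        return h * (sum(values) - 0.5 * (values[0] + values[-1]))

    scale = trapezoid(weights)                     # step 1: L
    pdf = [w / scale for w in weights]
    mode = grid[max(range(len(pdf)), key=pdf.__getitem__)]          # step 2
    mean = trapezoid([phi * p for phi, p in zip(grid, pdf)])        # step 3
    variance = trapezoid([(phi - mean) ** 2 * p
                          for phi, p in zip(grid, pdf)])            # step 4
    return scale, mode, mean, math.sqrt(variance)                   # step 5


fig_5_6 = (2.27, 3.01, 2.49, 2.16, 1.85, 2.84, 1.90)  # large cross-magnitudes
fig_5_7 = (2.31, 2.82, 1.88, 2.10, 0.36, 0.24, 0.10)  # small cross-magnitudes

for label, r in (("Fig. 5.6", fig_5_6), ("Fig. 5.7", fig_5_7)):
    _, mode, mean, sigma = quartet_statistics(r, n=47)
    print(f"{label}: mode = {math.degrees(mode):.1f}, "
          f"mean = {math.degrees(mean):.1f}, sigma = {math.degrees(sigma):.1f}")
```

With large cross-magnitudes the Bessel factors dominate and push the mode to 0; with small cross-magnitudes the exp(−4C cos Φ) term dominates and the mode moves to π.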

Estimation of |Φ|, via (5.16), depends on an intricate interrelationship among all the seven magnitudes. However, some working rules can be stated:

1. P(Φ) is unimodal between 0 and π, and Φm can, in principle, lie anywhere between 0 and π;

2. if the cross-magnitudes are large, Φ is expected to be close to zero;

3. if the cross-magnitudes are small, Φ is expected to be close to π;

4. if the cross-magnitudes are of medium size and N is sufficiently small, then Φ is expected to be close to ±π/2;

5. the larger N, the larger the overall variance associated with quartet phase estimation.

Figures 5.6 and 5.7 show (broken curves) the distribution (5.16) for some values of the seven magnitudes when N = 47. In Fig. 5.6, where all the cross-magnitudes are large, Φm = 0.0°, <Φ> = 29°, σ = V^{1/2} = 21.9°. In Fig. 5.7, where all the cross-magnitudes are small, Φm = 180°, <Φ> = 142°, σ = 32.7°.

It is clear from the figures that cosines estimated near π will (on average) be in poorer agreement with the true values than the cosines estimated near 0, because of the relatively larger value of the variance. Even poorer will be the estimates of the cosines located in the middle range (usually called enantiomorph-sensitive quartets); no useful application has been found for them.

The three cross-magnitudes are not always in the set of measured reflections.

Then, some marginal joint probability distributions must be considered in order

Fig. 5.6 Distribution (5.16) (broken curve) and (5.22) (continuous curve) for the indicated |E| values in a structure with N = 47 atoms in the unit cell; Φ from 0 to π.

Fig. 5.7 Distribution (5.16) (broken curve) and (5.22) (continuous curve) for the indicated |E| values (R1 = 2.31, R2 = 2.82, R3 = 1.88, R4 = 2.10, R5 = 0.36, R6 = 0.24, R7 = 0.10) in a structure with N = 47 atoms in the unit cell; Φ from 0 to π.
