APPENDIX 8.A SOLVENT CONTENT, ENVELOPE DEFINITION, AND SOLVENT MODELLING
where dprot is the protein density (g cm−3 ). If this is assumed equal to 1.35,
then
Vp = 1.23/VM,    Vsolv = 1 − Vp.    (8.A.2)
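As a numerical check of (8.A.2), the sketch below (illustrative code, not from the original text) evaluates the protein and solvent fractions for a typical Matthews coefficient; the constant 1.23 arises as 1.6605/1.35, i.e. the mass of one dalton in grams ×10²⁴ divided by the assumed protein density.

```python
def solvent_fraction(v_m, d_prot=1.35):
    """Protein and solvent volume fractions from the Matthews coefficient
    v_m (in A^3 per Dalton). 1.6605 is the mass of 1 Da in g, times 1e24;
    d_prot is the assumed protein density in g/cm^3 (1.35 in eq. 8.A.2)."""
    v_p = (1.6605 / d_prot) / v_m
    return v_p, 1.0 - v_p

v_p, v_solv = solvent_fraction(2.4)   # a typical protein crystal
# v_p ~ 0.51, v_solv ~ 0.49
```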
8.A.2 Envelope definition
Wang (1985) proposed an automatic cyclic procedure for defining a mask,
which should separate the map into solvent and molecule space, in accordance with the ratio fixed by the Matthews criterion. The Wang procedure may
be described as follows:
(a) The current electron density map (no matter if it has been obtained by
ab initio or non-ab initio techniques) is truncated according to:
ρtrunc (r) = ρ(r) if ρ(r) > ρsolv ,
ρtrunc (r) = 0 if ρ(r) ≤ ρsolv ,
where the threshold, ρsolv , is chosen to meet the expected solvent content.
(b) ρtrunc is smoothed (into ρsm (r)) by associating, at each point r of ρtrunc ,
the weighted average density over the points included in an encompassing
sphere of radius R (between 8 and 4 Å, according to the resolution or, also,
to the quality of the structure):
ρsm(r) = Σr′ w(r − r′) ρtrunc(r′),    (8.A.3)

where

w(r − r′) = 1 − d/R for d < R,
w(r − r′) = 0 for d ≥ R,

and d = |r − r′| is the distance between points r and r′.
(c) A cut-off value, ρcut , is calculated, which divides the unit cell into two
regions, solvent and protein; solvent pixels are marked by the condition
ρsm (r) ≤ ρcut , voids internal to the molecular envelope are polished.
(d) A solvent corrected map is obtained by setting all the values outside the
protein envelope to a low constant value; the electron density values inside
the molecular envelope are set to the current values (say, to the values
defined at point a).
(e) New phases are obtained by Fourier inversion of the solvent corrected map. Often such phases are combined with the experimental phases
(those obtained via SAD-MAD, SIR-MIR, or MR techniques). The corresponding electron density map is the new basis for the application of
point a.
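Steps (a)–(d) can be sketched on a one-dimensional toy density (all names here are illustrative; a real implementation works on a 3-D grid with space-group symmetry, and step (e) recombines phases by Fourier inversion):

```python
import numpy as np

def wang_cycle(rho, solvent_fraction, radius_pts, flatten_value=0.0):
    """One solvent-flattening cycle (Wang, 1985) on a 1-D toy density.
    radius_pts: smoothing radius R expressed in grid points."""
    # (a) truncate: the threshold is chosen so that roughly the expected
    #     solvent fraction of pixels is set to zero
    thr = np.quantile(rho, solvent_fraction)
    rho_trunc = np.where(rho > thr, rho, 0.0)

    # (b) smooth with the weight w(d) = 1 - d/R inside a sphere of radius R
    offsets = np.arange(-radius_pts, radius_pts + 1)
    w = np.clip(1.0 - np.abs(offsets) / radius_pts, 0.0, None)
    rho_sm = np.convolve(rho_trunc, w / w.sum(), mode='same')

    # (c) a cut-off on the smoothed map defines the mask (protein = True)
    cut = np.quantile(rho_sm, solvent_fraction)
    mask = rho_sm > cut

    # (d) flatten the solvent region, keep current values inside the envelope
    return np.where(mask, rho, flatten_value), mask
```

In practice the flattened map is Fourier inverted and the resulting phases combined with the experimental ones, as in step (e).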
Two years later, Leslie (1987) observed that (8.A.3) is a convolution, and that
flattening may be more easily performed via its Fourier transform:
T[ρsm (r)] = Fsm (h) = T[w]T[ρtrunc ] = g(s) · Ftrunc ,
where Ftrunc is readily calculated by Fourier inversion of the truncated map.
g(s) is the Fourier transform of the weight function, which is the sum of two components, the first being the Fourier transform of a sphere. According to James
(1962),
g(s) = Y(uR) − Z(uR),

where s = 2 sin θ/λ, u = 2πs,

Y(x) = 3(sin x − x cos x)/x³,

and

Z(x) = 3[2x sin x − (x² − 2) cos x − 2]/x⁴.
Leslie’s procedure improved the efficiency of the flattening technique and
dramatically reduced the computing time.
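Leslie's observation is just the convolution theorem, and can be verified numerically on a periodic 1-D toy grid (illustrative sketch; in practice the transforms are 3-D and Ftrunc comes from inversion of the truncated map):

```python
import numpy as np

n, R = 128, 8
rng = np.random.default_rng(1)
rho_trunc = np.clip(rng.normal(size=n), 0.0, None)   # truncated toy map

# weight w(d) = 1 - d/R for d < R, on a periodic grid
d = np.minimum(np.arange(n), n - np.arange(n))
w = np.clip(1.0 - d / R, 0.0, None)
w /= w.sum()

# direct real-space convolution: rho_sm(i) = sum_j w(i-j) rho_trunc(j)
rho_sm_direct = np.array(
    [sum(w[(i - j) % n] * rho_trunc[j] for j in range(n)) for i in range(n)]
)

# reciprocal space (Leslie, 1987): F_sm(h) = g(h) * F_trunc(h), g = FT(w)
g = np.fft.fft(w)
rho_sm_fft = np.real(np.fft.ifft(g * np.fft.fft(rho_trunc)))
```

The two maps agree to machine precision, while the FFT route replaces an O(n²) summation by O(n log n) work, which is the source of the speed-up mentioned above.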
It may be useful to mention that several attempts have been made to estimate the protein envelope at very low resolution (say, about 8 Å or worse).
The necessary prior information consists of unit cell parameters, space group,
high quality diffraction data, complete up to a fixed resolution, and a rough
estimate of the solvent fraction. Attempts began with Kraut (1958). Somewhat
later, different algorithms were proposed, summarized as follows: the histogram method (Luzzati et al., 1988; Mariani et al., 1988; Lunin et al., 1990),
the condensing protocol (Subbiah, 1991, 1993; David and Subbiah, 1994), the
one sphere method (Harris, 1995), FAM (Lunin et al., 1995; Moras et al., 1983;
Urzhumtsev et al., 1996). The general idea at the basis of all these algorithms
was to define at very low resolution a rough envelope, which may be easier
than at high resolution. Once a model envelope is obtained, phase extension
at higher resolution should be performed, mainly via solvent flattening, histogram matching, etc. to progressively improve identification of the solvent
region, and then allow solution of the protein structure.
The above methods were able to find good (even if rough) envelope models,
but their weak point was the phase extension, quite difficult from very low
resolution. In recent years these methods have been shelved; however, it may be that in the future their appeal will increase again.
8.A.3 Models for the bulk solvent
The narrow boundary region (within a 7 Å boundary layer) between the protein and the solvent exhibits an ordered structure of strongly bonded water
molecules. As a rule of thumb, about one water molecule per residue belongs
to such an ordered substructure (Kleywegt and Jones, 1997a). The solvent is
disordered beyond this shell, and solvent flattening techniques use this characteristic property to improve the protein phases. Since the bulk solvent may
significantly contribute to the structure factors, taking into account its contribution may improve agreement between calculated and observed structure
factors; this may be useful both in the refinement step, and in the phasing step
itself (e.g. in the translation step of molecular replacement).
The effect of the solvent on structure factors may be understood as follows:
cancelling the solvent contribution from the calculated structure factor is equivalent to setting the electron density of the bulk solvent to zero. This implies
an infinitely sharp contrast between protein surface and solvent, with an overestimate of the low resolution structure factor amplitudes. We will quote two
models for the solvent:
1. Exponential bulk solvent. This is based on the Babinet principle, according to which, if the unit cell is divided into two parts, one relative to the
disordered solvent and the other to the protein molecule, then Fsolv = −FP ,
where the first structure factor refers to the solvent volume and the second to
the protein volume. To understand the above relationship, we notice that, by
a property of the Fourier transform, a unit cell with constant electron density will show vanishing structure factor amplitudes, except for F000 . The
contribution of the solvent bulk is therefore opposite to that of the protein
volume, and will tend to weaken the amplitudes of the latter. An approximation to solvent scattering may be achieved by placing atoms with very high
temperature factor (e.g. 200 Å2 ) in the solvent region. The effects of the
above model may be represented by calculating the total structure factor,
Ft , as (Glikos and Kokkinidis, 2000b)
Ft(h) = FP(h)[1 − ksol exp(−Bsol s²/4)],
where s = 2 sin θ/λ and ksol is the ratio between the mean electron densities
of the solvent and of the protein. Since:
(i) the electron density of water is about 0.334 e− /Å3 , and that for a salt
solution may be estimated at around 0.40 e− /Å3 ;
(ii) the protein density may be estimated as close to 0.439 e− /Å3 ,
then ksol may be approximated to 0.76. If we choose Bsol ≈ 200 Å2 , the
effects of the solvent will disappear rapidly at higher resolution.
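A numerical sketch of the exponential model (function and variable names are illustrative):

```python
import numpy as np

def babinet_correction(f_p, s, k_sol=0.76, b_sol=200.0):
    """Exponential (Babinet) bulk-solvent scaling:
    F_t(h) = F_P(h) * [1 - k_sol * exp(-B_sol * s^2 / 4)],
    with s = 2 sin(theta)/lambda in 1/Angstrom and B_sol in Angstrom^2."""
    return f_p * (1.0 - k_sol * np.exp(-b_sol * s**2 / 4.0))

s = np.array([0.02, 0.1, 0.5])          # low to high resolution
f_t = babinet_correction(np.ones(3), s)
# low-resolution amplitudes are strongly damped; the effect vanishes
# rapidly at high resolution
```

With Bsol ≈ 200 Å², the correction reduces the amplitude by roughly 75% at s = 0.02 Å⁻¹ but is negligible at s = 0.5 Å⁻¹, as stated above.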
2. Flat bulk solvent. A flat mask is used as the solvent model; it is located in the solvent region, at a distance of about 1.4 Å from the van der Waals
surface of the protein. The bulk solvent region is then uniformly filled by a
continuous electron density, which contributes to the total structure factor,
in accordance with (Jiang and Brunger, 1994),
Ft(h) = FP(h) + ksol Fsol(h) exp(−Bsol s²/4).    (8.A.4)
Minimizing the residual between the observed and the solvent-corrected structure factors, Ft, provides optimal values for the parameters ksol and Bsol (typical values are ksol ≈ 0.4 and Bsol ≈ 45 Å²). This kind of bulk solvent correction is implemented in several refinement programs and is also used to improve the efficiency
of the translation step in MR programs.
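The optimal ksol and Bsol can be obtained by minimizing this residual; below is a brute-force grid-search sketch (illustrative only — refinement programs use proper least-squares or maximum-likelihood machinery, and the names here are ours):

```python
import numpy as np

def fit_solvent_params(f_obs, f_p, f_sol, s):
    """Grid-search k_sol, B_sol minimizing sum(|F_obs| - |F_t|)^2 with
    F_t = F_P + k_sol * F_sol * exp(-B_sol * s^2 / 4)   (eq. 8.A.4).
    f_p, f_sol: complex calculated contributions; f_obs: observed amplitudes."""
    best = (None, None, np.inf)
    for k in np.linspace(0.1, 0.8, 15):          # k_sol candidates
        for b in np.linspace(10.0, 100.0, 19):   # B_sol candidates (A^2)
            f_t = np.abs(f_p + k * f_sol * np.exp(-b * s**2 / 4.0))
            r = np.sum((f_obs - f_t) ** 2)
            if r < best[2]:
                best = (k, b, r)
    return best
```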
APPENDIX 8.B HISTOGRAM MATCHING
This technique is widely used in image processing; it aims to improve the
image quality by fitting the density distribution of an image with the ideal
distribution. From this point of view, the electron density is an image of the
crystal structure, the quality of which should be improved by fitting the density
frequency with standard distributions.
The actual form of a histogram depends on several parameters, among which
are:
1. the fraction of the unit cell volume occupied by the solvent;
2. the resolution at which the diagram is calculated;
3. the mean phase error associated with the structure factors;
4. the overall temperature factor.
To circumvent the effects of the temperature parameter, histogram matching
procedures remove the overall temperature factor from all the |F|s. This allows
simplification of the method, since it is not necessary to use different standard
histograms for different temperature factors. Accordingly, the standard histogram, which is relative to the frequency distribution of the density in the
protein region, may be treated as a function of the resolution only. It may be
obtained from the electron density map of a similar known structure or from a
formula. Main (1990a,b) (see also Lunin and Skovoroda, 1991) has developed
a six-parameter formula which produces useful histograms over a range of
resolutions from 4.5 to 0.9 Å. Histograms are calculated by considering the
densities only within the molecular envelope. We note that:
Fig. 8.B.1 Electron density histograms obtained from refined phases (——) and from approximated phases (- - -).
1. The flatness of the histogram increases with the average phase error. In
Fig. 8.B.1 we overlap the histogram corresponding to the refined structure
with one obtained from approximate phase values.
2. Histograms are asymmetric; the asymmetry is a consequence of the positivity of the electron density (negative density values are less frequent
than positive ones) and may be used as a criterion for phase correction
(Podjarny and Yonath, 1977). On the other hand, the negative regions must
be present in the histograms because they are generated by unavoidable
series termination errors.

Fig. 8.B.2 Electron density profile variation with resolution.

Skewness, say,

γ = < [(ρ − < ρ >)/σd]³ >,

with

σd = < (ρ − < ρ >)² >^(1/2),
is usually calculated to evaluate the asymmetry; it can be positive or negative, or undefined. Negative skewness values indicate that the tail on the left-hand side of the probability density function is longer than on the right-hand side; a positive skewness indicates that the tail on the right-hand side
is longer than on the left-hand side; a zero value indicates that the values
are relatively evenly distributed on both sides of the mean. In our case,
skewness is expected to be positive.
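Since the same skewness coefficient reappears later as a convergence indicator for charge flipping (Section 9.2), a minimal numerical sketch may help (function name is ours):

```python
import numpy as np

def skewness(rho):
    """Skewness gamma = <[(rho - <rho>)/sigma_d]^3> of a density map."""
    d = rho - rho.mean()
    sigma = np.sqrt(np.mean(d**2))
    return np.mean((d / sigma) ** 3)
```

A protein-like map, with a long tail of high positive density, gives a clearly positive value; a symmetric density gives a value close to zero.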
3. The histogram changes with the data resolution (see Fig. 8.B.2). The histogram for high resolution maps has its maximum close to ρ = 0; for low
resolution maps, the maximum shifts to higher values of ρ, and the peak
is broader. The peak of the histogram lowers to a minimum at about 3 Å
resolution; as the resolution decreases, the peak rises again, moves towards
higher density, and becomes broader. Long tails towards high density are
present in high resolution maps.
4. The histogram matching technique may be applied as follows (Zhang and
Main, 1990a,b):
(i) From a given set of B-parameter corrected structure factors, the Fourier
synthesis and the corresponding histogram are calculated. The latter is
compared with the standard histogram.
(ii) The electron density histogram of the actual map is divided into smaller areas with boundaries, ρi, i = 1, . . . , n (n ∼ 100) (see Fig. 8.B.3a). The standard histogram is also divided into smaller areas, with boundaries, ρ′i, i = 1, . . . , n (see Fig. 8.B.3b).
(iii) Scale factors ai and shifts bi are calculated to map ρ onto ρ′ for the ith interval:

ρ′ = ai ρ + bi,    (8.B.1)

where

ai = (ρ′i+1 − ρ′i)/(ρi+1 − ρi),    bi = (ρ′i ρi+1 − ρ′i+1 ρi)/(ρi+1 − ρi).

For example, if only a scale factor k relates the two maps (e.g. ρ′ = kρ), then

ai = k and bi = 0 for any i.

If only a shift relates the two maps (e.g. ρ′ = ρ + b), then

ai = 1 and bi = b for any i.
(iv) The operation (8.B.1) is applied to the actual map for each interval; the
new map will show the same density distribution as that expected.
(v) A new set of structure factors is calculated from the modified electron
density, whose phases are employed for a next cycle.
A more intuitive approach is to let P(ρ) and Ps(ρ) be the current and the standard reference density histograms, respectively (both sum to unity), and N(ρ) and Ns(ρ) the corresponding cumulative distributions. The transformation of P(ρ) into Ps(ρ) is made as follows. For any density value ρ, the corresponding value of the cumulative distribution N(ρ) is calculated; the desired modified density in the standard distribution is then obtained by inverting the cumulative standard distribution:

ρ′ = Ns⁻¹[N(ρ)].
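When both maps are sampled on grids with the same number of points, the cumulative-distribution form has a particularly compact implementation: rank matching. A hedged sketch (our own naming, not from the original text):

```python
import numpy as np

def histogram_match(rho, rho_standard):
    """Map the current density rho so that its histogram matches that of a
    reference density, via rho' = Ns^{-1}[N(rho)] (cumulative matching).
    Assumes both arrays have the same number of grid points."""
    order = np.argsort(rho.ravel())
    matched = np.empty_like(rho.ravel())
    # the i-th smallest current value is replaced by the i-th smallest
    # reference value: this equates the two empirical CDFs
    matched[order] = np.sort(rho_standard.ravel())
    return matched.reshape(rho.shape)
```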
Histogram matching is usefully combined with solvent flattening techniques
as follows:
(a) The molecular envelope is obtained.
(b) The solvent region is flattened, while the density within the molecular
envelope is matched with the expected histogram. Obviously, histogram
matching efficiency is high when the solvent region is a small percentage of the unit cell. When the reverse condition occurs, solvent flattening
effects are dominant.
(c) Structure factors are calculated from the above modified map and their
phases are (eventually) combined with experimental phases. If a phase
extension process is started, the extended phases are accepted at the
calculated values.
(d) A new map is calculated using data obtained at step (c), and the procedure
is repeated from step (a) until convergence is obtained.
It is obvious that histogram matching and solvent flattening procedures are
not able to suppress all false peaks from an experimental electron density map
and/or generate all the supplementary peaks to complete the structure. Indeed,
false density inside the envelope tends to remain, while molecular density outside the envelope may remain strongly depressed. However, these procedures are able to produce remarkable improvements in the maps.

Fig. 8.B.3 (a) Electron density histogram for the actual electron density model; (b) standard electron density histogram.
APPENDIX 8.C A BRIEF OUTLINE OF THE ARP/wARP PROCEDURE
The automatic refinement procedure (ARP) is based on a free atom approach.
A set of dummy atoms is created, new atoms are added and old ones are
deleted to create new models which, cyclically re-evaluated, should end in a
final model describing electron density and target protein. The procedure may
be described schematically as follows.
Dummy atoms (of the same atomic species, say O) are located in the high
density regions of the best available electron density map; a fine grid of about
0.25 Å is used. The initial model is gradually expanded; the density threshold is
gradually lowered and additional atoms are located at bonding distances from
existing atoms. The model is completed when the number of dummy atoms is
about 3 × NAT, where NAT is the expected number of atoms. At the end of the updating process (see below), the number of atoms is reduced to about 1.2 × NAT.
The model is updated as follows.
(i) Atom rejection. Hybrid electron densities of type 3Fo − 2Fc are calculated;
an atom is removed on the basis of the density at the atomic centre,
on shape criteria (e.g. sphericity), and distance criteria (e.g. too close to
accepted atoms).
(ii) Atom addition. The Fo − Fc synthesis is calculated; the grid point with the
highest density value is selected as a new atomic position, provided that
this satisfies defined distance constraints in relation to other positioned
atoms. Grid points at small distance from this added atom are rejected
and the next higher grid point is selected.
(iii) Model refinement. This may be performed by a cyclic procedure based
on unrestrained least squares or maximum likelihood refinement (in both
the cases the procedure aims at matching calculated to observed structure
factors), and/or by real-space refinement (an atom may be moved from the
peak position on the basis of a density shape analysis around it). Usually,
the reciprocal space refinement is performed by REFMAC (Murshudov
et al., 1996); it needs a number of observations greater than the number of model parameters, which sets the resolution limit of ARP/wARP to about 2.5 Å.
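The atom-addition step above can be sketched as a peak search with distance constraints (a toy version with illustrative names; real ARP/wARP operates on a 3-D difference synthesis with more elaborate shape criteria):

```python
import numpy as np

def add_atoms(diff_map, coords, existing, n_new, d_min=1.1, d_max=1.8):
    """Pick up to n_new peaks of a difference synthesis as new dummy atoms.
    diff_map: density value at each grid point; coords: (N,3) grid positions;
    existing: list of (3,) positions already occupied by dummy atoms."""
    accepted = []
    for i in np.argsort(diff_map)[::-1]:          # highest density first
        p = coords[i]
        d = [np.linalg.norm(p - q) for q in list(existing) + accepted]
        # a new atom must be at bonding distance from some atom,
        # but not too close to any atom already placed
        if d and d_min <= min(d) <= d_max:
            accepted.append(p)
            if len(accepted) == n_new:
                break
    return accepted
```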
So far no attempt has been made to assign chemical sense to the atoms,
in terms of atomic species, bond distances, bond angles, protein secondary
structures, etc.; typically, free atoms lie within 0.5–0.6 Å of the corresponding
positions in the correct structure.
Following this, model reconstruction starts; its task is to discard atoms in
false positions, assign atomic species to the well-located atoms, and to establish their connectivity. Only when atomic species, bonds, and angles for a
group of atoms have been defined (see below), will stereochemical restraints
be applied in restrained refinements; this will improve the ratio of observations
to parameters, and will increase the efficiency of the least squares refinement
(which will then become hybrid, because free and restrained atomic positions
will coexist).
Model reconstruction starts with identification of the main-chain atoms.
Every Cα atom should stay at 3.8 Å from at least one other candidate Cα atom,
which may be connected to the first one by a forward (outgoing) directionality [–C(=O)–N–Cα] or by an incoming (backward) directionality [–N–C(=O)–Cα]. If two atoms i and j are Cα candidates (see Fig. 8.C.1), then a peptide unit plane is placed between the candidates and rotated about the i–j axis.
If, for a given rotation angle, the interpolated electron density at the peptide
atomic positions is larger than a given threshold, atoms i and j are flagged as Cα
atoms.
The electron density maps to which AMB algorithms are applied usually
show non-negligible phase errors; therefore, the condition according to which
two consecutive Cα atoms should lie 3.8 Å apart must be replaced by a more
permissive condition, say, the distance should lie in a range (e.g. 3.8 ± 1 Å).
The result is that many candidates may be connected by more than one
incoming and one outgoing connection, with the consequent combinatorial
explosion of the possible chains. ARP/wARP solves the problem by dividing each candidate chain into small structural subunits and by evaluating,
by stereochemical arguments, the probability of each subunit being the correct one. Subunits consisting of four consecutive Cα atoms are used, say
Cα (n) − Cα (n + 1) − Cα (n + 2) − Cα (n + 3), and the two-dimensional frequency distributions of the angle Cα (n) − Cα (n + 1) − Cα (n + 2) and of the
dihedral angle Cα (n) − Cα (n + 1) − Cα (n + 2) − Cα (n + 3) are tested against
the distribution derived from database analysis (Oldfield and Hubbard, 1994;
Kleywegt, 1997). This information is of a three-dimensional nature, and may
be used to obtain a score for the subunit (of length four) parameters. The main
chain is built by overlapping the last three atoms of one subunit with the first
three of the following. The chain scores are then obtained by summation of the
subunit scores.
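The two geometrical quantities that are scored — the pseudo-bond angle and the pseudo-torsion angle of consecutive Cα atoms — can be computed as follows (a standard geometric sketch; function names are ours):

```python
import numpy as np

def ca_angle(a, b, c):
    """Pseudo-bond angle Ca(n)-Ca(n+1)-Ca(n+2), in degrees."""
    u, v = a - b, c - b
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def ca_dihedral(a, b, c, d):
    """Pseudo-torsion angle Ca(n)-Ca(n+1)-Ca(n+2)-Ca(n+3), in degrees."""
    b1, b2, b3 = b - a, c - b, d - c
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m = np.cross(n1, b2 / np.linalg.norm(b2))
    return np.degrees(np.arctan2(np.dot(m, n2), np.dot(n1, n2)))
```

These two values for each four-Cα subunit are what is tested against the database-derived two-dimensional frequency distribution.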
Limited data resolution and quality of the phases, combined with the natural conformational flexibility of the chain, may not allow recovery of a full
continuous chain; several main-chain fragments may be obtained, separated
by gaps, and some chain fragments may be wrongly identified. The lower the
quality of the starting electron density map and data resolution, the larger the
probability of having a large number of gaps.
Once one or more main-chain fragments have been correctly identified, side
chains may be built by taking into account the Cα positions, the density distribution in the map, and connectivity criteria; the aim is to dock the polypeptide
fragments into the sequence (assumed to be known). A score is associated with
each possible docking position, so that the chain would have the most probable
side chain conformation.
Fig. 8.C.1 Connection between two candidate Cα atoms. The peptide plane is located between these atoms and rotates about the axis Cα(n)–Cα(n+1). The two candidates are flagged as Cα atoms if, for a given orientation of the plane, the interpolated density values at the atomic positions are larger than a given threshold. (a) Forward directionality; (b) backward directionality.
9 Charge flipping and VLD (vive la difference)
9.1 Introduction
Direct methods procedures (see Chapter 6) or Patterson techniques (see
Chapter 10), primarily the former, have been methods of choice for crystal
structure solution of small- to medium-sized molecules from diffraction data.
Over the last 30 years, several new phasing algorithms have been proposed,
not requiring the use of triplet and quartet invariants, but based only on the
properties of Fourier transforms. These were not competitive with direct methods and never became popular, but they contain a nucleus for further
advances. Among these we mention:
(i) Bhat (1990) proposed a Metropolis technique (Metropolis et al., 1953;
Kirkpatrick et al., 1983; Press et al., 1992), also known as simulated
annealing (the reader is referred to Section 12.9 for details on the
algorithm). From a random set of phases, an electron density map is
calculated, modified, and inverted. The corresponding phases are altered
according to the simulated annealing algorithm, and then used to calculate
a new electron density map. The procedure is cyclic.
(ii) A strictly related simulated annealing procedure has been proposed by Su
(1995). The objective function to minimize was
R=
h
(S|Fh |calc − |Fh |obs )2 ,
where S is the scale factor. The scheme is as follows: random atomic positions are generated and in succession shifted; the simulated annealing
algorithm is applied to accept or reject atomic shifts. At the end, a new
atomic structure is generated, whose positions are shifted in succession,
and so on in a cyclic way.
(iii) The forced coalescence method (FCP) was proposed by Drendel et al.
(1995). Hybrid electron density maps (see Section 7.3.4) were actively
used with different values of τ and ω.
Even if never popular, the above algorithms opened the way to two other
methods which are much more efficient, charge flipping and VLD (vive la difference), to which this chapter is dedicated. Both are based on the properties of
the Fourier transform; they do not require the explicit use of structure invariants
and seminvariants, or a deep knowledge of their properties. The reader should
not, however, conclude that the invariance and seminvariance concepts are not
necessary in the handling of these approaches, on the contrary, understanding
these basic concepts is essential to the appreciation of these new methods.
To be more clear, when an electron density is modified, a model map is
simultaneously identified; and when the model map is Fourier inverted, model
structure factors with modulus |Fp | and phase φ p are obtained. The reliability of the new phases is usually calculated via the distribution P(R, Rp , φ, φp ),
described in Section 7.2, which involves estimation of the two-phase structure
invariants, (φh − φph ).
9.2 The charge flipping algorithm
Charge flipping was developed by Oszlányi and Suto (2004, 2005, 2008) and
has been successfully applied to small molecules (Wu et al., 2004; Palatinus
and Chapuis, 2007), modulated structures (Palatinus et al., 2006), powder data
(Baerlocher et al., 2007a,b), high resolution protein data (Dumas and van der
Lee, 2008). We describe the algorithm step by step (see Fig. 9.1):
1. The list of unique reflections, as fixed by the space group symmetry, is
expanded in P1 to produce a complete list of reflections; Friedel pairs, if
present, are merged.
2. Random starting phases are assigned to the expanded list of reflections.
3. An electron density map is calculated over a grid with spacing adjusted
to RES/2. It may be seen from Fig. 9.2a that, at least at high resolution,
large density values are restricted to a small percentage of pixels, which
therefore carry almost all of the structural information. Figure 9.2a is a
different way of representing the density distributions shown previously in
Figs. 8.B.1 and 8.B.2.
4. The electron density is modified so that all the pixels with density smaller
than a given positive threshold δ (see Fig. 9.2b) are submitted to flipping
(i.e. their density is multiplied by −1). In Fig. 9.1, the modified map is
called ρ mod .
5. The inverse Fourier transform of ρmod is treated as follows: for large amplitudes (about 80% of the total number), calculated phases are associated with observed amplitudes; for weak reflections, the calculated modulus
is retained and the phase shifted by π/2. A new electron density map is calculated and the cycle starts again.

Fig. 9.1 Charge flipping algorithm. {|F|} is the set of observed reflections, {φ} is the set of random phases. FT and FT⁻¹ indicate the direct and the inverse Fourier transforms, respectively; MOD is the function used to modify the electron density, ρ(r). {|Fmod|} and {φmod} are the structure factor amplitudes and phases obtained by Fourier inversion of ρmod.

Fig. 9.2 Typical high-resolution electron density distribution, sorted in descending order. The number of pixels is on the abscissa.

Let us first explain the source of the algorithm name. The total charge in a map is assumed to be

ctot = Σi ρi,

where i varies over all grid points and the flipped charge is defined as

cflip = Σρi<δ |ρi|,
where the summation goes over all points satisfying the condition ρi < δ.
The various applications showed that, for proper values of δ, the ratio cflip /ctot
should lie at around 0.9. As a rule of thumb, it should roughly correspond to
inverting the low density pixels shown in Fig. 9.2b. Flipping the density in
this region modifies the electron density distribution and allows us to explore
the phase space efficiently. Giacovazzo and Mazzone (2011) observed that the
flipped region corresponds to that with the largest values of electron density variance. As in other EDM techniques, the algorithm modifies a model
without destroying it; in this way the region to reverse is not part of the solution, otherwise the convergence would never be reached. When a good model
is obtained, reversing the sign of the density for high variance pixels provides
negligible perturbation of the model, which cannot be destroyed.
δ is the most critical parameter; it usually changes during the phasing
process and sometimes has to be tuned to lead the algorithm to succeed.
During each charge flipping cycle, the crystallographic residual and the
skewness coefficient of the density map (see Appendix 8.B) are calculated.
The convergence is assumed to be reached when a sharp increase in relative
skewness occurs, accompanied by a drop in the crystallographic residual. The
phasing process may usually be subdivided into an initial transient step, a long
stagnation period where the phase space is extensively explored, a stable state
where a sharp increase in the relative skewness occurs, accompanied by a drop
in the crystallographic residual. Such a sharp improvement in the figure of
merit denotes that convergence has been reached.
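The full cycle, including the weak-reflection treatment of step 5 and the δ threshold, may be sketched in 1-D (a toy illustration under simplifying assumptions — no space-group handling, no F000 bookkeeping; all names are ours, not from the original programs):

```python
import numpy as np

def charge_flip(f_obs, delta, n_cycles=100, weak_frac=0.2, seed=0):
    """Toy 1-D charge flipping. f_obs: observed amplitudes on a full FFT
    grid; delta: small positive flipping threshold on the density."""
    rng = np.random.default_rng(seed)
    n = len(f_obs)
    f = f_obs * np.exp(2j * np.pi * rng.random(n))   # random starting phases
    weak = f_obs < np.quantile(f_obs, weak_frac)     # 'weak' reflections
    for _ in range(n_cycles):
        rho = np.real(np.fft.ifft(f))
        rho_mod = np.where(rho < delta, -rho, rho)   # flip low-density pixels
        g = np.fft.fft(rho_mod)
        phi = np.angle(g)
        # strong reflections: observed amplitude + calculated phase;
        # weak reflections: calculated amplitude, phase shifted by pi/2
        f = f_obs * np.exp(1j * phi)
        f[weak] = np.abs(g[weak]) * np.exp(1j * (phi[weak] + np.pi / 2))
    return np.real(np.fft.ifft(f))
```

In a real run, the residual and the map skewness (Appendix 8.B) would be monitored at each cycle, and δ tuned until the sharp skewness increase signalling convergence appears.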
Charge flipping solves the structures in P1. This trick was first applied by
Sheldrick and Gould (1995), when solution in the correct space group was not
being successful; it was later adopted by other authors. The main advantage
of solving a structure in P1 is that the restraints imposed by symmetry on the
phase values are relaxed and the phases may sometimes converge smoothly
to the correct values. However, the use of P1 remained infrequent for direct
methods; indeed, symmetry is important prior information which should not
generally be suppressed. Charge flipping, however, renounces this information;
it is not clear why, but its efficiency decreases dramatically when phasing is
attempted using the correct space group symmetry.
In accordance with the above observations, the charge flipping crystal structure solution step is followed by a second step, restating the correct space group
symmetry; i.e. it locates the space group symmetry elements in the P1 density
map. A technique is therefore necessary to automatically find the shift between
the origin of the P1 map and the conventional origin of the space group. This
process has to be accompanied by density averaging over the symmetry equivalent (in the correct space group) grid points. The algorithm used for returning
to the correct space group is similar to the RELAX procedure developed by
Burla et al. (2000), and later improved by Caliandro et al. (2007a). Since
RELAX also plays an important role in the VLD approach, we describe it in
Appendix 9.B.
9.3 The VLD phasing method
In Section 7.2, we assumed that a model structure is available; to deal in an
optimal way with the phasing problem, we calculated the joint probability
distribution P(R, Rp, φ, φp) (see equation (7.3)). In Section 7.3, we showed
its extraordinary usefulness for optimization of some widely used crystallographic tools and phasing procedures; we refer in particular to the observed
Fourier synthesis via the use of the weight m, and to the difference Fourier
synthesis via the use of the Read coefficients, mE − σA Ep .
Let us now consider ρ, ρp , ρq ; these are the target, the model, and their
ideal difference structure; ρq = ρ − ρp has the property that, summed to ρp ,
it provides ρ, whatever the quality of ρp. Let R, Rp, Rq, φ, φp, φq be
the corresponding normalized diffraction amplitudes and phases. Would the
distribution,
P(R, Rp, Rq, φ, φp, φq),    (9.1)
be more useful than (7.3)? The hope is that including into the probabilistic
approach the additional variate Eq could lead to more accurate conditional
distributions, estimating phases given three rather than two magnitudes.
Distribution (9.1) (studied by Burla et al., 2010a) is the theoretical basis
of the VLD (vive la difference) algorithm; for the interested reader, some
details are quoted in Appendix 9.A.1, together with the conditional distributions which support VLD. The VLD algorithm (Burla et al., 2010b, 2011a,b) as
an ab initio phasing technique is described in Sections 9.3.1 to 9.3.2; its applications to ab initio phasing are summarized in Section 9.3.3. We delay until
Section 10.4 some VLD applications in combination with Patterson deconvolution techniques: VLD combination with molecular replacement is described
in Section 13.10.
9.3.1 The algorithm
Distribution (9.1) is practicable only if:
(i) measurement errors are included in the mathematical model;
(ii) the parameter σA (calculated between the model and the target structure)
is not unity.
Indeed, according to the definition of ρq , if condition (i) is violated, then
Fq = F − Fp is determined perfectly by the other two variates, and cannot
be introduced as a third variable in equation (9.1). If condition (ii) is violated
(i.e., σA = 1), then ρp ≡ ρ and ρq ≡ 0, and it is not necessary to calculate a
six-variate distribution (indeed, Fq will be identically equal to zero).