PROTEINS: Structure, Function, and Genetics 34:453–463 (1999)
Molecular Dynamics and Accuracy of NMR Structures:
Effects of Error Bounds and Data Removal
François-Regis Chalaoux, Seán I. O’Donoghue, and Michael Nilges*
Structural Biology Programme, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany
The effect of internal dynamics on
the accuracy of nuclear magnetic resonance (NMR)
structures was studied in detail using model distance restraint sets (DRS) generated from a 6.6
nanosecond molecular dynamics trajectory of bovine pancreatic trypsin inhibitor. The model data
included the effects of internal dynamics in a very
realistic way. Structure calculations using different
error estimates were performed with iterative removal of systematically violated restraints. The accuracy of each calculated structure was measured
as the atomic root mean square (RMS) difference to
the optimized average structure derived from the
trajectory by structure factors refinement. Many of
the distance restraints were derived from NOEs
that were significantly affected by internal dynamics. Depending on the error bounds used, these
distance restraints seriously distorted the structure, leading to deviations from the coordinate average of the dynamics trajectory even in rigid regions.
Increasing error bounds uniformly for all distance
restraints relieved the strain on the structures.
However, the accuracy did not improve. Significant
improvement of accuracy was obtained by identifying
inconsistent restraints with violation analysis, and
excluding them from the calculation. The highest accuracy was obtained by setting bounds rather tightly, and
removing about a third of the restraints. The limiting
accuracy for all backbone atoms was between 0.6 and
0.7 Å. Also, the precision of the structures increased
with removal of inconsistent restraints, indicating that
a high precision is not simply the consequence of tight
error bounds but of the consistency of the DRS. The
precision consistently overestimated the accuracy. Proteins 1999;34:453–463. © 1999 Wiley-Liss, Inc.
Key words: NOE; precision; protein structure; simulated annealing
Abbreviations: BPTI, bovine pancreatic trypsin inhibitor; DRS, distance restraint set; MD, molecular dynamics; NMR, nuclear magnetic resonance; NOE, nuclear Overhauser effect; PDB, Protein Data Bank; RMS, root mean square.
Grant sponsor: Deutsche Forschungsgemeinschaft; Grant number: Ni499/1-1; Grant sponsor: The Supercomputing Resource for Molecular Biology at the European Molecular Biology Laboratory, funded by a European Union Human Capital and Mobility Access to Large Scale Facilities; Grant number: ERBCHGECT940062.
François-Regis Chalaoux’s present address is Synthelabo Biomoleculaire, Strasbourg 67080 CEDEX, France.
*Correspondence to: Michael Nilges, Structural Biology Programme, European Molecular Biology Laboratory, Meyerhofstr. 1, D-69117 Heidelberg, Federal Republic of Germany.
Received 10 August 1998; Accepted 11 November 1998
Internal dynamics has long been acknowledged as a fundamental problem in the derivation of the three-dimensional structures of macromolecules by nuclear magnetic resonance (NMR).1,2 All parameters measured by NMR are time- and ensemble-averages. In contrast to X-ray crystallography, where the structure factors are linear superpositions over different conformations, these averages may be strongly non-linear, and depend on the time-scale of the motion. For the NOE, the analysis is complicated because different types of dynamics may either increase the NOE (fluctuations in the inter-proton distance) or decrease it (fluctuations in the angle between the inter-proton vector and the internal coordinate system), leading to either under- or over-estimation of the average distance between the protons. These effects may partially cancel each other.3
In the standard structure calculation protocol for macromolecular NMR structures, one estimates the errors qualitatively, and fits a rigid structure to appropriately loosely set error bounds. Different suggestions have been made for the choice of these error bounds; e.g., restraints are classified into three classes (strong, medium, weak) with upper limits of 2.7, 3.6, and 5.0 Å,4,5 all upper limits are set to 6 Å,6 or the lower and upper limits are set as functions of the volume or the derived distance.7–10 Currently, there is no consensus as to the best criterion for setting these bounds. How the selection of error bounds affects the quality and accuracy of the structures has been the focus of several studies.5,6,11,12 While the practice of setting loose error bounds has been generally successful (the majority of NMR structures have been determined this way), it is clear that information content of the data is lost if the error bounds are set too loosely. This is unsatisfactory in particular for structure validation. On the other hand, error bounds that are too narrow will lead to distortions in the structure.
How structure accuracy is affected by incorrect distance estimates and the choice of error bounds needs to be studied in a carefully designed model system that allows a meaningful assessment of the accuracy. It is difficult to define accuracy for a structure undergoing significant internal dynamics, since we do not know which rigid reference structure to refer to. In one study,5 model data
was generated from an ensemble of experimental NMR
structures of protein G,13 and the reference structure was
chosen as the minimized coordinate average over the
ensemble. The study used a realistic amount of experimental data, since experimentally observed NOEs were used
as a basis for generating the model data, and it included
dynamic effects to some degree, since model distances were
calculated from the ensemble as $\langle r^{-6}\rangle^{-1/6}$ averages, where r
is the distance between two protons. The conclusion of the
study was that the limiting accuracy for an NMR structure
is around 0.4 Å, which is also considered the limiting
accuracy for X-ray crystal structures.14 A similar number
was derived from the comparison of several NMR and
crystal structures.15 However, Zhao and Jardetzky6 have
criticized the model study, pointing out that it contains
circular reasoning; in particular, in the model study, the
same energy parameters were used as for the calculation
of the original ensemble of experimental structures. They
argue that the reference structure has to be determined
with an independent technique. Therefore, they used the
protonated X-ray crystal structure of bovine pancreatic
trypsin inhibitor (BPTI) as reference structure and for the
derivation of a model data-set. Random noise was added to
the inter-proton distances in the X-ray structure to mimic
the effects of experimental errors and internal dynamics.
The accuracy in their study was around 1 Å. The relevance
of their study is limited, since the ‘‘noise’’ originating from
internal dynamics is not random but contains correlations.
It is also difficult to evaluate the results, since the noise
was added in a more or less arbitrary manner.
The best technique to generate model NOE data is to use
molecular dynamics (MD) calculations, since spectral densities and thus cross-relaxation rates and NOEs can be
calculated with few approximations, and no a priori assumptions need to be made about the nature of the
averaging.3,16,17–20 MD-generated model data was used to
test time-averaged restraint techniques.21–23 However,
these studies again suffered from circular reasoning, since
distances were extracted from the trajectories as simple
distance averages (i.e., angular averaging was neglected
and thus perfect knowledge of the type of averaging was
assumed), and since the force fields used for the
data-generating trajectory and the refinement trajectory
were identical.
The principal problem with internal dynamics is that it
can lead to mutually inconsistent NOEs. Recently, methods have been introduced to automatically identify and
remove incorrect distance restraints in model building,24,25
by an iterative statistical analysis of the violations. Similar ideas were introduced to identify relatively large errors
in distance restraints due to incorrect assignments26 and
noise peaks.10,26,27 The effect of removing or re-setting
slightly inconsistent NOEs, such as those originating from
internal dynamics, has not been tested yet. In a similar
spirit, error bounds are often increased manually only for
certain restraints to avoid violations (e.g., ref. 15).
We have used an MD trajectory of BPTI to generate a
model system that avoids circular reasoning and introduces errors in inter-proton distances due to internal
dynamics in a realistic and meaningful way. Starting from
the X-ray crystal structure, the MD trajectory was calculated
for several nanoseconds in order to allow significant
internal dynamics to occur. NOEs were extracted from the
trajectory by calculating spectral densities from vector
autocorrelation functions.20 Roughly half of the correlation
functions had not converged in 6.6 ns simulation time. For
the present study, in order to obtain a complete set of
spectral densities, we assumed slow dynamics for the
non-converged correlation functions, and estimated spectral densities from average order parameters and $\langle r^{-6}\rangle$
distance averages over the trajectory.
NOEs to protons not assigned experimentally28 were
removed. In this way, we arrived at a realistic number of
distance restraints, when compared with an experimental
study of the same protein.29 “Noise” in the distance restraints in our study arises from under- or over-estimation
of the distances calculated from the spectral
densities, when compared to the arithmetic mean of the
distances in the trajectory, or the distances in the reference
structure. Hence, compared with the previous studies by
Clore et al.5 and Zhao and Jardetzky,6 our system has a
more realistic model of the noise. A single reference
structure was calculated from the MD trajectory using the
probability map method.30 This method produces an average structure with good covalent geometry and packing.
Since the method uses structure factor refinement, the
structure is equivalent to an X-ray structure refined
against the average structure factor of the trajectory.
Using this model system, we addressed several points.
Firstly, we systematically investigated the effect of different error bounds on structure quality and accuracy. In our
model system, low-energy structures could only be obtained by using very wide bounds. However, these structures had poor accuracy. Secondly, we studied the effect of
removing systematically violated restraints from the data.
We found that those distances most affected by internal
dynamics were identified preferentially in the violation
analysis, and that the removal of these restraints improved the accuracy. The most accurate structures were
obtained with rather tight error bounds and roughly one
third of the data removed. In all our calculations, precision
overestimated accuracy. Finally, we tried to realistically
estimate the accuracy that can be achieved for a protein
with significant internal dynamics.
Molecular Dynamics Trajectory
A molecular dynamics simulation of 6.6 ns was performed using the program X-PLOR31 employing the
CHARMM extended atom energy function PARAM1932
(see ref. 20 for details). The initial set of atomic coordinates
was obtained from the crystal form II structure of BPTI.33
Polar hydrogens were added34 and the resulting structure
minimized with 100 steps of conjugate gradient minimization, followed by 50 ps of Langevin dynamics with an
integration step of 2 fs to release initial stress and to
remove bad contacts. Only polar hydrogens were treated
explicitly resulting in a total system size of 568 atoms. An
implicit solvent model was employed, using a distance-dependent
dielectric constant ε = R, scaling of charges on
Lys, Arg, Glu, and Asp residues by a factor of 0.3, and
solving the Langevin equation for the solvent accessible
side-chains with a friction coefficient of 20 ps⁻¹ and including
random forces. The cutoff for the non-bonded list generation
was set to 9.5 Å, and a switching function32 was applied to
non-bonded interactions between 5.0 and 9.0 Å.35 Initial
velocities were assigned from a Maxwell distribution at
300 K. The Newton/Langevin equations of motion were
integrated with a time step of 2 fs for an overall time of 6.6
ns. Bond lengths were kept rigid during the simulation by
use of the SHAKE-method.36 Complete coordinate sets
were written every 0.1 ps. For each coordinate set, nonpolar hydrogens were then added with X-PLOR.34
Generation of the Model Data-Set
Details of the NMR analysis of the trajectory are presented elsewhere.20 We first calculated the rotational
correlation function averaged over three spatial dimensions.
We then selected all proton-proton pairs for which the
$\langle r^{-6}\rangle^{-1/6}$ distance averaged over the full length of the trajectory
was less than 4.5 Å. Vector autocorrelation functions were
then calculated for each selected proton pair. Since aliphatic protons were added only after the calculation of the
trajectory, no information about the rotation of methyl
groups could be extracted. We therefore used the average
position of the methyl protons as a reference position for
each methyl group and treated the methyl group as a
single hydrogen for the purpose of NOE calculations.
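The pair-selection step described above can be illustrated with a short sketch. It computes the $\langle r^{-6}\rangle^{-1/6}$ average of a distance time series and keeps pairs below the 4.5 Å cutoff. The function names and the dictionary layout are hypothetical, chosen for illustration; they are not taken from the authors' X-PLOR scripts.

```python
import numpy as np

def r6_average(distances):
    """Return <r^-6>^(-1/6) for a 1-D series of distances (in Å)."""
    r = np.asarray(distances, dtype=float)
    return float(np.mean(r ** -6) ** (-1.0 / 6.0))

def select_pairs(pair_distances, cutoff=4.5):
    """Keep proton pairs whose r^-6-averaged distance is below the cutoff.

    pair_distances: dict mapping a (hypothetical) pair label to its
    distance time series over the trajectory.
    """
    averaged = {pair: r6_average(d) for pair, d in pair_distances.items()}
    return {pair: r for pair, r in averaged.items() if r < cutoff}

# Hypothetical two-state pair: half the frames at 3 Å, half at 6 Å.
series = [3.0, 6.0] * 50
print(round(r6_average(series), 2))  # 3.36, far below the arithmetic mean of 4.5
```

The example shows why this average is dominated by short distances: a pair that spends only half its time in close contact still yields an apparent distance close to the contact distance.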
For each correlation function, a convergence length was
estimated by comparing two correlation functions calculated from trajectories whose lengths differed by 10%. The convergence length was defined as the time at which the two
correlation functions differed by more than 2.5%. Correlation functions with a convergence length of less than 10 ps
were excluded from the analysis. The rotational correlation function was factored out, and the remaining internal
correlation function was analyzed for plateaus within the
convergence lengths to define the order parameter and
effective correlation time.17,37 The order parameters were
used to extend the correlation functions to infinity, and
spectral densities were then calculated by numerical integration. Cross relaxation rates could then be extracted as
described.38 Effective distances reff were calculated from
these rates assuming a simple $r^{-6}$ dependence.
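The convergence criterion can be sketched as follows, assuming the two correlation functions are sampled on the same lag grid. Whether the 2.5% difference is relative or absolute is not stated in the text; the sketch takes it relative to the longer-trajectory function, which is an assumption. The function name is hypothetical.

```python
import numpy as np

def convergence_length(c_full, c_short, dt_ps, tol=0.025):
    """First lag time (ps) at which two correlation functions differ by more than tol.

    c_full, c_short: correlation functions on the same lag grid (spacing dt_ps),
    computed from trajectories of slightly different length.
    The difference is measured relative to c_full (an assumption).
    If the threshold is never exceeded, the full common length is returned.
    """
    a = np.asarray(c_full, dtype=float)
    b = np.asarray(c_short, dtype=float)
    n = min(len(a), len(b))
    rel = np.abs(a[:n] - b[:n]) / np.maximum(np.abs(a[:n]), 1e-12)
    exceeded = np.nonzero(rel > tol)[0]
    first = int(exceeded[0]) if exceeded.size else n
    return first * dt_ps
```

Under the rule quoted above, a pair whose returned value is below 10 ps would be treated as non-converged and excluded from the model-free analysis.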
For all proton pairs excluded from the analysis by the
convergence criterion, cross-relaxation rates were estimated by using $\langle r^{-6}\rangle$ averaged over the trajectory,
multiplied by the overall average of the order parameters. This seemed justified since dynamic processes not
converged in the 6.6 ns trajectory of a small protein could
reasonably be assumed to be on the same timescale as the
overall tumbling. By this procedure, the number of distances reff in the data set could be nearly doubled.
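The slow-dynamics estimate for non-converged pairs can be written out explicitly. In this sketch, $\overline{S^2}$ denotes the overall average order parameter and $r_{ij}$ the distance between protons $i$ and $j$; the notation is introduced here for illustration and is not the authors' own.

```latex
\sigma_{ij} \;\propto\; \overline{S^2}\,\langle r_{ij}^{-6}\rangle
\quad\Longrightarrow\quad
r_{\mathrm{eff}}
  = \left(\overline{S^2}\,\langle r_{ij}^{-6}\rangle\right)^{-1/6}
  = \left(\overline{S^2}\right)^{-1/6}\,\langle r_{ij}^{-6}\rangle^{-1/6}
```

Because $\overline{S^2} \le 1$, this estimate is never shorter than the plain $\langle r^{-6}\rangle^{-1/6}$ average. For a hypothetical $\overline{S^2} = 0.8$, the factor $(\overline{S^2})^{-1/6}$ lengthens the distance by about 3.8%.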
All NOEs to protons where no chemical shift assignment
was available28 were removed.
Reference Structure
The average structure of the trajectory was generated by
superposition of all frames onto the automatically determined
rigid part of the protein,39,40 using X-PLOR. This
structure was then optimized by generating a probability
density from the fitted frames, and refining into this
density using X-ray refinement.30
Error Bounds
Error bounds were derived by qualitative classification,
set proportional to reff or $r_{\mathrm{eff}}^2$, or derived from the known difference
between reff and the reference structure (see Results). To
obtain the DRS RSX (see Table I), the distances reff were
binned into 0.5 Å bins. In each bin, we calculated the mean
and standard deviation of rreference. The points corresponding to the mean ± X times the standard deviation were
fitted with straight lines. The error bounds were then
defined by these lines.
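A minimal sketch of this binning-and-fitting procedure is given below. The function names, the use of the sample standard deviation, the choice of bin centers as the x-values for the fit, and the skipping of bins with fewer than two points are all assumptions made for illustration.

```python
import numpy as np

def fit_bounds(r_eff, r_ref, x=1.0, bin_width=0.5):
    """Fit straight-line bounds to mean -/+ x*std of r_ref within r_eff bins.

    Returns ((slope_lo, intercept_lo), (slope_up, intercept_up)).
    """
    r_eff = np.asarray(r_eff, dtype=float)
    r_ref = np.asarray(r_ref, dtype=float)
    bins = np.floor(r_eff / bin_width).astype(int)
    centers, lo, up = [], [], []
    for b in np.unique(bins):
        vals = r_ref[bins == b]
        if vals.size < 2:            # too sparse for a standard deviation
            continue
        mean, std = vals.mean(), vals.std(ddof=1)
        centers.append((b + 0.5) * bin_width)
        lo.append(mean - x * std)
        up.append(mean + x * std)
    lower = np.polyfit(centers, lo, 1)   # returns [slope, intercept]
    upper = np.polyfit(centers, up, 1)
    return tuple(lower), tuple(upper)

def bounds_for(r, lower, upper):
    """Evaluate the fitted lines at an effective distance r (Å)."""
    return lower[0] * r + lower[1], upper[0] * r + upper[1]
```

Fitting the two lines separately lets the lower and upper bounds be asymmetric about reff, which matters when under-estimation and over-estimation of distances are not equally likely.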
Iterative Structure Calculation
Structures were calculated with X-PLOR with a standard simulated annealing protocol starting from random
torsion angles.41 In some of the calculations we omitted the
high temperature stage, since the two cooling stages alone
converged well enough. The NMR refinement parameter
files (TOPALLHDG and PARALLHDG, version 4) were
used. This version is consistent with the ideal values in the
X-ray structure refinement parameter file PARHCSDX,42
and uses atom radii rather similar to those used in the
distance geometry program DISGEO.11 We note that the
parameters, in particular the non-bonded parameters, are
quite different from the PARMH19 parameters used to
generate the MD trajectory.
The ARIA modules, interfaced to X-PLOR (versions 3.1
and 3.851) were used to analyze structures and restraint
violations, essentially as described.10,41 ARIA performs two
essential tasks: it assigns ambiguous NOEs by iteratively
removing assignment possibilities, and it identifies consistently violated restraints by a statistical analysis of restraint violations, similar to self-correcting distance geometry.24 Only the second task was used for the present study.
In iteration zero, all restraints were used. In each
following iteration, the eight structures with the lowest
energy from the previous iteration were selected. We
performed violation analysis as described24 and calculated
the fraction Rvio of structures in which a particular restraint is violated by more than a threshold vtol:
$$R_{\mathrm{vio}} = \frac{1}{S_{\mathrm{conv}}} \sum_{i=1}^{S_{\mathrm{conv}}} \left[ \Theta(D_i - U - v_{\mathrm{tol}}) + \Theta(L + v_{\mathrm{tol}} - D_i) \right]$$
where Θ(x) is the Heaviside step function and Sconv is the
number of lowest energy structures (i.e., eight). The parameter vtol was set to 0.1 Å in iterations one and two, and to
0.0 Å in iterations three and four. If Rvio exceeded a
threshold for a particular restraint, this restraint was
removed from the list. In all calculations, the threshold for
Rvio was set to 0.75. Since there were no true noise peaks in
the data and convergence in each iteration to the correct
fold was essentially 100%, we employed a higher value for
this threshold than usual (0.5; see ref. 41). The complete
list of restraints was analyzed in this way in each iteration, so that restraints removed in one iteration could
reenter the calculation in a following iteration.
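The violation analysis can be sketched in a few lines. The array layout and function names are hypothetical. The lower-bound test is written here as D < L − vtol, which is my reading of "violated by more than a threshold vtol"; with vtol = 0 the convention makes no difference.

```python
import numpy as np

def violation_fraction(d, lower, upper, v_tol):
    """Fraction of structures in which each restraint is violated by more than v_tol.

    d: array (n_structures, n_restraints) of distances in the selected
       lowest-energy structures.
    lower, upper: arrays (n_restraints,) of distance bounds.
    """
    d = np.asarray(d, dtype=float)
    too_long = d > np.asarray(upper, dtype=float) + v_tol
    too_short = d < np.asarray(lower, dtype=float) - v_tol
    return (too_long | too_short).mean(axis=0)

def active_restraints(d, lower, upper, v_tol=0.1, threshold=0.75):
    """Boolean mask of restraints kept for the next iteration (R_vio <= threshold)."""
    return violation_fraction(d, lower, upper, v_tol) <= threshold
```

With eight structures and a threshold of 0.75, a restraint is dropped only when it is violated in seven or eight structures. Because the mask is recomputed from the complete restraint list in every iteration, a restraint dropped once can become active again later.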
All restraints, parameter files and protocols used in this
study are available on request.
MD Trajectory and Reference Structure
The trajectory showed significant dynamics especially in
the loops. The locations of the most mobile regions correlate well with the experimental NMR structure ensemble,29 normal mode calculations,43 and differences between the X-ray crystal structures in different crystal
forms.44 However, the fluctuations are significantly larger
than those found in the experimental NMR ensemble (ref.
29; see ref. 20 for a more detailed discussion).
The rigid part of the molecule, determined with an
automated fitting procedure,39,40 comprised residues 1–8,
11–12, 16–38, and 41–58. The backbone RMS fluctuation
around the average structure for this region was 0.68 Å,
for all backbone atoms 0.78 Å. The reference structure was
determined by refining the average structure against the
averaged structure factors calculated from the trajectory.30
Validation of the reference structure with PROCHECK
showed 37 of the 46 non-proline and non-glycine residues
in the most favored regions and 9 residues in the additional allowed regions.
Fig. 1. Scatter plot of the distances reff determined from spectral
densities against the trajectory averages $\langle r_{\mathrm{traj}}\rangle$ (cf. Figure 11 in ref. 20). The
error bounds for the DRSs RWMS and RL25 are indicated.
The Model Data
The basis of the model data-set was the set of cross-relaxation rates, $\sigma_{ij}$, calculated from the trajectory via proton-proton vector autocorrelation functions and model-free
analysis.37 Whenever the correlation functions had converged (see Methods and ref. 20), effective distances reff
were determined from the cross-relaxation rates using a
standard $r^{-6}$ dependency (reff,2 in ref. 20). For all non-converged correlation functions, slow dynamics was assumed, and reff was estimated as $(S^2 \langle r^{-6}\rangle)^{-1/6}$, where $S^2$ is
the average over all order parameters of the converged
correlation functions, and r is the distance in the MD
trajectory. After removal of all distances involving protons
for which no chemical shift was reported,28 and with the
maximum observable distance set to 4.2 Å, a total of 1,543
distances were obtained. Of these, 828 distances were
derived from converged correlation functions, the remaining from $(S^2 \langle r^{-6}\rangle)^{-1/6}$ averages. In total, 718 were intra-residue, 233 sequential, 188 medium range, and 404 long
range. We have deliberately not removed any intra-residue
restraints since they are an integral part of the data. The
number of restraints compares to 642 upper limit restraints obtained experimentally.29 If one takes into account that the model data set contains restraints for all
inter-proton distances, including trivial ones that are fixed
by the covalent geometry, the number of restraints in the
model data set is realistic. None of the calculations in this
paper included distance restraints for hydrogen bonds or
torsion angle restraints. The three disulphide bonds were
introduced as distance restraints (2.02 Å) between the
sulphur atoms.
Fig. 2. Ratios $\langle r_{\mathrm{traj}}\rangle / r_{\mathrm{eff}}$, depending on residue number (cf. Figure 13 in
ref. 20). The bigger black dots indicate the average for each residue; the
black lines at the bottom of the figure indicate the secondary structure.
In an optimal experiment, one would be able to directly
measure $\langle r_{\mathrm{traj}}\rangle$, the arithmetic average of the distance in the
structure over time. From the cross-relaxation rates,
without applying any corrections, we obtain an effective
distance reff. Figures 1 and 2 compare the distances reff
determined from spectral densities against the distances
$\langle r_{\mathrm{traj}}\rangle$. For most residues the average ratio $\langle r_{\mathrm{traj}}\rangle / r_{\mathrm{eff}}$
is close to one. However, for most residues there are also individual values
significantly larger than one, which indicates serious
underestimation of the distance due to internal dynamics.
TABLE I. Distance Restraint Sets (DRS) Used in This Study†
DRS     Type of bounds
AL6     upper = 6 Å
WMS     weak, medium, strong
L25     Δ+/− = 0.25 reff
L12     Δ+/− = 0.125 reff
Q12     Δ+/− = 0.125 reff²
Q06     Δ+/− = 0.0625 reff²
Q03     Δ+/− = 0.03125 reff²
M06     Δ+/− = 0.0625 reff²
M03     Δ+/− = 0.03125 reff²
S68     Δ+/− = σ
S38     Δ+/− = σ/2
S20     Δ+/− = σ/4
†For each DRS, the table reports the number of restraints for which the distance in the
reference structure lay outside lower and upper bound, the number of restraints that were
excluded during the calculation (Ncalc), and the number of restraints that were reset during
the calculation; the numeric columns are not preserved in this copy.
The total number of restraints was 1,543 in all calculations.
Because of the large number of reff distances calculated as
$(S^2 \langle r^{-6}\rangle)^{-1/6}$, the deviations are more severe than in the
previous analysis.20 As expected, the most severe deviations are found for the most mobile residues (around
residues Tyr10 and Arg40).
Several distance restraint sets (DRS) were derived from
this data-set and the reference structure. In one set (RAL6),
all upper and lower bounds were set to 6 Å and 0 Å,
respectively, as suggested by Zhao and Jardetzky.6 For the
second set (RWMS) we used the classification of 2.7 Å, 3.6 Å,
and 5.0 Å for strong, medium, and weak NOEs (e.g., ref.
15), where a strong NOE was defined by reff < 2.5 Å, a
medium NOE by reff < 3.3 Å, and a weak NOE by reff < 4.2 Å; all
lower bounds were set to 1.8 Å. Several DRSs were
generated by setting the estimated error to a polynomial
function of reff: for RLX, the error is $\Delta_{+/-} = L\,r_{\mathrm{eff}}$; for RQX,
$\Delta_{+/-} = Q\,r_{\mathrm{eff}}^2$. The DRSs RMX are identical to RQX, but in the
calculations violated distance bounds are not removed but
loosened specifically.
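The bound definitions in this paragraph translate directly into code. The sketch below assumes that the polynomial error is applied symmetrically around reff and that distances at or beyond the 4.2 Å observation limit are rejected; both are my reading of the text, and the function names are hypothetical.

```python
def wms_bounds(r_eff):
    """Qualitative strong/medium/weak classification; returns (lower, upper) in Å."""
    if r_eff < 2.5:
        upper = 2.7      # strong NOE
    elif r_eff < 3.3:
        upper = 3.6      # medium NOE
    elif r_eff < 4.2:
        upper = 5.0      # weak NOE
    else:
        raise ValueError("distance beyond the 4.2 Å observation limit")
    return 1.8, upper

def polynomial_bounds(r_eff, coeff, power):
    """Symmetric bounds r_eff -/+ coeff * r_eff**power (power 1: linear, 2: quadratic)."""
    delta = coeff * r_eff ** power
    return r_eff - delta, r_eff + delta

# Worked example for r_eff = 4.0 Å with two of the coefficients used in the study.
print(polynomial_bounds(4.0, 0.125, 1))    # (3.5, 4.5)
print(polynomial_bounds(4.0, 0.0625, 2))   # (3.0, 5.0)
```

The example illustrates that the quadratic form widens the bounds faster with distance: at 4 Å a quadratic coefficient of 0.0625 already gives a wider interval than a linear coefficient of 0.125.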
In our model system, the error of each determined
distance reff is known from a comparison with the distance
rreference in the reference structure. We generated three
DRSs with error bounds directly derived from this difference: for the DRSs (RSX), the error bounds were set such
that approximately X% of the distances in the reference
structure, rreference, lay between lower and upper bound,
where X was set to 68, 38, and 20. This corresponds to 1,
0.5, and 0.25 σ from the mean. This results in somewhat
tighter lower bounds since under-estimation of distances is
more important than over-estimation (see Figs. 1 and 2).
The number of restraints for which the distances in the
reference structure were between lower and upper bound
are listed in Table I.
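The correspondence between the percentages 68, 38, and 20 and the widths 1, 0.5, and 0.25 σ can be checked with the error function, assuming the deviations within each bin are normally distributed (an idealization; the actual distributions are skewed toward under-estimation).

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a normal distribution within ±k standard deviations of the mean."""
    return erf(k / sqrt(2.0))

for k in (1.0, 0.5, 0.25):
    print(k, round(100 * coverage(k)))   # prints 68, 38, and 20 for the three widths
```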
Structure Quality, Precision, and Accuracy
Fig. 3. Energy-sorted RMSave plots for all backbone atoms. In contrast
to the original suggestion,45 the plot shows the maximum RMS difference
from the average structure (see ref. 46), not the pairwise RMS difference.
Diamond: iteration 0; asterisk: iteration 1; square: iteration 2; triangle:
iteration 3; dot: iteration 4. (a) DRS RAL6. (b) DRS RS38. (c) DRS RS20.
We used a standard simulated annealing protocol employing Cartesian MD, starting from random torsion angle
structures (for exact parameters, see ref. 10). In each
iteration, 20–25 structures were calculated. The convergence of the protocol was very good (Fig. 3). For each DRS,
the initial calculation (‘‘zeroth’’ iteration) was performed
with all restraints. At the beginning of iterations one to
four, each restraint was checked for systematic violations
in the structures of the previous iteration, and a restraint
was removed if it was violated in more than six of the eight
structures with lowest total energy, essentially as suggested26 and described before10 (see also Methods). In total,
four refinement iterations were performed. The number of
active restraints decreased with iteration for all DRSs, and
the tighter the bounds, the more data were excluded
(Table I).
Some of the calculated ensembles are shown together
with snapshots of the trajectory in Figure 4. The buried
side-chain of Phe22 has two distinct conformations.
For all DRSs apart from RAL6, RMS deviations from ideal
covalent geometry and experimental restraints are high
in the first iteration (see Fig. 5), indicating that the
restraints cannot be satisfied simultaneously in a single
rigid structure. These RMS deviations decrease rapidly
with iteration as the systematically violated restraints are
removed, and they plateau around the third iteration. In
contrast, RAL6 has only slightly elevated RMS values for
covalent energy terms and distance restraints in the
zeroth iteration, showing that there is little strain in the
structures. Accordingly, fewer restraints are excluded
(Table I).
Fig. 4. Cα traces of the MD trajectory and some of the calculated
ensembles. The reference structure is shown in thick lines. The side-chain
of the mobile buried residue Phe22 is shown. (a) Frames from the
trajectory every 500 ps. (b) DRS RAL6, iteration 0. (c) DRS RS38, iteration 0.
(d) DRS RS38, iteration 4.
TABLE II. Accuracy, Precision, and Structure Quality in
Iterations 0 and 4†
Iteration 0     Iteration 4
†Accuracy (RMSave,ref, RMSref), precision (RMSave), and structure
quality for different DRSs in iterations 0 and 4. For each DRS, the
structure quality was assessed by the WhatIf quality index,47 the
average PROSA energy per residue,48 and the number of residues in
core regions of the Ramachandran plot (φ–ψ).49
Other quality indices such as the WhatIf quality index47
and average PROSA energy48 vary less with iteration (cf.
Table II). Since the fold of the protein varies only little
between iterations and different bound sets, this is not
surprising. The PROSA energy improves, while the WhatIf
quality index deteriorates slightly. With distance data
derived from experiments, we observed a good correlation
between these quality indices and refinement iteration.10
The present data are derived from a simulation.
Precision has been defined as the RMS difference from
the average structure6 (RMSave). For accuracy, two definitions are possible: the average of the RMS differences of
each single structure from the reference structure (RMSref),
and the RMS difference of the average structure from the
reference structure (RMSave,ref ). Both measures are reported in Table II, only the first measure in Figure 6.
RMSave,ref is usually somewhat smaller than RMSref, i.e.,
the average structure is more accurate than the individual
structures on average, by maximally 0.25 Å.
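The three RMS measures can be sketched with a standard least-squares superposition (Kabsch algorithm). This is an illustration, not the authors' protocol: the average structure is built here from a single-pass fit of all members onto the first one, all atoms passed in are used for fitting, and the function names are hypothetical.

```python
import numpy as np

def superpose(mobile, target):
    """Least-squares superposition of mobile onto target; both are (n_atoms, 3)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    p, q = mobile - mc, target - tc
    u, _, vt = np.linalg.svd(p.T @ q)
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0   # exclude improper rotations
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return p @ rot + tc

def rms(a, b):
    """Atomic RMS difference between two already superposed coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def accuracy_and_precision(ensemble, reference):
    """Return (RMSave, RMSref, RMSave,ref) for an ensemble of shape (n, n_atoms, 3)."""
    # Assumption: single-pass average after fitting every member onto the first.
    fitted = np.array([superpose(s, ensemble[0]) for s in ensemble])
    average = fitted.mean(axis=0)
    rms_ave = float(np.mean([rms(superpose(s, average), average) for s in ensemble]))
    rms_ref = float(np.mean([rms(superpose(s, reference), reference) for s in ensemble]))
    rms_ave_ref = rms(superpose(average, reference), reference)
    return rms_ave, rms_ref, rms_ave_ref
```

In practice one would restrict the fit to a chosen atom selection (for example backbone atoms of the rigid region) by passing only those coordinates.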
Precision and accuracy increase with exclusion of violated restraints, evidenced by a decrease of RMSref and
RMSave with iteration number (see Figure 6, and Table II).
To a small extent this is observed also for DRS RAL6,
although there is little strain in the structures even in
iteration zero.
In most calculations, RMSref and RMSave show little
change after iteration three (Fig. 6), similar to the conformational energy terms. The lowest value of RMSref is
reached for DRS RS38 in iteration 3 (0.63 Å).
Table II compares the accuracy and precision of all zeroth
and fourth iterations. Only iteration zero (i.e., no data
removal or bounds modifications) can be directly compared
to previous studies.5,6,12 For this iteration, there is no
obvious correlation between tightness of bounds, accuracy,
and precision. For example, DRS RS20 has much tighter
bounds than DRS RS68, but their precision is very similar.
The accuracy, however, is higher for DRS RS68. Clearly,
RAL6 yields the worst results in terms of accuracy.
After four iterations of data removal, there is a clearer
trend toward more accurate structures with tighter bounds.
Only for the tightest bounds tested (DRS RS20) is there an
increase in both RMSave and RMSref in iterations three and
four. For this DRS, more than half of the restraints are
removed during the calculation; this loss in experimental
information is not compensated by the increased information content in tighter bounds. Surprisingly, the results for
our scheme of bounds resetting (calculations M03 and
M06) are somewhat worse than for the simple data removal scheme.
RMSref is systematically higher than RMSave in all
iterations and for all DRSs (Fig. 6). That precision overestimates accuracy has been noted previously.5,6,12,15,50 A
linear relation between accuracy and precision in NMR
structures has been seen in other model calculations5,50
and in a comparison of NMR structures of different
generations.15 While it is obvious from Figure 6 that in our
model study the more precise structures are also more
accurate, there is no clear relationship.
Fig. 5. Structure quality for DRSs RAL6 (diamond), RWMS (asterisk),
RS38 (square), RS20 (triangle). (a) Non-bonded repel energy. (b) RMS
deviations from ideal angles. (c) RMS deviations from included distance
restraints. (d) Number of included restraints.
Fig. 6. Precision (RMSave, Å) against accuracy (RMSref, Å) for all DRSs and for all iterations.
The iterations for each DRS are connected by lines. Iteration zero is
marked by 0; iteration 4 is marked by the name of the DRS. Each group of
calculations is marked by a different colour: magenta, qualitative error
bounds (AL6, WMS); green, linear error estimate (L12, L25); blue,
quadratic error estimate (Q03, Q06, Q12); red, linear bounds from known
error (S20, S38, S68); and black, quadratic error estimate with automatic
bound resetting (M03, M06).
Exclusion of Distances
The overall number of excluded restraints is comparable
to, but always smaller than, the number of violations in
the reference structure (Table I). Since internal dynamics
leads to inconsistent distance restraints, the violation
analysis would optimally identify those restraints most
affected by internal dynamics.
In Figure 7, we compare the average distances in one of
the calculated ensembles, $\langle r_{\mathrm{ensemble}}\rangle$, to the corresponding
distances in the reference structure, rreference. Even for the
largest values of rreference, the correlation is surprisingly
good. There is a slight tendency for $\langle r_{\mathrm{ensemble}}\rangle$ to be smaller
than rreference. This is a consequence of the fact that not all
distance restraints with severe underestimation of the
upper limit were removed.
Fig. 7. Scatterplot of the distances in the reference structure, r_reference,
against the distance averages ⟨r_ensemble⟩ over the ensemble for DRS RS38,
fourth iteration. The excluded restraints are marked with crosses, the
included restraints with open circles.
To get a more detailed picture, we compared the fraction
of excluded restraints per residue to the fraction of restraints that are violated in the reference structure, for
DRS RS38 (Fig. 8). There is a good correlation between the
percentage of excluded restraints and the percentage of
violated restraints in the reference structure (correlation
coefficient 0.64; Fig. 8a). However, there is little correlation between the percentage of excluded restraints and the
fluctuation around the average in the trajectory (correlation coefficient 0.23; cf. Fig. 8a,b), or the RMS difference
between the structure and the reference structure (correlation coefficient 0.22; cf. Fig. 8a,c). The correlation between
the error in the structure and the RMS fluctuation in the
trajectory is better (correlation coefficient 0.49). As already
apparent in Figure 6, the RMS fluctuation of the structure
around its average is very much underestimated, and the
correlation of its residue dependence with that in the
original trajectory is small (correlation coefficient 0.24).
In contrast, the overall RMS fluctuation of the ensemble
for DRS RAL6, iteration 0, agrees rather well with the
fluctuation in the trajectory (correlation coefficient 0.48).
However, this comes at the expense of a much increased
RMS difference between the ensemble and the reference
structure (Fig. 8c), and the correlation between the fluctuation in the trajectory and the error is small (correlation
coefficient 0.25).
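The per-residue correlation coefficients quoted above can be illustrated with a minimal sketch. It assumes that a plain Pearson coefficient is meant (the text does not state which coefficient was used), and the per-residue values below are hypothetical placeholders, not data from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-residue fractions (illustrative only).
excluded_fraction = [0.10, 0.40, 0.25, 0.05, 0.30]
violated_fraction = [0.15, 0.35, 0.30, 0.10, 0.20]

r = pearson(excluded_fraction, violated_fraction)
print(f"correlation coefficient: {r:.2f}")
```

The same function could be applied to any pair of per-residue profiles, for example the fraction of excluded restraints against the Cα fluctuation in the trajectory.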
Figure 9 shows the number of excluded restraints,
compared with the difference between r_eff and r_reference, for
DRS RS38. The probability of correctly identifying distances
r_eff affected by internal dynamics grows with the error in
r_eff (the two lines in the figure coincide for large differences
r_reference − r_eff). However, there are excluded restraints
close to a difference of zero, and some restraints are not
excluded even at differences r_reference − r_eff of several Å.
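The kind of tally behind such a histogram can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout, the function name `bin_counts`, and the example restraints are hypothetical; only the 0.2 Å bin size is taken from the caption of Fig. 9.

```python
import math

BIN_SIZE = 0.2  # Å, the bin size quoted for Fig. 9


def bin_counts(restraints, bin_size=BIN_SIZE):
    """Map bin index -> [total, excluded] for r_reference - r_eff differences.

    `restraints` is an iterable of (r_reference, r_eff, excluded) tuples.
    """
    counts = {}
    for r_reference, r_eff, excluded in restraints:
        # The small epsilon guards against round-off such as 0.6 / 0.2 < 3.
        index = math.floor((r_reference - r_eff) / bin_size + 1e-9)
        total_and_excluded = counts.setdefault(index, [0, 0])
        total_and_excluded[0] += 1
        if excluded:
            total_and_excluded[1] += 1
    return counts


# Hypothetical restraints: (r_reference, r_eff, excluded), distances in Å.
example = [
    (3.0, 3.0, False),
    (3.1, 3.0, True),
    (4.0, 3.0, False),
    (5.0, 3.0, True),
    (5.1, 3.0, True),
]

histogram = bin_counts(example)
for index in sorted(histogram):
    total, excluded = histogram[index]
    print(f"{index * BIN_SIZE:4.1f} Å: {excluded} of {total} excluded")
```

Plotting the two counts per bin as stacked bars would give a figure of the same form as Fig. 9.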
Fig. 8. (a) Fraction of excluded restraints per residue (RS38, iteration
4), compared to the fraction of distances in the reference structure lying
outside the bounds in the DRS RS38. (b) Cα-fluctuations in the trajectory
(solid lines); for calculation AL6, iteration 0 (dot-dashed line); for calculation S38, iteration 0 (dotted line); and for calculation S38, iteration 4
(dashed line). (c) Cα-RMS differences from the reference structure, for
calculation AL6, iteration 0 (dashed line), and for calculation S38, iteration
4 (dot-dashed line).
Fig. 9. Total number of restraints (total height), and number of excluded
restraints (dark gray bar), against r_reference − r_eff, for DRS RS38. Bin size is
0.2 Å.
Re-Classification Versus Exclusion of Data
Restraint exclusion is obviously not the optimal solution, since experimental data are lost. One simple alternative scheme is to increase the distance bounds for the
violated restraints, rather than remove them altogether.
This is common practice in many laboratories (e.g., see ref.
15). When this is done manually, care is taken and additional
data (e.g., peak shapes) are used to identify the restraints for
which bounds are loosened. We have tested one automatic scheme, in which the bounds for each violated
restraint are increased in steps of 10% until the restraint
is satisfied, up to a maximum of five times. If the restraint
still cannot be satisfied, it is excluded.
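The widening rule just described can be sketched for a single upper bound. This is a simplified illustration, not the implementation used in the calculations: it assumes that each step multiplies the current bound by 1.10 (the text does not say whether the 10% refers to the current or the original bound), it checks the bound against one fixed model distance rather than against a recalculated ensemble, and the function name `widen_or_exclude` is hypothetical.

```python
def widen_or_exclude(upper_bound, model_distance, step=0.10, max_steps=5):
    """Widen an upper bound in 10% steps until the model distance fits.

    Returns (new_upper_bound, steps_used). If the restraint is still
    violated after `max_steps` widenings, returns (None, max_steps),
    meaning the restraint is excluded.
    """
    bound = upper_bound
    for steps_used in range(max_steps + 1):
        if model_distance <= bound:
            return bound, steps_used
        if steps_used < max_steps:
            # Assumption: each step scales the current bound by (1 + step).
            bound *= 1.0 + step
    return None, max_steps


# Hypothetical examples, distances in Å.
print(widen_or_exclude(3.0, 2.8))  # already satisfied, no widening
print(widen_or_exclude(3.0, 3.2))  # satisfied after one widening
print(widen_or_exclude(3.0, 6.0))  # still violated after five widenings
```

With five multiplicative steps the bound can grow by a factor of at most 1.1^5 (about 1.61), so under this assumption a restraint whose model distance exceeds roughly 1.6 times the original bound is always excluded.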
As expected, fewer restraints were excluded from the
calculation (see calculations M03 and M06 in Table I).
The precision of the structures is very high. However, the
surprising result of this calculation was that in general the
accuracy of the structures is not improved. For data set
RQ06, there is even a significant decrease.
The Model System
From an MD trajectory one can calculate realistic spectral densities and it is therefore the method of choice for
generating model systems to test the influence of internal
dynamics on the accuracy of NMR structures. The analysis
through correlation functions rather than simple distance
averages is more realistic, since for many NOEs the effects
of distance and angular fluctuations cancel each other, and
the assumption of ⟨r^−6⟩^−1/6 or ⟨r^−3⟩^−1/3 averages would be
incorrect. No knowledge of the exact nature of the average
was assumed in the refinement, similar to the real,
experimental case. The “noise” in the data due to internal
dynamics is not randomly and symmetrically distributed
around a known rigid structure; the derived data seemed
to be more problematic than that considered in the more
recent model studies.5,6 This is evidenced by the fact that
the RMS fluctuation of the trajectory around its average is
higher than what is typically observed in high resolution
NMR structures,13,29 that we could not obtain low-energy
structures with DRS RWMS, in contrast to Clore et al.,5 and
that the RMS difference from the ideal structure for DRSs
RAL6 is larger than observed by Zhao and Jardetzky.6
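For orientation, the two simple distance averages mentioned above, ⟨r^−6⟩^−1/6 and ⟨r^−3⟩^−1/3, can be computed as in the sketch below. As stated in the text, these simple averages are not what was used to generate the model data (which came from correlation functions); the sketch only shows how strongly such averages are weighted toward short distances. The two-state distance list is a hypothetical example.

```python
def power_average(distances, power):
    """Effective distance <r^-power>^(-1/power) over a set of snapshots."""
    if not distances or any(d <= 0 for d in distances):
        raise ValueError("distances must be a non-empty list of positive values")
    mean_inverse = sum(d ** -power for d in distances) / len(distances)
    return mean_inverse ** (-1.0 / power)


# Hypothetical proton pair spending half the time at 2.5 Å and half at 5.0 Å.
snapshots = [2.5, 5.0]

r6 = power_average(snapshots, 6)              # <r^-6>^(-1/6)
r3 = power_average(snapshots, 3)              # <r^-3>^(-1/3)
arithmetic = sum(snapshots) / len(snapshots)  # plain mean distance

print(f"r6 = {r6:.2f} Å, r3 = {r3:.2f} Å, mean = {arithmetic:.2f} Å")
```

In this example both power averages lie well below the arithmetic mean of 3.75 Å, which is why an upper bound derived from the apparent distance can be violated by the mean structure.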
The reference structure, calculated from the MD trajectory by probability map refinement, is the equivalent of an
independently refined X-ray crystal structure. Circular
reasoning is avoided, since the force field used to calculate
the MD trajectory (PARAM19) was very different from the
parameters used in the calculation of structural ensembles
from the restraints (PARALLHDG). PARAM19 (ref. 32) is an
extended atom force field, treating only polar hydrogen
atoms explicitly, and the version of PARALLHDG (ref. 51) used in
the calculation has covalent parameters from the CSDX
X-ray refinement parameters,42 and vdW radii from the
distance geometry program DISGEO.11 The comparison is
in a way more relevant than the comparison between
experimental X-ray crystal and solution NMR structures,
since crystal packing effects do not play any role.
Our goal was to examine the effects of dynamic averaging alone. For this purpose, the MD trajectory offers the
ultimate comparison with respect to structure accuracy
and the dynamics. Hence, we did not include the effects of
spin diffusion, which can be dealt with during a structure
calculation in a straightforward way (for reviews, see refs.
52, 53, 54).
Precision and Accuracy of the Structures
To achieve the highest accuracy, we had to define rather
narrow error bounds, and subsequently remove systematically violated distance restraints (about one third of the
total number of restraints for DRSs RQ03 and RS38). It
therefore appears that a smaller number of accurate
distance restraints led to more accurate structures than a
larger number of loose bounds. While the differences
between the results obtained in the fourth iteration for
different DRSs may be small (a few tenths of an Å), they
are statistically significant (between calculations S38 and
WMS, for example, the difference is more than two standard deviations; data not shown), and there is a clear trend
towards higher accuracy with tighter bounds. Differences
between structures obtained with all data (iteration 0) and
with data removal are much more important, and we
stress that the only DRS that produced low energy structures in iteration 0 was RAL6.
Simply removing the distance restraints that cause
systematic violations is only a first, and perhaps not very
satisfactory, solution. Other procedures can be considered.
We tested one simple scheme that widens bounds specifically rather than removing the restraints, similar to
procedures used with experimental data.15 To date, however, this method does not perform as well as data exclusion (see calculations M03 and M06 in Figure 6): it leads to
an increase in precision, but a decrease in accuracy.
Due to the restraint removal, the structures obviously do
not satisfy all restraints any more. Structures refined with
wide error bounds, on the other hand, may satisfy all
restraints, but may not satisfy all data. In the DRS RAL6,
for example, a strong NOE is converted into an upper limit
of 6.0 Å. While a structure with a corresponding interproton distance of around 6 Å certainly satisfies the
restraint, it would violate the data (assuming that a strong
NOE corresponds to a distance of around 2.5 Å) by 3.5 Å.
Consequently, if one evaluates RMS differences not to the
bounds but directly to the complete set of “measured”
distances, r_eff, structures in iteration 0 with DRS RAL6
show significantly larger values (around 1 Å) than structures of any other DRS in any iteration (e.g., below 0.6 Å in
iteration 0 and around 0.8 Å in iteration 4 for DRS RS38).
This is the case even though one uses the complete
data-set for the evaluation, and many of the data points
contributing to the RMS had been excluded from the
structure calculation.
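The distinction between satisfying a restraint and satisfying the data can be made concrete with a short sketch. The two function names are hypothetical, lower bounds are ignored for simplicity, and the single-restraint example reuses the numbers from the text (a strong NOE taken as 2.5 Å, an upper limit of 6.0 Å, and a model distance of 6.0 Å).

```python
import math


def rms_bound_violation(model_distances, upper_bounds):
    """RMS of the amounts by which model distances exceed their upper bounds."""
    excess = [max(0.0, d - u) for d, u in zip(model_distances, upper_bounds)]
    return math.sqrt(sum(e * e for e in excess) / len(excess))


def rms_to_data(model_distances, measured_distances):
    """RMS difference between model distances and the 'measured' distances."""
    diffs = [d - m for d, m in zip(model_distances, measured_distances)]
    return math.sqrt(sum(x * x for x in diffs) / len(diffs))


# Example from the text: strong NOE (about 2.5 Å) with a 6.0 Å upper limit.
model = [6.0]
upper = [6.0]
measured = [2.5]

print(rms_bound_violation(model, upper))  # restraint is satisfied
print(rms_to_data(model, measured))       # but the data are off by 3.5 Å
```

Evaluated this way, a structure can be free of restraint violations and still deviate strongly from the underlying distances, which is the situation described for DRS RAL6.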
A definite advantage of DRS RAL6 seemed to be that RMSDave
showed a correlation with the RMS fluctuation in the
original trajectory. However, the RMS fluctuation for the
tightest bounds (RS20) in iteration 0 is of similar size, and
the correlation is even better for RS38 (Fig. 8). Hence,
inconsistencies in the data can produce a similar RMSDave
as wide error bounds. The exact value of RMSDave depends
on the bounds, without any clear trend. None of the
calculations reproduced the disorder of the side-chain of
Phe22 (see Fig. 4).
Exclusion of Violated Restraints and Internal Dynamics
The criterion for data exclusion in Eq. (1) was developed
to identify large violations24 and may be rather crude for
the present purpose since it uses only structural consistency to identify NOEs affected by internal dynamics.
Since this is the present implementation in iterative
schemes like ARIA10 and NOAH,26 we felt it important to
study the effect on dynamically averaged distances. While
the correlation of excluded distance restraints with those
violated in the reference structure is satisfactory, the
correlation with the mobility of the peptide chain is small
(see Fig. 8).
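A consistency-based exclusion test of this general kind can be sketched as follows. This is not Eq. (1) of the paper, which is not reproduced in this section; the sketch assumes a common form of such criteria, in which a restraint counts as systematically violated when its upper bound is exceeded by more than a tolerance in more than a given fraction of the calculated structures. The function name and the default values (0.5 Å and 50%) are illustrative assumptions.

```python
def systematically_violated(distances, upper_bound,
                            tolerance=0.5, min_fraction=0.5):
    """Flag a restraint violated in most structures of an ensemble.

    `distances` holds the restrained distance in each calculated structure.
    The restraint is flagged when the upper bound is exceeded by more than
    `tolerance` (Å) in more than `min_fraction` of the structures.
    Both thresholds are assumed values, not taken from the paper.
    """
    if not distances:
        raise ValueError("need at least one structure")
    n_violated = sum(1 for d in distances if d - upper_bound > tolerance)
    return n_violated / len(distances) > min_fraction


# Hypothetical ensembles of four structures, upper bound 3.0 Å.
mostly_violated = [4.0, 4.2, 3.1, 4.1]
rarely_violated = [3.1, 3.2, 4.0, 2.9]

print(systematically_violated(mostly_violated, 3.0))
print(systematically_violated(rarely_violated, 3.0))
```

Because the test relies on structural consistency alone, it flags restraints that the ensemble cannot satisfy, whatever the physical cause; this is in line with the weak correlation with chain mobility noted above.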
Still, a logical solution for inconsistencies in the restraints due to internal dynamics is to use ensemble
averaging for the identified NOEs, and assume a rigid
model for all others. This would solve the problem of
underdetermination in some ensemble averaging methods. In this context it should be noted that in the study by
Bonvin and Brünger,55 an important fraction of the data
was not averaged (hydrogen bond and coupling constant
restraints). Our own attempts to use ensemble averaging
without such a class of static restraints failed. An analysis
as described in this paper could be used to identify a class
of restraints that are not subject to dynamic averaging. We
expect that in this iterative way, ensemble averaging
methods could be used from the start in an NMR structure
calculation, as an integral part of iterative methods like
In this paper, we have deliberately restricted ourselves
to the “standard” structure determination approach. We
feel that the determination of an accurate “average” structure from NMR data is an important goal in itself, even
when some of the data are better represented by an
ensemble. One important use of an accurate average
structure is solving X-ray crystal structures with molecular replacement. An average structure is also a necessary
“zero-order” approximation for further NMR refinement.
In general, the data may not always contain enough
information to obtain the dynamic behavior of the protein
In this paper, we showed that precision is not only a
consequence of tight bounds and number of restraints but
also of the consistency of restraints. The accuracy is
significantly higher with narrow bounds and restraint
exclusion than with bounds wide enough to obtain low-energy structures without data removal. The most accurate structures were obtained by removing about a third of
the distance restraints. The restraint exclusion scheme
worked qualitatively correctly and identified those restraints that are incompatible with the rigid reference
It is worth noting that in our model study, all noise was
due to internal dynamics, and no additional sources of
noise such as artifacts or incorrect assignments were
present. If a data removal strategy is employed with
experimental data, the minimum requirement is to document the excluded restraints. This obviously also applies
to modifications of individual restraints to obtain violation-free structures. Although this has apparently been used
for many structure determinations by NMR (e.g., ref. 15),
details of the procedures involved are usually not given,
and the consequences or the validity of the approach have
not been assessed systematically. We suggest submitting the
data with the structures in a form close to the raw data, so
that modifications (reclassification/exclusion) are visible.
To date, data-sets submitted to the PDB57 often do not
allow the ready identification of modified restraints, in
particular when only the final upper bounds are reported.
In addition to the deviation from the original data (an
R-value), the number of NOEs that needed corrections
could serve as a figure of merit.
1. Kim Y, Prestegard JH. A dynamic model for the structure of acyl
carrier protein in solution. Biochemistry 1989;28:8792–8797.
2. van Gunsteren WF, Brunne RM, Gros P, van Schaik RC, Schiffer
CA, Torda AE. Accounting for molecular mobility in structure
determination based on nuclear magnetic resonance spectroscopic
and X-ray diffraction data. Meth Enzymol 1994;261:619–654.
3. LeMaster DM, Kay LE, Brünger AT, Prestegard JH. Protein
dynamics and distance determinations by NOE measurement.
FEBS Lett 1988;236:71–76.
4. Wüthrich K. NMR of proteins and nucleic acids. New York: John
Wiley & Sons; 1986. p 1–292.
5. Clore GM, Robien MA, Gronenborn AM. Exploring the limits of
precision and accuracy of protein structures determined by nuclear
magnetic resonance spectroscopy. J Mol Biol 1993;231:82–102.
6. Zhao D, Jardetzky O. An assessment of the precision and accuracy
of protein structures determined by NMR: dependence on distance
errors. J Mol Biol 1994;239:601–607.
7. Güntert P, Braun W, Wüthrich K. Efficient computation of three-dimensional protein structures in solution from nuclear magnetic
resonance data using the program DIANA and the supporting
programs CALIBA, HABAS, and GLOMSA. J Mol Biol 1991;217:
8. Hyberts SG, Goldberg MS, Havel TF, Wagner G. The solution
structure of eglin c based on measurements of many NOEs and
coupling constants and its comparison with X-ray structures.
Protein Sci 1992;1:736–751.
9. Folmer RHA, Nilges M, Konings RNH, Hilbers CW. Solution
structure of the single-stranded DNA binding protein of bacteriophage Pf3. EMBO J 1995;14:4132–4142.
10. Nilges M, Macias MJ, O’Donoghue SI, Oschkinat H. Automated
NOESY interpretation with ambiguous distance restraints: the
refined NMR solution structure of the pleckstrin homology domain from β-spectrin. J Mol Biol 1997;269:408–422.
11. Havel T, Wüthrich K. A distance geometry program for determining the structures of small proteins and other macromolecules
from nuclear magnetic resonance measurements of intramolecular 1H-1H proximities in solution. Bull Math Biol 1984;46:673–698.
12. Havel TF, Wüthrich K. An evaluation of the combined use of
nuclear magnetic resonance and distance geometry for the determination of protein conformations in solution. J Mol Biol 1985;182:
13. Gronenborn AM, Filpula DR, Essig NZ, et al. A novel, highly
stable fold of the immunoglobulin binding domain of streptococcal
protein. Science 1991;253:657–661.
14. Chothia C, Lesk AM. The relation between the divergence of
sequence and structure in proteins. EMBO J 1986;5:823–826.
15. Gronenborn AM, Clore GM. Structures of protein complexes by
multidimensional heteronuclear magnetic resonance spectroscopy. Crit Rev Biochem Mol Biol 1995;30:351–385.
16. Post CB. Internal motional averaging and three-dimensional
structure determination by nuclear magnetic resonance. J Mol
Biol 1992;224:1087–1101.
17. Brüschweiler R, Roux B, Blackledge M, Griesinger C, Karplus M,
Ernst R. Influence of rapid intramolecular motion on NMR
cross-relaxation rates. A molecular dynamics study of antamanide
in solution. J Am Chem Soc 1992;114:2289–2302.
18. Abseher R, Lüdemann S, Schreiber H, Steinhauser O. NMR cross
relaxation investigated by molecular dynamics simulation: a case
study of ubiquitin in solution. J Mol Biol 1995;249:604–624.
19. Fushman D, Ohlenschläger O, Rüterjans H. Determination of the
backbone mobility of ribonuclease T1 and its 2’GMP complex
using molecular dynamics simulations and NMR relaxation data.
J Biomol Struct Dyn 1994;11:1377–1402.
20. Schneider T, Brünger AT, Nilges M. Influence of internal dynamics
on accuracy of protein NMR structures: derivation of realistic
model distance data from a long molecular dynamics trajectory. J
Mol Biol 1999;285:727–740.
21. Pearlman DA, Kollman PA. Are time-averaged restraints necessary for nuclear magnetic resonance refinement? A model study
for DNA. J Mol Biol 1991;220:457–479.
22. Pearlman DA. How is an NMR structure best defined? An analysis
of molecular dynamics distance based approaches. J Biomol NMR
23. Pearlman DA. How well do time-averaged J-coupling restraints
work? J Biomol NMR 1994;4:279–299.
24. Haenggi G, Braun W. Pattern recognition and self-correcting
distance geometry calculations applied to myohemerythrin. FEBS
Lett 1994;344:147–153.
25. Mumenthaler C, Braun W. Predicting the helix packing of globular proteins by self-correcting distance geometry. Protein Sci
26. Mumenthaler C, Braun W. Automated assignment of simulated
and experimental NOESY spectra of proteins by feedback filtering
and self-correcting distance geometry. J Mol Biol 1995;254:465–
27. Macias MJ, Musacchio A, Ponstingl H, Nilges M, Saraste M,
Oschkinat H. Structure of the pleckstrin homology domain from
β-spectrin. Nature 1994;369:675–677.
28. Wagner G, Braun W, Havel T, Schaumann T, Gō N, Wüthrich K.
Protein structures in solution by nuclear magnetic resonance and
distance geometry: the polypeptide fold of the basic pancreatic
trypsin inhibitor determined using two different algorithms,
DISGEO and DISMAN. J Mol Biol 1987;196:611–639.
29. Berndt K, Güntert P, Orbons L, Wüthrich K. Determination of a
high-quality nuclear magnetic resonance solution structure of the
bovine pancreatic trypsin inhibitor and comparison with three
crystal structures. J Mol Biol 1993;227:757–775.
30. DeLano WL, Brünger AT. Helix packing in proteins: prediction
and energetic analysis of dimeric, trimeric, and tetrameric GCN4
coiled coil structures. Proteins 1994;20:105–123.
31. Brünger AT. X-PLOR. A system for X-ray crystallography and
NMR. New Haven: Yale University Press; 1992. p 1–382.
32. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan
S, Karplus M. CHARMM: A program for macromolecular energy,
minimization, and dynamics calculations. J Comp Chem 1983;4:
33. Deisenhofer J, Steigemann W. Crystallographic refinement of the
structure of bovine pancreatic trypsin inhibitor at 1.5 Å resolution. Acta Crystallogr 1975;B31:238–250.
34. Brünger AT, Karplus M. Polar hydrogen positions in proteins:
empirical energy placement and neutron diffraction comparison.
Proteins 1988;4:148–156.
35. Loncharich RJ, Brooks BR. The effects of truncating long-range
forces on protein dynamics. Proteins 1989;6:32–45.
36. Ryckaert J, Ciccotti G, Berendsen H. Numerical-integration of
cartesian equations of motion of a system with constraints—
molecular dynamics of N-alkanes. J Comput Phys 1977;23:327–
37. Lipari G, Szabo A. Model-free approach to the interpretation of
nuclear magnetic resonance relaxation in macromolecules. 1.
Theory and range of validity. J Am Chem Soc 1982;104:4546–
38. Solomon I. Relaxation processes in a system of two spins. Phys
Rev 1955;99:559–565.
39. Nilges M, Clore GM, Gronenborn AM. A simple method for
delineating well-defined and variable regions in protein structures determined from interproton distance data. FEBS Lett
40. Abseher R, Nilges M. Are there non-trivial dynamic cross-correlations in proteins? J Mol Biol 1998;279:911–920.
41. Nilges M, O’Donoghue SI. Ambiguous NOEs and automated
NOESY assignment. Progr NMR Spectr 1998;32:107–139.
42. Engh RA, Huber R. Accurate bond and angle parameters for X-ray
structure refinement. Acta Crystallogr 1991;A47:392–400.
43. Brüschweiler R. Normal modes and NMR order parameters in
proteins. J Am Chem Soc 1992;114:5341–5344.
44. Wlodawer A, Nachman J, Gilliland G, Gallager W, Woodward C.
Structure of form III crystals of bovine pancreatic trypsin inhibitor. J Mol Biol 1987;198:469–480.
45. Widmer H, Widmer A, Braun W. Extensive distance geometry
calculations with different NOE calibrations: new criteria for
structure selection applied to sandostatin and BPTI. J Biomol
NMR 1993;3:307–324.
46. Abseher R, Horstink L, Hilbers CW, Nilges M. Essential spaces
defined by NMR structure ensembles and molecular dynamics
simulation show significant overlap. Proteins 1998;31:370–382.
47. Vriend G, Sander C. Quality control of protein models: directional
atomic contact analysis. J Appl Crystallogr 1993;26:47–60.
48. Sippl MJ. Recognition of errors in three-dimensional structures of
proteins. Proteins 1993;17:355–362.
49. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein
structures. J Appl Crystallogr 1993;26:283–291.
50. Brünger AT, Clore GM, Gronenborn AM, Saffrich R, Nilges M.
Assessing the quality of solution nuclear magnetic resonance
structures by complete cross-validation. Science 1993;261:328–
51. Nilges M, Clore GM, Gronenborn AM. Determination of three-dimensional structures of proteins by hybrid distance geometry-dynamical simulated annealing calculations. FEBS Lett 1988;229:
52. James TL. Relaxation matrix analysis of two-dimensional nuclear
Overhauser effect spectra. Curr Opin Struct Biol 1991;1:1042–
53. Case D. New directions in NMR spectral simulation and structure
refinement. In: van Gunsteren WF, Weiner PK, Wilkinson AJ,
editors. Computer simulation of biomolecular systems: theoretical
and experimental applications. Vol 2. Escom Leiden; 1993. p
54. Bonvin AMJJ, Boelens R, Kaptein R. Determination of biomolecular structures by NMR: use of relaxation matrix calculations. In:
van Gunsteren WF, Weiner PK, Wilkinson AJ, editors. Computer
simulation of biomolecular systems: theoretical and experimental
applications. Vol 2. Escom Leiden; 1993. p 407–440.
55. Bonvin AMJJ, Brünger AT. Conformational variability of solution
nuclear magnetic resonance structures. J Mol Biol 1995;250:
56. Bonvin AMJJ, Brünger AT. Do NOE distances contain enough
distance information to access the relative populations of multiconformer structures? J Biomol NMR 1995;5:72–76.
57. Bernstein FC, Koetzle TF, Williams GJB, et al. The protein data
bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535–542.