PROTEINS: Structure, Function, and Genetics, Suppl. 1:68–73 (1997)
Analysis of Comparative Modeling Predictions
for CASP2 Targets 1, 3, 9, and 17
Robert W. Harrison, Charles C. Reed, and Irene T. Weber*
Department of Microbiology and Immunology, Kimmel Cancer Center, Jefferson Medical College,
Philadelphia, Pennsylvania
Comparative modeling targets 1, 3, 9, and 17 were predicted by alignment
of multiple sequences and structures, when
available, followed by minimization using the
program AMMP. The minimization used improved potentials, and distance restraints for
regions of common structure. New prediction
procedures were evaluated. Three tested solvent corrections did not significantly improve
the predictions. Target 17 had 85.3% sequence
identity with the parent and no insertions or
deletions. The prediction had a root-mean-square deviation from target 17 of 0.56 Å on Cα
atoms, and 0.59 Å for the ligand atoms, which
verified the accuracy of the minimization. Targets 1, 3, and 9 had 36.4%, 46.7%, and 33.3%
identity with the parent sequences, and predictions resulted in root-mean-square deviations
for 79–85% of Cα atoms of 1.49, 1.11, and 1.24 Å,
respectively. Conformational differences between parent and target crystal structures
were difficult to predict. The use of distance
restraints and multiple structures improved
the positioning of gaps in sequence alignment.
Distance restraints did not overcome errors
in sequence alignment or ambiguities due to
conformational variation in proteins. Predictions for targets 3 and 9 successfully reduced large deviations between parent and
target structures. Proteins, Suppl. 1:68–73, 1997.
© 1998 Wiley-Liss, Inc.
Key words: homology modeling; energy minimization; distance restraints; protein structure; prediction errors
Comparative modeling to predict the structure of
one protein from sequence similarities with a related
protein of known structure can be used to design
protein engineering experiments, to predict how
ligands bind to proteins, and to help solve experimental structures in solution or in the crystal. The
Critical Assessment of Structure Prediction Methods
2 (CASP2) was valuable for objective evaluation of
new prediction procedures by comparison of predicted and target structures. The results of comparative modeling in the 1994 CASP experiment and
previous comparisons have shown that it is very
difficult to predict the structure of regions with low
sequence similarity, insertions or deletions.1–4 However, functionally conserved ligand binding sites can
be predicted with relatively high accuracy.2,4 For the
CASP2 experiment, we incorporated the improved
potentials described in ref. 5, and tested three simple
solvent corrections, the use of multiple crystal structures for sequence alignment, and application of
interatomic distance restraints during minimization.
The proteins that were predicted by comparative
modeling were target 1, dihydrofolate reductase
from Haloferax volcanii, target 3, phosphotransferase enzyme IIA domain from Mycoplasma capricolum, target 9, cucumber stellacyanin, and target
17, rat liver glutathione transferase. Targets 1, 3, 9,
and 17 had sequence identities of 36.4, 46.7, 33.3,
and 85.3% with parent crystal structures 1dyh, 1gpr,
2cbp, and 2gst from the Protein Data Bank,6 respectively. Several related structures were available for
targets 1 and 3, and these structures were used to
provide distance restraints for a ‘‘core structure’’ of
distances predicted to be conserved. Target 9 was
predicted from just Cα atoms as a test using the 1cbp
parent structure and from all atoms in the 2cbp
crystal structure. Target 17 was predicted in complex with the substrate glutathione.
Differences between predicted and target structures will include both errors in the potentials and
modeling procedures and errors in the protein crystal structures. A prediction cannot be more accurate
than average crystallographic precision, where recent comparisons of different crystal structures of
identical proteins have found a mean value of 0.40 Å
for the root-mean-square deviation (RMSD) of Cα
atoms7 and a range of 0.16–0.79 Å for main chain
atoms.8 The errors in the predicted structure will not
have a normal distribution. This nonnormality will
be especially pronounced when a robust optimizer is
used. Where the distance terms correctly describe
the geometry, the errors should be normal, with a
variance reflecting the average error in the restraints (this follows from the central limit theorem
of statistics applied to an expectation value). However, where the distance restraints and potentials
are ill posed, due to an alignment error or where
insufficient structural data are available at a loop,
the results will depend on the specific error, resulting
in a nonnormal distribution. When the distance
terms are only locally valid, as in loops, the local
superposition of structures will have low variance,
but the global superposition will be poor. Therefore,
the total distribution of errors will have at least
three components: the normally distributed error in
good regions, errors distributed around a different
mean in locally good regions, and systematically
incorrect regions.
*Correspondence to: Dr. Irene T. Weber, Department of
Microbiology and Immunology, Kimmel Cancer Center, Jefferson Medical College, Philadelphia, PA 19107.
Received 1 May 1997; Accepted 25 August 1997
TABLE I. Summary of Predictions: RMS Differences for Conserved Regions (Å)*
Target 1†   Target 3†   Target 9‡   Target 17§
*Conserved regions were defined by using a cutoff of 3.0 Å for target 1, 2.5 Å for targets 3 and 9, and no cutoff for target 17. Analysis of
all atoms is presented in ref. 14.
†A, B, C, D test different solvent corrections for targets 1 and 3. Prediction A has no solvent correction, B used an increased van der
Waals radius for polar side chains, C used a single-shell discrete water model for polar residues, and D used a two-shell discrete water
model for polar residues.
‡For target 9, A is the prediction from all atoms (2cbp) and B is from Cα atoms only (1cbp).
§For target 17, A and B are two subunits of the predicted dimer.
TABLE II. Comparison of Predicted, Parent, and Target Structures
         Crystal structures*    RMSD Å (%Cα)†
Target   Parent   %ID           Parent/target‡   Parent/predict§   Predict/target¶
1        1dyh     36.4          1.25 (82.3)      0.91 (95.7)       1.49 (84.8)
3        1gpr     46.7          1.07 (82.5)      0.67 (98.7)       1.11 (85.1)
9        2cbp     33.3          0.93 (81.4)      0.74 (78.7)       1.24 (79.3)
17       2gst     85.3          0.46 (100)       0.40 (100)        0.56 (100)
*Res is the number of residues in the target protein. The PDB entry for the parent crystal structure is given with
the percentage of identical residues (%ID) compared to the target sequence and the number of insertions or
deletions (#Ins). Structures 1dyh, 1dhf, 1dr1, 3dfr and 5dfr were used for sequence alignment and distance
restraints during minimization of target 1, and 1gpr, 1gla, and 1f3g were used for target 3.
†RMSD is the root-mean-square deviation for the percentage of Cα atoms in parentheses using cutoff values as in
Table I.
‡Parent/target compares the parent and target crystal structures.
§Parent/predict compares the parent and predicted structures.
¶Predict/target compares the predicted and target crystal structure.
The protein sequences were aligned by using the
sequence analysis package of the Genetics Computer
Group. Multiple sequence alignments were made for
targets 1 and 3. The initial alignment was adjusted
manually to position insertions and deletions at
surface turns between elements of secondary structure. The parent structure was chosen to have the
highest identity and the fewest gaps relative to the
target sequence. The initial model for the target
structure was obtained by combining the sequence
with the parent structure as described previously.2
All atoms of identical residues and identical atoms in
different residues were kept. Solvent and common
ligands from the parent structure were included
because these are often structurally important.
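The parent-selection rule described above (highest identity, fewest gaps) can be sketched in a few lines. This is only an illustration and not the procedure of the Genetics Computer Group package: the function names, the "-" gap character, the toy sequences, and the choice to express identity relative to the number of target residues are all assumptions made for the example.

```python
def identity_and_gaps(target_aln, parent_aln, gap="-"):
    """Return (percent identity, number of gap runs) for two aligned sequences.

    Assumption: identity is expressed relative to the number of target
    residues; other conventions divide by the alignment length instead.
    """
    if len(target_aln) != len(parent_aln):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for t, p in zip(target_aln, parent_aln)
                    if t == p and t != gap)
    target_len = sum(1 for t in target_aln if t != gap)
    if target_len == 0:
        raise ValueError("target sequence contains no residues")
    gaps = 0
    prev = None
    for t, p in zip(target_aln, parent_aln):
        # "T" marks a gap in the target row, "P" a gap in the parent row.
        state = "T" if t == gap else ("P" if p == gap else None)
        if state is not None and state != prev:
            gaps += 1  # a new insertion or deletion starts at this column
        prev = state
    return 100.0 * identical / target_len, gaps


def choose_parent(alignments):
    """Pick the parent with the highest identity, then the fewest gaps.

    `alignments` maps a parent name to a (target_aln, parent_aln) pair.
    """
    def score(name):
        pct, gaps = identity_and_gaps(*alignments[name])
        return (pct, -gaps)
    return max(alignments, key=score)


# Toy alignments (made-up sequences, not the CASP2 targets).
candidates = {
    "parentA": ("ACDE-FG", "ACQEKF-"),
    "parentB": ("ACDEFG", "ACDQFG"),
}
best = choose_parent(candidates)
```

Counting runs of gaps rather than gap characters treats a multi-residue insertion as one event, which matches the way insertions and deletions are counted per region in the text.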
Distance Restraints
Interatomic distances between 3 and 8 Å were
used as restraints for the atoms expected to be
identical in parent and target structures, as in
Weber and coworkers.9 Distance restraints were
implemented with a split harmonic potential similar
to that used for nuclear Overhauser effect data. The
upper and lower bounds were determined from the
observed range of distances in multiple crystal structures or set to distance ± 0.5 Å. Peptide geometry
was restrained toward trans with a 3.148 Å restraint
on the distance between O and HN. For multiple crystal
structures, an intersection set of distance restraints
was generated by using interatomic distances found
in all structures. A union set of all distances was
generated and used with a lower weight.
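The restraint construction in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the AMMP implementation: atoms are assumed to be already matched by index across the structures, the function names are hypothetical, and the penalty is one common flat-bottomed form of a split harmonic term (zero between the bounds, quadratic outside). The actual functional form and force constants used in AMMP are not given in the text.

```python
import math
from itertools import combinations


def pair_distances(coords, dmin=3.0, dmax=8.0):
    """Map atom-index pairs to distances for separations within [dmin, dmax] Å."""
    out = {}
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if dmin <= d <= dmax:
            out[(i, j)] = d
    return out


def restraint_sets(structures, tol=0.5):
    """Return (intersection, union) dicts mapping pairs to (lower, upper) bounds.

    Bounds span the observed range when a pair occurs in several structures,
    and are distance ± tol when it occurs in only one.
    """
    per_structure = [pair_distances(c) for c in structures]
    common = set(per_structure[0]).intersection(*per_structure[1:])
    every = set().union(*per_structure)

    def bounds(pair):
        ds = [p[pair] for p in per_structure if pair in p]
        if len(ds) == 1:
            return ds[0] - tol, ds[0] + tol
        return min(ds), max(ds)

    return ({p: bounds(p) for p in common}, {p: bounds(p) for p in every})


def split_harmonic(d, lower, upper, k=1.0):
    """Flat-bottomed penalty: zero inside [lower, upper], quadratic outside."""
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


# Two toy three-atom "structures" (coordinates in Å, made up for illustration).
s1 = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (4.4, 0.0, 0.0), (0.0, 9.0, 0.0)]
intersection, union = restraint_sets([s1, s2])
```

In this sketch the lower weight for the union set would be expressed by passing a smaller `k` to `split_harmonic` for pairs that appear only in `union`.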
Solvent Corrections
Models include all solvent molecules available for
the parent crystal structure. Three different solvent
corrections were tested for prediction of targets 1
and 3: an increased van der Waals radius for polar
side chains, and discrete water molecules in one or
two shells around polar amino acids. Since water is
present in high molar excess to any individual amino
acid in the protein, the ability of a charged amino
acid to make salt bridges is reduced by competition
with bound water. The van der Waals radius for
hydrogen bond acceptors and donors was increased
by 3 Å to reflect the average presence of one water.
Discrete models were also generated by attaching
one shell and two shells of water to individual
hydrogen bond donors and acceptors in amino acid
side chains. These models for solvation improved
local geometry and precluded spurious buried residues in test systems, but had little effect on overall error.
The new atoms and all hydrogen atoms were built
using AMMP10 minimization, while keeping the identical atoms fixed, as described.2 AMMP uses all
atoms and all nonbond and electrostatic terms, and
an improved potential set.5 The entire structure was
minimized with distance restraints for targets 1, 3,
and 9. Distance restraints for multiple crystal structures were applied for targets 1 and 3. Manual
adjustment was used to place residues F21 and Y22
internally in one insertion of target 9. No distance
restraints were used for target 17.
The robust optimization technique of four-dimensional (4D) embedding was used in order to fully test
the effects of distance restraints. If a local technique,
like conjugate gradients, were used, then it would be
impossible to differentiate between errors due to
improper convergence and errors due to improper
distance terms. 4D embedding was described for
the previous CASP experiment.2 It was first proposed for solving
nuclear magnetic resonance (NMR) distance restraint problems,11 and the implementation in AMMP
readily solves these problems.
The predictions for targets 1, 3, 9, and 17 are listed
in Tables I and II. New methods were tested by
submitting several predictions for the same target.
The accuracy of minimization was tested for target
17. Solvent corrections were tested for targets 1 and
3. Prediction starting from just Cα atoms was tested
for target 9. Several related crystal structures were
used in multiple sequence alignment to aid in positioning gaps, and to provide interatomic distance
restraints for predictions of targets 1 and 3. The
predictions were compared with both the target and
parent crystal structures for the overall RMSDs.
Target 17 was very similar to the parent 2gst with
85.3% sequence identity and no insertions or deletions. The predictions had RMSDs from target 17 of
0.52 and 0.56 Å on Cα atoms. Targets 1, 3, and 9 had
36.4, 46.7, and 33.3% identity with the parent sequences, and the predictions resulted in RMSDs for
79–85% of Cα atoms of 1.49, 1.11 and 1.24 Å,
respectively (Table II). Interestingly, although the
parents and targets are related by 33–47% sequence
identity, the pairs of related crystal structures all
have 81–83% structurally conserved residues with
RMS differences of 0.93–1.25 Å. The structures are
more highly conserved than the sequences, with
differences in the observed range for the sequence
identity.7 However, target 9 is the most similar
structurally to the parent 2cbp, although the sequence identity of 33% is the lowest. The tested
procedures were evaluated for their success in reproducing the target structures in both conserved and
variable regions.
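The figures quoted here pair an RMSD with the percentage of Cα atoms that fall within a distance cutoff. A minimal sketch of that bookkeeping is shown below. It assumes the two coordinate sets are already superimposed and matched atom by atom; the function name is hypothetical, and the CASP2 evaluation procedure itself (including how the superposition was obtained) is not reproduced.

```python
import math


def rmsd_within_cutoff(pred, target, cutoff=None):
    """Return (RMSD, percent of atoms used) for pre-superimposed coordinates.

    Atom pairs farther apart than `cutoff` (in Å) are excluded; with
    cutoff=None every pair is used, as described for target 17.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    kept = dists if cutoff is None else [d for d in dists if d <= cutoff]
    if not kept:
        raise ValueError("no atom pairs fall within the cutoff")
    rmsd = math.sqrt(sum(d * d for d in kept) / len(kept))
    return rmsd, 100.0 * len(kept) / len(dists)


# Toy coordinates in Å (made up for illustration): three atoms agree well,
# one is displaced by 10 Å and is dropped by a 2.5 Å cutoff.
target_xyz = [(0.0, 0.0, 0.0)] * 4
pred_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
rmsd_core, pct_core = rmsd_within_cutoff(pred_xyz, target_xyz, cutoff=2.5)
rmsd_all, pct_all = rmsd_within_cutoff(pred_xyz, target_xyz)
```

Reporting the percentage alongside the RMSD matters because a cutoff always lowers the RMSD; the two numbers are only comparable between predictions when read together.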
Effect of Minimization
The prediction for target 17 was the most straightforward, since the target and parent sequences had
a high sequence identity of 85% with no gaps. The
prediction tested the standard method with all atoms, solvent and ligand from the parent structure2
and improved potentials from ref. 5. Because it was
not known if inhibitor was bound to target 17, the
glutathione substrate was modeled in the binding
site, as described for nonprotein ligands.2 The two
subunits in the dimer were entered as submissions
CM519 and 525. This prediction showed the accuracy of the procedure when the target is almost
identical to the parent structure with an RMSD of
0.46 Å for Cα atoms (Table II). The RMS differences
between predicted and target structures were 0.52
and 0.56 Å for Cα atoms, 0.57 and 0.60 Å for main
chain atoms, and 0.98 and 1.03 Å for all atoms, for
the two subunits, respectively. These differences are
within the experimental errors observed for different
crystal structures of identical proteins, where recent
comparisons have found a mean value of 0.40 Å for
the RMSD of Cα atoms7 and a range of 0.16–0.79 Å
for main-chain atoms.8 The deviations of side chain
atoms were more variable; RMSDs of 1.32–1.68 Å
were reported for different crystal forms of bovine
pancreatic trypsin inhibitor.12 The common atoms of
the ligand had an RMS difference of 0.59 Å and the
ligand binding site had differences of 0.44 Å for Cα
atoms and 0.76 Å for all atoms, which suggests that
ligands and their binding sites can be predicted with
relatively high accuracy, as previously noted.2,4 The
close agreement between prediction and target, and
success in the docking predictions,13 verify the accuracy of the potentials and minimization procedure in
reproducing closely similar protein structures.
Fig. 1. Structural alignment of C-terminal sequences of target
3, prediction 3, parent 1gpr, 1f3g, and 1gla structures. Residues
142–154 of target 3 are shown. Asterisks indicate gaps.
Fig. 2. Comparison of prediction (green), 1gpr parent (yellow) and target 3 (red) in region of
termini. The Cα atoms of residues 11–16, 139–145, and 153–159 of target 3 are shown with
corresponding regions of 1gpr and the prediction. The predicted structure has moved away from the
parent and toward the target structure for these 3 adjacent strands.
Effects of Solvent Corrections
Solvent corrections were tested for targets 1 and 3
to ensure that polar side chains were directed to the
surface of the protein. Inappropriate placement of
polar side chains within the protein was observed in
the previous CASP experiment. The models for solvation improved local geometry and precluded spurious buried residues in test systems, but had little
effect on overall error. Three predictions were submitted to test two solvent corrections for target 1.
CM123 had no correction, CM124 used an increased
van der Waals radius for polar side chains, and
CM122 used a single shell discrete water model for
polar residues. Four predictions were submitted for
target 3 to test three different solvent corrections.
CM144 had no correction, CM145 used an increased
van der Waals radius for polar side chains, CM146
used a single shell discrete water model for polar
residues, and CM176 used a two-shell discrete water
model. For both targets, the predictions were very
similar to each other for Cα and all atoms, and there
was no significant improvement in the agreement
with the target structure (Table I).
Prediction From Only Cα Atoms
Target 9 had 33% sequence identity with a single
parent represented initially by Cα atoms only (1cbp)
and later all atoms (2cbp). Predictions were made
from both 1cbp and 2cbp to test the effects of starting
from Cα atoms only. Not unexpectedly, prediction
from Cα atoms was significantly worse at 2.5 Å
RMSD from the target Cα atoms, compared to 1.2 Å
for the prediction using all atoms (Table I). The
RMSD for all atoms likewise worsened, from 1.9 Å
for the prediction from all atoms to 4.1 Å for the
prediction from Cα atoms only.
Errors in Sequence Alignment
Several structures were superimposed to give the
initial sequence alignment for targets 1 and 3. The
insertions and deletions in the sequence alignment
for target 1 were correctly positioned by using structures 1dyh, 1dhf, 1dr1, 3dfr, and 5dfr, except for a
misplacement of the C terminus. The predicted
two-residue deletion was actually a longer seven-residue deletion between the target and parent,
which was not obvious from the aligned sequences.
Similarly for target 3, the superposition of the structures of 1gpr, 1f3g, and 1gla allowed correct positioning of the gaps, except near the C terminus. The
predicted C terminus was mispositioned by one
residue compared to the correct structure. The correct position cannot easily be deduced from the
multiple alignment (Fig. 1). Target 9 was predicted
from one parent structure, and there were errors in
the predicted positions of gaps. One stretch of 14
residues was misplaced by one residue due to the
presence of only two identical residues in both the
predicted and actual alignments. These errors would
probably not have occurred if several related structures were used. Therefore, the superposition of
several related structures helped to position insertions and deletions, but did not always result in the
correct alignment.
TABLE III. Analysis of Insertions, Deletions,
and Variable Regions*
Region type†      RMS differences (Å)‡
Insert 24
Insert 73–74
Insert 92–93
Delete 132/133
Delete 144/145    Not flagged
Delete 145/146    Not flagged
Insert 20–24
Insert 64, 66
*Regions of dissimilar structure that were flagged in the
automatic comparisons for CASP2.
†Insertions (Insert), deletions (Delete), and Variable regions are
‡RMSDs on Cα atoms are noted for the superposition of the
whole structure and for a local superposition of the region.
Fig. 3. Difference distance plot for target 3. The difference is
the distance between pairs of Cα atoms in the prediction and
target minus the distance in parent and target. This difference is
plotted against the distance between the parent and target for the
same atoms. Negative values indicate that the prediction is closer
to the target than is the parent. Three effects are seen in this plot.
The normally distributed errors in the structurally conserved core
are clustered around 1 Å. Many of the large distances between the
parent and the target are shortened, but a streak of small
differences (around −0.5) extending to 12 Å along the x-axis
reflects the effect of incorrect distance restraints. This ‘‘streak’’ was
absent from the equivalent plot for target 9, where only a
single-parent structure was available.
Comparison of Prediction, Parent,
and Target Structures
One measure of success is whether the prediction
has reduced the conformational distance between
parent and target crystal structures. The predictions
without solvent correction were analyzed in comparison with parent and target structures (Table II).
Overall the predicted structures were close to the
parent structures with RMSD of 0.40–0.91 Å. However, targets 1 and 3 included 3% more atoms in the
superposition with the predicted structure than with
the parent structure. This increase suggested that
the overall agreement between predicted and target
structures was improved. For target 3, concerted
changes in position of the two termini and strands
139–145 showed that the prediction had moved
closer to the target in regions of larger discrepancies
(Fig. 2). Therefore, the distances between pairs of Cα
atoms were plotted to show if the prediction was
closer than the parent to the target structure. The
differences in separation of Cα atoms for prediction–
target and target–parent were plotted against the
target–parent differences for the same atoms (Fig.
3). The negative values clearly confirm that the
prediction is closer to target 3 than is the parent.
Improvement is seen especially for atoms with differences of 4–8 Å between parent and target crystal
structures. These atoms were predicted up to 5 Å
closer to the target, which is a significant improvement over the parent structure. This success was
partly due to the use of distance restraints from
multiple structures. Similar improvements were observed
in the difference plot for prediction of target 9,
but these must arise from minimization because only
a single parent structure was used. By contrast, the
prediction for target 1 showed no improvement over
the parent structure, despite the use of distance
restraints from five crystal structures.
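The difference distance analysis described in this section can be sketched as below. The sketch assumes one reading of the Fig. 3 caption: for each equivalent Cα atom, the x value is the parent-to-target distance and the y value is the prediction-to-target distance minus x, with all three structures already superimposed on a common frame. Function names and the toy coordinates are hypothetical.

```python
import math


def difference_distances(pred, parent, target):
    """Return (x, y) per atom: x = parent-target distance (Å),
    y = prediction-target distance minus x. Negative y means the
    prediction lies closer to the target than the parent does."""
    points = []
    for p, q, t in zip(pred, parent, target):
        x = math.dist(q, t)
        y = math.dist(p, t) - x
        points.append((x, y))
    return points


def fraction_improved(points, min_shift=4.0, max_shift=8.0):
    """Fraction of atoms with negative y among those whose parent-target
    distance lies in [min_shift, max_shift]; None if no atom qualifies."""
    selected = [y for x, y in points if min_shift <= x <= max_shift]
    if not selected:
        return None
    return sum(1 for y in selected if y < 0) / len(selected)


# Toy data in Å (made up for illustration, not taken from target 3).
target_ca = [(0.0, 0.0, 0.0)] * 3
parent_ca = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
pred_ca = [(1.2, 0.0, 0.0), (2.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
points = difference_distances(pred_ca, parent_ca, target_ca)
improved = fraction_improved(points)
```

The default 4–8 Å window mirrors the range in which the text reports the clearest improvement; it is a parameter of the sketch, not a threshold defined by the authors.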
Analysis of Loops
The regions with larger differences between prediction and target were analyzed by the automated
procedure for CASP2, which performed an overall
superposition and a local superposition of residues in
‘‘loops.’’ Results for targets 1, 3, and 9 are summarized in Table III. In the prediction for target 9, a
five-residue insertion (20–24) was deduced to be a
helix, and distance restraints were used to enforce
helical conformation. Unfortunately, the predicted
helix was left-handed because the distance restraints did not distinguish the handedness (the
residues were correctly L-amino acids). For targets 1
and 3, the low RMSDs of 0.05–0.48 Å for the local
superpositions of most loops suggested that the
correct conformation had been predicted. The residues 144–146 of target 3 that were modeled with two
single-residue deletions were not flagged as conformationally different suggesting successful prediction.
In both targets 1 and 3, more than half of the flagged
‘‘loop’’ regions did not involve insertions or deletions.
Therefore, differences at gaps were indistinguishable
from conformational differences in variable regions.
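The left-handed helix reported for target 9 follows from a general property of distance restraints: a structure and its mirror image have identical interatomic distances. The sketch below demonstrates this with idealized Cα helix traces. The helper names are hypothetical, and the helix parameters (2.3 Å radius, 1.5 Å rise, 100° per residue) are the usual textbook values for an ideal α-helix rather than values taken from the paper.

```python
import math
from itertools import combinations


def helix(n, handed=+1, radius=2.3, rise=1.5, turn_deg=100.0):
    """Idealized Cα trace: handed=+1 is right-handed, -1 is its mirror image."""
    pts = []
    for i in range(n):
        a = handed * math.radians(turn_deg * i)
        pts.append((radius * math.cos(a), radius * math.sin(a), rise * i))
    return pts


def pairwise(coords):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def signed_volume(a, b, c, d):
    """Triple product (b-a) . ((c-a) x (d-a)); its sign flips on reflection."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[k] * cross[k] for k in range(3))


right = helix(8, handed=+1)
left = helix(8, handed=-1)

# Every pairwise distance agrees, so distance restraints cannot tell them apart.
max_diff = max(abs(x - y) for x, y in zip(pairwise(right), pairwise(left)))

# A chirality-sensitive quantity does distinguish the two hands.
vol_right = signed_volume(*right[:4])
vol_left = signed_volume(*left[:4])
```

A signed quantity such as this triple product, or an equivalent improper-torsion term, is one way to add the handedness information that distances alone lack; whether AMMP offers such a term is not stated in the text.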
Effects of Distance Restraints
Two fundamental problems with distance restraints were revealed. First, because they are entirely empirical, distance restraints bring no new
information to the problem. Distance restraints improved the prediction when similar distances were
present in the target. They did not overcome errors
in sequence alignment or ambiguities arising from
local variations in protein structures. Second, while
distance restraints are good at preserving correct
features of the model, they are equally good at
preserving incorrect features. Incorrect features result from sequence alignment errors, or when conserved secondary structure elements are shifted in
space with respect to each other. Therefore, careful
choice of distance restraints is important in regions
with low homology. Distance restraints should not be
used in regions with little sequence similarity or
insertions and deletions. Distance restraints, while
useful, are no replacement for a fundamentally
better treatment of a solvated charged protein.
Conclusions
1. The AMMP potentials and minimization procedure result in predictions that agree with the
target within the experimental errors observed
for protein crystal structures when the sequence
identity is high.
2. The positioning of insertions and deletions is
improved when several related structures are
superimposed to obtain the best alignment with
the target sequence. However, multiple sequence alignment does not always result in the correct positioning of the target sequence.
3. The simple solvent corrections that were tested
did not significantly improve the predictions.
4. Distance restraints can improve the prediction
for structurally conserved regions, but should not
be used in regions with little sequence similarity
or insertions and deletions.
5. Two predictions have reduced the large deviations between parent and target structures.
References
1. Mosimann, S., Meleshko, R., James, M.N.G. A critical
assessment of comparative modeling of tertiary structures
of proteins. Proteins 23:301–317, 1995.
2. Harrison, R.W., Chatterjee, D., Weber, I.T. Analysis of six
protein structures predicted by comparative modeling techniques. Proteins 23:463–471, 1995.
3. Greer, J. Comparative modeling methods: Application to
the family of mammalian serine proteases. Proteins 7:317–
334, 1990.
4. Weber, I.T. Evaluation of homology modeling of HIV protease. Proteins 7:172–184, 1990.
5. Weber, I.T., Harrison, R.W. Molecular mechanics calculations on HIV-1 protease with peptide substrates correlate
with experimental data. Protein Eng. 9:679–690, 1996.
6. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., et al. The
Protein Data Bank: A computer-based archival file for
macromolecular structures. J. Mol. Biol. 112:535–542, 1977.
7. Flores, T.P., Orengo, C.A., Moss, D.S., Thornton, J.M.
Comparison of conformational characteristics in structurally similar protein pairs. Protein Sci. 2:1811–1826,
8. Zegers, I., Maes, D., Dao-Thi, M.-H., Poortmans, F., Palmer,
R., Wyns, L. The structures of RNase A complexed with
3′-CMP and d(CpA): Active site conformation and conserved
water molecules. Protein Sci. 3:2322–2339, 1994.
9. Weber, I.T., Harrison, R.W., Iozzo, R.V. Model structure of
Decorin and implications for collagen fibrillogenesis. J.
Biol. Chem. 271:31767–31770, 1996.
10. Harrison, R.W. Stiffness and energy conservation in the
molecular dynamics: An improved integrator. J. Comp.
Chem. 14:1112–1122, 1993.
11. Beutler, T.C., van Gunsteren, W.F. Molecular dynamics free
energy calculation in four dimensions. J. Chem. Phys.
101:1417–1422, 1994.
12. Wlodawer, A., Nachman, J., Gilliland, G.L., Gallagher, W.,
Woodward, C. Structure of form III crystals of bovine
pancreatic trypsin inhibitor. J. Mol. Biol. 198:469–480,
13. Dixon, J.S. Evaluation of the CASP2 docking section.
Proteins Suppl. 1:198–204, 1997.
14. Martin, A.C.R., MacArthur, M.W., Thornton, J.M. Assessment of comparative modeling in CASP2. Proteins Suppl.
1:14–28, 1997.