PROTEINS: Structure, Function, and Genetics Suppl 3:218–225 (1999)
COMPARISON OF PREDICTION QUALITY IN THE
THREE CASPS
A Measure of Progress in Fold Recognition?
Aron Marchler-Bauer and Stephen H. Bryant*
Computational Biology Branch, National Center for Biotechnology Information,
National Library of Medicine, National Institutes of Health, Bethesda, Maryland
ABSTRACT
We present a retrospective analysis of CASP3 threading predictions, applying evaluation and assessment criteria used at CASP2. Our
purpose is twofold. First, we wish to ask whether
measures of model accuracy are comparable between CASP3 and CASP2, even though they have
been calculated differently. We find that these quantities are effectively the same, and that either may
be used to compare model accuracy. Secondly, we
wish to assess progress in fold recognition by comparing the numbers of CASP2 and CASP3 models
that cross specific accuracy thresholds. We find that
the number of accurate models at CASP3 drops
sharply as the targets become more difficult, with
less extensive similarity to known structures, exactly the pattern seen at CASP2. CASP3 teams do
not seem to have predicted accurate models for
targets of greater difficulty, and for a given difficulty
range the best CASP3 models seem no more accurate than the best models at CASP2. At CASP3,
however, we find greater numbers of accurate models for medium-difficulty targets, with extensive
similarity to a known structure but no shared sequence motifs. Threading methods would appear to
have become more reliable for modeling based on
remote evolutionary relationships. Proteins Suppl
1999;3:218–225. Published 1999 Wiley-Liss, Inc.†
Key words: protein structure; threading; structure
prediction; fold recognition; structure
comparison
INTRODUCTION
Measuring progress in fold recognition would appear to
be a simple matter. ‘‘Blind’’ predictions for CASP2¹ and CASP3² represent the state of the art in threading methods as of 1996 and 1998, respectively. One need only use
this database to ask: Were the threading models produced
for CASP3 more accurate than models produced for CASP2?
Were there greater numbers of accurate models produced
at CASP3, as compared to CASP2? These seemingly
simple questions may not be so simple to answer, however.
†This article is a US government work and, as such, is in the public domain in the United States of America.

There are significant technical differences in the way threading alignment accuracy was measured at CASP2
and CASP3, and to compare model accuracy one must
verify that these alternative measures are equivalent, or
nearly so. To compare the numbers of accurate models, one
must also define what one means by ‘‘accurate.’’ While it is
straightforward to choose a specific threshold, there is
certainly no unique or universally accepted way to do so.
Furthermore, while one might expect prediction success to
depend on target difficulty, there is no reason to expect
that CASP2 and CASP3 have presented equal mixtures of
easy, medium, and hard targets. Yet to make a valid comparison, one must somehow assign target difficulties, for
which there is also no unique or universally accepted metric.
Model accuracy at CASP3 and CASP2 was evaluated by comparing threading alignments to a reference structure-structure alignment. The CASP3 assessor used the number of correctly aligned target residues, sf0+sf4, as a part of his competitive ranking of threading models,3,4 and the CASP2 assessor similarly used alignment specificity, the number of correctly aligned residues as a fraction of alignment length, ASp4.5,6 The reference structure-structure alignments used to compute sf0+sf4 and ASp4 are
quite different, however. At CASP3 the structure comparison program PROSUP7 searched among alternative structure-structure alignments of the predicted and observed
target structures to find a reference alignment that maximized sf0. At CASP2 the structure comparison programs
DALI,8 SSAP,9 and VAST10 compared the target structure
to all templates in the database and computed a single
structure-structure alignment for each template found to
be similar to the target. It is impossible to know a priori
whether these differences in the ‘‘standard of truth’’ are
important, and whether model accuracy has been measured in comparable ways at CASP3 and CASP2. To
address this question we therefore compute CASP2 accuracy measures for all CASP3 models and present here a
quantitative comparison.
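As a concrete illustration, a CASP2-style alignment specificity can be computed in a few lines. This is a sketch, not the assessors' code; representing an alignment as a mapping from target residue index to template residue index is our assumption.

```python
def alignment_specificity(predicted, reference, shift_tol=4):
    """ASp4-style score: the fraction of predicted target-template
    residue pairings that agree with the reference structure-structure
    alignment, allowing up to shift_tol residues of shift error."""
    if not predicted:
        return 0.0
    correct = 0
    for target_res, template_res in predicted.items():
        ref_res = reference.get(target_res)
        if ref_res is not None and abs(template_res - ref_res) <= shift_tol:
            correct += 1
    return correct / len(predicted)

# Toy alignment: correct over residues 1-10, shifted by 2 over 11-20.
pred = {**{i: i + 10 for i in range(1, 11)}, **{i: i + 12 for i in range(11, 21)}}
ref = {i: i + 10 for i in range(1, 21)}
print(alignment_specificity(pred, ref))               # 1.0 (shift of 2 is tolerated)
print(alignment_specificity(pred, ref, shift_tol=0))  # 0.5
```

With a 4-residue tolerance the shifted half of the toy alignment still counts as correct, which is exactly why ASp4 is more forgiving than a strict-match specificity.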
*Correspondence to: Stephen H. Bryant, National Institutes of Health, Building 38A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894. E-mail: bryant@ncbi.nlm.nih.gov

Received 27 May 1999; Accepted 14 June 1999

Sustained performance of CASP3 threading methods was assessed by ranking models for each target, considering the sf0+sf4 accuracy measure, with six points awarded
for first-place accuracy, five points for second place, and so
on.3 As noted by the CASP3 assessor, the sum of points at
CASP3 is analogous to a statistic from the sport of
formula-1 automobile racing. In formula-1 racing, drivers
are awarded points based on their ranking in a number of
races and compared according to the sum of points.
Sustained performance at CASP2 was assessed differently,
by counting the number of models that exceeded a fixed
(though arbitrary) accuracy threshold with respect to the
ASp4 or CSpc (Contact Specificity) accuracy measures.11,12
CASP2 scores are analogous to a statistic from a different
competitive sport, baseball. Baseball players are often
compared according to the number of ‘‘home runs,’’ i.e., the
number of times they hit the ball farther than a fixed
(though arbitrary) distance. Either assessment style is a
reasonable way to judge sustained performance. The
CASP3 assessment style seems less suited to judging
progress over time, however, because it is based entirely on
relative performance. From the number of formula-1 points,
for example, one cannot tell who was driving faster, the top
driver in 1996, or the top driver in 1998. But one may infer
that a baseball player with more home runs was hitting
the ball farther than a player with fewer, no matter when
these performances were recorded. To measure progress in
fold recognition we therefore rely on counts of accurate
models, using accuracy thresholds equivalent to those
applied at CASP2.
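The distinction between the two assessment styles is easy to state in code. The sketch below uses hypothetical scores; the 6-5-4... point schedule follows the formula-1 analogy above, and the fixed threshold follows the home-run analogy.

```python
def formula1_points(scores_by_target, points=(6, 5, 4, 3, 2, 1)):
    """CASP3 style: for each target, rank teams by accuracy and award
    6, 5, 4, ... points to the top teams; sum points per team.
    Purely relative -- points say nothing about absolute accuracy."""
    totals = {}
    for scores in scores_by_target:          # scores: {team: accuracy}
        ranked = sorted(scores, key=scores.get, reverse=True)
        for pts, team in zip(points, ranked):
            totals[team] = totals.get(team, 0) + pts
    return totals

def threshold_counts(scores_by_target, threshold=0.5):
    """CASP2 style: count, per team, the models crossing a fixed
    accuracy threshold -- an absolute measure comparable across years."""
    counts = {}
    for scores in scores_by_target:
        for team, acc in scores.items():
            counts[team] = counts.get(team, 0) + (acc >= threshold)
    return counts

targets = [{"A": 0.9, "B": 0.4}, {"A": 0.2, "B": 0.1}]
print(formula1_points(targets))   # {'A': 12, 'B': 10}
print(threshold_counts(targets))  # {'A': 1, 'B': 0}
```

Note that team A tops both rankings, but only the threshold count reveals that A's second model was inaccurate in absolute terms.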
Measuring progress in fold recognition is perhaps more
difficult than comparing formula-1 drivers or baseball
players, however. At CASP the ‘‘playing field’’ does not stay
the same from year to year, since new targets must be
chosen and the extent of their similarity to known structures will vary. To assess progress one must somehow level
the playing field by correcting for differences among the
targets and comparing predictions for targets of comparable difficulty. After CASP2 it was suggested that target
difficulty be characterized by plotting the degree of sequence similarity versus the extent of structural similarity
with respect to available templates.11,12 Difficulty categories are assigned based on a target’s falling within distinct
regions of this ‘‘phase diagram.’’ Medium targets, for
example, are those with 60% or more of residues superimposable on a known structure, but without recognizable
sequence motifs. Here we present this phase diagram of
target difficulty for both CASP3 and CASP2 targets, and
we compare the numbers of accurate CASP3 and CASP2
models by difficulty categories. We suggest an interpretation, but the reader is free to use these data to make his or
her own assessment of progress.
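Under the stated conventions, assigning a difficulty category from the phase diagram reduces to a few comparisons (a sketch; the 60% threshold and the motif criterion are taken from the text above):

```python
def target_difficulty(frac_superimposed, has_sequence_motifs):
    """Three-tier difficulty from the 'phase diagram': easy targets share
    recognizable sequence motifs with a database template; medium targets
    have 60% or more of residues superimposable on a template but no
    motifs; hard targets have less extensive structural similarity."""
    if has_sequence_motifs:
        return "easy"
    if frac_superimposed >= 0.60:
        return "medium"
    return "hard"

print(target_difficulty(0.72, False))  # medium
print(target_difficulty(0.45, False))  # hard
```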
METHODS
We obtained predictions for CASP3 fold recognition
targets from the LLNL Prediction Center.13 CASP3 predictions were made available as three-dimensional models in
PDB-format, including predictions originally submitted as
target-template alignments. For calculation of CASP2
evaluation quantities, we converted predictions back into
target-template alignments. We could do so for all predictions where the PDB14 template was named in the prediction and where we could unambiguously assign 60% or
more of model residues to template residues. For unambiguous linking of model and template residues we required
Cα–Cα distances of 2.5 Å or less. We also converted
models submitted as separate segments into a single
model including a larger fraction of the target, whenever
this did not result in physically implausible models. The
CASP3 organizers treated individual segments as separate models, and a small number of predictions thus differ
from those they evaluated. This difference has no significant effect on evaluations shown below.
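The conversion back to alignments can be sketched as below. This is a simplified version of the procedure described above: it links each model residue to the nearest template Cα within 2.5 Å, whereas the actual criterion requires the assignment to be unambiguous; the coordinate dictionaries keyed by residue index are our assumption.

```python
import math

def reconstruct_alignment(model_ca, template_ca, cutoff=2.5, min_frac=0.6):
    """Recover a target-template alignment from a 3D threading model:
    link each model residue to the nearest template residue whose
    C-alpha lies within `cutoff` Angstroms, and accept the result only
    if at least `min_frac` of model residues can be linked."""
    alignment = {}
    for m_res, m_xyz in model_ca.items():
        best = min(template_ca,
                   key=lambda t: math.dist(m_xyz, template_ca[t]),
                   default=None)
        if best is not None and math.dist(m_xyz, template_ca[best]) <= cutoff:
            alignment[m_res] = best
    if len(alignment) < min_frac * len(model_ca):
        return None  # too few residues assignable; skip this prediction
    return alignment

model = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0)}
tmpl = {10: (0.1, 0.0, 0.0), 11: (3.9, 0.0, 0.0)}
print(reconstruct_alignment(model, tmpl))  # {1: 10, 2: 11}
```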
CASP2 evaluation quantities were calculated using the
structure-structure alignments generated by the VAST
algorithm, as distributed to CASP3 predictors prior to the
meeting.15 We could not calculate evaluation quantities
based on reference alignments by DALI or SSAP, since
target-template alignments by these methods were not
computed for CASP3.2 For models based on templates not
recognized by VAST and models where alignment reconstruction was not possible (usually because the PDB
template was not identified in the prediction) we calculated CASP2 evaluation quantities that do not depend on a
reference structure-structure alignment, including CSpc.
CASP2 evaluation quantities have been described in detail
previously6 and are summarized in the caption to Figure 1.
The complete set of models used in this study and CASP2
evaluation quantities calculated for CASP2 and CASP3
predictions are available electronically.16,17 CASP3 evaluation quantities used in the comparisons below were taken
directly from the website maintained by the CASP3 organizers.13
RESULTS
Comparable Measures of Model Accuracy?
To determine whether measures of model accuracy from
CASP2 and CASP3 are comparable, we plot in Figure 1
(sf0+sf4)/nres versus ASp4 and CSpc. The quantity sf0+sf4 gives the number of correctly aligned residues relative to the PROSUP structure-structure alignment used at CASP3, allowing a tolerance of 4 residues shift error. We express this value as a fraction, (sf0+sf4)/nres, where nres is the length of the predicted CASP3 alignment, so as to place it on the same scale as the CASP2 quantities. The CASP2 quantity ASp4 gives the fraction of correctly aligned residues relative to VAST structure-structure alignments, similarly allowing 4 residues shift error. The CASP2 quantity CSpc gives the fraction of correctly predicted contacts, Cα pairs under 8 Å apart, separated by 5
or more residues in the polypeptide chain. One may see that (sf0+sf4)/nres and ASp4 are highly correlated. For CASP3 fold-recognition models based on templates recognized as similar by VAST the correlation coefficient is .89, increasing to .92 if comparative modeling targets are also included (not shown). Values for sf0 and sf4 are calculated for all models, regardless of their similarity to the target, and in this respect (sf0+sf4)/nres is similar to CSpc. One may see there are a large number of CASP3 models recognized as inaccurate by either measure, most based on templates not recognized as similar to the target by VAST (small dots in Figure 1). For the subset of models based on templates similar to the target, as recognized by VAST, the correlation of (sf0+sf4)/nres and CSpc is .80.

Fig. 1. Correlation of CASP2 and CASP3 model accuracy measures, (a) (sf0+sf4)/nres vs. ASp4, and (b) (sf0+sf4)/nres vs. CSpc. Values are expressed as percentages. The CASP3 measure sf0 gives the number of correctly aligned residues according to the PROSUP target vs. model structure-structure alignment. The CASP3 measure sf4 gives the number of additional residues that are correctly aligned if one allows a shift-error tolerance of four residues. Here we plot Alignment Specificity, (sf0+sf4)/nres, where nres is the number of residues in the CASP3 model. The CASP2 measure ACrct gives the number of correctly aligned residues according to the VAST target vs. template structure-structure alignment. The CASP2 measure ACrct4 gives the number of additional residues that are correctly aligned allowing a 4-residue shift-error tolerance. Here we plot Alignment Specificity, ASp4 = (ACrct+ACrct4)/nres. In (a) we draw a line with slope 1 and intercept 0, to indicate the expected behavior if (sf0+sf4)/nres and ASp4 were identical. The CASP2 quantity CCrct gives the number of residue pairs predicted to be in contact by the threading model, which are also in contact in the true structure of the target. Here we plot Contact Specificity, CSpc = CCrct/nc, where nc is the total number of contacts predicted by the threading model. The relationship of CSpc and (sf0+sf4)/nres (or ASp4) is approximately quadratic, and in (b) we plot CSpc on a scale that is linear in the square root of CSpc, and we calculate the correlation coefficient accordingly. CASP3 models based on templates recognized as similar by VAST are plotted as large dots, and those based on templates not recognized as similar by VAST are plotted as small dots. Fragmentary models assigning coordinates to less than 45% of domain residues are omitted from the plots.
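Contact specificity follows directly from the definitions above (contacts are Cα pairs under 8 Å apart and at least 5 residues apart in the chain). The sketch below is an illustrative reimplementation, not the CASP2 evaluation code:

```python
import math
from itertools import combinations

def contacts(ca_coords, dist_cutoff=8.0, min_seq_sep=5):
    """Residue contacts: C-alpha pairs under dist_cutoff Angstroms
    apart and separated by at least min_seq_sep positions in the chain.
    Coordinates are a {residue_index: (x, y, z)} dict."""
    pairs = set()
    for i, j in combinations(sorted(ca_coords), 2):
        if j - i >= min_seq_sep and math.dist(ca_coords[i], ca_coords[j]) < dist_cutoff:
            pairs.add((i, j))
    return pairs

def contact_specificity(model_ca, true_ca):
    """CSpc: fraction of contacts predicted by the threading model that
    are also present in the true structure of the target."""
    predicted = contacts(model_ca)
    if not predicted:
        return 0.0
    return len(predicted & contacts(true_ca)) / len(predicted)
```

Because CSpc needs only the model and the true target structure, it requires no reference structure-structure alignment, which is why it is used below as an alignment-independent check on the CASP3 ranking.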
To determine whether the ranking of CASP3 predictions would be affected by differences in model accuracy measures, we have recalculated the CASP3 assessor's ‘‘formula-1’’ table using CSpc, a CASP2 measure that does not depend on structure-structure alignment. As in the CASP3 assessor's table,18 models are given six points if they have the highest CSpc, five points if they have the second-highest CSpc, etc. We have not attempted to award ‘‘bonus
points,’’ and the ranking is based entirely on the relative
values of CSpc for models based on templates judged by
the assessor to belong to the correct SCOP19 superfamily
or fold, i.e., those assigned a letter A through F in his
assessment. This analysis is shown in Table I. When
comparing to the assessor’s formula-1 table,18 one sees
that ranking by CSpc gives essentially the same results as
a ranking that considered sf0+sf4. The top six prediction
teams are the same and occur in almost the same order,
and there are only minor differences in the ranking of the
remaining teams. Results are also very similar to the
CASP3 assessor’s formula-1 table if one uses ranks based
on ASp4 (not shown). One may conclude that the differences in model accuracy measures between CASP2 and
CASP3 were not critical with respect to CASP3 assessment. The CASP3 evaluation quantities are very similar to
those used previously at CASP2.
Close examination of Figure 1 does reveal some differences in the CASP3 and CASP2 evaluation quantities.
There are a small number of models that VAST finds to be
TABLE I. CASP3-Style Assessment Using CASP2 Model Accuracy Measures†

Team        MR    C   Pt
Bryant      (1)   8   28
SB-Fold     (3)   7   28
Jones       (2)   7   22
UNAGI       (4)   7   18
Sippl       (5)   6   17
UCSC        (6)   5   12
Sternberg   (7)   4   09
Godzik      (11)  3   12
Sjolander   (12)  3   11
Benner-Co   (15)  3   10
Fischer     (10)  3   09
Elofsson    (13)  3   07
Yang        (9)   3   05
Valencia    (16)  3   03
Hubbard     (17)  2   09
Tatsuya     (14)  2   05
BMERC       (22)  2   05
Moult       (25)  2   05
Olszewski   (21)  2   04
Reva        (23)  2   03
Xu-Ying     (18)  2   03
Park        (19)  2   03
Baker       (8)   2   02
Gregoret    (38)  1   06
Torda       (29)  1   04
Coulson     (35)  1   03
Kolinskol   (20)  1   02
Timms       (31)  1   01
Finkelste   (37)  1   01
Avbelj      (32)  1   01
Weber       (33)  1   01
Eisenberg   (36)  1   01
Blundell    (34)  1   01
Taylor      (28)  1   01
Solovyev    (26)  1   01
GMD-SCAI    (27)  1   01

[Per-target letter entries (A through F, N) omitted; see footnote for their definition.]
†Models based on the correct fold, as judged by the CASP3 threading assessor, are awarded one ‘‘fold point’’ as indicated by letters A through F and counted in column ‘‘C.’’ Models for each target are ranked by accuracy using the CASP2 evaluation quantity CSpc (see text). Only model 1 (the model with CASP3 model id = 1) is considered for each team, and letters A through F (most through least accurate) score 6 through 1 accuracy points respectively, as at CASP3. Accuracy points are summed in column ‘‘Pt.’’ Targets 52 and 56 were considered novel folds by the CASP3 assessor. ‘‘N’’ indicates that ‘‘None’’ was predicted for the model with id = 1, and awarded one ‘‘fold point’’ and one accuracy point.18 Ranking of teams follows the number of ‘‘fold points’’ and secondarily the sum of accuracy points, as at CASP3. Column ‘‘MR’’ gives the rank assigned to this team in the CASP3 assessor's ‘‘formula-1’’ table, as presented at CASP3.18 We note that the CASP3 assessor's evaluation of model accuracy was not based strictly on numerical measures.3 We find, however, that evaluation of model accuracy based on the sf0 and sf4 measures emphasized at CASP3 gives very similar results: Assignment of model accuracy points based on (sf0+sf4)/nres gives a ranking of the top six teams that is the same as shown here, and almost the same as the ranking presented at CASP3.18 We emphasize that the CASP3-style assessment shown here is based on relative accuracy; models listed as ‘‘A’’, for example, need not cross the CASP2-style accuracy thresholds applied in Table II and Fig. 2.
completely misaligned, with ASp4 = 0, while PROSUP assigns nonzero sf0+sf4. Similarly one sees a few models with no correctly predicted contacts, CSpc = 0, but with nonzero sf0+sf4 values. Presumably this difference reflects PROSUP's search for alternative structure-structure alignments that maximize sf0, although we note that we cannot directly compare the VAST and PROSUP alignments, since the latter have not been distributed. There are also a few models with intermediate values of (sf0+sf4)/nres and/or CSpc that appear to be based on templates not
recognized as similar by VAST. Some are cases where the
CASP3 prediction did not name the template used to build
the model, and others are cases where the extent of
target-template similarity falls below VAST’s significance
threshold. The above analysis shows that these differences
are minor issues, however, in comparison of CASP2 and
CASP3 model accuracy measures. As was concluded after
CASP2, different structure-structure comparison methods
tend to agree in their identification of the more accurate
threading models.5,11
Fig. 2. ‘‘Phase diagram’’ of target difficulty for CASP2 and CASP3 fold-recognition targets. The extent of structural similarity of the target and database templates is given as the fraction superimposed, the length of the VAST structure-structure alignment divided by the length of the target chain or domain. The degree of sequence similarity is given by the percentage of identical residues in the VAST alignment. CASP2 fold-recognition targets are numbered 2 through 38 and CASP3 fold-recognition targets are numbered 43 through 83. Targets for which at least one team predicted an accurate model are indicated by a large square symbol. Small symbols indicate other aspects of similarity of the target and database templates: ‘‘x’’ indicates that the target shares recognizable sequence motifs with one or more database templates (see text). Circles indicate that the similarity is detected by VAST only, with filled circles indicating ‘‘impossible’’ targets where the common substructure detected in the template is not very extensive, predicting 25% or less of target residue contacts. For each target we consider only those structural neighbors that were available at the time of CASP2 or CASP3, taking VAST data from the sets distributed to predictors at the time.15,17 When more than one database template is similar to the target we average across those structural alignments where alignment length is at least 85% of the length of the longest VAST alignment. Target length is taken as the length of the chain except in cases when domain boundaries were specified by the CASP2 or CASP3 organizers or correctly identified by one or more teams. For CASP3 targets 63, 71, 79, and 83, two domains were identified, and we consider as the target the domain most similar to database templates.15 Two of these domains are listed separately as 63b and 71a in the assessor's table18 and in Table I; there were few predictions for the additional domains of targets 63 and 71, but if they are treated as separate targets (63a and 71b)18 they fall in the ‘‘hard’’ region of the plot, with no accurate predictions. We exclude targets where similarity to database templates was recognizable by BLAST19 with default parameters; this affects only CASP3 target 85. We note that all CASP3 targets shown as accurately modeled were predicted by at least one group as the model with id = 1, with the exception of target 71, where a single model with id = 4 had (sf0+sf4)/nres > 50%.
More or Less Difficult Targets?
To understand the relationship of target difficulty and
prediction success we identify in Figure 2 those targets for
which at least one team produced an accurate model. We
employ the ‘‘critical’’ accuracy threshold suggested after
CASP2,11 that at least 50% of aligned residue pairs in the
threading alignment agree with aligned residue pairs in
the reference structure-structure alignment, within a shift-error tolerance of 4 residues. For CASP2 targets accurate
models are those with ASp4 greater than 50% or CSpc
greater than 25%. For CASP3 targets accurate models are
those where (sf0+sf4)/nres is 50% or greater, equivalent to
the Alignment Specificity threshold from CASP2. We also
require that predictors place at least 20% confidence in the
corresponding model. For CASP2 models, Fold Recognition Specificity (Conf x TSpc) must be 20% or greater,6 and
for CASP3 models at least one of the 5 allowed alternative
models must cross the model accuracy threshold. Fragmentary models including less than 45% of chain or domain
residues are excluded and considered inaccurate for both
CASP2 and CASP3 predictions.
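Putting the thresholds above together, the CASP2-style test for an ‘‘accurate’’ model can be sketched as follows (an illustrative composite; the argument names are ours, and fractions are used in place of percentages):

```python
def is_accurate_casp2(asp4, cspc, fr_spec, coverage):
    """CASP2-style 'critical' accuracy test: alignment specificity over
    50% or contact specificity over 25%, fold-recognition specificity
    (Conf x TSpc) of at least 20%, and a model covering at least 45%
    of the chain or domain (fragmentary models count as inaccurate)."""
    if coverage < 0.45 or fr_spec < 0.20:
        return False
    return asp4 > 0.50 or cspc > 0.25

print(is_accurate_casp2(asp4=0.62, cspc=0.20, fr_spec=0.35, coverage=0.80))  # True
print(is_accurate_casp2(asp4=0.62, cspc=0.20, fr_spec=0.35, coverage=0.40))  # False
```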
In Figure 2 we plot a ‘‘phase diagram’’ of target difficulty
for fold recognition targets from both CASP2 and CASP3.
Each target is characterized by two values, the fraction of
target residues that may be superimposed on database
templates and the fraction of identical residues in the
corresponding structure-structure alignments. These values reflect the extent and degree of similarity to previously
known structures, and together they reflect the difficulty
or ‘‘predictability’’ of a fold-recognition target. Data are
based on structure-structure alignments by the VAST
algorithm,10,15,17 although we note that values for CASP2
targets are very similar to those calculated previously
from a combination of VAST and DALI alignments.11 As
one might expect for fold recognition targets selected by
the CASP2 and CASP3 organizers, all targets fall in the
‘‘twilight zone’’ of sequence similarity, below 20% identity.
The CASP2 and CASP3 targets vary widely, however, with
respect to the extent of structural similarity to available
templates.
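The two phase-diagram coordinates can be sketched as below, following the averaging rule given in the Figure 2 caption (only alignments within 85% of the longest are kept). Representing a VAST hit as an (aligned_length, n_identical) pair is our simplification.

```python
def phase_diagram_coords(target_len, vast_alignments, keep_frac=0.85):
    """Phase-diagram coordinates for one target: average fraction of
    target residues superimposed, and average percent identity, over
    VAST structural alignments whose length is at least keep_frac of
    the longest alignment. Each alignment is (aligned_length, n_identical)."""
    longest = max(length for length, _ in vast_alignments)
    kept = [(length, ident) for length, ident in vast_alignments
            if length >= keep_frac * longest]
    frac_superimposed = sum(length for length, _ in kept) / (len(kept) * target_len)
    pct_identity = sum(100.0 * ident / length for length, ident in kept) / len(kept)
    return frac_superimposed, pct_identity

# Hypothetical target of 100 residues with three VAST hits; the short
# 40-residue alignment is dropped by the 85%-of-longest rule.
print(phase_diagram_coords(100, [(70, 10), (65, 13), (40, 2)]))
```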
TABLE II. Counts of Accurate Models From CASP2-Style Assessment and Properties of the Targets and Best Models From CASP2 and CASP3†

Target  D  Size  #Crct  SCRms  SCFrac  SC%Id  BSCLen  BSCRms  BMLen  BMRms  BMCSpc
T04     E    84     16    2.5   58.69   12.8      61    2.05     65    2.97    61.6
T31     E   242     14    2.6   72.19   15.4     188    2.38    202    4.16    70.7
T02     M    88      1    2.2   69.32    6.6      70    1.87     64    2.83    65.0
T14     M   252      7    4.1   80.36    8.3     204    2.72    132    5.45    33.0
T38     M   152      2    3.7   78.29   10.2      94    3.50     98    5.72    53.9
T20     H   320      0    4.3   36.25    8.2      90    2.71    203    7.63    28.8
T22     H   591      0    3.1   13.54   11.2      65    2.18     93    9.27     9.3
T54     E   202      3    1.8   46.04   17.2      93    1.80    116    8.80    40.8
T79     E    65      7    2.4   87.69    7.0      44    1.90     51    4.33    37.5
T46     M   119      6    3.3   68.32    7.3      88    3.10     84    6.62    51.0
T53     M   264      5    3.4   87.88   11.2     232    3.40    204    5.96    39.3
T63     M    65      2    1.7   73.38    8.8      50    1.90     60    3.81    67.5
T71     M   125      1    2.7   71.84    8.5      99    2.40     84    8.48    35.4
T81     M   152      6    3.4   67.89   10.9     106    2.20    109    2.93    70.3
T83     M    80      5    3.1   70.12    9.6      59    2.00     65    4.91    44.7
T43     H   158      0    2.9   51.65   11.9      88    3.20    105   15.00    21.9
T44     H   347      0    3.3   52.42   11.2     187    3.10    209   15.60    17.5
T59     H    75      0    1.9   50.00    9.0      40    1.30     62    9.47    34.4
T67     H   187      0    2.8   37.43   10.0      70    2.80    118   17.40    24.8
T80     H   219      0    2.1   29.45   16.3      62    2.20    118   16.10    18.1
†Column ‘‘D’’ refers to target difficulty: ‘‘E’’ for easy targets, ‘‘M’’ for medium targets, and ‘‘H’’ for hard targets. Targets 2 through 38 are from the CASP2 experiment. ‘‘Size’’ is the number of residues in the chain or domain regarded as the prediction target. Column ‘‘#Crct’’ is the number of models, counting only one from each team, that cross specific accuracy thresholds (see text). ‘‘SCRms’’ is the average Cα RMS-residual between the target and VAST structural neighbors, including all neighbors where alignment length is 85% or more of the longest VAST alignment. ‘‘SCFrac’’ is the fraction superimposed by VAST, and ‘‘SC%Id’’ the percentage of identical residues in structural superpositions, averaged as for ‘‘SCRms.’’ ‘‘BSCLen’’ is the length of the longest VAST alignment and ‘‘BSCRms’’ its RMS superposition residual. ‘‘BMLen’’ gives the extent of a ‘‘best’’ model, chosen according to the value of CSpc. ‘‘BMRms’’ gives the RMS-residual when this model is superimposed on the true target structure and ‘‘BMCSpc’’ the percentage of contacts predicted correctly by the model. For brevity we exclude ‘‘impossible’’ fold-recognition targets from Table II. For CASP2, ‘‘impossible’’ targets are those where the jury of structure comparison methods did not identify significant similarity with database templates.11 For CASP3, ‘‘impossible’’ targets are those where the CASP3 assessor considered no prediction to be based on a correct fold, awarding no score of F or above, and where structural neighbors identified by VAST, averaged as for SCRms, conserve less than 25% of contacts in the target domain. ‘‘Impossible’’ CASP3 targets identified by the latter criterion are targets 52, 56, 61, 75, and 77, as indicated in Figure 2.
The relationship of target difficulty and prediction success is rather obvious in Figure 2. One may simply draw a
line that separates all 13 targets for which an accurate
model was predicted from the remaining 12 targets where
no team predicted an accurate model. The targets for
which accurate models were predicted are those with more
extensive structural similarity to database templates
and/or a greater degree of sequence similarity. Roughly
speaking, the accurately modeled targets are those where
60% or more of target residues could be superimposed on a
database template. The number of data points is small, but
there is no indication that this pattern has changed
between CASP2 and CASP3. CASP3 predictors do not
seem to have produced accurate models for targets of
greater difficulty, where there is less extensive structural
similarity to a previously known structure. We note,
however, that some models for hard CASP3 targets came
close to the critical accuracy threshold. Some models for
CASP3 target 44, for example, were accurate with respect
to individual domains of the target, even though they were
inaccurate with respect to the complete prediction.
More or Less Accurate Models?
To categorize targets by difficulty one may divide the
phase diagram in Figure 2 into distinct regions. Following
CASP2 we suggested a three-tier classification of ‘‘easy,
medium, and hard targets,’’11 and this same system seems
informative for CASP3. Hard prediction targets are those
for which the fraction of residues that may be superimposed on a known template is less than 60%. Medium
targets are those with more extensive structural similarity, where 60% or more of residues may be superimposed
on a database template, but with no sequence motifs
sufficient for fold identification. Easy targets are those
with sequence motifs sufficient for fold assignment, as
identified by PSI-BLAST20 and/or search of relevant literature, usually with 12% or more sequence identity. Under
this classification the only easy fold-recognition targets at
CASP3 are targets 54 and 79. Target 54 (VanX) was assigned to a structural family present in PDB well before the CASP3 experiment,21 and the helix-turn-helix DNA-binding motifs in target 79 (MarA) could be detected using well-known sequence-pattern collections.22
Table II shows the difficulty category for each CASP2
and CASP3 target and the number of accurate models for
that target. Table II also lists a few properties of what we
have picked as the best model for each target. One may see
from Table II that no accurate models were predicted for
hard targets at either CASP2 or CASP3. Accurate modeling of hard targets seems to be beyond the limits of current
threading methods. Accuracy of the best models also seems
little changed from CASP2 to CASP3. There was one
model with less than 3 Angstroms RMS for a medium
target at CASP2, target 2, and the same is true at CASP3,
for target 81. The most striking difference between CASP2
and CASP3, perhaps, is the large number of accurate
predictions for easy targets at CASP2. There may be
several explanations for this. From Figure 2 we see that
CASP2 targets 4 and 31 are more sequence-similar to
database templates than other targets with extensive
structural similarity, with the exception of CASP3 target
54. These similarities may have been more accessible to
sequence-based prediction methods, contributing to a
higher level of prediction success. Targets 4 and 31 are also
members of well-understood structural families, the OB-fold and the trypsin-like serine proteases, where characteristic sequence motifs were well-documented in the literature.12
There is a suggestion of progress, however, if one focuses
on the medium targets in Table II. While there were
relatively few accurate models for medium targets at
CASP2, most of the medium targets at CASP3 were
accurately modeled by five or six different teams. This
suggestion of progress is confirmed when one considers the
medium targets in more detail. Target 14, the CASP2
medium target with the greatest number of accurate
models, is perhaps ‘‘easier’’ than the rest: It is a member of
the TIM-barrel structural family, which is well described
in the literature and very common in the structural
database. The two CASP3 medium targets with the fewest
accurate predictions are perhaps a little ‘‘harder’’ than the
rest: Target 63 has two domains, but this was recognized in
advance by few predictors. Target 71 differs from database
templates in many structural details, such that the CASP3
assessor has assigned it to a novel SCOP superfamily.3
CASP2 certainly showed that accurate predictions for
medium targets are possible. At CASP3, however, these
predictions seem to have become more reliable, with
different threading methods producing both specific recognition and accurate models for all six medium targets. It is
also interesting to note that the two top-ranked teams at
CASP3 used largely automated alignment procedures,23,24
while the top-ranked team at CASP2 relied on manual
alignments.25
DISCUSSION
It is perhaps satisfying to find that the ‘‘extensive
changes of the fold-recognition evaluation criteria’’13 between CASP2 and CASP3 do not seem to have greatly
affected evaluation of model accuracy. Most of the model
accuracy measures used at CASP2 and CASP3 depend on
structure-structure alignments, and these have been calculated in different ways (target-template versus target-model comparison) using different structure comparison
programs (PROSUP versus DALI, SSAP, and VAST). We
find, however, that the resulting model accuracy measures
are highly correlated and that the differences have little
effect on assessment of CASP3 predictions. It is perhaps
not surprising that evaluations using target-template and
target-model alignments are similar: Threading models
copy coordinates from a template, and these alignments
are thus nearly equivalent. The most novel feature of the
CASP3 model accuracy measures is PROSUP’s search of
alternative structure-structure alignments, to find a reference alignment that maximizes sf0. This does not seem to
affect identification of the more accurate models, however,
as shown in Figure 1, and by graphical comparison of
sf0+sf4 values calculated using DALI as opposed to
PROSUP reference alignments.26 This is perhaps good
news for future CASPs, since it suggests that the complexity of allowing "bets" on alternative structure-structure
alignments is unnecessary for reliable model evaluation.
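The shift-based accuracy measures can be made concrete with a small sketch. This is an illustrative reconstruction, not the CASP evaluation code: it assumes sf0 is the fraction of reference-aligned target residues that a prediction places on exactly the reference template residue, and sf4 the fraction placed within four residues of the reference; the function name and the dictionary representation of alignments are our own conveniences here.

```python
def shift_fractions(predicted, reference, max_shift=4):
    """Compare a predicted target->template alignment against a
    reference structure-structure alignment.

    Both arguments map target residue indices to template residue
    indices. Returns (sf0, sf4): the fraction of reference-aligned
    residues matched exactly, and within max_shift residues."""
    n = len(reference)
    if n == 0:
        return 0.0, 0.0
    exact = near = 0
    for t, ref_m in reference.items():
        pred_m = predicted.get(t)
        if pred_m is None:
            continue  # residue left unaligned by the prediction
        if pred_m == ref_m:
            exact += 1
        if abs(pred_m - ref_m) <= max_shift:
            near += 1
    return exact / n, near / n

# Toy example: residue 3 is misaligned by a small shift,
# residue 4 by a large one.
reference = {1: 10, 2: 11, 3: 12, 4: 13}
predicted = {1: 10, 2: 11, 3: 15, 4: 25}
print(shift_fractions(predicted, reference))  # (0.5, 0.75)
```

In these terms, PROSUP's search over alternative structure-structure alignments amounts to choosing, from a set of candidate reference maps, the one that maximizes sf0.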
It is also satisfying, perhaps, to see a clear dependence of
prediction success on target difficulty. In the phase diagram of Figure 2 one sees an obvious relationship between
the occurrence of accurate models and the extent of
structural similarity of the target and database template.
When 60% of target residues may be superimposed on a
database template, roughly speaking, one or more teams
have predicted accurate models for all 13 easy or medium
targets from CASP2 and CASP3. Conversely, for the 12
hard or impossible targets with less than 60% of residues
superimposable on a database template, there were no
accurate models at either CASP2 or CASP3. Threading
methods score sequence-structure compatibility according
to how each residue from the target sequence "fits" the
structural environment of the site to which it is aligned.
That environment may be described as the solvent accessibility at that site, for example, or, in sequence-based
methods, as a list of residue types preferred at that site. As
structural similarity of the target and template becomes
less extensive, however, a greater proportion of environment descriptors will be incorrect: The actual solvent
accessibility at a conserved site in the target will differ, as
will the list of preferred residue types. Thus it would be
rather surprising if one did not see a dependence of
threading success on the extent of structural similarity. As
the extent of structural similarity goes down, the signal-to-noise ratio in a threading calculation must also go down.
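The environment-based scoring idea described above can be sketched as follows. The environment classes, preference values, and function names here are hypothetical illustrations invented for this sketch, not any published threading potential:

```python
# Hypothetical environment classes and residue preferences
# (log-odds-style numbers invented for illustration; real potentials
# are derived from structural databases).
PREFERENCE = {
    "buried":  {"L": 0.8, "V": 0.7, "F": 0.6, "K": -0.9, "E": -0.8},
    "exposed": {"L": -0.4, "V": -0.3, "F": -0.5, "K": 0.6, "E": 0.7},
}

def threading_score(target_seq, alignment, site_env):
    """Sum the environment 'fit' of each aligned target residue.

    target_seq: target sequence as one-letter residue codes.
    alignment:  target residue index -> template site index.
    site_env:   template site index -> environment class.

    When the target and template structures diverge, site_env no
    longer reflects the target's true environments, and these
    per-site terms degrade into noise -- the signal-to-noise
    argument made in the text."""
    total = 0.0
    for t_idx, site in alignment.items():
        env = site_env[site]
        total += PREFERENCE[env].get(target_seq[t_idx], 0.0)
    return total

# Toy target "LKE" threaded onto a three-site template in which
# each residue suits its assigned environment.
score = threading_score("LKE", {0: 0, 1: 1, 2: 2},
                        {0: "buried", 1: "exposed", 2: "exposed"})
print(score)  # approximately 2.1
```

Misassigning the environments (for example, marking all three sites "buried") drops the score sharply, which is the sense in which incorrect environment descriptors erode the threading signal.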
What is perhaps least satisfying in the present analysis
is its indication of limited progress in fold recognition. As
one sees in Table II, there are a greater number of accurate
predictions at CASP3 for medium targets, and one may
readily conclude that threading methods have become
more reliable for detection of remote evolutionary relationships. On the other hand, one sees that hard targets
remain beyond the reach of threading methods. This can
be interpreted negatively, in the sense that threading
methods have a long way to go before they approach the
sensitivity of structure-structure comparison. Indeed, the
phase diagram in Figure 2 was originally proposed as a
means to measure improvement in threading sensitivity,11
but it shows no obvious improvement between CASP2 and
CASP3. Threading methods must ultimately fail, of course,
as the extent of structural similarity between target and
template decreases. Bearing this intrinsic limitation in
mind, one can also make a positive interpretation of the
similarity threshold apparent in Figure 2: Perhaps the
best threading methods are already working about as well
as is possible. The only way to distinguish these alternative interpretations, of course, is to wait and see what
predictions are made at CASP4 and beyond!
ACKNOWLEDGMENTS
We thank the CASP3 experimentalists, predictors, and
organizers for providing prediction and evaluation data for
this analysis. We thank Ken Addess and Tom Madej for
calculating VAST structure-structure alignments for
CASP3 targets. We thank Anna Panchenko for valuable
discussions and the NIH intramural research program for
support.
REFERENCES
1. Moult J, Hubbard T, Bryant SH, Fidelis K, Pedersen JT. Critical
assessment of methods of protein structure prediction (CASP):
round II. Proteins Suppl 1997;1:2–6.
2. Moult J, Hubbard T, Fidelis K, Pedersen JT. Critical assessment
of methods of protein structure prediction (CASP): round III.
Proteins Suppl 1999;3:2–6.
3. Murzin AG. Structure classification-based assessment of CASP3
predictions for the fold recognition targets. Proteins Suppl 1999;3:
88–103.
4. Sippl MJ, Lackner P, Dominques FS, Koppensteiner WA. An
attempt to analyse progress in fold recognition from CASP1 to
CASP3. Proteins Suppl 1999;3:226–230.
5. Levitt M. Competitive assessment of protein fold recognition and
alignment accuracy. Proteins Suppl 1997;1:92–104.
6. Marchler-Bauer A, Bryant SH. Measures of threading specificity
and accuracy. Proteins Suppl 1997;1:74–82.
7. Feng ZK, Sippl MJ. Optimum superimposition of protein structures: ambiguities and implications. Folding & Design 1996;1:123–
132.
8. Holm L, Sander C. Mapping the protein universe. Science 1996;273:
595–602.
9. Orengo CA, Taylor WR. SSAP: sequential structure alignment
program for protein structure comparison. Methods in Enzymology 1996;266:617–635.
10. Gibrat J-F, Madej T, Bryant SH. Surprising similarities in structure comparison. Current Opinion in Structural Biology 1996;6:377–385.
11. Marchler-Bauer A, Levitt M, Bryant SH. A retrospective analysis
of CASP2 threading predictions. Proteins Suppl 1997;1:83–91.
12. Marchler-Bauer A, Bryant SH. A measure of success in fold
recognition. Trends in Biochemical Sciences 1997;22:236–240.
13. http://predictioncenter.llnl.gov/casp3/
14. http://rutgers.rcsb.org/pdb/
15. http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/casp3/casp3vast.html
16. http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/casp3/casp3eval.html
17. http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/casp2/index.html
18. http://predictioncenter.llnl.gov/casp3/results/FR-Summary.gif
19. Hubbard TJP, Ailey B, Brenner SE, Murzin AG, Chothia C. SCOP:
a structural classification of proteins database. Nucleic Acids Res
1999;27:254–256.
20. Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–3402.
21. McCafferty DG, Lessard IA, Walsh CT. Mutational analysis of
potential zinc-binding residues in the active site of the enterococcal D-Ala-D-Ala dipeptidase VanX. Biochemistry 1997;36:10498–10505.
22. Hofmann K, Bucher P, Falquet L, Bairoch A. The PROSITE
database, its status in 1999. Nucleic Acids Res 1999;27:215–219.
23. Jones DT, Tress M, Bryson K, Hadley C. Successful recognition of
protein folds using threading methods biased by sequence similarity and predicted secondary structure. Proteins Suppl 1999;3:104–111.
24. Panchenko A, Marchler-Bauer A, Bryant SH. Threading with
explicit models for evolutionary conservation of sequence and
structure. Proteins Suppl 1999;3:133–140.
25. Murzin AG, Bateman A. Distant homology recognition using
structural classification of proteins. Proteins Suppl 1997;1:105–112.
26. http://PredictionCenter.llnl.gov/casp3/SUMMARY/cm1/