PROTEINS: Structure, Function, and Genetics Suppl 3:126–132 (1999)
Cooperative Approach for the Protein Fold Recognition
Motonori Ota,1* Takeshi Kawabata,1 Akira R. Kinjo,1,2 and Ken Nishikawa1,2
1National Institute of Genetics, Mishima, Japan
2Department of Genetics, School of Life Science, The Graduate University for Advanced Studies, Mishima, Japan
ABSTRACT
We, four independent predictors, organized a team and tackled blind protein structure prediction using fold recognition methods. We tried to assign homologous or analogous folds in the protein structure database to a number of target sequences that showed no apparent sequence homology to proteins of known fold. After primary analyses with conventional software, these sequences were threaded through the structural library using three different programs developed by ourselves, which employ different compatibility functions. Collecting the results of our individual analyses and the available biological knowledge about the target, we held meetings and discussed all plausible structures for the target. For 25 target sequences, we submitted 56 models, including NONE, which means that the fold was being determined for the first time (a new fold). At the time of the meeting (CASP3), 19 protein structures (21 domains) categorized as threading targets were available. We succeeded in predicting eight out of the 18 targets (20 domains) that we submitted; however, alignment accuracies were not satisfactory for some of the models. We often obtained correct answers even if some of us missed the right prediction; it therefore appears that our threaders compensated for one another. When all the information is managed effectively, the prediction gains accuracy. Proteins Suppl 1999;3:126–132. © 1999 Wiley-Liss, Inc.
Key words: CASP; structure prediction; threading; compatibility function; homology
INTRODUCTION
In the critical assessment of techniques for protein
structure prediction (CASP), we are challenged to predict
the structures of proteins that are soon to be determined
experimentally. Fold recognition is the category for
targets that have no apparent homology to proteins
of known structure. In this category, we are asked to find known structures whose folds
are similar to those of the targets. For such problems, the
sequence-structure compatibility search methods (threading) have been developed in the last decade.1–3 Although
the principle of the method is promising, it is not yet
reliable enough to dispense with human intervention. To make an accurate prediction, profound biological knowledge about the target is also needed.4
It is, however, too difficult for a single person to carry out
complete analyses of all the targets (more than 20) within
the prediction season (about 3 months).
To participate in the third blind
prediction experiment (CASP3), we organized the team
UNAGI (the Japanese word for eel, for which Mishima,
where our institute is located, is famous), which consists of
independent predictors, each with their own threader (a program for fold recognition). We exchanged information
about the target sequences and examined the results of our
individual analyses. Then, after coming to an agreement,
we decided on the models to submit.
In this article, we show that our team cooperated well
and that the hybrid method of combining a few threaders
with human intervention was successful, if they were
managed effectively.
METHODS
First, the target sequences were analyzed with publicly
available software. Sequence homology searches were
performed using FASTA,5 BLAST,6 and PSI-BLAST,7 and
homologous sequences thus found were aligned using
CLUSTALW8 for the detailed analyses. Literature searches
were conducted utilizing SWISS-PROT references9 and
PubMed.10 Second, public or in-house software was used
for the more complicated analyses. Secondary structure
prediction was carried out using SSThread,11 PHD,12
JOINT,13 and BW-MGOR.14 Sequence motif searches were
conducted using PROSITE.15 Third, the target sequence, its homologs, and their multiple alignment were
threaded through the structural library, which contained
approximately 1,400 structures taken from release 83 of
the Protein Data Bank (PDB).16 For a single sequence, we
performed threading using COMPASS,17 S3 (Kinjo, unpublished), and LIBRA.18 COMPASS, one of the classic threaders, uses a set of knowledge-based functions which take
into account four terms: side-chain packing, hydration,
local-conformation, and hydrogen-bonding.19 S3 employs
four terms that are the same as those of COMPASS, but
are different in the classification of the structural features
for the local structure and side-chain packing functions. S3
takes into account local structure in more detail and
considers five consecutive residue sites simultaneously,
including the β-bulge and the N- and C-cap structures at
the α-helix termini. However, S3 uses a simple residue-wise
distance potential for the side-chain packing function and
does not consider the contact probability as
COMPASS does. The adjustment of the weighting for each function
employed in COMPASS was not adopted, and therefore the
four functions were weighted equally. The sequence-structure alignment algorithm employed was the same
as that of COMPASS. LIBRA uses the same terms as
COMPASS but employs a different normalization scheme
for the scoring function20 and can also accept multiply
aligned sequences. When necessary, an inverse
folding search (structure-recognizes-sequence protocol) was
carried out against the sequence database using LIBRA.
For targets whose secondary structure was already known
or roughly deduced, a new structure comparison method,
MATRAS (Kawabata, manuscript in preparation), was
employed for the search. Using the environmental score (explained later), a similar-structure search
was performed against the secondary structural
library of PDB using the secondary structure of the target
protein.

TABLE I. Summary of UNAGI's Predictions†

Target           Length   MD1
T0043 (HPPK)     158      1cus
T0044 (RTCA)     347      1asyA
T0045 (YBAK)     158      NONE
T0046 (ADG)      119      2mcm*
T0051 (GLME)     483      1reqB
T0052 (CV-N)     101      NONE*
T0053 (CBIK)     264      1ak1
T0054 (VANX)     202      NONE
T0056 (DNAB)     114      NONE*
T0061 (HDEA)     89       1ngr
T0062 (UBIB)     232      2cnd
T0063 (IF5A)     138      1rsy
T0067 (PBP)      187      NONE
T0068 (PGL2)     376      1rmg
T0071 (ADAC)     238      NONE
T0072 (CD5)      110      1vfaA
T0074 (EPS15)    98       2scpA
T0075 (ETS-1)    110      NONE
T0077 (L30)      105      1tmy
T0078 (TESB)     288      NONE
T0079 (MARA)     129      1pdnC*
T0080 (3MG)      219      NONE
T0081 (MGSA)     152      1rnl
T0083 (CYNS)     156      1r69*
T0085 (C554)     211      1fgjA

Models listed in columns MD2–MD5 (their assignment to targets is not recoverable from the source layout): 1ble, 1atiA, 1hrdA, 1tul, 2fx2, 3hsc, 1tmy, 1fivA, 1scuB, 1btmB, 1slcA, 1pczAB*, 1lbu, 1bmfD, 1am3, 2admA, 1vhh, 1hulA, NONE*, 1slcA, 2snv, 1hurB, 1tf4A*, 1noa1, 1aoa, 1div, 1sfe, 1a0i, NONE, 6fabH*.
The per-target entries of columns 3D, DV, SE, and AE are likewise not recoverable from the source layout.
†MD1–5 are the model structures we submitted. Column 3D shows whether the 3D coordinates of the target were available for self-evaluation. The division of each target is shown in the DV column (C, comparative modeling; F, fold recognition). The SE column gives the results of the self-evaluation (R, right; N, correct as NONE; +, bonus; ?, not sure; w, wrong). The models that were self-evaluated are marked with an asterisk. The AE column shows the evaluation by the assessor.23 Alphabetical codes give the relative accuracy of the model, from A (excellent) to F (OK). N, NONE is right; +, bonus; ?, unsure; w, wrong; "●", near the right model. A blank means either that the 3D model was not available or that the protein was not regarded as a fold recognition target.

*Correspondence to: Motonori Ota, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan. E-mail: mota@genes.nig.ac.jp
Received 1 February 1999; Accepted 19 April 1999
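The equal weighting of the four S3 terms described in the Methods can be illustrated with a minimal Python sketch. The class `TermScores`, the functions `combined_score` and `standardize`, the template names, and all numbers are hypothetical. In particular, the standardization over the library is only an assumption made here for illustration; the normalization actually used by each threader is described in the cited references, not in this text.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class TermScores:
    """Hypothetical per-alignment values of the four knowledge-based terms."""
    packing: float    # side-chain packing
    hydration: float  # hydration
    local: float      # local conformation
    hbond: float      # hydrogen bonding


def combined_score(terms, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the default weights are all equal."""
    values = (terms.packing, terms.hydration, terms.local, terms.hbond)
    return sum(w * v for w, v in zip(weights, values))


def standardize(raw_scores):
    """Assumed normalization: express each raw score in standard deviations
    from the library mean (lower is taken to mean more compatible)."""
    values = list(raw_scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {name: 0.0 for name in raw_scores}
    return {name: (v - mu) / sigma for name, v in raw_scores.items()}


# Made-up library of three templates, for illustration only.
library = {
    "templateA": TermScores(-1.0, -2.0, 0.5, -0.5),
    "templateB": TermScores(0.2, 0.1, 0.0, -0.3),
    "templateC": TermScores(0.4, 0.3, 0.2, 0.1),
}
raw = {name: combined_score(t) for name, t in library.items()}
z = standardize(raw)
best = min(z, key=z.get)  # template with the lowest standardized score
```

Passing a different `weights` tuple corresponds to the per-term weight adjustment that, according to the text, COMPASS uses and S3 does not.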
The final decision-making was not an easy or straightforward process. After individual analyses of each target, we
held a meeting. We agreed immediately when the threaders each produced similar results, or where other information, e.g., sequence motifs, made fold recognition easy. For
most targets, however, we could not immediately decide on
the most appropriate models. In such cases, we customized
our programs to take account of a target-specific feature
such as sequence motifs, disulfide bonds, or hypothetical
domain structures. In cases where we could not find
consistency among our results, we concluded that the fold
was new (submitted as NONE).
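The agreement rule described above (accept a fold when the threaders give similar results, otherwise submit NONE) can be caricatured in a few lines of Python. This is only a toy sketch: the real decisions were made in meetings using biological knowledge, and the function name `consensus_folds`, the parameters `top_k` and `min_support`, and the fold labels are assumptions introduced for illustration.

```python
from collections import Counter


def consensus_folds(rankings, top_k=5, min_support=2):
    """Return folds found in the top_k list of at least min_support threaders.

    rankings maps a threader name to its ranked list of template folds
    (best first). If no fold reaches the required support, the result is
    ["NONE"], mirroring the submission of NONE for an apparently new fold.
    """
    votes = Counter()
    for ranked in rankings.values():
        # dict.fromkeys removes duplicates so each threader votes once per fold
        for fold in dict.fromkeys(ranked[:top_k]):
            votes[fold] += 1
    supported = [fold for fold, n in votes.items() if n >= min_support]
    supported.sort(key=lambda fold: (-votes[fold], fold))
    return supported or ["NONE"]


# Hypothetical rankings from three threaders.
example = {
    "COMPASS": ["foldA", "foldB", "foldC"],
    "S3": ["foldD", "foldA"],
    "LIBRA": ["foldE", "foldF"],
}
```

With these made-up rankings, `foldA` is supported by two threaders when the top five hits are considered, whereas restricting the vote to the top hit of each threader leaves no agreement.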
For self-evaluation of the models, we used MATRAS.
The program MATRAS (MArkovian TRAnsition of protein
Structure) was designed for comparing protein tertiary
structures using a log-odds structure similarity matrix
that was derived from homologous PDB entries according
to Dayhoff et al.21 The formalism, however, was applied to
changes of structural features instead of amino acid substitutions. Three different scores were compiled: secondary
structure element (SSE), environment (local structure and
solvent accessibility), and Cα-pair distance. When we used
MATRAS for the structural alignment, SSEs were aligned
roughly by the SSE score and refined using the environmental and Cα-pair distance scores. A threshold for significant
similarity was determined by the score distribution of the
same fold structures in the SCOP database.22
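A log-odds similarity matrix in the Dayhoff style can be sketched as follows. This is a generic illustration, not the MATRAS implementation (which is cited as a manuscript in preparation): the function name `log_odds_matrix`, the state labels "H" and "E", and the counts are assumptions. The score for a pair of structural states is the logarithm of the observed frequency of that pair at aligned positions of homologous structures divided by the frequency expected if the states were paired at random.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_state_pairs):
    """Compute s(a, b) = ln( q_ab / (p_a * p_b) ) from aligned state pairs.

    aligned_state_pairs: iterable of (state_in_structure_1, state_in_structure_2)
    taken from aligned positions of homologous structures.
    """
    pair_counts = Counter()
    for a, b in aligned_state_pairs:
        # Count both orientations so that the matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Marginal (background) counts of each state.
    state_counts = Counter()
    for (a, _), n in pair_counts.items():
        state_counts[a] += n

    scores = {}
    for (a, b), n in pair_counts.items():
        q_ab = n / total
        p_a = state_counts[a] / total
        p_b = state_counts[b] / total
        scores[(a, b)] = math.log(q_ab / (p_a * p_b))
    return scores


# Made-up observations: helix (H) and strand (E) states at aligned positions.
pairs = [("H", "H")] * 3 + [("E", "E")] * 3 + [("H", "E")] * 2
scores = log_odds_matrix(pairs)
```

In this toy data set, conserved states receive a positive score and the helix-strand exchange a negative one, which is the behavior expected of a substitution-style matrix.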
RESULTS
We submitted 56 models for 25 targets. Among them, 18 structures (20 domains) were experimentally determined and regarded as fold recognition targets (we withdrew from T0059 because it was too short and not suitable for our threaders), and eight predictions were recognized as correct.23 The results and their evaluations, made by the assessor as well as by ourselves using MATRAS, are summarized in Table I. Except for a few cases, our self-evaluation agreed with that of the assessor. Some of our submitted models are explained in detail in the following sections.
Fig. 1. The TOPS diagrams25 of the structures of T0053 (a) and 1ak1 (b). Circles and triangles denote the α-helices and β-strands, respectively. The matched segments are colored red (α-helices) or yellow (β-strands), whereas the insertions are not colored. The segments incorrectly aligned are marked as "miss." The conserved motif in 1ak1 (marked as "MOTIF") is disrupted in the target T0053. Omitting the N-terminal 60 residues, our alignment is very accurate. The RMSD is 5.8 Å. The ASp4 measure,38 the ratio of the number of aligned residue pairs in the submitted alignment that agree with the structural alignment by MATRAS to within a shift error of four residues to the total number of residue pairs in the alignment, is 91.2%.
Fig. 2. Ribbon diagrams of T0083 (a) and 1r69 (b) drawn with MolScript.39 The aligned regions are colored red. They can be superimposed with an RMSD of 3.5 Å. The ASp4 measure38 is 95.2%.
Fig. 3. Ribbon diagrams of T0052 (a) and 1pczAB (b) drawn with MolScript.39 Three strands in the structures can be superimposed with an RMSD of 5.6 Å (c): T0052, blue; 1pczAB, green.
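The ASp4 measure quoted in the captions of Figures 1 and 2 can be approximated by the short Python sketch below. It follows the verbal definition given in the Figure 1 caption; the function name `shift_accuracy`, the dictionary representation of an alignment, and the choice of the submitted alignment as the denominator are assumptions, and the example alignments are made up.

```python
def shift_accuracy(predicted, reference, max_shift=4):
    """Fraction of predicted residue pairs within max_shift of the reference.

    Both alignments map a target residue number to the template residue
    number it is aligned with. A predicted pair counts as correct when the
    reference alignment places the same target residue within max_shift
    template residues of the predicted position.
    """
    if not predicted:
        return 0.0
    agreeing = sum(
        1
        for target_res, template_res in predicted.items()
        if target_res in reference
        and abs(reference[target_res] - template_res) <= max_shift
    )
    return agreeing / len(predicted)


# Made-up alignments: two of the four predicted pairs are within four residues.
predicted_alignment = {1: 10, 2: 11, 3: 20, 4: 30}
structural_alignment = {1: 10, 2: 14, 3: 30}
accuracy = shift_accuracy(predicted_alignment, structural_alignment)
```

Setting `max_shift=0` would require exact agreement with the structural alignment, which is a much stricter criterion.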
Target T0053 (CbiK protein)
The T0053 sequence was threaded through the structural library. One threader, COMPASS, ranked the 1ak1 (ferrochelatase) structure first, with a significant compatibility score of about −3.3 (a compatibility score less than −3.0 is usually significant). With the other threaders (S3 and LIBRA), this structure appeared within the top five structures. Although the result of the secondary structure prediction (irregular α/β fold) was inconsistent with this prediction (1ak1 has two regular α/β domains) and the conserved sequence motif of the 1ak1 family (PROSITE ID: PS00534) is disrupted in T0053, the functional similarity between the target and 1ak1 (the former is involved in vitamin B12 synthesis, the latter in heme synthesis) strongly supported our results.24 Therefore, we submitted only the 1ak1 structure. As a result, we predicted the correct fold, and our alignment was recognized as the second best model for T0053 (Table I). The topology diagrams drawn by TOPS25 for the target structure and our model are shown in Figure 1. We missed the correct alignment for the N-terminal 60 residues because COMPASS was not able to skip the two inserted α-helices after the first β-strand of 1ak1. The root mean square deviation (RMSD) measured for the whole structure is 12.8 Å, but it decreases to 5.8 Å if we remove the first 60 residues of the model.
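The effect of excluding a misaligned N-terminal segment on the RMSD can be illustrated with a minimal Python sketch. It assumes that the two coordinate sets are already superimposed and matched residue by residue; the optimal superposition step, which would normally be repeated after removing residues, is omitted. The function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsd_excluding_prefix(coords_a, coords_b, skip):
    """RMSD computed after dropping the first `skip` residue pairs."""
    return rmsd(coords_a[skip:], coords_b[skip:])


# Toy example: only the first residue pair deviates (by 3 Å along x).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
target = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
whole = rmsd(model, target)
trimmed = rmsd_excluding_prefix(model, target, skip=1)
```

Because the deviations are squared before averaging, a short badly placed segment can dominate the RMSD of an otherwise accurate model, which is the situation described for T0053.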
TABLE II. Summary of the Ability of Each Method†

Target                  Type   Model
a) For our submission
T0046                   β      IG
T0053                   α/β    1ak1
T0068                   β      1rmg
T0071, first domain     β      IG
T0074                   α      Calmodulin
T0079                   α      1pdnC
T0081                   α/β    1rnl
T0083                   α      1r69
T0085                   α      1fgjA
b) For the additional remarks
T0071, second domain    β      TATA-binding
T0080                   α+β    1fmtA
T0081                   α/β    1jdbK

The per-target entries of the columns COMPASS, S3, LIBRA, SeqSearch, and Motif (Good, Middle, or Exist) are not recoverable from the source layout.
†Good, detected at a significant level; Middle, ranked the answer at the first (second) position, but not significantly; IG, immunoglobulin fold.

Target T0083 (Cyanase)
The T0083 sequence, its related sequences, and their multiple alignment were threaded through the structural library; however, the results were not significant. Although the results of the secondary structure prediction strongly suggested that the fold was mainly of the α type, we could not find such folds among the structures that ranked at high positions in the compatibility searches. When the structures were sorted by the compatibility score normalized by the alignment length using the threader LIBRA, we found that the Trp repressor (1trrA) and Cro repressor (1r69) structures had very good compatibility scores, −3.3 and −2.8, respectively. These structures are all α-type DNA-binding units, which is consistent with the secondary structure prediction. The Trp repressor and the Cro repressor do not share the same fold.22 To investigate which fold was more compatible, inverse-folding protocol searches with LIBRA were performed against the sequence database constructed from PDB rel.83 plus the target sequences. The 1r69 structure26 showed good compatibility with the target, whereas the 1trrA structure did not. The compatibility scores of the target and its related proteins with 1r69 were in the range between −4.8 and −2.6. The 1r69 structure was aligned with the N-terminal half of the target. Looking at the multiple alignment of the target and its homologs, a Pro-rich region is found in the middle (residues 78–92), and we considered that it may form a coil structure and, therefore, would act as a hinge between the two domains. The N-terminal half of the target and our model (Fig. 2) can be superimposed with an RMSD of 3.5 Å according to our submitted alignment.
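Re-ranking hits by a length-normalized score, as was done for T0083, can be sketched in Python as follows. The exact normalization scheme of LIBRA is given in the cited reference and is not reproduced here; simple division of a raw score by the alignment length is an assumption, and the `Hit` type, the template names, and the numbers are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template: str        # identifier of the library structure
    raw_score: float     # raw compatibility score (lower is better)
    aligned_length: int  # number of aligned residues


def rank_by_normalized_score(hits):
    """Sort hits by raw score per aligned residue, best (lowest) first."""
    usable = [h for h in hits if h.aligned_length > 0]
    return sorted(usable, key=lambda h: h.raw_score / h.aligned_length)


# Made-up hits: the long template wins on raw score,
# the short one wins after normalization.
hits = [
    Hit("long_template", -120.0, 300),
    Hit("short_template", -60.0, 60),
]
by_raw = sorted(hits, key=lambda h: h.raw_score)
by_normalized = rank_by_normalized_score(hits)
```

Normalization of this kind favors short, well-fitting templates such as small DNA-binding units, which would otherwise be outscored by long alignments that accumulate a large raw score.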
Target T0085 (Cytochrome C554)
The T0085 sequence was threaded through the structural library using our threaders; however, the results
were not significant. These results were as we expected: It
has been observed that our threaders cannot properly
treat cytochromes that bind multiple hemes.18,27 We
thought that the four heme-binding motifs (CXXCH: cytochrome c family heme-binding site signature) were crucial.
A sequence similarity search against the database of
proteins with known structures found the group of the
cytochrome c3 (2cdv, etc.), the group of the cytochrome
c553 (1dvh, etc.), and hydroxylamine oxidoreductase
(1fgjA), all of which have the heme-binding motifs. Among
them, 1fgjA28 was the one whose length between motifs
was suitable; also, the target is a hydroxylamine oxidoreductase-linked cytochrome. Therefore, we considered
that the structure might be a plausible candidate for the
compatible structure. Each heme in 1fgjA structure is
bound to two histidines: One is in the CXXCH motif, and
the other is in a different site. Next, we analyzed the
alignment. In a suboptimal alignment by PSI-BLAST, two
of the other heme-binding histidines of 1fgjA were aligned
with two tyrosines of the target. Tyrosine seemed suitable
as a heme-binding residue. Finally, we performed the
inverse-folding search by LIBRA using the part of 1fgjA
structure aligned to T0085 against the sequence database
with the target sequence. The target sequence ranked
second. This result supported the compatibility
between T0085 and 1fgjA. The solved target structure
suggests its evolutionary relationship with 1fgjA.29 Our
alignment was wrong: All the heme-binding residues were
indeed histidines. One of the reasons for the incorrect
alignment is that there are eight hemes in 1fgjA whereas
there are four in the target; this difference in the number
of hemes made finding the correct alignment difficult.
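Locating the CXXCH heme-binding motifs and the spacing between them, which guided the choice of 1fgjA, can be done with a regular expression. The Python sketch below uses only the standard `re` module; the function names and the example sequence are made up for illustration and are not the T0085 sequence.

```python
import re

# C, any two residues, C, H. The lookahead allows overlapping matches.
HEME_MOTIF = re.compile(r"(?=(C..CH))")


def find_heme_motifs(sequence):
    """Return the 1-based start positions of all CXXCH motifs."""
    return [m.start() + 1 for m in HEME_MOTIF.finditer(sequence.upper())]


def motif_spacings(positions):
    """Distances (in residues) between the starts of consecutive motifs."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# Hypothetical sequence containing two motifs (CAACH and CKKCH).
toy_sequence = "AACAACHGGGGCKKCHAA"
positions = find_heme_motifs(toy_sequence)
spacings = motif_spacings(positions)
```

Comparing the list of spacings for the target with those of each candidate structure is one simple way to express the criterion that the length between motifs be suitable.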
Target T0052 (Cyanovirin-N)
The target sequence exhibits an internal duplication:
the N-terminal half (residues 1–50) and the C-terminal
half (residues 51–101) of the sequence are homologous to
each other.30 Therefore, we assumed that the structure of
the target would be symmetric. Two disulfide bonds were
known to exist.30 We performed threading with the target
sequence, the two halves of the sequence, and the
"synthesized" sequences in which the N-terminal half or
the C-terminal half was repeated twice. No significantly
compatible structures that readily met the requirements of
symmetry and disulfide bonds were found. Therefore, we
submitted NONE as our first model. Nevertheless, when examining the results of the threading, we thought that the TATA-box–binding
protein might meet our hypotheses. 1pcz, a TATA-box–binding
protein, was one of the relatively highly compatible structures. However, its monomeric structure is nonglobular
and probably unstable (the target was supposed to be
monomeric30). Therefore, we synthesized a chimeric
structure from the interacting domains of its dimeric
structure (1pczAB, Fig. 3b). The actual structure, recognized as a new fold, contains two symmetrically arranged
β-sandwich domains, as shown in Figure 3a.31 One domain
consists of two β-sheets, one with three β-strands from the
N- (or C-) terminal half of the sequence and the other with
two β-strands from the other half. The former β-sheet and
the corresponding part of our model can be superimposed
with an RMSD of 5.6 Å (Fig. 3c). Although the interchange
of segments (two β-strands, in this case) between domains
is difficult to predict, our partially correct prediction
encourages the development of a new prediction method based on
the fragment combinatorial approach.32
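The set of query sequences used for T0052 (the full sequence, its two halves, and the "synthesized" sequences in which one half is repeated twice) can be generated with a few lines of Python. The function name `duplication_queries` and the toy sequence are hypothetical; for T0052 the split would fall after residue 50, according to the text.

```python
def duplication_queries(sequence, split):
    """Build threading queries for a sequence with an internal duplication.

    split is the number of residues in the N-terminal half.
    """
    n_half, c_half = sequence[:split], sequence[split:]
    return {
        "full": sequence,
        "n_half": n_half,
        "c_half": c_half,
        "n_repeat": n_half * 2,  # N-terminal half repeated twice
        "c_repeat": c_half * 2,  # C-terminal half repeated twice
    }


# Toy sequence, not the real target.
queries = duplication_queries("ABCDEFG", split=3)
```

Each of the five sequences would then be threaded separately, so that a template compatible with a symmetric arrangement of two similar halves has a chance to stand out.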
Target T0044 (RNA-3′ Terminal Phosphate Cyclase)
For this target, we paid attention to the crystallographer's remark that the secondary structure prediction by PHD was "quite accurate." Therefore, we assumed that the structure of T0044 was of an irregular α/β type according to PHD. We carried out the threading of the target sequence, its homologs, and their multiple alignment. We could not obtain significant threading results, yet we could not submit NONE because the crystallographer also mentioned that the answer already existed in PDB. As a next trial, we performed a secondary-structure threading by MATRAS using the output of the PHD prediction against the secondary structure library of PDB. We also took into account the functional similarity: ATP-binding and RNA-binding abilities might be required for the answer structure. Finally, we chose as many as five structures that were ranked at relatively high positions in both the threading and the secondary-structure threading and met the functional requirements (Table I). None of our submissions hit the answer structure. The target structure is composed of four domains. The structural alignment of T0044 and 1nawA (the answer in our library) requires large gaps to skip the third domain, consisting of about 90 residues (this is the largest domain). It was very difficult for the secondary-structure threading to allow these gaps; therefore, 1nawA could not rank at a high position.
DISCUSSION
We correctly predicted eight targets out of 18 (20 domains). It appears that this success could not have been achieved if we had employed only one or a few methods.33 Indeed, the methods we employed compensated for one another, and we eventually reached the correct structure even if one method failed. The contributions of each method for each target are summarized in Table IIa. COMPASS appears suitable for the prediction of α/β-type or β-type proteins, whereas LIBRA appears effective in predicting α-type proteins. S3 sometimes supports the correct answer by ranking it at a relatively high position (not shown). The selection of the submission was not straightforward because the procedure depended on each target, as already mentioned in the Results section.
The biological or functional knowledge gleaned from the literature helped the selection and, in many cases, led us to the correct answers. Surprisingly, most of the hypotheses or predicted motifs in the literature proved to be true: The structure of T0054 (VANX)34 is similar to the 1lbu or 1vhh fold,35 and the two helix-turn-helix motifs do exist in T0079 (MARA).36
Human intervention sometimes misled us; that is, some correct answers were missed through the meetings. In the case of the target T0044, 1nawA, one of the correct answers, was found at the third place using LIBRA, but we submitted the results obtained by MATRAS instead. This is a typical example showing that the use of multiple methods poses difficulties in the final decision-making process. For the second domain of T0071, the compatible structures predicted with S3 were TATA-box–binding proteins (1st: 1aisA, 2nd: 1ytbA), which were correct answers. However, we did not submit them because there was little evidence to support the result and we were too concerned about the first domain.
We also noticed that the growth in the number of PDB entries is significant and that constant updating of the structural library is important for fold recognition. Our basic structural library was compiled from PDB rel.83. It did not contain the correct answers for T0080 (3MG) and T0081 (MGSA) during the prediction season. Afterward, we performed the threading against a new structural library including the answers (1fmtA for T0080, 1jdbK for T0081). All three threaders detected clear similarity between 1jdbK and the target T0081 (Table IIb). Thus, it is very probable that we would have submitted the structure of 1jdbK as the first model for T0081 had it been in our library. Only the threading by LIBRA ranked 1fmtA at the top position of the compatibility score, although the alignment was incorrect.
The cooperative approach we took for the CASP3 prediction consumed a great deal of human effort, so it would not be applicable, for example, to the analysis of a large number of sequences, such as genomes.37 However, each threader has its preferences, and if we can manage these preferences well, the analysis may become easier, gain accuracy, and be partially automated in the future. The lessons from this experiment will contribute to such large-scale analyses.
ACKNOWLEDGMENTS
We thank the organizers and the assessors for the preparation of this experiment and meeting. We are also grateful to the structure submitters for offering their experimental structures of the target proteins before publication. In addition, we thank Rosemary Chapman and Thomas D. Andrews for the critical reading of the manuscript. A.R.K. is a predoctoral research fellow of the Japan Society for the Promotion of Science.
REFERENCES
1. Lemer CM-R, Rooman MJ, Wodak SJ. Protein structure prediction
by threading methods: evaluation of current technologies. Proteins 1995;23:337–355.
2. Levitt M. Competitive assessment of protein fold recognition and
alignment accuracy. Proteins Suppl 1997;1:92–104.
3. Marchler-Bauer A, Levitt M, Bryant SH. A retrospective analysis
of CASP2 threading predictions. Proteins Suppl 1997;1:83–91.
4. Murzin A, Bateman A. Distant homology recognition using structural classification of proteins. Proteins Suppl 1997;1:105–112.
5. Pearson W, Lipman D. Improved tools for biological sequence
comparison. Proc Natl Acad Sci USA 1988;85:2444–2448.
6. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local
alignment search tool. J Mol Biol 1990;215:403–410.
7. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Res 1997;25:
3389–3402.
8. Thompson J, Higgins D, Gibson T. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673–4680.
9. Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res 1999;27:49–54.
10. http://www.ncbi.nlm.nih.gov/PubMed/
11. Ito M, Matsuo Y, Nishikawa K. Prediction of protein secondary structure using the 3D-1D compatibility algorithm. Comput Appl Biosci 1997;13:415–424.
12. Rost B, Sander C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 1994;19:55–72.
13. Nishikawa K, Noguchi T. Predicting protein secondary structure based on amino acid sequence. Methods Enzymol 1991;202:31–44.
14. Kawabata T, Doi J. Improvement of protein secondary structure prediction using binary word encoding. Proteins 1997;27:36–46.
15. Hofmann K, Bucher P, Falquet L, Bairoch A. The PROSITE database, its status in 1999. Nucleic Acids Res 1999;27:215–219.
16. Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Cryst D 1998;54:1078–1084.
17. Matsuo Y, Nishikawa K. Protein structural similarities predicted by a sequence-structure compatibility method. Protein Sci 1994;3:2055–2063.
18. Ota M, Nishikawa K. Feasibility in the inverse protein folding protocol. Protein Sci 1999;8:1001–1009.
19. Matsuo Y, Nishikawa K. Assessment of a protein fold recognition method that takes into account four physicochemical properties: side-chain packing, solvation, hydrogen-bonding, and local conformation. Proteins 1995;23:370–375.
20. Ota M, Kanaya S, Nishikawa K. Desk-top analysis of the structural stability of various point mutations introduced into ribonuclease H. J Mol Biol 1995;248:733–738.
21. Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. In: Dayhoff MO, editor. Atlas of protein sequence and structure. 5 suppl. 3, Washington, DC: National Biomedical Research Foundation, 1978. p 345–352.
22. Murzin AG, Brenner SE, Hubbard T, Chothia C. Scop: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995;247:536–540.
23. Murzin AG. Structure classification-based assessment of CASP3 predictions for the fold recognition targets. Proteins Suppl 1999;3:88–103.
24. Raux E, Thermes C, Heathcote P, Rambach A, Warren M. A role for Salmonella typhimurium cbiK in cobalamin (vitamin B12) and siroheme biosynthesis. J Bacteriol 1997;179:3202–3212.
25. Westhead D, Hatton D, Thornton J. An atlas of protein topology
cartoons available on the worldwide web. Trends Biochem Sci
1998;23:35–36.
26. Mondragon A, Subbiah S, Almo SC, Drottar M, Harrison SC.
Structure of the amino-terminal domain of phage 434 repressor at
2.0 Å resolution. J Mol Biol 1989;205:189–200.
27. Matsuo Y, Nakamura H, Nishikawa K. Detection of 3D-1D
compatibility characterized by the evaluation of side-chain packing and electrostatic interactions. J Biochem (Tokyo) 1995;118:137–
148.
28. Igarashi N, Moriyama H, Fujiwara T, Fukumori Y, Tanaka N. The
2.8 Å structure of hydroxylamine oxidoreductase from a nitrifying
chemoautotrophic bacterium, Nitrosomonas europaea. Nature
Struct Biol 1997;4:276–284.
29. Iverson T, Arciero D, Hsu B, Logan M, Hooper A, Rees D. Heme
packing motifs revealed by the crystal structure of the tetra-heme
cytochrome c554 from Nitrosomonas europaea. Nature Struct Biol
1998;5:1005–1012.
30. Gustafson K, Sowder RI, Henderson L, Cardellina JI, McMahon J,
Rajamani U, Pannell L, Boyd M. Isolation, primary sequence
determination, and disulfide bond structure of cyanovirin-N, an
anti-HIV (human immunodeficiency virus) protein from the cyanobacterium Nostoc ellipsosporum. Biochem Biophys Res Commun
1997;238:223–228.
31. Bewley C, Gustafson KR, Boyd MR, Covell DG, Bax A, Clore GM,
Gronenborn AM. Solution structure of cyanovirin-N, a potent
HIV-inactivating protein. Nature Struct Biol 1998;5:571–578.
32. Simons KT, Bonneau R, Ruczinski I, Baker D. Ab initio protein
structure prediction of CASP III targets using ROSETTA. Proteins Suppl 1999;3:171–176.
33. Rice D, Fischer D, Weiss R, Eisenberg D. Fold assignments for
amino acid sequences of the CASP2 experiment. Proteins Suppl
1997;1:113–122.
34. Bussiere DE, Pratt SD, Katz L, Severin JM, Holzman T, Park CH.
The structure of VanX reveals a novel amino-dipeptidase involved
in mediating transposon-based vancomycin resistance. Mol Cell
1998;2:75–84.
35. McCafferty D, Lessard I, Walsh C. Mutational analysis of potential zinc-binding residues in the active site of the enterococcal
D-Ala-D-Ala dipeptidase VanX. Biochemistry 1997;36:10498–
10505.
36. Gallegos M, Michan C, Ramos J. The XylS/AraC family of
regulators. Nucleic Acids Res 1993;21:807–810.
37. Fischer D, Eisenberg D. Assigning folds to the proteins encoded by
the genome of Mycoplasma genitalium. Proc Natl Acad Sci USA
1997;94:11929–11934.
38. Marchler-Bauer A, Bryant SH. Measures of threading specificity
and accuracy. Proteins Suppl 1997;1:74–82.
39. Kraulis PJ. MOLSCRIPT: a program to produce both detailed and
schematic plots of protein structures. J Appl Crystallogr 1991;24:
946–950.