PROTEINS: Structure, Function, and Genetics Suppl 3:126–132 (1999)

Cooperative Approach for the Protein Fold Recognition

Motonori Ota,1* Takeshi Kawabata,1 Akira R. Kinjo,1,2 and Ken Nishikawa1,2

1National Institute of Genetics, Mishima, Japan
2Department of Genetics, School of Life Science, The Graduate University for Advanced Studies, Mishima, Japan

ABSTRACT We, four independent predictors, organized a team and tackled blind protein structure prediction using fold recognition methods. We tried to assign homologous or analogous folds in the protein structure database to a number of target sequences that showed no apparent sequence homology to proteins of known fold. After primary analyses with conventional software, these sequences were threaded through the structural library using three different programs developed by ourselves, each of which employs a different compatibility function. We then collected the results of our individual analyses and the available biological knowledge about each target, held meetings, and discussed all plausible structures for the target. For 25 target sequences, we submitted 56 models, including NONE (a prediction that the fold is new, i.e., determined for the first time). At the time of the meeting (CASP3), 19 protein structures (21 domains) categorized as threading targets were available. We succeeded in predicting eight of the 18 targets (20 domains) that we submitted; however, the alignment accuracy was not satisfactory for some of the models. We often obtained correct answers even when some of us missed the right prediction, which suggests that our threaders compensated for one another. When all the information is managed effectively, the prediction gains accuracy. Proteins Suppl 1999;3:126–132. © 1999 Wiley-Liss, Inc.
Key words: CASP; structure prediction; threading; compatibility function; homology

INTRODUCTION
In the critical assessment of techniques for protein structure prediction (CASP), we are challenged to predict the structures of proteins that are soon to be determined experimentally. One of the categories, covering targets that have no apparent homology to proteins of known structure, is fold recognition. In this category, we are asked to find known structures whose folds are similar to that of the target. Sequence-structure compatibility search methods (threading) have been developed for such problems over the last decade.1–3 Although the principle of the method is promising, it is not yet reliable enough to dispense with human intervention. An accurate prediction also requires profound biological knowledge about the target.4 It is, however, too difficult for a single person to carry out the whole analysis for all the targets (more than 20) within the prediction season (about 3 months).

In participating in the third meeting of the blind prediction experiment (CASP3), we organized the team UNAGI (the Japanese word for eel, for which Mishima, where our institute is located, is famous), which consists of independent predictors, each with their own threader (a program for fold recognition). We exchanged information about the target sequences and examined the results of our individual analyses. Then, after coming to an agreement, we decided on our submission models. In this article, we show that our team cooperated well and that the hybrid method of combining a few threaders with human intervention was successful when managed effectively.

METHODS
First, the target sequences were analyzed with publicly available software.
Sequence homology searches were performed using FASTA,5 BLAST,6 and PSI-BLAST,7 and the homologous sequences thus found were aligned using CLUSTALW8 for detailed analyses. Literature searches were conducted using SWISS-PROT references9 and PubMed.10 Second, public or in-house software was used for more complicated analyses. Secondary structure prediction was carried out using SSThread,11 PHD,12 JOINT,13 and BW-MGOR.14 Sequence motif searches were conducted using PROSITE.15 Third, the target sequence, its homologs, and their multiple alignment were threaded through a structural library containing approximately 1,400 structures taken from release 83 of the Protein Data Bank (PDB).16 For a single sequence, we performed threading using COMPASS,17 S3 (Kinjo, unpublished), and LIBRA.18 COMPASS, one of the classic threaders, uses a set of knowledge-based functions that take four terms into account: side-chain packing, hydration, local conformation, and hydrogen bonding.19 S3 employs the same four terms as COMPASS but classifies the structural features differently for the local-structure and side-chain packing functions. S3 treats local structure in more detail, considering five consecutive residue sites simultaneously, including the β-bulge and the N- and C-cap structures at the α-helix termini. However, S3 uses a simple residue-wise distance potential for the side-chain packing function and, unlike COMPASS, does not consider the contact probability. The weight adjustment applied to each function in COMPASS was not adopted in S3; the four functions were therefore weighted equally.

*Correspondence to: Motonori Ota, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan. E-mail: email@example.com

Received 1 February 1999; Accepted 19 April 1999

TABLE I. Summary of the UNAGI's Predictions†

Target           Length   MD1
T0043 (HPPK)       158    1cus
T0044 (RTCA)       347    1asyA
T0045 (YBAK)       158    NONE
T0046 (ADG)        119    2mcm*
T0051 (GLME)       483    1reqB
T0052 (CV-N)       101    NONE*
T0053 (CBIK)       264    1ak1
T0054 (VANX)       202    NONE
T0056 (DNAB)       114    NONE*
T0061 (HDEA)        89    1ngr
T0062 (UBIB)       232    2cnd
T0063 (IF5A)       138    1rsy
T0067 (PBP)        187    NONE
T0068 (PGL2)       376    1rmg
T0071 (ADAC)       238    NONE
T0072 (CD5)        110    1vfaA
T0074 (EPS15)       98    2scpA
T0075 (ETS-1)      110    NONE
T0077 (L30)        105    1tmy
T0078 (TESB)       288    NONE
T0079 (MARA)       129    1pdnC*
T0080 (3MG)        219    NONE
T0081 (MGSA)       152    1rnl
T0083 (CYNS)       156    1r69*
T0085 (C554)       211    1fgjA

Columns MD2–MD5, 3D, DV, SE, and AE (cell positions not recoverable): MD2 1ble 1atiA 1hrdA 1tul MD3 MD4 2fx2 3hsc 1tmy 1fivA 1scuB 1btmB 1slcA 1pczAB* 1lbu 1bmfD 1am3 3D DV SE AE 2admA ✓ ✓ F F w w w w ✓ F R F ✓ ✓ N+ R N+ B ? N w 1vhh 1hulA NONE* ✓ ✓ F F F F F 1slcA ✓ ✓ ✓ ✓ F F C F w w R R w ✓ ✓ ✓ C F F R w w w + ✓ ✓ ✓ ✓ ✓ F F F F F R w ? R R F w ● E D 2snv 1hurB 1tf4A* 1noa1 MD5 1aoa 1div 1sfe 1a0i NONE 6fabH* N N w w

†MD1–MD5 are the model structures we submitted. Column 3D shows whether the 3D coordinates of the target are available for self-evaluation. The division of each target is shown in the DV column (C, comparative modeling; F, fold recognition). Column SE gives the results of the self-evaluation (R, right; N, correct as NONE; +, bonus; ?, not sure; w, wrong). Self-evaluated models are marked with an asterisk. Column AE shows the evaluation by the assessor.23 Letter codes give the relative accuracy of the model, from A (excellent) to F (OK); N, NONE is right; +, bonus; ?, unsure; w, wrong; "●", near the right model. A blank means either that the 3D model was not available or that the protein was not regarded as a fold recognition target.
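To make the threading step concrete, the following is a minimal sketch of threading a sequence onto a structure by dynamic programming. The score for placing a residue at a structural site is an equally weighted sum of energy terms, as described for S3. This is not the COMPASS or S3 code. The two toy terms (`hydration`, `helix_pref`), the two-character site environments (burial, then secondary structure), and the linear gap penalty are all hypothetical stand-ins for the knowledge-based functions described above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A term maps (site environment, residue) -> energy; lower means more compatible.
Term = Callable[[str, str], float]

HYDROPHOBIC = set("AVILMFWC")
HELIX_FORMERS = set("AELMQKR")


def hydration(env: str, aa: str) -> float:
    """Toy term (hypothetical): hydrophobic residues favour buried sites (env[0] == 'b')."""
    buried = env[0] == "b"
    return -1.0 if (aa in HYDROPHOBIC) == buried else 1.0


def helix_pref(env: str, aa: str) -> float:
    """Toy term (hypothetical): helix formers are rewarded at helical sites (env[1] == 'H')."""
    return -0.5 if env[1] == "H" and aa in HELIX_FORMERS else 0.0


def site_score(env: str, aa: str, terms: Sequence[Term],
               weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of the terms; every weight is 1.0 unless given (equal weighting)."""
    if weights is None:
        weights = [1.0] * len(terms)
    return sum(w * t(env, aa) for w, t in zip(weights, terms))


def thread(seq: str, envs: Sequence[str], terms: Sequence[Term],
           gap: float = 2.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment minimising total energy; returns (energy, (residue, site) pairs)."""
    n, m = len(seq), len(envs)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + site_score(envs[j - 1], seq[i - 1], terms),
                D[i - 1][j] + gap,  # residue not placed on any site
                D[i][j - 1] + gap,  # structural site left empty
            )
    # Trace back from the corner to recover the aligned (residue, site) pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if D[i][j] == D[i - 1][j - 1] + site_score(envs[j - 1], seq[i - 1], terms):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif D[i][j] == D[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return D[n][m], pairs
```

For example, `thread("LKL", ["bH", "eH", "bE"], [hydration, helix_pref])` places each residue on the site at the same position, with a total energy of −4.0. Passing a different `weights` list to `site_score` would correspond to the per-term weight adjustment that S3 does not use.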
In S3, the sequence-structure alignment algorithm was the same as that of COMPASS. LIBRA uses the same terms as COMPASS but employs a different normalization scheme for the scoring function20 and can also accept multiply aligned sequences. When necessary, an inverse folding search (the structure-recognizes-sequence protocol) was carried out against the sequence database using LIBRA. For targets whose secondary structure was already known or could be roughly deduced, a new structure comparison method, MATRAS (Kawabata, manuscript in preparation), was employed. Using the secondary structure of the target protein and taking the environmental score (explained below) into account, we searched the secondary structure library of the PDB for similar structures.

The final decision-making was neither easy nor straightforward. After the individual analyses of each target, we held a meeting. We agreed immediately when the threaders produced similar results, or when other information, e.g., sequence motifs, made fold recognition easy. For most targets, however, we could not immediately decide on the most appropriate models. In such cases, we customized our programs to take account of target-specific features such as sequence motifs, disulfide bonds, or hypothetical domain structures. When we could not find consistency among our results, we concluded that the fold was new (submitted as NONE).

For self-evaluation of the models, we used MATRAS. The program MATRAS (MArkovian TRAnsition of protein Structure) was designed to compare protein tertiary structures using a log-odds structure similarity matrix derived from homologous PDB entries according to the formalism of Dayhoff et al.21 However, the formalism was applied to changes in structural features instead of amino acid substitutions. Three different scores were compiled: secondary structure element (SSE), environment (local structure and solvent accessibility), and Cα-pair distance.
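The log-odds idea behind a Dayhoff-style similarity matrix can be sketched as follows. The sketch assumes we already have pairs of structural states observed at equivalent positions in homologous structures. The state labels (e.g. "H", "E") are hypothetical. The actual MATRAS feature classes and its three scores (SSE, environment, Cα-pair distance) are not reproduced. The evolutionary-distance normalization of Dayhoff's PAM procedure is also omitted; only the basic form S(a, b) = log2(p_ab / (p_a p_b)) is shown.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def log_odds_matrix(aligned_pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """S(a, b) = log2(p_ab / (p_a * p_b)) from symmetric counts of aligned state pairs."""
    pair_counts: Counter = Counter()
    for a, b in aligned_pairs:
        # Count both orders so that the resulting matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    # Marginal count of each state: row sums of the symmetric pair-count table.
    single: Counter = Counter()
    for (a, _), c in pair_counts.items():
        single[a] += c
    return {
        (a, b): math.log2((c / total) / ((single[a] / total) * (single[b] / total)))
        for (a, b), c in pair_counts.items()
    }
```

Positive entries mark state changes seen more often between homologs than expected by chance. Negative entries mark changes seen less often than expected.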
When MATRAS was used for structural alignment, SSEs were first aligned roughly by the SSE score, and the alignment was then refined using the environmental and Cα-pair distance scores. The threshold for significant similarity was determined from the score distribution of structures with the same fold in the SCOP database.22

Fig. 1. TOPS diagrams25 of the structures of T0053 (a) and 1ak1 (b). Circles and triangles denote α-helices and β-strands, respectively. Matched segments are colored red (α-helices) or yellow (β-strands); insertions are not colored. Incorrectly aligned segments are marked "miss." The conserved motif in 1ak1 (marked "MOTIF") is disrupted in the target T0053. If the N-terminal 60 residues are omitted, our alignment is very accurate: the RMSD is 5.8 Å, and the ASp4 measure38 is 91.2%. ASp4 is the ratio of the number of aligned residue pairs on which the submitted alignment and the MATRAS structural alignment agree, to within a shift error of four residues, to the total number of residue pairs in the alignment.

Fig. 2. Ribbon diagrams of T0083 (a) and 1r69 (b) drawn with MolScript.39 The aligned regions are colored red. They can be superimposed with an RMSD of 3.5 Å; ASp4 is 95.2%.

Fig. 3. Ribbon diagrams of T0052 (a) and 1pczAB (b) drawn with MolScript.39 (c) Three strands in the two structures can be superimposed with an RMSD of 5.6 Å; T0052 in blue, 1pczAB in green.

RESULTS
We submitted 56 models for 25 targets. Among them, 18 structures (20 domains) were experimentally determined and regarded as fold recognition targets (we withdrew from T0059 because it is too short to be suitable for our threaders), and eight predictions were recognized as correct.23 The results, together with the evaluations made by the assessor and by ourselves using MATRAS, are summarized in Table I. Except for a few cases, our self-evaluation agreed with that of the assessor.
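The ASp4 measure quoted in the figure captions can be computed along the following lines. This is a sketch under stated assumptions. Alignments are given as hypothetical dictionaries that map a target residue index to a template residue index. The denominator is taken to be the number of pairs in the submitted alignment, since the caption's "the alignment" is ambiguous on this point.

```python
from typing import Dict


def asp_shift(submitted: Dict[int, int], reference: Dict[int, int], max_shift: int = 4) -> float:
    """Percentage of submitted target->template pairs lying within max_shift of the reference.

    max_shift=4 gives an ASp4-style score; the reference would be the structural
    (e.g. MATRAS) alignment.
    """
    if not submitted:
        return 0.0
    agree = sum(
        1
        for i, j in submitted.items()
        # A pair agrees if the reference aligns the same target residue to a
        # template residue no more than max_shift positions away.
        if i in reference and abs(reference[i] - j) <= max_shift
    )
    return 100.0 * agree / len(submitted)
```

For instance, if one of four submitted pairs is displaced by eight residues relative to the structural alignment and the others by at most one, the score is 75%.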
Some of our submitted models are explained in detail in the following sections.

Target T0053 (CbiK protein)
The T0053 sequence was threaded through the structural library. One threader, COMPASS, ranked the 1ak1 (ferrochelatase) structure first, with a significant compatibility score of about −3.3 (a compatibility score below −3.0 is usually significant). With the other threaders (S3 and LIBRA), this structure appeared within the top five. The secondary structure prediction (an irregular α/β fold) was inconsistent with this assignment (1ak1 has two regular α/β domains), and the conserved sequence motif of the 1ak1 family (PROSITE ID: PS00534) is disrupted in T0053. Nevertheless, the functional similarity between the target and 1ak1 (the former is involved in vitamin B12 synthesis, the latter in heme synthesis) strongly supported our result.24 Therefore, we submitted only the 1ak1 structure. As a result, we predicted the correct fold, and our alignment was recognized as the second-best model for T0053 (Table I). The topology diagrams drawn by TOPS25 for the target structure and our model are shown in Figure 1. We missed the correct alignment for the N-terminal 60 residues because COMPASS was not able to skip the two inserted α-helices after the first β-strand of 1ak1. The root mean square deviation (RMSD) measured over the whole structure is 12.8 Å, but it decreases to 5.8 Å if the first 60 residues of the model are removed.

Target T0083 (Cyanase)
The T0083 sequence, its related sequences, and their multiple alignment were threaded through the structural library; however, the results were not significant. Although the secondary structure prediction strongly suggested a mainly α fold, we could not find such folds among the structures ranked highly in the compatibility searches.
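The significance criterion used for T0053 (a compatibility score below −3.0) behaves like a threshold on a standardized score. The sketch below assumes that the score is a z-score of a raw threading energy relative to the distribution of energies over the whole structural library. That normalization is an assumption for illustration, not necessarily how COMPASS computes its score, and the structure identifiers in the usage example are made up.

```python
import statistics
from typing import Dict, List, Tuple


def rank_by_zscore(raw: Dict[str, float], threshold: float = -3.0) -> List[Tuple[str, float, bool]]:
    """Return (structure id, z-score, significant?) sorted from most to least compatible.

    Raw energies are assumed lower-is-better, so a strongly negative z-score marks
    a structure that fits the sequence much better than the library average.
    """
    values = list(raw.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all library scores are identical; z-scores are undefined")
    ranked = sorted(((sid, (e - mu) / sd) for sid, e in raw.items()), key=lambda t: t[1])
    return [(sid, z, z < threshold) for sid, z in ranked]
```

With twenty library entries at energy 0.0 and one at −10.0, the outlier comes first with a z-score of about −4.5 and is the only one flagged as significant. A structure can also rank first without passing the threshold, as happened for several targets in this study.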
When the structures were sorted by the compatibility score normalized by alignment length, using the threader LIBRA, we found that the Trp repressor (1trrA) and Cro repressor (1r69) structures had very good compatibility scores, −3.3 and −2.8, respectively. Both are all-α DNA-binding units, consistent with the secondary structure prediction. However, the Trp repressor and the Cro repressor do not share the same fold.22 To investigate which fold was more compatible, inverse-folding searches with LIBRA were performed against a sequence database constructed from PDB release 83 plus the target sequences. The 1r69 structure26 showed good compatibility with the target, whereas the 1trrA structure did not. The compatibility scores of the target and its related proteins with 1r69 ranged from −4.8 to −2.6. The 1r69 structure was aligned with the N-terminal half of the target. The multiple alignment of the target and its homologs contains a Pro-rich region in the middle (residues 78–92); we consider that it may form a coil and therefore act as a hinge between the two domains. According to our submitted alignment, the N-terminal half of the target and our model (Fig. 2) can be superimposed with an RMSD of 3.5 Å.

TABLE II. Summary of the Ability of Each Method†

Target                  Type    Model
a) For our submission
T0046                   β       IG
T0053                   α/β     1ak1
T0068                   β       1rmg
T0071, first domain     β       IG
T0074                   α       Calmodulin
T0079                   α       1pdnC
T0081                   α/β     1rnl
T0083                   α       1r69
T0085                   α       1fgjA
b) For the additional remarks
T0071, second domain    β       TATA-binding
T0080                   α+β     1fmtA
T0081                   α/β     1jdbK

Method columns COMPASS, S3, LIBRA, SeqSearch, and Motif (cell positions not recoverable): COMPASS Good Good Middle S3 LIBRA SeqSearch Motif Middle Good Middle Middle Good Middle Good Middle Middle Good (Middle) Good Exist Exist Middle Exist Good Middle Middle Middle Middle

†Good, detected at a significant level; Middle, ranked the answer first (second) but not significantly; IG, immunoglobulin fold.
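In the inverse-folding (structure-recognizes-sequence) protocol, one structure is fixed and many sequences are threaded onto it. The question is where the target sequence ranks among them. The following is a minimal sketch. It assumes that energies are normalized simply by dividing by the aligned length; LIBRA's actual normalization scheme (ref. 20) differs. The sequence identifiers are hypothetical.

```python
from typing import Dict, Tuple


def inverse_folding_rank(hits: Dict[str, Tuple[float, int]], target: str) -> int:
    """1-based rank of `target` among sequences threaded onto one fixed structure.

    hits maps sequence id -> (raw energy, aligned length). A lower energy per
    aligned residue means the sequence is more compatible with the structure.
    Raises ValueError if the target is absent or has zero aligned length.
    """
    per_residue = {sid: e / n for sid, (e, n) in hits.items() if n > 0}
    order = sorted(per_residue, key=per_residue.get)
    return order.index(target) + 1
```

For example, with `{"seqA": (-50.0, 100), "target": (-40.0, 50), "seqB": (-90.0, 100)}` the target ranks second (−0.8 per residue, behind seqB at −0.9). This is the kind of result used as supporting evidence for T0085. Without length normalization, the target would rank last on raw energy, which shows why normalization matters when aligned lengths differ.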
Target T0085 (Cytochrome C554)
The T0085 sequence was threaded through the structural library using our threaders; however, the results were not significant. This was as we expected, because our threaders have been observed not to treat cytochromes that bind multiple hemes properly.18,27 We considered the four heme-binding motifs (CXXCH, the cytochrome c family heme-binding site signature) to be crucial. A sequence similarity search against the database of proteins of known structure found the cytochrome c3 group (2cdv, etc.), the cytochrome c553 group (1dvh, etc.), and hydroxylamine oxidoreductase (1fgjA), all of which have the heme-binding motifs. Among them, 1fgjA28 was the one whose spacing between motifs was suitable; moreover, the target is the hydroxylamine oxidoreductase-linked cytochrome. Therefore, we considered this structure a plausible candidate. Each heme in the 1fgjA structure is bound to two histidines: one in the CXXCH motif and the other at a different site. Next, we analyzed the alignment. In a suboptimal alignment by PSI-BLAST, two of the other heme-binding histidines of 1fgjA were aligned with two tyrosines of the target, and tyrosine seemed suitable as a heme-binding residue. Finally, using LIBRA, we performed an inverse-folding search with the part of the 1fgjA structure aligned to T0085 against the sequence database including the target sequence. The target sequence ranked second, which supported the compatibility between T0085 and 1fgjA. The solved target structure suggests an evolutionary relationship with 1fgjA.29 However, our alignment was wrong: all the heme-binding residues were in fact histidines. One reason for the incorrect alignment is that 1fgjA has eight hemes whereas the target has four; this difference made finding the correct alignment difficult.
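The CXXCH heme-binding signature used for T0085 is easy to scan for with a regular expression. The spacing between consecutive motifs is what made 1fgjA look like a suitable template. This sketch treats each X as any residue, which is a simplification of the signature. The test sequence is made up.

```python
import re
from typing import List

# Lookahead so that overlapping motifs are also reported.
HEME_MOTIF = re.compile(r"(?=(C..CH))")


def heme_sites(seq: str) -> List[int]:
    """0-based start positions of CXXCH motifs in a protein sequence."""
    return [m.start() for m in HEME_MOTIF.finditer(seq)]


def motif_spacings(seq: str) -> List[int]:
    """Distances between the starts of consecutive CXXCH motifs."""
    starts = heme_sites(seq)
    return [b - a for a, b in zip(starts, starts[1:])]
```

Comparing the `motif_spacings` of the target with those of each candidate template gives a quick filter of the kind described above. It does not by itself say which histidines act as the second heme ligands.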
Target T0052 (Cyanovirin-N)
The target sequence exhibits an internal duplication: the N-terminal half (residues 1–50) and the C-terminal half (residues 51–101) are homologous to each other.30 We therefore assumed that the structure of the target would be symmetric. Two disulfide bonds were known to exist.30 We performed threading with the full target sequence, with each of its two halves, and with "synthesized" sequences in which the N-terminal half or the C-terminal half was repeated twice. We found no significantly compatible structure that readily met the requirements of symmetry and disulfide bonding. Therefore, we submitted NONE as our first model.

Although our first model was NONE, on examining the threading results we thought that the TATA-box–binding protein might possibly meet our hypotheses. 1pcz, a TATA-box–binding protein, was one of the relatively highly compatible structures. However, its monomeric structure is nonglobular and probably unstable (the target was supposed to be monomeric30). Therefore, we synthesized a chimeric structure from the interacting domains of its dimeric structure (1pczAB, Fig. 3b). The actual structure, recognized as a new fold, contains two symmetrically arranged β-sandwich domains, as shown in Figure 3a.31 One domain consists of two β-sheets, one with three β-strands from the N- (or C-) terminal half of the sequence and the other with two β-strands from the other half. The former β-sheet and the corresponding part of our model can be superimposed with an RMSD of 5.6 Å (Fig. 3c).
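The query variants used for the internally duplicated T0052 sequence can be generated as shown below. The split point is a parameter: 50 for T0052, whose halves are residues 1–50 and 51–101, which corresponds to the 0-based slice boundary 50. The dictionary keys are hypothetical labels, and each variant would then be threaded separately.

```python
from typing import Dict


def symmetric_variants(seq: str, split: int) -> Dict[str, str]:
    """Query sequences for probing an internally duplicated protein.

    `split` is the 0-based index at which the C-terminal half begins.
    """
    n_half, c_half = seq[:split], seq[split:]
    return {
        "full": seq,
        "N_half": n_half,
        "C_half": c_half,
        "N_repeat": n_half + n_half,  # N-terminal half repeated twice
        "C_repeat": c_half + c_half,  # C-terminal half repeated twice
    }
```

The repeated variants mimic a perfectly symmetric protein. A template that scores well for both `N_repeat` and `C_repeat` is therefore a better candidate for a symmetric fold than one that scores well for only one of them.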
Although the interchange of segments (two β-strands, in this case) between domains is difficult to predict, our partially correct prediction encourages the development of new prediction methods based on the fragment combinatorial approach.32

Target T0044 (RNA 3′-Terminal Phosphate Cyclase)
For this target, we heeded the crystallographer's remark that the secondary structure prediction by PHD was "quite accurate." We therefore assumed, following PHD, that the structure of T0044 was of the irregular α/β type. We threaded the target sequence, its homologs, and their multiple alignment. We could not obtain significant threading results, yet we could not submit NONE because the crystallographer had also mentioned that the answer already existed in the PDB. Next, we performed secondary-structure threading with MATRAS, using the output of the PHD prediction against the secondary structure library of the PDB. We also took functional similarity into account: the answer structure might be required to have ATP-binding and RNA-binding abilities. Finally, we chose as many as five structures that ranked relatively high in both the threading and the secondary-structure threading and that met the functional requirements (Table I). None of our submissions hit the answer structure. The target structure is composed of four domains. The structural alignment of T0044 and 1nawA (the answer in our library) requires large gaps to skip the third domain, which consists of about 90 residues (the largest domain). It was very difficult for the secondary-structure threading to allow such gaps; therefore, 1nawA could not rank highly.

DISCUSSION
We correctly predicted eight targets out of 18 (20 domains). It appears that this success could not have been achieved if we had employed only one or a few methods.33 In fact, the methods we employed compensated for each other, and we eventually reached the correct structure even when one method failed. The contributions of each method for each target are summarized in Table IIa. COMPASS looks suitable for predicting α/β-type or β-type proteins, whereas LIBRA appears effective for α-type proteins. S3 sometimes supports the correct answer by ranking it relatively high (not shown).

Selecting the submissions was not straightforward because the procedure depended on the target, as already mentioned in the Results section. Biological or functional knowledge gleaned from the literature helped the selection and, in many cases, led us to the correct answers. Surprisingly, most of the hypotheses and predicted motifs in the literature proved to be true: the structure of T0054 (VANX)34 is similar to the 1lbu or 1vhh fold,35 and the two helix-turn-helix motifs do exist in T0079 (MARA).36 Human intervention sometimes misled us; that is, some correct answers were discarded during the meetings. In the case of target T0044, 1nawA, one of the correct answers, was ranked third by LIBRA, but we submitted the results obtained by MATRAS instead. This is a typical example of how the use of multiple methods complicates the final decision-making process. For the second domain of T0071, the compatible structures predicted by S3 were TATA-box–binding proteins (1st: 1aisA; 2nd: 1ytbA), which were correct answers. However, we did not submit them because there was little evidence to support the result and we were too concerned with the first domain.

We also noticed that the PDB grows significantly, so constant updating of the structural library is important for fold recognition. Our basic structural library was compiled from PDB release 83 and did not contain the correct answers for T0080 (3MG) and T0081 (MGSA) during the prediction season. Afterward, we performed threading against a new structural library that included the answers (1fmtA for T0080, 1jdbK for T0081). All three threaders detected clear similarity between 1jdbK and the target T0081 (Table IIb). Thus, with an up-to-date library, we would very probably have submitted 1jdbK as the first model for T0081. Only LIBRA ranked 1fmtA at the top by compatibility score, although the alignment was incorrect.

The cooperative approach we took for the CASP3 prediction consumed a great deal of human effort, so it would not be applicable, for example, to the analysis of large numbers of sequences such as whole genomes.37 However, each threader has its own preferences, and if these can be managed well, the analysis may become easier and more accurate, and could be partially automated in the future. The lessons from this experiment will contribute to such large-scale analyses.

ACKNOWLEDGMENTS
We thank the organizers and the assessors for the preparation of this experiment and meeting. We are also grateful to the structure submitters for offering the experimental structures of the target proteins before publication. In addition, we thank Rosemary Chapman and Thomas D. Andrews for critical reading of the manuscript. A.R.K. is a predoctoral research fellow of the Japan Society for the Promotion of Science.

REFERENCES
1. Lamer M-R, Rooman MJ, Wodak SJ. Protein structure prediction by threading methods: evaluation of current technologies. Proteins 1995;23:337–355.
2. Levitt M. Competitive assessment of protein fold recognition and alignment accuracy. Proteins Suppl 1997;1:92–104.
3. Marchler-Bauer A, Levitt M, Bryant SH. A retrospective analysis of CASP2 threading predictions. Proteins Suppl 1997;1:83–91.
4. Murzin A, Bateman A. Distant homology recognition using structural classification of proteins. Proteins Suppl 1997;1:105–112.
5. Pearson W, Lipman D. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 1988;85:2444–2448.
6. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol 1990;215:403–410.
7. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–3402.
8. Thompson J, Higgins D, Gibson T. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673–4680.
9. Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res 1999;27:49–54.
10. http://www.ncbi.nlm.nih.gov/PubMed/
11. Ito M, Matsuo Y, Nishikawa K. Prediction of protein secondary structure using the 3D-1D compatibility algorithm. Comput Appl Biosci 1997;13:415–424.
12. Rost B, Sander C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 1994;19:55–72.
13. Nishikawa K, Noguchi T. Predicting protein secondary structure based on amino acid sequence. Methods Enzymol 1991;202:31–44.
14. Kawabata T, Doi J. Improvement of protein secondary structure prediction using binary word encoding. Proteins 1997;27:36–46.
15. Hofmann K, Bucher P, Falquet L, Bairoch A. The PROSITE database, its status in 1999. Nucleic Acids Res 1999;27:215–219.
16. Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Cryst D 1998;54:1078–1084.
17. Matsuo Y, Nishikawa K. Protein structural similarities predicted by a sequence-structure compatibility method. Protein Sci 1994;3:2055–2063.
18. Ota M, Nishikawa K. Feasibility in the inverse protein folding protocol. Protein Sci 1999;8:1001–1009.
19. Matsuo Y, Nishikawa K. Assessment of a protein fold recognition method that takes into account four physicochemical properties: side-chain packing, solvation, hydrogen-bonding, and local conformation. Proteins 1995;23:370–375.
20. Ota M, Kanaya S, Nishikawa K. Desk-top analysis of the structural stability of various point mutations introduced into ribonuclease H. J Mol Biol 1995;248:733–738.
21. Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. In: Dayhoff MO, editor. Atlas of protein sequence and structure. Vol 5, Suppl 3. Washington, DC: National Biomedical Research Foundation; 1978. p 345–352.
22. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995;247:536–540.
23. Murzin AG. Structure classification-based assessment of CASP3 predictions for the fold recognition targets. Proteins Suppl 1999;3:88–103.
24. Raux E, Thermes C, Heathcote P, Rambach A, Warren M. A role for Salmonella typhimurium cbiK in cobalamin (vitamin B12) and siroheme biosynthesis. J Bacteriol 1997;179:3202–3212.
25. Westhead D, Hatton D, Thornton J. An atlas of protein topology cartoons available on the worldwide web. Trends Biochem Sci 1998;23:35–36.
26. Mondragon A, Subbiah S, Almo SC, Drottar M, Harrison SC. Structure of the amino-terminal domain of phage 434 repressor at 2.0 Å resolution. J Mol Biol 1989;205:189–200.
27. Matsuo Y, Nakamura H, Nishikawa K. Detection of 3D-1D compatibility characterized by the evaluation of side-chain packing and electrostatic interactions. J Biochem (Tokyo) 1995;118:137–148.
28. Igarashi N, Moriyama H, Fujiwara T, Fukumori Y, Tanaka N. The 2.8 Å structure of hydroxylamine oxidoreductase from a nitrifying chemoautotrophic bacterium, Nitrosomonas europaea. Nature Struct Biol 1997;4:276–284.
29. Iverson T, Arciero D, Hsu B, Logan M, Hooper A, Rees D. Heme packing motifs revealed by the crystal structure of the tetra-heme cytochrome c554 from Nitrosomonas europaea. Nature Struct Biol 1998;5:1005–1012.
30. Gustafson K, Sowder RI, Henderson L, Cardellina JI, McMahon J, Rajamani U, Pannell L, Boyd M. Isolation, primary sequence determination, and disulfide bond structure of cyanovirin-N, an anti-HIV (human immunodeficiency virus) protein from the cyanobacterium Nostoc ellipsosporum. Biochem Biophys Res Commun 1997;238:223–228.
31. Bewley C, Gustafson KR, Boyd MR, Covell DG, Bax A, Clore GM, Gronenborn AM. Solution structure of cyanovirin-N, a potent HIV-inactivating protein. Nature Struct Biol 1998;5:571–578.
32. Simons KT, Bonneau R, Ruczinski I, Baker D. Ab initio protein structure prediction of CASP III targets using ROSETTA. Proteins Suppl 1999;3:171–176.
33. Rice D, Fischer D, Weiss R, Eisenberg D. Fold assignments for amino acid sequences of the CASP2 experiment. Proteins Suppl 1997;1:113–122.
34. Bussiere DE, Pratt SD, Katz L, Severin JM, Holzman T, Park CH. The structure of VanX reveals a novel amino-dipeptidase involved in mediating transposon-based vancomycin resistance. Mol Cell 1998;2:75–84.
35. McCafferty D, Lessard I, Walsh C. Mutational analysis of potential zinc-binding residues in the active site of the enterococcal D-Ala-D-Ala dipeptidase VanX. Biochemistry 1997;36:10498–10505.
36. Gallegos M, Michan C, Ramos J. The XylS/AraC family of regulators. Nucleic Acids Res 1993;21:807–810.
37. Fischer D, Eisenberg D. Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium. Proc Natl Acad Sci USA 1997;94:11929–11934.
38. Marchler-Bauer A, Bryant SH. Measures of threading specificity and accuracy. Proteins Suppl 1997;1:74–82.
39. Kraulis PJ. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J Appl Crystallogr 1991;24:946–950.