YEAST VOL. 12: 385-390 (1996)

Yeast Sequencing Reports

Sequence Analysis of the 43 kb CRM1-YLM9-PET54-DIE2-SMI1-PHO81-YHB4-PFK1 Region from the Right Arm of Saccharomyces cerevisiae Chromosome VII

QUIRINA J. M. VAN DER AART†, KARL KLEINE‡ AND H. YDE STEENSMA*†§

†Institute of Molecular Plant Sciences, Leiden University, Wassenaarseweg 64, 2333 AL Leiden, The Netherlands
‡Martinsrieder Institut für Protein Sequenzen, Am Klopferspitz 18a, D-82152 Martinsried, Germany
§Delft University of Technology, Department of Microbiology and Enzymology, Julianalaan 67, 2628 BC Delft, The Netherlands
*Corresponding author.

Received 6 July 1995; accepted 23 September 1995

CCC 0749-503X/96/040385-06 © 1996 by John Wiley & Sons Ltd

The nucleotide sequence of a 43 118 bp fragment from chromosome VII of Saccharomyces cerevisiae has been determined and analysed. The fragment originates from the right arm of chromosome VII. It starts approximately 11 kb centromere-proximal to the pet54 marker and ends in the middle of the PFK1 gene. The sequence contains a small nuclear RNA gene (SNR7) and 29 open reading frames (ORFs) larger than 100 amino acids. Six of these were completely internal to, or partially overlapped, other ORFs. Six previously described genes, YLM9/MRPL9, CRM1, DIE2, SMI1, PHO81 and YHB4, were mapped to this region in addition to pet54 and PFK1. Of the remaining 17 ORFs, four showed homology with other S. cerevisiae genes and four, including one of the partially overlapping ORFs, with genes from other organisms. Eight ORFs had no homology with any sequence in the databases. The sequence has been deposited in the EMBL database under Accession Number X87941.

KEY WORDS: Saccharomyces cerevisiae; chromosome VII; sequence; snRNA; SNR7

INTRODUCTION

We have sequenced a 43 kb fragment from Saccharomyces cerevisiae as part of the European project to sequence the entire 1150 kb chromosome VII DNA molecule. The segment originated from the right arm of chromosome VII of strain S288C and formed the yeast DNA insert of cosmid pEGH484, provided by H. Tettelin (Université Catholique de Louvain). The inserted DNA extends from 11 kb centromere-proximal to the pet54 marker into the middle of the PFK1 gene. In this report we present the sequence and the computer analysis of the entire 43 118 bp fragment.

MATERIALS AND METHODS

Strains and plasmids

Cosmid pEGH484, containing a 43 kb yeast DNA insert, was received from H. Tettelin, the DNA coordinator for chromosome VII. The insert is a partial Sau3A fragment from chromosome VII of strain S288C, cloned into the unique BamHI site of pWE15 (Evans and Wahl, 1987). Phagemid pBluescript II KS+ (Stratagene) was used for sub-cloning and sequencing. Escherichia coli strain XL1-Blue (recA1 endA1 gyrA96 thi-1 hsdR17 supE44 relA1 lac [F' proAB lacIqZΔM15 Tn10 (Tetr)]; Bullock et al., 1987) was used for plasmid amplification.

DNA manipulations

Plasmid preparations were carried out using the ammonium acetate method of Lee and Suraiya (1990). DNA for automatic sequencing was purified over Nucleobond-AX columns (Macherey-Nagel, Düren) according to the manufacturer's instructions. Restriction endonucleases and T4 DNA ligase were used according to the recommendations of the suppliers (Boehringer, Pharmacia).

Sequencing strategy

DNA sequencing was carried out by combining primer walking with direct cloning approaches. First, fragments of 2 to 10 kb were sub-cloned and amplified, and the sequence of the yeast DNA in the sub-clones was determined by primer walking. Junctions between fragments were sequenced either from overlapping sub-clones or directly from cosmid DNA digested with appropriate enzymes. Sequencing reactions on double-stranded DNA templates were primed with the M13 universal and reverse primers or with synthetic oligonucleotides (Pharmacia Nederland, Roosendaal). We used the dideoxy method of Sanger et al. (1977) with FITC-labelled dATP and T7 DNA polymerase (Pharmacia) on an automatic sequencer (ALF, Pharmacia). The sequences were determined on both strands and each base was sequenced at least three times.

Sequence analysis

The Heidelberg Geneskipper program and the GCG sequence analysis software package (Devereux et al., 1984) were used for sequence alignments and analysis. Nucleotide and amino acid sequences were compared with the EMBL (release 35) and PIR International (release 44.07) data banks using either FASTA or the on-line MIPS package (Martinsried Institute).

RESULTS AND DISCUSSION

The complete sequence of the DNA insert in pEGH484 was determined in both directions. First, a restriction map was constructed for the restriction endonucleases BamHI, BglII, EcoRI, SalI, XbaI and XhoI (Figure 1), and sub-clones were made using these sites. Some of the larger sub-clones were further sub-cloned as HindIII or BglII fragments. The inserts of the sub-clones were sequenced in both directions, and each base was independently determined at least three times, with a mean of 4.06 times per base. Sequence analysis revealed 29 ORFs of more than 100 amino acids, a small nuclear RNA gene and two ARS consensus sequences. The ORFs were provisionally named G85 followed by an arbitrary number; final names will be assigned when the sequence of the entire chromosome VII DNA molecule has been obtained. The characteristics of the ORFs are listed in Table 1 and discussed below.

Figure 1. Restriction map and ORF positions. The restriction map of the entire 43 kb chromosome VII insert in cosmid pEGH484 is shown on top. For better representation the continuous map is divided into three parts, at a SalI and an XbaI site respectively. B, BamHI; Bg, BglII; R, EcoRI; S, SalI; Xa, XbaI; X, XhoI. The arrows below the restriction map show the positions of the ORFs (labelled CRM1, PET54, YLM9, DIE2, SMI1, PHO81, YHB4 and PFK1 where known). Numbers refer to the last two digits of the provisional names, e.g. 01 = G8501. 'A' represents the positions of ARS consensus sequences, 'R' that of the snRNA.

Four ORFs, G8539, G8555, G8583 and G8591, were completely internal to other ORFs, and all four run in the opposite direction with respect to the larger ORFs. ORF G8517 partially overlapped G8520 (YLM9/MRPL9, opposite orientation), and G8550 partially overlapped both G8555 (same orientation) and the SMI1 transcribed region, G8553 (opposite orientation). It is unlikely that these six ORFs represent real genes, although G8550 shows homology (starting at amino acid 77) with 24 amino acids from an archaeal lipoprotein attachment site (Mattar et al., 1994).

The sequences of eight ORFs, CRM1 (Toda et al., 1992; D13039), YLM9/MRPL9 (Graack et al., 1992; X65014/S37340), PET54 (Costano et al., 1989; X13427), DIE2 (Nikawa and Hosaka, 1995; D38049), SMI1 (Fishel et al., 1993; L15423), PHO81 (Coche et al., 1990; S41074), YHB4 (Zhu and Riggs, 1992; B45383) and PFK1 (S38963), have been reported previously, although the positions of the corresponding genes on the physical and genetic maps of chromosome VII were unknown, except for PET54 and PFK1. The published sequence of the YLM9/MRPL9 ORF (Graack et al., 1992; X65014/S37340) showed three different bases in 807 bp compared with our data, but no differences in the amino acid sequence. The sequences of DIE2 and G8547 differ in one base, whereas the amino acid sequences are identical. The published sequence of PHO81 (Coche et al., 1990; S41074) lacks one amino acid, an asparagine at position 974, compared with our sequence. No further differences were found between the ORFs and the published sequence data. Even if all of these differences (three bases in YLM9/MRPL9, one in DIE2 and the three bases of the missing asparagine codon in PHO81) were caused by sequencing errors, which is highly unlikely, the error rate for the ORFs would be seven differences in 13 341 bases, or 0.05%.

Four ORFs show homology with other S. cerevisiae genes. These include a drug-resistance gene, SGE1 (Amakasu et al., 1993; S46275; 53% identity and 75% similarity in 526 amino acids with ORF G8537). This ORF (G8537) also shows significant homology with an ORF on chromosome XI (44% identity and 67% similarity in 316 amino acids with YKR105C, S38184). Homology was further found with a cell cycle gene, CDC20, which is involved in cell division control (S48507; 48% identity and 68% similarity in 153 amino acids with ORF G8541), and with a sporulation gene, SPO12 (Malavasic and Elder, 1990; S46756; 31% identity and 59% similarity in 115 amino acids with ORF G8558). Finally, ORF G8561 is 59% identical and 79% similar in 217 amino acids to the yeast homolog of prohibitin (S50315), which determines the replicative life span.

Four ORFs have homology with sequences from other organisms. ORF G8501, which is only partially present in the sequenced fragment, shows similarity with ion channels from higher eukaryotes (Soldatov, 1992; Salkoff et al., 1987). As mentioned above, ORF G8550, which partially overlaps both G8555 and SMI1 (G8553), has some amino acid sequence similarity with an archaeal lipoprotein attachment site (Mattar et al., 1994). ORF G8564 exhibits similarity with ankyrin from human (25% identity and 56% similarity in 217 amino acids), mouse and Drosophila. Ankyrin is a transmembrane protein involved in differentiation (Milner and Campbell, 1993). Finally, ORF G8578 contains the ATP/GTP-binding site motif A (Linder et al., 1989). The remaining eight ORFs do not show significant homology with any sequence in the databases.
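The ORF criterion used in this section (open reading frames of more than 100 amino acids, on either strand) can be illustrated with a minimal six-frame scan. This is a sketch under stated assumptions, not the Geneskipper or GCG software used for the analysis: it assumes that an ORF starts at an ATG and ends at the first in-frame TAA, TAG or TGA, and the names `find_orfs` and `revcomp` are hypothetical. Because every frame on both strands is scanned independently, ORFs lying inside other ORFs in a different frame or on the opposite strand, such as G8539, G8555, G8583 and G8591, are reported as well.

```python
# Sketch of a six-frame ORF scan (not the software used in the paper).
# Assumptions: an ORF runs from an ATG to the first in-frame stop codon,
# and only ORFs longer than min_aa codons (excluding the stop) are kept.
# find_orfs and revcomp are hypothetical helper names.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100):
    """Return sorted (start, end, strand, n_aa) tuples.

    Coordinates are 1-based and inclusive on the input sequence, and the
    end includes the stop codon. Strand 'W' is the input strand, 'C' the
    complementary strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("W", seq), ("C", revcomp(seq))):
        for frame in range(3):
            start = None  # 0-based index of the first ATG in the open frame
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_aa = (i - start) // 3  # codons from ATG up to the stop
                    if n_aa > min_aa:
                        lo, hi = start + 1, i + 3  # 1-based on s
                        if strand == "C":  # convert to input-strand coordinates
                            lo, hi = n - hi + 1, n - lo + 1
                        hits.append((lo, hi, strand, n_aa))
                    start = None
    return sorted(hits)


orf = "ATG" + "GCT" * 101 + "TAA"  # toy ORF of 102 codons plus a stop
print(find_orfs(orf))  # [(1, 309, 'W', 102)]
```

Only the first ATG of an open frame is used, so a nested in-frame ATG does not produce a second, shorter ORF; real annotation pipelines may treat start codons differently.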
Table 1. Characteristics of open reading frames.

Name    No. of amino acids   Start    End      pI      Homology / particularities (reference)
G8501   >857                 1        2570     6.60    Human Ca2+ channel (Soldatov, 1992); Drosophila Na+ channel (Salkoff et al., 1987)
G8514   1084                 4299     7550     5.28    Identical to CRM1 (Toda et al., 1992)
G8517   113                  7793     8131     7.87    Overlapping G8520
G8520   269                  7835     8641     10.99   Identical to YLM9/MRPL9 (Graack et al., 1992)
G8523   622                  8885     10750    7.73    -
G8527   293                  11681    12559    10.01   Identical to PET54 (Costano et al., 1989)
G8530   448                  12630    13973    9.76    -
G8537   613                  14564    16402    9.58    SGE1 and YKR105C of S. cerevisiae (Amakasu et al., 1993; PIR: S38184)
G8539   143                  14611    15039    6.43    Internal to G8537
G8541   409                  16903    18129    8.74    CDC20 of S. cerevisiae (PIR: S48507)
G8544   199                  18162    18758    5.62    -
G8547   525                  19177    20751    10.23   Identical to DIE2 (Nikawa and Hosaka, 1995)
G8550   114                  21122    21463    6.90    Archaeal lipoprotein attachment site (Mattar et al., 1994); overlap with G8553 and G8555
G8553   505                  21142    22656    4.42    Identical to SMI1 (Fishel et al., 1993)
G8555   115                  21277    21621    9.19    Internal to G8553
G8558   137                  23651    24061    11.22   SPO12 of S. cerevisiae (Malavasic and Elder, 1990)
G8561   315                  24294    25238    10.62   Prohibitin of S. cerevisiae (PIR: S50315)
G8564   228                  25718    26401    6.42    Mammalian/Drosophila ankyrin (Milner and Campbell, 1993)
G8567   1178                 26435    29968    5.67    Identical to PHO81 (Coche et al., 1990)
G8572   399                  31665    32859    6.25    Identical to YHB4 (Zhu and Riggs, 1992)
G8575   233                  33122    33820    10.06   -
G8578   129                  34191    34577    6.67    ATP/GTP binding site, motif A (Linder et al., 1989)
G8581   785                  35062    37416    9.85    -
G8583   171                  35432    35944    8.45    Internal to G8581
G8585   882                  37803    40448    8.18    -
G8591   109                  39849    40175    11.36   Internal to G8585
G8593   288                  40951    41814    6.67    -
G8596   107                  41995    42315    9.70    -
G8599   >194                 42536    43118    11.36   Identical to PFK1 (PIR: S38963)
ARS     -                    1145     1155     -       TTTTATGTTTT
ARS     -                    24198    24208    -       ATTTATGTTTT
Tau     -                    2711     2781     -       -
Delta   -                    2929     3298     -       -
snRNA   -                    11217    11430    -       Identical to snRNA snR7 (Patterson and Guthrie, 1987)
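Summary figures such as the fraction of potentially coding DNA can be derived from ORF coordinates like those in Table 1. Internal and overlapping ORFs must not be counted twice, so the intervals are merged before their lengths are summed. The sketch below assumes 1-based, inclusive coordinates; `merge_intervals` and `coding_fraction` are hypothetical names, and which ORFs enter the calculation is left to the caller.

```python
# Sketch: fraction of a sequence covered by (possibly overlapping) ORFs.
# Assumption: coordinates are 1-based and inclusive;
# merge_intervals and coding_fraction are hypothetical helper names.


def merge_intervals(intervals):
    """Merge (start, end) intervals that overlap or are directly adjacent."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def coding_fraction(intervals, total_length):
    """Share of total_length covered by the union of the intervals."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / total_length


# The internal ORF G8583 lies inside G8581, so it adds no coding length.
print(merge_intervals([(35062, 37416), (35432, 35944)]))  # [(35062, 37416)]
```

The same idea gives a rough gene density: assuming the count is the 23 ORFs left after removing the six internal or overlapping ones, 43 118 bp / 23 is about 1875 bp per ORF, i.e. roughly one gene per 1.9 kb.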
None of the ORFs in this fragment of the yeast genome contains sequences indicative of introns, nor an 'RPG' box or Hap2/Hap3/Hap4 binding site in its 5' upstream region, and only one ORF (G8596) has a GCN4 box in its promoter region (Fondrat and Kalogeropoulos, 1994). Two ARS consensus elements were found (positions 1145-1155 and 24198-24208). It is unlikely that these are functional yeast replication origins, since the first lies in the middle of an ORF and potential B-elements are absent from both (Palzkill et al., 1986). In addition, a solo tau element (positions 2711-2781) and a solo delta element (positions 2929-3298) are present. The small nuclear RNA gene at positions 11217-11430 is identical to the previously described snR7 (U5-like snRNA; Patterson and Guthrie, 1987).

Between several ORFs we found long stretches of about 18 A and/or T residues. According to Struhl (1985), approximately 25% of all yeast genes contain poly(dA-dT) tracts of comparable size in their upstream regions. Such tracts can act as upstream promoter elements for constitutively transcribed genes and also as barriers between transcription units. In this fragment, long A/T stretches are found upstream of G8541, G8550, SMI1 (G8553), G8558, YHB4 (G8572) and G8581, i.e. upstream of six of the 29 ORFs, or six of 23 when the internal and overlapping ORFs are not counted. In addition, dA-dT stretches of 8 bp are found in 80% of yeast gene sequences, usually several times per sequence and mostly in non-coding regions (Struhl, 1985). Such short dA-dT stretches are also present in the fragment we have sequenced.
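The two sequence scans described above, for ARS consensus elements and for long A/T stretches, can be sketched with regular expressions. The ARS pattern assumes the widely used 11 bp core consensus WTTTAYRTTTW (W = A or T, Y = C or T, R = A or G); the paper does not state which consensus was used, although both ARS sequences listed in Table 1 (TTTTATGTTTT and ATTTATGTTTT) match it. The 18 bp threshold for A/T runs follows the "about 18" residues mentioned in the text and is an assumption, as are the function names `find_ars_cores` and `find_at_runs`.

```python
import re

# Sketch of two motif scans (not the authors' method).
# Assumption: ARS core consensus WTTTAYRTTTW (W = A/T, Y = C/T, R = A/G).
# The lookahead makes finditer report overlapping matches.
ARS_CORE = re.compile(r"(?=([AT]TTTA[CT][AG]TTT[AT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_ars_cores(seq):
    """1-based inclusive (start, end, strand) of 11 bp ARS core matches."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start() + 1, m.start() + 11, "W") for m in ARS_CORE.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]
    for m in ARS_CORE.finditer(rc):
        p = m.start()  # 0-based start on the reverse complement
        # rc positions p..p+10 correspond to 1-based n-p-10..n-p on seq
        hits.append((n - p - 10, n - p, "C"))
    return sorted(hits)


def find_at_runs(seq, min_len=18):
    """1-based inclusive (start, end) of uninterrupted A/T runs >= min_len."""
    pattern = re.compile("[AT]{%d,}" % min_len)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


print(find_ars_cores("GG" + "TTTTATGTTTT" + "GG"))  # [(3, 13, 'W')]
print(find_at_runs("GC" + "A" * 10 + "T" * 10 + "GC"))  # [(3, 22)]
```

A consensus match alone says nothing about origin function; as noted above, the absence of B-elements argues against both ARS elements being active replication origins.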
The frequency of optimal codons (Fop; Sharpe and Cowe, 1991) was calculated for the various ORFs. As may be seen from Table 1, most ORFs have a low or intermediate Fop and would therefore be moderately or poorly expressed. Exceptions are the PFK1 (Fop 0.71) and YHB4 (Fop 0.59) genes, which might be highly expressed. Of the 23 complete, non-internal ORFs, 15 of the putative proteins have a predicted pI above 8.0 and only four a pI below 6.0. Hence, this part of the yeast genome would encode many more basic than acidic proteins, which is not in accordance with the rest of the genome.

In summary, the 43 118 bp fragment contains 27 complete ORFs, part of another ORF and part of the PFK1 gene. The mean length of the 27 ORFs is 1156 bp, or 1393 bp for the 21 ORFs that remain when the six internal or partly overlapping ORFs are excluded. The gene density in this fragment is approximately one gene per 1.9 kb, and 75% of the DNA is potentially coding. These figures correspond well to previous data for the yeast genome (Oliver et al., 1992; Dujon et al., 1994; Feldmann et al., 1994; Johnston et al., 1994; Bussey et al., 1995). The insert in pEGH484 thus represents a typical piece of yeast DNA.

ACKNOWLEDGEMENTS

We thank H. Tettelin, coordinator of chromosome VII sequencing, for providing the recombinant plasmid. We are grateful to Linda van der Zanden for technical assistance. This work was supported by the Commission of the European Communities under the BIOTECH program of the Division of Biotechnology.

REFERENCES

Amakasu, H., Suzuki, Y., Nishizawa, M. and Fukasawa, T. (1993). Isolation and characterization of SGE1: a yeast gene that partially suppresses the gal11 mutation in multiple copies. Genetics 134, 675-683.
Bullock, W. O., Fernandez, J. M. and Short, J. M. (1987). XL1-Blue: a high efficiency plasmid transforming recA Escherichia coli strain with β-galactosidase selection. Biotechniques 5, 376-379.
Bussey, H., Kaback, D. B., Zhong, W. W., et al. (1995). The nucleotide sequence of chromosome I from Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 92, 3809-3813.
Coche, T., Prozzi, D., Legrain, M., Hilger, F. and Vandenhaute, J. (1990). Nucleotide sequence of the PHO81 gene involved in the regulation of the repressible acid phosphatase gene in Saccharomyces cerevisiae. Nucl. Acids Res. 18, 2176.
Costano, M. C., Seaver, E. C. and Fox, T. D. (1989). The PET54 gene of Saccharomyces cerevisiae: characterization of a nuclear gene encoding a mitochondrial translational activator and subcellular localization of its product. Genetics 122, 297-305.
Devereux, J., Haeberli, P. and Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucl. Acids Res. 12, 387-395.
Dujon, B., et al. (1994). The complete DNA sequence of yeast chromosome XI. Nature 369, 371-378.
Evans, G. A. and Wahl, G. M. (1987). Cosmid vectors for genomic walking and rapid restriction mapping. Methods Enzymol. 152, 604-610.
Feldmann, H., et al. (1994). Complete DNA sequence of yeast chromosome II. EMBO J. 13, 5795-5809.
Fishel, B. R., Sperry, A. O. and Garrard, W. T. (1993). Yeast calmodulin and a conserved nuclear protein participate in the in vitro binding of a matrix association region. Proc. Natl. Acad. Sci. USA 90, 5623-5627.
Fondrat, C. and Kalogeropoulos, A. (1994). Approaching the function of new genes by detection of their upstream activation sequences in Saccharomyces cerevisiae: application to chromosome III. Curr. Genet. 25, 396-406.
Graack, H.-R., Grohmann, L., Kitakawa, M., Schäfer, K.-L. and Kruft, V. (1992). YmL9, a nucleus-encoded mitochondrial ribosomal protein of yeast, is homologous to L3 ribosomal proteins from all natural kingdoms and photosynthetic organelles. Eur. J. Biochem. 206, 373-380.
Johnston, M., et al. (1994). Complete nucleotide sequence of Saccharomyces cerevisiae chromosome VIII. Science 265, 2077-2082.
Lee, S. and Suraiya, R. (1990). A simple procedure for maximum yield of high-quality plasmid DNA. Biotechniques 9, 676-679.
Linder, P., Lasko, P. F., Ashburner, M., et al. (1989). Birth of the D-E-A-D box. Nature 337, 121-122.
Malavasic, M. J. and Elder, R. J. (1990). Complementary transcripts from two genes necessary for normal meiosis in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 10, 2809-2819.
Mattar, S., Scharf, B., Kent, S. B. H., Rodewald, K., Oesterhelt, D. and Engelhard, M. (1994). The primary structure of halocyanin, an archaeal blue copper protein, predicts a lipid anchor for membrane fixation. J. Biol. Chem. 269, 14939-14945.
Milner, C. M. and Campbell, R. D. (1993). The G9a gene in the human major histocompatibility complex encodes a novel protein containing ankyrin-like repeats. Biochem. J. 290, 811-818.
Nikawa, J.-I. and Hosaka, K. (1995). Isolation and characterization of genes that promote the expression of inositol transporter gene ITR1 in Saccharomyces cerevisiae. Mol. Microbiol. 16, 301-308.
Oliver, S. G., et al. (1992). The complete DNA sequence of yeast chromosome III. Nature 357, 38-46.
Palzkill, T. G., Oliver, S. G. and Newlon, C. S. (1986). DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence. Nucl. Acids Res. 14, 6247-6264.
Patterson, B. and Guthrie, C. (1987). An essential yeast snRNA with a U5-like domain is required for splicing in vivo. Cell 49, 613-624.
Salkoff, L., Butler, A., Scavarda, N. and Wei, A. (1987). Nucleotide sequence of the putative sodium channel gene from Drosophila: the four homologous domains. Nucl. Acids Res. 15, 8569-8572.
Sanger, F., Nicklen, S. and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
Sharpe, P. M. and Cowe, E. (1991). Synonymous codon usage in Saccharomyces cerevisiae. Yeast 7, 657-678.
Soldatov, N. M. (1992). Molecular diversity of L-type Ca2+ channel transcripts in fibroblasts. Proc. Natl. Acad. Sci. USA 89, 4628-4632.
Struhl, K. (1985). Naturally occurring poly(dA-dT) sequences are upstream promoter elements for constitutive transcription in yeast. Proc. Natl. Acad. Sci. USA 82, 8419-8423.
Toda, T., Shimanuki, M., Saka, Y., et al. (1992). Fission yeast pap1-dependent transcription is negatively regulated by an essential nuclear protein, crm1. Mol. Cell. Biol. 12, 5474-5484.
Zhu, H. and Riggs, A. F. (1992). Yeast flavohemoglobin is an ancient protein related to globins and a reductase family. Proc. Natl. Acad. Sci. USA 89, 5015-5019.