YEAST VOL. 12: 505-514 (1996)

Yeast Sequencing Reports

The Sequence of a 24 152 bp Segment from the Left Arm of Chromosome XIV from Saccharomyces cerevisiae Between the BNI1 and the POL2 Genes

MARK SEN-GUPTA, RUTH LYCK†, URSULA FLEIG, RAINER K. NIEDENTHAL AND JOHANNES H. HEGEMANN*

Institut für Mikrobiologie und Molekularbiologie, Justus-Liebig-Universität Giessen, Frankfurter Str. 107, 35392 Giessen, Germany
†Institut für Zellbiologie, J. W. Goethe-Universität Frankfurt, Marie-Curie-Str. 9, 60439 Frankfurt/M., Germany

Received 24 October 1995; accepted 11 November 1995

In the framework of the European Union programme for sequencing the genome of Saccharomyces cerevisiae we have determined the nucleotide sequence of a region of 24 152 bp located on the left arm of chromosome XIV between the BNI1 and the POL2 genes. The sequence was obtained by directed sequence analysis using a mixture of ExoIII and primer walking strategies. Subsequent analysis revealed 13 open reading frames (ORFs) including four small ORFs completely internal to, or partly overlapping with, other ORFs. Five of these ORFs have been described previously (BNI1, APL1, LYP1, PIK1, POL2) and thus 74.8% of the 24 152 bp were already present in the databases prior to this sequencing effort. Interestingly, all 13 identified ORFs are characterized by a low codon adaptation index (0.04-0.22). In addition, this region of chromosome XIV shows an unusually high gene density with about 88% of coding DNA. This amounts to one gene per 2177 bp, which is significantly above the average gene length (about 1500 bp). For eight ORFs considerable homologies to 'Expressed Sequence Tags' derived from human cDNAs located in the XREF database could be identified. The complete nucleotide sequence of the 24 152 bp segment has been deposited in the EMBL data library under the Accession Number X92494.
KEY WORDS: Saccharomyces cerevisiae; chromosome XIV; ORF analysis; amino acid permeases; polymerase II; phosphatidylinositol-4-kinase

INTRODUCTION

Chromosome XIV from Saccharomyces cerevisiae is one of the chromosomes currently being sequenced as part of the European Union BIOTECH I and II projects with the aim of obtaining the entire sequence of the genome of this eukaryotic model organism. As part of this framework, we report here the sequence and analysis of a 24 152 bp segment from the left arm of chromosome XIV. This segment starts about 132 kb away from the left telomere with the previously identified BNI1 gene and ends about 470 kb away from the centromere at the POL2 gene. The telomere-proximal end of the DNA contig described here partly overlaps with part of the insert from cosmid 14-3b sequenced by the Glansdorff-Pierard laboratory.

*Corresponding author.
CCC 0749-503X/96/050505-10 © 1996 by John Wiley & Sons Ltd

MATERIALS AND METHODS

Plasmids and DNA manipulations

The DNA segment reported here is located on two partially overlapping cosmids, 14-3b pou and 14-4, which were obtained from Peter Philippsen, who is coordinator of the chromosome XIV sequencing group within the BIOTECH project. Cosmid 14-4, which is about 44.75 kb in size and carries a 36.9 kb insert, was obtained as follows: genomic DNA of yeast strain FY1679 was partially cleaved with Sau3A and the fragments were cloned into the BamHI-cleaved 8.2 kb cosmid vector pWE15 (Evans and Wahl, 1987) to obtain a yeast genomic library (Thierry et al., 1995). Chromosome XIV-specific cosmids were identified and their position along the chromosome was determined (Hamberg, 1993). Cosmid 14-3b pou is about 45.2 kb in size, carries a 37.1 kb insert and was isolated from a different yeast genomic DNA cosmid library. This library was created by cloning a partial Sau3A digest of genomic DNA from yeast strain FY1679 into the BamHI site of the pOU61cos vector (Knott et al., 1988; Thierry et al., 1995).
We verified the cosmid inserts by Southern blots on EcoRI-cleaved cosmid DNA and cleaved yeast genomic DNA of strain FY1679. All DNA manipulations were done as described (Sambrook et al., 1989) and enzymes were used according to the supplier's instructions. The ExoIII kit Erase-a-base was obtained from Promega (Heidelberg, Germany). Oligonucleotides were synthesized on an ABI 370B DNA synthesizer and used directly after ethanol precipitation without further purification.

Strains and media

The Escherichia coli strain used for cloning experiments was XL1-Blue (recA1, endA1, gyrA96, thi-1, hsdR17, supE44, relA1, lac [F' proAB, lacIqZΔM15, Tn10 (tetr)]). E. coli transformants were selected on LB media supplemented with 0.1 g/l ampicillin. The yeast strain was FY1679 (a/α ura3/ura3 trp1/TRP1 leu2/LEU2 his3/HIS3 GAL2; from Bernard Dujon). Yeast media have been described (Sherman et al., 1986).

Sequence analysis

Cosmids 14-4 and 14-3b were cleaved with EcoRI, XhoI or SalI and the fragments subcloned into pBluescript SKII+ (Stratagene, Heidelberg, Germany). All subclone inserts were sequenced at their 5' and 3' termini using modified T3 (CAA TTA ACC CTC ACT AAA G) and T7 (GTA ATA CGA CTC ACT ATA G) primers. Oligonucleotides designed from these sequences were used to sequence directly on the cosmids to determine physical linkage of individual subclones. For subclones with inserts longer than 2 kb a series of nested deletions was constructed by limited digestion with ExoIII according to the supplier's instructions and sequenced by using the modified T3 and T7 primers. Sequence gaps were filled by using appropriate oligonucleotides. Smaller subclones were sequenced by primer walking with primers homologous to sequences 200 to 300 bases apart. Sequencing was performed on an ABI 373A sequencer (Applied Biosystems Inc., Weiterstadt, Germany) using the Taq DyeDeoxy Terminator Cycle Sequencing Kit supplied by the manufacturer.
Sequence alignments and overlaps were done on a Macintosh Quadra 650 computer using DNASTAR's SeqMan program (Lasergene Ltd, London, England). FASTA, MOTIFS and PROSITE analyses were routinely performed by MIPS (Martinsried, Germany). Further characterization was done by using DNASIS/PROSIS (Pharmacia, Freiburg, Germany) and the HUSAR package (DKFZ, Heidelberg, Germany). Homology searches against the Expressed Sequence Tag (EST) database were performed by the Genome Cross-Referencing Group at the NIH, Bethesda, U.S.A. This XREF database contains cDNAs from human, mouse and rat (Boguski et al., 1994).

RESULTS AND DISCUSSION

Sequence determination

In total, 278 subclones and ExoIII clones were generated and sequenced to determine the 24 152 bp sequence on both strands. The total number of bases sequenced was 109 581. By using a sequencing strategy of 45% primer walking and 55% nested deletions, the average reading number per base was 4.5 (109 581 bases read over 24 152 bp), with each base being sequenced at least three times (upper and lower strand together). At the left end of the 24 152 bp contig, our sequence overlaps by 1701 bp with the part of cosmid clone 14-3b sequenced by the Glansdorff-Pierard laboratory (Figure 1). No sequence discrepancies were found in this overlap.

Sequence analysis

A search for coding regions in the 24 152 bp long contig revealed 13 open reading frames (ORFs) longer than 300 bp, named N0646 to N0830 (Figure 1). This nomenclature is preliminary and will be revised once chromosome XIV is assembled completely. ORF N0646 extends into sequences flanking this contig to the left and is thus incomplete. No tRNAs, Ty elements, delta or sigma sequences could be detected.

[Figure 1: map of the contig from the telomere (left) to the centromere (right), scale 0 to 24 152 bp, showing the inserts of cosmid clones 14-3b pou and 14-4, the ORFs, the ORF with an intron, and the region also sequenced by the laboratory of Glansdorff-Pierard.]

Figure 1. Genomic organization of a 24 152 bp DNA fragment from the left arm of chromosome XIV. Localization of the inserts from the relevant cosmid clones and localization and orientation of the open reading frames (ORFs) are indicated. Only ORFs longer than 100 amino acids are shown. The preliminary ORF nomenclature was provided by MIPS and begins with N, indicating chromosome XIV, followed by a consecutive number starting at the left end of this DNA fragment (the lowest number telomere-proximal, the highest number centromere-proximal). Definitive numbering will be given after assembly of the entire chromosome. Previously sequenced ORFs are indicated by shaded boxes and their names.

Determination of the codon adaptation index (CAI; Sharp and Li, 1987) for each ORF revealed that all 13 ORFs are probably expressed at a low level, showing CAI values of only 0.04 to 0.22 (Table 1). In particular the four ORFs N0647, N0665, N0670 and N0830 have extremely low CAI values between 0.04 and 0.10. These values are below the minimal CAI value of 0.11 defined previously for a 'real' gene (Dujon et al., 1994). Taking the CAI value and the size as a measure, these four ORFs probably do not code for any proteins. Furthermore, three of these four ORFs, N0647, N0665 and N0830, were found in the coding region of the previously identified genes BNI1 (N0646), APL1 (N0660) and POL2 (N0825), respectively. Thus the very low CAI values and the fact that thus far a gene within another gene has not been described in S. cerevisiae strongly suggest that the three internal ORFs are not real genes (Demolis et al., 1993; Boles and Zimmermann, 1994). It should be noted, however, that two of the questionable ORFs (N0665 and N0670) show a significant homology to human cDNAs, although the relevance of this finding is at present unclear (Table 2).

Within the 24 152 bp long contig, five genes have been identified previously, thus 74.8% of the contig was already available in the databases.
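As an illustration of how a CAI value of the kind reported here is obtained, the following minimal Python sketch follows the definition of Sharp and Li (1987): each codon is weighted by its frequency relative to the most frequent synonymous codon in a reference set of highly expressed genes, and the CAI of an ORF is the geometric mean of these weights. The function names, the toy reference sequence and the 0.5 pseudocount for codons absent from the reference set are assumptions made for the example; they are not the reference set or the software used in this study.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Met and Trp have a single codon and stop codons are not scored,
# so none of them carries information about codon choice.
SKIPPED = {"M", "W", "*"}


def split_codons(seq):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon relative to the best synonymous codon.

    Assumption: zero counts are replaced by `pseudocount`, so that a codon
    missing from the reference set does not force the CAI to zero.
    """
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in split_codons(seq) if c in CODON_TABLE)
    weights = {}
    for aa in set(AMINO) - SKIPPED:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        adjusted = {c: counts[c] or pseudocount for c in family}
        best = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over all scored codons."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scored codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set: alanine is encoded three times by GCT, once by GCC.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
print(round(cai("ATGGCTGCC", w), 3))
```

In this toy case the ORF uses one optimal and one three-fold rarer alanine codon, so its CAI is the square root of 1/3; a real calculation would derive the weights from a set of highly expressed yeast genes.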
Excluding the four questionable ORFs, the contig carries nine ORFs. The mean length of these ORFs is 2177 bp (726 amino acids). This number is above the average ORF length of 1461 bp (487 amino acids) described for the already published yeast chromosomes (Oliver et al., 1992; Dujon et al., 1994; Johnston et al., 1994; Feldmann et al., 1994). The higher value for this particular region of chromosome XIV may be due to the fact that several larger genes (BNI1, PIK1 and POL2) reside within this region (Table 1). The nine ORFs cover about 88% of the contig. Again this number is significantly above the 72% described, for example, for chromosome II (Feldmann et al., 1994) and is probably due to the presence of the three larger genes. Such a high coding capacity has also been described for certain regions of chromosome VIII (Johnston et al., 1994).

Table 1.

ORF name1  Position (bp)  Size (aa)  MW (kDa)  CAI2  Homology to          FASTA score3  % identity4 (aa)  % identity4 (bp)  Special features
N0646      1-4659         1553       174.5     0.14  BNI1 (S.c.)          7067/8870     99.9 (2)          99.9 (2)
N0647*     681-992        103        12.5      0.04                                                                         Internal to N0646
N0660      5216-6937      573        64.0      0.12  APL1 (S.c.)          2984/3006     99.1 (5)          99.3 (12)
N0665*     5625-5996      123        13.8      0.09                                                                         Internal to N0660
N0670*     6975-7370      131        15.3      0.08
N0790      7826-9661      611        68.1      0.22  LYP1 (S.c.)          3122/3134     99.7 (3)          99.6 (9)
N0795      10154-13354    1066       119.9     0.15  PIK1 (S.c.)          4881/4881     100               100
N0800      13521-13940    139        15.5      0.12  NADH dehydrogenases  115/709                                           Prokaryotic lipoprotein attachment site
N0809      13556-14552    298        34.5      0.15  NUF1 (S.c.)          122/4276                                          Putative intron at pos. 14 425-14 531
N0815      14839-15891    350        40.7      0.20  VEE350 (S.c.)        939/1825
N0820      16172-17116    314        35.5      0.16  NADH dehydrogenases  113/3029
N0825      17488-24152    2222       255.7     0.15  POL2 (S.c.)          11210/11210   100               100
N0830*     20535-20906    124        13.7      0.10                                                                         Internal to N0825

1 ORFs that are questionable because of their size and low CAI value are marked with an asterisk (bold in the original). In the Orientation column of the original table, + indicates that transcription of the ORF is centromere-directed; - indicates transcription towards the telomere. [The entries of the Orientation column are not legible in this copy.]
2 The CAI is the codon adaptation index calculated following Sharp and Li (1987).
3 FASTA scores higher than 100 are listed. Only the highest score is shown. S.c. = Saccharomyces cerevisiae.
4 The percentage of identity over the entire ORF is indicated. The numbers of amino acids or base pairs which are different to the previously published sequences are shown in brackets.

Table 2. EST homologies.1

ORF name  Highest p-value  cDNA origin                                     NCBI-ID  cDNA related protein family
N0646     3.1e-2           Human placenta RNA                              106 959
N0660     7.9e-13          Human RNA                                       12 857   Amino acid permeases
N0665     4.8e-5           Muscular atrophy patient; total brain RNA       123 375
N0670     3.8e-2           Human fetal liver and spleen RNA                169 620
N0790     1.7e-13          Human RNA                                       12 857   Amino acid permeases
N0795     7.8e-24          Total human brain RNA                           300 820  Phosphatidylinositol 3-kinases
N0800     6.6e-2           Alu-primed human cDNA                           92 261   Cytochrome c oxidases
N0820     6.2e-11          Human fetal liver and spleen RNA                187 815

1 Only homologies with a p-value better than 7.0e-2 are listed. [The original table also lists 'Stress-related proteins' as a related protein family between the two amino acid permease entries, that is, for N0665 or N0670; the row cannot be assigned from this copy.]

The maximum length of an intergenic region, counted from the initiation or the termination codon of the adjacent ORFs, was found to be 889 bp (intergenic region between APL1 and LYP1) and the minimum length was 167 bp (intergenic region between PIK1 and N0800). The average length of an intergenic region was 383 bp, which is only 66% of the length found for other intergenic regions (for example 574 bp on chromosome II, Feldmann et al., 1994).

In summary, the low CAI values found for all ORFs can be correlated with a high gene density in this region of chromosome XIV. It remains to be tested systematically whether such a correlation can be found also for other chromosomal locations.
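The coding fraction and the intergenic lengths discussed above are simple functions of the ORF coordinates. The sketch below shows one way to derive them, merging overlapping ORFs so that shared bases are counted once. The function names and the toy coordinates are hypothetical and chosen only for the example; the sketch does not reproduce the exact counting rules behind the published figures.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and merged[-1][1] + 1 >= start:
            # Overlaps (or touches) the previous block: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def coding_fraction(intervals, contig_length):
    """Fraction of the contig covered by at least one ORF."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return covered / contig_length


def intergenic_lengths(intervals):
    """Lengths of the gaps between consecutive merged ORF blocks."""
    merged = merge_intervals(intervals)
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(merged, merged[1:])]


# Hypothetical 1000 bp contig with three ORFs, two of which overlap.
orfs = [(1, 300), (401, 700), (650, 900)]
print(coding_fraction(orfs, 1000))   # 0.8
print(intergenic_lengths(orfs))      # [100]
```

Merging before summing is the design choice that matters here: two overlapping ORFs on opposite strands would otherwise be counted twice and inflate the coding fraction.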
Finally the G + C content of the 24 152 bp long contig is 38.8%, a value slightly higher than that found for the other published chromosomes (Oliver et al., 1992; Dujon et al., 1994; Johnston et al., 1994; Feldmann et al., 1994). For example, for chromosome II, the overall G + C content is 38.3%, while the G + C content of only the coding regions is 39.6% (Feldmann et al., 1994). So the higher value for our contig may be due to its high gene density.

Analysis of ORF products

The location and orientation of the 13 ORFs on the 24 152 bp contig, the CAI value as well as the length and molecular weight of the deduced proteins are listed in Table 1. In addition, the results from the FASTA analysis and the presence of specific motifs in the putative proteins are summarized in Table 1. Finally, the amino acid sequence of all ORFs was used to search the EST database and EST homologues found for an ORF are listed in Table 2 (XREF database, Boguski et al., 1994).

N0646

This ORF is identical to the first 4659 bp of BNI1, which was identified as a synthetic lethal to CDC12 (Fares and Pringle, unpublished 1994; GenBank Accession Number S48523). Mutant forms of CDC12 show defects in cytokinesis and formation of microfilament rings (Hartwell, 1971; Byers and Goetsch, 1976). A comparison of N0646 and BNI1 showed two base pair changes in the ORF leading to two different amino acids. Database searches revealed no further motifs to be present in BNI1 and only a weak homology (p-value of 3.1e-2) to an EST cDNA could be detected.

N0660

This gene is identical to APL1, which encodes a putative permease for basic amino acids (Sychrova and Chevallier, 1994; EMBL Accession Number X74069). Comparison of the APL1 sequence with N0660 revealed 12 base pair changes and a deletion of three base pairs in our sequence. The 12 base pair changes gave rise to four amino acid variations between APL1 and N0660 and eight silent changes.
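The distinction drawn here between silent changes and changes that alter the amino acid can be made mechanically by translating both versions of each codon. The following sketch compares two coding sequences of equal length codon by codon; the function name and the example sequences are hypothetical, and insertions or deletions (such as the three base pair deletion mentioned above) would first require an alignment, which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_changes(published_cds, new_cds):
    """Return (silent, replacement) codon differences between two in-frame CDS.

    silent:      list of 1-based codon positions whose amino acid is unchanged
    replacement: list of (position, published_aa, new_aa) tuples
    """
    if len(published_cds) != len(new_cds) or len(published_cds) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    silent, replacement = [], []
    for i in range(0, len(published_cds), 3):
        old = published_cds[i:i + 3].upper()
        new = new_cds[i:i + 3].upper()
        if old == new:
            continue
        position = i // 3 + 1
        if CODON_TABLE[old] == CODON_TABLE[new]:
            silent.append(position)
        else:
            replacement.append((position, CODON_TABLE[old], CODON_TABLE[new]))
    return silent, replacement


# Hypothetical example: two synonymous changes and one Asp-to-Glu exchange.
silent, replacement = classify_changes("ATGGCTAAAGATTAA", "ATGGCCAAGGAATAA")
print(silent)        # [2, 3]
print(replacement)   # [(4, 'D', 'E')]
```

Counting is done per codon, so two base changes within the same codon would be reported once; for the small number of differences discussed in this report that simplification is unlikely to matter, but it is an assumption of the sketch.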
One amino acid change at position 517 is in a region not conserved among the known amino acid permeases LYP1, APL1, CAN1 and GAP1 (Figure 2). This may indicate that the changes are due to strain differences rather than sequencing errors. Two other changes (positions 260 and 548) are in regions where the permeases are conserved and both exchanges fit the consensus sequence better than the published APL1 sequence does. One exchange at position 126 within the amino acid permeases signature changes an amino acid which fits the consensus derived from the four permeases LYP1, APL1, CAN1 and GAP1. However, this amino acid change is in a region not well conserved in the amino acid permeases signature (see Figure 2 legend). An additional lysine codon was found in the published APL1 sequence. This change was observed in the N-terminus of the protein. In the published APL1 sequence three identical lysine codons were present at amino acid positions 91 to 93, while in our N0660 sequence only two lysine codons could be found. As this sequence variation is located at the N-terminus of the protein, it does not affect the homology regions to other amino acid permeases.
Thus the product of the N0660 gene sequenced here is 573 amino acids long and one amino acid shorter than the published Apl1p sequence (574 aa). The FASTA analysis revealed a high homology of APL1 to ORF N0790, which is located upstream of APL1 and which is identical to the LYP1 gene encoding a lysine-specific permease (see below, Figure 2). The APL1 and LYP1 DNA sequences show high homology to a gene on chromosome V, CAN1 (Ahmad and Bussey, 1986), and the general amino acid permease GAP1 on chromosome XI (Jauniaux and Grenson, 1990; Figure 2). In our search for human cDNA homologues, significant homologies to peptide sequences derived from several cDNAs were found, all of which showed a high similarity to amino acid permeases or amino acid transporters from different organisms (Table 2). The high homology between the protein sequences derived from APL1, LYP1, CAN1 and GAP1 and the cDNAs is demonstrated in Figure 2.

[Figure 2: multiple sequence alignment with rows ALP1, N0660, LYP1, N0790, CAN1, GAP1, 12857/2-243 and 12857/242-304; the alignment itself is not reproduced here.]

Figure 2. Alignment of the amino acid sequences of N0660 and N0790 with the Apl1, Lyp1, Can1 and Gap1 proteins and a sequence derived from the EST database with the NCBI-ID 12857. Amino acids identical to the consensus sequence are boxed. Amino acids that differ between N0660p and Apl1 or between N0790p and Lyp1 are shaded. The match to the amino acid permeases signature is indicated. The signature is defined as: [STAGC]-G-P-x(~,~)-[LIVMFYW](~)-x-[LIVMFYW]-x-[LIVMFSTA](~)-[SAG]-x(~)-[LIVMFYW]-x-[LIVMS]-x(3)-[LMC]-[GA]-E-x(5)-[PS] (taken from PROSITE; ~ marks a repeat count that is not legible in this copy). The numbers to the right of the NCBI-ID of the cDNA indicate the translated bases. A possible frameshift in the sequence is present at position 242.

[Figure 3: alignment of N0795 with the cDNA-derived sequences 67354/150-284, 79696/1-240, 216995/10-258 and 300820/2-208, with the two PIK signatures marked; the alignment itself is not reproduced here.]

Figure 3. Alignment of the amino acid sequence of N0795 with sequences derived from four different cDNAs (NCBI-IDs: 67354, 79696, 216995 and 300820) of the EST database. Amino acids identical to N0795p are boxed. The match to the two phosphatidylinositol 3- and 4-kinases signatures, which are [LIVMFA]-K-x(~)-[DE](~)-[LIVM]-R-Q-[DE]-x(~)-[LIVMFY]-Q and S-x-A-x(~)-[LIVM]-x(~)-[FY]-[LIVM](~)-x-[LIVM]-x-D-R-H-x(~)-N (taken from PROSITE; ~ marks a repeat count that is not legible in this copy), respectively, is indicated.

N0790

This ORF is identical to LYP1 coding for a lysine-specific permease (Sychrova and Chevallier, 1993; EMBL Accession Number X67315). Compared to the LYP1 sequence in the database we observed nine base pair changes which led to three different amino acids and six silent changes. None of these changes lie within a conserved region of the amino acid permeases (Figure 2).
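PROSITE signatures such as those quoted in the legends of Figures 2 and 3 are written in a small pattern language: elements are separated by hyphens, x stands for any residue, square brackets list allowed residues, curly brackets list forbidden residues, and a number or range in parentheses gives a repeat count. As a minimal sketch of how such a signature can be matched against a protein sequence, the function below translates a pattern into a Python regular expression. The function name is hypothetical, anchors and other less common syntax are not handled, and the demonstration uses the well-known N-glycosylation pattern N-{P}-[ST]-{P} rather than the permease or kinase signatures, whose repeat counts are not fully legible here.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)                                   # N[^P][ST][^P]
print(re.search(regex, "AANGSAA").start())     # 2
print(re.search(regex, "AANPSAA"))             # None
```

A signature match found this way only indicates that the local sequence is compatible with the pattern; it is the kind of evidence summarized in the figure legends, not proof of function.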
A homology search revealed strong similarities to peptide sequences from the EST database (Table 2). The peptide sequences themselves show homology to amino acid permeases from different organisms. In Figure 2, a multiple sequence alignment of Lyp1, Apl1, Can1, Gap1 and the protein sequences derived from one cDNA with N0660p and N0790p is presented. The aligned amino acids identical to the consensus sequence are boxed. The homology of the cDNA is not located exclusively within transmembrane spanning or other known functional domains.

N0795

This ORF is identical to PIK1, encoding phosphatidylinositol-4-kinase, which catalyses the first step in the biosynthesis of phosphatidylinositol-4,5-bisphosphate (PtdIns[4,5]P2; Flanagan et al., 1993; GenBank Accession Number L20220). No changes in the DNA sequence were observed. In the EST database, several human cDNAs were found encoding peptides with highly significant homologies to the Pik1 protein (Table 2). The amino acid alignment revealed that the proteins derived from two of the cDNAs show homology to one domain from Pik1, which is found in protein kinases (Figure 3). Two other cDNAs show very high homology to a second motif located at the C-terminal end of N0795p. These phosphatidylinositol 3- and 4-kinase signatures seem to be distantly related to the catalytic domain of protein kinases (Kunz et al., 1993).

N0800

This ORF codes for a putative protein of 139 amino acids. ORF N0800 overlaps with ORF N0809 by 382 bp on the opposite strand. The FASTA search revealed weak homologies to several NADH dehydrogenases from different organisms (Table 1). The motif search indicated a sequence with similarity to prokaryotic lipoprotein attachment sites. Search of the EST database showed weak homology to a cDNA-derived peptide, which shares some similarity with a group of cytochrome c oxidases (Table 2). The CAI value for N0800 was 0.12.
ORF N0800 together with N0809 was disrupted but no effect on viability or growth was observed (Güldener and Hegemann, unpublished).

N0809

This ORF encodes a putative protein of 34.5 kDa. The coding region appears to be interrupted by an intron (base pairs 21-127). The putative gene product shows a moderate similarity to NUF1 from budding yeast, which functions as a spacer protein in the spindle pole body (Mirzayan et al., 1992; Kilmartin et al., 1993; Table 1). Disruption of N0809 together with N0800 gave no detectable altered phenotype (Güldener and Hegemann, unpublished).

N0815

This hypothetical gene product bears significant similarity to another ORF from chromosome XIV, VEE350p (MIPS confidential database; Table 1). No further significant homology was detected by FASTA analysis or comparison to the EST database. Disruption of N0815 resulted in a slow growth phenotype, indicating that this ORF is most likely a real gene (Güldener and Hegemann, unpublished).

N0820

The putative gene product shows moderate similarity to several NADH dehydrogenases (Table 1). Search of the EST database revealed a significant homology to the translation product of a human cDNA sequence, with 70% similarity and 41% identity over a stretch of 48 amino acids. This cDNA itself shows no homologies to other protein families (Table 2). Disruption of this ORF was lethal, indicating that N0820 is an essential gene (Güldener and Hegemann, unpublished).

N0825

This ORF is identical to POL2 encoding the catalytic subunit of DNA polymerase II (Morrison et al., 1990; GenBank Accession Number M36724). No sequence differences were found within the 6666 bp long coding region. It has been shown previously that Pol2 has homologies to polymerases from higher eukaryotes.

ACKNOWLEDGEMENTS

We gratefully acknowledge the contribution of Katrein Hamberg and Peter Philippsen, who provided the cosmid clones 14-4 and 14-3b.
We thank Susanne Heck for synthesizing oligonucleotides and Lydia Karpfinger and Karl Kleine (MIPS, Martinsried, Germany) for their help with the sequence analysis. This work was supported by the European Union Programmes BIOTECH I and II.

REFERENCES

Ahmad, M. and Bussey, H. (1986). Yeast arginine permease: nucleotide sequence of the CAN1 gene. Curr. Genet. 10, 587-592.
Boguski, M. S., Tolstoshev, C. M. and Bassett, D. E., Jr. (1994). Gene discovery in dbEST [letter]. Science 265, 1993-1994.
Boles, E. and Zimmermann, F. K. (1994). Open reading frames in the antisense strands of genes coding for glycolytic enzymes in Saccharomyces cerevisiae. Mol. Gen. Genet. 243, 363-368.
Byers, B. and Goetsch, L. (1976). A highly ordered ring of membrane-associated filaments in budding yeast. J. Cell Biol. 69, 717-721.
Demolis, N., Mallet, L., Bussereau, F. and Jacquet, M. (1993). RIM2, MSI1 and PGI1 are located within an 8 kb segment of Saccharomyces cerevisiae chromosome II, which also contains the putative ribosomal gene L21 and a new putative essential gene with a leucine zipper motif. Yeast 9, 645-659.
Dujon, B., Alexandraki, D., Andre, B., et al. (1994). Complete DNA sequence of yeast chromosome XI. Nature 369, 371-378.
Evans, G. A. and Wahl, G. M. (1987). Cosmid vectors for genomic walking and rapid restriction mapping. Methods Enzymol. 152, 604-610.
Feldmann, H., Aigle, M., Aljinovic, G., et al. (1994). Complete DNA sequence of yeast chromosome II. EMBO J. 13, 5795-5809.
Flanagan, C. A., Schnieders, E. A., Emerick, A. W., Kunisawa, R., Admon, A. and Thorner, J. (1993). Phosphatidylinositol 4-kinase: gene structure and requirement for yeast cell viability. Science 262, 1444-1448.
Hamberg, K. (1993). Kartierung von Cosmidklonen des Chromosoms XIV der Hefe Saccharomyces cerevisiae unter Verwendung neuentwickelter Vektoren für die Chromosomenkopierende Transformation. PhD Thesis, Justus-Liebig-Universität, Giessen.
Hartwell, L. H. (1971). Genetic control of the cell division cycle in yeast. IV. Genes controlling bud emergence and cytokinesis. Exp. Cell Res. 69, 265-276.
Jauniaux, J. C. and Grenson, M. (1990). GAP1, the general amino acid permease of Saccharomyces cerevisiae. Nucleotide sequence, protein similarity with the other bakers yeast amino acid permeases, and nitrogen catabolite repression. Eur. J. Biochem. 190, 39-44.
Johnston, M., Andrews, S., Brinkman, R., et al. (1994). Complete nucleotide sequence of Saccharomyces cerevisiae chromosome VIII. Science 265, 2077-2082.
Kilmartin, J. V., Dyos, S. L., Kershaw, D. and Finch, J. T. (1993). A spacer protein in the Saccharomyces cerevisiae spindle pole body whose transcript is cell cycle-regulated. J. Cell Biol. 123, 1175-1184.
Knott, V., Rees, D. J., Cheng, Z. and Brownlee, G. G. (1988). Randomly picked cosmid clones overlap the pyrB and oriC gap in the physical map of the E. coli chromosome. Nucl. Acids Res. 16, 2601-2612.
Kunz, J., Henriquez, R., Schneider, U., Deuter-Reinhard, M., Movva, N. R. and Hall, M. N. (1993). Target of rapamycin in yeast, TOR2, is an essential phosphatidylinositol kinase homolog required for G1 progression. Cell 73, 585-596.
Mirzayan, C., Copeland, C. S. and Snyder, M. (1992). The NUF1 gene encodes an essential coiled-coil related protein that is a potential component of the yeast nucleoskeleton. J. Cell Biol. 116, 1319-1332.
Morrison, A., Araki, H., Clark, A. B., Hamatake, R. K. and Sugino, A. (1990). A third essential DNA polymerase in S. cerevisiae. Cell 62, 1143-1151.
Oliver, S. G., van der Aart, Q. J., Agostoni Carbone, M. L., et al. (1992). The complete DNA sequence of yeast chromosome III. Nature 357, 38-46.
Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sharp, P. M. and Li, W. H. (1987). The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucl. Acids Res. 15, 1281-1295.
Sherman, F., Fink, G. R. and Hicks, J. B. (1986). Laboratory Course Manual for Methods in Yeast Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sychrova, H. and Chevallier, M. R. (1993). Cloning and sequencing of the Saccharomyces cerevisiae gene LYP1 coding for a lysine-specific permease. Yeast 9, 771-782.
Sychrova, H. and Chevallier, M. R. (1994). APL1, a yeast gene encoding a putative permease for basic amino acids. Yeast 10, 653-657.
Thierry, A., Gaillon, L., Galibert, F. and Dujon, B. (1995). Construction of a complete genomic library of Saccharomyces cerevisiae and physical mapping of chromosome XI at 3.7 kb resolution. Yeast 11, 121-135.