DOI 10.1007/s10517-017-3910-z
Bulletin of Experimental Biology and Medicine, Vol. 163, No. 6, October, 2017

METHODS

Method of Selection of Bacteria Antibiotic Resistance Genes Based on Clustering of Similar Nucleotide Sequences

I. S. Balashov, V. A. Naumov, P. I. Borovikov, A. B. Gordeev, D. V. Dubodelov, L. A. Lyubasovskaya, Yu. V. Rodchenko, A. A. Bystritskii, N. V. Aleksandrova, D. Yu. Trofimov, and T. V. Priputnevich

V. I. Kulakov Research Center for Obstetrics, Gynecology, and Perinatology, Ministry of Health of the Russian Federation, Moscow, Russia. Address for correspondence: firstname.lastname@example.org. A. B. Gordeev

Translated from Byulleten’ Eksperimental’noi Biologii i Meditsiny, Vol. 163, No. 6, pp. 784-787, June, 2017
Original article submitted December 9, 2016

0007-4888/17/1636-0814 © 2017 Springer Science+Business Media New York

A new method for selecting bacterial antibiotic resistance genes is proposed and tested as a way to simplify primer selection for PCR assays. The method clusters similar nucleotide sequences and selects group primers for all genes of each cluster. Resistance genes were clustered for six groups of antibiotics (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid). The method was tested on 81 strains of bacteria of different genera isolated from patients (K. pneumoniae, Staphylococcus spp., S. agalactiae, E. faecalis, E. coli, and G. vaginalis). The results are comparable to those obtained with selection of individual genes; this makes it possible to reduce the number of primers needed for maximum coverage of the known antibiotic resistance genes in PCR analysis.

Key Words: antibiotic resistance; clusterization; next-generation sequencing technology; polymerase chain reaction (PCR)

Resistance of microorganisms to antimicrobial drugs is a pressing problem in modern clinical microbiology. Next-generation sequencing (NGS) provides the most complete information about the genome of bacteria isolated from clinical material. One approach to analyzing this information is de novo assembly of reads followed by identification of resistance genes in databases. This approach is not always convenient, because it requires whole-genome sequencing of each isolate, which is not always possible for technical and economic reasons.

In routine practice, PCR test systems that provide rapid and affordable diagnostic information are the best option. PCR primers should be highly specific, and primers should be available for every target gene (or be universal for a set of similar nucleotide sequences) so that the maximum number of antibiotic resistance (AR) genes is covered. The search for an optimal set of primers for detecting AR genes in bacteria continues, since current commercial real-time PCR kits for AR testing cover a limited spectrum of genes or are inaccessible.

The goal of this work was to develop and test a new method for selecting genes that predict AR in bacteria, as a basis for subsequent primer selection and development of multiplex PCR test systems for simultaneous identification of a large number of AR-related genes.

MATERIALS AND METHODS

The study was conducted on 81 strains of resistant bacteria of different genera (K. pneumoniae, Staphylococcus spp., S. agalactiae, E. faecalis, E. coli, and G. vaginalis) that were selected by phenotypic characteristics among strains isolated from patients of the V. I. Kulakov Research Center for Obstetrics, Gynecology, and Perinatology. The proposed method consists in clustering similar nucleotide sequences and selecting group primers for all the genes of each cluster.
Pure bacterial cultures were obtained by inoculation of clinical material into nonselective and selective culture media: 5% blood agar, UriSelect medium (BioRad), Endo agar (State Research Center for Applied Microbiology and Biotechnology), mannitol-salt agar (HiMedia Laboratories), and enterococcal agar (BD). Species identification of the isolated cultures was conducted on a Vitek 2-Compact automatic bacteriological analyzer (BioMerieux) and by the MALDI-TOF technique on an Autoflex III mass spectrometer with MALDI-Biotyper 3.0 software (Bruker Daltonics).

Antibiotic sensitivity of the isolated cultures was tested by phenotypic methods: the disc-diffusion method and determination of the minimal inhibitory concentration of the drug on a Vitek 2 Compact 30 automatic bacteriological analyzer. The results were evaluated in accordance with the EUCAST Breakpoint table 5.0 recommendations.

NGS analysis was performed for all bacterial strains. Genomic DNA was isolated from fresh cultures containing at least 10 million cells by lysis with lysozyme and proteinase K followed by DNA extraction with a phenol-chloroform mixture. DNA libraries were prepared with the Ion Xpress Plus Fragment Library Kit and Ion Xpress Barcode adapters 1-96 kits (Thermo Fisher Scientific). Library quality was controlled on a Bioanalyzer 2100 with the High Sensitivity DNA Kit (Agilent Technologies). The Ion OneTouch Template Kit (Thermo Fisher Scientific) was used for emulsion PCR and sphere enrichment. Sequencing was performed on an Ion PGM Torrent platform with the Ion Sequencing Kit and 316v2 chips (Thermo Fisher Scientific). All stages starting from library preparation were conducted in accordance with the manufacturer’s protocols.
Known AR-associated genes were searched in the ResFinder database, and 699 of 821 unique AR-related genes were selected. These genes are responsible for resistance to antibiotics of 6 classes (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid) and were grouped according to the targeted antibiotic. For each group of genes, hierarchical clustering of similar nucleotide sequences was performed on the basis of a similarity matrix of the nucleotide sequences.

Contigs were assembled from the whole-genome sequencing data; coverage was assessed and reads were aligned to the nucleotide sequences of the ResFinder genes. The total length of contigs longer than 500 bp, divided by the reference genome length for the given species, was used as the correction value n. For E. coli strains, no genes with a covered-nucleotide proportion >0.5 were detected. Values >1 result from normalization (the average coverage of nucleotides in the genes exceeded the genome-wide average).

For each gene in each sample, nucleotide coverage was evaluated; a value <50% was interpreted as absence of the gene from the sample. For all samples, the presence of genes from each cluster was evaluated. The association between the presence of genes belonging to a cluster and the phenotypic response to the tested antibiotic was evaluated using the Cohen κ coefficient. A similar algorithm was used to search for associations between single genes and phenotypic resistance, and the results for clusters were compared with the results for single genes. P-values were corrected for multiple hypothesis testing using the FDR method (false discovery rate, the expected proportion of false rejections).
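The presence call and the association step described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: they worked in R (the irr and stats packages are named in the text), the function names below are hypothetical, a cluster is assumed to count as present when at least one of its genes is present, and the FDR correction is assumed to be the Benjamini-Hochberg procedure (which is what R's `p.adjust(method = "fdr")` computes).

```python
from typing import List, Sequence


def is_present(covered_fraction: float, threshold: float = 0.5) -> bool:
    """A gene is called absent when less than 50% of its nucleotides are covered."""
    return covered_fraction >= threshold


def cluster_present(covered_fractions: Sequence[float], threshold: float = 0.5) -> bool:
    """Assumption: a cluster is present if any of its member genes is present."""
    return any(is_present(f, threshold) for f in covered_fractions)


def cohen_kappa(genotype: Sequence[bool], phenotype: Sequence[bool]) -> float:
    """Cohen's kappa for two binary ratings of the same strains."""
    if len(genotype) != len(phenotype) or len(genotype) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(genotype)
    observed = sum(bool(g) == bool(p) for g, p in zip(genotype, phenotype)) / n
    p_g = sum(bool(g) for g in genotype) / n   # fraction of strains carrying the gene
    p_p = sum(bool(p) for p in phenotype) / n  # fraction of resistant strains
    expected = p_g * p_p + (1 - p_g) * (1 - p_p)  # agreement expected by chance
    if expected == 1.0:
        return 0.0  # kappa is undefined here; reporting 0 is a choice made for this sketch
    return (observed - expected) / (1 - expected)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data (not from the study): covered fraction of one gene in 8 strains.
    covered = [0.9, 0.8, 0.7, 0.1, 0.0, 0.2, 0.95, 0.3]
    resistant = [True, True, False, False, False, False, True, True]
    genotype = [is_present(c) for c in covered]
    print(cohen_kappa(genotype, resistant))
```

In this toy example the genotype and phenotype agree for 6 of 8 strains with a chance agreement of 0.5, so κ = 0.5, which sits exactly on the selection threshold κ≥0.5 used in the Results.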
Contig assembly was carried out using SPAdes software; sequence alignment was performed using BWA software; nucleotide coverage and the proportion of covered nucleotides were evaluated using SAMtools and BEDtools software. Statistical analysis was performed on the R platform with the Biostrings, stats, Dynamic Tree Cut, irr, and ggplot2 packages.

RESULTS

The results of NGS are shown in Table 1. Clustering of similar nucleotide sequences yielded from 1 (for fusidic acid) to 17 (for β-lactams) clusters in each group of genes (Table 2). In the majority of clusters, the number of genes did not exceed 14. For β-lactams, 5 clusters containing 20 to 179 genes each were detected. This is due to the high polymorphism of the blaSHV, blaTEM, blaCMY, blaCTX-M, and blaVIM gene families. The finer fragmentation of the glycopeptide group (8 clusters for 24 genes) reflects sequence heterogeneity that was higher than in the other groups. For fusidic acid, clustering was of little use because few resistance genes are known.

TABLE 1. Results of NGS Analysis

Parameter                                           E. coli  E. faecalis  G. vaginalis  K. pneumoniae  P. aeruginosa  S. agalactiae  S. aureus  S. epidermidis
Number of strains                                   9        6            6             19             11             10             4          16
Average coverage, reads                             8.17     11.93        28.99         8.16           11.49          6.8            49.37      36.57
Length of contigs/length of reference (n)           0.993    1.076        1.009         0.980          1.046          0.331          0.880      0.995
Number of found genes (median)                      0        32.5         3             18             38             9.5            37         25.5
Average proportion of covered nucleotides in genes  -        0.769        0.888         0.793          0.816          1.812          1.004      0.970

TABLE 2. Distribution of Genes during Clustering Stage

Drug                          Number of genes  Number of clusters  Number of genes in clusters [Min; Max]
Aminoglycosides               61               15                  [2; 10]
β-Lactams                     530              17                  [2; 179]
Fluoroquinolones              21               3                   [2; 14]
Glycopeptides                 24               8                   [2; 5]
Macrolides and lincosamides   61               15                  [2; 10]
Fusidic acid                  2                1
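The clustering step summarized in Table 2 can be illustrated with a small Python sketch. Several things here are assumptions rather than the authors' procedure: the text does not say how the similarity matrix was computed, so a Jaccard similarity over k-mers stands in for it; the tree is cut at a fixed similarity threshold instead of with the Dynamic Tree Cut package; average linkage is used; and the function names, parameters, and toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a nucleotide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (stand-in for alignment-based similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_genes(genes: Dict[str, str], k: int = 4,
                  min_similarity: float = 0.5) -> List[List[str]]:
    """Average-linkage agglomerative clustering, stopped at a fixed similarity cutoff."""
    kmers = {name: kmer_set(seq.upper(), k) for name, seq in genes.items()}
    # Pairwise distance matrix: 1 - similarity.
    dist = {frozenset(pair): 1.0 - jaccard(kmers[pair[0]], kmers[pair[1]])
            for pair in combinations(kmers, 2)}

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Mean distance over all cross-cluster gene pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in kmers]
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > 1.0 - min_similarity:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy sequences (not real resistance genes): two near-identical variants and one outlier.
    toy = {
        "geneA1": "ATGACCGTTGCAAGCTTGGA",
        "geneA2": "ATGACCGTTGCAAGCTTGGC",
        "geneB1": "CCCCTTTTAAAAGGGGCCCC",
    }
    print(cluster_genes(toy))
```

With these toy sequences the two variants fall into one cluster and the outlier stays alone, which mirrors the idea of designing one group primer per cluster rather than one primer pair per gene. The naive search for the closest pair is slow for hundreds of genes; it is kept here only for readability.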
When the association of single genes and of clusters with the phenotype was evaluated, only elements with κ≥0.5 and p<0.05 after correction for multiple comparisons were retained. The results of evaluation of the genotype and phenotype association are given in Table 3 (data are presented only for antibiotics for which significant results were obtained). For erythromycin and carbapenems, clustering of gene sequences made it possible to reach κ>0.5, while no significant result was obtained for single genes. For ampicillin, clusters showed a better association than single genes. For amikacin, no cluster gave a significant evaluation of AR. For vancomycin, the results for single genes and for clusters coincide because the same genes from the clusters are present together in the samples.

Clustering was applied to select a set of nucleotide sequences for which primers can then be chosen so as to identify a wide range of bacterial resistance genes for the six groups of antibiotics (aminoglycosides, β-lactams, fluoroquinolones, glycopeptides, macrolides and lincosamides, and fusidic acid). For fusidic acid this method has no significant advantages, as only 2 resistance genes (fusB and far1) are known. In the other five groups, however, the number of clusters in each group (from 3 to 17) was considerably lower than the number of genes (from 21 to 530). Clustering of similar nucleotide sequences reduced the number of nucleotide sequences required for full coverage of genes by primers during PCR analysis from 699 (for unique genes) to 59 (for clusters). The correspondence of the phenotypic response with a set of genes and with a set of clusters was similar, which gives us grounds to state that clustering preserves all useful information about the genes.

TABLE 3. Assessment of the Genotype and Phenotype Association for Single Genes and Clusters (Maximum Value of Cohen κ)

Drug                                Single genes  Clusters
Amikacin                            0.68          -
Ampicillin                          0.6           0.68
Amoxicillin/clavulanic acid         0.76          0.76
Third generation of cephalosporins  0.76          -
Carbapenems                         -             0.54
Vancomycin                          0.78          0.78
Erythromycin                        -             0.54

Thus, testing of the proposed method showed that the use of clustering in molecular-genetic testing of strains for the presence of AR yields results comparable to those obtained by testing single genes. In addition, clustering based on similarity of nucleotide sequences reduces the number of primers required for maximum coverage of known AR genes in real-time multiplex PCR and makes the analysis more available to health care institutions.

The work was performed within the framework of the Agreement with the Ministry of Science and Education of the Russian Federation No. 14.607.21.0019 “Development of molecular and genetic test systems for evaluation of the pathogenicity and resistance of nosocomial and opportunistic pathogens in mothers and newborns” (code 2014-14-579-0001-065).

REFERENCES

1. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19(5):455-477.
2. Ellington MJ, Findlay J, Hopkins KL, Meunier D, Alvarez-Buylla A, Horner C, McEwan A, Guiver M, McCrae LX, Woodford N, Hawkey P. Multicentre evaluation of a real-time PCR assay to detect genes encoding clinically relevant carbapenemases in cultured bacteria. Int. J. Antimicrob. Agents. 2016;47(2):151-154.
3. Findlay J, Hopkins KL, Meunier D, Woodford N. Evaluation of three commercial assays for rapid detection of genes encoding clinically relevant carbapenemases in cultured bacteria. J. Antimicrob. Chemother. 2015;70(5):1338-1342.
4. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754-1760.
5. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-2079.
6. Thomsen MCF, Ahrenfeldt J, Cisneros JLB, Jurtz V, Larsen MV, Hasman H, Aarestrup FM, Lund O. A bacterial analysis platform: an integrated system for analysing bacterial whole genome sequencing data for clinical diagnostics and surveillance. PLoS One. 2016;11(6):e0157718.
7. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841-842.
8. Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, Aarestrup FM, Larsen MV. Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 2012;67(11):2640-2644.
9. Zankari E, Hasman H, Kaas RS, Seyfarth AM, Agersø Y, Lund O, Larsen MV, Aarestrup FM. Genotyping using whole-genome sequencing is a realistic alternative to surveillance based on phenotypic antimicrobial susceptibility testing. J. Antimicrob. Chemother. 2013;68(4):771-777.