Genome-wide association study of rheumatoid arthritis in Koreans: Population-specific loci as well as overlap with European susceptibility loci
ARTHRITIS & RHEUMATISM
Vol. 63, No. 4, April 2011, pp 884–893
DOI 10.1002/art.30235
© 2011, American College of Rheumatology

Jan Freudenberg,1 Hye-Soon Lee,2 Bok-Ghee Han,3 Hyoung Do Shin,4 Young Mo Kang,5 Yoon-Kyoung Sung,2 Seung-Cheol Shim,6 Chan-Bum Choi,2 Annette T. Lee,1 Peter K. Gregersen,1 and Sang-Cheol Bae2

Objective. To perform a genome-wide association study (GWAS) in Koreans in order to identify susceptibility loci for rheumatoid arthritis (RA).

Methods. We generated high-quality genotypes for 441,398 single-nucleotide polymorphisms (SNPs) in 801 RA cases and 757 controls. We then tested 79 markers from 46 loci for replication in an independent sample of 718 RA cases and 719 controls.

Results. Genome-wide significance (P < 5 × 10^-8) was attained by markers from the major histocompatibility complex region and from the PADI4 gene. The replication data showed nominal association signals (P < 5 × 10^-2) for markers from 11 of the 46 replicated loci, greatly exceeding random expectation. Genes that were most significant in the replication stage and in the combined analysis include the known European RA loci BLK, AFF3, and CCL21. Thus, in addition to the previously associated STAT4 alleles, variants at these 3 loci may contribute to RA not only among Europeans, but also among Asians. In addition, we observed replication signals near the genes PTPN2, FLI1, ARHGEF3, LCP2, GPR137B, TRHDE, and GGA1. Based on the excess of small P values in the replication stage, we estimate that more than half of these loci are genuine RA susceptibility genes. Finally, we systematically analyzed the presence of association signals in Koreans at established European RA loci, which showed a significant enrichment of European RA loci among the Korean RA loci.

Conclusion. Genetic risk for RA involves both population-specific loci and many susceptibility loci that are shared between Asian and European populations.

Supported by the American College of Rheumatology Research and Education Foundation (Within Our Reach research grant to Dr. Gregersen), the Ministry for Health and Welfare, Republic of Korea (Korea Healthcare Technology R&D project grants A010252 and A084794 to Drs. H.-S. Lee and S.-C. Bae), and the Eileen Ludwig Greenland Center for Rheumatoid Arthritis.

1Jan Freudenberg, MD, Annette T. Lee, PhD, Peter K. Gregersen, MD: Feinstein Institute for Medical Research and North Shore–Long Island Jewish Health System, Manhasset, New York; 2Hye-Soon Lee, MD, PhD, Yoon-Kyoung Sung, MD, PhD, MPH, Chan-Bum Choi, MD, PhD, Sang-Cheol Bae, MD, PhD, MPH: Hanyang University Hospital for Rheumatic Diseases, Seoul, South Korea; 3Bok-Ghee Han, PhD: Korea National Institute of Health, Seoul, South Korea; 4Hyoung Do Shin, DVM, PhD: Sogang University, Seoul, South Korea; 5Young Mo Kang, MD, PhD: Kyungpook National University School of Medicine, Daegu, South Korea; 6Seung-Cheol Shim, MD, PhD: Eulji University Hospital, Daejeon, South Korea.

Address correspondence to Peter K. Gregersen, MD, Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030 (e-mail: email@example.com); or to Sang-Cheol Bae, MD, PhD, MPH, Hanyang University Hospital for Rheumatic Diseases, Seoul 133-792, South Korea (e-mail: firstname.lastname@example.org).

Submitted for publication July 22, 2010; accepted in revised form December 30, 2010.

A large and growing list of genetic associations with rheumatoid arthritis (RA) has emerged from genome-wide association studies (GWAS) performed in the last few years (1–6).
The lists of putative risk genes have pointed to both the adaptive and innate immune systems as potential sources of biologic variation that predispose to disease, with surface and intracellular signaling molecules as well as cytokines making a major contribution. The first 2 confirmed non–major histocompatibility complex (non-MHC) associations involved the PADI4 locus in Asian populations (7) and PTPN22 in Europeans (8). Intriguingly, neither of these associations carries over between these 2 ethnic groups. The associations with PADI4 are extremely weak or absent in most European studies (4). Conversely, the PTPN22 risk allele, a causative amino acid change from arginine to tryptophan at codon 620 (R620W), is simply not found in Asian populations. PTPN22 encodes an intracellular phosphatase that plays a critical role in setting thresholds for receptor signaling in both T cells and B cells. Extensive resequencing of PTPN22 in Asian RA populations has failed to find evidence of any additional risk variants in this population (9). The PADI4 locus encodes a peptidylarginine deiminase that is directly involved in the citrullination of proteins, thereby generating a major autoantigen that is the target of a humoral response that is quite specific to RA in all major ethnic groups; nevertheless, associations at this locus are largely limited to Asians.

In contrast, other genetic associations appear to be common across Asian and European RA patients; among them are associations at the HLA–DRB1 locus (10) and STAT4 (11), although the specific HLA alleles involved differ somewhat among these and other ethnic groups. In order to explore more comprehensively the genetic differences and overlap between European and Asian RA, we undertook a GWAS of RA in the Korean population, with further replication of the most strongly associated markers.
Our data revealed a complex picture of both shared and population-specific genetic risk, as well as evidence for a large background of modest risk that may be common to both populations.

PATIENTS AND METHODS

Population sample. RA patients analyzed for the GWAS (n = 801) were taken from a panel of 1,128 Korean RA patients who were consecutively enrolled at the outpatient clinic of Hanyang University Hospital for Rheumatic Diseases in Seoul, as described previously (11). A total of 757 controls were likewise taken from a panel of 1,022 ethnically matched controls recruited at the same location. All patients analyzed in the GWAS were seropositive for either anti–cyclic citrullinated peptide (anti-CCP) antibodies (89.6%) or rheumatoid factor (95.8%). All RA patients were of Korean nationality and met the American College of Rheumatology 1987 classification criteria for RA (12). Written informed consent was obtained from all study participants.

RA cases for the replication stage were recruited from 3 centers in South Korea: Kyungpook National University School of Medicine, Eulji University Hospital, and Hanyang University Hospital for Rheumatic Diseases. All replication cases were positive for anti-CCP antibodies and for rheumatoid factor. Controls for the replication study were obtained from the DNA BioBank of the Korea National Institute of Health. Clinical and demographic characteristics of the RA cases and controls for the GWAS and the replication study are detailed in Supplementary Table 1 (available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131 and on the authors' web site at http://www.biorep.org/supplementary/freudenberg2010/index.html). The study was approved by the Institutional Review Board of Hanyang University Hospital.

Genotyping. Genotyping for the GWAS stage was carried out at the Feinstein Institute for Medical Research, using Illumina HapMap 550v3 or 660W genotyping platforms.
Data were imported into GenomeStudio software for initial review and quality control. SNP markers common to these 2 platforms were combined and subjected to further quality control analysis, as described below, leaving a set of 441,398 available for analysis. For the replication stage, genotyping was performed at Hanyang University Hospital for Rheumatic Diseases at a multiplex level using the Illumina GoldenGate genotyping system. Replication SNPs were required to show a genotype quality score of 0.25, a minimum call rate of 98%, no duplicate errors, and a Hardy-Weinberg equilibrium test result of P > 0.01.

Statistical analysis. Data were analyzed using the program packages Plink (13), Haploview (14), and EigenStrat (15) and the statistical software R. GWAS genotype data were subjected to quality control filtering based on SNP genotype call rates (>90% completeness), minor allele frequency (>1%), and Hardy-Weinberg equilibrium (P > 10^-6). Subjects with more than 10% missing genotype data and outlier samples (deviating >6 SEM on any of the top 10 principal components) were excluded. In addition, we excluded samples showing evidence of relatedness to another sample or possible DNA contamination (Plink PI_HAT >0.05). Finally, SNPs with differential missingness with respect to the presence or absence of RA or with respect to haplotypes formed with neighboring SNP alleles were excluded (Plink tests of missing by phenotype or by genotype, P < 10^-6). The remaining SNPs had a nonmissing data rate of 99.8%. Power calculations were performed with a genetic power calculator (16). False discovery rates for markers in the replication stage were estimated by the program Q-value (17).

To formally evaluate the overlap of Korean and European RA loci, we used a method that we recently proposed for the category-based analysis of GWAS data; it is described in more detail elsewhere (18).
This method builds on a partitioning of SNPs into separate genetic loci as provided by linkage disequilibrium blocks from the HapMap database (19) in order to minimize redundant association signals. In the present study, we defined candidate loci based on RA association in a European meta-analysis (4). Then, we calculated the odds ratio (OR) for these candidate loci to harbor at least 1 SNP association in the Korean GWAS data. Thus, the odds that a European RA locus would harbor an associated SNP were divided by the odds that any other locus would harbor an associated SNP. This OR statistic was normalized using permutation of the affected/unaffected status. The resulting normalized enrichment score necessarily depends on the threshold for which SNPs are called "associated," but it does not depend on factors such as locus size, SNP density, or linkage disequilibrium (18).

Figure 1. Manhattan plot of allele association tests of all single-nucleotide polymorphisms that passed stringent quality control in 801 rheumatoid arthritis (RA) cases and 757 controls. Genome-wide significance was attained in the major histocompatibility complex region on chromosome 6 and at the PADI4 gene on chromosome 1. GWAS = genome-wide association study.

RESULTS

Findings of the GWAS of RA in Koreans. SNPs were genotyped on the Illumina 550K genotyping platform. After stringent quality control, a total of 441,398 SNPs with a minor allele frequency >1% were available for comparison in 801 RA cases and 757 controls. Principal components analysis did not reveal any population stratification or population outliers (Supplementary Figure S1; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html). Accordingly, association analyses of SNPs with RA showed an estimated chi-square inflation factor (λ1,000) of 1.04, indicating little genome-wide stratification between cases and controls.
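As a rough illustration of the inflation statistic quoted above, the sketch below computes a genomic inflation factor from 1-df chi-square statistics and rescales it to a reference study of 1,000 cases and 1,000 controls. Reading the "1,000" in the text as this rescaled factor is an assumption, and the function names are hypothetical; the article does not state how the value was computed.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Observed inflation factor: median test statistic over its expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale an inflation factor to 1,000 cases and 1,000 controls.

    Assumes the excess over 1 scales with the effective sample size.
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


if __name__ == "__main__":
    # Hypothetical raw inflation factor, rescaled for 801 cases and 757 controls.
    print(round(lambda_1000(1.03, n_cases=801, n_controls=757), 3))
```

For the sample sizes of this study, the rescaling multiplies the excess over 1 by roughly 1.28, so a rescaled value close to 1 still implies only mild inflation of the raw statistics.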
As expected, the most significant differences between cases and controls were found in the MHC region near the HLA–DRB1 gene, as shown in Figure 1. The 2 most significant SNPs in the MHC were located near the DRB1 locus: rs7765379 (P = 4.9 × 10^-23, OR 2.51) and rs13192471 (P = 1.1 × 10^-20, OR 2.1). The latter SNP was also the most significant marker in a recent GWAS for RA in the Japanese population (6). Both these SNPs also showed strong associations with the same alleles in European RA patients (4). In addition, 215 markers in the MHC region showed associations at the threshold P < 10^-3 (Supplementary Figure S2; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html). Further analyses with denser marker maps will be required to tease apart this broad MHC signal and to determine whether additional signals that are independent of HLA–DRB1 are located in this region.

After exclusion of SNPs from the MHC region (chromosome 6:26–35 Mb), case–control differences in remaining markers still showed a deviation from random expectation, as shown in Figure 2A. Because this deviation was most prominent for markers with smaller P values (e.g., P < 10^-3), we consider it unlikely that this finding is the result of technical artifacts or stratification. Moreover, we performed a stringent quality control analysis to minimize this possibility, as detailed above. It is thus likely that additional true-positive associations exist outside the MHC region. The most significant SNPs (outside the MHC region) that we identified in the Korean population are shown in Table 1. The full list of such SNPs for the threshold P < 0.01 is given in Supplementary Table 2 (available on the Arthritis & Rheumatism web site at http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1529-0131 and on the authors' web site at http://www.biorep.org/supplementary/freudenberg2010/index.html).

Figure 2. Quantile–quantile plot of the chi-square test statistic from the single-nucleotide polymorphism (SNP) allele association tests. A, After excluding the major histocompatibility complex region, a clear deviation from the expectation (straight line) indicates the presence of true-positive association signals. B, When the analysis was further restricted to the 6,726 SNPs that had a significance of P < 1.0 × 10^-2 in a recent meta-analysis of rheumatoid arthritis in European population samples and were also genotyped in our genome-wide association study, the deviation from the expectation became more prominent.

Table 1. Loci found at the GWAS stage to be most strongly associated with RA in the Korean population, based on P values from allelic SNP association tests*

SNP          Chromosome   MAF cases, %   MAF controls, %   P               OR     Nearest gene(s)
rs2240335    1            49.69          39.65             2.00 × 10^-8    1.50   PADI4
rs17769245   16           27.32          19.18             1.30 × 10^-7    1.58   SYCE1L/MON1B
rs2944021    19           28.29          20.19             1.80 × 10^-7    1.56   CCDC123
rs2290652    19           25.03          17.51             4.60 × 10^-7    1.57   ZNF302
rs1025065    16           21             28.34             2.01 × 10^-6    0.67   MPHOSPH6/CDH13
rs7834685    8            30.48          38.34             3.86 × 10^-6    0.70   CNBD1/CNGB3
rs1216363    4            8.68           4.56              4.14 × 10^-6    1.99   PHF17
rs4823569    22           34.36          42.38             4.44 × 10^-6    1.41   GRAMD4
rs7561798    2            36.14          28.64             7.86 × 10^-6    1.41   SPHKAP
rs9636786    21           11.19          16.71             8.62 × 10^-6    1.59   ADAMTS1
rs11236774   11           3              6.34              9.43 × 10^-6    0.46   C11orf30
rs791195     6            22.13          15.92             1.07 × 10^-5    1.50   SLC22A1
rs2062583    3            6.12           10.44             1.15 × 10^-5    0.56   ARHGEF3
rs12579024   12           13.63          8.71              1.57 × 10^-5    0.61   TBX3
rs4583322    18           27.03          34.11             1.81 × 10^-5    1.40   KIAA0427
rs10466245   10           3.38           6.69              2.36 × 10^-5    0.49   MARCH8
rs6962404    7            9.18           5.24              2.37 × 10^-5    1.83   COL28A1/C1GALT1
rs1077773    7            46.67          39.21             2.94 × 10^-5    0.74   AHR
rs6702348    1            27.81          21.35             2.99 × 10^-5    1.42   GPR137B
rs2159214    19           20.56          14.84             3.39 × 10^-5    0.67   NACC1/TRMT1
rs1474581    20           24.78          31.42             3.78 × 10^-5    1.39   PLCB1
rs4368165    16           47             45.63             4.02 × 10^-5    1.34   GP2/GPR139
rs942880     13           13.16          8.58              4.42 × 10^-5    1.61   SPATA13
rs10421853   19           24.03          30.52             4.80 × 10^-5    1.39   TSHZ3
rs6590343    11           13.92          19.36             5.11 × 10^-5    0.67   FLI1/ETS1
rs4547623    22           15.92          21.58             5.13 × 10^-5    0.69   GGA1/LGALS2
rs879036     6            2.18           4.82              5.77 × 10^-5    2.27   ETV7
rs12831974   12           48.38          41.2              5.98 × 10^-5    1.34   TRHDE
rs6679652    1            31.84          25.36             6.55 × 10^-5    0.73   RGS7
rs9916862    17           33.88          27.24             6.64 × 10^-5    1.37   ABR
rs1415654    9            1.37           3.57              7.10 × 10^-5    0.38   PPP3R2/GRIN3A
rs6815902    4            8.86           5.22              7.41 × 10^-5    0.57   PCDH7/STIM2
rs17617822   18           16.92          22.63             7.48 × 10^-5    0.70   METTL4
rs1265883    1            10.31          6.38              8.06 × 10^-5    1.69   SLAMF6
rs289744     16           29.49          36.15             8.32 × 10^-5    0.74   CETP
rs2303025    5            39.64          46.62             8.39 × 10^-5    0.75   ANXA6
rs4867947    5            25.88          32.31             8.39 × 10^-5    0.73   LCP2/C5orf58
rs17328497   15           27.59          21.53             8.78 × 10^-5    1.39   SEMA6D
rs12542184   8            8.24           4.76              8.85 × 10^-5    1.80   CSMD1
rs218311     2            28.66          35.2              8.98 × 10^-5    1.35   TLK1
rs1126133    14           40.82          34.04             9.37 × 10^-5    0.75   PRKCH
rs1541596    19           23.53          17.84             9.43 × 10^-5    1.42   CARM1

* For each locus the most significant marker (P < 0.0001), allele frequencies, and nearest gene are shown. GWAS = genome-wide association study; RA = rheumatoid arthritis; SNP = single-nucleotide polymorphism; MAF = minor allele frequency; OR = odds ratio.

Interestingly, none of the loci that have been associated with RA in European populations (4) are contained in the list of top associations in the Korean RA population. However, a number of the established European risk loci did show evidence of association at lower levels of significance (Table 2). These loci include STAT4, as previously reported, as well as AFF3, TNFAIP3, CCR6, BLK, and TRAF1. Furthermore, PTPN2, which has been established as a risk factor for type 1 diabetes mellitus (20), showed some evidence of association with RA in the Korean population.

We also looked at RA loci that were previously established in the Japanese RA population. We did not see any associations with the FCRL3 or CD244 gene. At the PADI gene cluster, we found the strongest signal at PADI4, as expected (Table 1 and Supplementary Table 2; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html). Interestingly, we also found a second association peak at the neighboring PADI2 gene that did not show any linkage disequilibrium with the associated SNPs in PADI4 (Supplementary Figure S3; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html). Although the statistical significance of associated markers at PADI2 (P = 2 × 10^-3, OR 1.25 for rs2075696) was much weaker than that at PADI4 (P = 2 × 10^-8, OR 1.5 for rs2240335), it may be interesting to point out that PADI2 and PADI4 are the only 2 PADI genes that are highly expressed in hematopoietic cells (21).

Table 2. Markers previously implicated in RA susceptibility in European populations that were found at the GWAS stage to be associated at a level of P < 0.005 in the Korean population*

SNP          Chromosome   MAF cases, %   MAF controls, %   P               OR     Gene locus   OR from European meta-analysis
rs10168266   2            35.27          30.25             2.87 × 10^-3    1.26   STAT4        1.16
rs2009094    2            48.19          42.59             1.72 × 10^-3    1.25   AFF3         0.91
rs12055552   6            22.69          18.01             1.23 × 10^-3    1.34   TNFAIP3      1.03
rs204295     6            51.37          44.97             3.54 × 10^-4    1.29   CCR6         0.94
rs6984212    8            26.47          31.94             7.85 × 10^-4    0.77   BLK          0.93
rs1953126    9            35.33          29.54             6.03 × 10^-4    1.30   TRAF1        1.1
rs657555     18           37.77          31.62             3.31 × 10^-4    1.31   PTPN2        1.14

* Odds ratios (ORs) for the European population were obtained from the study reported by Stahl et al (4) and refer to the same alleles as those in the Korean population. RA = rheumatoid arthritis; GWAS = genome-wide association study; SNP = single-nucleotide polymorphism; MAF = minor allele frequency.

To quantify more precisely the amount of true signal in our data, we next partitioned SNP markers based on the linkage disequilibrium blocks from the HapMap phase II database (19). We then compared the observed number of linkage disequilibrium blocks with associated SNPs to their expectation, as obtained from permutation of the affected/unaffected status (Supplementary Figure S4; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html).
This analysis showed an excess of 14 blocks (of 46) with at least 1 associated SNP when SNPs were called associated at a level of P < 10^-4. This excess increased to 46 linkage disequilibrium blocks (of 316) for the threshold value of P < 0.001. For even less stringent thresholds (P < 0.01), 200 associated linkage disequilibrium blocks were observed above the expected number (data not shown). This indicates that additional true association signals exist at more modest levels of statistical significance in our dataset.

Findings of the replication study. To gain further insight into the loci that cause the excess of RA association signals at the GWAS stage, we selected 96 SNPs for genotyping in an independent sample of 718 RA cases and 719 controls. These SNPs were primarily chosen from the set of 42 loci that harbor at least 1 SNP with significance at P < 10^-4 (Table 1). Based on an OR of 1.3, a risk allele frequency of 10%, and a disease prevalence of 1%, we estimated that this replication sample provided a statistical power of 67% to attain significance of P < 0.05 for a true-positive SNP from the GWAS stage. Thus, based on the above estimate of 14 truly associated non-MHC loci among the 42 loci with P < 10^-4, one may expect that around 9 of these loci would fall below P < 0.05 in our replication sample. Because we did not attempt replication for PADI4 and were unable to attain high-quality genotypes for all SNPs chosen for replication, rather fewer than 9 loci might be expected to fall under the threshold P < 0.05 in the replication stage. Furthermore, we complemented the replication stage analysis with SNPs that had attained association at P < 5 × 10^-3 in the Korean GWAS and were found at loci with prior evidence of association with autoimmune disease in Europeans (Table 2). Six of these particular SNPs had shown at least a weak association (P < 5 × 10^-2) with RA in Europeans, and 4 of these had the same direction of association as seen in this study of Koreans.
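The expected replication count derived above from the power estimate is a simple product, and a short sketch makes the arithmetic and its spread explicit. It assumes, beyond what the article states, that the 14 true loci replicate independently and all have the same 67% power; the function name is hypothetical.

```python
from math import sqrt


def expected_replications(n_true_loci, power):
    """Mean and SD of the number of true loci reaching significance.

    Treats each true locus as an independent Bernoulli trial with the
    same success probability (an assumption made for illustration).
    """
    mean = n_true_loci * power
    sd = sqrt(n_true_loci * power * (1.0 - power))
    return mean, sd


if __name__ == "__main__":
    mean, sd = expected_replications(n_true_loci=14, power=0.67)
    print(f"expected replications: {mean:.1f} (SD {sd:.1f})")
```

With the figures from the text, the mean is about 9.4, in line with the "around 9" quoted above, and the binomial spread of nearly 2 loci shows how much the realized count can vary by chance alone.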
Using the same disease parameters as above and assuming that the attempted replications are true-positive associations, we would expect a successful replication for 4 of these loci at a threshold of P < 0.05.

High-quality replication genotypes could be obtained for 79 SNPs covering 46 different loci. The most significant replication signal was found at the BLK locus (P = 7 × 10^-4, OR 0.77 for rs1600249). In total, nominally significant (P < 0.05) replication signals were found for 11 different loci, including 2 markers at the BLK locus. Among these, the directions of the association were consistent with the GWAS findings for 10 loci, being inconsistent only for rs10421853 at the TSHZ3 locus. Accordingly, the respective 10 loci showed a stronger signal in the combined analysis (Table 3). However, as mentioned above, none of these associations reached genome-wide significance in the combined analysis of the GWAS and replication data.

Although P values obtained at the replication stage were individually rather weak, their overall distribution showed a clear skew toward smaller values (Supplementary Figure S5; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html). Based on the skew of this distribution, we estimate a false discovery rate of ~25% for the significance threshold of P < 0.05 (17). Thus, one may expect that about 8 of the 11 gene loci with SNP associations of P < 0.05 constitute genuine RA associations. This estimate of 8 true-positive loci is only slightly below the expectation, as derived above from the power analysis for this threshold. However, our analysis of the replication data tended to be conservative, in the sense that we performed a 2-sided test and the test for significance did not consider the specific alleles that were found to be associated at the GWAS stage.

From loci showing the strongest association signals at the GWAS stage (Table 1), the most promising replication signals were obtained for ARHGEF3, LCP2, GPR137B, TRHDE, and GGA1 (Table 3). Among the European immune loci studied (Table 2), replication signals in addition to BLK were also found at AFF3, CCL21, and PTPN2 (Table 3).

Table 3. Markers most strongly associated with RA in the Korean population, based on the results from the replication stage*

GWAS
SNP          Nearby gene(s)   MAF cases, %   MAF controls, %   P               OR
rs1600249    BLK              26.88          31.85             2.29 × 10^-3    0.79
rs2736340    BLK              22.72          27.25             3.51 × 10^-3    1.27
rs2009094    AFF3             48.19          42.59             1.72 × 10^-3    1.25
rs12831974   TRHDE            48.38          41.2              5.98 × 10^-5    1.34
rs7024727    CCL21            1.31           2.77              3.73 × 10^-3    0.47
rs657555     PTPN2            37.77          31.62             3.31 × 10^-4    1.31
rs2062583    ARHGEF3          6.12           10.44             1.15 × 10^-5    0.56
rs7537965    GPR137B          26.03          20.5              2.68 × 10^-4    0.73
rs4867947    LCP2/C5orf58     25.88          32.31             8.39 × 10^-5    0.73
rs4547623    GGA1/LGALS2      15.92          21.58             5.13 × 10^-5    0.69
rs4936059    FLI1/ETS1        33.65          40.35             1.10 × 10^-4    0.75

Replication study
SNP          Nearby gene(s)   MAF cases, %   MAF controls, %   P               OR
rs1600249    BLK              27.44          33.24             7.15 × 10^-4    0.76
rs2736340    BLK              24.41          29.76             1.24 × 10^-3    1.31
rs2009094    AFF3             47.56          42.24             4.20 × 10^-3    1.24
rs12831974   TRHDE            46.3           41.93             1.83 × 10^-2    1.19
rs7024727    CCL21            1.81           3.13              2.28 × 10^-2    0.57
rs657555     PTPN2            35.94          31.92             2.28 × 10^-2    1.20
rs2062583    ARHGEF3          6.55           8.79              2.41 × 10^-2    0.73
rs7537965    GPR137B          25.21          21.91             3.69 × 10^-2    0.83
rs4867947    LCP2/C5orf58     27.48          31.02             3.71 × 10^-2    0.84
rs4547623    GGA1/LGALS2      18.19          21.16             4.78 × 10^-2    0.83
rs4936059    FLI1/ETS1        34.89          38.42             4.94 × 10^-2    0.86

Combined
SNP          Nearby gene(s)   MAF cases, %   MAF controls, %   P               OR (95% CI)
rs1600249    BLK              27.14          32.53             5.18 × 10^-6    0.77 (0.69–0.86)
rs2736340    BLK              23.52          28.47             1.22 × 10^-5    1.29 (1.15–1.45)
rs2009094    AFF3             47.89          42.42             2.14 × 10^-5    1.25 (1.13–1.38)
rs12831974   TRHDE            47.4           41.56             5.69 × 10^-6    1.27 (1.14–1.4)
rs7024727    CCL21            1.55           2.95              2.49 × 10^-4    0.52 (0.36–0.74)
rs657555     PTPN2            36.91          31.77             2.93 × 10^-5    1.26 (1.13–1.4)
rs2062583    ARHGEF3          6.32           9.63              2.16 × 10^-6    0.63 (0.52–0.77)
rs7537965    GPR137B          25.64          21.19             4.73 × 10^-5    0.78 (0.69–0.88)
rs4867947    LCP2/C5orf58     26.64          31.68             1.88 × 10^-5    0.78 (0.7–0.88)
rs4547623    GGA1/LGALS2      16.99          21.38             1.75 × 10^-5    0.75 (0.66–0.86)
rs4936059    FLI1/ETS1        34.23          39.41             3.38 × 10^-5    0.80 (0.72–0.89)

* Shown are the results from the initial genome-wide association study (GWAS) stage, the replication stage, and the combined analysis. RA = rheumatoid arthritis; SNP = single-nucleotide polymorphism; MAF = minor allele frequency; OR = odds ratio; 95% CI = 95% confidence interval.

Systematic analysis of candidate loci implicated by GWAS in Europeans.
Clearly, the above findings of 3 European RA loci (BLK, AFF3, and CCL21) and 1 type 1 diabetes mellitus locus (PTPN2) among the 10 loci with positive replication signals indicate a certain overlap between European and Asian RA loci. Therefore, we wanted to formally analyze the overlap of associated loci in our Korean population with RA loci previously reported in Europeans. To this end, we used the list of loci that had shown an association with RA in a recent meta-analysis (4). In total, we retrieved 6,726 non-MHC SNPs with associations of P < 10^-2 from this meta-analysis of European GWAS for RA that were also genotyped in our study. The association signals of these SNPs displayed a clear deviation from the expected (Figure 2B).

We next investigated this overlap using a computational framework that we had recently proposed for category-based analysis of GWAS data (18). In short, this method takes a set of candidate loci as input and scores the enrichment of association signals at these loci in comparison to the remaining genome. To this end, the method calculates a normalized enrichment score that quantifies the excess of association signals at the loci from a candidate category. We excluded the MHC region and defined candidate loci based on the presence of associated SNPs in Europeans, varying the threshold for SNPs to be designated as being associated. We further varied the threshold for calling associated SNPs in the Korean GWAS dataset. This showed a significant enrichment of European RA loci among Korean RA loci when defining European RA loci based on SNPs with association values of P < 10^-5 in the earlier meta-analysis and when calling Korean RA loci based on SNPs with association values of P < 10^-2 in the GWAS dataset (Figure 3).
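The core of this enrichment comparison can be sketched as a 2 × 2 odds ratio over loci, with permuted datasets supplying the null distribution. This is an illustration only: the exact normalization used in the published method (18) is not reproduced here, and the 0.5 pseudocount, the empirical P value, and the function names are assumptions added for the example.

```python
def enrichment_odds_ratio(is_candidate, is_associated, pseudocount=0.5):
    """Odds that a candidate locus harbors an associated SNP, divided by
    the same odds for all other loci.

    Both arguments are equal-length sequences of booleans, one entry per
    locus. The pseudocount (an assumption) avoids division by zero.
    """
    a = b = c = d = 0
    for candidate, associated in zip(is_candidate, is_associated):
        if candidate and associated:
            a += 1          # candidate locus with an associated SNP
        elif candidate:
            b += 1          # candidate locus without one
        elif associated:
            c += 1          # other locus with an associated SNP
        else:
            d += 1          # other locus without one
    k = pseudocount
    return ((a + k) * (d + k)) / ((b + k) * (c + k))


def empirical_p(observed, permuted):
    """Share of permuted statistics at least as large as the observed one,
    with the usual +1 correction."""
    exceed = sum(1 for value in permuted if value >= observed)
    return (exceed + 1) / (len(permuted) + 1)


if __name__ == "__main__":
    candidate = [True, True, True, True, False, False, False, False]
    associated = [True, True, True, False, True, False, False, False]
    print(round(enrichment_odds_ratio(candidate, associated), 2))
```

In practice the permuted statistics would come from recomputing the odds ratio after shuffling the affected/unaffected labels and rerunning the association tests, which is the expensive step omitted from this sketch.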
Thus, although we did not find any established European RA loci among the top hits of our Korean GWAS (Table 1), this analysis showed them to be clearly enriched among loci with weaker association signals.

Figure 3. Surface plot of the normalized enrichment score for rheumatoid arthritis (RA)–associated loci in the European population among loci with single-nucleotide polymorphism (SNP) associations in data from the Korean genome-wide association study (GWAS). Candidate loci were defined based on the presence of SNPs with variable evidence for RA association in Europeans (from P < 10^-1 to P < 10^-6). These loci were then tested for enrichment of RA association signals in the Korean GWAS, where the threshold for calling SNP associations was also varied (from P < 10^-1 to P < 10^-4). The colors designate the magnitude of the enrichment of candidate loci among associated loci for the respective threshold parameters, with red representing a high score and blue representing a low score. For the range of threshold parameters with the greatest enrichment scores, the enrichment actually observed for European RA loci was greater than in any of 1,000 permutations of the affected/unaffected status.

In a final step, we conducted a genetic risk score analysis to evaluate whether the observed overlap of RA risk loci between Europeans and Koreans extended to an overlap of RA risk alleles. As proposed by the International Schizophrenia Consortium (22) and implemented in the program package Plink (13), we calculated a disease risk score for each subject from the number of risk alleles present. Risk alleles were weighted by the log OR of the allele, as estimated by the European RA meta-analysis. We again excluded SNPs from the MHC region and successively restricted the set of SNPs based on their maximum P value in the meta-analysis. For each set of SNPs, we performed a logistic regression analysis of RA affected/unaffected status on risk score. We then calculated Nagelkerke's R^2 as the fraction of variance explained by the risk score in the regression model. This showed that a maximum of 2.5% of the RA risk in Koreans was explained by risk scores when the set of SNPs was restricted to those with P values smaller than 10^-2 in the European meta-analysis (Supplementary Figure S6; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html). Although the explanatory power of this risk score variable was rather small, its inclusion in the regression model was highly significant (P = 1.1 × 10^-8). Thus, alleles that are associated with RA risk in Europeans also show an overall stronger-than-expected association with RA in Koreans.

DISCUSSION

We performed a GWAS and replication study in the Korean RA population and compared the results to the accumulating evidence for multiple genetic susceptibility loci in Europeans. Overall, the data demonstrated a complex picture, with both shared and population-specific disease susceptibility. Because our GWAS was of modest sample size, statistical power was limited in the discovery phase. The expected presence of associations in the HLA–DRB1 and PADI4 regions demonstrated that our case–control sample was informative with regard to RA-associated loci. Accordingly, one could expect the presence of true signal below the formal genome-wide significance threshold. This notion was supported by our estimate of an excess of 14 true-positive associations among the 42 associated loci with SNPs having a significance of P < 10^-4. The respective list of putative RA loci was further narrowed down by the results from the replication stage, where we found a strong skew toward smaller P values.
We estimated that our study had 50% power to detect loci with a risk allele frequency of 40% and an OR of 1.5 for the P value threshold of P < 5 × 10^-8 and 95% power to detect such loci for a threshold of P < 10^-4. Thus, it appears unlikely that many such loci exist beyond HLA–DRB1 and PADI4. In contrast, our study had less than 1% power to detect risk alleles with an allele frequency of 10% and an OR of 1.3 for the significance threshold of P < 5 × 10^-8, 5% power to detect such loci for a threshold of P < 1 × 10^-4, and 60% power to detect such loci for a threshold of P < 5 × 10^-2. Notably, it is exactly this threshold range for which the study was most powerful, where we see the strongest overlap with RA loci identified in a much larger meta-analysis of European RA loci (Figure 3). This formally confirms the impression obtained from the presence of several weaker association signals in Koreans that were found for European RA loci (Table 2). It is therefore likely that the extent of overlapping risk factors between the 2 populations is greater than that suggested by the list of the very top associations from the GWAS stage (Table 1).

However, our present study mainly examined the overlap between loci. It will be interesting in future studies to perform a more detailed analysis of whether the same or different susceptibility mutations underlie these loci that are shared across populations. Because a role of mutations for autoimmune susceptibility in Europeans has already been established for BLK, AFF3, and CCL21, the associations of these genes with RA in Koreans are the most likely to be true positives. Conversely, it was also interesting to examine which of the Korean RA loci show subthreshold associations in Europeans, since this would, in turn, increase the confidence in the association findings we obtained in the Korean sample.
Therefore, we looked up the results for the associated markers at PTPN2, FLI1, ARHGEF3, LCP2, GPR137B, TRHDE, and GGA1 in a recent meta-analysis of RA (4). This showed fairly strong evidence for PTPN2 (P = 7.4 × 10^-5 for rs657555) and weaker evidence for FLI1 (P = 0.003 for rs4936059) in this large European RA meta-analysis. FLI1 has been implicated in the risk of murine lupus due to regulatory polymorphisms acting in T cells (23), and the mouse gene shares similar regulatory regions with its human counterpart (24). Interestingly, markers from the neighboring ETS1 gene were recently associated with systemic lupus erythematosus (SLE) in Chinese (25). These associations of ETS1 with SLE are only 130 kb away from the association of FLI1 with RA we observed in the present study. Because linkage disequilibrium between markers in FLI1 and ETS1 is weak (Supplementary Figure S7; available online at http://www.biorep.org/supplementary/freudenberg2010/index.html), we would consider these to represent independent signals for RA and SLE susceptibility in Asians. Indeed, none of the ETS1 markers associated with SLE in Chinese showed any association with RA in our study of Koreans.

Another gene with a possible role for RA in both European and Asian populations is CCR6 (4,6). Our GWAS data supported an association of the SNP rs3093024 with RA in Koreans (P = 0.004, OR 1.23). However, we also saw differences in the allele frequencies; the A allele attained a frequency of 45.9% in RA cases and 40.8% in controls in our study, whereas it attained a frequency of 52% in RA cases and 46% in controls in the Japanese population (6). Thus, rs3093024 seems to be a SNP with a fairly large allele frequency difference between the Japanese and Korean populations. Interestingly, this SNP was reported to be in strong linkage disequilibrium with a presumably functional insertion/deletion polymorphism in Japanese (6).
Among the remaining candidate genes shown in Table 3, LCP2 is of particular interest, since it encodes SLP-76, a critical adaptor protein for receptor signaling in T cells and several other hematopoietic cell types (26). The associated SNP, rs4867947, is located ~50 kb downstream of LCP2, and therefore, much work remains before the functionally relevant locus in this region is definitively identified.

In summary, we have presented support for associations with 10 different novel putative RA genes in the Korean population. Despite the fact that none of these new associations reaches generally accepted levels of genome-wide significance, we estimate that a large proportion of these associations are likely to be true positives. We further showed that the overlap between non-MHC loci that are associated with RA is significantly larger than expected by chance and, thus, at least a subset of RA loci are shared between European and Asian populations. We therefore believe that the list of associations provided herein is likely to be helpful for further fine-mapping studies and future meta-analyses of RA in Asians as well as across populations.

AUTHOR CONTRIBUTIONS
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Drs. Gregersen and Bae had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Freudenberg, H.-S. Lee, A. T. Lee, Gregersen, Bae.
Acquisition of data. H.-S. Lee, Han, Shin, Kang, Sung, Shim, Choi, A. T. Lee, Gregersen, Bae.
Analysis and interpretation of data. Freudenberg, H.-S. Lee, Shin, A. T. Lee, Gregersen, Bae.

REFERENCES
1. Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, et al. TRAF1-C5 as a risk locus for rheumatoid arthritis—a genomewide study. N Engl J Med 2007;357:1199–209.
2. Remmers EF, Plenge RM, Lee AT, Graham RR, Hom G, Behrens TW, et al. STAT4 and the risk of rheumatoid arthritis and systemic lupus erythematosus. N Engl J Med 2007;357:977–86.
3. Gregersen PK, Amos CI, Lee AT, Lu Y, Remmers EF, Kastner DL, et al. REL, encoding a member of the NF-κB family of transcription factors, is a newly defined risk locus for rheumatoid arthritis. Nat Genet 2009;41:820–3.
4. Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, Thomson BP, et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet 2010;42:508–14.
5. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661–78.
6. Kochi Y, Okada Y, Suzuki A, Ikari K, Terao C, Takahashi A, et al. A regulatory variant in CCR6 is associated with rheumatoid arthritis susceptibility. Nat Genet 2010;42:515–9.
7. Suzuki A, Yamada R, Chang X, Tokuhiro S, Sawada T, Suzuki M, et al. Functional haplotypes of PADI4, encoding citrullinating enzyme peptidylarginine deiminase 4, are associated with rheumatoid arthritis. Nat Genet 2003;34:395–402.
8. Begovich AB, Carlton VE, Honigberg LA, Schrodi SJ, Chokkalingam AP, Alexander HC, et al. A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis. Am J Hum Genet 2004;75:330–7.
9. Lee HS, Korman BD, Le JM, Kastner DL, Remmers EF, Gregersen PK, et al. Genetic risk factors for rheumatoid arthritis differ in Caucasian and Korean populations. Arthritis Rheum 2009;60:364–71.
10. Lee HS, Lee KW, Song GG, Kim HA, Kim SY, Bae SC. Increased susceptibility to rheumatoid arthritis in Koreans heterozygous for HLA–DRB1*0405 and *0901. Arthritis Rheum 2004;50:3468–75.
11. Lee HS, Remmers EF, Le JM, Kastner DL, Bae SC, Gregersen PK. Association of STAT4 with rheumatoid arthritis in the Korean population. Mol Med 2007;13:455–60.
12. Arnett FC, Edworthy SM, Bloch DA, McShane DJ, Fries JF, Cooper NS, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315–24.
13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559–75.
14. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005;21:263–5.
15. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904–9.
16. Purcell S, Cherny SS, Sham PC. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 2003;19:149–50.
17. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 2003;100:9440–5.
18. Freudenberg J, Lee AT, Siminovitch KA, Amos CI, Ballard D, Li W, et al. Locus category based analysis of a large genome-wide association study of rheumatoid arthritis. Hum Mol Genet 2010;19:3863–72.
19. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007;449:851–61.
20. Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, Erlich HA, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet 2009;41:703–7.
21. Vossenaar ER, Zendman AJ, van Venrooij WJ, Pruijn GJ. PAD, a growing family of citrullinating enzymes: genes, features and involvement in disease. Bioessays 2003;25:1106–18.
22. Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 2009;460:748–52.
23. Nowling TK, Fulton JD, Chike-Harris K, Gilkeson GS. Ets factors and a newly identified polymorphism regulate Fli1 promoter activity in lymphocytes. Mol Immunol 2008;45:1–12.
24. Svenson JL, Chike-Harris K, Amria MY, Nowling TK. The mouse and human Fli1 genes are similarly regulated by Ets factors in T cells. Genes Immun 2010;11:161–72.
25. Yang W, Shen N, Ye DQ, Liu Q, Zhang Y, Qian XX, et al. Genome-wide association study in Asian populations identifies variants in ETS1 and WDFY4 associated with systemic lupus erythematosus. PLoS Genet 2010;6:e1000841.
26. Koretzky GA, Abtahian F, Silverman MA. SLP76 and SLP65: complex regulation of signalling in lymphocytes and beyond. Nat Rev Immunol 2006;6:67–78.