American Journal of Medical Genetics Part B (Neuropsychiatric Genetics) 147B:964–972 (2008)

Review Article

Calibration of Credibility of Agnostic Genome-Wide Associations

John P.A. Ioannidis1,2,3*

1 Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
2 Biomedical Research Institute, Foundation for Research and Technology-Hellas, Ioannina, Greece
3 Department of Medicine, Tufts University School of Medicine, Boston, Massachusetts

Genome-wide testing platforms are increasingly used to promote "agnostic" approaches to the discovery of gene variants associated with the risk of many common diseases and quantitative traits. The early track record of genome-wide association (GWA) studies suggests that some proposed associations are replicated quite consistently with large-scale subsequent evidence from multiple studies, others have a more inconsistent replication record, some have failed to be replicated by independent investigators and many more early proposed associations await further replication. An important question is how to calibrate the credibility of these postulated associations. A simple Bayesian method is applied here to achieve such calibration. The variability of the estimated credibility is examined under different assumptions. Empirical examples are drawn from existing GWA studies. It is demonstrated that the credibility of different proposed associations can cover a very wide range. The credibility of specific associations usually remains relatively robust when different plausible assumptions are made (within a reasonable range) for the prior odds of an association being true, or the magnitude of the anticipated effect size for genetic associations. Heterogeneity and bias assumptions can have a more major impact on the credibility estimates and thus they need very careful consideration in each case.
Credibility calibration may be used in conjunction with qualitative criteria for the appraisal of the cumulative evidence that take into consideration the amount, consistency, and protection from bias in the data. © 2008 Wiley-Liss, Inc.

KEY WORDS: genome; association; Bayes

Please cite this article as follows: Ioannidis JPA. 2008. Calibration of Credibility of Agnostic Genome-Wide Associations. Am J Med Genet Part B 147B:964–972.

*Correspondence to: John P.A. Ioannidis, Professor and Chairman, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece. E-mail: email@example.com

Received 1 November 2007; Accepted 27 December 2007

DOI 10.1002/ajmg.b.30721

INTRODUCTION: AGNOSTIC GENOME-WIDE ASSOCIATIONS

The advent of platforms that can measure common genetic variation in massive scale has revolutionized the genetic epidemiology of common diseases with complex etiology. Prior approaches for the discovery of common gene variants of importance for human diseases and quantitative traits had met with frustrating hurdles. Candidate gene approaches where genes were selected with some biological or other prior rationale in mind were hindered by our incomplete knowledge of the relevant biology. Candidate gene approaches resulted in a few clear successes, but also a vast amount of non-replicated postulated associations that caused a very confusing picture in the literature [Ioannidis et al., 2001, 2003; Hirschhorn et al., 2002; Cordell and Clayton, 2005]. Linkage studies with genome-wide scans were also largely insensitive to the underlying architecture of genetic effects. In fact, even the scattered successes of prior approaches may come as a surprise when we view them from the angle of complexity that genome-wide association (GWA) studies can afford us.

GWA technology has evolved rapidly and the ability to measure genetic variability en masse is likely to continue to improve in the foreseeable future.
There are already a large number of excellent technical, design, and statistical reviews on GWA studies for the interested reader [Hirschhorn and Daly, 2005; Marchini et al., 2005; Thomas et al., 2005; Wang et al., 2005; Thomas, 2006]. Both the number of markers and types of variation captured (e.g., copy number variation) [Beckmann et al., 2007; McCarroll and Altshuler, 2007] in routine applications may widen and whole genome sequencing for large numbers of samples should soon become routine. Given the evolution of the technological front, one can be even more enthusiastic about the future. Nevertheless, currently used commercial GWA platforms already allow for 65–75% coverage [Barrett and Cardon, 2006] of most common human genetic variation, with the potential to increase this percentage through imputation of genotypes. Therefore, the technology is probably already quite mature to address the questions of whether and which common genetic variants affect the risk of major human diseases. An important question is how to interpret the results obtained from such studies and their replication efforts, especially when there is no other biological/external evidence to strengthen or weaken the credibility of specific associations that arise out of the mess of massive screening.

A common theme of GWA studies is the agnostic approach to gene discovery. The genome is screened without any prior predilection for specific regions, genes, or variants thereof. Instead of biological rationale, the selection of targets for study and further replication is based on purely statistical, and thus "agnostic", criteria. In this setting, the replication and the consistency of the replication across different studies on the same variant and phenotype assume prime importance [Ioannidis, 2007; NCI-NHGRI Working Group on Replication in Association Studies et al., 2007; Ioannidis et al., 2008].
This article briefly reviews the current track record of replication and consistency for GWA investigations as of late 2007. To calibrate the credibility of genome-wide-proposed associations, the article adopts a simple method using Bayesian principles and applies it illustratively to specific examples. Factors that affect the credibility estimates are also discussed and credibility is juxtaposed against existing consensus criteria for appraising the epidemiological evidence in genetic associations.

EARLY REPLICATION RECORD

Early results from GWA studies are already available for a few dozen disease phenotypes and traits as of the writing of this article (October 2007) and the pace of data production is accelerating. Using a previously described classification [Ioannidis, 2007], the early track record of these agnostic approaches includes several major successes of consistent replication, some inconsistent results, some more clear-cut failures of replication, and many tentative or inconclusive associations that still await further strengthening of the evidence and replication from independent teams.

Replication

For several diseases, GWA approaches have already led to the discovery of common gene variants that confer small effects of susceptibility and that have also been replicated quite consistently across several other independent studies. These include, but are not limited to, Crohn's disease [Duerr et al., 2006; Hampe et al., 2007; Mathew, 2008; Parkes et al., 2007; Raelson et al., 2007; Rioux et al., 2007; Wellcome Trust Case Control Consortium, 2007], age-related macular degeneration [Klein et al., 2005; Swaroop et al., 2007], type 2 diabetes mellitus [Diabetes Genetics Initiative et al., 2007; Scott et al., 2007; Zeggini et al., 2007], breast cancer [Easton et al., 2007], and myocardial infarction [Helgadottir et al., 2007; McPherson et al., 2007].
The number of markers identified and the cumulative impact of these markers on the disease risk vary considerably across diseases. Thus, the identified variants probably already explain a substantial proportion of the risk variance for age-related macular degeneration and Crohn's disease. Conversely, each of 11 identified polymorphisms for susceptibility to type 2 diabetes mellitus explains only 0.4–2% of this variability [Zeggini et al., 2007].

Inconsistency

Very promising, but apparently inconsistent results have been seen in some other cases. For example, while polymorphisms in the TRAF1/C5 locus have been implicated as risk factors for rheumatoid arthritis by some large studies (both GWA and candidate-approach) with several replication investigations [Kurreeman et al., 2007; Plenge et al., 2007], not all GWA studies find such an effect [Wellcome Trust Case Control Consortium, 2007], and the results across studies seem heterogeneous, even if the region definitely seems very interesting. For type 2 diabetes, some of the proposed susceptibility variants [Zeggini et al., 2007] also show very large inconsistency [Ioannidis et al., 2007a], as discussed below.

Non-Replication

For some proposed risk variants, much larger replication studies have more conclusively failed to verify any effect on the risk of the disease of interest.
Examples include the LTA variant for myocardial infarction that emerged from the first published GWA study [Ozaki et al., 2002; Clarke et al., 2006]; the 13 polymorphisms proposed for Parkinson's disease [Maraganore et al., 2005; Elbaz et al., 2006]; the INSIG2-neighboring variant that was proposed to be associated with obesity risk [Herbert et al., 2006; Dina et al., 2007; Loos et al., 2007; Rosskopf et al., 2007]; and even some of the gene variants (such as EXT2-ALX4 variants) proposed for modulation of risk of type 2 diabetes in the first-published GWA on this disease [Sladek et al., 2007], but not replicated with much larger GWA and replication studies [Zeggini et al., 2007].

Early Evidence

Finally, for many newly proposed associations, investigators have presented early data clearly stating that the findings are tentative and need further data before any association can be claimed. Examples include the first published GWA data for bipolar disorder [Wellcome Trust Case Control Consortium, 2007], ischemic stroke [Matarin et al., 2007], and Alzheimer's disease [Coon et al., 2007] where signals were either non-definitive or already known (e.g., linked polymorphisms in the APOE epsilon 4 polymorphism region for Alzheimer's disease).

This classification of the status of the evidence is unavoidably subject to potential misclassification. For example, one can never exclude completely the possibility that very small effects below the threshold of resolution of epidemiological methods exist for associations in the "non-replication" category. For the inconsistency category, it is likely that more data may clarify whether inconsistency is due to bias or genuine diversity underlying genuine associations. Finally, early evidence evolves continuously as new data are generated.
CALIBRATION OF CREDIBILITY: A SIMPLE BAYESIAN METHOD

I present here a simple method for calibrating the credibility (the probability that it is true) of a proposed association that has been derived from an agnostic genome-wide screening approach. The method follows standard Bayesian principles and has been previously applied to estimate Bayes factors empirically for a large sample of non-genetic epidemiological associations and associations of gene variants from the candidate gene era [Ioannidis, 2008]. The presented approach is not the only one that can be used. Some main alternatives include the false discovery rate (FDR), the false-positive report probability (FPRP) and other Bayesian extensions thereof; these measures are related among themselves [Wacholder et al., 2004; Wakefield, 2007].

In the Bayesian framework, the credibility of an association depends on the pre-study odds and the Bayes factor conferred by the study data. The pre-study odds $O_{\mathrm{pre}}$ reflect what we think about the association before running the study (or, more generically, excluding the evidence conferred by the study data). Then the data of the study correspond to a Bayes factor $B$ that modifies our prior belief. After the study, the credibility of the association is $C = O_{\mathrm{pre}}/[B(1 + O_{\mathrm{pre}}/B)]$. In this formulation $B$ expresses the support for the null hypothesis relative to the alternative, so values of $B$ below 1 increase the credibility of the association. Given that for massive testing approaches $O_{\mathrm{pre}}$ is very low, it is practically equal to the probability of any association (among the ones massively tested) being true before any data are collected, $C_0$.

An advantage of using the log-odds ratio as the metric of association is that it is reasonable to assume normality for this metric, unless small studies are involved. One could also consider modeling other measures of association, for example, the variance explained, but the distribution would typically be more complex. Use of normal likelihoods simplifies the calculations.
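The credibility formula above can be sketched numerically. The following is a minimal illustration, not the author's own code: the function name `credibility` is hypothetical, the prior odds are derived exactly as $C_0/(1 - C_0)$ rather than approximated by $C_0$, and the Bayes factor is supplied on the log10 scale. The illustrative value of -5.26 is the Table I entry for BTBD9 rs3923809 under an expected log-odds ratio of 0.049, read here as a negative log10 value because the Bayes factor favors the association when it is below 1.

```python
def credibility(prior_credibility: float, log10_bayes_factor: float) -> float:
    """Post-study credibility C = (O_pre / B) / (1 + O_pre / B).

    prior_credibility  : C0, the pre-study probability that the association is true
    log10_bayes_factor : log10 of B, where B < 1 favors the association
    """
    # Convert the prior probability to prior odds (nearly equal to C0 when C0 is tiny).
    prior_odds = prior_credibility / (1.0 - prior_credibility)
    bayes_factor = 10.0 ** log10_bayes_factor
    # Dividing by B turns the prior odds into post-study odds.
    posterior_odds = prior_odds / bayes_factor
    # Convert the odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Illustrative input: log10(B) = -5.26 (BTBD9 rs3923809 in Table I).
    for c0 in (1e-4, 1e-5, 1e-6):
        print(f"C0 = {c0:g}: credibility = {credibility(c0, -5.26):.3f}")
```

Under these assumptions the three printed values should be close to the 0.948, 0.645, and 0.154 reported for this variant in Table II.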
The prior can be specified for convenience as a "lump and smear," where a lump of likelihood is placed at the null hypothesis (no association) and the remainder is normally distributed under the alternative, centered on 0 (to allow for bidirectionality of effects) and with variance $\mathrm{var}(\theta_A)$. The observed effect size for a single polymorphism of interest is considered to be an estimate of the true effect $\theta$ with variance $\mathrm{var}(\theta)$. Thus the observed data are represented by

$$N[\theta, \mathrm{var}(\theta)]$$

and the alternative is represented by

$$N[0, \mathrm{var}(\theta_A)].$$

From this, it follows that the Bayes factor [Spiegelhalter et al., 2004] is given by

$$B = \sqrt{1 + \frac{\mathrm{var}(\theta_A)}{\mathrm{var}(\theta)}}\; \exp\!\left(\frac{-z_m^2}{2\left(1 + \mathrm{var}(\theta)/\mathrm{var}(\theta_A)\right)}\right)$$

where $z_m$ is the standardized test statistic for the null hypothesis derived from the data. Let us call $\theta_A$ the expected value of the effect under the alternative hypothesis, if there is an effect in the positive direction (relative risk >1.00). Then

$$\theta_A = \sqrt{2\pi^{-1}\,\mathrm{var}(\theta_A)}.$$

If we coin the genetic comparison for all true associations so as to express the relative risks as >1.00, then $\theta_A$ is the average expected effect. The ratio of the two variances (alternative and observed) is then $\pi\theta_A^2 / (2\,\mathrm{var}(\theta))$ and thus

$$B = \sqrt{1 + \frac{\pi\theta_A^2}{2\,\mathrm{var}(\theta)}}\; \exp\!\left(\frac{-z_m^2}{2\left(1 + 2\,\mathrm{var}(\theta)/(\pi\theta_A^2)\right)}\right)$$

that is, the Bayes factor can be estimated from the variance of the observed genetic effect, its frequentist-derived P-value (through the corresponding z-statistic in the normal distribution) and the pre-specified expected value of the effect under the alternative hypothesis, if there is a susceptibility effect. Sensitivity analyses can examine whether conclusions are affected appreciably by different prior assumptions regarding the value of $\theta_A$.
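The Bayes factor expression above can be evaluated from published summary statistics. The sketch below is illustrative only and rests on assumptions that are not stated in the source: the variance of the log-odds ratio is recovered from the reported 95% confidence interval (dividing the log-width by 2 × 1.96), the z-statistic is recovered from the two-sided P-value, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def variance_from_ci(ci_low: float, ci_high: float, z_crit: float = 1.96) -> float:
    """Variance of the log-odds ratio implied by a 95% CI on the odds ratio (assumption)."""
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return standard_error ** 2


def z_from_p(p_two_sided: float) -> float:
    """Absolute z-statistic corresponding to a two-sided P-value."""
    return -NormalDist().inv_cdf(p_two_sided / 2.0)


def log10_bayes_factor(var_observed: float, z: float, theta_a: float) -> float:
    """log10 of B = sqrt(1 + r) * exp(-z^2 / (2 (1 + 1/r))), with r = pi theta_A^2 / (2 var)."""
    var_alternative = math.pi * theta_a ** 2 / 2.0
    ratio = var_alternative / var_observed
    ln_b = 0.5 * math.log1p(ratio) - z * z / (2.0 * (1.0 + 1.0 / ratio))
    return ln_b / math.log(10.0)


if __name__ == "__main__":
    # BTBD9 rs3923809 as listed in Table I: OR 1.72 (1.50-1.98), P = 3e-14.
    var_obs = variance_from_ci(1.50, 1.98)
    z_m = z_from_p(3e-14)
    for theta_a in (0.049, 0.140, 0.262, 0.405, 0.588):
        print(f"theta_A = {theta_a}: log10(B) = {log10_bayes_factor(var_obs, z_m, theta_a):.2f}")
```

With these inputs the results should fall close to the magnitudes listed for this variant in Table I (about 5.26 for an expected log-odds ratio of 0.049 and about 11.30 for 0.262), with a negative sign because the evidence favors the association. Small differences can arise from rounding of the published confidence limits and P-value.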
Based on current evidence, the majority of associations revealed through GWA approaches tend to have small effect sizes corresponding to odds ratios of 1.10–1.40 [Ioannidis et al., 2006; Khoury et al., 2006]. Theoretical considerations also suggest that there may be more associations of very small effects, fewer with small effects and very few with larger effects [Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, 2007]. The modeling of the alternative prior fits well to this scheme. We will consider here a range of $\theta_A$ corresponding to log-odds ratios of 0.049, 0.140, 0.262, 0.405, and 0.588, that is, 1.05, 1.15, 1.30, 1.50, and 1.80 in the odds ratio scale.

EMPIRICAL DEMONSTRATION: BAYES FACTOR

Table I shows estimates of $B$ under these different assumptions for different postulated associations that have emerged from GWA studies and have had a different replication track record. To show the spectrum of possibilities, I have selected an example with a single signal giving consistent replication in several studies (periodic limb movements), one example with very extensive consistent replication of many signals and some inconsistency that may reflect interesting heterogeneity in genetic effects (type 2 diabetes mellitus), and one example where none of the most statistically significant signals were replicated with further testing (Parkinson's disease).

The association between the rs3923809 variant of the BTBD9 gene and periodic limb movements [Stefansson et al., 2007] has had a very consistent effect in the original GWA and two replication studies. The estimated Bayes factor after combining all data varies between $10^{-5.26}$ and $10^{-11.44}$.

The joint effort from three teams of GWA investigators and their replication studies reported 11 polymorphisms giving strong signals of association for type 2 diabetes mellitus [Zeggini et al., 2007].
Here I use summary estimates of genetic effects that also incorporate between-study heterogeneity in the calculations (random effects model) [Ioannidis et al., 2007a]. As shown in Table I, there are considerable differences in the estimated Bayes factors for these 11 variants. Different assumptions for the alternative do not greatly affect the Bayes factor estimate for a specific variant, with variability being less than 1 log10 unit and often less than 0.5 log10 unit for each variant. However, across the 11 variants, the range of Bayes factors is extreme, extending from $10^{-0.12}$ to $<10^{-30}$. Six variants have Bayes factors of $10^{-5}$ or less, and for another one (SLC30A8 rs13266634) additional data from subsequent studies [Sladek et al., 2007; Steinthorsdottir et al., 2007] would bring the Bayes factor into the same range. Also for CDKAL1, at least one other study has found a strong signal in the same gene, but with a different marker [Steinthorsdottir et al., 2007]. Of the remaining variants that have less impressive Bayes factors, one was found to be associated with a different, correlated phenotype (FTO, see below), one is still uncertain (rs9300039), and one (PPARG rs1801282) is further supported by several previous studies already performed in the candidate gene era, although the inconsistency of effects across studies may be a hint that the true culprit polymorphism linked with the rs1801282 marker is not yet identified.

The table also shows the results of the first two-tier GWA on Parkinson's disease, where several polymorphisms were proposed as susceptibility loci for the disease [Maraganore et al., 2005], but none of them were replicated with subsequent evidence from much larger replication studies [Elbaz et al., 2006]. Variability in the estimated Bayes factors is not very prominent, and no variant reached a Bayes factor more extreme than $10^{-3.5}$ under any assumption; most values were actually at least one log10 unit more conservative.
Thus it should not be surprising that none of these variants were eventually replicated.

EMPIRICAL DEMONSTRATION: CREDIBILITY

There is some unavoidable subjectivity in the choice of $C_0$. If we set $C_0 = 0$, then under this extremely skeptical (nihilistic) stance, no association is true and this cannot change, regardless of what results we get. If the whole approach to common genetic variants is irrelevant to genetic risk, no matter what P-values we find, the associations simply reflect error and bias. Leaving the nihilistic prior aside, it may be reasonable to assume $C_0 = 0.00001$ approximately for platforms with 300–500k markers that achieve 65–75% coverage of the genome at $r^2 > 0.8$ [Barrett and Cardon, 2006]. This means that approximately 3–5 markers among those tested are expected to reflect true associations. This is based on considerations of how many common variants are likely to underlie the genetic variability, given the typical observed effects for single variants. However, it is unavoidable that this will vary from disease to disease depending on the proportion of the risk variance due to common genetic variants, the linkage disequilibrium pattern of these variants, the exact magnitude of their effects, and the presence or not of epistatic interactions. Obviously, $C_0$ would also depend on the testing platform characteristics, including the number of markers, coverage achieved in the populations under study, and possible redundancy of the represented markers. Therefore, in most circumstances, one may need to consider a range of $C_0$ spanning at least two log10 scales (0.0001–0.000001).

Table II shows for different values of $C_0$ the post-study credibility for the associations whose Bayes factors were calculated in Table I. Estimates are provided for Bayes factors calculated for $\theta_A$ corresponding to odds ratios of 1.05, 1.30, and 1.80. As shown, for the rs3923809 variant of the BTBD9, the credibility is always very high unless there is a combination of very low pre-study credibility (0.000001) and genetic effects are expected to be very small (averaging odds ratios around 1.05). With this combination of assumptions, the observed genetic effect (odds ratio 1.72) is only 16% likely to be true. For the type 2 diabetes associations, some variants have ubiquitously very high credibility regardless of the exact assumptions, while some others have a wide range of credibility depending on the exact assumptions. Finally, for the polymorphisms that arise from the first GWA study of Parkinson's disease, credibility is always practically negligible, regardless of the background assumptions.

TABLE I. Estimated Bayes Factors for Selected Associations Proposed by GWA Studies According to Different Values of $\theta_A$ (0.049, 0.140, 0.262, 0.405, and 0.588, Corresponding to Odds Ratios of 1.05, 1.15, 1.30, 1.50, and 1.80, Respectively)

The last five columns give the estimated log10(Bayes factor) under different assumptions for $\theta_A$.

| Gene | Variant | OR (95% CI) | P-value | $\theta_A$ = 0.049 | $\theta_A$ = 0.140 | $\theta_A$ = 0.262 | $\theta_A$ = 0.405 | $\theta_A$ = 0.588 |
|---|---|---|---|---|---|---|---|---|
| Periodic limb movements in sleep | | | | | | | | |
| BTBD9 | rs3923809 | 1.72 (1.50–1.98) | 3 × 10^-14 | -5.26 | -10.35 | -11.30 | -11.44 | -11.40 |
| Type 2 diabetes mellitus | | | | | | | | |
| — | rs9300039 | 1.25 (1.04–1.50) | 0.015 | -0.31 | -0.67 | -0.63 | -0.50 | -0.36 |
| FTO | rs8050136 | 1.13 (1.02–1.25) | 0.015 | -0.56 | -0.63 | -0.45 | -0.28 | -0.12 |
| PPARG | rs1801282 | 1.16 (1.07–1.25) | 0.0003 | -1.74 | -2.04 | -1.88 | -1.71 | -1.56 |
| CDKAL1 | rs10946398 | 1.12 (1.07–1.17) | 3.2 × 10^-6 | -3.68 | -3.74 | -3.53 | -3.35 | -3.20 |
| SLC30A8 | rs13266634 | 1.12 (1.07–1.18) | 8.7 × 10^-6 | -3.26 | -3.36 | -3.15 | -2.98 | -2.82 |
| CDKN2B | rs564398 | 1.12 (1.07–1.17) | 1.2 × 10^-7 | -4.89 | -5.09 | -4.90 | -4.72 | -4.57 |
| HHEX | rs5015480–rs1111875 | 1.13 (1.08–1.17) | 5.7 × 10^-10 | -7.01 | -7.29 | -7.10 | -6.93 | -6.78 |
| KCNJ11 | rs5215 | 1.14 (1.10–1.19) | 5 × 10^-11 | -7.96 | -8.31 | -8.13 | -7.96 | -7.80 |
| IGF2BP2 | rs4402960 | 1.15 (1.10–1.19) | 6.5 × 10^-12 | -8.75 | -9.17 | -8.99 | -8.82 | -8.67 |
| CDKN2B | rs10811661 | 1.20 (1.14–1.25) | 7.8 × 10^-15 | -10.99 | -12.00 | -11.90 | -11.75 | -11.60 |
| TCF7L2 | rs7901695 | 1.37 (1.31–1.43) | 1.0 × 10^-48 | < -30 | < -30 | < -30 | < -30 | < -30 |
| Parkinson's disease | | | | | | | | |
| SEMA5A | rs7702187 | 1.74 (1.36–2.24) | 7.62 × 10^-6 | -0.78 | -2.62 | -3.34 | -3.48 | -3.45 |
| — | rs10200894 | 1.84 (1.38–2.45) | 1.70 × 10^-5 | -0.57 | -2.17 | -2.96 | -3.15 | -3.15 |
| — | rs2313982 | 2.01 (1.44–2.79) | 1.79 × 10^-5 | -0.44 | -1.92 | -2.82 | -3.10 | -3.15 |
| — | rs17329669 | 1.71 (1.33–2.21) | 2.30 × 10^-5 | -0.67 | -2.29 | -2.93 | -3.05 | -3.01 |
| — | rs7723605 | 1.78 (1.35–2.35) | 3.30 × 10^-5 | -0.56 | -2.07 | -2.75 | -2.90 | -2.89 |
| — | ss46548856 | 1.88 (1.38–2.57) | 3.65 × 10^-5 | -0.45 | -1.86 | -2.64 | -2.85 | -2.86 |
| GALNT3 | rs16851009 | 1.84 (1.36–2.49) | 4.17 × 10^-5 | -0.47 | -1.88 | -2.62 | -2.80 | -2.80 |
| PRDM2 | rs2245218 | 1.67 (1.29–2.14) | 4.61 × 10^-5 | -0.62 | -2.11 | -2.69 | -2.78 | -2.73 |
| PASD1 | rs7878232 | 1.38 (1.17–1.62) | 6.87 × 10^-5 | -1.12 | -2.44 | -2.62 | -2.56 | -2.45 |
| — | rs1509269 | 1.71 (1.30–2.26) | 9.21 × 10^-5 | -0.49 | -1.81 | -2.40 | -2.51 | -2.48 |
| — | rs11737074 | 1.50 (1.21–1.86) | 1.55 × 10^-4 | -0.68 | -1.96 | -2.30 | -2.29 | -2.21 |

For the polymorphism implicated in periodic limb movements in sleep, there is no between-study heterogeneity in the three datasets [Stefansson et al., 2007] and fixed effects synthesis coincides with random effects. For the type 2 diabetes polymorphisms, summary effects from three GWA studies and their replication efforts [Zeggini et al., 2007] are obtained by random effects synthesis [as in Ioannidis et al., 2007a]. Note that some of the polymorphisms for type 2 diabetes are not really derived from agnostic approaches, but were already suggested or known from pre-GWA studies (notably PPARG, KCNJ11, IGF2BP2, and TCF7L2); however, all 11 polymorphisms are shown here for completeness. For Parkinson's disease, genetic effects from the synthesis of data from the two tiers of the first GWA are obtained directly from the GWA publication [Maraganore et al., 2005].

IMPACT OF HETEROGENEITY

When several datasets are combined to obtain a summary genetic effect for the variant of interest, the results may sometimes differ substantially depending on whether or not the summary estimate takes into account the possibility of between-study heterogeneity.
Statistical tests for between-study heterogeneity such as the Cochran's Q statistic have negligible power when few studies are involved [Higgins and Thompson, 2002]. This is almost the rule when a new association is proposed and an effort is made to replicate it in a few other populations. Therefore, a non-significant test of heterogeneity is no proof of homogeneity. Similarly, the $I^2$ metric, which can measure the extent of variability that is beyond chance, can have very wide confidence intervals in the presence of only few datasets [Huedo-Medina et al., 2006; Ioannidis et al., 2007b]; thus an $I^2 = 0\%$ is not necessarily reassuring that there is no heterogeneity in the strength of a genetic association between different settings.

Statistical heterogeneity is only a surrogate of clinical and biological heterogeneity. The interested reader is referred elsewhere for a more detailed list of causes that may result in heterogeneity of genetic effects across different populations in the GWA setting [Ioannidis, 2007]. Besides bias, genuine diversity may be seen at the level of the genetic structure (e.g., differential linkage disequilibrium with the real culprit marker); or at the level of phenotype structure (e.g., differential correlation with some other correlated phenotype). In the agnostic setting of GWA screening of markers, it is more likely to hit linked markers rather than the true culprit markers that mediate the functional biological effects. This has implications on whether fine mapping should precede replication efforts or vice versa, and theoretical simulations suggest that this depends on a number of assumptions [Clarke et al., 2007]. Moreover, most common disease definitions have been developed based on operational (clinically functional) criteria and may reflect a very high level of underlying biological diversity. Common diseases are also highly correlated with each other [Rzhetsky et al., 2007].
We already have examples where the originally identified association that emerged from a GWA probably reflected an association with a correlated phenotype. Thus for an FTO variant, while there was large between-study heterogeneity in the strength of the association with type 2 diabetes, the associations had consistent effects when obesity and body mass index were examined as phenotypes [Frayling et al., 2007]. Searching for correlated phenotypes when heterogeneity is detected in an original association is an interesting prospect. If an association is inconsistent, one may search for a correlated phenotype where the association is consistent across populations. However, one has to be cautious that for some disease phenotypes, there exist a very large number of correlated phenotypes. This introduces a new layer of multiplicity of analyses. If many such correlated phenotypes are probed, I would argue that the new, seemingly consistent association needs to be further replicated in new, independent datasets. Otherwise, it may be spurious and may have arisen simply out of "moving the goalpost," a well-known bias in the literature of selective outcome analysis and reporting [Chan and Altman, 2005].

TABLE II. Credibility Estimates for the Associations of Table I

| Gene | Variant | $C_0$ = 0.0001, $\theta_A$ = 0.049 | $C_0$ = 0.0001, $\theta_A$ = 0.262 | $C_0$ = 0.0001, $\theta_A$ = 0.588 | $C_0$ = 0.00001, $\theta_A$ = 0.049 | $C_0$ = 0.00001, $\theta_A$ = 0.262 | $C_0$ = 0.00001, $\theta_A$ = 0.588 | $C_0$ = 0.000001, $\theta_A$ = 0.049 | $C_0$ = 0.000001, $\theta_A$ = 0.262 | $C_0$ = 0.000001, $\theta_A$ = 0.588 |
|---|---|---|---|---|---|---|---|---|---|---|
| Periodic limb movements in sleep | | | | | | | | | | |
| BTBD9 | rs3923809 | 0.948 | 1.000 | 1.000 | 0.645 | 1.000 | 1.000 | 0.154 | 1.000 | 1.000 |
| Type 2 diabetes mellitus | | | | | | | | | | |
| — | rs9300039 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| FTO | rs8050136 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| PPARG | rs1801282 | 0.005 | 0.007 | 0.004 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 |
| CDKAL1 | rs10946398 | 0.325 | 0.252 | 0.136 | 0.046 | 0.033 | 0.015 | 0.005 | 0.003 | 0.002 |
| SLC30A8 | rs13266634 | 0.154 | 0.124 | 0.062 | 0.018 | 0.014 | 0.007 | 0.002 | 0.001 | 0.001 |
| CDKN2B | rs564398 | 0.886 | 0.887 | 0.788 | 0.437 | 0.440 | 0.270 | 0.072 | 0.073 | 0.036 |
| HHEX | rs5015480–rs1111875 | 0.999 | 0.999 | 0.998 | 0.990 | 0.992 | 0.984 | 0.911 | 0.927 | 0.857 |
| KCNJ11 | rs5215 | 1.000 | 1.000 | 1.000 | 0.999 | 0.999 | 0.998 | 0.989 | 0.993 | 0.985 |
| IGF2BP2 | rs4402960 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.998 | 0.999 | 0.998 |
| CDKN2B | rs10811661 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| TCF7L2 | rs7901695 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Parkinson's disease | | | | | | | | | | |
| SEMA5A | rs7702187 | 0.001 | 0.179 | 0.222 | 0.000 | 0.021 | 0.028 | 0.000 | 0.002 | 0.003 |
| — | rs10200894 | 0.000 | 0.084 | 0.125 | 0.000 | 0.009 | 0.014 | 0.000 | 0.001 | 0.001 |
| — | rs2313982 | 0.000 | 0.062 | 0.123 | 0.000 | 0.007 | 0.014 | 0.000 | 0.001 | 0.001 |
| — | rs17329669 | 0.000 | 0.079 | 0.094 | 0.000 | 0.008 | 0.010 | 0.000 | 0.001 | 0.001 |
| — | rs7723605 | 0.000 | 0.054 | 0.071 | 0.000 | 0.006 | 0.008 | 0.000 | 0.001 | 0.001 |
| — | ss46548856 | 0.000 | 0.042 | 0.068 | 0.000 | 0.004 | 0.007 | 0.000 | 0.000 | 0.001 |
| GALNT3 | rs16851009 | 0.000 | 0.040 | 0.060 | 0.000 | 0.004 | 0.006 | 0.000 | 0.000 | 0.001 |
| PRDM2 | rs2245218 | 0.000 | 0.046 | 0.051 | 0.000 | 0.005 | 0.005 | 0.000 | 0.000 | 0.001 |
| PASD1 | rs7878232 | 0.001 | 0.040 | 0.027 | 0.000 | 0.004 | 0.003 | 0.000 | 0.000 | 0.000 |
| — | rs1509269 | 0.000 | 0.024 | 0.029 | 0.000 | 0.003 | 0.003 | 0.000 | 0.000 | 0.000 |
| — | rs11737074 | 0.000 | 0.019 | 0.016 | 0.000 | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 |

The value 0.000 corresponds to estimated credibility <0.001. Only the data of Table I are considered here, but some associations have additional evidence from other studies as well (see text for details).

Heterogeneity should not be ignored in the calculations of summary genetic effects and the respective credibility. Unfortunately, it is common practice to combine data from GWA and replication investigations with fixed effects approaches that do not take any such heterogeneity into account. The estimated credibility levels can change a lot if heterogeneity is not factored into the calculations. Table III shows such examples for five variants that were implicated to be associated with type 2 diabetes, but on closer inspection they have modest or larger heterogeneity ($I^2$ point estimates >25%).
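The contrast between fixed and random effects synthesis can be illustrated with a small sketch. This is not the procedure used in the source, which states only that a random effects model was applied: the DerSimonian-Laird estimator is assumed here, the function name `pooled_effects` is hypothetical, and the three study estimates are invented for illustration rather than taken from any of the cited GWA studies.

```python
import math


def pooled_effects(log_ors, variances):
    """Fixed- and random-effects summaries with Cochran's Q and I^2.

    Assumes the DerSimonian-Laird moment estimator for the between-study variance.
    Returns (estimate, variance) pairs for both models.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effects summary.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variability that is beyond chance (truncated at 0).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Between-study variance tau^2 (truncated at 0).
    scale = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / scale)
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, log_ors)) / sum_re
    return {"fixed": (fixed, 1.0 / sum_w), "random": (random, 1.0 / sum_re),
            "q": q, "i2": i2, "tau2": tau2}


if __name__ == "__main__":
    # Hypothetical log-odds ratios and variances for three studies (illustration only).
    result = pooled_effects([0.30, 0.10, 0.05], [0.002, 0.002, 0.002])
    for model in ("fixed", "random"):
        estimate, variance = result[model]
        print(f"{model}: log OR = {estimate:.3f}, z = {estimate / math.sqrt(variance):.2f}")
    print(f"Q = {result['q']:.1f}, I^2 = {100 * result['i2']:.0f}%")
```

In this invented example the summary log-odds ratio is the same under both models, but the random effects variance is much larger, so the z-statistic (and hence any Bayes factor computed from it) is far less extreme once heterogeneity is taken into account.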
The Bayes factor is far more impressive when heterogeneity is ignored, but the assumption of homogeneous effect sizes misses the opportunity to use the heterogeneity in an informative fashion and to pursue correlated phenotypes, linked polymorphisms and other reasons that may underlie the diversity of effects. In the presence of unexplained heterogeneity that exceeds a certain threshold, it has been shown that associations become non-replicable (the power to replicate them cannot exceed a certain level), regardless of how large the conducted studies are [Moonesinghe et al., 2008].

TABLE III. Bayes Factors Ignoring Heterogeneity for Associations With Estimated $I^2$ > 25% (Assuming $\theta_A$ = 0.140, i.e., OR = 1.15)

| Gene | Polymorphism | $I^2$ (95% CI) | Random effects, P-value | Random effects, log10(Bayes factor) | Fixed effects OR (95% CI) | Fixed effects, P-value | Fixed effects, log10(Bayes factor) |
|---|---|---|---|---|---|---|---|
| — | rs9300039 | 75% (0–90) | 0.015 | -0.67 | 1.25 (1.15–1.37) | 4.3 × 10^-7 | -4.60 |
| FTO | rs8050136 | 77% (0–91) | 0.015 | -0.63 | 1.17 (1.12–1.22) | 1.3 × 10^-12 | -9.85 |
| PPARG | rs1801282 | 47% (0–84) | 0.0003 | -2.04 | 1.14 (1.08–1.20) | 1.7 × 10^-6 | -4.04 |
| CDKAL1 | rs10946398 | 46% (0–84) | 3.2 × 10^-6 | -3.74 | 1.12 (1.08–1.16) | 4.1 × 10^-11 | -8.37 |
| SLC30A8 | rs13266634 | 32% (0–81) | 8.7 × 10^-6 | -3.36 | 1.12 (1.07–1.16) | 5.3 × 10^-8 | -5.41 |

Based on data from three GWA studies, as synthesized by Zeggini et al. [2007] and Ioannidis et al. [2007a]. For PPARG, the association was already proposed and replicated in several studies before the GWA studies. Moreover, additional data have accumulated on several of these polymorphisms. The Bayes factors are calculated strictly based on the data from the three GWA investigations for illustrative purposes.

IMPACT OF BIAS

Following a previous formulation [Ioannidis, 2005], I define bias as any reason, beyond random chance, that may yield a nominally statistically significant association when no such association exists. An extensive number of checklists exist for biases in observational epidemiology [Sanderson et al., 2007].
For genetic associations, the main considerations [Ioannidis et al., 2008] include biases in genotype and phenotype measurements, confounding, and selective reporting; these have been discussed extensively in the genetic epidemiology literature since the candidate gene era [Cordell and Clayton, 2005; Hattersley and McCarthy, 2005; Newton-Cheh and Hirschhorn, 2005; Pan et al., 2005; Pompanon et al., 2005; Calnan et al., 2006; Wang et al., 2006]. Here I will only mention briefly a few issues that may be more relevant to GWA efforts. Differential (according to phenotype) genotyping errors are a special concern when case and control samples have been collected, saved, or processed separately. Unless special care is taken to make the whole process similar for case and control samples, spurious systematic differences may arise. Differential (according to genotype) misclassification is less likely to occur at the phenotype level. Confounding due to population stratification remains a threat. Even though the available evidence suggests that stratification is not a big concern for carefully geographically/ethnically defined populations (e.g., the UK population in the Wellcome Trust Case Control Consortium), strict control for stratification with appropriate methods such as principal component analysis is indicated [Price et al., 2006]. Even with negligible average stratification, a few specific emerging associations per scan may still reflect stratification. Finally, in the GWA setting, variants of selective reporting bias may arise, for example, selective presentation of only the most promising, most statistically significant results for specific variants, analyses, and choices of genetic model or adjustment. The importance of making all GWA databases publicly available with detailed, non-selected information cannot be overstressed.
Efforts such as GAIN [The GAIN Collaborative Research Group et al., 2007] are critical, but it may be difficult to make public availability of datasets a ubiquitous mandate worldwide. For replication studies, one may still face some of the same selective reporting forces that existed in the candidate gene era. Collateral damage from the otherwise very useful advent of large-scale evidence with powerful consortia [Ioannidis et al., 2005; Seminara et al., 2007] may ensue, if the results of the current generation of studies with massive testing and large sample sizes are considered definitive and smaller replication studies have difficulty publishing results that disagree with, or are inconclusive against, findings from powerful consortia. Independent replication needs to be safeguarded in an era of global research networking. One has to scrutinize each proposed association that stems from GWA or any other study very carefully for the presence of any visible biases. When bias is known or is revealed, its effect can be properly factored in. However, bias often remains latent. One may simulate the impact of different amounts of latent bias on the credibility of the associations. Let us consider that bias can cause a proportion $x$ of variants to pass a given P-value threshold for a specific phenotype association. If $k$ variants have been tested, then the expected number of variants that pass the threshold due to bias is $xk$. If $n$ variants have passed this threshold, then $xk$ out of $n$ are expected to reflect bias. By default, we do not know which these specific ‘‘biased’’ variants are. However, we can correct the credibility of each of the $n$ variants for bias on average, multiplying by $(n - xk)/n$. For variants with uncorrected credibility estimates $C$ exceeding 50%, the corrected-for-bias credibility will remain above 50% if $xk < (C - 0.5)\,n/C$.
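The average bias correction can be sketched in a few lines. The function names `bias_corrected_credibility` and `max_tolerable_bias` are hypothetical and chosen only for this illustration; the arithmetic is the multiplication by $(n - xk)/n$ and the resulting bound on $xk$. Applied to the BTBD9 example in the text (uncorrected credibility 0.645, a single variant at that level of significance), the bound comes out at about 0.22.

```python
def bias_corrected_credibility(c, n, xk):
    """Average credibility after allowing for latent bias.

    c  : uncorrected credibility of the association
    n  : number of variants that passed the P-value threshold
    xk : expected number of variants passing the threshold due to bias alone
    """
    return c * (n - xk) / n

def max_tolerable_bias(c, n, threshold=0.5):
    """Largest xk for which the corrected credibility stays at the threshold.

    Solves c * (n - xk) / n = threshold for xk.
    """
    return (c - threshold) * n / c

if __name__ == "__main__":
    limit = max_tolerable_bias(0.645, n=1)
    print(f"xk must stay below {limit:.2f}")
    print(f"credibility at that limit: {bias_corrected_credibility(0.645, 1, limit):.2f}")
```

Because the correction is an average over all $n$ variants that passed the threshold, it says nothing about which individual variants are the biased ones.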
For example, assuming $C_0 = 0.00001$, for periodic limb movements in sleep we have already estimated the credibility of the association with the rs3923809 polymorphism of BTBD9 to be 0.645, if the average genetic effects are expected to have odds ratio of 1.05; and 1.000, if the average genetic effects are expected to have odds ratio of 1.30–1.80. Here $n = 1$ (only this variant had such extreme statistical significance) and to retain credibility of at least 50% for this association, we need $xk < 0.22$ or $xk < 1$, respectively. This means that, if the average genetic effects are expected to have an odds ratio of 1.05, then bias would not be able to decrease the credibility of this association below 50%, unless it can produce associations with $P = 10^{-14}$ at least once in every five diseases screened with similar GWA approaches with half a million markers. If the average genetic effects are expected to have an odds ratio of 1.30–1.80, then bias would not be able to decrease the credibility of this association below 50%, unless it can produce associations with $P = 10^{-14}$ at least once in every single disease screened with a similar GWA approach with half a million markers. One may also calculate the corrected-for-bias credibility allowing $xk$ to vary according to the threshold of statistical significance, assuming it is more difficult for bias alone to produce lower levels of statistical significance. However, even for extreme levels of statistical significance, bias cannot be completely excluded.

P-VALUES VERSUS CREDIBILITY

P-values correlate with the Bayes factors and derived credibility estimates, but correlation is not perfect. Big versus small studies and different assumptions about the prior and the extent of bias can affect this correlation. P-values only deal with the possibility of random chance being responsible for a false refutation of the null hypothesis under conditions of perfection (no bias).
They do not account for the possible distribution of genetic effects and they offer no reassurance against bias. For example, consistent replication across many studies will decrease the P-value of the association in an overarching meta-analysis of all data. The credibility will also often increase with such consistent replication. However, credibility may not increase in parallel with decreasing P-values, if the emerging effects are considered atypical in magnitude. For example, accumulation of very large sample sizes from many studies may create highly statistically significant results for associations with very small effects, for example, odds ratio 1.02. Depending on the prior, such effects may be considered more compatible with the null rather than the alternative hypothesis. Moreover, credibility will decrease, even with decreasing P-values, if replication is consistent simply because the same biases occur repeatedly across studies. The Bayesian approach is thus not just a test of the data, but can also incorporate explicitly our assumptions about the exact genetic architecture (e.g., relative balance of rare variation, heterogeneity, epistasis, common variants with very small effects) and the validity of our models. One can evaluate the credibility of an association under different assumptions about the genetic architecture, models and biases. A simple Excel calculator of Bayes factor and credibility estimates is available at www.dhe.med.uoi.gr/software.htm.

CONCLUDING COMMENTS

Interim guidelines, the Venice criteria, have been proposed recently on assessing the cumulative evidence on genetic associations [Ioannidis et al., 2008]. They focus on evaluation of the amount of evidence, consistency of replication and protection from bias. These three axes are used in a semi-quantitative approach and they fit the credibility ranking process described above.
For more information on their operationalization, the reader is referred to the guidelines publication [Ioannidis et al., 2008]. The Bayesian approach that was described here is one of several variants that can be adopted to calibrate credibility. It is simple to use and hopefully can provide some useful insights. One should also caution that, while the illustrative examples presented here span a very wide range of credibility estimates and of supporting epidemiological evidence, calibration of credibility perhaps makes most sense only for associations that have a large amount of data, consistent replication without demonstrable between-study heterogeneity in the datasets where the association has been probed, and evidence that seems adequately protected from obvious sources of bias (AAA categorization in the Venice criteria). For small studies, those without consistent replication, and those with clear presence of bias, the credibility is very low by default. Finally, any effort for calibrating the credibility of associations derived from GWA investigations needs to be corroborated eventually by the accumulating evidence on the longer-term replication history of GWA-proposed associations. Additional lines of evidence, such as various sources of experimental information or other data pointing to biological plausibility, may be useful to incorporate in the credibility estimation down the road. However, we still have a lot to learn about the interface of epidemiological and biological credibility, especially for otherwise agnostic associations. The important question is: In the evolution of evidence over time, do associations that are graded as having ‘‘strong evidence’’ (AAA Venice categorization) and high credibility survive upon further replication testing?
This may not necessarily be a perfect gold standard, but we have to accept that a meticulous examination of the accumulated evidence-to-date is the best we can achieve. In this regard, genetic epidemiology offers a situation where continuation of replication ad infinitum with continuous accumulation of evidence and clarification of associations is not necessarily bad or unjustified, if the required resources can be met.

REFERENCES

Barrett JC, Cardon LR. 2006. Evaluating coverage of genome-wide association studies. Nat Genet 38:659–662. Beckmann JS, Estivill X, Antonarakis SE. 2007. Copy number variants and genetic traits: Closer to the resolution of phenotypic to genotypic variability. Nat Rev Genet 8:639–646. Calnan M, Smith GD, Sterne JA. 2006. The publication process itself was the major cause of publication bias in genetic epidemiology. J Clin Epidemiol 59:1312–1318. Chan AW, Altman DG. 2005. Identifying outcome reporting bias in randomised trials on PubMed: Review of publications and survey of authors. BMJ 330:753. Clarke R, Xu P, Bennett D, Lewington S, Zondervan K, Parish S, Palmer A, Clark S, Cardon L, Peto R, Lathrop M, Collins R. 2006. Lymphotoxin-alpha gene and risk of myocardial infarction in 6,928 cases and 2,712 controls in the ISIS case-control study. PLoS Genet 2:e107. Clarke GM, Carter KW, Palmer LJ, Morris AP, Cardon LR. 2007. Fine mapping versus replication in whole-genome association studies. Am J Hum Genet 81:995–1005.
Cordell HJ, Clayton DG. 2005. Genetic association studies. Lancet 366:1121–1131. Coon KD, Myers AJ, Craig DW, Webster JA, Pearson JV, Lince DH, Zismann VL, Beach TG, Leung D, Bryden L, Halperin RF, Marlowe L, Kaleem M, Walker DG, Ravid R, Heward CB, Rogers J, Papassotiropoulos A, Reiman EM, Hardy J, Stephan DA. 2007. A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer’s disease. J Clin Psychiatry 68:613–618. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research, Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D, Almgren P, Florez JC, Meyer J, Ardlie K, Bengtsson Bostrom K, Isomaa B, Lettre G, Lindblad U, Lyon HN, Melander O, Newton-Cheh C, Nilsson P, Orho-Melander M, Rastam L, Speliotes EK, Taskinen MR, Tuomi T, Guiducci C, Berglund A, Carlson J, Gianniny L, Hackett R, Hall L, Holmkvist J, Laurila E, Sjogren M, Sterner M, Surti A, Svensson M, Svensson M, Tewhey R, Blumenstiel B, Parkin M, Defelice M, Barry R, Brodeur W, Camarata J, Chia N, Fava M, Gibbons J, Handsaker B, Healy C, Nguyen K, Gates C, Sougnez C, Gage D, Nizzari M, Gabriel SB, Chirn GW, Ma Q, Parikh H, Richardson D, Ricke D, Purcell S. 2007. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316:1331–1336. Dina C, Meyre D, Samson C, Tichet J, Marre M, Jouret B, Charles MA, Balkau B, Froguel P. 2007. Comment on ‘‘A common genetic variant is associated with adult and childhood obesity’’. Science 315:187. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, Dassopoulos T, Bitton A, Yang H, Targan S, Datta LW, Kistner EO, Schumm LP, Lee AT, Gregersen PK, Barmada MM, Rotter JI, Nicolae DL, Cho JH. 2006. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314:1461–1463.
Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R; SEARCH collaborators, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen CY, Wu PE, Wang HC, Eccles D, Evans DG, Peto J, Fletcher O, Johnson N, Seal S, Stratton MR, Rahman N, Chenevix-Trench G, Bojesen SE, Nordestgaard BG, Axelsson CK, Garcia-Closas M, Brinton L, Chanock S, Lissowska J, Peplonska B, Nevanlinna H, Fagerholm R, Eerola H, Kang D, Yoo KY, Noh DY, Ahn SH, Hunter DJ, Hankinson SE, Cox DG, Hall P, Wedren S, Liu J, Low YL, Bogdanova N, Schurmann P, Dork T, Tollenaar RA, Jacobi CE, Devilee P, Klijn JG, Sigurdson AJ, Doody MM, Alexander BH, Zhang J, Cox A, Brock IW, MacPherson G, Reed MW, Couch FJ, Goode EL, Olson JE, Meijers-Heijboer H, van den Ouweland A, Uitterlinden A, Rivadeneira F, Milne RL, Ribas G, Gonzalez-Neira A, Benitez J, Hopper JL, McCredie M, Southey M, Giles GG, Schroen C, Justenhoven C, Brauch H, Hamann U, Ko YD, Spurdle AB, Beesley J, Chen X, kConFab; AOCS Management Group, Mannermaa A, Kosma VM, Kataja V, Hartikainen J, Day NE, Cox DR, Ponder BA. 2007. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447:1087–1093. Elbaz A, Nelson LM, Payami H, Ioannidis JP, Fiske BK, Annesi G, Carmine Belin A, Factor SA, Ferrarese C, Hadjigeorgiou GM, Higgins DS, Kawakami H, Kruger R, Marder KS, Mayeux RP, Mellick GD, Nutt JG, Ritz B, Samii A, Tanner CM, Van Broeckhoven C, Van Den Eeden SK, Wirdefeldt K, Zabetian CP, Dehem M, Montimurro JS, Southwick A, Myers RM, Trikalinos TA. 2006. Lack of replication of thirteen single-nucleotide polymorphisms implicated in Parkinson’s disease: A large-scale international study. Lancet Neurol 5:917–923.
Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW, Shields B, Harries LW, Barrett JC, Ellard S, Groves CJ, Knight B, Patch AM, Ness AR, Ebrahim S, Lawlor DA, Ring SM, Ben-Shlomo Y, Jarvelin MR, Sovio U, Bennett AJ, Melzer D, Ferrucci L, Loos RJ, Barroso I, Wareham NJ, Karpe F, Owen KR, Cardon LR, Walker M, Hitman GA, Palmer CN, Doney AS, Morris AD, Smith GD, Hattersley AT, McCarthy MI. 2007. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316:889–894. Hampe J, Franke A, Rosenstiel P, Till A, Teuber M, Huse K, Albrecht M, Mayr G, De La Vega FM, Briggs J, Günther S, Prescott NJ, Onnie CM, Häsler R, Sipos B, Fölsch UR, Lengauer T, Platzer M, Mathew CG, Krawczak M, Schreiber S. 2007. A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nat Genet 39:207–211. Hattersley AT, McCarthy MI. 2005. What makes a good genetic association study? Lancet 366:1315–1323. Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, Jonasdottir A, Jonasdottir A, Sigurdsson A, Baker A, Palsson A, Masson G, Gudbjartsson DF, Magnusson KP, Andersen K, Levey AI, Backman VM, Matthiasdottir S, Jonsdottir T, Palsson S, Einarsdottir H, Gunnarsdottir S, Gylfason A, Vaccarino V, Hooper WC, Reilly MP, Granger CB, Austin H, Rader DJ, Shah SH, Quyyumi AA, Gulcher JR, Thorgeirsson G, Thorsteinsdottir U, Kong A, Stefansson K. 2007. A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science 316:1491–1493.
Herbert A, Gerry NP, McQueen MB, Heid IM, Pfeufer A, Illig T, Wichmann HE, Meitinger T, Hunter D, Hu FB, Colditz G, Hinney A, Hebebrand J, Koberwitz K, Zhu X, Cooper R, Ardlie K, Lyon H, Hirschhorn JN, Laird NM, Lenburg ME, Lange C, Christman MF. 2006. A common genetic variant is associated with adult and childhood obesity. Science 312:279–283. Higgins JP, Thompson SG. 2002. Quantifying heterogeneity in a meta-analysis. Stat Med 21:1539–1558. Hirschhorn JN, Daly MJ. 2005. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6:95–108. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. 2002. A comprehensive review of genetic association studies. Genet Med 4:45–61. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J. 2006. Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Methods 11:193–206. Mathew CG. 2008. Links to the pathogenesis of Crohn disease provided by genome-wide association scans. Nat Rev Genet 9:9–14. McCarroll SA, Altshuler DM. 2007. Copy-number variation and association studies of human disease. Nat Genet 39(7 Suppl):S37–S42. McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, Cox DR, Hinds DA, Pennacchio LA, Tybjaerg-Hansen A, Folsom AR, Boerwinkle E, Hobbs HH, Cohen JC. 2007. A common allele on chromosome 9 associated with coronary heart disease. Science 316:1488–1491.
Mooneshinghe R, Khoury MJ, Liu T, Ioannidis JP. 2008. Required sample size and nonreplicability thresholds for heterogeneous genetic associations. Proc Natl Acad Sci USA 105:617–622. Ioannidis JP. 2008. Effect of formal statistical significance on the credibility of observational associations. Am J Epidemiol (in press). NCI-NHGRI Working Group on Replication in Association Studies, Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, Thomas G, Hirschhorn JN, Abecasis G, Altshuler D, Bailey-Wilson JE, Brooks LD, Cardon LR, Daly M, Donnelly P, Fraumeni JF Jr, Freimer NB, Gerhard DS, Gunter C, Guttmacher AE, Guyer MS, Harris EL, Hoh J, Hoover R, Kong CA, Merikangas KR, Morton CC, Palmer LJ, Phimister EG, Rice JP, Roberts J, Rotimi C, Tucker MA, Vogan KJ, Wacholder S, Wijsman EM, Winn DM, Collins FS, 2007. Replicating genotypephenotype associations. Nature 447:655–660. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. 2001. Replication validity of genetic association studies. Nat Genet 29:306–309. Newton-Cheh C, Hirschhorn JN. 2005. Genetic association studies of complex traits: Design and analysis issues. Mutat Res 573:54–69. Ioannidis JP, Trikalinos TA, Ntzani EE, Contopoulos-Ioannidis DG. 2003. Genetic associations in large versus small studies: An empirical assessment. Lancet 361:567–571. Ozaki K, Ohnishi Y, Iida A, Sekine A, Yamada R, Tsunoda T, Sato H, Sato H, Hori M, Nakamura Y, Tanaka T. 2002. Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet 32:650–654. Ioannidis JP. 2005. Why most published research findings are false. PLoS Med 2:e124. Ioannidis JP. 2007. Non-replication and inconsistency in the genome-wide association setting. Hum Hered 64:203–213. 
Ioannidis JP, Bernstein J, Boffetta P, Danesh J, Dolan S, Hartge P, Hunter D, Inskip P, Jarvelin MR, Little J, Maraganore DM, Bishop JA, O’Brien TR, Petersen G, Riboli E, Seminara D, Taioli E, Uitterlinden AG, Vineis P, Winn DM, Salanti G, Higgins JP, Khoury MJ. 2005. A network of investigator networks in human genome epidemiology. Am J Epidemiol 162:302–304. Ioannidis JP, Trikalinos TA, Khoury MJ. 2006. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am J Epidemiol 164:609–614. Ioannidis JP, Patsopoulos NA, Evangelou E. 2007a. Heterogeneity in metaanalyses of genome-wide association investigations. PLoS ONE 2:e841. Ioannidis JP, Patsopoulos NA, Evangelou E. 2007b. Uncertainty in heterogeneity estimates in meta-analysis. BMJ 335:914–916. Ioannidis JP, Boffetta P, Little J, O’brien TR, Uitterlinden AG, Vineis P, Balding DJ, Chokkalingam A, Dolan SM, Flanders WD, Higgins JP, McCarthy MI, McDermott DH, Page GP, Rebbeck TR, Seminara D, Khoury MJ. 2008. Assessment of cumulative evidence on genetic associations: Interim guidelines. Int J Epidemiol 37:120–132. Khoury MJ, Little J, Gwinn M, Ioannidis JP. 2006. On the synthesis and interpretation of consistent but weak gene-disease associations in the era of genome-wide association studies. Int J Epidemiol 36:439–445. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable C, Hoh J. 2005. Complement factor H polymorphism in agerelated macular degeneration. Science 308:385–389. Kurreeman FA, Padyukov L, Marques RB, Schrodi SJ, Seddighzadeh M, Stoeken-Rijsbergen G, van der Helm-van Mil AH, Allaart CF, Verduyn W, Houwing-Duistermaat J, Alfredsson L, Begovich AB, Klareskog L, Huizinga TW, Toes RE. 2007. A candidate gene approach identifies the TRAF1/C5 region as a risk factor for rheumatoid arthritis. PLoS Med 4:e278. 
Loos RJ, Barroso I, O’rahilly S, Wareham NJ. 2007. Comment on ‘‘A common genetic variant is associated with adult and childhood obesity’’. Science 315:187. Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ, Rocca WA, Pant PV, Frazer KA, Cox DR, Ballinger DG. 2005. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet 77:685–693. Marchini J, Donnelly P, Cardon LR. 2005. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet 37:413–417. Matarin M, Brown WM, Scholz S, Simon-Sanchez J, Fung HC, Hernandez D, Gibbs JR, De Vrieze FW, Crews C, Britton A, Langefeld CD, Brott TG, Brown RD Jr, Worrall BB, Frankel M, Silliman S, Case LD, Singleton A, Hardy JA, Rich SS, Meschia JF. 2007. A genome-wide genotyping study in patients with ischaemic stroke: Initial analysis and data release. Lancet Neurol 6:414–420. Pan Z, Trikalinos TA, Kavvoura FK, Lau J, Ioannidis JP. 2005. Local literature bias in genetic epidemiology: An empirical evaluation of the Chinese literature. PLoS Med 2:e334. Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, Roberts RG, Nimmo ER, Cummings FR, Soars D, Drummond H, Lees CW, Khawaja SA, Bagnall R, Burke DA, Todhunter CE, Ahmad T, Onnie CM, McArdle W, Strachan D, Bethel G, Bryan C, Lewis CM, Deloukas P, Forbes A, Sanderson J, Jewell DP, Satsangi J, Mansfield JC; Wellcome Trust Case Control Consortium, Cardon L, Mathew CG, 2007. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility. Nat Genet 39:830–832. Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, Liew A, Khalili H, Chandrasekaran A, Davies LR, Li W, Tan AK, Bonnard C, Ong RT, Thalamuthu A, Pettersson S, Liu C, Tian C, Chen WV, Carulli JP, Beckman EM, Altshuler D, Alfredsson L, Criswell LA, Amos CI, Seldin MF, Kastner DL, Klareskog L, Gregersen PK. 2007. 
TRAF1-C5 as a risk locus for rheumatoid arthritis—A genomewide study. N Engl J Med 357: 1199–1209. Pompanon F, Bonin A, Bellemain E, Taberlet P. 2005. Genotyping errors: Causes, consequences and solutions. Nat Rev Genet 6:847–859. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909. Raelson JV, Little RD, Ruether A, Fournier H, Paquin B, Van Eerdewegh P, Bradley WE, Croteau P, Nguyen-Huu Q, Segal J, Debrus S, Allard R, Rosenstiel P, Franke A, Jacobs G, Nikolaus S, Vidal JM, Szego P, Laplante N, Clark HF, Paulussen RJ, Hooper JW, Keith TP, Belouchi A, Schreiber S. 2007. Genome-wide association study for Crohn’s disease in the Quebec Founder Population identifies multiple validated disease loci. Proc Natl Acad Sci USA 104:14747–14752. Rioux JD, Xavier RJ, Taylor KD, Silverberg MS, Goyette P, Huett A, Green T, Kuballa P, Barmada MM, Datta LW, Shugart YY, Griffiths AM, Targan SR, Ippoliti AF, Bernard EJ, Mei L, Nicolae DL, Regueiro M, Schumm LP, Steinhart AH, Rotter JI, Duerr RH, Cho JH, Daly MJ, Brant SR. 2007. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat Genet 39:596–604. Rosskopf D, Bornhorst A, Rimmbach C, Schwahn C, Kayser A, Kruger A, Tessmann G, Geissler I, Kroemer HK, Volzke H. 2007. Comment on ‘‘A common genetic variant is associated with adult and childhood obesity’’. Science 315:187. Rzhetsky A, Wajngurt D, Park N, Zheng T. 2007. Probing genetic overlap among complex human phenotypes. Proc Natl Acad Sci USA 104:11694– 11699. Sanderson S, Tatt ID, Higgins JP. 2007. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: A systematic review and annotated bibliography. Int J Epidemiol 36: 666–676. 
Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M. 2007. A genomewide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316:1341–1345. Seminara D, Khoury MJ, O’Brien TR, Manolio T, Gwinn ML, Little J, Higgins JP, Bernstein JL, Boffetta P, Bondy M, Bray MS, Brenchley PE, Buffler PA, Casas JP, Chokkalingam AP, Danesh J, Davey Smith G, Dolan S, Duncan R, Gruis NA, Hashibe M, Hunter D, Jarvelin MR, Malmer B, Maraganore DM, Newton-Bishop JA, Riboli E, Salanti G, Taioli E, Timpson N, Uitterlinden AG, Vineis P, Wareham N, Winn DM, Zimmern R, Ioannidis JP. Human Genome Epidemiology Network, the Network of Investigator Networks. 2007. The emergence of networks in human genome epidemiology: Challenges and opportunities. Epidemiology 18:1–8. Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, Boutin P, Vincent D, Belisle A, Hadjadj S, Balkau B, Heude B, Charpentier G, Hudson TJ, Montpetit A, Pshezhetsky AV, Prentki M, Posner BI, Balding DJ, Meyre D, Polychronakos C, Froguel P. 2007. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445:881–885. Swaroop A, Branham KE, Chen W, Abecasis G. 2007. Genetic susceptibility to age-related macular degeneration: A paradigm for dissecting complex disease traits. Hum Mol Genet 16(Spec No. 2):R174–R182.
The GAIN Collaborative Research Group, Manolio TA, Rodriguez LL, Brooks L, Abecasis G; The Collaborative Association Study of Psoriasis, Ballinger D, Daly M, Donnelly P, Faraone SV; The International MultiCenter ADHD Genetics Project, Frazer K, Gabriel S, Gejman P; The Molecular Genetics of Schizophrenia Collaboration, Guttmacher A, Harris EL, Insel T, Kelsoe JR; The Bipolar Genome Study, Lander E, McCowin N, Mailman MD, Nabel E, Ostell J, Pugh E, Sherry S, Sullivan PF; The Major Depression Stage 1 Genomewide Association in Population-Based Samples Study, Thompson JF, Warram J; The Genetics of Kidneys in Diabetes (GoKinD) Study, Wholley D, Milos PM, Collins FS, 2007. New models of collaboration in genome-wide association studies: The Genetic Association Information Network. Nat Genet 39:1045–1051. Thomas DC. 2006. Are we ready for genome-wide association studies? Cancer Epidemiol Biomarkers Prev 15:595–598. Thomas DC, Haile RW, Duggan D. 2005. Recent developments in genomewide association scans: A workshop summary and review. Am J Hum Genet 77:337–345. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. 2004. Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst 96:434–442. Spiegelhalter DJ, Abrams KR, Myles JP. 2004. Bayesian approaches to clinical trials and health-care evaluation. Wiley: Chichester. Wakefield J. 2007. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet 81:208–227. Stefansson H, Rye DB, Hicks A, Petursson H, Ingason A, Thorgeirsson TE, Palsson S, Sigmundsson T, Sigurdsson AP, Eiriksdottir I, Soebech E, Bliwise D, Beck JM, Rosen A, Waddy S, Trotti LM, Iranzo A, Thambisetty M, Hardarson GA, Kristjansson K, Gudmundsson LJ, Thorsteinsdottir U, Kong A, Gulcher JR, Gudbjartsson D, Stefansson K. 2007. A genetic risk factor for periodic limb movements in sleep. N Engl J Med 357:639–647. 
Wang WY, Barratt BJ, Clayton DG, Todd JA. 2005. Genome-wide association studies: Theoretical and practical concerns. Nat Rev Genet 6:109–118. Steinthorsdottir V, Thorleifsson G, Reynisdottir I, Benediktsson R, Jonsdottir T, Walters GB, Styrkarsdottir U, Gretarsdottir S, Emilsson V, Ghosh S, Baker A, Snorradottir S, Bjarnason H, Ng MC, Hansen T, Bagger Y, Wilensky RL, Reilly MP, Adeyemo A, Chen Y, Zhou J, Gudnason V, Chen G, Huang H, Lashley K, Doumatey A, So WY, Ma RC, Andersen G, Borch-Johnsen K, Jorgensen T, van Vliet-Ostaptchouk JV, Hofker MH, Wijmenga C, Christiansen C, Rader DJ, Rotimi C, Gurney M, Chan JC, Pedersen O, Sigurdsson G, Gulcher JR, Thorsteinsdottir U, Kong A, Stefansson K. 2007. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet 39:770–775. Wellcome Trust Case Control Consortium. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678. Wang Y, Localio R, Rebbeck TR. 2006. Evaluating bias due to population stratification in epidemiologic studies of gene-gene or gene-environment interactions. Cancer Epidemiol Biomarkers Prev 15:124–132. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS; Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI, Hattersley AT, 2007. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316:1336–1341.