AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 67:241-250 (1985)

Genetic Variation in North Amerindian Populations: Covariance With Climate

DENNIS H. O’ROURKE, BRIAN K. SUAREZ, AND JILL D. CROUSE
Department of Anthropology (D.H.O’R.), University of Utah, Salt Lake City, Utah 84112 and Departments of Psychiatry (D.H.O’R., B.K.S., J.D.C.) and Genetics (B.K.S.), Washington University School of Medicine, The Jewish Hospital of St. Louis, St. Louis, Missouri 63110

KEY WORDS Amerindian, Genetic variation, Heterozygosity, Climate

ABSTRACT Allelic frequencies at seven polymorphic loci in 74 North Amerindian populations are examined relative to patterns of climatic variation. Canonical correlation analysis reveals strong and significant associations of heterozygosity at the ABO, Ss, Duffy, and P loci with climatic variability. Principal component analysis demonstrates that these loci tend to form correlated ensembles. Moreover, canonical correlation analysis of component scores provides support for an association between polymorphism at these loci and environmental variability. The results are concordant with two previous investigations which suggested a relationship between polymorphism for the ABO, Duffy, and Diego systems and climate. It is suggested that the examination of broad geographic patterns of genetic variation at multiple loci is a valuable, but underutilized, method of screening for the effects of long-term systematic pressures.

The traditional concerns of anthropological genetics, and human population genetics in general, have been to either (1) document the genetic uniqueness of the population(s) under study, or to (2) assess the relative similarity or dissimilarity among a group of historically related populations through some form of distance analysis.
On a broader scale, this same approach has been used to assess genetic variation within and between major “racial” groups (e.g., Lewontin, 1972; Nei and Roychoudhury, 1972, 1974; Mitton, 1977; Latter, 1980; Weiss and Maruyama, 1976). This analytical orientation has led to a concentration on examining the effects of genetic drift and gene flow on the structure of gene pools in contemporary human populations. Relatively little is known about the role of systematic evolutionary forces in the maintenance and patterning of most human genetic polymorphisms (Harpending, 1974).

During the past 30 years anthropological genetic studies have generated a bounty of allelic frequency data for many populations throughout the world (e.g., see Mourant et al., 1976). We suggest that such collective data are appropriate for examining broad geographic patterns of genetic variation in order to assess the potential role of natural selection operating on marker loci.

Received December 13, 1983; revised March 5, 1985; accepted March 14, 1985.
© 1985 ALAN R. LISS, INC.

An obvious characteristic that varies with geography is climate. The effect of climate on patterns of morphological variability has long been appreciated (e.g., Roberts, 1978), but similar investigations with regard to genetic polymorphisms in humans have been few.

Levins (1965, 1968) proposed that in heterogeneous environments genetic heterozygosity should increase with environmental variability for those loci important to fitness. Subsequent investigations found this prediction to hold for a variety of nonhuman organisms (e.g., Smith and Koehn, 1971; Johnson, 1973; Koehn and Mitton, 1972; McLeod et al., 1981). Bryant (1974) found that 70% of the geographic variation in genetic heterozygosity of several genera could be accounted for by climatic variation. He concluded that “. . .
patterns of geographic variation in heterozygosities, for statistically correlated ensembles of loci, may represent adaptive shifts in response to changes in variability of specific components of the environment” (p. 15). Moreover, Band (1972) has documented genetic changes in natural populations of Drosophila melanogaster that are associated with minor but significant climatic shifts.

Comparable studies on the relation between genetic heterozygosity and environmental variability in human populations are rare. Wilson and Franklin (1968) examined the level of polymorphism at the Diego (Di) locus in 97 North, Central, and South Amerindian groups and found that Di(a+) frequencies are elevated in warm, wet environments. Piazza et al. (1981) used 39 independent alleles at ten loci to evaluate the role of climate on level of polymorphism. Although they reject climate as a major determinant of total genetic variation, they did find that two-thirds of the loci examined showed a significant association with climate. In particular, alleles at the Duffy, haptoglobin, ABO, Rh, MNSs, HLA, and Acid Phosphatase loci exhibited correlations greater than 0.4 with distance from the equator, which they took as an indirect measure of climatic variability. Ananthakrishnan and Walter (1972) also reported a significant correlation (r = -.71) between the frequency of the Acid Phosphatase allele P and mean annual temperature.

As part of a larger study of genetic variation in Amerindian populations (Suarez et al., 1985a,b), the present paper examines the relationship between genetic heterozygosity in 74 North Amerindian populations and climatic variation derived from mean monthly precipitation, mean monthly maximum temperature, mean monthly minimum temperature, the standard deviation of each of these, and elevation.

MATERIALS AND METHODS

Eleven allelic frequencies at seven polymorphic loci were recorded for 74 North Amerindian populations.
Two principal criteria were used for inclusion of published frequencies in the present study. First, each society must have been typed for the ABO, Rh, and MN systems. In addition to these three blood groups, allelic frequencies for the Ss (of MNSs), P, Haptoglobin, Duffy, Kidd, Diego, Kell, and Transferrin systems were also obtained. The Kell and Transferrin systems proved to be virtually monomorphic in these samples and were subsequently dropped from the analysis, while too few samples reported frequencies for the Kidd and Haptoglobin loci to permit their use.

For the Rh system haplotype frequencies were used. In some reports frequencies of d-bearing haplotypes could not be unambiguously identified. We have therefore pooled these haplotypes for the estimate of r (cde). The very low frequency, or absence, of most of these haplotypes in unadmixed Amerindians suggests that any error introduced by this procedure is minimal. For the MN and Ss loci too few reports included haplotype frequencies to be useful. Accordingly, these two loci are treated separately.

Although all populations had reported frequencies for the ABO, Rh, and MN loci, this was not the case for all the other systems included in the analysis. All societies had reported values for the Duffy system, but one or more samples lacked a frequency for the Ss, P, or Diego systems. For these missing data values, estimates were obtained using a geographic distance-weighted average allele frequency based on all observed values for the locus in North America. This gives greatest weight to geographically proximal populations in the estimation of missing data. Moreover, it tends to have a “smoothing” effect when viewed in continental perspective since all observed values contribute to the estimate and no estimate can be outside the observed range. The analyses that follow were performed on data sets with missing data estimated and with societies with missing values excluded. The results were remarkably concordant.
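The distance-weighted estimate described above can be sketched as follows. This is a minimal illustration only: the text does not state the weighting function, so inverse great-circle distance is assumed here, and the names `haversine_km`, `estimate_missing`, and the `power` parameter are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    radius = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def estimate_missing(target, observed, power=1.0):
    """Inverse-distance-weighted mean allele frequency (ASSUMED weighting).

    target   -- (lat, lon) of the population lacking a frequency
    observed -- list of (lat, lon, freq) for every population with data
    Because the weights are positive and sum to one after normalisation,
    the estimate can never fall outside the observed range.
    """
    num = den = 0.0
    for lat, lon, freq in observed:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            return freq  # same location: use the observed value directly
        w = 1.0 / d ** power
        num += w * freq
        den += w
    return num / den

# Hypothetical coordinates and frequencies, for illustration only.
observed = [(40.0, -100.0, 0.10), (45.0, -110.0, 0.30), (30.0, -90.0, 0.20)]
estimate = estimate_missing((41.0, -101.0), observed)
```

Every observed population contributes to the estimate, with the nearest ones dominating, which is the smoothing behavior noted in the text.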
We therefore only report analyses which include estimates of missing data, where possible, to take advantage of the larger number of societies included. In total, 2.8% of the data was estimated.

It is well known that most North Amerindian populations have experienced admixture from non-Indian gene pools. Although admixture, per se, should not drastically affect results of investigations of environmental constraints on gene frequency variation, recent admixture that results in disruption of equilibrium may have an effect. In an effort to accommodate such an effect, societies judged to be highly admixed (i.e., frequency of r > .05) were deleted from the sample and the analyses repeated. This reduced, less-admixed data set is composed of 57 populations. The proportion of estimated data is slightly less (2.5%) than in the total data set.

We take the data base to be representative of all published genetic studies of native North Americans. However, given the disappearance of many groups in recent history and the recency of genetic studies of Amerinds, the representativeness of such studies of all of native North America is unknown. Nevertheless, we have made every effort to use samples from populations still residing in or near their traditional homelands. For example, many eastern woodland populations were removed to reservations in the Midwest in recent historical times. Our samples for these groups come from populations still residing in the East rather than the western reservation groups. A list of populations, sample size, loci with missing values, and original references for the data used in this study may be found in Appendix I of Suarez et al. (1985a).

Average per locus heterozygosity (h) was estimated under the assumption of Hardy-Weinberg equilibrium (Harpending and Chasko, 1976; Suarez et al., 1985b).
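As a minimal sketch of the quantity involved, expected heterozygosity at a locus under Hardy-Weinberg equilibrium is one minus the sum of squared allele frequencies. The estimator actually used is the one given in the cited references and may differ in detail (for example, by a sample-size correction); the function names and frequencies below are illustrative assumptions.

```python
def expected_heterozygosity(freqs):
    """Hardy-Weinberg expected heterozygosity: 1 - sum of p_i squared."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, expected 1")
    return 1.0 - sum(p * p for p in freqs)

def mean_per_locus_heterozygosity(loci):
    """Unweighted average of per-locus heterozygosity over the loci observed.

    loci -- dict mapping a locus name to its list of allele frequencies;
            loci with missing data are simply left out of the dict.
    """
    values = [expected_heterozygosity(f) for f in loci.values()]
    return sum(values) / len(values)

# Illustrative frequencies only (not taken from any single population).
population = {"Duffy": [0.726, 0.274], "MN": [0.5, 0.5]}
h_duffy = expected_heterozygosity(population["Duffy"])  # 2pq for two alleles
h_mean = mean_per_locus_heterozygosity(population)
```

For a two-allele locus the expression reduces to 2pq, so heterozygosity peaks at 0.5 when both alleles are equally frequent and falls toward zero as a locus approaches fixation.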
Due to the decidedly non-normal distribution of h, we ranked the heterozygosities of each system separately and obtained normal scores by the method of Blom (1958; Suarez et al., 1985b). An index of mean heterozygosity (H) is obtained by computing an average of normal scores weighted by the number of systems observed. This is a measure of each population’s degree of heterozygosity relative to all others. Since h is computed only on non-missing data, sample sizes are 50 and 38 for the full and reduced data sets, respectively, when these variables are the unit of analysis.

Climatic variation

The second criterion for inclusion was the availability of meteorological data for the geographic location of each society. Climatic data for each locale were obtained from the Environmental Technical Applications Center (ETAC) technical library at Scott Air Force Base, Illinois. Three classes of monthly climatic variables were obtained from the World Wide Climatic Summaries (mean monthly precipitation for January through December, mean monthly maximum temperature, and mean monthly minimum temperature), together with elevation. Mean hourly humidity by month was also recorded, but underreporting of the humidity data from the majority of stations precluded its use in the analysis. Consequently, the monthly precipitation and temperatures were used to compute the mean and standard deviation of these values over a year. The resulting six variables and elevation constitute the environmental variables used in the analysis.

Latitude, as used by Piazza et al. (1981), indirectly indexes more climatological variables than used here, but is insensitive to local climatic differences and the effects of elevation. We have therefore opted to use actual climatic variables rather than latitude to assess environmental heterogeneity.
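The construction of the seven environmental variables described above can be sketched as follows. The variable abbreviations follow Table 1; the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev

def climate_summary(precip, tmax, tmin, elevation):
    """Reduce three 12-month series plus elevation to seven variables.

    Returns a dict with keys ELEV, MPREC, SDPREC, MTMAX, SDTMAX, MTMIN, SDTMIN.
    The standard deviation is the sample form (ASSUMED; not stated in the text).
    """
    series = {"PREC": precip, "TMAX": tmax, "TMIN": tmin}
    summary = {"ELEV": float(elevation)}
    for name, values in series.items():
        if len(values) != 12:
            raise ValueError(f"{name}: expected 12 monthly values, got {len(values)}")
        summary["M" + name] = mean(values)    # mean over the year
        summary["SD" + name] = stdev(values)  # within-year variability
    return summary

# Made-up monthly values for a single station, for illustration only.
station = climate_summary(
    precip=[1.0] * 12,
    tmax=list(range(12)),
    tmin=[float(v) for v in range(10, 22)],
    elevation=500,
)
```

The means index the general climatic level of a locale, while the standard deviations index seasonality, which is the component most directly relevant to the environmental-variability hypothesis.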
Statistical analysis

Two primary analytical tools are employed to assess the relationship between heterozygosity and climatic variation: canonical correlation and principal component analysis. The aim of canonical correlation analysis is to derive linear functions of two vector variables such that the covariance between the linear functions is maximized. Extraction of maximally correlated linear functions from the two vector variables proceeds sequentially, subject to the restriction that each pair of canonical variates be orthogonal to all previously derived linear combinations (Cooley and Lohnes, 1971). The number of canonical variates extracted is limited by the lesser number of entries in either of the two vector variables. It is desirable to have fewer environmental variables than genetic variables in the canonical correlation analyses since we are interested in the environment as a limiting factor on genetic variation rather than the converse. For examination of the relationship between heterozygosity and environment, the environmental variables which contributed least to the first canonical variate were excluded from subsequent analyses.

Unfortunately, the maximization of the covariance of the canonical variates does not guarantee that the variance within a vector variable is maximized. As a check on the consistency and reliability of the analysis, the allelic frequencies and environmental variables were subjected separately to principal component analysis. This assures that each linear function extracted is orthogonal to all others, and the amount of variance explained is maximized. This also encompasses somewhat more data since for ABO and Rh multiple alleles can be included rather than a single heterozygosity value for these systems. The components extracted from each data base are then rotated to simple structure using varimax rotation (Harman, 1967).
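As an illustration of the canonical correlation step described above, the following sketch computes only the first canonical correlation between two sets of variables. It is not the routine used in the study: it assumes the textbook formulation, in which the squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, and finds the largest by power iteration. All function names are hypothetical.

```python
import math

def _centered(columns):
    """Return each variable (a list of observations) with its mean removed."""
    out = []
    for col in columns:
        m = sum(col) / len(col)
        out.append([v - m for v in col])
    return out

def _cov(a, b):
    """Sample cross-covariance matrix between two sets of centered variables."""
    n = len(a[0])
    return [[sum(x * y for x, y in zip(ca, cb)) / (n - 1) for cb in b] for ca in a]

def _matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def _inverse(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def first_canonical_correlation(x_vars, y_vars, iterations=500):
    """Largest canonical correlation between variable sets x_vars and y_vars.

    Each argument is a list of variables; each variable is a list of
    observations of equal length.
    """
    x, y = _centered(x_vars), _centered(y_vars)
    sxx, syy, sxy = _cov(x, x), _cov(y, y), _cov(x, y)
    syx = [list(row) for row in zip(*sxy)]
    m = _matmul(_matmul(_inverse(sxx), sxy), _matmul(_inverse(syy), syx))
    v = [1.0] * len(m)
    for _ in range(iterations):  # power iteration for the dominant eigenvalue
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0  # the two sets are uncorrelated
        v = [t / norm for t in w]
    mv = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
    eigenvalue = sum(a * b for a, b in zip(mv, v))  # equals Rc squared
    return math.sqrt(min(max(eigenvalue, 0.0), 1.0))

# Toy data: y1 is an exact linear function of x1 and x2, so Rc should be 1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y1 = [a + b for a, b in zip(x1, x2)]
y2 = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
rc = first_canonical_correlation([x1, x2], [y1, y2])
```

With one variable in each set the result reduces to the absolute value of the ordinary Pearson correlation, which is a convenient sanity check. A full analysis would also extract the later canonical pairs and test each for significance, which this sketch omits.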
For each component with an eigenvalue greater than 1.0 the factor score coefficients are used to compute new synthetic variables (component scores). These new variables are treated as new vector variables representing genetic and environmental variability and subjected to canonical correlation analysis.

All canonical correlation analyses were performed with the CANCORR routine in SPSS (Nie et al., 1975) on a Harris 500 computer, while the principal component analyses were done using the BMDP package (Dixon and Brown, 1979) on an IBM 370.

RESULTS

The mean and standard deviation of the 11 allele frequencies and seven environmental variables are given in Table 1 for both the full and reduced samples. Brief examination of Table 1 reveals that reducing the data base by exclusion of highly admixed groups (based on frequency of r haplotype) alters the mean gene frequencies very little. Indeed, only for this haplotype does the frequency differ by as much as 0.02 between the data sets. The climatic variables are also very similar between the two data sets. The only noticeable difference is the approximately 2° increase in mean temperatures in the reduced sample relative to the full data set. This is a function of most of the societies being excluded for extreme admixture coming from areas north of Mexico. Overall, the pattern of values in Table 1 suggests that separate analyses based on the “admixed” and “less-admixed” data sets should be quite similar.

Canonical correlation analysis

When the normalized, ranked heterozygosities and climatic variables are subjected to canonical correlation analysis, strong and significant associations between genetic heterozygosity and climate are revealed. Partial results are given in Table 2. For the full data set two highly significant canonical correlations (Rc) are found between the two vector variables. The eigenvalues given in Table 2 are Rc² and reflect the shared variance between the linear functions.
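The relation between each eigenvalue and its canonical correlation can be checked directly against the values reported in Table 2. Squaring the three canonical correlations gives, to within rounding in the fourth decimal place, the tabled eigenvalues (.7434, .4821, and .7952):

```latex
\begin{align*}
\lambda_i &= R_{c,i}^{2} \\
(0.8622)^2 &\approx 0.7434 \\
(0.6944)^2 &\approx 0.4822 \\
(0.8917)^2 &\approx 0.7951
\end{align*}
```

Each eigenvalue is therefore the proportion of variance shared by one pair of canonical variates, not by the original variables themselves.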
For the first pair of canonical variates for North America (Rc = .86), the proportion of variance shared approaches 75%, while for the second orthogonal pair (Rc = .69) the corresponding value is 48%. For the reduced data base only a single significant canonical correlation is obtained. At Rc = .89, however, it is slightly greater than the correlation between the first canonical variates in the full data set and represents nearly 80% of shared variance between the vector variables.

TABLE 1. Means and standard deviations of gene frequencies and climatic variables

                       Full data set (N = 74)       Reduced data set (N = 57)
Gene frequencies
  O                    .8622 ± .1465                .8816 ± .1416
  A                    .1192 ± .1346                .1041 ± .1318
  R0                   .0577 ± .0543                .0653 ± .0534
  R1                   .4895 ± .1367                .5054 ± .1452
  R2                   .3892 ± .1347                .3963 ± .1415
  r                    .0307 ± .0608                .0015 ± .0069
  M                    .7417 ± .1055                .7354 ± .1079
  S                    .3084 ± .1107                .3013 ± .1138
  P1                   .4680 ± .1632                .4654 ± .1701
  Fya                  .7260 ± .1296                .7384 ± .1284
  Dia                  .0487 ± .0640                .0542 ± .0714
Climatic variables¹
  ELEV                 2,061.2162 ± 2,313.3945      2,134.5714 ± 2,449.5623
  MPREC                3.6765 ± 3.7870              4.1582 ± 4.1835
  SDPREC               2.3453 ± 2.6391              2.7489 ± 2.8699
  MTMAX                64.8922 ± 22.5899            66.8827 ± 23.7880
  SDTMAX               12.3578 ± 9.5508             10.8308 ± 9.3881
  MTMIN                44.0946 ± 21.3624            46.3396 ± 22.1847
  SDTMIN               4.4226 ± 2.4478              4.2674 ± 2.3695

¹ELEV, elevation; MPREC, mean precipitation; SDPREC, standard deviation of precipitation; MTMAX, mean maximum temperature; SDTMAX, standard deviation of maximum temperature; MTMIN, mean minimum temperature; SDTMIN, standard deviation of minimum temperature.

TABLE 2. Canonical correlation analysis of heterozygosity for seven loci and seven climatic variables¹

Data base     Eigenvalue     Canonical correlation     P value
Full          .7434          .8622                     < .001
              .4821          .6944                     .024
Reduced       .7952          .8917                     < .001

¹Only statistically significant canonical correlations are retained in the table.

TABLE 3. Sequential pattern of removal of per locus heterozygosity with highest contribution to first canonical variate

Step No.     Full data set     Reduced data set
1            ABO (2)¹          ABO (1)
2²           Duffy (2)         P (1)
3            Diego (2)         Diego (1)
4            P (1)             Ss (1)

¹No. in parentheses reflects No. of significant canonical correlations found with cumulative removal of highest loading heterozygosity.
²The precipitation variables and elevation did not contribute strongly to the first canonical variate and were deleted from the analysis in subsequent steps.

Assessment of the relative contribution of each variable in a set to the canonical variate is achieved by examination of the magnitude of the canonical coefficients (the "loading" of the variable on the variate). By sequentially removing the variable with the highest canonical coefficient from the analysis it is possible to determine a subset of variables that contribute to a significant canonical correlation and a subset for which no significant association results. Table 3 presents the order of sequential and cumulative removal of those heterozygosities with the highest loadings on the first canonical variate in each analysis. The ABO locus had the highest loading on the first canonical variate in both data configurations. In addition, the Diego and P loci are seen to be major contributors to significant correlations between heterozygosity and climate in both data arrangements. The Ss locus appears prominently in the reduced data set, while the Duffy locus contributes strongly in the full data base. However, in the full data set the variable with the highest loading after P was the Ss locus heterozygosity, while in the reduced data base, Duffy was the strongest contributor after Ss to the linear function of heterozygosity values. Thus, although the order of importance is slightly different between the two data sets, the same variables in both cases are associated with climatic variation.
Of equal interest is the fact that only heterozygosity values for the Rh and MN systems do not appear to be at all associated with climate.

The mean temperature variables have the greatest covariance with the heterozygosities utilized here, although in some data configurations the measures of temperature variation had moderately high canonical coefficients. Since both maximum and minimum mean temperatures contributed strongly in most analyses, temperature range may be the important criterion accounting for the association between genetic heterozygosity and climatic variation in these data.

The difference in order of loading of genetic variables between the two data sets may be the result of two factors. First, it may reflect differences in heterozygosities due to the extent of admixture in groups in the full data set. Alternatively, since the heterozygosity data are based only on observed frequencies (i.e., estimated gene frequencies are not used) the reduction in number of groups to 38 in the reduced sample may contribute to this difference. This latter problem may be overcome by using gene frequency data where, with missing values estimated, the sample sizes in the two data bases increase to 74 and 57 societies. Moreover, since allele frequencies and heterozygosity estimates at a locus are not linearly or monotonically related, examination of both genetic variable sets may reveal patterns of variation or associations in one set not apparent in the other. Utilization of allelic frequencies also increases the number of genetic variables from 7 to 11, eliminating the need to base the analysis on a reduced number of climatic variables. The results of this analysis are given in Table 4.

TABLE 4. Canonical correlation analysis of heterozygosity for 11 allele frequencies and seven climatic variables¹

Data base     Eigenvalue     Canonical correlation     P value
Full          .7200          .8485                     < .001
              .4924          .7017                     .005
Reduced       .7860          .8866                     < .001

¹Only statistically significant canonical correlations are retained in the table.

Once again, for the full data base two pairs of canonical variates are extracted from the vector variables. The correlation between the first pair of canonical variates approaches .85, representing 72% of shared variance. The second significant canonical correlation (Rc = .70) represents a shared variance of nearly 50%. For the reduced data set only a single canonical correlation is found (Rc = .89). These results are very similar to those obtained in the analysis of heterozygosities. Moreover, sequential removal of gene frequencies from the analysis to identify the subset most associated with climatic variation shows additional similarities to the previous analysis. In addition to alleles A, S, Fya, and P1, the R1 and R2 haplotypes of Rh were also found to be associated with climate in both data sets. The only difference in order of importance of these frequencies with respect to climate in the two data sets is that Duffy is more strongly associated with climate than P1 in the full data set while the opposite is true in the reduced set. Heterozygosity at the Diego locus was strongly associated with climate in the previous analysis but is only moderately associated when gene frequencies are used.

Although the major difference in the two data sets is in the frequency of r, in neither data set did this chromosomal segment show any association with climate. This suggests that these results faithfully reflect a true correspondence between patterns of genetic variation at specific loci and geographic distribution of climate, not an artifact of European admixture in Amerindian populations.
Examination of bivariate correlations (not shown) of allele frequencies with climatic variables reveals that only the MN locus was not consistently associated with one or more of the climatic variables. If, as has been suggested, genes evolve as correlated units (Lewontin, 1965; Franklin and Lewontin, 1970; Allard and Kahler, 1972), "it is important to delineate these polymorphic units, rather than evaluate possible selection on individual genes" (Bryant, 1974:2).

Principal component analysis

In order to maximize the proportion of variation accounted for within each data set, and to identify those loci that form correlated units, principal component analysis (PCA) with varimax rotation to simple structure was undertaken for each data configuration.

Table 5 summarizes the PCA for the full complement of North American data. For the allelic frequency data, five components with eigenvalues greater than 1.0 were found to account for 78.87% of the variation in the original data. Component 1 is characterized by high loadings for the ABO locus and moderate loadings for Ss, Diego, and r of Rh; component 2 by Duffy, P, and Ss with a moderate contribution from R2 and r; component 3 by the R1, R2, and r haplotypes of Rh; component 4 by loci MN, Diego, and a moderate loading by Duffy; component 5 is dominated by R0 and r of Rh. All climatic variables load on the first principal component. However, component 2 is characterized by elevation and moderate loadings for mean precipitation and the standard deviation of precipitation.

TABLE 5. Rotated factor loadings for eleven allele frequencies and seven climatic variables for full data set¹

Alleles (rows): O, A, Fya, P1, S, R1, R2, M, Dia, R0, r; components 1-5
  Loadings, in source order [cell positions lost in extraction]:
  .957, -.946, .362, -.775, .689, .652, -.495, .253, .968, -.796, .895, -.586, .275, .929, -.407, .497, -.309, -.556
  % variance explained (components 1-5): 28.81, 19.20, 11.78, 9.83, 9.25

Climatic variables (rows): SDTMIN, SDTMAX, MTMIN, MTMAX, SDPREC, MPREC, ELEV; components 1-2
  Loadings, in source order [cell positions lost in extraction]:
  -.975, .964, .940, .906, .736, .266, .496, -.619, .816, .683
  % variance explained (components 1-2): 67.10, 18.46

¹Component scores less than .25 have been omitted. See text for discussion.

Bivariate correlations between allelic frequency component scores and component scores for the climatic variables are seen in Table 6. Fully four of the five genetic components have significant correlations with climate (CC1) in the full data base. Only genetic component 5, characterized by R0 and r, is not strongly correlated with the climate factors. This is interesting since examination of the bivariate correlations between the original variables has already revealed that the MN locus has little if any association with these climatic variables. That component 4, which is dominated by this locus, is significantly correlated with the first climatic component suggests that it is the other variables that load on this component which result in the correlation; namely, Duffy and Diego. This is consistent with the earlier results.

TABLE 6. Bivariate correlations between allelic frequency and climatic principal components for both data configurations¹

Columns: GC1-GC5; rows: CC1 and CC2 for the full data set, CC1 and CC2 for the reduced data set
  Correlations, in source order [cell positions lost in extraction]:
  .5557*, .0064, .6627*, .0264, .3493*, .0577, .3334*, -.0547, .2894*, .0482, -.2432*, -.0049, .2011, .1678, .1944, -.2329, -.1447, .0111

¹GC, genetic component; CC, climatic component.
*P < .05.

TABLE 7. Canonical correlation analysis of genetic and climatic component scores for both data configurations

Data base     Eigenvalue     Canonical correlation     P value
Full          .6228          .7892                     < .001
Reduced       .6440          .8025                     < .001

When the component scores are subjected to canonical correlation analysis, a significant Rc of .79 is obtained (Table 7). This accounts for a shared variance of 62% between linear functions of the component scores for the genetic and climatic variables.
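Two of the figures quoted for the full data set can be verified by simple arithmetic from the tabled values. The percentages of variance explained by the five genetic components in Table 5 sum to the stated total, and the square of the canonical correlation in Table 7 reproduces the stated shared variance:

```latex
\begin{align*}
28.81 + 19.20 + 11.78 + 9.83 + 9.25 &= 78.87\ \% \\
R_c^{2} = (0.7892)^2 &\approx 0.6228 \approx 62\ \%
\end{align*}
```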
When the first component of the genetic data (characterized by loci ABO, Ss, and to some degree by Diego and r) is removed, the Rc (= .57) remains significant (P < .001). Indeed, the first two components may be removed from the analysis and an Rc of .44 still remains significant at the .05 level.

The PCA for the reduced data set differs only slightly. Here (Table 8), four components from the genetic data are found with eigenvalues greater than 1.0, accounting for 72.47% of the variation in the original data. Characterization of the components may be seen as component 1, ABO and S, with moderate loadings from Diego, Duffy, and P1; component 2, R1, R2, and contributions from S, Duffy, P1, and r; component 3, R0, r, P1, and Duffy; component 4, M, Diego, r, and moderate contributions from Duffy. The PCA for the climatic variables of the reduced data base is virtually identical to that seen for the full data set.

TABLE 8. Rotated factor loadings for 11 allele frequencies and seven climatic variables for the reduced data set¹

Alleles (rows): O, A, S, R1, R2, R0, M, Dia, Fya, P1, r; components 1-4
  Loadings, in source order [cell positions lost in extraction]:
  .387, .364, .957, -.952, .557, .960, -.918, .880, .869, .292, -.388, .348, -.559, .425, .383, -.275, -.404, .286, .451, -.474, .479
  % variance explained (components 1-4): 32.26, 17.17, 12.32, 10.72

Climatic variables (rows): SDTMIN, SDTMAX, MTMIN, MTMAX, SDPREC, ELEV, MPREC; components 1-2
  Loadings, in source order [cell positions lost in extraction]:
  -.978, -.971, .934, .913, .669, .316, .618, .592, -.800, .685
  % variance explained (components 1-2): 66.69, 19.25

¹Loadings less than .25 have been omitted. See text for discussion.

Canonical correlation analysis of the genetic and climatic component scores for the reduced data base (Table 7) resulted in a significant correlation of .80. Removal of components 1 and 2 from the analysis results in a nonsignificant correlation between the two vector variables. Here the ABO, Ss, and Duffy loci and R1 and R2 of Rh, which dominate the first two components, may be considered strongly associated with climate.
Moreover, although P has its highest loading on component 3, which is not significantly associated with climate (see Table 6) and does not contribute to the significant canonical correlations, it also has high loadings on the first two components and was found to be strongly associated with patterns of climatic variation in the earlier analysis. Polymorphism at this locus, then, may also be influenced by constraints imposed by the environment.

DISCUSSION

The explication of polymorphism for individual loci has had a long and controversial history, with adherents to both major schools of thought: selection (e.g., Burns and Johnson, 1971; Ayala, 1972; Stebbins and Lewontin, 1972) and neutral mutation (e.g., Kimura, 1968; King and Jukes, 1969; Kimura and Maruyama, 1971). While selection through a variety of vectors has been proffered as an explanation of polymorphism for a few loci in humans (e.g., ABO with maternal-fetal incompatibility and disease, Morton et al., 1966; Brues, 1954, 1963; Levine, 1943; G6PD deficiency and malaria, Siniscalco et al., 1961; the effects of maternal-fetal incompatibility on the Rh system, Levine, 1943; and perhaps haptoglobin and disease, Eaton et al., 1982), only for the hemoglobins has a direct selective effect of the environment been satisfactorily demonstrated (e.g., Allison, 1954; Livingstone, 1967).

In the present work, several polymorphic loci have been examined relative to climatic variation to determine whether or not individual Amerindian populations occupy the same relative positions in two different measurement spaces: one genetic and the other climatic. Both the ABO and Duffy loci are consistently associated with climate in all analyses. These loci were also found by Piazza et al. (1981) to be strongly correlated with distance from the equator. Polymorphism at the Ss and P loci also appears strongly associated with climatic variation.
The relationship of heterozygosity at the Diego locus to climate is somewhat more problematic. Although it appeared to be strongly associated with climatic variables in the analysis of heterozygosity, it was only marginally associated with climate in the analysis of gene frequencies. In addition, the Diego locus usually does not load highly on any one single component in PCA. Rather, it generally loads moderately on two or three components, and always in association with ABO. Moreover, Wilson and Franklin (1968) reported a strong association with climate for Diego frequencies in Amerindians.

The present demonstration of significant association between genetic polymorphism and climatic variation does not necessarily imply a causal relationship. A confounding factor here is the collinearity of the north-to-south migration of the original colonists and climatic gradients (i.e., there is a pronounced north-south latitudinal component to climatic variation). Indeed, latitude may be substituted for the climatic variables in these analyses with little change in the overall results. The major difference observed when substituting latitude for the climatic variables is a stronger contribution of the Rh phenotypes, particularly r in the full data set. This would appear to reflect the presence of greater admixture and suggest that, since this is not the case when climatic variables are used, the demonstrated association between genetic polymorphism and climatic variation is real and not illusory. Further, when true climatic variables are used in the analysis, only few and minor differences are found between the full data base and the less-admixed sample. It should be noted that this is the case whether only observed values are used or whether missing data have been estimated.

An additional factor is an earlier suggestion that degree of heterozygosity is influenced by level of sociocultural integration (Beals and Kelso, 1975).
We have demonstrated (Suarez et al., 1985b) that an association between cultural level and genetic heterozygosity was lacking in these data when latitude was controlled for.

Finally, the collinearity of the north-to-south migration of early Amerindians and climate may confound these results. However, generation of synthetic gene frequency maps (Suarez et al., 1985a) from North Amerindian gene frequency data provided little evidence for patterns of gene frequency variation concordant with migration patterns of much antiquity. The autocorrelation of migration, climate, and latitude may confound these analyses, but with only a slight augmentation of the correlations between genetic variation and climate. Thus, the results of the present analyses are that much more striking.

These results indicate that the effects of systematic pressures acting on human gene pools may be detected by adopting a broad continental perspective of gene frequency variation. Whether this systematic pressure is the result of ancient, long-range migrations or selection through environmental constraints remains to be completely determined.

SUMMARY

1. Seven polymorphic loci in 74 North Amerindian populations are examined for covariation with seven climatic variables.
2. Significant association between four genetic loci (ABO, Ss, Duffy, and P) and the constellation of climatic variables is demonstrated.
3. The Diego locus also appeared correlated with climatic variation in some analyses but not in others. More data and further work are required to clarify this association.
4. The climatic variables used here are crude and incomplete indices of either climatic variation or environmental heterogeneity. That strong and significant associations between the genetic and climatic domains were nonetheless found suggests that the associations are real, and that the systematic pressure on these loci may be substantial.

ACKNOWLEDGMENTS

We gratefully acknowledge the assistance of Mr. Wayne McCullom and Staff Sgt.
George Elder of the ETAC Technical Library, Scott Air Force Base, Illinois. We have benefitted from discussions with Dr. E.J.E. Szathmary, and Dr. F. Auger graciously provided unpublished gene frequencies for the James Bay Cree. We wish to express our gratitude to Dr. T.R. Przybeck for his generous programming help. This work was supported in part by MH 31302 and MH 14677 from the United States Public Health Service and by a Faculty Development Grant from the Research Committee of the University of Utah.

LITERATURE CITED

Allard, RW, and Kahler, AL (1972) Patterns of molecular variation in plant populations. Proc. Sixth Berkeley Symp. Math. Stat. Prob. 5:237-254.
Allison, AC (1954) The distribution of the sickle-cell trait in East Africa and elsewhere, and its apparent relationship to the incidence of subtertian malaria. Trans. R. Soc. Trop. Med. Hyg. 48:312-318.
Ananthakrishnan, R, and Walter, H (1972) Some notes on the geographical distribution of the human red cell acid phosphatase phenotypes. Humangenetik 15:177-181.
Ayala, FJ (1972) Darwinian versus non-Darwinian evolution in natural populations of Drosophila. Proc. Sixth Berkeley Symp. Math. Stat. Prob. 5:211-236.
Band, HT (1972) Minor climatic shifts and genetic changes in a natural population of Drosophila melanogaster. Am. Nat. 106:102-115.
Beals, KL, and Kelso, AJ (1975) Genetic variation and cultural evolution. Am. Anthropol. 77:566-579.
Blom, G (1958) Statistical Estimates and Transformed Beta Variables. New York: John Wiley and Sons, Inc.
Brues, AM (1954) Selection and polymorphism in the ABO blood groups. Am. J. Phys. Anthrop. 12:559-597.
Brues, AM (1963) Stochastic tests of selection in the ABO blood groups. Am. J. Phys. Anthrop. 21:287-299.
Bryant, EH (1974) On the adaptive significance of enzyme polymorphisms in relation to environmental variability. Am. Nat. 108:1-19.
Burns, JM, and Johnson, FM (1971) Esterase polymorphism in the butterfly Hemiargus isola: Stability in a variable environment. Proc. Natl. Acad. Sci. USA 68:34-37.
Cooley, WW, and Lohnes, PR (1971) Multivariate Data Analysis. New York: John Wiley and Sons, Inc.
Dixon, WJ, and Brown, MD (1979) BMDP: Biomedical Computer Programs. Los Angeles: Univ. of California Press.
Eaton, JW, Brandt, P, Mahoney, JR, and Lee, JT, Jr. (1982) Haptoglobin: A natural bacteriostat. Science 215:691-693.
Franklin, I, and Lewontin, RC (1970) Is the gene the unit of selection? Genetics 65:707-734.
Harman, HH (1967) Modern Factor Analysis, 2nd edition. Chicago: Univ. of Chicago Press.
Harpending, H (1974) Genetic structure of small populations. Ann. Rev. Anthropol. 3:229-243.
Harpending, H, and Chasko, W, Jr. (1976) Heterozygosity and population structure in Southern Africa. In E Giles and JS Friedlaender (eds): The Measures of Man. Cambridge, MA: Peabody Museum Press, pp. 214-229.
Johnson, GB (1973) Relationship of enzyme polymorphism to species diversity. Nature 242:193-194.
Kimura, M (1968) Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles. Genet. Res. 11:247-269.
Kimura, M, and Maruyama, T (1971) Patterns of neutral polymorphism in a geographically structured population. Genet. Res. 18:125-131.
King, JL, and Jukes, TH (1969) Non-Darwinian evolution. Science 164:788-798.
Koehn, RK, and Mitton, JB (1972) Population genetics of marine pelecypods. I. Ecological heterogeneity and evolutionary strategy at an enzyme locus. Am. Nat. 106:47-56.
Latter, BDH (1980) Genetic differences within and between populations of the major human subgroups. Am. Nat. 116:220-237.
Lewontin, RC (1965) Selection in and of populations. In JA Moore (ed): Ideas in Modern Biology. New York: Natural History Press, pp. 299-310.
Lewontin, RC (1972) The apportionment of human diversity. Evol. Biol. 6:381-398.
Levine, P (1943) Serological factors as possible causes in spontaneous abortions. J. Hered. 34:71-80.
Levins, R (1965) Theory of fitness in a heterogeneous environment. V. Optimal genetic systems. Genetics 52:891-904.
Levins, R (1968) Evolution in Changing Environments. Princeton: Princeton Univ. Press.
Livingstone, FB (1967) Abnormal Hemoglobins in Human Populations. Chicago: Aldine.
McLeod, MS, Hornbach, DS, Guttman, SI, Way, CM, and Burky, AS (1981) Environmental heterogeneity, genetic polymorphism, and reproductive strategies. Am. Nat. 118:129-134.
Mitton, JB (1977) Genetic differentiation of races of man as judged by single-locus and multilocus analyses. Am. Nat. 111:203-212.
Morton, NE, Krieger, H, and Mi, MP (1966) Natural selection and polymorphisms in northeastern Brazil. Am. J. Hum. Genet. 18:153-171.
Mourant, AE, Kopec, AC, and Domaniewska-Sobczak, K (1976) The Distribution of the Human Blood Groups and Other Polymorphisms. London: Oxford Univ. Press.
Nei, M, and Roychoudhury, AK (1972) Gene differences between Caucasian, Negro, and Japanese populations. Science 177:434-436.
Nei, M, and Roychoudhury, AK (1974) Genetic variation within and between the three major races of Man, Caucasoids, Negroids, and Mongoloids. Am. J. Hum. Genet. 26:421-443.
Nie, NH, Hull, CH, Jenkins, JG, Steinbrenner, K, and Bent, DH (1975) Statistical Package for the Social Sciences. New York: McGraw-Hill, Inc.
Piazza, A, Menozzi, P, and Cavalli-Sforza, LL (1981) Synthetic gene frequency maps of man and selective effects of climate. Proc. Natl. Acad. Sci. USA 78:2638-2642.
Roberts, DF (1978) Climate and Human Variability. Menlo Park, CA: Cummings.
Siniscalco, M, Bernini, L, Latte, B, and Motulsky, AG (1961) Favism and thalassemia in Sardinia and their relationship to malaria. Nature 190:1179-1180.
Smith, GR, and Koehn, RK (1971) Phyletic and cladistic studies of biochemical and morphological characteristics of Catostomus. Syst. Zool. 20:282-297.
Stebbins, GL, and Lewontin, RC (1972) Comparative evolution at the levels of molecules, organisms, and populations. Proc. Sixth Berkeley Symp. Math. Stat. Prob. 5:23-42.
Suarez, BK, Crouse, JD, and O’Rourke, DH (1985a) Genetic variation in North Amerindian populations: The geography of gene frequencies. Am. J. Phys. Anthropol. 67:217-232.
Suarez, BK, O’Rourke, DH, and Crouse, JD (1985b) Genetic variation in North Amerindian populations: Association with sociocultural complexity. Am. J. Phys. Anthropol. 67:233-239.
Weiss, KM, and Maruyama, T (1976) Archaeology, population genetics and studies of human racial ancestry. Am. J. Phys. Anthropol. 44:31-49.
Wilson, WP, and Franklin, IR (1968) The distribution of the Diego blood group and its relationship to climate. Carib. J. Sci. 8:1-13.