Calculation of the Maximum Amount of Gene Admixture in a Hybrid Population

EMOKE J. E. SZATHMARY AND T. EDWARD REED
Department of Anthropology, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada, L8S 4L9 and Departments of Zoology and Anthropology, University of Toronto, Toronto, Ontario, Canada, M5S 1A1

KEY WORDS Maximum admixture · Gene flow · Canadian Indians

ABSTRACT Evidence is presented to show that “Caucasian” genes (B, K, Lua, r, AK2, Pc, and Gm3,5,11) in hybrid North American Indian populations follow a Poisson distribution. A method of determining the maximum amount of admixture, given an observed count of Caucasian genes, is developed. Establishment of the upper limit of admixture is suggested as the preferred estimate of gene flow in situations where absence of specific genes at particular loci precludes the calculation of a mean admixture estimate.

The estimation of the proportion of genes obtained from two ancestral populations in a hybrid sample requires a knowledge of the gene frequencies of each of the two parental populations as well as of the hybrids. In Amerindian studies the gene frequencies of the ancestral Indians are unknown, and must be inferred from the hybrid data. One solution to this problem is to partition the modern Indian population under investigation into groups that differ in their amounts of known non-Indian ancestry (e.g., Pollitzer et al., '62). Several different admixture estimation methods, ranging from least squares (Roberts and Hiorns, '62) and maximum likelihood (Krieger et al., '65; Elston, '71) to probability of gene identity (Chakraborty, '75), have been used on such data. Criticism has already been made of some of these approaches (Reed, '69; Chakraborty, '75). A more basic issue is the nature of the data to which the estimation techniques are applied.
That is, partitioning a sample by degrees of known ancestry depends not just on having genealogical records readily available, but on the supposition that these are of sufficient time depth to be accurate and are free of recording error. This is clearly not the case for very many Amerindian groups. In Canada, for example, tribal registries listing degrees of Indian ancestry are not kept because the legal definition of “Indian-ness” does not rest on such information.

AM. J. PHYS. ANTHROP. (1978) 48: 29-34.

An alternative approach in such situations is to dispense with subdividing a sample into “mixed” and “unmixed” categories and to consider in admixture estimation only those alleles that were absent in Indian populations prior to non-Indian contact. There is accumulated evidence from very many North American Indian groups indicating that in these populations at least eight different alleles are now present as a consequence of Caucasian (i.e., European) gene flow (Szathmary and Reed, '72). Unfortunately, not all of these alleles will always be found in a hybrid population sample. When only blood group data are available, the gene most commonly present is r, while Lua and/or K may be absent.

Such a distribution of Caucasian alleles poses problems in the determination of the magnitude of gene flow. Foremost is the fact that the lack of observation of a specific gene precludes the calculation of a mean admixture estimate (M). M is a weighted mean, the weighting factor being $1/s_i^2$, where $s_i^2$ is the variance of the ith admixture estimate (Szathmary and Reed, '72; Cavalli-Sforza and Bodmer, '71). Some authors get around this problem by calculating single locus admixture estimates ($M_i$) only, without attempting to derive a mean (e.g., Doeblin and Mohn, '67); others select one allele and consider that to be indicative of the possible amount of Caucasian gene flow (e.g., Allen and Corcoran, '60). As very rough estimates of admixture, or as demonstrations of admixture of undefined magnitude, such methods may be permissible. However, it would be preferable to be able to establish, with a stated degree of confidence, the range of possible admixture, even if a mean value cannot be calculated. This paper suggests a method whereby the maximum amount of detectable gene flow can be established in those situations wherein the calculation of M is impossible.

TABLE 1
Frequencies of eight “Caucasian” alleles in Western Europeans

Allele                                   Frequency (p)¹
A2                                       0.0647
B                                        0.0678
K                                        0.0457
Lua                                      0.0356
r                                        0.4048
Pc                                       0.0470
AK2                                      0.0500
Gm3,5,11                                 0.6650
Mean frequency of Caucasian alleles      0.1726

¹ Source: Szathmary and Reed, '72.

METHODS AND RESULTS

The occurrence of individual Caucasian genes in a North American Indian group could be considered to be rare events if admixture has been low. If this is so, their incidence could be described by the Poisson distribution. The upper confidence limits for expected Poisson variables, given an observed count, are available (Pearson and Hartley, '62), and could be used to calculate the upper limit of admixture in any Indian population.

The fundamental hypothesis in this argument is that Caucasian genes are distributed in these populations as Poisson variates. To test this, a sample of 105 Ojibwa (Szathmary and Reed, '72) from Wikwemikong, Ontario, Canada, was examined. Each individual was regarded as a sampling unit within which 0 to 14 rare events (i.e., Caucasian genes) could occur at 7 different loci. This method requires that Caucasian genes be identifiable in the phenotype. This poses no problems for the genes Gm3,5,11, AK2 and Pc, which are recognizable, but presents difficulties for their blood group counterparts.
For example, the phenotypes A2, B, Lu(a+) and K(+) could each be heterozygous or homozygous in genotype, while the phenotypes A1 and CcDEe could contain A2 (in A1/A2) and r (in Rz/r), respectively. To bypass this difficulty, calculations were made to determine the probability of homozygous combinations of each of the Caucasian genes A2, B, Lua, K and r, and of the heterozygous combinations A1/A2 and Rz/r. The results showed (table 2) that none of the alleles was likely to be present in homozygous form; hence each A2, B, Lu(a+) and K(+) phenotype included a single Caucasian allele. It further seemed improbable that any of the CcDEe phenotypes contained r in combination with Rz. Of the A1 phenotypes, however, at least one was likely to contain an A2 allele. This single A2 allele could in turn be the only, the second, or the third Caucasian gene in any one individual. Because no known method exists to distinguish homozygous from heterozygous forms of A1, it was not possible to determine which A1 phenotype contained A2. Since our method demands that the genes be counted, and because we could not do so with A2, this allele had to be eliminated from subsequent calculations.

Once the number of Caucasian genes in each phenotype was established, the total number of these genes (B, r, K, Lua, Gm3,5,11, AK2 and Pc) was counted in each of the 105 sampling units. The mean number of these genes per unit (i.e., per individual) was 0.83, and the sample variance was 0.64. The variance test (Snedecor and Cochran, '67) showed no significant difference between the observed and Poisson variances ($\chi^2_{104} = 80.8$, p > 0.95). To determine goodness-of-fit to the Poisson distribution, the expected numbers in classes with 3 counts or more were first pooled to yield a minimum class value of 5. The fit of the data to the expected Poisson distribution was then calculated, and found to be good ($\chi^2_3 = 2.85$, p > 0.25).
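As an illustration only, the variance test statistic and the expected Poisson class sizes can be recomputed from the summary figures quoted above. The sketch below is not the authors' own calculation, and the function names are hypothetical. Because it starts from the rounded mean (0.83) and variance (0.64), it gives a statistic of about 80.2 rather than the reported 80.8. The observed class counts are not given in the text, so the goodness-of-fit statistic itself is not recomputed.

```python
import math


def dispersion_statistic(n, mean, variance):
    """Variance (index of dispersion) test statistic.

    Under a Poisson hypothesis (n - 1) * s^2 / mean is approximately
    chi-square distributed with n - 1 degrees of freedom.
    """
    return (n - 1) * variance / mean


def expected_poisson_classes(n, mean, pool_from):
    """Expected numbers for counts 0 .. pool_from - 1, followed by one
    pooled class holding all counts >= pool_from."""
    probs = [math.exp(-mean) * mean ** k / math.factorial(k)
             for k in range(pool_from)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    return [n * p for p in probs]


# Summary figures quoted in the text for the Wikwemikong sample (rounded).
stat = dispersion_statistic(n=105, mean=0.83, variance=0.64)
expected = expected_poisson_classes(n=105, mean=0.83, pool_from=3)

print(f"dispersion statistic on 104 df: {stat:.1f}")
print("expected numbers for 0, 1, 2 and 3+ genes:",
      [round(e, 2) for e in expected])
```

With these inputs the pooled class of 3 or more genes has an expected size of about 5.4, which agrees with the minimum class value of 5 mentioned above.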
These findings support the basic hypothesis that Caucasian genes in a hybrid Indian population follow the Poisson distribution.

TABLE 2
Probability of homozygous and heterozygous combinations of specific “Caucasian” blood group genes in the Wikwemikong Ojibwa

                                Probability of genotype           Expected no. of
          Frequency in                                           indicated genotype
Allele    Wikwemikong        Homozygote     Heterozygote             (N = 105)
A2           0.039            0.00153           -                     0.1597
A1           0.133               -          0.01037 (A1/A2)           1.0920
B            0.044            0.00194           -                     0.2033
K            0.010            0.00010           -                     0.0105
Lua          0.010            0.00010           -                     0.0105
r            0.100            0.01000           -                     1.0500¹
Rz           0.043               -          0.00860 (Rz/r)            0.9030

¹ The cde (rh) phenotype was not observed in the sample.

The next step in the analysis was to determine the upper limit of the amount of admixture detectable in an Indian population. Since our approach is meant to be used specifically in those instances in which 0 Caucasian alleles occur at a particular locus, data from another Indian population had to be employed. For this purpose the Pikangikum Ojibwa were selected, for in this group 7 of the 8 Caucasian genes under consideration were absent. Only A2 was found, and then only in a single individual (Szathmary and Reed, '72).

The use of the Pikangikum data rested on two assumptions: first, that if 7 Caucasian alleles were distributed as Poisson variates in one hybrid Indian population, they would be so distributed in any such population; second, that if 7 Caucasian genes followed a Poisson distribution, the eighth possible allele, A2, would behave in similar fashion. Neither of these assumptions could be tested directly in the Pikangikum Ojibwa.

For any observed frequency or count of a Poisson variable, the upper and lower confidence limits of its expectation can be found (Pearson and Hartley, '62). The upper 99% confidence limit for an observed count of 1 is 7.43. Applied to the Pikangikum data, this meant that for the one observed A2 allele, as many as 7.43 Caucasian genes could be expected.
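The tabulated limit can also be reproduced numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it treats the tabulated limits as two-sided (half of the excluded probability in each tail) and finds the upper limit by bisection on the Poisson cumulative distribution. It then converts the limit to a maximum admixture proportion using the paper's own figures for the Pikangikum sample (1,330 genes examined, mean Caucasian allele frequency 0.1726). All function and parameter names are hypothetical.

```python
import math


def poisson_cdf(count, lam):
    """P(X <= count) for a Poisson variable with expectation lam."""
    term = math.exp(-lam)
    total = term
    for k in range(1, count + 1):
        term *= lam / k
        total += term
    return total


def poisson_upper_limit(count, confidence):
    """Upper two-sided confidence limit for the Poisson expectation:
    the value lam at which P(X <= count) equals (1 - confidence) / 2."""
    tail = (1.0 - confidence) / 2.0
    lo, hi = 0.0, 1.0
    while poisson_cdf(count, hi) > tail:  # bracket the root
        hi *= 2.0
    for _ in range(100):                  # bisection
        mid = (lo + hi) / 2.0
        if poisson_cdf(count, mid) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def max_admixture(count, genes_examined, mean_allele_freq, confidence):
    """Ratio method: maximum Caucasian gene frequency in the hybrid sample
    divided by the mean frequency of the marker alleles in Caucasians."""
    max_gene_freq = poisson_upper_limit(count, confidence) / genes_examined
    return max_gene_freq / mean_allele_freq


print(f"upper 99% limit for a count of 1: {poisson_upper_limit(1, 0.99):.2f}")
print(f"maximum admixture, 99% limit: {max_admixture(1, 1330, 0.1726, 0.99):.4f}")
print(f"maximum admixture, 95% limit: {max_admixture(1, 1330, 0.1726, 0.95):.4f}")
```

Under these assumptions the 99% limit comes out at 7.43, matching the tabulated value, and the resulting maximum admixture is about 3.2% at the 99% level and about 2.4% at the 95% level.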
The lower 99% confidence limit for an observed count of 1 was given as 0.0253. However, in order to take into account the information from all other loci at which no Caucasian genes were observed, the lower limit for 0 observed counts (0.0000) should set the minimum number of foreign genes that have entered the Pikangikum gene pool. In other words, in a situation where counts are made at some loci but not at others, one can be 99% (or 95%, depending on the confidence limits desired) sure of the upper boundary, since that depends upon the number of observed genes. The lower boundary, however, must be set at 0.

The total number of genes examined for the presence of Caucasian markers in the Pikangikum Ojibwa was 1,330 (table 3). The ratio of the maximum expected number of Caucasian genes to the total number of genes examined (7.43/1,330) yielded the maximum Caucasian gene frequency of 0.0056 in this population. The mean gene frequency of the Caucasian markers used was calculated to be 0.1726 (table 1). By the ratio method ($0.0056/0.1726$), the maximum amount of admixture in the Pikangikum Ojibwa was 3.24%.

TABLE 3
Total number of genes “examined” in the Pikangikum Ojibwa

Allele       Number of persons tested    Number of genes examined
A2, B                  95                         190
r                      96                         192
K                      96                         192
Lua                    96                         192
Pc                     91                         182
AK2                    91                         182
Gm3,5,11              100                         200
Total                                           1,330

DISCUSSION

In the study of the genetic characteristics of American Indian populations, it is almost inevitable that some evidence of Caucasian gene flow will be found. The perplexing question always is the method whereby an estimate of the magnitude of this gene flow can be made. In the past we have advocated an approach that includes testing for between-locus heterogeneity and which considers the different amounts of information provided by various loci (Szathmary and Reed, '72). This method is still recommended whether or not detailed genealogical information is available for the population concerned. In the absence of such data, however, the single locus approach, using only those alleles which were absent prior to Caucasian contact, is the only estimation procedure possible.

Studies that have used this method in calculating admixture have sometimes reported the absence of specific Caucasian markers. When only blood groups were considered, Lua and/or K were most often absent while r was almost always found. This unequal distribution of Caucasian genes has been interpreted by some as evidence of genetic drift (e.g., Allen and Corcoran, '60) or, possibly, of natural selection. However, if it is assumed that gene flow is the only microevolutionary force affecting gene frequencies, the appearance of low frequency Caucasian genes (low in the parental Caucasian population) in a hybrid sample will be a function of both sample size and magnitude of admixture. For example, when gene flow has been low (e.g., 5%) and the sample size is small (e.g., 75 persons), r and Gm3,5,11 are most likely to be the only Caucasian genes found. At the same level of admixture, but with a sample size of 300 persons, the expected number of each Caucasian gene is at least 1.0. Unfortunately, samples as large as 300 are rare in North American Indian genetic studies. Therefore, one should expect studies which report the absence of one or more low frequency Caucasian genes in the presence of r and Gm3,5,11.

Such a distribution of Caucasian alleles poses problems in the determination of the magnitude of gene flow. Foremost is that the method for the calculation of the mean amount of admixture does not allow the inclusion of 0 value $M_i$ estimates. No satisfactory approach has been presented to date that would circumvent this statistical obstacle. Our method proposes both to take account of the information provided by the absence of Caucasian markers at specific loci, and to define the upper limit of the magnitude of admixture that can be expected to have occurred given the observed data.

It is worth emphasizing that our method is most useful when the magnitude of gene flow has been small. In such situations 0 value $M_i$ estimates will occur, and low sample size serves to increase this likelihood. The method is applicable even when no Caucasian genes are observed at any locus. By way of illustration of its general utility, table 4 lists five Canadian Indian populations for which previously no statement could be made about admixture, other than that it had occurred. In each of these cases, the true amount of gene flow is probably much less than the maximum shown. Nevertheless, this value gives an indication of the greatest amount of admixture that may have affected the gene pool of each of these populations.

TABLE 4
Maximum Caucasian admixture ($M_{max}$) in several North American Indian populations¹

                                  No. of loci at which
                                  Caucasian alleles       Caucasian
Population            $M_{max}$   could occur             alleles found    Source
Pikangikum Ojibwa     0.024              7                A2               Szathmary and Reed, '72
Blood                 0.308              4                B and r          Chown and Lewis, '53
Stoney                0.339              4                r                Chown and Lewis, '55
Naskapi               0.156              4                B and r          Blumberg et al., '64
Montagnais            0.123              4                B                Blumberg et al., '64

¹ Maximum is based on the upper 95% confidence limit (see text). All calculations of $M_{max}$ are those of the authors.

LITERATURE CITED

Allen, F. H., and P. A. Corcoran 1960 Blood groups of the Penobscot Indians. Am. J. Phys. Anthrop., 18: 109-114.
Blumberg, B. S., J. R. Martin, F. H. Allen, J. L. Weiner, E. M. Vitaglioni and E. Cooke 1964 Blood groups of the Naskapi and Montagnais Indians of Schefferville, Quebec. Hum. Biol., 36: 263-272.
Cavalli-Sforza, L. L., and W. F. Bodmer 1971 The Genetics of Human Populations. W. H. Freeman and Co., San Francisco.
Chakraborty, R. 1975 Estimation of race admixture - a new method. Am. J. Phys. Anthrop., 42: 507-511.
Chown, B., and M. Lewis 1953 The ABO, MNSs, P, Rh, Lutheran, Kell, Lewis, Duffy and Kidd blood groups and the secretor status of the Blackfoot Indians of Alberta, Canada. Am. J. Phys. Anthrop., 11: 369-383.
Chown, B., and M. Lewis 1955 The blood group and secretor genes of the Stoney and Sarcee Indians of Alberta, Canada. Am. J. Phys. Anthrop., 13: 181-190.
Doeblin, T. D., and J. F. Mohn 1967 The blood groups of the Seneca Indians. Am. J. Hum. Genet., 19: 700-712.
Elston, R. C. 1971 The estimation of admixture in racial hybrids. Ann. Hum. Genet., 35: 9-17.
Krieger, H., N. E. Morton, M. P. Mi, E. Azevedo, A. Freire-Maia and N. Yasuda 1965 Racial admixture in northeastern Brazil. Ann. Hum. Genet., 29: 113-125.
Pearson, E. S., and H. O. Hartley 1962 Biometrika Tables for Statisticians. Cambridge University Press, London.
Pollitzer, W. S., R. C. Hartmann, H. Moore, R. E. Rosenfield, H. Smith, S. Hakim, P. J. Schmidt and W. C. Leyshon 1962 Blood types of the Cherokee Indians. Am. J. Phys. Anthrop., 20: 33-43.
Reed, T. E. 1969 Caucasian genes in American Negroes. Science, 165: 762-770.
Roberts, D. F., and R. W. Hiorns 1962 The dynamics of racial intermixture. Am. J. Hum. Genet., 14: 261-277.
Snedecor, G. W., and W. G. Cochran 1967 Statistical Methods. Sixth ed. Iowa State University, Ames, Iowa.
Szathmary, E. J. E., and T. E. Reed 1972 Caucasian admixture in two Ojibwa Indian communities in Ontario. Hum. Biol., 44: 655-671.