AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 70:63-68 (1986)

Primate Population Structure: Evaluation of Models

HENRY HARPENDING AND SUZANNE COWAN
Department of Anthropology, Pennsylvania State University, University Park, Pennsylvania 16802 (H.H.); Department of Anthropology, University of New Mexico, Albuquerque, New Mexico 87131 (S.C.)

KEY WORDS Migration, Effective size, Gene frequencies, Breeding structure

ABSTRACT Genetic markers among macaques on Cayo Santiago island were analyzed in an attempt to infer aspects of mating structure. Several models that included high levels of gene flow among groups could not be distinguished, but the data are clearly incompatible with group endogamy and with high variance in male fitness. Drift effective size is approximately one half census size in this population.

There are a large number of very detailed studies of the local distribution of genetic markers in human populations as well as in populations of other species (for review, see Jorde, 1980). Since relatively inexpensive methods have become available for typing genetic markers, the amount of such data appearing in the literature seems to have increased. The demographic and ethological information accompanying these data varies widely in quality, with the best information being provided about human populations, where there are often high quality records of births, matings, and migration.

Three sorts of rationale appear in the literature for doing this kind of work. The first is from those who seek the signal of natural selection amid the noise of local genetic drift. This effort has not produced much convincing evidence of natural selection, especially at the level of differences among local populations, largely because gene frequencies and environmental variables are spatially autocorrelated and statistical inference is very difficult and weak for such variables.
A second theme in the literature is to use the distribution of markers to reconstruct population history (for review, see Felsenstein, 1982). While molecular taxonomy and dating have been very valuable for studying taxa above the species level over long (i.e., evolutionary) time periods, they have not been very convincing for intraspecific work because time, effective population size, and migration rates are inevitably confounded.

Finally, there is the effort to use local markers to measure aspects of population ethology. The hope here is that markers will provide objective chemical evidence of social processes and that inference about these processes can be made by examining the distribution of chemical markers over space and over social groups.

In this article we pursue the third tradition of using chemical markers as social indicators. As Felsenstein (1982) remarks in his review article, human geneticists have always had higher quality data than the zoologists have had, since people can talk, and their methods have been somewhat more sophisticated. We propose here to apply some current methods from human genetics to data that have been collected about a free-ranging primate population. Our interest is to see whether contemporary population genetic theory can inform us about aspects of the population structure of these animals that are not already well known.

We analyze the data on gene frequencies in rhesus macaque troops on Cayo Santiago island reported by McMillan and Duggleby (1981) and by Duggleby (1978). These data are excellent for our purposes since they are a relatively complete sample of individuals from a delimited population in which the social structure is well known. If current models in population genetics are to be really useful for inference about feral mammals, then they should at least be able to provide meaningful information about this simple population.

© 1986 ALAN R. LISS, INC.
Received April 9, 1984; revision accepted November 22, 1985.

CAYO SANTIAGO LINEAGES

A population of rhesus macaques was established on the Caribbean island of Cayo Santiago in 1938, and it has been under more or less continuous scientific scrutiny since that time. In the late 1970s there were approximately 300 animals on the island in four troops, labeled F, I, J, and L in the literature. These troops remain after several others were removed in 1972. Gene frequency data for these four troops are available in Duggleby (1978; eight blood group loci) and in Buettner-Janusch and Sockol (1977; transferrin variants). The differences among these troops have been analyzed previously by O'Rourke and Bach Enciso (1982), who apparently could detect a correspondence between differences among the groups and their relative locations on the island.

Our purpose here is to use these excellent data about semifree-ranging primates to evaluate three different models of population breeding structure, to see whether population genetic data about feral mammals can give us useful insight into local mating behavior. Our conclusions are essentially that these very high quality data are entirely compatible with all three models and that the genetic data cannot distinguish among them. The three models all incorporate very high levels of gene flow among troops. The genetic marker data are not compatible with either troop endogamy or with the low effective size that would result from a very few males fathering most of the offspring each generation.

All three models that we fit to the data are special cases of the migration matrix model of Smith (1969) and Bodmer and Cavalli-Sforza (1968). In this model, genetic drift at reproduction in each generation disperses local gene frequencies away from the array mean, while migration among groups homogenizes local gene frequencies.
These two processes reach an equilibrium at which drift is balanced by local gene flow. At such an equilibrium, the dispersion in gene frequencies among newborns will be greater than the dispersion of gene frequencies in adults sampled after migration. We will call the theoretical distribution of gene frequencies in newborns the child model and the distribution in parents the adult model. Since in Duggleby's data animals are classified according to their troop of birth, the child model rather than the adult model is appropriate.

We assume that the relative sizes of the groups have been nearly constant for several generations and that some fairly stable pattern of gene flow among the troops exists. (In fact, we are considering models of high levels of gene flow among troops, so these assumptions are not really necessary. As an extreme case, recall that Hardy-Weinberg binomial proportions follow from one generation of random mating.) In particular, gene flow among troops is described by a migration matrix where the entry in row i and column j is the frequency among immigrants to the jth group each generation of those born in group i. This is a so-called backwards matrix, and each column sums to unity. Given this, Rogers and Harpending (1985) show that the normalized weighted gene frequency covariance matrix among the adult members of the groups (that is, after migration has occurred) is

$$R^{(a)} = V B^{(a)} V^{t},$$

where $B^{(a)}$ (a referring to adults) is diagonal with

[...] (1)

and

[...]

In this expression, the matrix $V$ is the matrix of left eigenvectors of the migration matrix, $N$ is the total genetic effective size of the array of subpopulations, and the $\lambda_i$ are the second through the last eigenvalues of the migration matrix. (The first eigenvalue of the migration matrix is unity, and the eigenvector corresponding to the leading eigenvalue gives the relative group sizes, which we assume to be constant.)
The quantity $R_0$ is a convenient overall measure of genetic differentiation within the array of subpopulations, equivalent to Wright's $F_{st}$. Rogers and Harpending (1983) show that this is given by the sum of the diagonal entries $B^{(a)}_{ii}$ of the matrix $B^{(a)}$. Notice that this formulation takes into account the variance generated by the random sampling of migrants among groups, so it is an elaboration of and a correction to the model given by Harpending and Ward (1982) and by Rogers and Harpending (1983), who neglected this source of variance. Incorporating the randomness of genes in migrants yields the curious result that the adult model in this report is identical to the child model of Rogers and Harpending (1983): The stochasticity of gene frequencies in migrants has an effect equivalent to the increment in drift from one generation of reproduction.

The data from Cayo Santiago, on the other hand, require a slightly different model, since individuals are classified according to the troop in which they were born rather than the troop with which they live. This is equivalent to sampling newborn individuals in troops rather than sampling adults. Rogers and Harpending (1985) show that the expected normalized covariance matrix for the child model is

$$R^{(c)} = V B^{(c)} V^{t},$$

where $B^{(c)}$ (c referring to children) is a diagonal matrix with

$$B^{(c)}_{ii} = \frac{(1 - R_0)(2 - \lambda_i^2)}{2N(1 - \lambda_i^2)} \qquad (i = 2, \ldots) \tag{2}$$

and

[...]

Note that $R_0$ measures overall differentiation in this case in the same way as before, except that the differentiation is now that either among newborns or among individuals classified by place of birth. It is equal to the trace of $B^{(c)}$. The diagonal matrix $B$ is not quite a matrix of eigenvalues of the $R$ matrix, but of a related matrix that incorporates relative population sizes (see Harpending and Ward, 1982).

Given this general formulation, we now derive parameters for three possible mating situations: random mating, random male migration, and random male exogamy.

By random mating we mean the random allocation of parents; that is, if parents form groups and mate completely at random, then gene frequency variation among the offspring of these groups will be greater (by one generation of drift) than the variation in their parents. Thus, we cannot naively "test" for random mating by using a chi-square statistic unless we are sampling adults after migration. But in these data individuals are labeled by their birth troop, so the child rather than the adult model is appropriate.

If adults are allocated at random, the corresponding migration matrix is just a matrix in which each row is the relative size of the group corresponding to that row. For Cayo Santiago, where the four troops are of approximate size 150:50:50:50, the random migration matrix is

$$\begin{pmatrix} .5 & .5 & .5 & .5 \\ .17 & .17 & .17 & .17 \\ .17 & .17 & .17 & .17 \\ .17 & .17 & .17 & .17 \end{pmatrix}$$

This matrix has a leading eigenvalue of 1 and three eigenvalues of zero. Substituting $\lambda = 0$ in equation 2 and summing over the three zero eigenvalues gives $R_0 = 3(1 - R_0)/N$, which yields

$$R_0 = \frac{3}{N + 3}.$$

Random mating, then, yields the observed value $R_0 = .0216$ if the total effective size of the island population is 136. This is slightly less than one half the census size and is an entirely reasonable estimate of total effective size.

Under the random male migration model, half of the genes are completely endogamous (i.e., those in females) and their migration matrix is just the identity matrix. The matrix for the other half of the gene pool is one in which each row is composed of identical entries, the relative size ("weight" $w_i$) of the group corresponding to that row. This is the random mating model we just used. We take the actual migration matrix of our model to be the average of these two. Given troop sizes of approximately 150:50:50:50, the model migration matrix is the average of (representing the endogamous females)

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and (representing random male movement)

$$\begin{pmatrix} .50 & .50 & .50 & .50 \\ .17 & .17 & .17 & .17 \\ .17 & .17 & .17 & .17 \\ .17 & .17 & .17 & .17 \end{pmatrix}$$

which is

$$\begin{pmatrix} .75 & .25 & .25 & .25 \\ .08 & .58 & .08 & .08 \\ .08 & .08 & .58 & .08 \\ .08 & .08 & .08 & .58 \end{pmatrix}$$

Since this is a totally connected probability transition matrix, it has a unique leading eigenvalue of 1 and three more eigenvalues each equal to .5. Therefore, substituting into equation 2 above, the predicted $B_{ii}$ for i = 2, 3, 4 are

$$B_{ii} = \frac{(1.75)(1 - R_0)}{2(.75)N} \tag{3}$$

The observed value of $R_0$ is .0216, and this would equal the prediction from this model if the total effective size on the island were 159, which is slightly more than one half of the census size of 300. This may be about the right ratio of effective to actual size for mammals like these, so the fit of the macaque data to the random male migration model is excellent.

Finally, we may evaluate a model called random male exogamy. The assumption of this model is that males mate at random among any troops except for their natal troop, where they are forbidden to mate. To construct the migration matrix for this model we assume that females are endogamous, while for males the frequency of migrants from group j into i is proportional to the size of j for all j except i, for which the probability is zero. (With this indexing each row, rather than each column, sums to unity; the matrices below are the transpose of the convention used above and have the same eigenvalues.) For the Cayo Santiago situation we must average the female matrix

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and the male matrix

$$\begin{pmatrix} 0 & .33 & .33 & .33 \\ .6 & 0 & .2 & .2 \\ .6 & .2 & 0 & .2 \\ .6 & .2 & .2 & 0 \end{pmatrix}$$

to yield

$$\begin{pmatrix} .5 & .17 & .17 & .17 \\ .3 & .5 & .1 & .1 \\ .3 & .1 & .5 & .1 \\ .3 & .1 & .1 & .5 \end{pmatrix} \tag{4}$$

which is our random male exogamy migration matrix.

There now occurs an interesting problem. The vector 3:1:1:1 is not a stable vector of this matrix. In other words, this pattern of migration does not lead to the observed distribution of troop sizes.
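The instability of the 3:1:1:1 vector can be checked numerically. The following is a minimal sketch, not the authors' computation: it rebuilds matrix 4 from troop sizes of exactly 150:50:50:50 and applies it once to the size vector. It assumes that "stable" means the size vector is reproduced when multiplied through the matrix (with rows indexing the receiving troop), and the function names `exogamy_matrix` and `step` are illustrative only.

```python
def exogamy_matrix(sizes):
    """Row i gives the sources of genes in troop i: half from its own
    (endogamous) females, half from males born in the other troops,
    in proportion to those troops' sizes."""
    n = len(sizes)
    rows = []
    for i in range(n):
        outside = sum(sizes) - sizes[i]  # animals born outside troop i
        row = []
        for j in range(n):
            female = 1.0 if i == j else 0.0
            male = 0.0 if i == j else sizes[j] / outside
            row.append(0.5 * female + 0.5 * male)
        rows.append(row)
    return rows


def step(w, m):
    """Apply the matrix once to a weight vector: w'_j = sum_i w_i * m[i][j]."""
    n = len(w)
    return [sum(w[i] * m[i][j] for i in range(n)) for j in range(n)]


sizes = [150, 50, 50, 50]
m = exogamy_matrix(sizes)

# Start from the observed relative troop sizes (3:1:1:1).
w = [s / sum(sizes) for s in sizes]
after_one = step(w, m)

# Iterate to find the vector this matrix actually leaves unchanged.
stable = w
for _ in range(200):
    stable = step(stable, m)

print([round(x, 3) for x in after_one])
print([round(x, 3) for x in stable])
```

Under these assumptions the weight of the large troop falls from .5 to .4 after one application, and the vector that the matrix leaves unchanged gives the large troop a weight of .375, so 3:1:1:1 is not reproduced.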
The reason is clear: If male mating within the natal troop does not occur, then there is very strong selection against males from the large troop. This means that there is also selection against the mothers in large troops through their sons, and hence strong selection for fission because of mate competition induced by the mating rule.

For the sake of clarity, assume that the troop sizes are exactly 150:50:50:50 and that exactly half of each generation is males and half females. Then a male born to the large troop has a pool of 75 females (i.e., the females in the three small troops) that are potential mates. He has 74 (his troop mates) plus 50 males (from the two other small troops) or 124 competitors for the 75 potential mates he has available. Now consider a male from a small troop. He has 125 potential mates, rather than 75, and there are 124 competitors for 50 of them and 74 competitors for 75 of them. The result of all this is that mate availability is 75/124 for males from the large troop and 125/95 for males from small troops. The mating rule gives males from the small troops roughly a two to one selective advantage over males from the large ones. This amounts to very strong selection in favor of troop fission. This mechanical selection imposed by the mating rule also explains why the random male exogamy model does not lead by itself to the stable troop size distribution built into the model.

We will proceed anyway, assuming for the moment that the necessary amount of selection does occur each generation. The (computed) eigenvalues of matrix 4 are 1, .4, .4, and .2, and substituting these values into equation 2 leads to an estimate of effective size of 149, which is intermediate between the estimates generated by the random allocation of parents model and the random male migration model. All three estimates of effective size are very similar and reasonable, and we have no basis for distinguishing among these biologically different models.
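The three effective size estimates all come from the same substitution: set $R_0$ equal to the sum of the $B_{ii}$ in equation 2 over the non-leading eigenvalues and solve for $N$. The sketch below, a hedged illustration rather than the authors' own procedure, does this for the eigenvalues quoted in the text with the observed $R_0 = .0216$. The function name `effective_size` is an assumption made for this example.

```python
def effective_size(r0, eigenvalues):
    """Solve R0 = sum_i (1 - R0)(2 - lam_i^2) / (2 N (1 - lam_i^2)) for N,
    where the sum runs over the non-leading eigenvalues of the migration matrix."""
    total = sum((2 - lam ** 2) / (2 * (1 - lam ** 2)) for lam in eigenvalues)
    return (1 - r0) * total / r0


R0 = 0.0216  # observed differentiation among the four troops

# Non-leading eigenvalues quoted in the text for each model.
models = {
    "random mating": [0.0, 0.0, 0.0],
    "random male migration": [0.5, 0.5, 0.5],
    "random male exogamy": [0.4, 0.4, 0.2],
}

estimates = {name: effective_size(R0, lams) for name, lams in models.items()}
for name, n_e in estimates.items():
    print(f"{name}: N = {n_e:.1f}")
```

With these inputs the first two models give about 136 and 159, matching the text. The exogamy eigenvalues give about 145 under this sketch, slightly below the 149 reported; the difference may reflect rounding or a detail of the original calculation that cannot be recovered here, and it does not change the ordering of the three estimates.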
All three models we evaluated are models with a great deal of gene flow among groups. It is clear that this is the state on this island and that there is very little or no endogamy, population structure, or restriction to gene flow among the groups. Further, all three estimates of effective size are substantially higher than the rough and ready estimate of one-third census size often used in human population studies. There is certainly not as much polygyny among these monkeys as some models of social dominance and restricted sexual access to females might imply. If a few males were fathering most of each generation, the effective size would be much lower.

We were somewhat surprised to find such a consistently high estimate of effective size from these data and from McMillan and Duggleby's other set (1981) (see below) since the figure of one-third is widely accepted as a standard in anthropological genetics. In this regard it is worth noting that a recent very careful analysis of effective size in the Gainj of New Guinea (Wood, 1985) reveals that the best estimate of their effective size is about one-half census size, that is, of the same order of magnitude as we find for the macaques. Wood's analysis is based on demographic and life-history data, while ours is based on genetic markers.

There is evidence that this open exogamous mating system may have changed since the capture and removal of a number of troops from the island in 1972. McMillan and Duggleby (1981) have subdivided the Cayo Santiago population into lineages (pooled into troops in the previous discussion) and reported gene frequency by lineage for both 1972, immediately following depopulation of the island by capture, and for 1976, after 4 years of rapid population growth. In every case except one, each lineage in 1976 is more heterozygous and closer to the overall island gene frequency centroid than it was in 1972.
This implies that there might have been more gene exchange among these groups, which both homogenizes gene frequencies and raises heterozygosity within groups. This would lead to the rather startling convergence of gene frequencies, distances, and overall heterozygosities to the overall mean. It will be interesting to see whether endogamy increases as population density on Cayo Santiago island recovers from the capture.

McMillan and Duggleby (1981) suggest that this convergence of lineage gene frequencies may be due simply to the larger sizes of each lineage in the later sample. The total size was 255 in 1972 and 441 in 1976. We know from equation 1 that $R_0$ is almost proportional to the inverse of total effective size. Estimates, from Duggleby's biochemical data (1978), of $R_0$ from the lineages before and after the four years of population growth are .124 and .068; these are in the ratio of .068/.124 ≈ .55, which is close to the ratio of census sizes 255/441 ≈ .58. Duggleby's suggestion that the differences in genetic differentiation in the two samples simply reflect population size is likely to be correct.

On the other hand, these data are not incompatible with random mating, and there is no need to invoke lineage-specific male mating as she does. Under the random allocation of parents model, for example, equation 2 is

$$\frac{2(1 - .124)(14)}{2N} = .124,$$

since there are 15 lineages and thus 14 zero eigenvalues under random allocation of parents. This provides an effective size estimate of about 100, approximately 40% of census. Recall that when troops were the unit of study the estimate of effective size was about 50% of the census size. It is reassuring that these two agree as they do: Note that they are both somewhat greater than the estimate of one-third often used by human geneticists. Since the fraction of a primate troop that is capable of reproduction is larger than the fraction of a human population in its reproductive years, this estimate is probably quite good.
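The lineage arithmetic in the preceding paragraphs can be reproduced in a few lines. This is a minimal sketch using only the figures quoted in the text; the variable names are illustrative, and the effective size is obtained by solving equation 2 with all 14 non-leading eigenvalues set to zero.

```python
# Differentiation among lineages and census sizes quoted in the text.
r0_1972, r0_1976 = 0.124, 0.068
census_1972, census_1976 = 255, 441

# If R0 is roughly proportional to 1/N, these two ratios should be similar.
r0_ratio = r0_1976 / r0_1972
census_ratio = census_1972 / census_1976

# Random allocation of parents among 15 lineages: 14 zero eigenvalues,
# so equation 2 reduces to 14 * (1 - R0) / N = R0.
n_lineages = 15
n_e = (n_lineages - 1) * (1 - r0_1972) / r0_1972
fraction_of_census = n_e / census_1972

print(f"R0 ratio       = {r0_ratio:.2f}")
print(f"census ratio   = {census_ratio:.2f}")
print(f"effective size = {n_e:.0f} ({fraction_of_census:.0%} of census)")
```

The two ratios come out at about .55 and .58, and the effective size at about 99, or roughly 39% of the 1972 census, in line with the rounded figures of 100 and 40% given above.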
It seems clear that, from the viewpoint of method, we need to know more about effective size. There is a rich elaborate literature about estimating effective size from demographic parameters, but this has been of limited use in anthropology because there was little to be done with the estimates generated by these methods. But now the rest of the required theory is catching up, and good demographic inference about effective breeding size would make the migration theory very powerful.

The gene frequency data could not distinguish among three models of mating structure. However, these were all models that were variants of random mating, at least for one sex. The data are not compatible with more than minimal endogamy or restricted mating structure on this island or with high differential male reproductive success.

ACKNOWLEDGMENTS

Richard Ward called our attention to the changes in the Cayo Santiago genetic structure between 1972 and 1976. We have benefitted from comments and advice from Eric Devor, Jeffrey Froelich, Alan Rogers, Lisa Sattenspiel, and Jim Wood.

LITERATURE CITED

Bodmer, W, and Cavalli-Sforza, LL (1968) A migration matrix model for the study of random genetic drift. Genetics 59:565-592.
Buettner-Janusch, J, and Sockol, M (1977) Genetic studies of free ranging macaques of Cayo Santiago. Am. J. Phys. Anthropol. 47:371-374.
Cavalli-Sforza, L (1969) Genetics of Human Populations. Tokyo: Proc. XII Int. Cong. Genetics, pp. 405-417.
Duggleby, C (1978) Blood group antigens and the population genetics of Macaca mulatta on Cayo Santiago. Am. J. Phys. Anthropol. 48:35-40.
Felsenstein, J (1982) How can we infer geography and history from gene frequencies? J. Theor. Biol. 96:9-20.
Harpending, H, and Jenkins, T (1974) Kung population structure. In JF Crow and C Denniston (eds): Genetic Distance. New York: Plenum Press, pp. 137-161.
Harpending, H, and Ward, R (1982) Chemical systematics and human populations. In M Nitecki (ed): Biochemical Aspects of Evolutionary Biology. Chicago: University of Chicago Press, pp. 213-256.
Jorde, L (1980) The genetic structure of subdivided human populations: A review. In J Mielke and M Crawford (eds): Current Developments in Anthropological Genetics, Vol. I. New York: Plenum, pp. 135-208.
McMillan, C, and Duggleby, C (1981) Interlineage genetic differentiation among rhesus macaques on Cayo Santiago. Am. J. Phys. Anthropol. 56:305-312.
O'Rourke, D, and Bach Enciso, V (1982) Primate social organization, ecology, and genetic organization. In M Crawford and J Mielke (eds): Current Developments in Anthropological Genetics, Vol. II. New York: Plenum, pp. 1-28.
Rogers, A, and Harpending, H (1983) Population structure and quantitative characters. Genetics 105:985-1002.
Rogers, A, and Harpending, H (1985) Obstacles to inference in population structure studies. Submitted for publication.
Smith, CAB (1969) Local fluctuations in gene frequencies. Ann. Hum. Genet. 32:251-260.
Wood, J (1985) The genetic demography of the Gainj of Papua New Guinea III. Determinants of effective population size. Submitted for publication.