Degrees of freedom in interspecific allometry: An adjustment for the effects of phylogenetic constraint
AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 93:95-107 (1994)

Degrees of Freedom in Interspecific Allometry: An Adjustment for the Effects of Phylogenetic Constraint

RICHARD J. SMITH
Department of Anthropology, Washington University, St. Louis, Missouri 63130

KEY WORDS Interspecific allometry, Comparative methods, Phylogenetic constraint, Degrees of freedom, Nested analysis of variance

ABSTRACT The data used in studies of bivariate interspecific allometry usually violate the assumption of statistical independence. Although the traits of each species are commonly treated as independent, the expression of a trait among species within a genus may covary because of shared common ancestry. The same effect exists for genera within a family and so on up the phylogenetic hierarchy. Determining sample size by counting data points overestimates the effective sample size, which then leads to overestimating the degrees of freedom that should be used in calculating probabilities and confidence intervals. This results in an inflated Type 1 error rate. Although some workers (e.g., Felsenstein [1985] Am. Nat. 125:1-15) have suggested that this issue may invalidate interspecific allometry as a comparative method, a correction for the problem can be approximated with variance components from a nested analysis of variance. Variance components partition the total variation in the data set among the levels of the nested hierarchy. If the variance component for each nested level is weighted by the number of groups at that level, the sum of these values is an estimate of an effective sample size for the data set which reflects the effects of phylogenetic constraint. Analysis of two data sets, using taxonomy to define levels of the nested hierarchy, suggests that it has been common for published studies of interspecific allometry to severely overestimate the number of degrees of freedom.
Interspecific allometry remains an important comparative method for evaluating questions concerning individual species that are not similarly addressed by the format of most of the newer comparative methods. With the correction proposed here for estimating degrees of freedom, the major statistical weakness of the procedure is substantially reduced. © 1994 Wiley-Liss, Inc.

Received March 4, 1993; accepted September 8, 1993. Address reprint requests to Dr. Richard J. Smith, Department of Anthropology, Washington University, One Brookings Drive, St. Louis, MO 63130.

For many years, a key quantitative method for comparative analyses has used the mean value of traits for species as the basic unit of raw data. Typically, data on two traits for each of several species are transformed to logarithms and then examined by correlation and bivariate regression. This methodology is usually identified as “interspecific allometry,” particularly when one of the traits represents the general body size of the species. In recent years, workers developing new comparative statistical methods have sometimes considered interspecific allometry as one example of a group of techniques all characterized by analysis across species, and have identified either interspecific allometry or this group of methods as “the nonphylogenetic approach” (Felsenstein, 1988), “the traditional ‘equilibrium’ analysis” (Martins and Garland, 1991), “species regression” (Pagel and Harvey, 1988), or “TIPS” (referring to the fact that the data are derived from the species across the tips of the phylogeny when represented as a cladogram) (Martins and Garland, 1991).
Although interspecific allometry has been widely used since the early 1930s (Huxley, 1932) and insights from studies using it were the subject of at least six books during the 1980s (Calder, 1984; Jungers, 1985; McMahon and Bonner, 1983; Peters, 1983; Reiss, 1989; Schmidt-Nielsen, 1984), in recent years interspecific allometry has fallen into great disfavor with some authors. Harvey and Pagel (1991) have suggested that some of the assumptions underlying the method “should be anathema for anyone who believes in evolution.” Gittleman and Luh (1992) declared that methods dependent on a simple correlation of traits across species are “obsolete.” The fundamental flaw with interspecific allometry that has led to these views is straightforward: since species may share similarities in traits because of common ancestry, as well as because of convergent and parallel evolution, the species used in the regression equation may not be independent examples of the relationship between traits (Pagel and Harvey, 1988). Simulation studies (Grafen, 1989; Martins and Garland, 1991) have confirmed the expected problem: the number of species overestimates the degrees of freedom of the equation, resulting in excessive Type 1 errors, underestimated standard errors, and underestimated confidence intervals. Since this problem was first discussed by Clutton-Brock and Harvey (1977), many new comparative methods have been developed that better meet the statistical implications of the fact that there are similarities in traits among species due to shared common ancestry. These methods are sometimes explicitly presented as replacing interspecific allometry as a comparative method. For example, in the seminal paper from which several variants of new comparative methods have been derived, Felsenstein (1985) begins by listing six papers that are examples of studies using methods that his method is designed to replace.
Four of the six (Armstrong, 1983; Damuth, 1981; Martin, 1981; Pilbeam and Gould, 1974) are textbook examples of interspecific allometry. Riska (1991) suggests that “analyses incorporating phylogenetic information are probably to be preferred over the traditional allometric regressions,” and Grafen (1989) and Martins and Garland (1991) directly contrast the Type 1 error rates of simple correlations across species with the newer comparative methods.

The objectives of this work are twofold. First, I will argue that interspecific allometry is not obsolete, and for the foreseeable future will not be. The interspecific allometric equation, using phylogenetically related species as individual data points, remains an important method for addressing questions that have not been addressed by the newer comparative methods. Second, the problem with interspecific allometry (an inflated number of degrees of freedom) does not require discarding the method. What is required is an estimate of degrees of freedom for each data set that reflects the extent to which phylogenetic constraints limit variation in the traits under study. A simple method for doing so is presented here. This adjustment is also applicable to those newer comparative methods that average values of traits for several species in order to evaluate trends across higher nodes of the phylogenetic hierarchy. It is emphasized that this study is not a critique of the new comparative methods, many of which are important and should be used in preference to interspecific allometry in many circumstances.

INTERSPECIFIC ALLOMETRY AS A COMPARATIVE METHOD

Proposed alternatives

Considerable attention has been focused on a comparison between nonphylogenetic correlations across species and new comparative methods insofar as they are able to document independent instances of correlated evolution between two traits.
Some of these comparisons have been made using simulation studies (Grafen, 1989; Martins and Garland, 1991), and they demonstrate unequivocally the inflated Type 1 error rate of the type of data sets used for interspecific allometric equations. However, the fact of correlated evolution, which is dependent upon the calculation of statistical probability, is a different issue than that of describing the pattern or form of correlated evolution. Harvey and Pagel (1991), in introducing the organization of their recent book, state (their italics): “The sixth chapter deals not with whether characters have evolved together, but with the way in which they show correlated evolution.” This sixth chapter concerns allometry, and includes two of the new comparative methods that appear to meet the goal of replacing interspecific allometry for describing the pattern of evolution between traits. The many other new comparative methods that are preferable to interspecific allometry for documenting the presence of correlated evolution are not the issue here.

The first new method is a generalization of the solution proposed by Clutton-Brock and Harvey (1977). In the “nested ANOVA/allometry method” (Harvey and Pagel, 1991; Pagel and Harvey, 1988; Stearns, 1992), a nested analysis of variance is calculated for each trait to determine a taxonomic level at which a pragmatic balance can be struck between the independence of data points and the number of data points that will be left for use in the regression.
Determination of the taxonomic levels that account for most of the variation can be used to select a single level at which it is most reasonable to consider the data points as “independent.” In their original 1977 examples, Clutton-Brock and Harvey proposed that using genera as independent data points by averaging the values for all species within each genus substantially improved the validity of the degrees of freedom used to evaluate the allometric relationships under consideration in that study. Clearly, and as always recognized by all who have used it, the method reduces the nonindependence problem, but does not eliminate it. Genera within families may share phylogenetic constraints, as may families within superfamilies, and so on for any level of the taxonomy within the next higher level.

The second new method is described at length in Harvey and Pagel (1991) as the “allometry based on independent comparisons method.” They demonstrate that the paired contrast values derived by Felsenstein’s (1985) method can be used as data points in an allometric regression equation, and that the slope derived from plotting contrast scores of one variable against contrast scores of a second variable will estimate the same allometric slope as the individual species data, with the benefit of statistical independence between each set of contrast scores. Thus, putting aside the problem of unresolved phylogenies, the analysis proceeds by identifying for every species in the data set the phylogenetically most closely related other species. These two species are paired, and the raw data used in the regression equations are the differences between each set of paired species.
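The pairing step described above can be illustrated with a minimal Python sketch. It is a simplification and not Felsenstein's full procedure: it uses only differences between paired species, with no branch-length standardization and no contrasts at deeper nodes. The function name, the input layout, and the trait values are hypothetical. The regression is forced through the origin because the sign of each difference depends only on which species of a pair is subtracted from the other.

```python
import math

def sister_pair_slope(pairs):
    """Slope of a through-the-origin regression on paired log differences.

    pairs: list of ((x_a, y_a), (x_b, y_b)), the raw trait values of two
    species treated as each other's closest relatives (hypothetical input).
    """
    # Differences in log-transformed traits are the "raw data" of the analysis.
    dx = [math.log(xa) - math.log(xb) for (xa, _), (xb, _) in pairs]
    dy = [math.log(ya) - math.log(yb) for (_, ya), (_, yb) in pairs]
    sxx = sum(d * d for d in dx)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sxx

def power_law(x, coefficient=0.1, exponent=0.75):
    # Invented allometric relationship used only to build example data.
    return coefficient * x ** exponent

# Three hypothetical sister pairs, given as (body size A, body size B).
sizes = [(2.0, 3.5), (10.0, 14.0), (40.0, 65.0)]
pairs = [((a, power_law(a)), (b, power_law(b))) for a, b in sizes]
slope = sister_pair_slope(pairs)  # recovers the exponent used above
```

Because the example data follow an exact power law, the slope of the differences equals the allometric exponent; the identity of each species, however, no longer appears anywhere in the fitted data.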
Limitations of the alternatives

Although Harvey and Pagel (1991) illustrate the application of the “nested ANOVA/allometry method” and the “allometry based on independent comparisons method” for describing the form of a relationship between two traits, it does not follow that these methods replace all applications of the traditional interspecific allometric equation as a comparative method. Comparative biology is a vast discipline with many objectives (Bock, 1989). Establishing the fact of correlated evolution between two or more traits, or the pattern of that evolution, does not exhaust all possibilities. Two simple examples will be given here of other comparative questions that have traditionally depended upon interspecific allometry but that are not addressed by these new methods. These are counterexamples to the proposition that interspecific allometry is obsolete.

With both of the proposed alternatives to interspecific allometry, the values of traits in individual species are lost to evaluation and interpretation. In the nested ANOVA/allometry approach, several species are averaged until some higher level is reached. In the independent comparisons approach, differences between two species become the raw data. These differences are specific to the particular pair, i.e., the difference value changes if one species is held constant but the other one changes. These types of data are unsatisfactory for the many problems within comparative biology for which the species is the question. Some critics of interspecific allometry appear to view the species selected for a comparative study exclusively in terms of the utility of the species for addressing issues about the adaptation or evolution of traits (e.g., Huey, 1987). There are, however, many comparative biologists with a long-term interest in understanding a few particular species, whatever the difficulties may be.
For example, some comparative biologists may not be interested in documenting whether or not there is a relationship between brain size and lifespan, but simply in whether or not a specific species has a large brain relative to its lifespan. In anticipation of one line of criticism, it is important to recognize that questions about the status of traits in individual species do not necessarily assume that the interspecific allometric equation can reveal why a species has a relatively large brain, or what a relatively large brain is an adaptation for. As Gittleman (1989) has noted, “Comparative analyses are not always meant to infer causation.” Once features are characterized (e.g., once relative brain size for a species has been documented), other methods may be used to understand why this might be so or how the trait might be used. The central role of adaptation in evolutionary biology (Coddington, 1988) does not exclude the possibility of other important questions (Bock, 1989). The traits of animals, whether adaptations or exaptations, primitive or derived, parallelisms or convergences, homologies or homoplasies, are of interest for the biological role (Bock and von Wahlert, 1965) or current utility (Harvey and Pagel, 1991) that they serve in the life of the animal. Ridley (1991) has commented on the awkward consequences of defining traits as adaptations only when they first evolve, as is common in discussions concerning the objectives of the new comparative methods. He points out that this definition leads to defining the possession of eyes in humans as a constraint, not an adaptation. It is obvious that those interested in the biological role of a structure might find this concept of adaptation of little relevance. Anthony and Kay (1993) note that the maintenance of a trait in descendant lineages is useful evidence of biologically interesting stabilizing selection.
A second example of a question for which the new comparative methods cannot yet substitute for interspecific allometry concerns the use of the equation in some types of paleontological inference. It is common to use fragmentary fossil remains to estimate body mass of extinct species. The widely accepted method of choice (e.g., Damuth and MacFadden, 1990) involves generating an interspecific allometric equation using modern species in which some anatomical feature also available on fossil specimens is the independent variable and body mass of the extant species is the dependent variable. The interspecific allometric equation generated with extant species is then used to predict the dependent variable in the extinct species. The objective is to predict the value for an individual species, and the equation is constructed with individual species. Furthermore, it is usually considered desirable to take advantage of phylogenetic constraint in generating the equation and to use species closely related to the fossil species of interest. For example, the relationship between tooth size and body mass may differ substantially in carnivores and primates. An equation based on modern primates would be irrelevant for prediction of a fossil carnivore, and vice versa. Therefore, equations are often limited to a group of the most closely related species possible.

It follows from the preceding examples that a comparative method that allows for the retention of the identity of individual species and their trait values has a place in comparative biology. Interspecific allometry is useful, but statistically flawed. It should be noted that some of the new comparative methods can also produce estimates of values for individual species, although I am not aware of any attempts to use them for that purpose. Stearns’s (1983) phylogenetic subtraction method, the autocorrelation analysis proposed by Cheverud et al. (1985) and modified by Gittleman and Kot (1990), and Lynch’s (1991) mixed model or maximum likelihood approach, all allow for the calculation of values for individual species. In these methods, the values for species would represent trait values after the removal of the portion of the trait attributed to phylogenetic constraint by each method. Thus, interspecific allometry would continue to serve a different purpose in that the values of traits relative to each other represent the removal of covariation with the independent variable, usually body size.

A CORRECTION FOR THE DEGREES OF FREEDOM IN INTERSPECIFIC ALLOMETRY

The problem with using individual species as data points in an interspecific regression is that the values of individual species are only partially free to vary. The magnitude of this effect reflects the closeness of the phylogenetic relationships between species in the data set coupled with the extent to which phylogenetic inertia limits the rate with which the particular trait undergoes adaptive change. Therefore, any correction for phylogenetic constraint must be specific to the particular set of species and an individual trait.

In a widely used example of the problem presented by Felsenstein (1985), two ancestral species each give rise to twenty descendant species. Typically, an interspecific allometric equation would treat the 40 species as 40 independent observations. Felsenstein asks whether it might be more realistic to view the data set as having closer to 2 observations, with 20 highly constrained similar values in all the descendants of each of the two independent ancestors. Clearly, neither extreme estimate is true. There are not 40 independent observations, because species within each set of 20 are constrained by the ancestral value. Neither are there two observations, because within each set of 20, there is variation and descendants differ from the ancestral value. If we consider each species as some fraction of a free observation, varying between 0 and 1.0, a value could be computed between 2 and 40 that would reflect the balance between constraint and independent evolution. This value is defined as the effective sample size (effective N) for the data set and trait, as opposed to the traditionally used observed sample size (observed N). In Felsenstein’s example (1985), a set of species highly constrained to the ancestral values could approach an effective sample size of 2, but could not possibly be lower. A set of species without phylogenetic constraint could approach an effective sample size of 40, but could not possibly be higher.

The nested analysis of variance has been used often since initially proposed by Clutton-Brock and Harvey (1977) to partition the variation in a trait to levels of a nested taxonomy. It is now well-established and accepted as an effective method for showing the taxonomic level at which phylogenetic correlation occurs (Gittleman and Luh, 1992). The method for quantifying taxonomic effects from a nested ANOVA involves using the observed and expected mean squares at each level to calculate variance components for each level. These variance components are then expressed as percentage variance components by summing them for all levels and expressing each level as a percentage of the total (e.g., Sokal and Rohlf, 1981; Bell, 1989). The percentage variance components sum to 100% and partition the total variation between subjects (i.e., species) to levels of the taxonomic hierarchy.

In the method proposed here, the percentage variance components are used to estimate the effects of phylogenetic constraint on the freedom of species to vary by weighting the number of groups at each level of the taxonomic hierarchy according to the percentage of the total variation explained by that level of the nested analysis of variance. For example, in a nested ANOVA of a trait with no phylogenetic constraint, virtually all variation would be assigned to the species level, and the percentage variance component for that level would approach 100%. Multiplying the number of species by 1.0 would result in an effective sample size equal to the number of species. On the other hand, in a situation such as Felsenstein’s hypothetical radiation, a large component of variation would be attributed to the level with two nodes. If all variation were due to the two higher nodes, with no variation between the 20 species in each group, the higher level would have a variance component of 1.0 and would result in a sample size of two.

Consider an example in which species are organized into superfamilies, families, and genera. With percentage variance components (PVC) from a nested ANOVA expressed in decimal form and summing to 1.0, the effective sample size would be:

effective N = (# of superfamilies)(PVC for superfamilies)
            + (# of families)(PVC for families)
            + (# of genera)(PVC for genera)
            + (# of species)(PVC for species)

Levels of the nested analysis are defined so that species within genera (the variance component for species) is defined by the error (residual) variance. This approach weights the sample size according to the sources of variation in species values. The maximum effective sample size is the number of species, and is reduced according to the extent to which higher phylogenetic relationships (which of necessity have a smaller number of groups) appear to explain variation among species.

In order to use interspecific allometry as a comparative method, the procedures described here require that two nested ANOVAs (one for each trait) are calculated as additional, separate steps after the calculation of the interspecific allometric regression.
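The weighting formula for effective N can be written as a short Python function. This is a sketch only; the function name and the input layout (one pair of group count and decimal PVC per level) are illustrative choices, and the intermediate 50/50 split is a hypothetical case rather than a result from any data set. The two limiting cases are those of Felsenstein's hypothetical radiation of two ancestors and 40 descendant species.

```python
def effective_n(levels):
    """Effective sample size from (number_of_groups, pvc) pairs.

    levels: one pair per nested level; pvc is the percentage variance
    component in decimal form, and the pvc values should sum to 1.0.
    """
    total_pvc = sum(pvc for _, pvc in levels)
    if abs(total_pvc - 1.0) > 0.01:
        raise ValueError("percentage variance components should sum to 1.0")
    # Weight the number of groups at each level by that level's share
    # of the total variation, then sum over levels.
    return sum(groups * pvc for groups, pvc in levels)

# Felsenstein's example: 2 ancestral nodes, 40 descendant species.
unconstrained = effective_n([(2, 0.0), (40, 1.0)])      # all variation among species
fully_constrained = effective_n([(2, 1.0), (40, 0.0)])  # all variation between the two nodes
# Hypothetical intermediate case: variation split evenly between levels.
intermediate = effective_n([(2, 0.5), (40, 0.5)])
```

The result is bounded below by the number of groups at the highest level and above by the number of species, which matches the limits of 2 and 40 given for the example.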
Workers wishing to use this method would calculate an allometric equation just as they always have. However, to interpret the slope and/or constant of the equation, or the residuals of individual data points, it is often desirable to make some statistical inferences about them, such as the significance of the bivariate correlation, or the confidence interval for the slope. Calculation of these statistics requires a sample size, and here is where the nested ANOVA is applied. The same species used in the allometric equation are now organized as a separate, second analysis into a nested format, and the procedure outlined earlier in this section is carried out for each variable. The output of these nested ANOVAs is not used for any comparative insights in and of itself, but solely to estimate an effective sample size for the analysis that reflects the effects of phylogenetic constraint. Thus, although several workers have proposed comparative methods that are based on a nested ANOVA structure (e.g., Bell, 1989; Harvey and Pagel, 1991; Pagel and Harvey, 1988), the purpose and use are entirely different here.

Worked examples

I have evaluated this method on two data sets. Harvey and Clutton-Brock (1985) published data on life history traits for 135 primate species. They analyzed each trait with a nested ANOVA, partitioning variation into family, subfamily, genus, and species levels. As a result of this analysis, they decided to average species values within genera, and generic values within subfamilies. This reduced their data set for each trait to 19 subfamily values, and they calculated allometric equations with these 19 data points. Thus, in spite of the fact that some variation in the trait existed between species within genera, and between genera within subfamilies, the information that might be gained from this variation was excluded before analysis began.
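The partitioning step on which both approaches rely can be sketched in Python for the simplest case. The sketch assumes a balanced design (the same number of genera in every family and the same number of species in every genus), for which the expected mean squares take a simple form; real taxonomic data are unbalanced and need the adjusted coefficients described in standard texts such as Sokal and Rohlf (1981). Setting negative component estimates to zero is an assumed convention, and the function name, data layout, and trait values are all hypothetical.

```python
from statistics import mean

def nested_pvc(data):
    """Percentage variance components (decimal form) for a balanced design.

    data[i][j][k] is the trait value of species k in genus j of family i.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    genus_means = [[mean(genus) for genus in family] for family in data]
    family_means = [mean(gms) for gms in genus_means]
    grand_mean = mean(family_means)

    # Observed mean squares at each level of the hierarchy.
    ms_family = n * b * sum((fm - grand_mean) ** 2 for fm in family_means) / (a - 1)
    ms_genus = n * sum(
        (gm - fm) ** 2
        for gms, fm in zip(genus_means, family_means)
        for gm in gms
    ) / (a * (b - 1))
    ms_species = sum(
        (x - gm) ** 2
        for family, gms in zip(data, genus_means)
        for genus, gm in zip(family, gms)
        for x in genus
    ) / (a * b * (n - 1))

    # Variance components from observed and expected mean squares;
    # negative estimates are truncated to zero (assumed convention).
    components = {
        "family": max((ms_family - ms_genus) / (n * b), 0.0),
        "genus": max((ms_genus - ms_species) / n, 0.0),
        "species": ms_species,
    }
    total = sum(components.values())
    if total == 0:
        raise ValueError("no variation in the data")
    return {level: value / total for level, value in components.items()}

# Hypothetical extremes: 2 families x 2 genera x 2 species.
constrained = nested_pvc([[[1.0, 1.0], [1.0, 1.0]], [[5.0, 5.0], [5.0, 5.0]]])
unconstrained = nested_pvc([[[1.0, 3.0], [1.0, 3.0]], [[1.0, 3.0], [1.0, 3.0]]])

# Effective N: groups at each level (2 families, 4 genera, 8 species) weighted by PVC.
groups = {"family": 2, "genus": 4, "species": 8}
eff_constrained = sum(groups[k] * constrained[k] for k in groups)
eff_unconstrained = sum(groups[k] * unconstrained[k] for k in groups)
```

In the first data set all variation lies between families, so the effective N falls to the number of families; in the second all variation lies among species within genera, so it equals the number of species.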
I repeated the nested ANOVA with two traits from the Harvey and Clutton-Brock study, female body weight and adult brain size. My estimates for the percentage variance components differ slightly from theirs, which is not unexpected given possible differences in the algorithm for calculating this measure. Using the formula for calculating an effective N and my values for variance components, the effective sample size when regression equations were generated using all available species as separate data points was 21.5 for female body weight and 17.5 for adult brain weight (Table 1, part A). In addition to confirming that Harvey and Clutton-Brock were remarkably percipient in selecting the subfamily level with an N of 19 for their analysis, the results document that phylogenetic constraint in this type of data set is very strong, and that previous studies that have determined significance or confidence intervals on the basis of observed sample size need to be reevaluated. However, in using the 19 subfamilies as the sample size from which statistical significance was calculated, Harvey and Clutton-Brock did not take into account the fact that they had lost the variance in genera and species that was present in their original nested ANOVA. As they noted, subfamilies are not independent of their families.

TABLE 1. Data from nested analysis of variance used to calculate effective sample size from observed sample size

                     Superfamily  Family  Subfamily  Genus  Species  Within species  Observed N  Effective N
Part A
  Body mass    PVC   *            66.9    19.0       10.2   3.8      *               124         21.5
               N     *            12      19         50     124      *
  Brain mass   PVC   *            80.3    11.4       6.8    1.6      *               129         17.5
               N     *            12      19         54     129      *
Part B
  Body mass    PVC   *            74.7    25.3       *      *        *               19          13.8
               N     *            12      19         *      *        *
  Brain mass   PVC   *            81.9    18.1       *      *        *               19          13.3
               N     *            12      19         *      *        *
Part C
  Body mass    PVC   34.1         36.8    15.0       8.5    5.5      *               124         21.0
               N     6            12      19         50     124      *
  Brain mass   PVC   47.0         34.9    8.9        4.7    4.5      *               129         17.0
               N     6            12      19         54     129      *
Part D
  Body mass    PVC   *            68.9    19.2       6.8    2.7      2.5             81          12.7
  Molar area   PVC   *            70.1    6.3        17.7   5.4      0.44            81          13.1
               N     *            8       13         23     42       81

PVC = Percentage variance component; N = Number of groups at the taxonomic level indicated; * = Level not used in the analysis.
Part A = Data from Harvey and Clutton-Brock (1985). Nested ANOVA calculated using the same levels as in their original study.
Part B = Data from Harvey and Clutton-Brock (1985). Data for species combined into generic means. Generic means then combined into subfamily means. Nested ANOVA calculated with subfamily values.
Part C = Data from Harvey and Clutton-Brock (1985). Nested ANOVA of species values calculated with one additional level (superfamilies) from the analysis reported in Part A.
Part D = Data from Gingerich et al. (1982). Nested ANOVA calculated on 81 data points. Taxonomy for classifying species taken from Harvey and Clutton-Brock (1985).

I repeated the calculation of variance components using a data set in which trait values for the 19 subfamilies formed the raw data (Table 1, part B). A substantial portion of the variation is attributed to the differences between families. Using the formula for calculating an effective N, the value for adult female body weight becomes (19 × .253) + (12 × .747) = 13.8. The effective N for adult brain weight, calculated in the same way, is (19 × .181) + (12 × .819) = 13.3. Thus, in treating subfamilies as independent when they clearly are not, Harvey and Clutton-Brock have also overestimated the df for their analyses. This effect should be evaluated whenever a higher-nodes approach is used, although the method presented here eliminates a major portion of the logic that justifies higher-nodes methods.

Using the Harvey and Clutton-Brock data, I also tested the method for differences in effective N when different taxonomic levels are used to define the nested levels of a data set.
The data for 135 species were evaluated by a four-level nested ANOVA, in which each species was classified into a superfamily, family, subfamily and genus (Table 1, part C). When effective Ns were calculated with these percentage variance components, the value for female body weight decreased from 21.5 to 21.0, and for adult brain weight from 17.5 to 17.0. Thus, the method will produce some differences in the estimated effects of phylogenetic constraint depending upon the taxonomic levels used in the nested ANOVA for a particular data set.

The second data set evaluated was one published by Gingerich et al. (1982) to estimate body mass of fossil primates from tooth dimensions. They used tooth size data for 43 extant primate species, and generated interspecific allometric equations in which body mass was the dependent variable. Unlike Harvey and Clutton-Brock, whose method reflected the removal of the lower levels of variance before analysis, Gingerich et al. (1982) treated mean values for males and females within species as separate data points. Thus, Gingerich et al. (1982) calculated confidence intervals for their body mass estimates assuming that the 86 data points from 43 primate species produced 86 degrees of freedom. I calculated a four-level nested ANOVA with their data for both body mass and lower first molar area, using the available 81 data points from 42 species. Families, subfamilies, genera and species defined the nested levels, and males and females within species were the residual variance. The calculations indicate that the effective N for this data set is 12.7 for body mass and 13.1 for lower molar area, rather than the 81 used in the original study (Table 1, part D). One reason for this reduction is the inflation of the apparent sample size resulting from using the two sexes as separate values.
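The effective Ns reported in Table 1 follow from the weighting formula by simple arithmetic, which the short Python check below reproduces for Parts A, B, and D. The group counts and percentage variance components are those tabulated in Table 1; the dictionary layout and function name are illustrative only. Because the tabulated PVCs are rounded to one decimal place, the recomputed values agree with the reported ones only to within rounding.

```python
# (number of groups, PVC in percent) per nested level, from Table 1.
TABLE_1 = {
    "A body mass":  [(12, 66.9), (19, 19.0), (50, 10.2), (124, 3.8)],
    "A brain mass": [(12, 80.3), (19, 11.4), (54, 6.8), (129, 1.6)],
    "B body mass":  [(12, 74.7), (19, 25.3)],
    "B brain mass": [(12, 81.9), (19, 18.1)],
    "D body mass":  [(8, 68.9), (13, 19.2), (23, 6.8), (42, 2.7), (81, 2.5)],
    "D molar area": [(8, 70.1), (13, 6.3), (23, 17.7), (42, 5.4), (81, 0.44)],
}

# Effective N as tabulated in Table 1.
REPORTED = {
    "A body mass": 21.5, "A brain mass": 17.5,
    "B body mass": 13.8, "B brain mass": 13.3,
    "D body mass": 12.7, "D molar area": 13.1,
}

def effective_n(levels):
    # Weight the group count at each level by its share of the variation.
    return sum(groups * pvc / 100.0 for groups, pvc in levels)

recomputed = {name: effective_n(levels) for name, levels in TABLE_1.items()}

# Contribution of the within-species level (males and females as separate
# data points) to the effective N for body mass in Part D.
groups_within, pvc_within = TABLE_1["D body mass"][-1]
within_species_share = groups_within * pvc_within / 100.0

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}, reported {REPORTED[name]}")
```

In Part D the 81 within-species data points add only about two units to an effective N of 12.7, even though they make up the entire observed sample size.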
Separating males and females rather than using species values accounts for only a small proportion of the variance in the nested ANOVA, and using this percentage variance component within species multiplied by the 81 data points within species contributes only a small amount to the total df.

It should be noted that neither the Harvey and Clutton-Brock (1985) nor Gingerich et al. (1982) papers were selected for evaluation because of any particular problems with sample selection or data analysis. On the contrary, both are well-done studies and include a wide diversity of primate species, so that the relationship between effective and observed sample size in these studies should be representative of, or better than, much of the rest of the primatological literature. The results pointedly confirm the fact that interspecific allometry has deficiencies as a comparative method, at the same time as they illustrate a solution to the specific problem of statistical nonindependence that has been the basis for the strongest criticism. The method for estimating an appropriate sample size described here should largely eliminate the problem of an inflated Type 1 error rate. However, it requires (unavoidably, I believe) a substantial loss of statistical power in comparison with analyses using newer comparative methods that generate a sample of independent data points equal to the number of species in the original data set.

Applications, assumptions, and limitations

Degrees of freedom and effective sample size

Most discussions of the consequences of phylogenetic constraint on the statistics of comparative methods have phrased the problem as a discrepancy between the numbers of species and the degrees of freedom (df) used in statistical calculations. A slight modification of emphasis has been used here. The df for a statistic reflects both the data set and the statistic.
When data points are independent, each one equals one df, but then depending on the statistic to be calculated, a certain number of df are subtracted. Phylogenetic constraint has an effect on only one component of the calculation of df. It results in an error in the assumption that each data point is equal to one df. It does not alter the number of df that are subtracted for the particular statistic. Therefore, in this discussion the effects of phylogenetic constraint have been defined as resulting in an effective sample size that is less than the number of data points used to calculate the regression. After an effective sample size is calculated that reflects the effects of phylogenetic constraint, the df for a statistic is determined by subtracting an appropriate number of df from the effective N, rather than from the observed N.

Combining estimates of effective sample size for two traits

The calculation of statistics for a bivariate regression equation (or a bivariate correlation coefficient) requires a single value for df, not separate values for the x and y variables. When a nested ANOVA has been used to select a higher level for grouping species, it has been routine to evaluate variance components only for the dependent (y axis) trait. For the method proposed here, a conservative approach would be to calculate the phylogenetically reduced effective sample size for both the x and y axis traits and select the smaller of two choices: (1) the effective N calculated for the y axis trait, or (2) the mean of the effective Ns for the two axes.

TABLE 2. Variance components from nested analysis of variance of residuals from interspecific allometric equations

                      Part A                Part B
Level                 PVC      N            PVC      N
Family                0        8            19.6     12
Subfamily             20.1     13           51.1     19
Genus                 54.0     23           2.0      59
Species               8.2      42           27.4     118
Within species        17.0     81           *        *
Residuals
  Eff N/Obs N         32.3/81               45.3/118
Original traits
  Eff N/Obs N         13.?/81, 12.7/81      17.5/129, 21.5/124

* = level not used in the analysis.
Part A = Residuals from least squares regression of ln lower molar area (y-axis) against ln body mass (x-axis). Nested ANOVA calculated with residuals in natural log form. Data from Gingerich et al. (1982).
Part B = Residuals from least squares regression of ln adult brain mass (y-axis) against ln body mass (x-axis). Nested ANOVA calculated with residuals in natural log form. Data from Harvey and Clutton-Brock (1985).

Effective sample size for analyses with residuals

Following calculation of the interspecific allometric regression, it is often desirable to use the residuals as data in further statistical analyses. The regression line has been described as a “criterion of subtraction” (Gould, 1975; Smith, 1984) that allows trait values to be partitioned into a component explained by correlated change with size, and a component reflecting differences that are independent of size. The effective N for analyses with residuals should also reflect the phylogenetic relationships between species. This can be approximated by proceeding with three steps after obtaining the residuals from the allometric equation. First, a new nested analysis of variance should be performed on the residuals. Since part of the phylogenetic constraint on a trait may reflect a constraint on body size that is manifested in the trait, the pattern of variance components may differ substantially between the original two traits and the residuals from a regression between them. These variance components from the nested ANOVA on the residuals should then be used to calculate an effective N for the residuals in an identical manner to the calculation of an effective N for a trait. The final step is to assign each residual a fractional value of an observation (between 0.0 and 1.0) depending upon the overall relationship between the observed N and the effective N calculated for the set of residuals.
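The arithmetic described in the preceding sections can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's own software: the function names are hypothetical, the percentage variance components (PVC) and group counts are the values as read from Table 2, Part A, and the subtraction of 2 df assumes an ordinary bivariate regression with a fitted slope and intercept.

```python
# Sketch of the effective sample size arithmetic (function names are hypothetical).
# Each level is (name, percentage variance component, number of groups at that level).
# Values as read from Table 2, Part A (residuals of molar area on body mass).
levels_part_a = [
    ("family", 0.0, 8),
    ("subfamily", 20.1, 13),
    ("genus", 54.0, 23),
    ("species", 8.2, 42),
    ("within species", 17.0, 81),
]


def effective_n(levels):
    """Weight each level's variance component by its number of groups and sum."""
    return sum(pvc / 100.0 * n_groups for _, pvc, n_groups in levels)


def combined_effective_n(eff_n_y, eff_n_x):
    """Conservative rule from the text: the smaller of the y-axis effective N
    and the mean of the effective Ns for the two axes."""
    return min(eff_n_y, (eff_n_y + eff_n_x) / 2.0)


def regression_df(eff_n, params=2):
    """df = effective N minus the df used by the statistic
    (assumption: 2 for a bivariate regression)."""
    return eff_n - params


def residual_weight(eff_n, observed_n):
    """Fraction of one observation that each residual contributes."""
    return eff_n / observed_n


eff_a = effective_n(levels_part_a)
print("effective N:", round(eff_a, 2))
print("weight per residual:", round(residual_weight(eff_a, 81), 2))
print("combined N (hypothetical y=30, x=20):", combined_effective_n(30.0, 20.0))
```

With the Part A values as read here, the weighted sum comes to about 32.2, close to the 32.3 reported in Table 2; the small difference is consistent with rounding in the tabulated percentages. The inputs to `combined_effective_n` in the last line are invented purely to show the rule.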
For example, if the nested analysis of variance of residuals leads to a calculation of an effective N of 30 for a data set in which there were 40 observations, further analyses with residuals could be interpreted by considering each residual to contribute 0.75 (30/40) degrees of freedom. Obviously, this would not adjust for differences in the density of the phylogenetic relationship in different portions of the cladogram.

I examined this procedure by calculating nested ANOVAs with residuals from two regression equations: (1) brain mass against female body mass with the data from Harvey and Clutton-Brock (1985), and (2) lower first molar area against body mass for the data from Gingerich et al. (1982). As shown in Table 2, there is a large change in the taxonomic location of constraint on residuals compared with the original traits (Table 1), and the effective N may be substantially larger for the residuals than it is with the original traits.

Random effects in nested ANOVA

One question about the method proposed here (as well as with any use of variance components from a nested ANOVA) is the extent to which a nested analysis of variance takes advantage of random variation in the data set, inflating the proportion of variance assigned to higher levels. This was evaluated by calculating percentage variance components from nested ANOVAs with randomly assigned values. Using the “shuffle” option in the random number generator of JMP 2.0.5 (SAS Institute, Inc.), I reassigned the female body mass from Harvey and Clutton-Brock (1985) and the lower molar area from Gingerich et al. (1982) to species within their data set at random. Thus, instead of using random numbers, the actual species values were used, but they were assigned randomly to the incorrect species in the data set. Ten trials of reshuffled values were calculated for each trait, and a three-level nested ANOVA was used for female body mass and a four-level nested ANOVA for molar tooth area.
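The logic of this shuffling check can be illustrated with a much simpler design than the one used in the study. The sketch below is an assumption-laden stand-in: it uses a balanced, single-level random-effects ANOVA (groups containing equal numbers of values) rather than the unbalanced three- and four-level nested ANOVAs run in JMP, the data are invented, and all function names are hypothetical. Negative variance components are set to zero before percentages are taken.

```python
import random
import statistics


def variance_components(groups):
    """Balanced one-way random-effects ANOVA.
    Returns (among-group component, within-group component)."""
    k = len(groups)          # number of groups
    n = len(groups[0])       # values per group (assumed equal)
    grand = statistics.fmean(v for g in groups for v in g)
    means = [statistics.fmean(g) for g in groups]
    ms_among = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, means) for v in g
    ) / (k * (n - 1))
    # The among-group estimate can be negative when there is no group structure.
    return (ms_among - ms_within) / n, ms_within


def percent_among(groups):
    """Percentage of variance at the higher level, negatives treated as zero."""
    among, within = variance_components(groups)
    among = max(among, 0.0)
    return 100.0 * among / (among + within)


def shuffled(groups, rng):
    """Reassign the same observed values to groups at random."""
    values = [v for g in groups for v in g]
    rng.shuffle(values)
    n = len(groups[0])
    return [values[i:i + n] for i in range(0, len(values), n)]


# Invented data: 6 groups of 5 values with strong between-group differences.
data = [[10.0 * g + 0.5 * s for s in range(5)] for g in range(6)]

rng = random.Random(0)  # fixed seed so the ten trials are repeatable
trials = [percent_among(shuffled(data, rng)) for _ in range(10)]

print("structured data, % among groups:", round(percent_among(data), 1))
print("shuffled data, median % among groups:", round(statistics.median(trials), 1))
```

In this toy case nearly all of the variance sits at the group level before shuffling and most of it moves to the lowest level afterwards, which is the qualitative pattern the shuffling trials are meant to detect.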
The results of these analyses indicate that variance components are heavily weighted to the lowest level (residual) variance in a nested ANOVA with random data, as they should be. It is difficult to interpret the results as percentage variance components for each level, since in every trial some (usually most) of the higher levels had a negative variance component. In only six of the twenty trials was the sum of variance components at all higher levels combined (excluding only the residual variance) greater than zero. Making the most conservative interpretation and treating negative variance components simply as zero variance explained, the minimum value for the percentage variance attributed to the residual level [the species level with Harvey and Clutton-Brock (1985) data and the intraspecific level with the Gingerich et al. (1982) data] was 77% and 65%, respectively, and the median values for the ten trials were 89% and 83%. Thus, the nested ANOVA may find some structure in random data for which no variance should be attributed to any higher level. Attributing variance to lower levels increases the estimate of the effective N, while variance attributed to higher levels decreases the effective N. The fact that some variance was attributed to higher levels in these trials of randomly assigned values indicates that the method proposed here tends toward a conservative estimate of the effective N for a data set.

Phylogeny, taxonomy, and cladistics

It may appear that this method involves a theoretical preference for rank-based taxonomic classifications over information concerning the cladistic relationship between species. This is not the case. The method is based first and foremost on recognition of the fact that interspecific allometry as a comparative method is fatally flawed unless some way is found to incorporate independent information on phylogeny into the statistical analysis.
The method does not require that levels of the nested hierarchy are defined by taxonomic categories. It is possible, and would be preferable, to organize the species of an allometric data set cladistically, using higher and lower levels of nodes from a cladogram to define levels of the nested ANOVA. For some simplified cladograms this is straightforward, but in general, it is often difficult (or impossible) to organize “real-life” cladograms into 3 or 4 levels and apply a nested ANOVA to them. Others who have used nested ANOVA as a comparative method (Bell, 1989; Pagel and Harvey, 1988) have also used taxonomic categories to define levels. Taxonomy is well-recognized to often “only provide crude representations of phylogenetic distance” (Gittleman, 1993) and should be used with that understanding. The key problem with using taxonomy as a representation of phylogenetics in the setting up of a nested ANOVA is the assumption that taxonomic groups are monophyletic. It should also be noted, as Miles and Dunham (1992) pointed out, that nested ANOVA models assume that lineage-specific variation in a trait is evidence for phylogenetic constraint; in other words, that trait variation correlated with phylogeny is due to phylogeny. This will result in a lower estimate of df than is necessary.

Unit of analysis

The routine use of mean values for species as the unit of data in interspecific allometry is usually in violation of the assumption of homoscedasticity (equal variances for all observations) that is required for regression analysis. Species means are rarely based on an identical number of individuals within each species, and these differences in sample size result in different standard errors for each mean.
The solution to this problem is the method of “weighted least squares” regression, which, although well-established in the statistical literature (e.g., Darlington, 1990; Draper and Smith, 1981), has not received attention in applied allometric studies. While the use of weighted least squares (LS) regression is relatively straightforward, and is readily available in statistical packages, I am unaware of any literature that clarifies the management of weighted values in nested ANOVA. Furthermore, the literature on weighted regression appears specific to least squares, and it is unclear whether or not the method is appropriate for other line-fitting criteria, such as reduced major axis or major axis.

Although the issue has been ignored in literally thousands of interspecific allometric studies to date, one solution might be to use individuals within species as the data points in interspecific allometry, with the effective sample size adjusted by the nested ANOVA method. If this is done, and individuals within species define the lowest (subordinate) level of the nested design, then the treatment of sex differences is an important consideration. There would appear to be two options for dealing with sex differences: (1) simply list each individual as a case within a species, ignoring sex, or (2) within each species, define a nested level with two groups (males and females) and list values separately by sex. The latter choice would create a fixed effect for sex within species, and the overall nested ANOVA would be a “mixed model” with the single fixed effect of sex within species and random effects at all other levels. This structure is problematical. Sokal and Rohlf (1981) suggest that it is a “crucial point” that only the highest level of a nested ANOVA be fixed, and that all subordinate levels must be random.
Although there may be computations possible for fixed effects within random effects (McKone and Lively, 1993), fixed effects in nested ANOVA remain controversial in any case (Zucker, 1990). The only approach that can be recommended is to ignore sex and list all individuals as simple cases within a species.

It should also be noted that the use of a nested ANOVA to estimate an effective N allows for yet another permutation in the selection of data points for interspecific allometry. If multiple studies of a single species allow for several independent estimates of a species mean, all of these estimates can be used, with the nested ANOVA constructed so that different estimates of a trait within single species define the lowest level of the nested hierarchy.

Sample selection

There has been some discussion of the fact that interspecific allometric equations can be calculated on a poorly selected set of species. Clutton-Brock and Harvey (1984) use the example of a study by Millar (1977) in which the 98 species supposedly representing the entire Class Mammalia included 81 species of rodents. It is routine to find studies of the primates heavily biased toward species of Old World monkeys. However, there is nothing unique about interspecific allometry in being either susceptible to or influenced by this problem. Naive selection of species results in a bias toward oversampled taxa whether the study involves independent contrasts, directional methods, or interspecific allometry. Consider the case of a sample of mammals overweighted with rodents. In an analysis using new comparative methods based on independent or directional contrasts, the number of contrasts between members of Rodentia will bias the results according to the particular pattern of pairwise relationship or evolutionary change experienced in this order, just as the individual species values will bias a simple allometric regression line.
It occasionally seems that some of those writing about comparative methods assume naive species selection by those doing allometry, and balanced species selection by those using newer approaches. Obviously, this is not inherent in the techniques. The many recent discussions of phylogenetic constraint make it clear that reasonable balance and diversity is necessary in the selection of species for any comparative method.

The issue of balance and diversity in selection of species involves an attempt to appropriately distribute species within each taxonomic level (see also Gittleman, 1989). If a study is concerned with an entire taxonomic class, the first step is to evaluate the number of orders that are represented, and then the number of species within each order. There should be an attempt to achieve diversity, meaning the inclusion of most orders within the class, and balance, meaning that no order is overweighted in the analysis by containing an excessive number of species. The same procedure would then be followed examining a taxonomic level within orders, and so forth down the hierarchy. An ever-present problem will concern groups with different levels of diversity. In a study of Mammalia, achieving balance and diversity within Rodentia could seem to require a very large number of species in comparison with the number necessary to obtain diversity within the Lagomorpha, among others. Achieving diversity within levels may therefore lead to a loss of balance between higher levels. Sample selection always has, and always will, require judgement.

CONCLUSIONS

For some time now, a body of literature has been developing in which it is suggested that interspecific allometry as routinely performed is invalid as a statistical technique (e.g., Clutton-Brock and Harvey, 1977; Felsenstein, 1985; Harvey and Pagel, 1991). Coinciding with this conclusion has been the development of a large variety of new comparative statistical methods that take into account what interspecific allometry does not; namely, the fact that species resemble each other because of phylogenetic constraint as well as because of convergent and parallel evolution. These new statistical methods are valuable tools for comparative biology, and the criticisms of interspecific allometry are accurate. Along the way, however, it has been an almost incidental assumption that the demise of interspecific allometry is of no consequence, since it has been replaced by better, new methods. This is the only major point addressed in this study that is in disagreement with the recent literature on new comparative methods. The loss of interspecific allometry as a comparative method does matter, because there are some particular comparative questions for which the specific form of the output from an interspecific allometric regression equation would be more useful than the specific form of the output from any of the new techniques, if only interspecific allometry were valid. The purpose of this study was therefore to find a way to incorporate phylogenetic information into the statistical inferences of an interspecific allometric equation, and thereby salvage a technique that some comparative biologists have discarded without concern.

The estimation of effective sample size by weighting variance components from a nested analysis of variance uses a procedure (nested ANOVA) that has been applied for other purposes in several of the newer comparative methods. It results in a useful estimate of the phylogenetic effects that interspecific allometry must account for. The results also indicate, however, that phylogenetic constraint results in a large decrease in statistical power when typical data sets are analyzed by interspecific allometry. There is thus a substantial cost to the continued use of interspecific allometry on groups of closely related species. Finally, the implications of phylogenetic relationships among species on the statistics of comparative methods are sufficiently well-recognized and accepted that statistical inference from an interspecific allometric equation without any consideration of the effects of phylogenetic constraint should be unacceptable.

ACKNOWLEDGMENTS

I thank Jim Cheverud, John Gittleman, and Tab Rasmussen for helpful comments on earlier drafts of the manuscript.

LITERATURE CITED

Anthony MRL, and Kay RF (1993) Tooth form and diet in Ateline and Alouattine primates: Reflections on the comparative method. Am. J. Sci. 293A:356-382.
Armstrong E (1983) Relative brain size and metabolism in mammals. Science 220:1302-1304.
Bell G (1989) A comparative method. Am. Nat. 133:553-571.
Bock WJ (1989) Principles of biological comparison. Acta Morphol. Neerl.-Scand. 27:17-32.
Bock WJ, and von Wahlert G (1965) Adaptation and the form-function complex. Evolution 19:269-299.
Calder WA III (1984) Size, Function, and Life History. Cambridge, MA: Harvard University Press.
Cheverud JM, Dow MM, and Leutenegger W (1985) The quantitative assessment of phylogenetic constraints in comparative analyses: Sexual dimorphism in body weights among primates. Evolution 39:1335-1351.
Clutton-Brock TH, and Harvey PH (1977) Primate ecology and social organization. J. Zool. (Lond.) 183:1-39.
Clutton-Brock TH, and Harvey PH (1984) Comparative approaches to investigating adaptation. In JR Krebs and NB Davies (eds.): Behavioural Ecology. An Evolutionary Approach. 2nd ed. Sunderland, MA: Sinauer, pp. 7-29.
Coddington JA (1988) Cladistic tests of adaptational hypotheses. Cladistics 4:3-22.
Damuth J (1981) Population density and body size in mammals. Nature 290:699-700.
Damuth J, and MacFadden BJ (1990) Body Size in Mammalian Paleobiology. Cambridge: Cambridge University Press.
Darlington RB (1990) Regression and Linear Models. New York: McGraw-Hill.
Draper NR, and Smith H (1981) Applied Regression Analysis, 2nd ed. New York: John Wiley & Sons.
Felsenstein J (1985) Phylogenies and the comparative method. Am. Nat. 125:1-15.
Felsenstein J (1988) Phylogenies and quantitative characters. Annu. Rev. Syst. 19:445-471.
Gingerich PD, Smith BH, and Rosenberg K (1982) Allometric scaling in the dentition of primates and prediction of body weight from tooth size in fossils. Am. J. Phys. Anthropol. 58:81-100.
Gittleman JL (1989) The comparative approach in ethology: aims and limitations. In PPG Bateson and PH Klopfer (eds.): Perspectives in Ethology, Vol. 8. New York: Plenum, pp. 55-83.
Gittleman JL (1993) Carnivore life histories: A re-analysis in the light of new models. Symp. Zool. Soc. Lond. 65:65-86.
Gittleman JL, and Kot M (1990) Adaptation: Statistics and a null model for estimating phylogenetic effects. Syst. Zool. 39:227-241.
Gittleman JL, and Luh H-K (1992) On comparing comparative methods. Annu. Rev. Syst. 23:383-404.
Gould SJ (1975) On the scaling of tooth size in mammals. Am. Zool. 15:351-362.
Grafen A (1989) The phylogenetic regression. Phil. Trans. Roy. Soc. Lond. B 326:119-157.
Harvey PH, and Clutton-Brock TH (1985) Life history variation in primates. Evolution 39:559-581.
Harvey PH, and Pagel MD (1991) The Comparative Method in Evolutionary Biology. Oxford: Oxford University Press.
Huey RB (1987) Phylogeny, history, and the comparative method. In ME Feder, AF Bennett, WW Burggren and RB Huey (eds.): New Directions in Ecological Physiology. Cambridge: Cambridge University Press, pp. 76-98.
Huxley J (1932) Problems of Relative Growth. London: Methuen.
Jungers WL (ed.) (1985) Size and Scaling in Primate Biology. New York: Plenum.
Lynch M (1991) Methods for the analysis of comparative data in evolutionary biology. Evolution 45:1065-1080.
Martin RD (1981) Relative brain size and basal metabolic rate in terrestrial vertebrates. Nature 293:57-60.
Martins EP, and Garland T, Jr. (1991) Phylogenetic analyses of the correlated evolution of continuous characters: A simulation study. Evolution 45:534-557.
McMahon TA, and Bonner JT (1983) On Size and Life. New York: Scientific American Books.
McKone MJ, and Lively CM (1993) Statistical analysis of experiments conducted at multiple sites. Oikos 67:184-186.
Miles DB, and Dunham AE (1992) Comparative analysis of phylogenetic effects in the life-history patterns of iguanid reptiles. Am. Nat. 139:848-869.
Millar JS (1977) Adaptive features of mammalian reproduction. Evolution 31:370-386.
Pagel MD, and Harvey PH (1988) Recent developments in the analysis of comparative data. Quart. Rev. Biol. 63:413-440.
Peters RH (1983) The Ecological Implications of Body Size. Cambridge: Cambridge University Press.
Pilbeam D, and Gould SJ (1974) Size and scaling in human evolution. Science 186:892-901.
Reiss MJ (1989) The Allometry of Growth and Reproduction. Cambridge: Cambridge University Press.
Ridley M (1991) Historical ecology. Review of “Phylogeny, Ecology, and Behavior: A Research Program in Comparative Biology,” by D.R. Brooks and D.A. McLennan. Trend. Ecol. Evol. 6:104-105.
Riska B (1991) Regression models in evolutionary allometry. Am. Nat. 138:283-299.
Schmidt-Nielsen K (1984) Scaling. Why is Animal Size So Important? Cambridge: Cambridge University Press.
Smith RJ (1984) Determination of relative size: The “criterion of subtraction” problem in allometry. J. Theoret. Biol. 108:131-142.
Sokal RR, and Rohlf FJ (1981) Biometry. 2nd ed. San Francisco: W.H. Freeman.
Stearns SC (1983) The influence of size and phylogeny on patterns of covariation among life-history traits in mammals. Oikos 41:173-187.
Stearns SC (1992) The Evolution of Life Histories. Oxford: Oxford University Press.
Zucker DM (1990) An analysis of variance pitfall: The fixed effects analysis in a nested design. Ed. Psych. Meas. 50:731-738.