AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 146:155–160 (2011)

No Brain Expansion in Australopithecus boisei

John Hawks*

Department of Anthropology, University of Wisconsin-Madison, Madison, WI 53706-1393

KEY WORDS brain evolution; robust australopithecines; temporal trends

ABSTRACT The endocranial volumes of robust australopithecine fossils appear to have increased in size over time. Most evidence with temporal resolution is concentrated in East African Australopithecus boisei. Including the KNM-WT 17000 cranium, this sample comprises 11 endocranial volume estimates ranging in date from 2.5 million to 1.4 million years ago. But the sample presents several difficulties to a test of trend, including substantial estimation error for some specimens and an unusually low variance. This study reevaluates the evidence, using randomization methods and a related test using an explicit model of variability. None of these tests applied to the A. boisei endocranial volume sample produces significant evidence for a trend in that species, whether or not the early KNM-WT 17000 specimen is included. Am J Phys Anthropol 146:155–160, 2011. © 2011 Wiley-Liss, Inc.

The endocranial volumes estimated for late Australopithecus boisei specimens (e.g., after 1.8 Ma) are larger than those of earlier specimens. Elton et al. (2001) found that this trend is statistically significant, arguing for the evolution of larger brains over time. Such a trend bears on the ecology and social behavior of A. boisei and casts some doubt on the idea that brain size evolution in early Homo was exceptional (Elton et al., 2001).

But the A. boisei sample has some unusual aspects that may complicate the test of a trend. One question is whether the early KNM-WT 17000 specimen represents A. boisei or another species (possibly Australopithecus aethiopicus). Another question arises from the very small variation of estimated endocranial volumes in the A. boisei sample.
Even including the small KNM-WT 17000 volume estimate, the coefficient of variation (CV) in the sample examined by Elton et al. (2001) is only 8.2%. Excluding KNM-WT 17000, the within-sample CV is 6.8%. By comparison, in the hominoid endocranial volume data reported by Tobias (1971), the CV is 9.7% in chimpanzees, 10.9% in orangutans, and 13.1% in gorillas. According to these estimates, A. boisei had less variation than any living hominoid, even though its craniodental variation was as great as that of gorillas or orangutans (Silverman et al., 2001).

There are several possible interpretations for the low variation of the A. boisei sample: (1) A. boisei actually had very low size dimorphism; (2) its endocranial variation has been greatly undersampled; or (3) the sample has been biased by estimation error. Other characters of the A. boisei sample show extensive variability compared to extant hominoids (Silverman et al., 2001), so that monomorphism for this species seems unlikely. Low sample variance is a special concern, because estimation error might lead to false positive results in a test of trend.

Here, I conduct three new tests of the null hypothesis of stasis of endocranial volume in A. boisei. These tests explore the effect of estimation error on the appearance of a trend in the sample as well as the effect of low sample variation and small sample size. None of these tests results in a statistically significant trend in the sample.

MATERIALS AND METHODS

Fossil specimens

Estimating endocranial volume can be challenging even for relatively complete specimens, considering the subtle distortion exhibited by many fossils. For more fragmentary cranial remains, the estimation of endocranial volume requires not only the correction of distortions but also the reconstruction of missing portions. The eleven cranial specimens of Australopithecus boisei listed below vary in their completeness and preservation of relevant anatomy.
There is no explicit way of statistically controlling for error in the estimation of endocranial volume, considering the diversity of methods of reconstruction. In several cases, different workers have provided competing estimates. For less complete specimens, choosing one estimate above another must involve a close critique of anatomical details. The following list reviews the anatomical condition of each of these specimens. It is not an exhaustive list of volume estimates, but focuses on the range between credible extremes for the more disputed specimens. This gives an impression of the boundary conditions for measurement accuracy for each specimen.

Grant sponsor: Graduate School of the University of Wisconsin-Madison.
*Correspondence to: John Hawks, Department of Anthropology, University of Wisconsin–Madison, 5240 Social Science Building, 1180 Observatory Drive, Madison, WI 53706–1393. E-mail: jhawks@wisc.edu
Received 22 April 2010; accepted 10 September 2010
DOI 10.1002/ajpa.21420
Published online 16 August 2011 in Wiley Online Library (wileyonlinelibrary.com).

1. KNM-WT 17000 is a well-preserved skull with relatively small vault fragments missing. Walker et al. (1986) estimated the volume as 410 ml.
2. Omo L338-y6 is a juvenile cranium of uncertain age. Holloway (1981) estimated its volume at 427 ml. Elton et al. (2001) estimated an adult volume 4% higher, or 444 ml.
3. The Omo 323-1976-896 cranial remains are exceedingly fragmentary. One side of the posterior cranial base is preserved, allowing a relatively good estimate of the posterior endocast breadth. The preserved frontal and parietal elements do not join with each other or the temporal; their small size and unknown positions do not allow an accurate estimate of endocast volume. Brown et al. (1993) reported an estimate of "about 490" based on similarity with the 491-ml KNM-ER 23000. Falk et al. (2000) considered it too fragmentary for an accurate estimate.
I concur that the available estimate cannot be considered independent of other endocasts on which it may have been based.
4. KNM-WT 17400 preserves only the anterior third of the endocast, consisting mainly of the frontal lobes. Brown et al. (1993) gave an estimate of 500 ml by modeling missing portions after the more complete KNM-ER 23000, but Holloway (1988b) put the volume between 390 and 400 ml, and Falk et al. (2000) adopted an estimate of 390 ml.
5. OH 5 has good preservation of the endocast, but an uncertain join between the anterior and posterior portions of the vault. This discontinuity has caused a disparity in estimates of its volume, including a low 500-ml estimate by Falk et al. (2000) and a high 530-ml estimate by Tobias (1963). The range of estimates on this well-preserved specimen covers nearly a quarter of the range of variation cited for A. boisei as a whole.
6. KNM-ER 13750 preserves only the superior vault, accounting for under half of the total endocranial contour. The range of estimates provided by Falk et al. (2000), from 450 to 480 ml, again covers roughly a quarter of the range attributable to the species. Brown et al. (1993) reported a higher estimate of 500 ml.
7. KNM-ER 23000 is a nearly complete vault missing the midline cranial base. Its endocranial volume of 491 ml (Brown et al., 1993) may be the most accurate assigned to A. boisei.
8. KNM-ER 406 is also well-preserved (Wood, 1991). Its volume estimate of 525 ml is uncontroversial (Holloway, 1988a).
9. KNM-ER 407 is missing several vault sections including those enclosing the frontal lobe. Holloway (1988a) estimated the volume at 510 ml; Falk et al. (2000) prepared a new reconstruction with a volume estimate of 438 ml. The difference between these two estimates covers nearly 50% of the total range of the sample.
10.
KNM-ER 732 has good preservation of the left side of the vault, but is not complete across the rear of the cranium or basicranium, making a mirror reconstruction problematic. Holloway (1988b) estimated the endocast volume at 500 ml; Falk et al. (2000) at 466 ml.
11. KGA 10-525 lacks most of the frontal and anterior cranial base. Suwa et al. (1997) estimated its volume at 545 ml.

The damaged or missing frontals of many specimens have added to ambiguity about their reconstructed volume. Robust endocasts that preserve this region, such as KNM-WT 17400, differ in their anatomy from other taxa, especially early Homo. Falk et al. (2000) reconstructed specimens with missing or incomplete frontal endocasts using more complete robust australopithecine endocasts as models; this resulted in substantially smaller endocranial estimates for OH 5, KNM-ER 732, and KNM-ER 407.

Fig. 1. Endocranial volume estimates for specimens of A. boisei against time. The sample is that used in this study, excluding Omo 323.

Some authors place KNM-WT 17000 within the A. boisei hypodigm, but many would put it into a different species, often Australopithecus aethiopicus. This earlier species may have been part of a single evolving lineage with later A. boisei, but need not have been so. Elton et al. (2001) found significant evidence for a trend in the A. boisei sample whether or not the sample included KNM-WT 17000. Including this early, small specimen tends to amplify the evidence for a trend.

The pattern is likewise amplified by the assumption of a small volume for the other early specimen, Omo L338y-6. Holloway (1981) assessed the Omo L338-y6 juvenile as likely belonging to A. africanus, not A. boisei. This observation gained support due to the lack of an occipital–marginal sinus drainage on the endocast.

Thus, both early specimens are problematic. In the following tests, I have retained these specimens within the A.
boisei sample, because including them tends to stack the deck in favor of a trend. When the same tests are run without these specimens, the P-values are without exception farther from statistical significance. However, I want to emphasize that these specimens are not included within A. boisei by any consensus, and their status must be evaluated with evidence beyond their endocasts.

Tests of temporal trends

Most A. boisei specimens with endocranial volume (EV) estimates date to the approximate center of the species' temporal span. The reason for the appearance of a trend is quite clear: there is little variation in the center of the species' temporal range; the latest two specimens are also the two largest; the earliest two specimens include two of the three smallest (see Fig. 1).

A test of a temporal trend might be conducted in several ways. A simple linear regression of endocranial volumes against time will test for a trend, but may be confounded by small numbers of specimens at early and late temporal extremes. Testing for a difference in means among temporal subsamples may address this problem. Comparing each specimen as a temporal subsample results in Spearman's rank-order correlation (ρ), which Elton et al. (2001) reported as significant for their sample of A. boisei EV estimates.

Also, following Leigh (1992) and Konigsberg (1990), Elton et al. (2001) applied the "Hubert test" (Hubert et al., 1985), sometimes simply called the "Gamma" (Γ) test (Wood et al., 1994; Lockwood et al., 2000). This test is a randomization test of association of one continuous and one ranked variable, involving four steps:

1. The age of each specimen is converted to a rank within the sample. For a two-tailed significance test, ranks are standardized with a mean of zero.
2. The endocranial volume of each specimen is multiplied by its temporal rank, and all the values thus obtained are summed.
This is equivalent to calculating the dot product of a vector of endocranial volumes with a vector of ranks.
3. The sample is reordered at random an arbitrarily large number of times, each time obtaining the dot product of endocranial volume and rank vectors.
4. The statistic Γ is estimated to be (M + 1)/(N + 1), where M is the number of permutations with dot products greater than or equal to that of the observed sample, and N is the number of permutations examined. A Γ ≤ 0.05 is taken as a significant rejection of the null hypothesis of no trend.

It is perhaps of interest that although the Hubert test uses the dot product of the two vectors, the use of the product–moment correlation yields precisely the same Γ (shown in the Appendix). Samples for which the dot product shows a significant trend are samples that have significant correlations between EV and temporal ranks. This suggests a weakness of the test, because a correlation is a measure not of change over time, but of fit to a linear model. A sample may have a significant correlation with very little change, if its variance is also very low. Hence, the interpretation of the test depends on whether the variance is biologically realistic. Because A. boisei appears to be relatively invariant in endocranial volume compared to sexually dimorphic hominoids, the test might be confounded by error in the sample of EV estimates.

The Hubert test has been applied in the anthropological literature in two partially incompatible ways, which became evident to me when trying to replicate the results of different studies. As applied by Konigsberg (1990), following Hubert et al. (1985), the vector of temporal ranks is centered on zero (i.e., the values are ..., −2, −1, 0, 1, 2, ...). But as applied by Leigh (1992) and Elton et al. (2001), the temporal ranks are simple ordinal ranks (i.e., 1, 2, 3, ...). These two alternatives are mathematically equivalent for performing a one-tailed test.
But while the first alternative (zero-centered ranks) readily admits a two-tailed test, the second alternative requires a bit more algorithmic complexity for a two-tailed test. Elton et al. (2001) and Leigh (1992) did not report whether their tests are one- or two-tailed; following the procedures they described will result in a one-tailed test. Wood et al. (1994) also applied the Hubert test to test for trends in dental characters of A. boisei, citing Leigh (1992); these authors also did not specify whether they performed one-tailed or two-tailed tests. Lockwood et al. (2000) used the Hubert test (there called the Γ statistic) and explicitly described a two-tailed approach.

One-tailed tests ignore the strength of any negative associations in the permuted samples and therefore lead to incorrect assessments of statistical significance. This study applies only two-tailed tests of the null hypothesis of no trend. For future research, I recommend the zero-centered ranks approach.

TABLE 1. Results of tests 1 and 2

Sample                     Test               P value
Including Omo 323          Spearman's ρ       P > 0.10 (ns)
                           Hubert test        P = 0.10 (ns)
This study (no Omo 323)    Spearman's ρ       P > 0.05 (ns)
                           Hubert test        P = 0.07 (ns)
                           Model-based test   P = 0.07 (ns)

Test 1: Lower estimate for KNM-WT 17400

Falk et al. (2000) argued that smaller estimates are more accurate for several robust australopithecine specimens, and the smaller estimates were generally used by Elton et al. (2001). One exception is KNM-WT 17400, for which Elton et al. (2001) used the highest estimate of 500 ml (Brown et al., 1993), even though both Holloway (1988b) and Falk et al. (2000) adopted much lower estimates, between 390 and 400 ml. This smaller estimate would make KNM-WT 17400 the smallest member of the sample. A small size for this specimen at the center of the species' time range increases overall sample variability and decreases the relative contribution of early specimens to that variability. This makes KNM-WT 17400 very important to any test of a trend.
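For concreteness, the two-tailed Hubert test with zero-centered ranks described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, tied dates are given the mean of the ranks they span (an assumption), and the ages and volumes at the bottom are invented values rather than the A. boisei sample.

```python
import random


def centered_ranks(ages_ma):
    """Temporal ranks (oldest specimen lowest), shifted to a mean of zero.

    Assumption: specimens with the same date share the mean of their ranks.
    """
    oldest_first = sorted(ages_ma, reverse=True)
    positions = {}
    for position, age in enumerate(oldest_first, start=1):
        positions.setdefault(age, []).append(position)
    rank_of = {age: sum(p) / len(p) for age, p in positions.items()}
    ranks = [rank_of[age] for age in ages_ma]
    center = sum(ranks) / len(ranks)
    return [r - center for r in ranks]


def dot(x, y):
    """Sum of element-wise products of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def hubert_two_tailed(volumes, ages_ma, n_perm=10000, seed=0):
    """Estimate the two-tailed significance as (M + 1)/(N + 1).

    M counts random reorderings whose absolute dot product is at least
    as large as the observed one; N is the number of reorderings.
    """
    rng = random.Random(seed)
    ranks = centered_ranks(ages_ma)
    observed = abs(dot(volumes, ranks))
    shuffled = list(volumes)
    m = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(dot(shuffled, ranks)) >= observed - 1e-9:
            m += 1
    return (m + 1) / (n_perm + 1)


# Hypothetical ages (Ma) and volumes (ml), for illustration only.
ages = [2.5, 2.3, 2.0, 1.8, 1.6, 1.4]
rising = [400, 420, 450, 470, 500, 530]
flat = [450] * 6
print(hubert_two_tailed(rising, ages))  # small: a perfectly ordered sample
print(hubert_two_tailed(flat, ages))    # no variation, so no evidence of trend
```

Because the ranks are centered on zero, taking the absolute value of the dot product treats increasing and decreasing trends symmetrically, which is what makes the two-tailed version straightforward.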
As a preliminary step, I recalculated Spearman's ρ and the Hubert test statistic Γ for the sample of Elton et al. (2001), using the smaller 390-ml estimate for KNM-WT 17400. This replicates the methods of that study, except for the change in size of the single KNM-WT 17400 specimen.

Test 2: Model-based simulation values

A difficulty of the A. boisei sample is the nonindependence of estimates. Less complete specimens have been reconstructed using explicit information from more complete endocasts, chiefly Sts 5 and OH 5. The sample should therefore have reduced variation compared to a sample of intact crania. A reduced variance may increase the chance that a null hypothesis of stasis will be falsely rejected. This is a context in which randomization tests are potentially invalid: they do not assume a statistical distribution, but they do assume independence.

An additional aspect of the problem is that the state of preservation of fossils may be autocorrelated with time. In the present sample, the early and late specimens are relatively complete, whereas the middle of the time range is dominated by incomplete specimens. This situation arises frequently in paleontology, because species abundance is often highest at the center of a species' temporal range. Early and late specimens will be more likely attributed to a species if their anatomy is unambiguous, which is more likely if they are more complete. Early or late specimens may be represented at different fossil localities than the majority of specimens, again requiring more complete specimens for confident assignment. In a Holocene context, specimens are likely to be more fragmentary and rarer earlier in time. These situations present the possibility of finding spurious trends due to differential preservation.
To attempt to correct for these issues, it is necessary to use tests that rely on an explicit model of sample variability, instead of randomization of the sample values themselves. A simple model-based test replaces the sample EV estimates with new random deviates from a normal distribution. A normal distribution takes two parameters: the population mean and standard deviation. Deviates drawn from this distribution are independent; an arbitrary number of simulated samples may be obtained by repeatedly drawing new values to replace the sample values.

Here, the model-based sampling technique was used to generate samples with the same temporal ranks as the observed data, but with new EV values. In cases where the observed sample has two specimens of the same date, two specimens in all simulated samples were assigned the same temporal rank. The observed A. boisei sample has two such pairs of specimens.

As in the Hubert test, the computer generated an arbitrarily large number of simulated samples (in this study, 100,000). The dot product of EV and temporal rank vectors in each simulated sample is compared to the dot product of the observed sample. The significance measure is taken as (M + 1)/(N + 1), where N is the number of simulated samples, and M is the number of those samples in which the absolute value of the dot product is more extreme than the observed value. This is a two-tailed test of the null hypothesis of no trend. I refer to the test below as the "model-based Hubert test."

This test was applied to the A. boisei sample described earlier, including KNM-WT 17000, excluding the extremely fragmentary Omo 323-1976-896, and using an estimate of 390 ml for KNM-WT 17400. Simulated samples were generated using the observed sample mean (468 ml) and standard deviation (49.1).

Test 3: Arbitrary variation

The model-based Hubert test described earlier is not limited to the observed sample variation.
It can also be applied using a different value for the population standard deviation. This option is relevant to the A. boisei endocranial volume sample, because the sample of estimates may have lower variation than the population from which the specimens were drawn. Even with the lower estimate of 390 ml for KNM-WT 17400, the CV of the observed A. boisei sample is still only 10.3%, between chimpanzees (9.7%) and orangutans (10.9%). This value might be uncharacteristic of the A. boisei population, if its sexual dimorphism or temporal variability is undersampled by available EV estimates.

Because the test described here derives its simulated EV estimates from a model distribution, it is easy to apply a more variable model; for example, one matching the CV of gorillas at 13.1% (Tobias, 1971). As a further example, I varied the population CV parameter of the model-based test, covering the entire range between 4 and 15%. This range encompasses the CVs of all extant hominoids. In all cases, I assumed a mean equal to the A. boisei sample mean (468 ml). Using this procedure, it is possible to evaluate whether possible underestimation of variability in the observed sample may affect the significance of the test of no trend.

RESULTS

Test 1: Lower estimate for KNM-WT 17400

The first tests performed were on the A. boisei sensu lato sample of Elton et al. (2001), with the exception of a lower estimate of 390 ml for KNM-WT 17400. With this estimate, the nonparametric Spearman's correlation ρ = 0.52, which is nonsignificant (P > 0.10, two-tailed). For the two-tailed Hubert test on the sample, P = 0.10.

For both tests, the lower estimate for KNM-WT 17400 causes the significance of a temporal trend in A. boisei to completely disappear. This low estimate currently appears to be a consensus for the specimen, although it must be treated cautiously, because the endocast is less than 50% complete.
This single specimen illustrates well the importance of accurate estimates.

Test 2: Model-based simulated values

The removal from the sample of the 490-ml estimate for Omo 323-1976-896 actually enhances the appearance of a trend. This is reflected by the Hubert test result, with P = 0.07 (compared to P = 0.10 when Omo 323 is included). Spearman's nonparametric correlation for the sample was 0.58, again nonsignificant (P > 0.05, two-tailed). The model-based test described in this work came to a very similar result on this sample, with P = 0.07. Both these tests failed to reject the null hypothesis of no trend for the A. boisei sample.

Further examination of the simulated samples gave some indication of the relationship between sample variability and the appearance of a trend. One hypothesis might be that the early KNM-WT 17000 specimen is actually extremely small relative to the population, and the late KGA 10-525 specimen extremely large, resulting in the appearance of a steady expansion from smallest to biggest through the sample. The simulated samples, in which specimens are drawn from a population with a standard deviation equal to that of the A. boisei sample (49.1), rejected this hypothesis. Forty-four percent of the simulated samples had at least one specimen smaller than 390 ml, the smallest in the observed sample. Forty-six percent had at least one specimen larger than 545 ml, and 19% of simulated samples had specimens more extreme than both the largest and smallest of the observed sample.

Test 3: Arbitrary variation

An alternative hypothesis is that the appearance of a trend is due to low sample variability, which increases the correlation of EV and temporal rank. The result of the model-based test applied to a range of model CV between 4 and 15% shows the close relationship of significance of the A. boisei trend and population variation (Fig. 2).
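The model-based Hubert test and its dependence on the assumed CV might be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the comparison uses "greater than or equal to" (for continuous deviates this is effectively the same as "more extreme than"), the number of simulations is reduced for speed, and the volumes and ranks are invented values rather than the A. boisei sample.

```python
import random


def model_based_hubert(volumes, centered_ranks, sd, n_sim=20000, seed=0):
    """Two-tailed model-based test of no trend.

    Simulated samples are independent normal deviates with the observed
    mean and the supplied standard deviation, paired with the observed
    (zero-centered) temporal ranks. Returns (M + 1)/(N + 1).
    """
    rng = random.Random(seed)
    mean = sum(volumes) / len(volumes)
    observed = abs(sum(v * r for v, r in zip(volumes, centered_ranks)))
    m = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(mean, sd) for _ in centered_ranks]
        product = sum(v * r for v, r in zip(simulated, centered_ranks))
        if abs(product) >= observed:
            m += 1
    return (m + 1) / (n_sim + 1)


def sweep_cv(volumes, centered_ranks, cvs, **kwargs):
    """Run the test for each assumed population CV (as a fraction)."""
    mean = sum(volumes) / len(volumes)
    return {
        cv: model_based_hubert(volumes, centered_ranks, cv * mean, **kwargs)
        for cv in cvs
    }


# Hypothetical volumes (ml) with zero-centered temporal ranks.
volumes = [400, 420, 450, 470, 500, 530]
ranks = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
for cv, p in sweep_cv(volumes, ranks, [0.04, 0.08, 0.12, 0.15]).items():
    print(f"CV = {cv:.0%}: P = {p:.4f}")
```

With zero-centered ranks the simulated dot product does not depend on the assumed mean, so only the standard deviation (here, CV times the sample mean) controls how often a simulated sample matches the observed trend.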
Briefly, the greater the variation in the population, the more likely each simulated sample will present a trend at least as great as that in the observed sample. If the A. boisei sample was drawn from a population with greater EV variability, then the level of correlation of EV with time is less surprising. If the A. boisei population were as variable in endocranial volume as extant gorillas, then 15.1% of randomly drawn samples would exhibit an apparent trend as strong as or stronger than the observed sample. With the extant sample, it is not possible to confirm this hypothesis of underrepresentation; in particular, body size dimorphism does not necessarily follow from cranial and masticatory variability.

Fig. 2. Result of Test 3, testing the significance of a trend in A. boisei with a range of models for population CV. Each point represents 100,000 simulated samples of equal mean to the A. boisei sample and CV given as on the x-axis. The greater the assumed variation in the underlying population, the greater the chance that an increase over time equal or greater than that in the A. boisei sample will be observed. There is no significant trend for any model of variation within the range of living great apes and humans.

DISCUSSION

The problem with testing a trend in any early hominid species is similar in form to the problems discussed by Holloway (1970). All reconstructions are based on relevant knowledge of the anatomy of other specimens. Whether reconstructions are done on crania, endocasts, or CT data, they all rely on knowledge of more complete specimens; for A. boisei endocasts, these models include OH 5 and KNM-ER 23000, and the well-known A. africanus endocast Sts 5. When we test hypotheses using samples of reconstructions, we are to some extent including multiple instances of these well-known specimens, spread through many semi-independent reconstructions.
There is no ready statistical model to incorporate the effects of estimation error from fragmentary specimens. These estimates are likely to be biased by the use of more complete specimens as models, the more frequent preservation of some parts of the cranial surface as opposed to others, or unrecognized sex differences in fossil individuals. In other words, one effect of estimation error is to reduce the variation within the fossil sample.

Interestingly, this problem does not present itself as a lack of precision or repeatability. An estimate based on a fragmentary specimen may be highly repeatable, even by different observers, when the missing portions are well known from model specimens. Instead, the problem may be noted as a bias affecting the between-species and within-species variances among early hominins.

Presently, samples assigned to different early hominid species exhibit some anatomical differences. Such differences may reflect neuroanatomical adaptations in these species. If so, then it would be anatomically misleading to use a specimen of A. africanus like Sts 5 as a model for the reconstruction of an incomplete A. boisei specimen. On the other hand, small samples will always exhibit some chance differences. If the differences between well-preserved endocasts are mainly idiosyncratic variations, then when we use only other A. boisei specimens as models for incomplete A. boisei reconstructions, we will tend to artificially inflate the differences between A. boisei and A. africanus as well as artificially reducing variation within A. boisei. The smaller the sample, the more likely that between-species differences will be inflated by reconstruction and within-species differences minimized.

The ambiguity about specimens like Omo L338-y6 pertains to this issue. Should a variant be included within a species or assigned to a different one based on the presence of a diagnostic trait?
Different estimates of endocranial volumes may result from different reconstructions, including CT versus more traditional methods of reconstruction. Some authors (e.g., Falk et al., 2000) have provided new estimates for certain specimens that differ by 10% or more from previous estimates, generally representing a reduction in size compared to earlier estimates. Evaluating the accuracy of individual volume estimates is beyond the scope of this analysis, but would be clearly desirable. I believe that the best way to assess the accuracy of such estimates, including the reconstruction of missing parts from models, is for multiple workers to perform blind replication studies, providing open access to CT and endocast data.

Even with a CV of 10.3%, the variation in A. boisei is likely undersampled. The extant sample is apparently male-biased, with only three presumed females (KNM-ER 732, KNM-WT 17400, and KNM-ER 407). Incomplete specimens have been reconstructed by modeling after more complete crania, reducing variation from anatomical differences. Beyond this, temporal fluctuations should tend to inflate variability with or without a directional trend. Moreover, at 10.3%, the CV may be inflated by the use of a very small estimate for KNM-WT 17400, and the inclusion of the similarly small KNM-WT 17000, which may well belong to a different species. The CV values for other anatomical measurements do not necessarily bear on the endocranial volume, but it may be relevant that 10.3% would be near the minimum for large (n > 8) samples of early hominin molar areas, which range up to 17% for mandibular M1 and M2 areas in A. africanus.

These factors also must affect the samples currently assigned to Homo habilis (including KNM-ER 1470), which taken together have an endocranial volume CV of 12.6%.
Endocranial volume has a disproportionately important role in differentiating between smaller and larger Plio-Pleistocene Homo morphs, and this may bias the consideration of evolutionary trends in early Homo.

Naturally, more numerous fossil specimens would be welcomed as a way to improve our statistical understanding of early hominins. Meanwhile, some statistical methods may bear some increased scrutiny with reference to large samples of living taxa. Resampling approaches assume that the observed data are characteristic of the population variability that they sample. The reconstruction necessary for fossil specimens can in some cases violate that assumption. Using a model-dependent statistical method in this case has highlighted an instance where the variation of a fossil sample presents problems for statistical comparisons. Similar methods might prove fruitful for other hypotheses tested with fossil samples.

ACKNOWLEDGMENTS

I thank Aaron Sams and Marc Kissel who investigated the statistical tests of trend and made many helpful comments. Ralph Holloway deserves many thanks for his advice and encouragement. I also thank Christopher Ruff, Sarah Elton, and one anonymous reviewer for their comments, which greatly helped the manuscript.

LITERATURE CITED

Brown B, Walker AC, Ward CV, Leakey RE. 1993. A new Australopithecus boisei cranium from East Turkana, Kenya. Am J Phys Anthropol 91:137–159.
Elton S, Bishop LC, Wood B. 2001. Comparative context of Plio-Pleistocene hominin brain evolution. J Hum Evol 41:1–27.
Falk D, Redmond JCR, Guyer J, Conroy C, Recheis W, Gerhard W, Weber HS. 2000. Early hominid brain evolution: a new look at old endocasts. J Hum Evol 38:695–717.
Holloway RL. 1970. New endocranial volumes for the australopithecines. Nature 227:199–200.
Holloway RL. 1981. The endocast of Omo juvenile L338y-6 hominid: gracile or robust Australopithecus. Am J Phys Anthropol 54:109–118.
Holloway RL. 1988a. Brain.
In: Tattersall I, Delson E, Couvering JV, editors. Encyclopedia of human evolution and prehistory. New York: Garland. p 98–105.
Holloway RL. 1988b. "Robust" australopithecine brain endocasts: some preliminary observations. In: Grine FE, editor. Evolutionary history of the "robust" australopithecines. New York: Aldine de Gruyter. p 97–105.
Hubert LJ, Golledge RG, Costanzo CM, Gale N. 1985. Tests of randomness: unidimensional and multidimensional. Environ Plan A 17:373–385.
Konigsberg LW. 1990. Temporal aspects of biological distance: serial correlation and trend in a prehistoric skeletal lineage. Am J Phys Anthropol 82:45–52.
Leigh SR. 1992. Cranial capacity evolution in Homo erectus and early Homo sapiens. Am J Phys Anthropol 87:1–14.
Lockwood CA, Kimbel WH, Johanson DC. 2000. Temporal trends and metric variation in the mandibles and dentition of Australopithecus afarensis. J Hum Evol 39:23–55.
Silverman N, Richmond B, Wood B. 2001. Testing the taxonomic integrity of Paranthropus boisei sensu stricto. Am J Phys Anthropol 115:167–178.
Suwa G, Asfaw B, Beyene Y, White TD, Katoh S, Nagaoka S, Nakaya N, Uzaha K, Renne P, Wolde Gabriel G. 1997. The first skull of Australopithecus boisei. Nature 389:489–492.
Tobias PV. 1963. Cranial capacity of Zinjanthropus and other australopithecines. Nature 197:743–746.
Tobias PV. 1971. The brain in hominid evolution. New York: Columbia.
Walker AC, Leakey RE, Harris JM, Brown FH. 1986. 2.5-Myr Australopithecus boisei from west of Lake Turkana, Kenya. Nature 322:517–522.
Wood B, Wood C, Konigsberg L. 1994. Paranthropus boisei: an example of evolutionary stasis? Am J Phys Anthropol 95:117–136.
Wood BA. 1991. The cranial remains from Koobi Fora, Kenya. Oxford: Clarendon Press.
Wood BA. 1994. The problems of our origins. J Hum Evol 27:519–529.

APPENDIX

The dot product is commonly used in vector transformations, but interpreting it in the context of a temporal trend may not be intuitive.
The dot product of two vectors is the sum of the products of their respective elements:

\[ \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i \]

This product is a measure of the projection of one vector onto the other; it increases as the angle between the vectors (taken from the origin) decreases. The dot product of two perpendicular vectors is zero.

The product–moment correlation between two vectors is

\[ r = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n - 1} \]

where $z_{x_i}$ and $z_{y_i}$ are standardized values of $x_i$ and $y_i$, respectively. Thus, the product–moment correlation is the dot product of two standardized vectors divided by their rank (minus 1).

In a randomization test, the different values of x and y are scrambled with respect to each other. However, the means $\bar{x}$ and $\bar{y}$ and the standard deviations $s_x$ and $s_y$ are constant in all these randomized samples, because each includes exactly the same specimens. Thus, within any random set of permutations of a sample, the product–moment correlation can be obtained by a simple linear transformation from the dot product:

\[ r = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{(n - 1)\, s_x s_y} \]
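The linear relationship between the dot product and the correlation can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical and the volumes are invented values. It enumerates every permutation of a small sample and compares the correlation computed from standardized values with the one obtained by transforming the dot product.

```python
import itertools
import statistics


def dot(x, y):
    """Sum of element-wise products of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def pearson_r(x, y):
    """Product-moment correlation from standardized values."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)  # sample SD (n - 1)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (n - 1)


def r_from_dot(d, x, y):
    """Correlation recovered from the dot product d by a linear transform."""
    n = len(x)
    shift = n * statistics.mean(x) * statistics.mean(y)
    scale = (n - 1) * statistics.stdev(x) * statistics.stdev(y)
    return (d - shift) / scale


# Hypothetical volumes (ml) and zero-centered temporal ranks.
volumes = [410, 450, 470, 500, 530]
ranks = [-2, -1, 0, 1, 2]

# Largest disagreement between the two calculations over all permutations.
worst = max(
    abs(pearson_r(p, ranks) - r_from_dot(dot(p, ranks), p, ranks))
    for p in itertools.permutations(volumes)
)
print(worst)  # expected to be at the level of floating-point rounding
```

Because the shift and scale terms are identical for every permutation, ordering the permutations by dot product or by correlation gives the same ordering, and hence the same count M.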
