AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 90:495-500 (1993)

Brief Communication: Measurement Size, Precision, and Reliability in Craniofacial Anthropometry: Bigger is Better

PAUL L. JAMISON AND RICHARD E. WARD
Department of Anthropology, Indiana University, Bloomington, Indiana 47405 (P.L.J.); Department of Anthropology, Indiana University, Purdue University, Indianapolis, and Oral Facial Genetics, Indiana University School of Dentistry, Indianapolis, Indiana 46202 (R.E.W.)

KEY WORDS: Intraobserver error, Repeatability, Measurement magnitude, Scale

ABSTRACT: In this paper we examine the results of an intraobserver measurement error study involving 49 craniofacial variables that ranged in size from less than 1 cm to approximately 20 cm. Repeat measurements were taken on 10 male and 10 female adult subjects (19-59 years old). Our focus is on the relationship between measurement size and measurement error across the 49 variables. We found that the size of the variable showed no relationship with the magnitude of the error as measured by the technical error of measurement. When the error was expressed as a coefficient of relative variation (Malina et al.: Vital and Health Statistics, Series 11, No. 123. Washington, DC: US Department of Health and Human Services, 1973), this quantity was negatively associated with the size of the measurement. Conversely, reliability (Fleiss: The Design and Analysis of Clinical Experiments. New York: John Wiley & Sons, 1986) was positively correlated with measurement size. We did not find effects of scale (Marks et al.: Am. J. Epidemiol. 130:578-587, 1989) within the individual measurements. Thus, for the range of size of the craniofacial measurements in this study, measurement size must be added to the list of factors, such as ease of locating landmarks, measurement technique, and systematic bias in the application of the technique, that can affect precision and reliability in anthropometry.

© 1993 Wiley-Liss, Inc.
Received December 2, 1991; accepted September 16, 1992.

Over the past 60 years, several authors have speculated that the size of anthropometric measurements may affect their reliability (e.g., Davenport et al., 1935; Gavan, 1950; Malina et al., 1973). Recently, Marks et al. (1989) reported a relationship between measurement size and measurement error within several skinfold measurements, i.e., error increased as the size of the individual measurement increased. In our own anthropometric research (Ward and Jamison, 1991), we reported that craniofacial dimensions less than 6 cm were not as reliably ascertained as larger dimensions. In the present study we examine the relationship between craniofacial variables of different size and the precision and reliability with which they can be measured. In addition, we examine the "effects of scale," as defined by Marks et al. (1989), on the basis of errors within individual variables.

MATERIALS

Forty-nine craniofacial dimensions described by Farkas (1981), covering the entire head and face and ranging in average size from 20 cm down to less than 1 cm, were measured twice on 10 male and 10 female subjects. All were healthy adults between 19 and 59 years of age. Standard anthropometric equipment (spreading and sliding calipers and tapes) was used for all measurements. Measurements were obtained by a trained anthropometrist (R.E.W.) with more than 5 years of experience in a clinical setting. Depictions of the landmarks and additional details on methodology can be found in Ward and Jamison (1991). In our original study we did not "correct" obvious errors such as inversions and voice errors (e.g., recording 20.1 as 21.0). For the present study we adjusted nine probable recording or instrument-reading errors out of a total of 1,960 data entries.
METHODS

The earlier measurement error literature focusing on adults (see references in Jamison and Zegura, 1974; Lohman et al., 1988) generally described intraobserver and/or interobserver error, usually with two or more trials by the same or different investigators spanning a very brief period of time. In contrast, for many of the recent auxological studies (Cameron, 1986; Himes, 1989; Martorell et al., 1975; Pelletier et al., 1991) the problems were broader, i.e., multiple observers working with children of different ages and repeat measurements spanning periods long enough that the subject would be expected to change from one session to the next. Thus Habicht et al. (1979) and others who followed their approach (see Mueller and Martorell, 1988) not only defined and discussed reliability but also calculated unreliability, precision and imprecision, and dependability and undependability. Of these measures, Mueller and Martorell (1988) conclude that precision and reliability should be reported in every anthropometric study to give the reader an impression of the quality of the data.

The questions of measurement error addressed by Ward and Jamison (1991) were similar to those of the earlier anthropometric research. The subjects were adults, and one observer took two sets of measurements over a period brief enough that no change in the subjects' craniofacial dimensions was anticipated. We were concerned with anthropometric precision, defined as the closeness of repeated measurements of the same quantity (Sokal and Rohlf, 1969). To examine precision we calculated the technical error of measurement (TEM) and what Malina et al. (1973) call the "coefficient of variation" for each craniofacial variable. The formula for TEM is:

$$\mathrm{TEM} = \sqrt{\frac{\sum d^2}{2N}}$$

where d is the difference between the Time 1 and Time 2 measurements for each subject and N is the number of subjects (here, 20). This provides a measure of precision in the original units of measurement.
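The TEM formula above, together with the coefficient of relative variation, Fleiss' R, and the effect-of-scale check defined in the following paragraphs, can be computed directly from paired Time 1/Time 2 values. The Python sketch below is illustrative only and is not the authors' SPSS procedure; all function names are hypothetical. It assumes the one-way analysis-of-variance estimator of R for two trials (between-subjects and within-subjects mean squares), which may differ in detail from the computation used in this study, and it implements the effect-of-scale check as a Pearson correlation between each subject's Time 1 - Time 2 difference and that subject's mean.

```python
import math
from statistics import mean


def technical_error(time1, time2):
    """TEM = sqrt(sum(d**2) / (2N)), with d = Time 1 - Time 2 for each subject."""
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need two equal-length series with at least two subjects")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(time1, time2))
    return math.sqrt(sum_d2 / (2 * len(time1)))


def relative_variation(time1, time2):
    """CRV: TEM as a percentage of the grand mean of all Time 1 and Time 2 values."""
    grand_mean = mean(list(time1) + list(time2))
    return 100 * technical_error(time1, time2) / grand_mean


def intraclass_reliability(time1, time2):
    """R = var_T / (var_T + var_e).

    Assumption: variances estimated from one-way ANOVA mean squares with k = 2 trials.
    """
    n, k = len(time1), 2
    subject_means = [(a + b) / 2 for a, b in zip(time1, time2)]
    grand_mean = mean(subject_means)
    # Between-subjects mean square (df = n - 1).
    bms = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subjects mean square (df = n(k - 1)); for two trials it equals TEM**2.
    wms = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(time1, time2, subject_means)) / (n * (k - 1))
    var_t = (bms - wms) / k  # variance of the "error-free" scores
    var_e = wms              # error variance of a single measurement
    return var_t / (var_t + var_e)


def scale_effect_r(time1, time2):
    """Pearson r between each subject's Time 1 - Time 2 difference and his or her mean."""
    diffs = [a - b for a, b in zip(time1, time2)]
    means = [(a + b) / 2 for a, b in zip(time1, time2)]
    md, mm = mean(diffs), mean(means)
    cov = sum((d - md) * (m - mm) for d, m in zip(diffs, means))
    norm = math.sqrt(sum((d - md) ** 2 for d in diffs) * sum((m - mm) ** 2 for m in means))
    return cov / norm


if __name__ == "__main__":
    # Hypothetical Time 1 / Time 2 values (cm) for three subjects; not data from this study.
    t1 = [10.0, 12.0, 14.0]
    t2 = [10.2, 11.8, 14.0]
    print(f"TEM = {technical_error(t1, t2):.4f} cm")
    print(f"CRV = {relative_variation(t1, t2):.2f} %")
    print(f"R   = {intraclass_reliability(t1, t2):.4f}")
    print(f"effect-of-scale r = {scale_effect_r(t1, t2):.3f}")
```

With these hypothetical values the sketch reports a TEM of about 0.115 cm, a CRV of about 0.96%, and an R of about 0.997. R is close to 1 because the subjects differ by several centimeters while the trial differences are only 0.2 cm; the same TEM combined with a smaller spread of subject values would yield a lower R.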
The coefficient of variation is the TEM divided by the grand mean for each variable. Malina et al. (1973) note that this provides a measure of relative variability; i.e., the magnitude of the error relative to the size of the measurement is reported as a percentage:

$$\mathrm{CRV} = \frac{\mathrm{TEM}}{\bar{x}} \times 100$$

where $\bar{x}$ is the grand mean of the variable. This is not a coefficient of variation in the traditional statistical sense (see Malina et al., 1973, p. 42), so we have chosen to call it a coefficient of relative variation (CRV).

In addition to these measures of precision, we also calculated the intraclass correlation coefficient of reliability (R) described by Fleiss (1986). This measure of R is:

$$R = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_e}$$

where T represents individual "error-free" scores (the mean of Time 1 and Time 2) and e is the difference between the Time 1 and Time 2 measurements for each individual. According to Fleiss (1986), the result "is directly interpretable as a proportion of variance. It is the proportion of the variance of an observation due to subject-to-subject variability in error-free scores" (p. 3). Thus, the higher the value, the more reliable the measure.

Bivariate plots and regression statistics were obtained to examine the relationships between these measures of precision and reliability and the size of the craniofacial variables. Both linear and curvilinear regressions were calculated using the Statistical Package for the Social Sciences (SPSS, Inc., 1990) at the University Computer Center at Indiana University.

Finally, Fleiss (1986) notes that an assumption of the intraclass correlation statistic is independence between the distribution of errors and the value of T (p. 2). Marks et al. (1989) found that for some skinfold measurements this assumption is violated because the error increases with the size of the measurement. They refer to this as the effect of scale. We tested this assumption in our data by correlating, for each variable, the Time 1 vs. Time 2 differences with the means of the Time 1 and Time 2 measurements. These results are also reported below.

RESULTS

Table 1 presents basic descriptive data on the craniofacial anthropometric variables of interest in this study. Included in Table 1 are the grand mean for each measurement, two indicators of measurement precision (TEM and CRV), and one reliability indicator (R). In a previous paper (Ward and Jamison, 1991), we discussed other aspects of these data and noted that in craniofacial dimensions of less than 6 cm the CRV seemed rather large and, conversely, reliabilities were low. We concluded that very small craniofacial anthropometric measurements (less than 6 cm) were problematic, especially those small measurements with poorly defined landmarks. Table 1 demonstrates this conclusion.

TABLE 1. Unilateral and bilateral craniofacial measurements: grand mean (cm), technical error of measurement (TEM), coefficient of relative variation (CRV), and reliability (R) between Time 1 and Time 2 (measurements ordered by decreasing size)

Measurement | Grand mean (cm) | TEM (cm) | CRV¹ | R
--- | --- | --- | --- | ---
Unilateral | | | |
Head circumference | 55.92 | .27 | 0.49 | .94
Mandibular curvature | 31.30 | .28 | 0.88 | .95
Maxillary curvature | 28.64 | .25 | 0.87 | .96
Head length | 19.29 | .08 | 0.39 | .96
Head breadth | 15.07 | .06 | 0.40 | .98
Bitragal breadth | 13.81 | .10 | 0.74 | .96
Bizygomatic breadth | 13.61 | .14 | 1.06 | .90
Total facial height | 11.97 | .26 | 2.20 | .85
Minimum frontal breadth | 10.35 | .12 | 1.16 | .89
Bigonial breadth | 10.30 | .15 | 1.43 | .90
Biocular breadth | 8.96 | .11 | 1.23 | .83
Lower facial height | 7.02 | .21 | 3.00 | .88
Bipupillary breadth | 6.09 | .15 | 2.54 | .78
Nose length | 5.27 | .18 | 3.33 | .69
Mouth breadth | 4.92 | .14 | 2.77 | .78
Nose breadth | 3.43 | .08 | 2.23 | .91
Interocular breadth | 3.24 | .06 | 1.73 | .90
Nasal prominence | 2.12 | .11 | 5.05 | .67
Nasal root breadth | 1.88 | .13 | 7.04 | .58
Philtrum length | 1.67 | .15 | 8.74 | .68
Philtrum breadth | 0.96 | .13 | 13.62 | .44
Columella breadth | 0.71 | .05 | 7.77 | .54
Bilateral | | | |
Lower facial depth (Rt) | 13.97 | .19 | 1.35 | .88
Lower facial depth (Lt) | 13.91 | .11 | 0.78 | .97
Midfacial depth (Rt) | 12.52 | .12 | 0.94 | .94
Midfacial depth (Lt) | 12.43 | .13 | 1.02 | .94
Upper facial depth (Rt) | 11.99 | .12 | 1.02 | .91
Upper facial depth (Lt) | 11.94 | .11 | 0.95 | .93
Labial-tragial depth (Rt) | 10.80 | .20 | 1.82 | .84
Labial-tragial depth (Lt) | 10.74 | .16 | 1.47 | .89
Exocanthal-gonial depth (Rt) | 9.48 | .17 | 1.79 | .88
Exocanthal-gonial depth (Lt) | 9.36 | .13 | 1.34 | .93
Mandibular depth (Rt) | 9.40 | .25 | 2.69 | .67
Mandibular depth (Lt) | 9.48 | .22 | 2.33 | .77
Exocanthal-tragial depth (Rt) | 7.74 | .13 | 1.66 | .84
Exocanthal-tragial depth (Lt) | 7.70 | .14 | 1.79 | .83
Mandibular ramus height (Rt) | 6.79 | .29 | 4.25 | .56
Mandibular ramus height (Lt) | 6.63 | .30 | 4.52 | .64
Ear length (Rt) | 6.20 | .12 | 2.00 | .83
Ear length (Lt) | 6.20 | .16 | 2.51 | .81
Exocanthal-glabella depth (Rt) | 5.80 | .21 | 3.58 | .51
Exocanthal-glabella depth (Lt) | 5.81 | .19 | 3.24 | .59
Ear attachment length (Rt) | 5.14 | .24 | 4.63 | .66
Ear attachment length (Lt) | 5.10 | .19 | 3.76 | .71
Alar depth (Rt) | 3.29 | .11 | 3.32 | .64
Alar depth (Lt) | 3.26 | .09 | 2.75 | .76
Ear breadth (Rt) | 3.22 | .17 | 5.25 | .65
Ear breadth (Lt) | 3.28 | .14 | 4.31 | .58
Palpebral fissure breadth (Rt) | 3.09 | .08 | 2.55 | .67
Palpebral fissure breadth (Lt) | 3.09 | .09 | 2.85 | .62
Endocanthal-facial midline curvature (Rt) | 2.35 | .11 | 4.82 | .61
Endocanthal-facial midline curvature (Lt) | 2.35 | .10 | 4.22 | .70

¹CRV reported as a percentage.

In relation to the effect of scale, only three variables (mouth breadth, nasal root breadth, and right ear breadth) displayed significant correlations between Time 1 vs. Time 2 differences and measurement size. Three significant results out of 49 tests (approximately 6%) is very close to the 5% expected by chance alone. In addition, examination of the scattergrams indicated that an outlier was responsible for the significant correlation in each of the three cases. We took this to mean that there was no systematic "effect of scale" in these three variables. Thus, we feel confident that we have not violated the assumption of independence in our application of Fleiss' R statistic.

The overall relationship of mean measurement size with, respectively, the TEM, the CRV, and R can be seen in Figure 1 and Table 2. Figure 1 displays scattergrams and regression lines for TEM, CRV, and R vs. mean measurement size for 49 craniofacial variables ranging in size from less than 1 cm (columella breadth) to approximately 20 cm (head length). These data indicate no relationship between mean measurement size and TEM (r = .072; P = .624). However, the CRV and R are both significantly related to size. Table 2 gives both linear and curvilinear statistics for the CRV because the latter provides a significant improvement in the fit of the line. The relationship is negative for CRV, indicating that as the average size of the craniofacial variables increases, the percentage error decreases (multiple r = -.800; P = .000). For R, the relationship is positive (r = .765; P = .000), showing that R increases with increasing size of the measurement.

Fig. 1. Relationship between overall measurement size (grand mean) vs. technical error of measurement (A), coefficient of relative variation (B), and reliability (C). Regression statistics can be found in Table 2.

TABLE 2. Regression statistics for the relationship between size of craniofacial variables (N = 49) and measures of precision and reliability (variables ranging in size from <1 cm to 20 cm): technical error of measurement (TEM), coefficient of relative variation (CRV), and reliability (R)

Regression | Intercept | Slope | Slope (x²)¹ | r | P
--- | --- | --- | --- | --- | ---
TEM vs. size | .140 | .001 | | .072 | .624
CRV vs. size | 5.177 | -.324 | | -.771 | .000
CRV vs. size (curvilinear)¹ | 6.122 | -.637 | .019 | -.800² | .000
R vs. size | .589 | .025 | | .765 | .000

¹Curvilinear regression with x² term.
²Multiple correlation coefficient.

To determine whether we were stacking the deck by including both left and right bilateral measures in the analysis, we analyzed the left-side and the right-side measures separately, each together with the unilateral variables. In both analyses the pattern noted above was seen, i.e., there was no relationship between the TEM and variable size, a negative relationship between size and the CRV, and a positive one between size and R.

Malina et al. (1973) discuss the relationship between anthropometric measurement size and both the TEM and CRV in auxological data. They argue that both indicators of precision can be affected by variable size and that one should therefore compare them only "within variables measured by the same instrument and within variables of about equal magnitude" (p. 45). To examine this latter assertion, Table 3 presents regression statistics for 48 unilateral and bilateral craniofacial variables broken down into three size categories: 0-5 cm, 5-10 cm, and 10-15 cm. Head length is eliminated from this analysis because it is so far outside the size range represented by the other variables. Again, no significant relationship is found between TEM and variable size, but there is a significant negative relationship between size and CRV and a significant positive relationship between size and R. These same results are seen in all three measurement size categories.

TABLE 3. Linear regression statistics for the relationship between size of craniofacial variables and measures of precision and reliability for 48 variables (all unilateral and all bilateral) divided into three size groups: technical error of measurement (TEM), coefficient of relative variation (CRV), and reliability (R)

Regression | Intercept | Slope | r | P
--- | --- | --- | --- | ---
Variables from 0-5 cm (N = 16) | | | |
TEM vs. size | .098 | .004 | .125 | .644
CRV vs. size | 7.798 | -1.301 | -.667 | .005
R vs. size | .473 | .074 | .635 | .008
Variables from 5-10 cm (N = 18) | | | |
TEM vs. size | .229 | -.006 | -.163 | .517
CRV vs. size | 5.970 | -.447 | -.668 | .002
R vs. size | .472 | .038 | .506 | .032
Variables from 10-15 cm (N = 14) | | | |
TEM vs. size | .298 | -.013 | -.386 | .172
CRV vs. size | 3.442 | -.184 | -.604 | .022
R vs. size | .698 | .017 | .618 | .018

DISCUSSION

The present study suggests that for craniofacial measurements in the size range of 0-20 cm, the TEM is not affected by size, but both the CRV and R are. Furthermore, when these craniofacial variables are subdivided into 5 cm size groups, the same pattern of relationships holds, i.e., no relationship between mean measurement size and TEM, but a negative relationship between size and CRV and a positive relationship between size and R. Using 5 cm size groups as a criterion of "about equal magnitude" (Malina et al., 1973), this result is contrary to their expectation that such relationships would not be found in variables of approximately the same size. Thus the "bigger is better" concept is demonstrated throughout the measurement size range under consideration here.

While it might seem intuitive that the larger the craniofacial measurement, the larger the absolute size of the measurement error, the TEM results in this study counter this intuition. In fact, the magnitude of the error appears to be quite similar between larger and smaller variables. Thus, in percentage terms, the CRV becomes smaller as the average size of the variable increases. Not surprisingly, R displays the converse: it increases with mean measurement size. Therefore, measurement size must be considered along with ease of locating landmarks, measurement technique, and systematic bias in the application of that technique when attempting to minimize error. Ward and Jamison (1991) concluded that the most unreliable craniofacial anthropometric measurements would be those with small absolute dimensions and poorly defined landmarks. To this must now be added that measurement size, at least at the small end of the measurement scale, has a continuous relationship with precision and reliability, not a threshold effect. Our study suggests that R increases with measurement magnitude while the CRV decreases and the TEM remains the same.

Finally, we note that in order to calculate valid reliabilities, it is important to test the assumption of no relationship between measurement error and measurement size within each anthropometric variable, i.e., the effect of scale. We found this assumption to be correct within our data, and we recommend that future reliability studies include such tests.

It must be remembered that the overall size of the measurements in this study is less than 20 cm and, for the three measurement groupings, 15 cm and less. This entire craniofacial measurement battery is toward the smaller end of the overall anthropometric scale. Thus we cannot as yet generalize our results to the rest of the anthropometric battery, which ranges in size up to stature measurements. Nor, without further study, can we generalize these results to craniofacial measurements taken on dry skulls. However, it seems to us that our results might have application in auxological anthropometry. Here, especially for infant measurements, increase in size with age would be a potential factor affecting measurement precision and reliability. The other potential application of this caution would seem to be skinfold measurements on subjects of all ages. A well-nourished subject population might be measured more precisely and with greater reliability than a poorly nourished sample if the relationships found here for craniofacial anthropometry extend to skinfold determinations.

LITERATURE CITED

Cameron N (1986) The methods of auxological anthropometry. In F Falkner and JM Tanner (eds.): Human Growth, A Comprehensive Treatise (2nd ed.). New York: Plenum Press, pp. 3-46.
Davenport CB, Steggerda M, and Drager W (1935) Critical examination of physical anthropometry of the living. Proc. Am. Acad. Arts Sci. Boston 69:265-285.
Farkas LG (1981) Anthropometry of the Head and Face in Medicine. New York: Elsevier.
Fleiss JL (1986) The Design and Analysis of Clinical Experiments. New York: John Wiley and Sons.
Gavan JA (1950) The consistency of anthropometric measurements. Am. J. Phys. Anthropol. 8:417-426.
Habicht JP, Yarbrough C, and Martorell R (1979) Anthropometric field methods: Criteria for selection. In DB Jelliffe and EFP Jelliffe (eds.): Nutrition and Growth. New York: Plenum Press, pp. 365-387.
Himes JH (1989) Reliability of anthropometric methods and replicate measurements. Am. J. Phys. Anthropol. 79:77-80.
Jamison PL, and Zegura SL (1974) A univariate and multivariate examination of measurement error in anthropometry. Am. J. Phys. Anthropol. 40:197-204.
Lohman TG, Roche AF, and Martorell R (eds.) (1988) Anthropometric Standardization Reference Manual. Champaign, IL: Human Kinetics Books.
Malina RM, Hamill PVV, and Lemeshow S (1973) Selected body measurements of children 6-11 years, United States. Vital and Health Statistics, Series 11, No. 123. Washington, DC: US Department of Health and Human Services.
Marks GC, Habicht JP, and Mueller WH (1989) Reliability, dependability, and precision of anthropometric measurements. Am. J. Epidemiol. 130:578-587.
Martorell R, Habicht JP, Yarbrough C, Guzman G, and Klein RE (1975) The identification and evaluation of measurement variability in the anthropometry of preschool children. Am. J. Phys. Anthropol. 43:347-352.
Mueller WH, and Martorell R (1988) Reliability and accuracy of measurement. In TG Lohman, AF Roche, and R Martorell (eds.): Anthropometric Standardization Reference Manual. Champaign, IL: Human Kinetics Books, pp. 83-86.
Pelletier DL, Low JW, and Msukwa LAH (1991) Sources of measurement variation in child anthropometry in the Malawi maternal and child nutrition study. Am. J. Hum. Biol. 3:227-237.
Sokal RR, and Rohlf FJ (1969) Biometry. San Francisco: Freeman.
SPSS, Inc. (1990) SPSS Reference Guide. Chicago: SPSS, Inc.
Ward RE, and Jamison PL (1991) Measurement precision and reliability in craniofacial anthropometry: Implications and suggestions for clinical applications. J. Craniofac. Biol. Dev. Genet. 11:156-164.