Comparability in Skeletal Maturation Research

WILLIAM M. MOORE
Medical Director, Ross Laboratories, Columbus, Ohio 43216

ABSTRACT Comparability is a fundamental issue in skeletal maturation research. Since the introduction of the first edition of the Greulich-Pyle Atlas and the Tanner-Whitehouse method, a number of methodologic reports have appeared regarding potential sources of error, reliability and replicability in the assessment of skeletal maturity from hand-wrist radiographs. Some of these reports are mentioned and two recent examples of methodologic studies are cited. Maximum reliability of skeletal assessments can be expected only when there is strict adherence to carefully standardized investigative procedures. Technical as well as human factors must be taken into account to ensure minimal variation in findings within and between laboratories over time. Single or serial skeletal radiographs uniformly taken on properly identified subjects constitute a valuable permanent record of biologic maturation. While the film image can be considered objective evidence of skeletal maturity, a subjective element is introduced in observing and reporting the presence or absence of particular ossification centers, or in rating an ossification pattern against a standard. Intra- and interobserver replicability of skeletal maturity assessments depends on such factors as motivation, training, assessment method, and quality control procedures. Suggestions are presented to facilitate comparability in skeletal maturation research, including the possible preparation and distribution of sets of standardized skeletal radiographs for periodic determination and improvement of assessor reliability.
AM. J. PHYS. ANTHROP., 35: 411-416.

Since the introduction of the first edition of the Greulich-Pyle Atlas (Greulich and Pyle, ’50) and the Tanner-Whitehouse method (Tanner, Whitehouse and Healy, ’62), a number of methodologic reports have appeared regarding potential sources of error, reliability, and replicability in the assessment of skeletal maturity from hand-wrist radiographs. In this Symposium on the Assessment of Skeletal Maturity it seems appropriate to mention some of these reports, to cite some recent examples of methodologic studies, and to raise for discussion some suggestions designed to improve comparability in skeletal maturation research.

In 1953 and 1954 Mainland described systematic and variable errors in the assessment of radiographs (Mainland, ’53, ’54). Regarding systematic error, he reported that one observer tended to underestimate skeletal ages when compared with previous assessments of the same radiographs by experts. He concluded that the skeletal assessment method would probably be sufficiently reliable for comparing the average skeletal ages of groups of children, but that it would be of doubtful value in the assessment of a single radiograph or of a child's progress. Regarding variable error, he found no significant difference associated with the age or sex of the child, the difference between skeletal and chronologic age, or differences between radiographs of the same child, except where some films had been poorly reproduced. Both communications included an appeal for more data. The second edition of the Greulich-Pyle Atlas (Greulich and Pyle, ’59) carries an effective response to the points made by Mainland.
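Mainland's distinction between systematic and variable error can be made concrete with a short calculation. The sketch below shows one common way to quantify the two and is not necessarily Mainland's own procedure: it assumes an observer's skeletal ages and an expert's reference ages for the same films, both in months, and treats the mean signed difference as the systematic error and the standard deviation of the differences as the variable error. The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def systematic_and_variable_error(observer, reference):
    """Return (systematic_error, variable_error) in months.

    systematic_error: mean of observer - reference; a negative value means
        the observer tends to underestimate skeletal age.
    variable_error: sample standard deviation of the same differences,
        i.e. the scatter that remains around the systematic shift.
    """
    if len(observer) != len(reference) or len(observer) < 2:
        raise ValueError("need two equal-length lists covering at least 2 films")
    diffs = [o - r for o, r in zip(observer, reference)]
    return mean(diffs), stdev(diffs)


# Hypothetical skeletal ages (months) for the same five films
expert = [84, 96, 108, 120, 132]
observer = [80, 93, 102, 118, 127]
bias, scatter = systematic_and_variable_error(observer, expert)
print(f"systematic error {bias:+.1f} months, variable error {scatter:.1f} months")
# prints: systematic error -4.0 months, variable error 1.6 months
```

Because random scatter shrinks when skeletal ages are averaged over many children, the variable error weighs most heavily on the reading of a single film, which fits Mainland's caution about individual assessments.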
Early in the 1960's Acheson and colleagues investigated the reliability of assessing skeletal maturity from radiographs and presented findings related to the use of the Greulich-Pyle Atlas (Acheson et al., ’63), the bone-specific approach (Acheson et al., ’64), and a comparison of the Greulich-Pyle Atlas and the Tanner-Whitehouse method (Acheson et al., ’66). Several papers have also compared the skeletal maturity of the left and right hands and wrists (Dreizen et al., ’57; Roche, ’63) and, more recently, studies have examined factors influencing the replicability of assessments of skeletal maturity (Roche, Davila, Pasternak and Walton, ’70), including the effect of training (Roche, Rohmann, French and Davila, ’70).

EXAMPLES OF RECENT METHODOLOGIC STUDIES

To avoid the appearance of favoring either the Greulich-Pyle or the Tanner-Whitehouse method of skeletal maturity assessment, I will refer to recent methodologic studies involving each of the two methods. In the first example, two investigators with extensive experience in skeletal maturation assessment, working independently and with no background information other than the sex of the child, estimated the skeletal age of each individual center in hand-wrist radiographs of 157 Chinese children, using the Greulich-Pyle Atlas. The children were born between January, 1960 and March, 1967, and the films were obtained between September, 1969 and March, 1970. Although the data are not yet fully analyzed, certain findings are worth mentioning. Fewer than 6% of individual center readings were more than 12 months apart. The frequency distribution of the maximum single-center interobserver difference in skeletal age is shown in table 1.
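The two summary statistics in this paragraph, the share of individual center readings more than 12 months apart and the distribution of each child's largest single-center disagreement, can be computed directly from paired readings. The sketch below assumes each film is represented by two equal-length lists of skeletal ages in months, one per investigator, with the centers in the same order; the function name and the data are hypothetical, and it tabulates raw maximum differences rather than reproducing the class grouping of table 1.

```python
from collections import Counter


def interobserver_summary(films):
    """films: list of (reader_a, reader_b) pairs, one pair per child.

    Returns (share_over_12, table): share_over_12 is the fraction of all
    individual center readings differing by more than 12 months, and table
    maps each maximum single-center difference to (count, per cent).
    """
    all_diffs = []
    max_diffs = Counter()
    for reader_a, reader_b in films:
        if len(reader_a) != len(reader_b):
            raise ValueError("both readers must rate the same centers")
        diffs = [abs(a - b) for a, b in zip(reader_a, reader_b)]
        all_diffs.extend(diffs)
        max_diffs[max(diffs)] += 1  # largest disagreement for this child
    share_over_12 = sum(d > 12 for d in all_diffs) / len(all_diffs)
    n = len(films)
    table = {d: (k, round(100 * k / n, 1)) for d, k in sorted(max_diffs.items())}
    return share_over_12, table


# Hypothetical readings (months) for three children, four centers each
films = [
    ([96, 96, 102, 108], [96, 108, 102, 105]),
    ([120, 114, 120, 126], [132, 114, 126, 126]),
    ([72, 78, 72, 72], [75, 78, 90, 72]),
]
share, table = interobserver_summary(films)
print(f"{share:.1%} of center readings differ by more than 12 months")  # 8.3% ...
print(table)  # {12: (2, 66.7), 18: (1, 33.3)}
```

The two figures answer different questions: a child can have one badly discordant center (raising the maximum) while nearly all of that child's other centers agree closely.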
Parenthetically, the first 20 films rated in this double-blind study showed a maximum interobserver rating difference of 18 months; as the sample grew, several differences reached 30 months and one reached 42 months. Another aspect of this study relates to a difference in technique between the two investigators: one deliberately randomized the sequence of bone assessment, while the other deliberately read bones in “blocks,” such as all metacarpals, all proximal phalanges, and so on. It is apparent that the investigator who randomized bone assessments less frequently assigned the same skeletal age to all bones in a particular “block.” However, the investigator using the randomized approach assigned a zero when a center was missing, whereas the other investigator read the end of the shaft. This difference in technique would tend to reduce the frequency of all bones in a “block” being assigned the same skeletal age under the randomized approach. It probably could not, however, account for the relatively high frequencies of all bones being assigned the same skeletal age under the “block” approach (table 2).
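The frequencies in table 2 count children for whom every bone in a given block received the same skeletal age. A minimal sketch of that count follows, assuming each child's readings are stored as a mapping from a block name to the list of skeletal ages (months) for the bones in that block; the block names, function names, and data are hypothetical. A center recorded as 0 because it was missing, as the randomized reader did, is treated here as an ordinary distinct value, so it breaks agreement within its block unless every center in the block is missing.

```python
def all_same(ages):
    """True when every bone in the block received the same skeletal age."""
    return len(set(ages)) == 1


def block_agreement(children, block):
    """Return (count, per cent) of children whose bones in `block`
    were all assigned the same skeletal age."""
    hits = sum(all_same(child[block]) for child in children)
    return hits, round(100 * hits / len(children), 1)


# Hypothetical readings (months) for four children
children = [
    {"metacarpals": [96, 96, 96, 96, 96], "distal": [96, 102, 96, 96, 96]},
    {"metacarpals": [108, 114, 108, 108, 108], "distal": [108, 108, 108, 108, 108]},
    {"metacarpals": [120, 120, 120, 120, 120], "distal": [120, 120, 126, 120, 120]},
    {"metacarpals": [84, 84, 90, 84, 84], "distal": [84, 84, 84, 90, 84]},
]
for block in ("metacarpals", "distal"):
    print(block, block_agreement(children, block))
# metacarpals (2, 50.0)
# distal (1, 25.0)
```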
TABLE 1
Maximum interobserver difference in skeletal age for single bones using the Greulich-Pyle Atlas on Chinese boys and girls

Maximum
difference          Boys                Girls               Total
(months)       Number  Per cent    Number  Per cent    Number  Per cent
    3             2       2.3         0       0.0         2       1.3
    6             4       4.6         4       5.8         8       5.1
    9            12      13.6        15      21.8        27      17.2
   12            19      21.6        14      20.3        33      21.0
   15            16      18.2        20      29.0        36      22.9
   18            21      23.8        13      18.8        33      21.0
   21             6       6.8         3       4.4         9       5.7
   24             1       1.1         0       0.0         2       1.3
   27             4       4.6         0       0.0         4       2.5
   30             2       2.3         0       0.0         2       1.3
   33             0       0.0         0       0.0         0       0.0
   36             0       0.0         0       0.0         0       0.0
   39             0       0.0         0       0.0         0       0.0
   42             1       1.1         0       0.0         1       0.6
Total            88     100.0        69     100.1       157      99.9

TABLE 2
Frequency of all bones within the same “block” being assigned the same skeletal age in a sample of 157 Chinese children (Greulich-Pyle Method)

                       Metacarpals    Prox. phalanges    Mid. phalanges    Distal phalanges
                       No.     %      No.     %          No.     %         No.     %
Randomized approach     3     1.9      3     1.9           1     0.6         0     0.0
“Block” approach       29    18.5     11     7.0          44    28.0        18    11.5

In another example, the Tanner-Whitehouse method of assessing skeletal maturity was used. In this instance, the investigator, before assessing a semi-longitudinal sample of white and Negro elementary school children, assessed the films used by Acheson and colleagues (Acheson et al., ’64) in their study of the reliability of assessing skeletal maturity from hand-wrist films (Malina, ’68). Table 3 indicates the standing of this investigator in comparison with the participants in the Acheson study.

TABLE 3
Rating summations: comparison of the ratings of 28 bones averaged over 45 radiographs with those reported by Acheson et al. (’64) (Tanner-Whitehouse Method) 1

Assessor    Average    Standard deviation
   …        140.80          45.65
   …        136.24          46.15
   …        136.09          44.16
   …        135.00          45.89
   …        133.71          46.87
   …        133.49          45.20
   …        133.18          45.55
   …        133.11          46.98
   …        133.09          42.48

Assessor labels are only partly legible in the source (legible fragments: Q, Current study, L, Y, Z).
1 From: Malina, ’68; 96.
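Judging an assessor's standing, as table 3 does, amounts to averaging each assessor's unweighted 28-bone summation over the common set of films and sorting the averages. The sketch below assumes one summation per film per assessor; the assessor labels, function names, and numbers are hypothetical and are not the values in table 3.

```python
from statistics import mean


def rank_assessors(summations):
    """summations: {assessor: [unweighted 28-bone summation per film]}.

    Returns (assessor, average) pairs sorted from highest to lowest average.
    """
    averages = {name: mean(values) for name, values in summations.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def standing(summations, assessor):
    """1-based position of `assessor` in the rank order."""
    names = [name for name, _ in rank_assessors(summations)]
    return names.index(assessor) + 1


# Hypothetical summations for three assessors rating the same three films
summations = {
    "A": [130, 140, 150],
    "B": [120, 130, 140],
    "new assessor": [125, 135, 145],
}
print(rank_assessors(summations))
print(standing(summations, "new assessor"))  # 2
```

Sorting in descending order mirrors the layout of table 3; where a new assessor falls in the list shows whether he tends to rate higher or lower than the established group.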
The unweighted summations of ratings for all 28 bones of the hand and wrist, when rank ordered, allowed the investigator to judge his standing among the other assessors. The investigator then assessed approximately 2200 films without knowledge of the age and sex of the subjects. Subsequently, he reassessed every fifteenth film throughout the series and recorded the average unweighted summation of ratings for all 28 bones for each assessment (table 4), as well as the average sums for 7 round bones and 13 long bones and the skeletal age for the first and second assessments (table 5). The degree of replicability was high, with correlations between replicate readings of r = +0.982, +0.981, and +0.989 for round bones, long bones, and skeletal age, respectively (Malina, ’68, ’70).

DISCUSSION

In view of the multiple potential sources of error and factors affecting skeletal maturity assessments, maximum reliability can be expected only when there is strict adherence to carefully standardized investigative procedures. Technical as well as human factors must be taken into account to ensure minimal variation in findings within and between laboratories over time. Single or serial skeletal radiographs uniformly taken on properly identified subjects constitute a valuable permanent record of biologic maturation. While the film image can be considered objective evidence of skeletal maturity, a subjective element is introduced in observing and reporting the presence or absence of particular ossification centers, or in rating an ossification pattern against a standard. Comparability of findings is frequently a fundamental issue in skeletal maturation research when an assessor rates the same film at different times, or different films at the same or different times. The need for comparability increases when multiple assessors rate the same or different films at different times.
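The replicability check reported for the Tanner-Whitehouse example, reassessing every fifteenth film and correlating first and second readings, can be sketched as follows. The product-moment (Pearson) correlation is written out with the standard library only; the film identifiers and readings are hypothetical, and the selection assumes films are kept in the order in which they were first assessed.

```python
import math


def every_nth(films, n=15):
    """Every n-th film in assessment order (the 15th, 30th, ... film)."""
    return films[n - 1::n]


def pearson_r(x, y):
    """Product-moment correlation between first and second readings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series of at least 2 readings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 2200 hypothetical film identifiers, in assessment order
film_ids = [f"film-{i:04d}" for i in range(1, 2201)]
resample = every_nth(film_ids)
print(len(resample), resample[0], resample[-1])  # 146 film-0015 film-2190

# Hypothetical skeletal ages (years) from two readings of the same films
first = [8.4, 9.1, 10.6, 11.2, 7.9, 12.3]
second = [8.5, 9.0, 10.4, 11.3, 8.1, 12.0]
print(round(pearson_r(first, second), 3))
```

A high r shows that the two readings order the films consistently; it does not by itself rule out a uniform shift between readings, which is why the first- and second-assessment averages in tables 4 and 5 are worth comparing as well.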
To ensure comparability, irrespective of the assessment method, assessors must have a good and constant level of motivation, satisfactory training and experience, and an awareness of the need for periodic quality control procedures.

Improving comparability in skeletal maturation research.
To facilitate comparability in skeletal maturation research, the possible preparation and distribution of sets of standardized skeletal radiographs for periodic determination and improvement of assessor reliability deserves consideration.

TABLE 4
Average unweighted summations of ratings for all 28 bones on two separate assessments (Tanner-Whitehouse Method) 1

                      First assessment          Second assessment
                      Average    S.D.           Average    S.D.
1st 50 films          129.72     22.71          128.58     21.99
2nd 50 films          135.14     24.18          133.64     24.35
3rd 50 films          128.00     26.37          126.04     25.94
Total 150 films       130.95     24.49          129.42     24.20

1 From: Malina, ’68; 97.

TABLE 5
Averages, standard deviations, and product-moment correlation coefficients for the sum of seven round bones, the sum of thirteen long bones, and skeletal age on two separate assessments (Tanner-Whitehouse Method) 1

                      First assessment          Second assessment
                      Average    S.D.           Average    S.D.        r
Round bones
1st 50 films          192.54     76.23          186.80     74.07     0.989
2nd 50 films          202.84     84.22          193.06     74.03     0.981
3rd 50 films          184.98     92.47          176.26     82.61     0.981
Total 150 films       193.45     84.32          185.37     76.81     0.982
Long bones
1st 50 films          195.72     81.29          195.98     76.44     0.981
2nd 50 films          216.90     89.49          209.52     87.28     0.979
3rd 50 films          196.00     86.32          194.10     85.40     0.984
Total 150 films       202.87     85.77          199.87     82.90     0.981
Skeletal age (years)
1st 50 films            9.67      2.15            9.62      2.10     0.990
2nd 50 films           10.25      1.89           10.02      1.81     0.983
3rd 50 films            9.62      2.35            9.49      2.34     0.992
Total 150 films         9.85      2.14            9.71      2.09     0.989

1 From: Malina, ’68; 98.
This approach has proved useful in establishing and maintaining comparability among microbiological and clinical chemistry laboratories in the United States and throughout the world, through the distribution of standard “unknowns” for determination by such agencies as the Center for Disease Control of the U.S. Public Health Service and the various reference laboratories of the World Health Organization. An important feature of such a program is prompt feedback of the results of the standardization tests, together with appropriate comments on how reliability might be improved.

During the Intensive Course in Human Biology directed by Gabriel Lasker and Morris Goodman and held at the Wayne State Medical School in Detroit, Michigan, in conjunction with the meeting of the AAPA-SSHB in April, 1968, previously rated hand-wrist radiographs were available for graduate student training. Occasional follow-up exercises of this type, involving not only graduate students but also principal investigators, would seem likely to promote greater comparability of skeletal maturation assessments, both within and between laboratories. This suggestion assumes minimum cost, since films can be easily and faithfully reproduced and mailed from one center to another, thereby avoiding the expensive transportation and subsistence costs that moving investigators from one laboratory to another would entail. The process could also work the other way, with a single individual or small group of individuals traveling from one laboratory to another either to standardize their own techniques or to assist others in standardizing theirs. Most participants in this symposium can probably recall instances where one or the other, or both, of these procedures have been followed.
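The feedback step of such a program can be sketched as a comparison of each laboratory's returned ratings with reference ratings for the distributed set. In this minimal sketch the reference ages, the laboratory names, the function name, and the 6-month tolerance are all hypothetical choices, not values proposed in this paper; the report gives, for each laboratory, the percentage of films rated within the tolerance and the films to review.

```python
def feedback_report(returns, reference, tolerance=6):
    """returns: {lab: {film_id: skeletal age in months}}.
    reference: {film_id: reference skeletal age in months}.
    tolerance: hypothetical acceptance band in months.

    Returns {lab: (per cent of films within tolerance, films to review)}.
    """
    report = {}
    for lab, ratings in sorted(returns.items()):
        missing = set(reference) - set(ratings)
        if missing:
            raise ValueError(f"{lab} did not rate films {sorted(missing)}")
        to_review = [
            film for film in sorted(reference)
            if abs(ratings[film] - reference[film]) > tolerance
        ]
        within = 100 * (1 - len(to_review) / len(reference))
        report[lab] = (round(within, 1), to_review)
    return report


# Hypothetical standard set and returned ratings (months)
reference = {"F1": 96, "F2": 120, "F3": 150}
returns = {
    "Lab A": {"F1": 98, "F2": 132, "F3": 150},
    "Lab B": {"F1": 90, "F2": 120, "F3": 153},
}
for lab, (pct, films) in feedback_report(returns, reference).items():
    print(f"{lab}: {pct}% within tolerance; review {films or 'none'}")
```

Returning the specific films to review, rather than a single score, is one way to give the prompt and specific feedback that such a program depends on.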
If some of these experiences, helpful or not, could be shared during the discussion, it might be easier to reach a conclusion about the feasibility of adopting these suggestions for improving comparability in skeletal maturation research.

ACKNOWLEDGMENTS

I thank Dr. Ann Sproul of the Child Health and Development Studies, Oakland, California, Dr. Marjorie M. C. Lee of the University of Nebraska College of Dentistry, Lincoln, Nebraska, and Dr. Robert M. Malina of the University of Texas, Austin, Texas, for the use of previously unpublished data and findings.

LITERATURE CITED

Acheson, R. M., G. Fowler, E. I. Fry, M. Janes, K. Koski, P. Urbano and J. J. Van Der Werff Ten Bosch 1963 Studies in the reliability of assessing skeletal maturity from X-rays. Part I. Greulich-Pyle Atlas. Hum. Biol., 35: 317-349.
Acheson, R. M., J. H. Vicinus and G. B. Fowler 1964 Studies in the reliability of assessing skeletal maturity from X-rays. Part II. The Bone-Specific Approach. Hum. Biol., 36: 211-228.
1966 Studies in the reliability of assessing skeletal maturity from X-rays. Part III. Greulich-Pyle Atlas and Tanner-Whitehouse Method Contrasted. Hum. Biol., 38: 204-218.
Dreizen, S., R. M. Snodgrasse, H. Webb-Peploe, G. S. Parker and T. D. Spies 1957 Bilateral symmetry of skeletal maturation in the human hand and wrist. Amer. J. Dis. Child., 93: 122-127.
Greulich, W. W., and S. I. Pyle 1950 First Ed.; 1959 Second Ed. Radiographic Atlas of Skeletal Development of the Hand and Wrist. Stanford University Press, Stanford, California.
Mainland, D. 1953 Evaluation of the skeletal age method of estimating children’s development. I. Systematic errors in the assessment of roentgenograms. Ped., 12: 114-129.
1954 Evaluation of the skeletal age method of estimating children’s development. II. Variable errors in the assessment of roentgenograms. Ped., 13: 165-173.
Malina, R. M. 1968 Growth, maturation and performance of Philadelphia Elementary School Children. Ph.D. Dissertation in Physical Anthropology, Graduate School of Arts and Sciences, University of Pennsylvania.
1970 Skeletal maturation studied longitudinally over one year in American Whites and Negroes six through thirteen years of age. Hum. Biol., 42: 377-390.
Roche, A. F. 1963 Lateral comparisons of the skeletal maturity of the human hand and wrist. Amer. J. Roentgen., 89: 1272-1280.
Roche, A. F., G. H. Davila, B. A. Pasternak and M. J. Walton 1970 Some factors influencing the replicability of assessments of skeletal maturity (Greulich-Pyle). Amer. J. Roentgen., 99: 299-306.
Roche, A. F., C. G. Rohmann, N. Y. French and G. H. Davila 1970 Effect of training on replicability of assessments of skeletal maturity (Greulich-Pyle). Amer. J. Roentgen., 98: 511-515.
Tanner, J. M., R. H. Whitehouse and M. J. R. Healy 1962 A new system for estimating skeletal maturity from the hand and wrist, with standards derived from a study of 2,600 healthy British children. II. The Scoring System. Paris, International Children’s Centre.