Statistical presentation and analysis of ordered categorical outcome data in rheumatology journals.
Arthritis & Rheumatism (Arthritis Care & Research)
Vol. 47, No. 3, June 15, 2002, pp 255–259
DOI 10.1002/art.10453
© 2002, American College of Rheumatology

ORIGINAL ARTICLE

Statistical Presentation and Analysis of Ordered Categorical Outcome Data in Rheumatology Journals

MICHAEL P. LAVALLEY¹ AND DAVID T. FELSON²

Objective. To assess the appropriateness of presentation of summary measures and analysis of ordered categorical (ordinal) data in three rheumatology journals in 1999, and to consider differences between basic and clinical science articles.

Methods. Six hundred forty-four full-length articles from the 1999 editions of 3 rheumatology journals were evaluated for inclusion of an ordinal outcome. Articles were classified as basic or clinical science, and the appropriateness of presentation and analysis of the ordinal outcome were assessed. Chi-square tests were used to evaluate differences in percentages.

Results. Ordinal outcomes were identified in 175 (27.2%) of 644 articles. Only 69 (39.4%) had appropriate data presentation, and 111 (63.4%) had appropriate data analysis. Appropriate presentation was less common in basic science than in clinical science articles, but no difference in the occurrence of appropriate analysis was seen.

Conclusion. Ordinal data are common in rheumatology articles, but presentation usually does not conform to recommended guidelines.

KEY WORDS. Ordinal; Summary statistics; Hypothesis tests; Estimation.

Ordinal data are generated when observations are placed into ordered categories. Such data are often generated by scoring radiographs or histologic slides, or from evaluating questionnaire responses. Ordinal data contain more information than categorical data without ordering (nominal data), but do not contain as much information as continuously measured data. This makes presentation of summary measures and hypothesis testing with ordinal data challenging.
Supported by NIH Grant AR-20613. Dr. LaValley’s work was supported by an Arthritis Foundation New Investigator Award.
¹Michael P. LaValley, PhD: Boston University School of Public Health, Boston University School of Medicine, Boston, Massachusetts; ²David T. Felson, MD, MPH: Boston University School of Public Health, Boston University School of Medicine, Boston Medical Center, Boston, Massachusetts.
Address correspondence to Michael LaValley, PhD, Boston University Arthritis Center, 715 Albany Street, A203, Boston, MA 02118. E-mail: email@example.com.
Submitted for publication May 9, 2001; accepted in revised form October 5, 2001.

Previous analyses of medical research articles have suggested that ordinal outcome data are often presented or analyzed in ways that do not account for either the ordering or the categorical structure of the data (1–3). This can lead to biased estimates and reduced ability (low power) to detect important effects. Ordinal variables may be dichotomized as being above or below a fixed cut-off value and treated as binary (0/1), but this combines different levels together and can sacrifice information from the original scale (1). Contingency table methods that are appropriate for unordered categorical data do not take advantage of ordering in the data, resulting in loss of information and difficulty in interpretation (1). Methods for continuous data, such as the mean, standard deviation, Student’s t-test, and F test, make several assumptions (e.g., consistent spacing, symmetry, and normality of the data distribution) that are generally not satisfied by ordinal data. As noted by Altman and Bland, “Although some statistical methods, such as the t-test, are not sensitive to moderate departures from normality, it is generally preferable not to rely on this feature” (4).
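The loss of information from dichotomization can be illustrated with a small sketch. The scores, the 0 to 4 scale, and the cut-off below are hypothetical and are not taken from any article in the sample; the function names are likewise illustrative. Two groups with clearly different distributions across the ordered categories become indistinguishable once the scale is collapsed to above or below the cut-off.

```python
from collections import Counter

# Hypothetical 0-4 ordinal scores for two groups (illustrative data only).
group_a = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
group_b = [0, 1, 1, 2, 2, 2, 4, 4, 4, 4]

CUTOFF = 3  # hypothetical cut-off: a score >= 3 is coded as 1


def dichotomize(scores, cutoff):
    """Collapse ordinal scores to binary: 1 if score >= cutoff, else 0."""
    return [1 if s >= cutoff else 0 for s in scores]


def proportion_positive(scores, cutoff):
    """Fraction of observations at or above the cut-off."""
    binary = dichotomize(scores, cutoff)
    return sum(binary) / len(binary)


def category_counts(scores, levels=range(5)):
    """Number of observations in each ordered category."""
    counts = Counter(scores)
    return [counts.get(level, 0) for level in levels]


prop_a = proportion_positive(group_a, CUTOFF)
prop_b = proportion_positive(group_b, CUTOFF)
counts_a = category_counts(group_a)
counts_b = category_counts(group_b)

# The binary summaries agree, although the category counts do not.
print("proportion at or above cut-off:", prop_a, prop_b)
print("category counts, group A:", counts_a)
print("category counts, group B:", counts_b)
```

In this example every high score in group B falls in the top category, whereas none in group A does, yet the dichotomized outcome reports the same proportion in both groups.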
To use the order information in ordinal data while avoiding unnecessary assumptions, biostatistics textbooks (5) and journal articles (1,3,4,6–9) have recommended nonparametric methods based on ranking the data. These methods include use of percentiles, the median, range, and interquartile range for presentation of summary measures (9), and the Wilcoxon (1) and Kruskal-Wallis (3) tests for using the data to test hypotheses.

To evaluate whether inappropriate presentation and hypothesis testing with ordinal data is a current problem in the rheumatology literature, we examined articles published in 1999 in 3 rheumatology journals. Our objectives were to assess the percentage of articles that use ordinal outcomes in rheumatology journals, to estimate the percentage of articles in which the presentation of summary measures and the analysis of ordinal data are appropriate, and to determine whether the percentage of articles with appropriate presentation or analysis differs between basic and clinical science articles. To simplify data collection and analysis, we focused on ordinal variables used as study outcomes, and excluded ordinal variables used solely as predictors of an outcome.

MATERIALS AND METHODS

To assess the current use of statistics for ordinal data in rheumatology research publications, we evaluated three journals: Arthritis & Rheumatism (A&R), Journal of Rheumatology (JR), and Arthritis Care & Research (AC&R). All 1999 issues of these journals were hand-searched for full-length research articles. Editorials, case reports, and letters were excluded from consideration. A statistician (ML) used a standardized extraction form to evaluate articles for inclusion of an ordinal outcome (yes/no). A variable was considered to be an outcome if summary statistics were presented for the variable, if it was compared between groups, or if it was predicted by other variables. All other variables were considered to be predictors. Articles with an ordinal outcome were then evaluated for appropriateness of presentation of summary measures (yes/no), and appropriateness of analysis (yes/no). Articles were also classified in basic or clinical science categories according to either the journal subheading (A&R) or by a rheumatologist (DF). Clinical science articles were defined as those reporting research in which the whole patient was studied.

Two types of statistical methods were considered: 1) presentation of summary measures (called descriptive statistics), and 2) statistical analysis (called inferential statistics). Appropriate methods for the presentation of summary measures were defined to be any of those listed in the presentation category in Table 1. Listing the mean and standard deviation was not considered to be adequate if the article never stated that normality had been assessed for the outcome. If normality was tested and present, use of methods for continuous normally distributed data was considered appropriate. If both appropriate and inappropriate presentations for an outcome were listed, the method was classified as appropriate.

Table 1. Appropriate assessment of presentation and analysis of ordinal outcomes

Category       Method
Presentation   Percentage within each category
               Median and range or interquartile range
               Mean and standard deviation after assessment of normality
Analysis       Nonparametric test, Spearman correlation, or ordinal logistic regression
               Pearson correlation, t-test, linear regression after assessment of normality
               Logistic regression if dichotomization justified on clinical or scientific grounds

What is termed analysis in this article consists mainly of hypothesis testing, but also includes measures of association and confidence intervals. Appropriate analysis of ordinal outcomes was defined to be any of the methods listed in the analysis category of Table 1.

Percentages are used for summary measures in analyses of these data. Associations of article type and journal with the percentages of appropriate presentation and analysis were tested with chi-square tests at the 0.05 level of significance. Statistical analysis was performed with SAS version 8 (SAS Institute, Cary, NC).

RESULTS

A total of 644 articles were evaluated (282 A&R, 322 JR, and 40 AC&R), of which 175 (27.2%) were identified as having an ordinal outcome. The percentage of articles with ordinal outcomes varied between the journals (A&R 16.0%, JR 31.1%, AC&R 75.0%). Of these 175 articles with an ordinal outcome, 145 (82.9%) were clinical science topics and 30 (17.1%) were basic science. Some of the ordinal outcomes used in articles included in this sample were Kellgren/Lawrence score (10), staining intensity (11), histologic score (12), Rodnan skin score (13), severity of Lyme arthritis (14), erosion score (15), Larsen score (16), pain measured on a Likert scale (17), and questionnaire response (18).

Only 69 (39.4%) of 175 articles had appropriate presentation of summary measures for ordinal outcomes (Table 2). The percentage of articles using an appropriate presentation was higher for clinical science than for basic science articles (chi-square = 6.9, P = 0.0085), although the rate in both groups was low. In the 106 articles without appropriate presentation of summary results, means and standard deviations were used without assessment of whether the data were normally distributed.

Table 2. Appropriate presentation and analysis in articles with an ordinal outcome

Articles           n     % Appropriate presentation   % Appropriate analysis
Clinical science   145   44.3                         61.4
Basic science      30    20.0                         71.4
Total              175   39.4                         63.4

Overall, 111 (63.4%) of the 175 articles with an ordinal outcome used appropriate analysis of the ordinal outcome (Table 2).
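The P value reported above for the presentation comparison can be cross-checked from the chi-square statistic alone. The following is a minimal sketch, not the SAS analysis used in the study: it assumes 1 degree of freedom (a 2 × 2 table of article type by appropriateness), and the function name `chi_square_p_value` is hypothetical. Because the published statistic is rounded to 6.9, the result may differ from the printed P value in the last decimal place.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail P value of a chi-square statistic for df = 1 or df = 2.

    Closed forms used here:
      df = 1: P = erfc(sqrt(x / 2))
      df = 2: P = exp(-x / 2)
    """
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Clinical vs basic science by appropriate presentation (yes/no):
# a 2 x 2 table, so df = 1 is assumed.
p_presentation = chi_square_p_value(6.9, df=1)
print(f"P = {p_presentation:.4f}")
```

The closed forms avoid any dependency beyond the standard library; for other degrees of freedom a statistical package would be the natural choice.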
There was no significant difference in the percentages of appropriate analysis between basic and clinical science articles (chi-square = 1.21, P = 0.2719), although the percentage in basic science articles was higher. Of the 64 articles without appropriate analysis, 63 used procedures appropriate for normally distributed data (usually the t-test) without asserting that normality had been assessed, and one dichotomized the outcome without justification of the cut-off value.

When the data were analyzed for each journal, there were no significant differences in the percentage of articles with appropriate presentation of summary measures for ordinal outcomes (chi-square = 4.00, P = 0.1350), or for percentage with appropriate testing of an ordinal outcome (chi-square = 3.31, P = 0.1912).

DISCUSSION

Ordinal outcome data are common in these 3 rheumatology journals, appearing in approximately 25% of all research articles. Appropriate presentation of summary results for ordinal data was uncommon, occurring in only about 40% of articles, and was less frequent in basic science articles than in clinical science. A majority of articles used appropriate hypothesis tests with ordinal outcomes, and this result did not vary significantly between article types. However, in general there is room for improvement in presentation and analysis of ordinal data.

Several assessments of the use of ordinal data in medical research articles have been performed in the past. Moses et al (1) surveyed articles from the New England Journal of Medicine for the first six months of 1982 and found 18% (32 of 168) of these used ordinal data. They found inappropriate analysis due to dichotomizing the outcome in 8 out of 27 analyses (30%) and use of contingency tables that ignored the ordering in 9 analyses (33%). Avram et al (2) examined 243 articles from two anesthesia journals in 1981 and 1983 for errors in statistical methods for all outcome types.
They found that the most common error in presentation of summary measures was description of ordinal data by means and standard deviations, with 60 instances out of the 65 major presentation errors discovered. Of 308 analysis errors discovered, 24 were due to analyzing ordinal data as if continuous. Forrest and Andersen analyzed 175 papers with ordinal outcomes published in 1982 in 12 major medical journals (3). Of 188 presentations of summary measures, only 49 (26.1%) were appropriate; of 336 hypothesis tests, 116 (35%) were done appropriately.

Our results are not directly comparable to these results from previous assessments of methods for ordinal data in the medical literature. Unlike Moses et al (1), we examined the use of methods for presentation of summary measures. We have reported the percentage of articles that have used an inappropriate method rather than the percentage of errors that are of a certain type (as in Avram et al [2]) or the percentage of presentations or analyses that were inappropriate (as in Forrest and Andersen [3]). However, we found much lower rates of dichotomization and of contingency table analyses that ignore ordering than were reported by Moses et al. In addition, if we assume that the percentage of inappropriate presentations and tests found by Forrest and Andersen is similar to the percentage of articles with inappropriate presentations and tests in their sample, then we found higher percentages of articles with appropriate presentation of summary measures and analysis than they did. Any such improvement could be due to secular improvements in use of statistical methods in medical research or due to differences in the journals used in sampling.

There is a tradition of defending the use of tests designed for continuous normally distributed data for the analysis of ordinal data that originates in psychology and the social sciences (19–21), and is found to a lesser extent in the medical literature (22,23).
The defense has centered on the empirical observation that the significance level for some tests designed for continuous data (mainly the t-test and the F test) is approximately correct when used for ordinal data (20,23). However, these empirical observations do not necessarily extend outside the particular data distributions considered in these papers. Also, there is the issue of test validity: if a test designed for continuous normally distributed data is statistically significant on ordinal data, and a rank-based test is not, which test result should be used? Use of a test that may achieve significance by drawing on assumptions known not to be true (e.g., data normality) over one that does not use these assumptions seems questionable. Finally, there is a misconception that the t-test will be more powerful statistically for ordinal data than a nonparametric test (22,24). The t-test is more powerful for normally distributed data, but the Wilcoxon test has been shown to be more powerful on a variety of real-world continuous data distributions that are not normally distributed (25). Ordinal data are also not normally distributed.

Unlike the defense of parametric analysis noted above, there is no tradition of defending the presentation of means and standard deviations as summary measures for ordinal variables. Even apologists for analyzing ordinal data with methods designed for continuous data suggest that these methods of presentation are incorrect (20,24). The main rationale given in introductory statistics textbooks for providing the mean and standard deviation as summary statistics is that for the normal distribution the central 95% of the data fall within 2 standard deviations of the mean (26). This rationale does not hold if the data distribution is not symmetric. In the absence of a symmetric distribution, medians and percentiles are more informative as descriptive statistics than means and standard deviations (9).
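A short sketch shows how the 2 standard deviation rationale fails for a skewed ordinal variable. The scores below are hypothetical values on a 0 to 4 scale, chosen only for illustration. The interval formed by the mean plus or minus 2 standard deviations extends below the lowest possible category, whereas the median, interquartile range, and percentage within each category stay on the scale.

```python
import statistics

# Hypothetical skewed scores on a 0-4 ordinal scale (illustrative only).
scores = [0] * 12 + [1] * 4 + [2] * 2 + [3] * 1 + [4] * 1

# Summary measures intended for symmetric, normally distributed data.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)
lower, upper = mean - 2 * sd, mean + 2 * sd

# Rank-based summary measures.
median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Percentage of observations within each ordered category.
percent_by_category = {
    level: 100 * scores.count(level) / len(scores) for level in range(5)
}

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"mean +/- 2 SD = ({lower:.2f}, {upper:.2f})")
print(f"median = {median}, interquartile range = {q1} to {q3}")
print("percent by category:", percent_by_category)
```

The quartiles here use the `inclusive` method of `statistics.quantiles`; other quantile definitions can give slightly different values for small samples.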
Therefore, the high levels of inappropriate presentation found in our study and in the studies by Avram et al (2) and Forrest and Andersen (3) point to an ongoing concern in the medical literature.

Table 3. Recommendations for presentation and analysis of ordinal outcomes

Presentation
  ≤4 categories, use percentiles
  ≥5 categories, use median and interquartile range
Analysis
  Group comparisons, use Wilcoxon or Kruskal-Wallis tests
  Correlation, use Spearman’s rho or Kendall’s tau
  Regression analysis, use ordinal logistic regression with appropriate model validation

Although we have classified as appropriate the use of testing designed for normally distributed data following assessment of normality, we do not feel that this approach should be used for ordinal outcomes because those data are not normally distributed. Similarly, we would not recommend the use of means and standard deviations following assessment of normality, although this was classified as appropriate for presentation. If means and standard deviations are presented to aid interpretation or make explicit comparisons with previous research, the median and range or percentages should also be presented. In addition, we have allowed the use of any type of ordinal logistic regression to be counted as appropriate, and this is also generous. For all of these models, the assumptions behind them need to be validated for the data under consideration before placing trust in the analysis results (27–30).

Our recommendations for methods of presentation and analysis of ordinal outcomes are listed in Table 3. These recommendations are based on the literature cited in the references and on our experiences in presentation and analysis of rheumatologic and medical outcomes. This is not intended to be an exhaustive list, but is intended to provide guidance as to the most commonly used methods.
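The group-comparison recommendation in Table 3 can be sketched as follows. This is an illustrative implementation of the Wilcoxon rank-sum test using midranks for ties and a tie-corrected normal approximation without continuity correction; it is not the procedure used in any of the surveyed articles. The function names and the two samples of 0 to 4 scores are hypothetical. With samples this small the normal approximation is rough, and an exact test from a statistical package would normally be preferred.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (rank sum of x, z statistic, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2.0
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_two_sided


# Hypothetical ordinal scores (0-4) in two groups, for illustration only.
treated = [0, 1, 1, 2, 2, 3]
control = [1, 2, 3, 3, 4, 4]

w, z, p = wilcoxon_rank_sum(treated, control)
print(f"rank sum = {w}, z = {z:.3f}, two-sided P = {p:.3f}")
```

The test uses only the ordering of the scores, so any relabeling of the categories that preserves their order leaves the result unchanged.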
Recursive-partitioning (31), Rasch or Item-Response (32), or Bayesian (33) analyses provide appropriate alternative approaches for ordinal outcomes.

In summary, we found that the majority of current articles in rheumatology journals presenting summary measures for ordinal data do not conform to recommendations from journal articles and biostatistics textbooks. To a lesser extent, analysis of ordinal data also does not conform to recommendations. Standard statistical software implements the recommended methods, so there are few barriers to the appropriate presentation and analysis of ordinal data.

REFERENCES

1. Moses LE, Emerson JD, Hosseini H. Analyzing data from ordered categories. N Engl J Med 1984;311:442–8.
2. Avram MJ, Shanks CA, Dykes MH, Ronai AK, Stiers WM. Statistical methods in anesthesia articles: an evaluation of two American journals during two six-month periods. Anesth Analg 1985;64:607–11.
3. Forrest M, Andersen B. Ordinal scale and statistics in medical research. Br Med J (Clin Res Ed) 1986;292:537–8.
4. Altman DG, Bland JM. Statistics notes: the normal distribution. BMJ 1995;310:298.
5. Dawson-Sanders B, Trapp RG. Basic & clinical biostatistics. Norwalk (CT): Appleton & Lange; 1994.
6. Stevens SS. On the theory of scales of measurement. Science 1946;103:677–80.
7. Gaddis GM, Gaddis ML. Introduction to biostatistics: part 5, statistical inference techniques for hypothesis testing with nonparametric data. Ann Emerg Med 1990;19:1054–9.
8. Kuzon WM Jr, Urbanchek MG, McCabe S. The seven deadly sins of statistical analysis. Ann Plast Surg 1996;37:265–72.
9. Davies HT. Informative presentation of summary data. Hosp Med 1998;59:154–5.
10. Ayral X, Ravaud P, Bonvarlet JP, Simonnet J, Lecurieux R, Nguyen M, et al. Arthroscopic evaluation of post-traumatic patellofemoral chondropathy. J Rheumatol 1999;26:1140–7.
11. Manoussakis MN, Dimitriou ID, Kapsogeorgou EK, Xanthou G, Paikos S, Polihronis M, et al. Expression of B7 costimulatory molecules by salivary gland epithelial cells in patients with Sjögren’s syndrome. Arthritis Rheum 1999;42:229–39.
12. Jorgensen C, Apparailly F, Canovas F, Verwaerde C, Auriault C, Jacquet C, et al. Systemic viral interleukin-10 gene delivery prevents cartilage invasion by human rheumatoid synovial tissue engrafted in SCID mice. Arthritis Rheum 1999;42:678–85.
13. Black CM, Silman AJ, Herrick AI, Denton CP, Wilson H, Newman J, et al. Interferon-α does not improve outcome at one year in patients with diffuse cutaneous scleroderma: results of a randomized, double-blind, placebo-controlled trial. Arthritis Rheum 1999;42:299–305.
14. Chen J, Field JA, Glickstein L, Molloy PJ, Huber BT, Steere AC. Association of antibiotic treatment-resistant Lyme arthritis with T cell responses to dominant epitopes of outer surface protein A of Borrelia burgdorferi. Arthritis Rheum 1999;42:1813–22.
15. McCartney-Francis NL, Song XY, Mizel DE, Wahl CL, Wahl SM. Hemoglobin protects from streptococcal cell wall-induced arthritis. Arthritis Rheum 1999;42:1119–27.
16. Lehtinen JT, Kaarela K, Belt EA, Kautiainen HJ, Kauppi MJ, Lehto MU. Incidence of acromioclavicular joint involvement in rheumatoid arthritis: a 15 year endpoint study. J Rheumatol 1999;26:1239–41.
17. Wakitani S, Imoto K, Saito M, Murata N, Hirooka A, Yoneda M, et al. Evaluation of surgeries for rheumatoid shoulder based on the destruction pattern. J Rheumatol 1999;26:41–6.
18. Solomon DH, Bates DW, Horsky J, Burdick E, Schaffer JL, Katz JN. Development and validation of a patient satisfaction scale for musculoskeletal care. Arthritis Care Res 1999;12:96–100.
19. Gaito J. Scale classification and statistics. Psychol Rev 1960;67:277–8.
20. Baker BO, Hardyck CD, Petrinovich LF. Weak measurements vs. strong statistics: an empirical critique of S. S. Stevens’ proscriptions on statistics. Educ Psychol Meas 1966;26:291–309.
21. Kim J. Multivariate analysis of ordinal variables. Am J Sociology 1975;81:261–98.
22. Armstrong GD. Parametric statistics and ordinal data: a pervasive misconception. Nurs Res 1981;30:60–2.
23. Heeren T, D’Agostino R. Robustness of the two independent samples t-test when applied to ordinal scaled data. Stat Med 1987;6:79–90.
24. Gaito J. Non-parametric methods in psychological research. Psychol Reports 1959;5:115–25.
25. Bridge PD, Sawilowsky SS. Increasing physicians’ awareness of the impact of statistics on research outcomes: comparative power of the t-test and Wilcoxon Rank-Sum test in small samples applied research. J Clin Epidemiol 1999;52:229–35.
26. Moore DS, McCabe GP. Introduction to the practice of statistics. New York: W. H. Freeman & Company; 1993. p. 66.
27. Greenwood C, Farewell V. A comparison of regression models for ordinal data in an analysis of transplanted-kidney function. Can J Stat 1988;16:325–35.
28. Brazer SR, Pancotto FS, Long TT 3rd, Harrell FE Jr, Lee KL, Tyor MP, et al. Using ordinal logistic regression to estimate the likelihood of colorectal neoplasia. J Clin Epidemiol 1991;44:1263–70.
29. Scott SC, Goldberg MS, Mayo NE. Statistical assessment of ordinal outcomes in comparative studies. J Clin Epidemiol 1997;50:45–55.
30. Bender R, Grouven U. Ordinal logistic regression in medical research. J R Coll Physicians Lond 1997;31:546–51.
31. Bloch DA, Moses LE, Michel BA. Statistical approaches to classification: methods for developing classification and other criteria rules. Arthritis Rheum 1990;33:1137–44.
32. Raczek AE, Ware JE, Bjorner JB, Gandek B, Haley SM, Aaronson NK, et al. Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: results from the IQOLA Project International Quality of Life Assessment. J Clin Epidemiol 1998;51:1203–14.
33. Johnson VE, Albert JH. Ordinal data modeling: statistics for social science and public policy. New York: Springer; 1999.