Statistical presentation and analysis of ordered categorical outcome data in rheumatology journals.

Arthritis & Rheumatism (Arthritis Care & Research)
Vol. 47, No. 3, June 15, 2002, pp 255–259
DOI 10.1002/art.10453
© 2002, American College of Rheumatology
Objective. To assess the appropriateness of presentation of summary measures and analysis of ordered categorical
(ordinal) data in three rheumatology journals in 1999, and to consider differences between basic and clinical science articles.
Methods. Six hundred forty-four full-length articles from the 1999 editions of 3 rheumatology journals were evaluated for
inclusion of an ordinal outcome. Articles were classified as basic or clinical science, and the appropriateness of
presentation and analysis of the ordinal outcome was assessed. Chi-square tests were used to evaluate differences in percentages.
Results. Ordinal outcomes were identified in 175 (27.2%) of 644 articles. Only 69 (39.4%) had appropriate data
presentation, and 111 (63.4%) had appropriate data analysis. Appropriate presentation was seen less commonly in
basic science than in clinical science articles, but differences in the occurrence of appropriate analysis were not significant.
Conclusion. Ordinal data are common in rheumatology articles, but presentation usually does not conform to recommended guidelines.
KEY WORDS. Ordinal; Summary statistics; Hypothesis tests; Estimation.
Ordinal data are generated when observations are placed
into ordered categories. Such data are often generated by
scoring radiographs or histologic slides, or from evaluating
questionnaire responses. Ordinal data contain more information than categorical data without ordering (nominal
data), but do not contain as much information as continuously measured data. This makes presentation of summary measures and hypothesis testing with ordinal data
Supported by NIH Grant AR-20613. Dr. LaValley’s work
was supported by an Arthritis Foundation New Investigator
Michael P. LaValley, PhD: Boston University School of
Public Health, Boston University School of Medicine, Boston, Massachusetts; David T. Felson, MD, MPH: Boston
University School of Public Health, Boston University
School of Medicine, Boston Medical Center, Boston, Massachusetts.
Address correspondence to Michael LaValley, PhD, Boston University Arthritis Center, 715 Albany Street, A203,
Boston, MA 02118. E-mail:
Submitted for publication May 9, 2001; accepted in revised form October 5, 2001.
Previous analyses of medical research articles have suggested that ordinal outcome data are often presented or
analyzed in ways that do not account for either the ordering or the categorical structure of the data (1–3). This can
lead to biased estimates and reduced ability (low power) to
detect important effects. Ordinal variables may be dichotomized as being above or below a fixed cut-off value and
treated as binary (0/1), but this combines different levels
together and can sacrifice information from the original
scale (1). Contingency table methods that are appropriate
for unordered categorical data do not take advantage of
ordering in the data, resulting in loss of information and
difficulty in interpretation (1). Methods for continuous
data, such as the mean, standard deviation, Student’s t-test, and F test, make several assumptions (e.g., consistent
spacing, symmetry, and normality of the data distribution)
that are generally not satisfied by ordinal data. As noted by
Altman and Bland, “Although some statistical methods,
such as the t-test, are not sensitive to moderate departures
from normality, it is generally preferable not to rely on this
To use the order information in ordinal data, but to
avoid unnecessary assumptions, biostatistics textbooks (5)
and journal articles (1,3,4,6–9) have recommended that
nonparametric methods based on ranking the data be used.
These methods include use of percentiles, the median,
range, and interquartile range for presentation of summary
measures (9), and the Wilcoxon (1) and Kruskal-Wallis (3)
tests for using the data to test hypotheses.
Table 1. Appropriate assessment of presentation and analysis of ordinal outcomes
Presentation
  Percentage within each category
  Median and range or interquartile range
  Mean and standard deviation after assessment of normality
Analysis
  Nonparametric test, Spearman correlation, or ordinal logistic regression
  Pearson correlation, t-test, linear regression after assessment of normality
  Logistic regression if dichotomization justified on clinical or scientific grounds
To evaluate whether inappropriate presentation and hypothesis testing with ordinal data are a current problem in the
rheumatology literature, we examined articles published
in 1999 in 3 rheumatology journals. Our objectives
were to assess the percentage of articles that use ordinal
outcomes in rheumatology journals, to estimate the percentage of articles with presentation of summary measures
and analysis with ordinal data that are appropriate, and to
determine if there is a difference in percentage of articles
with appropriate presentation or analysis between basic
and clinical science articles. To simplify data collection
and analysis, we focused on ordinal variables used as
study outcomes, and excluded ordinal variables used
solely as predictors of an outcome.
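The loss of information from dichotomization described above can be illustrated with a small sketch. The scores, the 0 to 4 scale, and the cut-off of 2 are hypothetical values chosen for illustration and are not data from the articles reviewed. Both groups have the same proportion at or above the cut-off, yet their medians differ.

```python
from statistics import median

# Hypothetical ordinal scores on a 0-4 scale (illustrative only, not study data).
group_a = [0, 0, 1, 1, 2, 2, 2, 2, 2]
group_b = [1, 1, 1, 1, 3, 4, 4, 4, 4]

CUTOFF = 2  # assumed cut-off: scores >= 2 are coded as 1, others as 0


def proportion_at_or_above(scores, cutoff):
    """Fraction of scores at or above the cut-off (the dichotomized summary)."""
    return sum(s >= cutoff for s in scores) / len(scores)


# The binary summary cannot tell the two groups apart.
prop_a = proportion_at_or_above(group_a, CUTOFF)
prop_b = proportion_at_or_above(group_b, CUTOFF)

# A rank-based summary of the full scale can.
median_a = median(group_a)
median_b = median(group_b)

print(f"proportion >= {CUTOFF}: A = {prop_a:.2f}, B = {prop_b:.2f}")
print(f"median:          A = {median_a}, B = {median_b}")
```

In this constructed example the dichotomized outcome would show no group difference at all, whereas the medians differ by one category.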
To assess the current use of statistics for ordinal data in
rheumatology research publications, we evaluated three
journals: Arthritis & Rheumatism (A&R), Journal of Rheumatology (JR), and Arthritis Care & Research (AC&R). All
1999 issues of these journals were hand searched for full-length research articles. Editorials, case reports, and letters
were excluded from consideration. A statistician (ML)
with a standardized extraction form evaluated articles for
inclusion of an ordinal outcome (yes/no). A variable was
considered to be an outcome if summary statistics were
presented for the variable, if it was compared between
groups, or if it was predicted by other variables. All other
variables were considered to be predictors. Articles with
an ordinal outcome were then evaluated for appropriateness of presentation of summary measures (yes/no), and
appropriateness of analysis (yes/no). Articles were also
classified in basic or clinical science categories either according
to the journal subheading (A&R) or by a rheumatologist (DF). Clinical science articles were defined as those
reporting research in which the whole patient was studied.
Two types of statistical methods were considered: 1)
presentation of summary measures (called descriptive statistics), and 2) statistical analysis (called inferential statistics). Appropriate methods for the presentation of summary measures were defined to be any of those listed in the
presentation category in Table 1. Listing the mean and
standard deviation was not considered to be adequate if it
was never stated in the article that normality had been
assessed for the outcome. If normality was tested and
present, use of methods for continuous normally distributed data was considered appropriate. If both appropriate
and inappropriate presentations for an outcome were
listed, the method was classified as appropriate. What is
termed analysis in this article consists mainly of hypothesis testing, but also includes measures of association and
confidence intervals. Appropriate analysis of ordinal outcomes was defined to be any of the methods listed in the
analysis category of Table 1.
Percentages are used for summary measures in analyses
of these data. Testing for associations between article type
and journal on the percentages of appropriate presentation
and analysis was done with chi-square tests at the 0.05
level of significance. Statistical analysis was performed
with SAS version 8 (SAS Institute, Cary, NC).
A total of 644 articles were evaluated (282 A&R, 322 JR,
and 40 AC&R), of which 175 (27.2%) were identified as
having an ordinal outcome. Percentage of articles with
ordinal outcomes varied between the journals (A&R
16.0%, JR 31.1%, AC&R 75.0%). Of these 175 articles with
an ordinal outcome, 145 (82.9%) were clinical science
topics and 30 (17.1%) were basic science. Some of the
ordinal outcomes used in articles included in this sample
were Kellgren/Lawrence score (10), staining intensity (11),
histologic score (12), Rodnan skin score (13), severity of
Lyme arthritis (14), erosion score (15), Larsen score (16),
pain measured on a Likert scale (17), and questionnaire
response (18).
Only 69 (39.4%) of 175 articles had appropriate presentation of summary measures for ordinal outcomes (Table 2).
The percentage of articles using an appropriate presentation was higher for clinical science than for basic science
articles (chi-square = 6.9, P = 0.0085), although the rate in
both groups was low. In the 106 articles without appropriate presentation of summary results, means and standard
deviations were used without assessment of whether the
data were normally distributed.
Table 2. Appropriate presentation and analysis in articles with an ordinal outcome
(rows: clinical science, basic science; columns: % appropriate presentation, % appropriate analysis)
Overall, 111 (63.4%) of the 175 articles with an ordinal
outcome used appropriate analysis of the ordinal outcome
(Table 2). There was no significant difference in the percentages of appropriate analysis between basic and clinical science articles (chi-square = 1.21, P = 0.2719), although the percentage in basic science articles was higher.
Of the 64 articles without appropriate analysis, 63 used
procedures appropriate for normally distributed data (usually the t-test) without asserting that normality had been
assessed, and one dichotomized the outcome without justification of the cut-off value.
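As a quick arithmetic check, the percentages reported so far can be recomputed from the article counts given in the text. This is only a sketch; the variable names and the helper `pct` are ours, and only the counts come from the article.

```python
# Counts reported in the text.
total_articles = 644
ordinal_articles = 175
clinical, basic = 145, 30
appropriate_presentation = 69
appropriate_analysis = 111


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


percentages = {
    "articles with an ordinal outcome": pct(ordinal_articles, total_articles),
    "clinical science": pct(clinical, ordinal_articles),
    "basic science": pct(basic, ordinal_articles),
    "appropriate presentation": pct(appropriate_presentation, ordinal_articles),
    "appropriate analysis": pct(appropriate_analysis, ordinal_articles),
}

# Complements quoted in the text: 106 and 64 articles without appropriate methods.
without_presentation = ordinal_articles - appropriate_presentation
without_analysis = ordinal_articles - appropriate_analysis

for label, value in percentages.items():
    print(f"{label}: {value}%")
```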
When the data were analyzed for each journal, there
were no significant differences in the percentage of articles
with appropriate presentation of summary measures for
ordinal outcomes (chi-square = 4.00, P = 0.1350), or for
percentage with appropriate testing of an ordinal outcome
(chi-square = 3.31, P = 0.1912).
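The reported P values can be approximately recovered from the chi-square statistics using closed forms for the upper tail of the chi-square distribution. The degrees of freedom below are our assumption (1 for the basic versus clinical comparisons, 2 for the comparisons across 3 journals); the article does not state them. Small discrepancies are expected because the statistics are reported in rounded form.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square statistic, for df = 1 or 2 only."""
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) for a standard normal Z
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        # chi-square with 2 df is exponential with mean 2
        return math.exp(-x / 2)
    raise ValueError("this sketch covers only df = 1 or df = 2")


# (comparison, reported chi-square, assumed df, reported P)
reported = [
    ("presentation, basic vs clinical", 6.90, 1, 0.0085),
    ("analysis, basic vs clinical", 1.21, 1, 0.2719),
    ("presentation, by journal", 4.00, 2, 0.1350),
    ("analysis, by journal", 3.31, 2, 0.1912),
]

for label, stat, df, p_reported in reported:
    p = chi2_upper_tail(stat, df)
    print(f"{label}: computed P = {p:.4f}, reported P = {p_reported:.4f}")
```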
Ordinal outcome data are common in these 3 rheumatology journals, appearing in approximately 25% of all research articles.
Appropriate presentation of summary results for ordinal
data was uncommon, occurring in only about 40% of
articles, and was less frequent in basic science articles
than in clinical science. A majority of articles used appropriate hypothesis tests with ordinal outcomes, and this
result did not vary significantly between article types.
However, in general there is room for improvement in
presentation and analysis of ordinal data.
Several assessments of the use of ordinal data in medical
research articles have been performed in the past. Moses et
al (1) surveyed articles from the New England Journal of
Medicine for the first six months of 1982 and found 18%
(32 of 168) of these used ordinal data. They found inappropriate analysis due to dichotomizing the outcome in 8
out of 27 analyses (30%) and use of contingency tables that
ignored the ordering in 9 analyses (33%). Avram et al (2)
examined 243 articles from two anesthesia journals in
1981 and 1983 for errors in statistical methods for all
outcome types. They found that the most common error in
presentation of summary measures was description of ordinal data by means and standard deviations, with 60
instances out of the 65 major presentation errors discovered. Of 308 analysis errors discovered, 24 were due to
analyzing ordinal data as if continuous. Forrest and
Andersen analyzed 175 papers with ordinal outcomes
published in 1982 in 12 major medical journals (3). Of 188
presentations of summary measures, only 49 (26.1%) were
appropriate; of 336 hypothesis tests, 116 (35%) were done appropriately.
Our results are not directly comparable to these results
from previous assessments of methods for ordinal data in
the medical literature. Unlike Moses et al (1) we examined
the use of methods for presentation of summary measures.
We have reported the percentage of articles that have used
an inappropriate method rather than the percentage of
errors that are of a certain type (as in Avram et al [2]) or the
percentage of presentations or analyses that were inappropriate (as in Forrest and Andersen [3]). However, we found
much lower rates of dichotomization and use of contingency table analyses that ignore ordering than were reported
by Moses et al. In addition, if we assume that the percentage of
inappropriate presentations and tests found by Forrest and
Andersen is similar to the percentages of articles with
inappropriate presentations and tests in their sample, then
we found higher percentages of articles with appropriate
presentation of summary measures and analysis than they
did. Any such improvement could be due to secular improvements in use of statistical methods in medical research or due to differences in the journals used in sampling.
There is a tradition of defending the use of tests designed for continuous normally distributed data for the
analysis of ordinal data that originates in psychology and
the social sciences (19–21), and is found to a lesser extent
in the medical literature (22,23). The defense has centered
on the empirical observation that the significance level for
some tests designed for continuous data (mainly the t-test
and the F test) is approximately correct when used for
ordinal data (20,23). However, these empirical observations do not necessarily extend outside the particular data
distributions considered in these papers. Also, there is the
issue of test validity: if a test designed for continuous
normally distributed data is statistically significant on ordinal data, and a rank-based test is not, which test result
should be used? Use of a test that may achieve significance
by drawing on assumptions known not to be true (e.g., data
normality) over one that does not use these assumptions
seems questionable. Finally, there is a misconception that
the t-test will be more powerful statistically for ordinal
data than a nonparametric test (22,24). The t-test is more
powerful for normally distributed data, but the Wilcoxon
test has been shown to be more powerful on a variety of
real-world continuous data distributions that are not normally distributed (25). Ordinal data are also not normally distributed.
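For readers who want to see what a rank-based comparison involves, the following is a minimal sketch of the Wilcoxon rank-sum (Mann-Whitney) test using midranks for ties and a tie-corrected normal approximation without continuity correction. The function names and the scores are hypothetical and chosen for illustration; for real analyses a validated statistical package should be used, and exact methods are preferable for small samples.

```python
import math
from collections import Counter


def midranks(values):
    """Map each distinct value to its midrank (average rank among tied values)."""
    counts = Counter(values)
    ranks = {}
    start = 1
    for v in sorted(counts):
        t = counts[v]
        ranks[v] = start + (t - 1) / 2  # average of ranks start .. start + t - 1
        start += t
    return ranks, counts


def rank_sum_test(x, y):
    """Return (U, z, two-sided P) for sample x versus sample y.

    Assumes the pooled data are not all identical (variance would be zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, counts = midranks(list(x) + list(y))
    w = sum(ranks[v] for v in x)            # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2               # Mann-Whitney U for the first sample
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal approximation
    return u, z, p


# Hypothetical ordinal scores (0-4) in two groups; not data from any article.
placebo = [0, 1, 1, 2, 2, 2, 3, 3]
treated = [1, 2, 3, 3, 3, 4, 4, 4]

u, z, p = rank_sum_test(placebo, treated)
print(f"U = {u}, z = {z:.2f}, two-sided P = {p:.3f}")
```

The test uses only the ordering of the scores, so it makes no assumption about the spacing between categories or about normality.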
Unlike the defense of parametric analysis noted above,
there is no tradition of defending the presentation of
means and standard deviations as summary measures
for ordinal variables. Even apologists for analyzing ordinal data with methods designed for continuous data
suggest that these methods of presentation are incorrect
(20,24). The main rationale given in introductory statistics textbooks for providing the mean and standard deviation as summary statistics is that for the normal distribution the central 95% of the data fall within 2
standard deviations of the mean (26). This rationale
does not hold if the data distribution is not symmetric.
In the absence of a symmetric distribution, medians and
percentiles are more informative as descriptive statistics
than means and standard deviations (9). Therefore, the
high levels of inappropriate presentation found in our
study and in the studies by Avram et al (2) and Forrest
and Andersen (3) point to an ongoing concern in the
medical literature.
Table 3. Recommendations for presentation and analysis of ordinal outcomes
Presentation
  ≤4 categories, use percentiles
  ≥5 categories, use median and interquartile range
Analysis
  Group comparisons, use Wilcoxon or Kruskal-Wallis
  Correlation, use Spearman’s rho or Kendall’s tau
  Regression analysis, use ordinal logistic regression with appropriate model validation
Although we have classified as appropriate the use of
testing designed for normally distributed data following
assessment of normality, we do not feel that this approach
should be used for ordinal outcomes because those data
are not normally distributed. Similarly, we would not
recommend the use of means and standard deviations
following assessment of normality, although this was classified as appropriate for presentation. If means and standard deviations are presented to aid interpretation or make
explicit comparisons with previous research, the median
and range or percentages should also be presented. In
addition, we have allowed the use of any type of ordinal
logistic regression to be counted as appropriate, and this
is also generous. For all of these models, the assumptions behind them need to be validated for the data
under consideration before placing trust in the analysis
results (27–30).
Our recommendations for methods of presentation and
analysis of ordinal outcomes are listed in Table 3. These
recommendations are based on the literature cited in the
references and on our experiences in presentation and
analysis of rheumatologic and medical outcomes. This is
not intended to be an exhaustive list, but is intended to
provide guidance as to the most commonly used methods.
Recursive-partitioning (31), Rasch or Item-Response (32),
or Bayesian (33) analyses provide appropriate alternative
approaches for ordinal outcomes.
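The presentation measures discussed above (percentage within each category, median, and interquartile range) are simple to compute. The sketch below uses Python's standard library; the function name, the 5-point pain scores, and the choice of the "inclusive" quartile method are assumptions made for illustration, and other quartile definitions can give slightly different values.

```python
from collections import Counter
from statistics import median, quantiles


def summarize_ordinal(scores):
    """Percentage within each category, median, and interquartile range."""
    n = len(scores)
    counts = Counter(scores)
    percentages = {level: 100 * counts[level] / n for level in sorted(counts)}
    # Quartiles by linear interpolation over the observed data ("inclusive").
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"percentages": percentages, "median": median(scores), "iqr": (q1, q3)}


# Hypothetical pain scores on a 5-point Likert scale; not data from any article.
pain_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

summary = summarize_ordinal(pain_scores)
for level, share in summary["percentages"].items():
    print(f"category {level}: {share:.1f}%")
print(f"median = {summary['median']}, IQR = {summary['iqr']}")
```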
In summary, we found that the majority of current articles in rheumatology journals presenting summary measures for ordinal data do not conform to recommendations
from journal articles and biostatistics textbooks. To a
lesser extent, analysis of ordinal data also does not conform to recommendations. Standard statistical software
implements the recommended methods, so there are few
barriers to the appropriate presentation and analysis of
ordinal data.
1. Moses LE, Emerson JD, Hosseini H. Analyzing data from ordered categories. N Engl J Med 1984;311:442–8.
2. Avram MJ, Shanks CA, Dykes MH, Ronai AK, Stiers WM. Statistical methods in anesthesia articles: an evaluation of two American journals during two six-month periods. Anesth Analg 1985;64:607–11.
3. Forrest M, Andersen B. Ordinal scale and statistics in medical research. Br Med J (Clin Res Ed) 1986;292:537–8.
4. Altman DG, Bland JM. Statistics notes: the normal distribution. BMJ 1995;310:298.
5. Dawson-Sanders B, Trapp RG. Basic & clinical biostatistics. Norwalk (CT): Appleton & Lange; 1994.
6. Stevens SS. On the theory of scales of measurement. Science 1946;103:677–80.
7. Gaddis GM, Gaddis ML. Introduction to biostatistics: part 5, statistical inference techniques for hypothesis testing with nonparametric data. Ann Emerg Med 1990;19:1054–9.
8. Kuzon WM Jr, Urbanchek MG, McCabe S. The seven deadly sins of statistical analysis. Ann Plast Surg 1996;37:265–72.
9. Davies HT. Informative presentation of summary data. Hosp Med 1998;59:154–5.
10. Ayral X, Ravaud P, Bonvarlet JP, Simonnet J, Lecurieux R, Nguyen M, et al. Arthroscopic evaluation of post-traumatic patellofemoral chondropathy. J Rheumatol 1999;26:1140–7.
11. Manoussakis MN, Dimitriou ID, Kapsogeorgou EK, Xanthou G, Paikos S, Polihronis M, et al. Expression of B7 costimulatory molecules by salivary gland epithelial cells in patients with Sjögren’s syndrome. Arthritis Rheum 1999;42:229–39.
12. Jorgensen C, Apparailly F, Canovas F, Verwaerde C, Auriault C, Jacquet C, et al. Systemic viral interleukin-10 gene delivery prevents cartilage invasion by human rheumatoid synovial tissue engrafted in SCID mice. Arthritis Rheum 1999;42:678–
13. Black CM, Silman AJ, Herrick AI, Denton CP, Wilson H, Newman J, et al. Interferon-α does not improve outcome at one year in patients with diffuse cutaneous scleroderma: results of a randomized, double-blind, placebo-controlled trial. Arthritis Rheum 1999;42:299–305.
14. Chen J, Field JA, Glickstein L, Molloy PJ, Huber BT, Steere AC. Association of antibiotic treatment-resistant Lyme arthritis with T cell responses to dominant epitopes of outer surface protein A of Borrelia burgdorferi. Arthritis Rheum 1999;42:
15. McCartney-Francis NL, Song XY, Mizel DE, Wahl CL, Wahl SM. Hemoglobin protects from streptococcal cell wall-induced arthritis. Arthritis Rheum 1999;42:1119–27.
16. Lehtinen JT, Kaarela K, Belt EA, Kautiainen HJ, Kauppi MJ, Lehto MU. Incidence of acromioclavicular joint involvement in rheumatoid arthritis: a 15 year endpoint study. J Rheumatol 1999;26:1239–41.
17. Wakitani S, Imoto K, Saito M, Murata N, Hirooka A, Yoneda M, et al. Evaluation of surgeries for rheumatoid shoulder based on the destruction pattern. J Rheumatol 1999;26:41–6.
18. Solomon DH, Bates DW, Horsky J, Burdick E, Schaffer JL, Katz JN. Development and validation of a patient satisfaction scale for musculoskeletal care. Arthritis Care Res 1999;12:96–100.
19. Gaito J. Scale classification and statistics. Psychol Rev 1960;67:277–8.
20. Baker BO, Hardyck CD, Petrinovich LF. Weak measurements vs. strong statistics: an empirical critique of S. S. Stevens’ proscriptions on statistics. Educ Psychol Meas 1966;26:291–
21. Kim J. Multivariate analysis of ordinal variables. Am J Sociology 1975;81:261–98.
22. Armstrong GD. Parametric statistics and ordinal data: a pervasive misconception. Nurs Res 1981;30:60–2.
23. Heeren T, D’Agostino R. Robustness of the two independent samples t-test when applied to ordinal scaled data. Stat Med 1987;6:79–90.
24. Gaito J. Non-parametric methods in psychological research. Psychol Reports 1959;5:115–25.
25. Bridge PD, Sawilowsky SS. Increasing physicians’ awareness of the impact of statistics on research outcomes: comparative power of the t-test and Wilcoxon Rank-Sum test in small samples applied research. J Clin Epidemiol 1999;52:229–35.
26. Moore DS, McCabe GP. Introduction to the practice of statistics. New York: W. H. Freeman & Company; 1993. p. 66.
27. Greenwood C, Farewell V. A comparison of regression models for ordinal data in an analysis of transplanted-kidney function. Can J Stat 1988;16:325–35.
28. Brazer SR, Pancotto FS, Long TT 3rd, Harrell FE Jr, Lee KL, Tyor MP, et al. Using ordinal logistic regression to estimate the likelihood of colorectal neoplasia. J Clin Epidemiol 1991;
29. Scott SC, Goldberg MS, Mayo NE. Statistical assessment of ordinal outcomes in comparative studies. J Clin Epidemiol
30. Bender R, Grouven U. Ordinal logistic regression in medical research. J R Coll Physicians Lond 1997;31:546–51.
31. Bloch DA, Moses LE, Michel BA. Statistical approaches to classification: methods for developing classification and other criteria rules. Arthritis Rheum 1990;33:1137–44.
32. Raczek AE, Ware JE, Bjorner JB, Gandek B, Haley SM, Aaronson NK, et al. Comparison of Rasch and summated rating scales constructed from SF-36 physical functioning items in seven countries: results from the IQOLA Project International Quality of Life Assessment. J Clin Epidemiol 1998;51:1203–
33. Johnson VE, Albert JH. Ordinal data modeling: statistics for social science and public policy. New York: Springer; 1999.