Vol. 50, No. 3, March 2004, pp 699–706
DOI 10.1002/art.20204
© 2004, American College of Rheumatology
Radiographic Progression Depicted by Probability Plots
Presenting Data With Optimal Use of Individual Values
Robert Landewé and Désirée van der Heijde
Robert Landewé, MD, PhD, Désirée van der Heijde, MD, PhD: University Hospital Maastricht and Research Institute Caphri, University of Maastricht, Maastricht, The Netherlands.
Address correspondence and reprint requests to Robert Landewé, MD, PhD, Department of Internal Medicine/Rheumatology, University Hospital Maastricht, PO Box 5800, 6202 AZ Maastricht, The Netherlands.
Submitted for publication April 18, 2003; accepted in revised form November 20, 2003.
Radiographic damage is inherent to inflammatory rheumatic diseases such as rheumatoid arthritis (RA) and ankylosing spondylitis (AS). Structural damage evolves slowly over a long period of time, but with marked interindividual variation (1,2). Radiographic progression has become increasingly important in evaluating the efficacy of disease-modifying antirheumatic drugs (DMARDs) and, more recently, biologic agents in the treatment of RA. Since biologic agents have been shown to be effective in AS as well, radiographic progression is expected to become an important outcome for evaluating the potential of these drugs to prevent structural damage in AS.
Various scoring systems have been developed for assessment of RA and AS. Examples are the Sharp score and the Larsen score (each with modifications) for evaluating progression in RA (3,4), and the Bath Ankylosing Spondylitis Radiology Index and the Stoke Ankylosing Spondylitis Spine Score (SASSS) (with modifications) for evaluating progression in AS (5–7). Sets of radiographs (hands and feet for RA; pelvis and lumbar and cervical spine for AS) obtained at regular time intervals are scored, and the sum score per patient reflects total damage at a given time point. The within-patient difference between 2 or more observations is considered to be the individual change (progression) score.
A number of difficulties limit the interpretability of radiographic scores in clinical studies. The first relates to the way in which the data are descriptively presented to the medical readership. In both RA and AS, radiographic progression scores are not normally distributed: only a small fraction of patients show substantial progression of damage, and the majority show no progression at all. One of the assumptions underlying the use of means and standard deviations (the descriptive parametric statistics with which most clinicians are familiar) is that the data are normally distributed. Therefore, progression scores should not be presented (only) as means with standard deviations. Means and standard deviations calculated for a set of radiographic scores are extremely sensitive to subtle changes at the upper extreme, as we have demonstrated previously (8). A better way of presenting radiographic data is by medians (the 50th percentile) and 25th and 75th percentiles, or by box-and-whisker plots that additionally present the 5th and 95th percentiles as well as the extreme values. Some investigators present logarithmically transformed data, which may approximate a normal distribution, but such data are even more difficult to interpret.
The most important disadvantage of presenting data as percentiles rather than as means and standard deviations is that each percentile relates to only 1 observation in the distribution (e.g., the median observation) and neglects the rest of the variable's values. Means and standard deviations, in contrast, are computed from all of the variable's values and describe the internal coherence of the data. Since percentiles do not allow a proper judgment of the coherence of the data, they may easily conceal irregularities in the frequency distribution of radiographic scores. This may become important if cutoff levels for clinically important progression scores are chosen: a small change in the selected cutoff level may have a major effect on the results. Presenting data that are not normally distributed in the standard way (with percentiles only) thus gives rise to a significant loss of information compared with presenting normally distributed data by means and standard deviations. Therefore, there is a consensus that, at a minimum, presentation of radiographic data should include both the mean and standard deviation and the median and interquartile range (9).
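The sensitivity argument above can be illustrated with a minimal Python sketch. The change scores are hypothetical (not data from any study), and `summarize` is an illustrative name; percentiles are computed with `statistics.quantiles(..., method="inclusive")`, which is only one of several percentile conventions. Raising only the single highest score moves the mean and standard deviation, while the median and the 25th/75th percentiles do not change.

```python
from statistics import mean, median, quantiles, stdev


def summarize(scores):
    """Mean, SD, median and 25th/75th percentiles of a list of change scores."""
    p25, _, p75 = quantiles(scores, n=4, method="inclusive")
    return {"mean": mean(scores), "sd": stdev(scores),
            "median": median(scores), "p25": p25, "p75": p75}


# Hypothetical, skewed change scores: most patients show little or no
# progression, one patient shows a lot.
baseline = [0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 30]
# Identical, except that the single most extreme score is somewhat higher.
shifted = baseline[:-1] + [40]

for label, scores in (("baseline", baseline), ("shifted", shifted)):
    print(label, {k: round(v, 2) for k, v in summarize(scores).items()})
```

Only the upper extreme differs between the two lists, so any change in the printed summaries comes from that single patient.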
Another problem is measurement error, the phenomenon whereby different observers score the same
radiographs differently, or 1 observer who scores the
same radiographs twice arrives at different scores. Measurement error is inherent to scoring radiographic progression, because typical features of damage, such as
erosions and joint space narrowing in RA, and erosions,
squaring, and sclerosis in AS, are often subtle and prone
to subjective interpretation (interobserver error). Moreover, positioning and quality of consecutive radiographs
are almost never identical. In the ideal situation of a
randomized controlled trial (RCT), the issue of measurement error is of minor importance because the
subject of interest is treatment effect, and when treatment groups are compared, measurement error is
equally divided across these groups as a consequence of
randomization and blinding of readings. In uncontrolled
observational studies or in analyses within 1 treatment
group in a comparative trial, however, measurement
error can become crucial.
In order to gauge part of measurement error,
there is some consensus that radiographs in RA clinical
trials should be scored by at least 2 readers, and that the
average score obtained by all readers should be used in
analyses (9). There is no definite consensus regarding
whether the readers should be aware of the time order of
the radiographs. Scoring of radiographs with known time
order increases sensitivity to change because it encourages the readers to increase scores in individual patients
over time, but it neglects part of measurement error and
may therefore overestimate “true progression” (10).
Scoring with concealed time order better reflects measurement error, but the true signal (progression) may
easily become lost in the noise of error, and intuitively,
false-negative progression scores can occur (10). Interobserver measurement error (which is only part of the total error) can be demonstrated by Bland and Altman plots, but these plots are difficult for an untrained audience to interpret and are therefore often not published in medical journals (11).
Herein we introduce cumulative probability plots
as a means of presenting radiographic progression scores
that addresses the interpretation problems outlined above.
A cumulative probability plot is a visual presentation of all observed data, made by plotting the observed cumulative proportion (scores ranked from the lowest through the highest values, and presented as a cumulative proportion of all scores) against the variable's actual value. Unlike descriptive summary statistics, cumulative probability plots include all individual data and enable visualization of the internal coherence of the data. Probability plots can help the reader of a given report make a better-informed judgment about radiographic progression in the patients studied.
Probability plots
Cumulative probability plots of radiographic change scores. The COBRA (Combinatietherapie Bij Reumatoïde Arthritis) trial was a 1-year randomized clinical trial that compared the effects of a treatment strategy with combination therapy (prednisolone, methotrexate, and sulfasalazine) versus monotherapy (sulfasalazine only) in 135 patients with RA (12). Table 1 summarizes all observed radiographic progression scores from the COBRA trial, defined as the difference between the total damage score (van der Heijde–modified Sharp score) (10) at the end of the trial and that at the start of the trial. The data were summarized by change score in 3 different ways: 1) the number of patients with a particular score; 2) the cumulative number of patients with a score less than or equal to that score (cumulative frequency); and 3) the cumulative percentage of patients with a score less than or equal to that score (cumulative probability). Cumulative probability is the cumulative frequency expressed as a percentage of the total number of patients. Note that every patient contributes an equal part (1/135 = 0.0074, or 0.74%) to the cumulative probability.
Table 1. Frequency distribution of radiographic progression scores in 135 patients who participated in the COBRA trial (columns: progression score; no. of patients with this score; cumulative no. of patients with this score or below; cumulative probability, %)
Figure 1. Individual progression scores of 135 rheumatoid arthritis patients who participated in the COBRA trial. Data are presented by histogram (A), cumulative probability plot (B), and dot plot (C). See text for additional explanation.
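The three summaries behind Table 1 (number of patients per score, cumulative frequency, cumulative probability) can be sketched in Python as follows. The scores are hypothetical (not COBRA data) and `frequency_table` is an illustrative name. With 10 patients, each one contributes 1/10 = 10% to the cumulative probability, just as each COBRA patient contributes 1/135.

```python
from collections import Counter


def frequency_table(scores):
    """Rows of (score, n with this score, cumulative n, cumulative probability in %)."""
    counts = Counter(scores)
    total = len(scores)
    rows = []
    cumulative = 0
    for score in sorted(counts):          # lowest score first
        cumulative += counts[score]       # patients with this score or below
        rows.append((score, counts[score], cumulative, 100.0 * cumulative / total))
    return rows


# Hypothetical change scores for 10 patients.
scores = [0, 0, 0, 1, 1, 2, 4, 4, 7, 15]
for score, n, cum_n, cum_p in frequency_table(scores):
    print(f"{score:>3} {n:>3} {cum_n:>4} {cum_p:6.1f}")
```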
If data such as those shown in Table 1 are plotted in a graph, a bar chart such as that shown in Figure 1A can be created. Bar charts give an impression of the type of distribution of the data, e.g., whether the data are normally distributed (bell-shaped curve). The pattern in Figure 1A is a typical example of a set of radiographic progression scores: the change scores with the highest frequencies lie far to the left of where they would lie in a normal distribution, while a long tail extends toward high scores. Such a distribution is skewed; because its long tail points toward high values, it is called positively skewed (skewed to the right) in standard statistical terminology.
If all separate cumulative probability values (x-axis) are plotted against all separate scores (n = 135) (y-axis), a probability plot is created (Figure 1B). It does not matter which of the 2 variables is plotted on the x-axis and which on the y-axis; both types of probability plots can be found in the literature. Every single change score (1 score per patient) is now plotted on the graph and represents the same proportion of the cumulative probability (0.74% in the case of the COBRA trial's 135 patients), so the density of dots is uniform along the entire range of the x-axis. The curve is a typical example of a radiographic score distribution in which the radiographs were scored with knowledge of the time order. No score falls below 0 (the scores are truncated at 0), and it is obvious from the figure that scores in the lower range (<10 Sharp units) occur far more frequently than those in the higher range. It is easy to see what proportion of patients has a change score of 0. High and very high scores occur sporadically and contribute only minimally to the cumulative frequency, but they largely determine the curvature of the graph, as well as the mean and the
standard deviation. The median and the 25th and 75th percentiles can easily be derived from the probability plot by drawing a straight line from the corresponding percentile on the x-axis through the curve (Figure 1B); the matching progression scores can then be read from the y-axis.
Figure 2. Cumulative probability plots of individual 2-year radiographic progression scores (in modified Stoke Ankylosing Spondylitis Spine Score [mod. SASSS] units) in 109 ankylosing spondylitis patients from the OASIS cohort. Each patient was scored twice by the same reader: once with concealed time order (circles) and once with open time order (triangles). (A circle and a triangle with similar cumulative probability do not necessarily represent the same patient.)
Figure 1C shows a dot plot of the same radiographic data. Dot plots also include all separate scores. Probability plots as well as dot plots allow interpretation of the coherence of the data (irregularities, "jumps"), but percentiles cannot be read directly from a dot plot.
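A probability plot such as Figure 1B, and the graphical reading of percentiles from it, can be sketched in Python under a few assumptions: the scores are hypothetical, the i-th ranked score is placed at cumulative probability i/n (one common convention among several), and `probability_plot_points` and `read_percentile` are illustrative names rather than functions from any library.

```python
def probability_plot_points(scores):
    """(cumulative probability, score) pairs, one per patient, lowest score first.

    The i-th ranked score (1-based) is placed at cumulative probability i/n,
    so every patient adds exactly 1/n and the dots are evenly spaced on x.
    """
    ranked = sorted(scores)
    n = len(ranked)
    return [((i + 1) / n, s) for i, s in enumerate(ranked)]


def read_percentile(points, p):
    """Score at cumulative probability p (0 < p <= 1), read off the curve.

    Mimics the graphical method: go up from p on the x-axis to the curve and
    take the first score whose cumulative probability reaches p. The small
    tolerance guards against floating-point rounding in i/n.
    """
    for cum_p, score in points:
        if cum_p >= p - 1e-12:
            return score
    return points[-1][1]


# Hypothetical change scores for 10 patients (not COBRA data).
scores = [0, 0, 0, 1, 1, 2, 4, 4, 7, 15]
points = probability_plot_points(scores)
print(points)
for p in (0.25, 0.50, 0.75):
    print(f"{p:.0%} -> {read_percentile(points, p)}")
```

The pairs can be passed to any plotting routine (for example `x, y = zip(*points)` followed by matplotlib's `scatter`); swapping x and y gives the other orientation mentioned above. Reading the step curve this way can differ slightly from interpolated percentiles: `statistics.median(scores)` returns 1.5 here, whereas the curve is read as 1.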
Cumulative probability plots and measurement error. The distribution of radiographic scores obtained from studies in which radiographs are read with known time order differs substantially from that obtained with concealed time order. Medians and percentiles do not readily reflect these differences. Assume that radiographs are scored with concealed time order and that there is no "true progression" in the patients: the "true change score" would then be 0, and every deviation from a change score of 0 would by definition be random. Such deviations reflect random measurement error, which can be either negative or positive. In patients with "true progression," measurement error may also be operative, but the signal will exceed the noise in some patients and not in others. Cumulative probability plots reveal at a glance the differences between readings with open and with concealed time order. Figure 2 shows the probability plots from 2 different readings in 109 patients with AS who were part of the OASIS (Outcome Assessments in Ankylosing Spondylitis International Study) cohort (13): 1 reader scored with open time order and the other scored with concealed time order. The scorings reflect the change between the assessment at time 0 and that at 2 years.
Figure 3. Cumulative probability plot (A) and Bland and Altman plot (B) of individual 2-year radiographic progression scores (in modified Stoke Ankylosing Spondylitis Spine Score [mod. SASSS] units) in 109 ankylosing spondylitis patients from the OASIS cohort. Each patient was scored once by each of 2 different readers, both of whom read the radiographs with concealed time order. (Circles and triangles in A represent reader 1 and reader 2, respectively; a circle and a triangle with similar cumulative probability do not necessarily represent the same patient. Each circle in B refers to the same patient scored by 2 readers, but 1 symbol may comprise more than 1 patient.) Arrow indicates an example of how actual progression scores cannot be easily and directly depicted in a Bland and Altman plot: the mean score of +5 and the difference score of −4 were derived from actual scores of +3 by reader 1 and +7 by reader 2.
The most striking feature of the concealed time
order data (Figure 2) is the occurrence of negative
scores, which do not occur in the open time order data.
Because these scorings incorporate a high number of
scores of 0, this feature is not reflected appropriately by
presenting the median values and 25th and 75th percentiles, which are similar with open time order and concealed time order readings. Another typical feature that
is seen repeatedly in plots that compare scorings with
open and concealed time order is that the curve from the
open time order readings lies to the left of and above the
curve from the concealed time order readings and shows
somewhat higher scores. The best explanation for the finding that reading with open time order tracks scores toward higher values is that readers anticipate progression and score accordingly, whereas they are likely to be more conservative if they do not know the true time order, especially in radiographs with only minor changes.
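Why quartiles can hide the difference between open and concealed readings can be sketched with hypothetical scores for the same 10 patients read twice (not OASIS data). Both readings below have a 25th percentile, median, and 75th percentile of 0, yet only the concealed reading contains negative scores. `describe_reading` is an illustrative name, and the quartiles use Python's `statistics.quantiles` with `method="inclusive"`.

```python
from statistics import quantiles


def describe_reading(scores):
    """Quartiles plus the proportions of negative, zero and positive change scores."""
    p25, p50, p75 = quantiles(scores, n=4, method="inclusive")
    n = len(scores)
    return {
        "p25": p25, "median": p50, "p75": p75,
        "negative": sum(s < 0 for s in scores) / n,
        "zero": sum(s == 0 for s in scores) / n,
        "positive": sum(s > 0 for s in scores) / n,
    }


# Hypothetical change scores from two readings of the same films.
open_order = [0, 0, 0, 0, 0, 0, 0, 0, 4, 6]         # no score below 0
concealed_order = [-2, -1, 0, 0, 0, 0, 0, 0, 2, 5]  # random error in both directions

for label, scores in (("open", open_order), ("concealed", concealed_order)):
    print(label, describe_reading(scores))
```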
Because of measurement error, radiographs are
usually read by 2 or more readers, as noted above.
Cumulative probability plots can be used to visually
depict interreader variability and to explore trends.
Figure 3A shows the probability plot of change scores
obtained by 2 independent readers who scored the same
sets of radiographs of AS patients from the OASIS
cohort (2-year progression scores) according to the
modified SASSS. It is obvious at a glance that reader 1
assigned scores that were somewhat higher than those
assigned by reader 2. Reader 1 saw some progression in
a greater proportion of patients than did reader 2 (was
more sensitive to change), but assigned negative scores
in a smaller proportion than did reader 2 (sensitivity to
change was not at the cost of specificity here). As
compared with reader 2, the entire curve of scores from
reader 1 is to the left.
As mentioned above, Bland and Altman plots
can be used to assess agreement between readers.
These plots present the difference in progression scores
between 2 readers (on the y-axis) against the average
of the progression scores assigned by the readers (on
the x-axis). Figure 3B displays the same data as Figure
3A, but in the format of a Bland and Altman plot.
Again, it is obvious that scores assigned by reader 1 were a little higher (represented by a mean negative difference between the readers [dotted line]), but it is difficult to deduce additional information from this plot.
What are the differences between probability
plots and Bland and Altman plots? First, the actual
progression scores can be easily and directly depicted
in the probability plot. Additional inference is needed in
order to obtain this information from a Bland and
Altman plot. An example is the dot designated with an arrow in Figure 3B: the mean score of +5 and the difference score of −4 derive from an actual score of +3 by reader 1 and of +7 by reader 2. Second, in probability
plots, unlike Bland and Altman plots, the scores by 2
readers for a particular value on the x-axis do not
necessarily represent the same patient. Third, probability plots can simultaneously plot the scores by more than
2 readers, which is not possible with Bland and Altman
plots. An advantage of probability plots is that they are
appropriate for investigating the coherence of the data
in the group, with presentation of the actual progression
scores. It should be noted, however, that probability
plots are not appropriate to quantify measurement
error, which can be done by using the data from Bland
and Altman plots. Therefore, the 2 types of plots give
complementary information, and which of them to use,
or whether to use both, depends on the data and the
study question.
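The quantities behind a Bland and Altman plot such as Figure 3B can be sketched in Python. The difference is taken as reader 1 minus reader 2, which reproduces the arrowed example (+3 and +7 give a mean of +5 and a difference of −4). The 95% limits of agreement (mean difference ± 1.96 SD of the differences) are the conventional Bland and Altman summary and are not described in this article; the reader scores and the name `bland_altman` are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reader1, reader2):
    """Per-patient (mean, difference) pairs plus bias and 95% limits of agreement.

    Difference = reader 1 minus reader 2 for the same patient.
    """
    if len(reader1) != len(reader2):
        raise ValueError("both readers must score the same patients")
    pairs = [((a + b) / 2, a - b) for a, b in zip(reader1, reader2)]
    diffs = [d for _, d in pairs]
    bias = mean(diffs)        # systematic difference between the readers
    sd = stdev(diffs)         # spread of the between-reader differences
    return pairs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical change scores for 5 patients; the first patient matches the
# arrowed example in Figure 3B.
reader1 = [3, 0, 1, 2, 0]
reader2 = [7, 0, 0, 2, 1]
pairs, bias, (lower, upper) = bland_altman(reader1, reader2)
print(pairs)
print(f"bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}")
```

The pairs give the x (mean) and y (difference) coordinates of each dot, but the original scores can only be recovered by back-calculation, which is the limitation discussed above.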
Cumulative probability plots and clinical trials.
A third application area for cumulative probability plots
is the RCT. Radiographic progression is a pivotal outcome measure of many RCTs in RA and may become a
key outcome measure in RCTs in AS. Probability plots
can be used to visually compare the distributions of
results in 2 (or more) treatment arms. Figure 4 shows the
probability plots for the 2 treatment arms of the COBRA
trial. COBRA combination therapy was shown to be
significantly better than sulfasalazine monotherapy in
slowing 1-year progression, as well as 5-year progression,
of radiographic damage (14). The plots immediately
Figure 4. Cumulative probability plots of individual 1-year radiographic progression scores in 135 rheumatoid arthritis patients who
participated in the COBRA trial (67 patients in the monotherapy
group [circles] and 68 patients in the combination therapy group
[triangles]). Cumulative probability was calculated per group.
show that the treatment groups differed with respect to
radiographic progression. In the COBRA trial the curve
representing the combination therapy group lies closer
to the x-axis than that representing the monotherapy
group, along the entire range of change scores except for
those close to or equal to 0. The latter represents the
“bottom” effect inherent to distributions that are truncated to 0. It is also obvious that the distribution for the
monotherapy group includes higher absolute change
scores as compared with the distribution for the combination therapy group.
Finally, the cumulative probability curves are not
entirely “smooth,” and the space between the 2 curves,
which is an indication of the treatment contrast, varies
along the axis of cumulative probability. This irregularity is important if one realizes that dichotomous cutoff levels for radiographic progression are often (understandably) used to describe the magnitude of the treatment effect. The probability curves demonstrate that the choice of cutoff level affects the magnitude of the treatment contrast. For example, if a cutoff level of 0 Sharp units is selected (every patient with a score >0 is considered to have progression), there is progression in 80% of the patients in the combination therapy group compared with 87% in the monotherapy group, a between-group contrast of only 7%. A cutoff level of 5 Sharp units, in contrast, would classify 31% of the combination therapy group and 58% of the monotherapy group as having progression, a treatment contrast of 27%. Consequently, an investigator can easily construct an "optimal" cutoff level (i.e., one that provides the highest contrast), as we have shown previously (8), but such a choice can also easily be detected by viewing probability plots.
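How the treatment contrast depends on a dichotomous cutoff, and how easily a data-driven "optimal" cutoff can be picked, can be sketched in Python. The scores are hypothetical (not COBRA data), "progression" is taken to mean a change score strictly above the cutoff (as in the >0 example), exact fractions avoid floating-point ties, and the function names are illustrative.

```python
from fractions import Fraction


def proportion_progressing(scores, cutoff):
    """Exact proportion of patients whose change score exceeds the cutoff."""
    return Fraction(sum(s > cutoff for s in scores), len(scores))


def contrast_by_cutoff(control, treated, cutoffs):
    """Between-group difference in the proportion progressing, per cutoff."""
    return {c: proportion_progressing(control, c) - proportion_progressing(treated, c)
            for c in cutoffs}


# Hypothetical 1-year change scores.
monotherapy = [0, 1, 3, 5, 6, 6, 7, 9, 12, 20]
combination = [0, 0, 0, 1, 1, 2, 3, 4, 8, 10]

contrasts = contrast_by_cutoff(monotherapy, combination, range(0, 11))
for cutoff, diff in contrasts.items():
    print(f"cutoff > {cutoff:2d}: contrast {float(diff):.0%}")

# Data-driven "optimal" cutoff: the one with the largest contrast. This shows
# the selection the text warns about; it is not a recommended procedure.
best = max(contrasts, key=contrasts.get)
print("largest contrast at cutoff", best)
```

In this toy example the contrast ranges from 10% to 50% depending on the cutoff, which is the kind of variation a pair of probability curves makes visible at a glance.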
The typical way of presenting radiographic
change scores, by descriptive statistics such as medians
and percentiles combined with means and standard
deviations, gives rise to a loss of potentially relevant
information. Probability plots can be used to visualize
the phenomenon of measurement error or to explore
differences in treatment outcome in clinical trials, and
may provide much more information about the course of
radiographic progression. Arguably the most important
advantage of probability plots over conventional means
of data presentation is that probability plots, unlike
percentiles or box-and-whisker plots, clarify whether
there is coherence of the data. Such coherence may add
to the credibility of a group result compared with
presenting it only as a median. Technical details, such as concealment of reading order and the occasional occurrence of negative scores, which may decisively influence the interpretation of results, can be easily visualized with probability plots, whereas this information can be easily missed if results are presented as medians and 25th/75th percentiles.
Use of a cutoff level of 0 (or 0.5 if the average of
2 readers is used) is often inadequate for differentiating
patients with and those without progression, an issue
that we recently encountered in a meta-analysis on the
efficacy of DMARDs in slowing radiographic progression (15). We have previously advocated the concept of
the smallest detectable difference (SDD) beyond measurement error as a minimum cutoff level for distinguishing patients with and those without radiographic
progression (9). The SDD level can be easily plotted in
the probability curve, and the benefit of doing so is
obvious. It is easy to see whether the SDD cutoff is a
conservative one with respect to treatment difference,
and the implications of different cutoff levels can be
immediately discerned.
Cumulative probability plots are an aid in exploratory analysis. They certainly do not replace statistical testing and should be used only as an adjunct to formal hypothesis testing. However, they may provide useful information if a between-group difference in a comparative clinical trial does not appear to be statistically significant: they can help in judging whether a Type II error may explain why an apparent trend did not reach statistical significance.
Nor do probability plots replace Bland and Altman plots. The latter are useful in determining important sources of measurement error: interreader variability and systematic error. Probability plots of change scores aggregated from 2 or more readers do not provide insight into interreader variability; rather, they enable visualization of the overall level of measurement error.
In summary, we propose cumulative probability
plots as a new means to depict radiographic progression
scores in reports of observational or methodologic studies and clinical trials. Probability plots may reveal additional and important information that is not provided by
simply presenting medians and percentiles or box-and-whisker plots. We advocate this application of probability plots in reports of studies involving assessment of
radiographic progression, in order to help readers better
understand what has occurred in the study.
References
1. Plant MJ, Jones PW, Saklatvala J, Ollier WE, Dawes PT. Patterns
of radiological progression in early rheumatoid arthritis: results of
an 8 year prospective study. J Rheumatol 1998;25:417–26.
2. Wolfe F, Sharp JT. Radiographic outcome of recent-onset rheumatoid arthritis: a 19-year study of radiographic progression.
Arthritis Rheum 1998;41:1571–82.
3. Sharp JT. Radiographic evaluation of the course of articular
disease. Clin Rheum Dis 1983;9:541–57.
4. Larsen A, Dale K, Eek M. Radiographic evaluation of rheumatoid
arthritis and related conditions by standard reference films. Acta
Radiol 1977;18:481–91.
5. MacKay K, Mack C, Brophy S, Calin A. The Bath Ankylosing
Spondylitis Radiology Index (BASRI): a new, validated approach
to disease assessment. Arthritis Rheum 1998;41:2263–70.
6. MacKay K, Brophy S, Mack C, Calin A. Patterns of radiological
axial involvement in 470 ankylosing spondylitis patients [abstract].
Arthritis Rheum 1997;40 Suppl 9:S61.
7. Averns HL, Oxtoby J, Taylor HG, Jones PW, Dziedzic K, Dawes
PT. Radiological outcome in ankylosing spondylitis: use of the
Stoke Ankylosing Spondylitis Spine Score (SASSS). Br J Rheumatol 1996;35:73–6.
8. Landewé R, Boers M, van der Heijde D. How to interpret
radiological progression in randomized clinical trials? [editorial].
Rheumatology (Oxford) 2003;42:2–5.
9. Van der Heijde D, Simon L, Smolen J, Strand V, Sharp J, Boers
M, et al. How to report radiographic data in randomized clinical
trials in rheumatoid arthritis: guidelines from a roundtable discussion. Arthritis Rheum 2002;47:215–8.
10. Van der Heijde D, Boonen A, Boers M, Kostense P, van der
Linden S. Reading radiographs in chronological order, in pairs or
as single films has important implications for the discriminative
power of rheumatoid arthritis clinical trials. Rheumatology (Oxford) 1999;38:1213–20.
11. Lassere M, Boers M, van der Heijde D, Boonen A, Edmonds J,
Saudan A, et al. Smallest detectable difference in radiological
progression. J Rheumatol 1999;26:731–9.
12. Boers M, Verhoeven AC, Markusse HM, van de Laar MA,
Westhovens R, van Denderen JC, et al. Randomised comparison
of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone in early rheumatoid arthritis. Lancet 1997;350:309–18. Erratum in: Lancet 1998;351:220.
13. Spoorenberg A, de Vlam K, van der Heijde D, de Klerk E,
Dougados M, Mielants H, et al. Radiological scoring methods in
ankylosing spondylitis: reliability and sensitivity to change over
one year. J Rheumatol 1999;26:997–1002.
14. Landewé RBM, Boers M, Verhoeven AC, Westhovens R, van de
Laar MAFJ, Markusse HM, et al. COBRA combination therapy in
patients with early rheumatoid arthritis: long-term structural benefits of a brief intervention. Arthritis Rheum 2002;46:347–56.
15. Jones G, Halbert J, Crotty M, Shanahan EM, Batterham M, Ahern
M. The effect of treatment on radiological progression in rheumatoid arthritis: a systematic review of randomized placebo-controlled trials. Rheumatology (Oxford) 2003;42:6–13.