Design strategies in multiple sclerosis clinical trials.

Design Strategies in Multiple Sclerosis
Clinical Trials
George W. Ellison, MD,* Lawrence W. Myers, MD,* Barbara D. Leake, PhD,† M. Ray Mickey, PhD,†
Dershin Ke, MD,‡ Karl Syndulko, PhD,*‡ Wallace W. Tourtellotte, MD, PhD,*‡
The Cyclosporine Multiple Sclerosis Study Group
After analyzing our natural history data on the course of multiple sclerosis (MS) in more than 500 patients followed
for 20 years and our experience in several therapeutic trials, we concluded that a phase III (full) trial for efficacy
should have certain properties. For a power of 0.8, α of 0.05, and attrition rate of 10% per year, we think the trial
should have a minimum sample size of 130 (65 in each arm, placebo versus active) if the design is based upon the
proportion of subjects worsening by clinical measures. No stratification by entry Extended Disability Status Scale
score is needed if worsening is defined as a change of 1.0 units (two 0.5 steps) maintained for 90 days for an entry
score of 1 to 5.0 units; or 0.5 units (one 0.5 step) if the entry score is 5.5 to 7 units. We need not stratify by course
(relapsing-remitting versus relapsing-progressive) but are less certain about progression from the onset. No run-in
period is required to define “activity.” Minimum time for treatment is 3 years. We review the justification for our
conclusions; modifications in sample size that are necessary if survival analysis is used; impact of the interferon-β trial
(future trials will have an “active” control); and alternative strategies possible if magnetic resonance imaging serves
as the primary outcome.
Ellison GW, Myers LW, Leake BD, Mickey MR, Ke D, Syndulko K, Tourtellotte WW, The Cyclosporine
Multiple Sclerosis Study Group. Design strategies in multiple sclerosis clinical trials.
Ann Neurol 1994;36:S108-S112
My colleagues and I have made estimates of the number of subjects needed for a phase III clinical trial for
efficacy. Because Kurtzke’s Disability Status Scale
(DSS) and Extended Disability Status Scale (EDSS)
scores are widely used in clinical trials, we have emphasized these scales. We have reached an operational
definition of progression with the scales. We have also
gained insight into the influence of clinical classification
by type of multiple sclerosis (MS), course, and phase;
the need for run-in periods; and the effects of different
scores at entry into the trial upon the eventual outcome.
Our recommendations derive from our study of the
natural history of MS in 569 patients with 6,913 visits
since 1971 [1]. Dr Lawrence Myers and I (G.W.E.)
performed the vast majority of examinations (we enjoyed Dr Pierre Duquette’s help for 2 years). We also
used information from a therapeutic trial on methylprednisolone and azathioprine [2] and from a cyclosporin-A trial [3]. To trace the evolution of our approach,
we will take the standpoint of a biostatistician who is
advising MS researchers about clinical trials.
We think we may soon run out of appropriate candidates for therapeutic trials just as the number of interesting agents is rapidly increasing. In our center alone,
we are considering 86 different molecular sites for intervention as a treatment for MS. Since the regimen
and safety of each agent must be tested in people with
MS, 40 to 50 volunteers could be involved before we
reach an efficacy trial. Recent full trials have recruited
over 300 patients. Clearly, we must design our trials
so that we enroll the minimum number of patients
while still performing an exemplary efficacy trial (minimize our sample sizes).
There are several ways to do this: 1. mathematically,
using continuous variables; 2. using variables that are
precise (they have a small standard deviation); and 3.
using variables tightly grouped around their means
over the duration of the treatment. For example, we
might take a score at entry and subtract or add or
somehow manipulate the score at the end so that we
have a paired measurement or a slope. Also, we think
about unequal group sizes now that we have a treatment that is thought to be effective, and we might try
to find a more common outcome, perhaps something
in the laboratory [4] (magnetic resonance imaging?).
As a statistician, how might I design MS clinical trials
for a minimum number of patients? First, I would
change the approach from using variables (clinical and
laboratory measures) given to me by the neurologists
From the Departments of *Neurology and †Biomathematics, School
of Medicine, University of California, Los Angeles, CA, and the
‡Neurology Service, Wadsworth Veterans Administration Medical
Center, Los Angeles, CA.
Address correspondence to Dr Ellison, 10833 Le Conte Ave, Los
Angeles, CA 90024-1916.
S108 Copyright © 1994 by the American Neurological Association
and then deciding which statistical test seems appropriate to deciding which statistical test(s) would be “optimal” and then working to fulfill all the assumptions
upon which the test is predicated. I recognize that the
statistical tests we might use to determine whether our
results should be attributed to chance alone depend
upon the type (discrete, categorical, nominal, ordinal,
continuous, interval, ratio) and distribution (binomial,
normal, Gaussian; parametric, nonparametric) of the
data [5].
As investigators of MS, all of us generate discrete
data when classifying patients as to type of MS
(clinically definite, probable, possible, laboratory-supported definite). We also classify the course
(relapsing-remitting, relapsing-progressive, progressive from onset) and deal with the phases (relapse, plateau, progression) [6]. We might analyze such data by
calculating, comparing, and contrasting proportions;
for example, the number of patients with relapses per
group, or the number of patients with progression per
group. We might also count the total number, or the
frequency, of events, such as the number of relapses
per group.
To decrease the sample size for a trial, I, as a statistician, prefer continuous interval data. Examples might
be: time to relapse, time to progression, slope (change
in the EDSS score per unit of time), or time to walk
25 or 50 m.
John Kurtzke’s scales give ordinal data. That is, the
scores are discrete data, but each is ordered by lesser
or greater amounts of “disability” (neurological impairment) than the others. A change of score from 1 to 2
on the DSS or EDSS indicates the patient is worse; a
change from 7.0 to 6.5 indicates the patient is better.
I am aware that ordinal data analyzed with nonparametric statistical tests may be as powerful or more powerful
than parametric tests, but I want to aim for minimal
sample sizes by using continuous interval data.
There is disagreement on when an ordinal variable
becomes continuous. Actually, every datum is discrete.
As our measurement techniques improve, values usually can be determined more accurately and precisely.
But there is always uncertainty about the exact value
because of measurement variation. How many data
points along the continuum do we require to make
ordinal data continuous? Approximately 6. Since the
DSS has 9 usable steps and the EDSS has 19, we may
be tempted to use the scores as continuous data. For
statistical purposes, however, in addition to the scores
reflecting an underlying continuum of impairment, we
would like changes between the scores to be equidistant (interval). That is, we want the magnitude of the
change between 1.0 and 2.0 to be the same as it is
between 7.0 and 8.0. With the DSS or EDSS, we do
not know that it is.
Table 1. Discrete Versus Continuous Data in Determining Sample Sizes

Discrete data
  Proportion of patients worsening ≥ 1.0 EDSS steps
  P1 (placebo) = 0.5
  P2 (HOORAY) = 0.25
  Recommended sample size = 58
  Total sample size = 116
Continuous variable
  Each group’s average (standard deviation) in a timed walk:
  Placebo = 20 sec (± 30)
  HOORAY = 10 sec (± 15)
  Recommended sample size = 32
  Total sample size = 64

For the trial of a fictitious new agent named HOORAY, a 50% improvement is expected after 3 years of treatment; α = 0.05, power is set at 0.80.
EDSS = Extended Disability Status Scale.

One might ask what difference the type of data
makes. Assume that we are going to do a trial of a new
agent named “HOORAY” from which we expect 50%
improvement after 3 years’ treatment (Table 1). The
probability that we will falsely attribute a chance result to
the treatment (α) is going to be 0.05. We translate that to
mean we will be 95% certain that an apparent benefit of HOORAY is not
due to chance alone. Also, we want to make sure we do not miss an
effective treatment and prematurely discard HOORAY. We want to say this result is correct 80% of the
time, so we set the power to detect a real effect, if it
is there, at 0.8.
As shown in Table 1, if we use discrete data and
expect that the proportion of patients worsening equal
to or more than 1.0 EDSS steps in the placebo-treated
group will be 0.5 (or 50%) and in the HOORAY-treated group only 0.25 (25%), we will have achieved
the 50% improvement. The recommended sample size
for each group would be 58 patients [4]. If we go
against a placebo, we would need a total of 116 patients. Now let us use a continuous variable to detect
the 50% improvement: each group’s average for the
time it takes to walk 50 yards. Placebo-treated patients
would take 20 seconds. If HOORAY works, the patients go the distance in only 10 seconds, on average.
The recommended number of subjects per treatment
group will be 32, or a total sample size of 64. Although
these values are not adjusted for different effect sizes
[4], placebo effect [7], or attrition, one can readily see
that continuous data make quite a difference.
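
The discrete-data figures in Table 1 can be checked with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided α and no continuity correction; the text does not say which formula was used, and the function name `n_per_group_two_proportions` is ours, not the authors’.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients per arm needed to compare two proportions.

    Normal approximation, two-sided alpha, no continuity correction
    (both are assumptions; the article does not name its formula).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Table 1: 50% of placebo patients worsen versus 25% on HOORAY.
n = n_per_group_two_proportions(0.50, 0.25)
print(n, 2 * n)  # 58 116
```

Under these assumptions the calculation returns the 58 patients per group (116 in total) quoted for the discrete case.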
Table 2. Natural History Outcome of 288 Patients Followed for 2 Years*
(Categories of DSS score change: < -0.5, ± 0.5, > 0.5. Cell values not recoverable; the text gives 35% worsened.)
*These patients were from a study of 569 patients with 6,913 visits since 1971. DSS = Disability Status Scale.

Table 3. Natural History Outcome of 172 Patients Followed for 4 Years*
(Categories of DSS score change: ± 0.5, > 0.5. Cell values not recoverable; the text gives 28% worsened.)
*These patients were from a study of 569 patients with 6,913 visits since 1971. DSS = Disability Status Scale.

Table 4. Course of 91 Patients with Relapsing-Remitting Course Followed for 2 Years
(Categories of DSS score change: > 0.5, < -0.5. Cell values not recoverable; the text gives 25% worse, 72% the same, 3% better.)
DSS = Disability Status Scale.

Table 5. Course of 87 Patients in Progression Phase Followed for 2 Years
(Categories of DSS score change: > 0.5. Cell values not recoverable; the text gives 17% worse.)
DSS = Disability Status Scale.

Table 6. Time to Sustained Worsening

We would like to present data that we think make
a run-in period to confirm progression unnecessary.
For each patient followed at UCLA, we calculate a
slope, the DSS score change in units per year. We
define “worse” as an increase of more than 0.5 units
in 1 year. Over 2 years, we would expect a worsening
patient’s score to increase more than 1 DSS unit. In
Table 2, note that of 288 patients followed for at least
2 years, 35% worsened [1]. In Table 3, of 172 patients
followed for at least 4 years, 28% worsened [1]. There
is not much difference. So we do not necessarily select
for worsening by spending 6 months or a year following candidates to see if they really do deteriorate.
Worsening in the past (activity) is no guarantee of
change once a person enters a trial [2].
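
The slope-based classification described above can be sketched in a few lines. This is only an illustration: the text does not say how the slope was fitted, so a least-squares line is assumed here, the “better” and “same” labels are inferred from the table categories (< -0.5, ± 0.5, > 0.5), and the visit data are hypothetical.

```python
def dss_slope_per_year(visits):
    """Least-squares slope of DSS score against time, in units per year.

    `visits` is a list of (years_since_entry, dss_score) pairs.
    Least squares is an assumption; the article only says "slope".
    """
    n = len(visits)
    mean_t = sum(t for t, _ in visits) / n
    mean_s = sum(s for _, s in visits) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in visits)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in visits)
    return sxy / sxx


def classify(slope, threshold=0.5):
    """'worse' if the score rises by more than `threshold` units per year."""
    if slope > threshold:
        return "worse"
    if slope < -threshold:
        return "better"
    return "same"


# Hypothetical patient: DSS 3 at entry, 5 after 2 years.
visits = [(0.0, 3), (0.5, 3), (1.0, 4), (2.0, 5)]
print(classify(dss_slope_per_year(visits)))  # worse
```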
When we are looking at progression, the type of
clinical course or phase also does not predict subsequent worsening very well [8]. In Table 4, of 91 patients with a relapsing-remitting type of course followed for 2 years, 25% will worsen, 72% will remain
the same, and 3% will get better. If we select patients
because they are in “progression phase,” as in Table 5,
we find 17% worse; not much difference. Type of
course is not predictive of outcome 2 years hence.
Therefore, we think the only entry criterion for a
trial focused upon progression as defined by increases
of the DSS or EDSS should be a diagnosis of multiple
sclerosis.

Table 6. Time to Sustained Worsening
(Columns: mean (SEM) in years; years for 50% of sample; years for 75% of sample. Only the mean (SEM) column survives, in source order: 3.0 (1.5), 4.0 (0.6), 4.4 (0.7), 2.7 (0.6), 3.3 (0.5), 5.1 (0.4), 5.8 (1.0).)
DSS = Disability Status Scale; SEM = standard error of mean.

There is no question that a patient’s DSS score at
entry into the trial can have a drastic effect upon the
outcome [9, 10]. Table 6 shows the mean, median
(50th percentile), and 75th percentile number of years
for patients with DSS scores from 1 to 7 at entry into
the clinic to advance one or more steps and maintain
the advance 90 days (sustained worsening) [11, 12].
For example, 50% of patients (15 of 30) entering the
clinic with a DSS of 1 advanced to 2 or more in 1.7
years. If a patient enters with a DSS of 2 to 5, the
median time to change is close to 2 years. For patients
entering with DSS scores of 6, there was a wait of
nearly 5 years before half worsened. If patients entered
with a 7, it took 3.6 years. These results should be
transferable to EDSS scores, since a 1-unit change in
DSS is equivalent to a 1.0-unit (two 0.5 steps) change
in EDSS.
Consider the impact of these values on a therapeutic
trial. If the trial lasts 2 years and we enter many patients
at EDSS 6.0 or greater, we would not expect much
change in either the control or experimental treatment
group. If we conduct an open trial (without placebo-treated controls) with patients with EDSS scores equal
to or greater than 6.0, and our goal is stabilization of
their course for 2 years, we are quite likely to think
the intervention succeeded. Unless more patients than
expected worsen, suggesting that the intervention is
harmful, we could mistakenly attribute the stabilization
to the treatment.
All we have seen is the natural history
of relatively slow worsening in patients entering the
trial with DSS scores of 6 or greater.

Table 7. Types of Worsening*
Type of change          Changed (%)    Returned to prechange score (%)
Simple†                 63             one-third
Sustained 3 months      50             18
Sustained 6 months      44             11
(Cell values and the first row label are taken from the accompanying text.)
*Data are for patients with entry Disability Status Scale scores of 1 to 6, followed for 3 years.
†From one clinic evaluation to the next, the Disability Status Scale score increased by 1 or more.

Table 8. Kaplan-Meier Estimates for Worsening

Let us pause for a moment and consider how to
define progression. With the increasing use of survival
analysis, we need a circumscribed event (like death)
that indicates treatment failure. For example, we
looked at changes equal to or greater than 1 in the
DSS score over 3 years in patients entering with DSS
scores of 1 to 6. In Table 7 are the percentages of
patients changing (the original sample sizes were more
than 100 patients) [12]. Simple worsening means the
patients worsened 1 or more DSS units. Sixty-three
percent of the patients changed, but they returned to
their baseline prechange score within 3 months one-third of the time. If we demanded they maintain the
change for 3 months, 50% worsened and 18% returned to their baseline score (sustained worsening). If
we required the change be maintained for 6 months,
44% changed and 11% returned to baseline. Seven to
thirteen percent of the time, we misclassify a stationary
patient as worse by an increase in the DSS score [13].
Misclassification because of improvement or worsening
of the DSS score in a patient thought to be clinically
stable occurs 22 to 28% of the time [13]. We chose
3 months (90 days) or more as the time the change in
DSS score would have to be maintained to qualify the
change as sustained worsening for declaring treatment
failure with survival analysis.
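
The 90-day rule above can be expressed as a small routine that scans a patient’s visits for the first worsening that is maintained. This is a sketch under stated assumptions: the first visit is taken as the baseline, a worsening is “maintained” when every score from its onset through a visit at least 90 days later stays at or above baseline plus one step, and the visit data are hypothetical.

```python
def first_sustained_worsening(visits, min_increase=1.0, min_days=90):
    """Return the day on which a sustained worsening began, or None.

    `visits` is a chronologically ordered list of (day, score) pairs; the
    first visit supplies the baseline score (an assumption of this sketch).
    """
    baseline = visits[0][1]
    onset = None
    for day, score in visits[1:]:
        if score >= baseline + min_increase:
            if onset is None:
                onset = day          # candidate worsening starts here
            if day - onset >= min_days:
                return onset         # still worse at least min_days later
        else:
            onset = None             # returned toward baseline: not sustained
    return None


# Hypothetical patient: a transient rise at day 120, a sustained one from day 300.
visits = [(0, 3.0), (120, 4.0), (180, 3.0), (300, 4.0), (420, 4.0)]
print(first_sustained_worsening(visits))  # 300
```

The returned onset day is the event time a survival analysis would use; a patient for whom the function returns None would be treated as censored at the last visit.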
In survival analysis, the Kaplan-Meier technique
takes into account random attrition from a study population and may give a more accurate estimate of the
probability that a patient will worsen. In Table 8, we
present the percentages of patients with worsening of
1 or more DSS steps sustained for more than 90 days
within 1-, 2-, or 3-year follow-up if the entry DSS was
3 to 6 [10]. At 1 year, 24% will worsen; at 2 years,
36%; and at 3 years, 50%. These results are also influenced by the patients’ entry DSS scores (Table 9)
[11, 12].
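
For readers unfamiliar with the Kaplan-Meier technique, the following minimal sketch shows how censored follow-up enters the estimate. The patient data are hypothetical and are not the clinic data behind Tables 8 and 9; the function name is ours.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of remaining event-free.

    times  : follow-up in years for each patient
    events : True if sustained worsening was observed at that time,
             False if the patient was censored (lost or still stable)
    Returns a list of (time, probability_not_yet_worsened) at event times.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        worsened = sum(1 for tt, e in zip(times, events) if tt == t and e)
        leaving = sum(1 for tt in times if tt == t)
        if worsened:
            survival *= 1 - worsened / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Hypothetical cohort of 8 patients.
times = [0.5, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
events = [True, False, True, True, False, True, False, False]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
# [(0.5, 0.875), (1.0, 0.75), (1.5, 0.6), (2.5, 0.4)]
```

The estimated percentage worsening by a given time is one minus the value shown, so this hypothetical cohort has 60% worsening by 3 years even though only 4 of 8 patients were observed to worsen.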
Table 8. Kaplan-Meier Estimates for Worsening of 262 Patients*
Follow-up    Worsening (%)
1 year       24
2 years      36
3 years      50
(Percentages are those stated in the text.)
*Data are for patients with entry Disability Status Scale scores of 3 to 6 who sustained worsening of 1 or more steps for more than 90 days.

Table 9. Kaplan-Meier Estimates for Worsening of Patients with Different Extended Disability Status Scale Scores at Entry
(Columns: EDSS score at entry; patients worsening (%) at 1 year and 2 years. Cell values not recoverable.)
EDSS = Extended Disability Status Scale.

Table 10. Sample Size Estimates Using Proportions Worsening*
(Rows: trial duration of 2 years and 3 years. The “Size of” column heading is truncated and its cell values are not recoverable.)
*Power, 80% chance to detect reduction; α = 0.05. A 50% reduction in rate of worsening for parallel groups.

With the recent emphasis on early treatment of patients in a relapsing-remitting course who have EDSS
scores less than 5.5, many of our patients with EDSS
scores of 6.0 or greater feel left out and are anxious
to join a therapeutic trial. If we demand a 1.0-unit
worsening in the EDSS for “treatment failure,” the latter group of patients would probably be excluded from
any trial lasting less than 5 years. We thought a 0.5-unit
increase in the EDSS score if the entry score was 5.5
to 7.0 might indicate the same worsening as a 1.0 increase if entry score was equal to or less than 5.0. In
patients randomized into the cyclosporin-A placebo-treated group who entered with an EDSS score between 3.0 and 5.0, 24% worsened by 1.0 unit (two
0.5 steps) in 1 year, 35% in 2 years. If we required
the patients who entered with scores of 5.5 to 7 to
increase by only 0.5 unit maintained for 3 months,
30% worsened within 1 year, and 44% within 2 years.
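
The entry-dependent definition of worsening can be written as a simple rule. This sketch follows the thresholds stated in the abstract (1.0 unit for entry scores of 1 to 5.0, 0.5 unit for entry scores of 5.5 to 7); the function names are ours, and the 90-day confirmation is assumed to have been applied before the score is passed in.

```python
def worsening_threshold(entry_edss):
    """EDSS increase that counts as worsening for a given entry score."""
    if 1.0 <= entry_edss <= 5.0:
        return 1.0   # two 0.5 steps
    if 5.5 <= entry_edss <= 7.0:
        return 0.5   # one 0.5 step
    raise ValueError("entry EDSS outside the 1.0 to 7.0 range discussed here")


def is_treatment_failure(entry_edss, confirmed_edss):
    """True if the score, already confirmed as maintained for 90 days,
    has risen by at least the entry-dependent threshold."""
    return confirmed_edss - entry_edss >= worsening_threshold(entry_edss)


print(is_treatment_failure(4.0, 4.5))  # False: needs a 1.0-unit rise
print(is_treatment_failure(6.0, 6.5))  # True: a 0.5-unit rise suffices
```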
Table 11. Characteristics of Active-Control Equivalence Studies
Interferon-β decreases relapse frequency compared to placebo
Treatment “X” will be compared to interferon-β
Treatment “X” will be compared to the placebo indirectly: “historical control assumption”
May limit design options
Solution is to report confidence intervals
Increase power to 0.95
If interferon-β is “standard of care,” must include it in clinical trials of new agents

With the above information, we can design an efficacy trial using estimates based on proportions (Table
10). If we set power at 0.8 and do not make a false-positive error more than 5 times out of 100 (α =
0.05), we could detect a 50% reduction in treatment
failures in a placebo-controlled parallel group trial by
entering 91 patients per group for 2 years and 65 patients
per group for 3 years [15]. Double those numbers for your total sample size if you have 2 groups.
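
As a rough check on the 3-year figure, the sketch below applies a widely used continuity correction to the normal-approximation sample size for two proportions. It is an assumption that this is the calculation behind Table 10; the text cites Fleiss [15] but does not spell out the formula or how attrition was handled. The inputs are the 50% three-year worsening rate quoted earlier and a 50% reduction to 25%.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group_corrected(p1, p2, alpha=0.05, power=0.80):
    """Continuity-corrected patients per arm for comparing two proportions.

    Two-sided alpha and this particular correction are assumptions of the
    sketch, not something the article states.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    # Uncorrected normal-approximation sample size.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction.
    return n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2


n3 = n_per_group_corrected(0.50, 0.25)
print(round(n3, 1))  # 65.4
```

Under these assumptions the result is close to the 65 patients per group quoted for a 3-year trial. The same calculation is not claimed to reproduce the 2-year figure of 91, which depends on inputs the text does not fully specify.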
With the successes of interferon-β for reducing relapse frequency and severity, and of high-dose adrenal
steroids for decreasing relapse severity, and with the
hope that copolymer-1 and oral myelin will be efficacious, combination treatments are frequently mentioned. Our understanding of United States law is that
in a trial of combination therapies, one must show efficacy for each agent alone as well as for the combination. Efficacy trials will increase in size and complexity.
Be sure to check with the Food and Drug Administration on current requirements early in your trial design.
Now that interferon-β (Betaseron, Berlex Laboratories) has been licensed for exacerbating-(relapsing-)
remitting type of MS, we will have to show that our
new treatment “X” is equivalent to or better than interferon-β, which has become the “active control.” Must
we, or dare we, also include a placebo-treated group
in future trials?
Probably not. As expressed in Table 11, we will
make an “historical control assumption” that the real
comparison is with the placebo-treated group in the
original interferon-β trial [16]. This assumption may
limit our trial design options. One way out of this dilemma is to carefully compare confidence intervals of
the results [17].
We could increase the power [17]. We started with
a power of 0.8 or 80%. If we increase the power to
0.9 or 0.95, then we can be more certain that the two
drugs are equivalent. There are consequences: If we
make that leap to 0.95 power, our sample size per arm
increases to 372 patients, or 744 patients per trial! Dr
David Camenga pointed out that if interferon-β is the
“standard of care” for MS, it must be included in all
trials of new agents for exacerbating-remitting MS.
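
To see why raising the power is costly, note that under the usual normal approximation the required sample size is proportional to the square of the sum of the two critical values. The sketch below computes only that multiplier, assuming a two-sided α of 0.05; it does not reproduce the 372 patients per arm quoted above, because the equivalence margin used for that figure is not given in the text.

```python
from statistics import NormalDist


def sample_size_multiplier(power_new, power_old=0.80, alpha=0.05):
    """Factor by which n grows when power is raised, all else held fixed.

    Uses n proportional to (z_alpha + z_beta)^2 with a two-sided alpha,
    which is an assumption of this sketch.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    return ((z_alpha + z(power_new)) / (z_alpha + z(power_old))) ** 2


print(round(sample_size_multiplier(0.90), 2))  # 1.34
print(round(sample_size_multiplier(0.95), 2))  # 1.66
```

Going from a power of 0.8 to 0.95 therefore requires roughly two-thirds more patients for the same effect size and α.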
In conclusion, we wanted to justify our recommended
values for sample sizes that were given in the
abstract and to consider several experimental designs
for efficacy trials in MS. Whichever design we use, we
must minimize the sample size in future trials.
This work was supported by United States Public Health Service
grants NS-16776 and NS08711, the Conrad N. Hilton Foundation,
the Sandoz Pharmaceutical Corporation, and various donors.
1. Ellison GW, Myers LW, Mickey MR, et al. The variable course of multiple sclerosis. Neurology 1989;39:357 (Abstract)
2. Ellison GW, Myers LW, Mickey MR, et al. A placebo-controlled, randomized, double-masked, variable dosage, clinical trial of azathioprine with and without methylprednisolone in multiple sclerosis. Neurology 1989;39:1018-1026
3. The Multiple Sclerosis Study Group. Efficacy and toxicity of cyclosporine in chronic progressive multiple sclerosis: a randomized, double-blind, placebo-controlled clinical trial. Ann Neurol
4. Browner WS, Black D, Newman TB, et al. Estimating sample size and power. In: Hulley SB, Cummings SR, eds. Designing clinical research. Baltimore: Williams & Wilkins, 1988:146-149
5. Afifi AA, Clark V. Computer-aided multivariate analysis. New York: Van Nostrand Reinhold Company, 1984:12-78
6. Ellison GW, Myers LW. Taxonomy and multiple sclerosis. In: Bauer HJ, Poser S, Ritter G, eds. Progress in multiple sclerosis research. Berlin: Springer-Verlag, 1980:629-631
7. Myers LW, Ellison GW, Leake BD, et al. Placebo effect in multiple sclerosis (MS). Can J Neurol Sci 1993;20:5158 (Abstract)
8. Myers LW, Ellison GW, Leake BD. Progression phase of multiple sclerosis not a useful entry criterion for therapeutic trials. Ann Neurol 1993;34:312 (Abstract)
9. Ellison GW, Myers LW, Leake BD. Disability Status Scale influence on rate of worsening of multiple sclerosis patients. Ann Neurol 1992;32:259 (Abstract)
10. Myers LW, Ellison GW, Leake BD. Sample size estimates for therapeutic trials for multiple sclerosis. Ann Neurol 1992;32:258 (Abstract)
11. Myers LW, Leake BD, Ellison GW. Use of survival analysis to describe the course of multiple sclerosis. Ann Neurol 1992;32:259 (Abstract)
12. Ellison GW, Myers LW, Leake BD. Defining progression for multiple sclerosis (MS) therapeutic trials. Can J Neurol Sci 1993;20:5130 (Abstract)
13. Myers LW, Ellison GW, Leake BD. Reliability of the Disability Scale (DSS). Neurology 1993;43:A204 (Abstract)
14. Ellison GW, Myers LW, Leake BD, et al. Revised recommendations for therapeutic trials for multiple sclerosis (MS). Ann Neurol 1993;34:312 (Abstract)
15. Fleiss JL. Statistical methods for rates and proportions, ed 2. New York: Wiley & Sons, 1981:264-268
16. Makuch RW, Pledger G, Hall DB, et al. Active control equivalence studies. In: Peace KE, ed. Statistical issues in drug research and development. New York: Marcel Dekker, 1990:225-262
17. Makuch R, Simon R. Sample size requirements for evaluating a conservative therapy. Cancer Treat Rep 1978;62:1037-1040
Annals of Neurology Supplement to Volume 36, 1994