Assessment of Different Treatment Failure Criteria in a Cohort of Relapsing–Remitting Multiple Sclerosis Patients Treated with Interferon β: Implications for Clinical Trials

Jordi Río, MD, Carlos Nos, MD, Mar Tintoré, MD, Cecilia Borrás, MD, Ingrid Galán, MD, Manuel Comabella, MD, and Xavier Montalban, MD

Clinical trials with interferons in relapsing–remitting multiple sclerosis have shown a modest effect on disability using fixed definitions of treatment failure to measure disease progression. However, in the course of the disease, treatment failure may be influenced by interrater variability and frequent remissions. Thus, the purpose of this study was to assess the clinical usefulness of different treatment failure criteria in a cohort of relapsing–remitting multiple sclerosis patients treated with interferon β. We studied 252 patients with a follow-up of more than 2 years. We used four different criteria of treatment failure with increasing stringency (1 Expanded Disability Status Scale [EDSS] point increase confirmed at 3 months, 1 EDSS point increase confirmed at 6 months, 1.5 EDSS points increase confirmed at 3 months, and 1.5 EDSS points increase confirmed at 6 months). We divided treatment failure into permanent treatment failure and transient treatment failure. At different time points (9, 12, 18, and 24 months), we considered treatment failure permanent when it was confirmed on the last two scheduled visits and transient when it was not confirmed on these visits. We calculated the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the different criteria of treatment failure to identify patients who reached a high degree of disability after 4 years of follow-up. Regardless of the stringency of the treatment failure definition, a proportion of patients with treatment failure had transient treatment failure, and this proportion varied with the criterion applied.
Patients with transient treatment failure had a significantly lower EDSS at entry compared with those with permanent treatment failure or no treatment failure. The number of relapses in patients with transient treatment failure did not differ from that of patients with permanent treatment failure. The criterion of confirmed 1 EDSS point increase at 6 months showed the best sensitivity (76.5%), with satisfactory specificity (89%). Our study shows that a large proportion of patients treated with interferon experience transient treatment failure that may affect outcome interpretation in clinical trials. Using a stricter criterion, such as extending the time to confirmation of EDSS deterioration, and longer follow-up may reduce this proportion of patients with transient treatment failure and improve the validity of the results attained in clinical trials.

Ann Neurol 2002;52:400–406

Clinical trials of various immunomodulatory drugs in relapsing–remitting multiple sclerosis (RRMS) in the past few years have demonstrated a reduction in the relapse rate and modest or nonsignificant effects on disability.1–5 This effect on disability has been used in the poststudy marketing period to support the use of these drugs. However, the methods used to measure disease progression in these trials could be flawed, mainly because of the frequent remissions that occur early in the course of the disease. In a disease that characteristically remits after exacerbations, a method to determine "progression" that relies on a certain degree of deterioration on clinical ratings sustained for 3 or 6 months may lead to a significant proportion of erroneously assessed treatment failures.6 This phenomenon is common at the lower end of the Expanded Disability Status Scale (EDSS), where most patients recruited for these trials tend to cluster.
Similarly, criticism can be made of the Kaplan–Meier survival analysis, used to study time to confirmed progression in disability, because this method assumes that the confirmed progression is irreversible. In the interferon β-1b study, deterioration by 1 EDSS point 3 years after entry was not confirmed one visit later in 11% of patients,1 and 7 of 15 patients in the interferon β-1a (Avonex) trial who met definitions for treatment failure in the first year of study actually improved during follow-up.7 Conversely, natural history studies on cohorts with early multiple sclerosis have shown that up to 24% of relapses last more than 3 months.8,9 Another source of error could come from intrarater variability and fluctuation in patient performance, which could obscure a true reduction in the proportion of treatment failures.10

A treatment failure definition requires setting the criteria a priori to assess deterioration unequivocally. Our goal was not to address treatment efficacy, but to assess the clinical usefulness of different criteria for treatment failure in a cohort of RRMS patients treated with interferon β.

From the Unitat de Neuroimmunologia Clínica, Hospital Universitari Vall d’Hebron, Barcelona, Spain.
Received Feb 27, 2002, and in revised form Apr 17. Accepted for publication May 1, 2002.
Published online Jul 22, 2002, in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/ana.10290
Address correspondence to Dr Río, 2a planta EUI, Unitat de Neuroimmunología Clínica, Hospital Universitario Vall d’Hebron, Psg. Vall d’Hebron 119-120, 08035 Barcelona, Spain. E-mail: firstname.lastname@example.org
© 2002 Wiley-Liss, Inc.

Patients and Methods
In the period 1995 to 2001, 464 patients with MS according to the Poser criteria11 started therapy with interferon β at our center, 384 of whom were RRMS patients. The cohort of patients treated for more than 2 years included 255 patients.
Three patients were lost to follow-up during this period, so we finally analyzed 252 patients. One hundred seven of these 252 patients were followed up for at least 4 years. No patients were lost to follow-up between the second and fourth year. All these patients were included in a follow-up protocol collecting baseline clinical and demographic data, relapses, and EDSS scores. All patients underwent a neurological assessment every 3 months, and additional visits were conducted in case of relapse. All neurologists participating in the study were trained in EDSS scoring. Our group has previously shown an average interrater variability of 0.39 with a complete concordance of 89% within 1 EDSS point.12 We defined a relapse as the occurrence, recurrence, or worsening of symptoms of neurological dysfunction lasting more than 24 hours and then stabilizing or eventually resolving either partially or completely. Patients were instructed to report by phone any symptom suggestive of an attack. Once the information was verified, an unscheduled visit took place within 1 week. We used four different criteria of treatment failure (TF) with increasing stringency (criterion A, 1 EDSS point increase confirmed at 3 months; criterion B, 1 EDSS point increase confirmed at 6 months; criterion C, 1.5 EDSS points increase confirmed at 3 months; and criterion D, 1.5 EDSS points increase confirmed at 6 months). Worsening measured on the EDSS could begin on either a scheduled or an unscheduled visit, but needed to persist for at least two scheduled visits 3 or 6 months apart. We divided the different TF events into permanent treatment failure (PTF) and transient treatment failure (TTF). We classified TF as PTF when failure was confirmed on the last two scheduled visits at the end of the 2-year follow-up period, and as TTF when it was not confirmed on those visits (Fig 1).
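The treatment failure definitions above can be expressed as a short routine. The following is a minimal sketch in Python, not the procedure used in the study: the function names, the dictionary of scheduled visits keyed by month, and the rule that every scheduled visit inside the confirmation window must stay at or above the threshold are illustrative assumptions. Unscheduled visits are ignored for simplicity, and the two example EDSS courses are hypothetical.

```python
# Criteria A-D: (minimum EDSS increase over baseline, months to confirmation).
CRITERIA = {"A": (1.0, 3), "B": (1.0, 6), "C": (1.5, 3), "D": (1.5, 6)}


def first_confirmed_failure(baseline, visits, criterion):
    """Return the month at which a confirmed EDSS increase starts, or None.

    `visits` maps the month of each scheduled visit to the EDSS score
    (hypothetical data layout; visits are assumed to be 3 months apart).
    """
    increase, confirm_months = CRITERIA[criterion]
    threshold = baseline + increase
    for month in sorted(visits):
        confirm_at = month + confirm_months
        if confirm_at not in visits:
            continue  # no scheduled visit available to confirm the increase
        window = [m for m in visits if month <= m <= confirm_at]
        if all(visits[m] >= threshold for m in window):
            return month
    return None


def classify(baseline, visits, criterion, end_month=24):
    """Label a patient as 'no TF', 'TTF', or 'PTF' at the end of follow-up."""
    observed = {m: s for m, s in visits.items() if m <= end_month}
    if first_confirmed_failure(baseline, observed, criterion) is None:
        return "no TF"
    threshold = baseline + CRITERIA[criterion][0]
    # PTF requires the increase to hold on the last two scheduled visits
    # (months 21 and 24 for a 2-year follow-up); otherwise the failure is TTF.
    last_two = (end_month - 3, end_month)
    still_worse = all(observed[m] >= threshold for m in last_two)
    return "PTF" if still_worse else "TTF"


# Hypothetical EDSS courses (baseline EDSS 2.0), not patient data.
patient_a = {0: 2.0, 3: 2.0, 6: 3.0, 9: 3.0, 12: 2.0,
             15: 2.0, 18: 2.0, 21: 2.0, 24: 2.0}
patient_b = {0: 2.0, 3: 2.0, 6: 2.0, 9: 2.0, 12: 3.0,
             15: 3.0, 18: 3.0, 21: 3.0, 24: 3.0}

for name, course in (("A", patient_a), ("B", patient_b)):
    print(name, [classify(2.0, course, c) for c in "ABCD"])
```

Under these assumptions, the first hypothetical patient is labeled TTF with criterion A but shows no TF with criterion B, because the 1-point increase does not last 6 months; the second patient is labeled PTF with both criteria.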
We decided to study patients for a period of 2 years because this is the common length of MS clinical trials. After the 2 years of study, we analyzed the differences between the TF subgroups in their behavior on disability in a subsequent 4-year follow-up period. Additional analyses with the same criteria were conducted at different time points (9, 12, and 18 months) of follow-up.

Fig 1. The graph shows a hypothetical scenario. Both Patients A (diamonds) and B (squares) experienced treatment failure, defined as an increase of 1 point on the Expanded Disability Status Scale that was confirmed 3 months later. However, in Patient A this increase was not confirmed at months 21 or 24 (transient treatment failure), whereas in Patient B it was confirmed at months 21 and 24 (permanent treatment failure).

Statistical Methods
Statistical analysis was performed with a microcomputer version of the Statistical Package for Social Sciences (SPSS, Chicago, IL). Differences among the subgroups of the study (no TF, TTF, or PTF) were assessed by analysis of variance (with Bonferroni post hoc analysis) and the Kruskal–Wallis test. We used the χ2 and Fisher exact tests to assess associations between categorical variables. Using the Spearman correlation coefficient, we calculated correlations between the presence of TF after 2 years of follow-up and different outcome measures of disability: worsening to EDSS 6 after 4 years of follow-up, increase of EDSS over the 4-year period, and EDSS score at month 48 of follow-up. We applied multiple logistic regression to study different prognostic factors of sustained progression after 2 and 4 years of follow-up, including baseline EDSS, duration of MS, age of onset, gender, progression rate (defined as EDSS at entry divided by duration of MS), and number of relapses in the 2 years before the onset of treatment.
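The Spearman coefficient mentioned above is the Pearson correlation computed on ranks, with tied values sharing their average rank; ties are unavoidable here because the presence of TF is a yes/no variable. A minimal sketch in plain Python follows. The function names and the example values are hypothetical and are not data from the study, which used SPSS.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation; assumes neither variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical example: TF at 2 years (1 = yes) versus EDSS at month 48.
tf_at_2_years = [0, 0, 1, 0, 1, 1, 0, 0]
edss_at_4_years = [2.0, 1.5, 6.0, 3.0, 6.5, 4.0, 2.5, 1.0]
print(round(spearman(tf_at_2_years, edss_at_4_years), 2))
```

Working on ranks rather than raw scores suits the EDSS, which is an ordinal scale with unequal steps.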
To define the real value of the different definitions of TF, we calculated the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the different TF criteria for detecting patients who reached an important degree of disability after 4 years of follow-up. We used worsening to EDSS 6 after 4 years as the cutoff because it has been considered an important milestone in disease progression,13 and this EDSS score sustained after 4 years of follow-up identifies patients with clearly established disability. The value of TF with regard to clinical follow-up at 4 years was expressed as sensitivity (true-positive/[true-positive + false-negative]), specificity (true-negative/[true-negative + false-positive]), and accuracy ([true-positive + true-negative]/[true-positive + false-negative + true-negative + false-positive]). Positive predictive value was defined as true-positive/[true-positive + false-positive] and negative predictive value as true-negative/[true-negative + false-negative]. True-positive was defined as TF and EDSS 6 or higher at 4 years, false-positive as TF and EDSS less than 6 at 4 years, false-negative as no TF and EDSS 6 or higher at 4 years, and true-negative as no TF and EDSS less than 6 at 4 years.

The comparison of ratios was based on the range observed at a 95% confidence interval, considering the absence of overlap as an indicator of differences. The level of statistical significance was set at a p value of less than 0.05.

Results
We studied 252 RRMS patients. The mean age was 32.8 years (range, 18–62) with a female to male ratio of 2 to 3. The median EDSS at entry was 2.4 (range, 0–5.5). The patients were clinically active, with a mean number of relapses in the previous 2 years of 2.8. The patients' characteristics were similar to those reported in clinical trials using interferon. Patients were treated with different types of interferon (146 patients with interferon β-1b at a dose of 8 Million Units International [MUI] subcutaneously every 2 days, 64 patients with interferon β-1a at a dose of 6 MUI intramuscularly once a week, and 42 patients with interferon β-1a at a dose of 6 MUI subcutaneously three times per week).

Treatment Failure
With increasingly demanding outcome definitions, fewer patients reached the TF end points; the proportion ranged from 28% for criterion A at 24 months of treatment down to 2% for criterion D at 9 months of treatment. We also observed that the longer the follow-up, the greater the proportion of TF. However, regardless of the stringency of such definitions, and depending on the criterion applied, a variable proportion of TF was transient (Table 1). The proportion of patients with PTF at the different time points of follow-up according to the EDSS at entry is shown in Table 2. Fifty-three percent of patients had baseline EDSS scores below 2.5. In this group, criteria that required confirmation at 3 months (criteria A and C) disclosed a large proportion of patients (approximately 50%) in whom TF was not confirmed (TTF), regardless of the follow-up time point. By contrast, criteria B and D showed only a small proportion of TTF.

Table 1. Proportion of Patients with Treatment Failure at the Different Time Points of Follow-Up

Criterion  Month  TF (n)  TF (%)  TTF (%)
A          9      32      13      21
A          12     42      17      21
A          18     64      25      30
A          24     70      28      26
B          9      14      6       0
B          12     25      10      4
B          18     40      16      13
B          24     59      23      12
C          9      14      6       21
C          12     22      9       23
C          18     36      14      17
C          24     43      17      19
D          9      5       2       0
D          12     9       4       0
D          18     23      9       9
D          24     33      13      9

Criterion A: increase of 1 Expanded Disability Status Scale (EDSS) point confirmed at 3 months. Criterion B: increase of 1 EDSS point confirmed at 6 months. Criterion C: increase of 1.5 EDSS points confirmed at 3 months. Criterion D: increase of 1.5 EDSS points confirmed at 6 months. TF (n): number of patients experiencing TF at the different time points.
TF = treatment failure; TTF = transient treatment failure.

Annals of Neurology Vol 52 No 4 October 2002

Table 2. Proportion of Patients with Permanent Treatment Failure According to the EDSS at Entry

EDSS   n    Criterion A (mo)        Criterion B (mo)        Criterion C (mo)        Criterion D (mo)
score       9     12    18    24    9     12    18    24    9     12    18    24    9     12    18    24
0      7    0.6   0.3   0.5   0.5   1     1     1     0.67  1     0.5   0.67  0.33  -     1     0     0
1      42   0.75  0.5   0.75  0.6   1     0.67  0.71  0.8   0.5   0.5   0.75  0.5   -     1     1     0.75
1.5    41   1     1     0.7   0.4   1     1     0.8   0.5   1     0.67  1     0.75  1     1     1     0.75
2      44   0.5   0.6   0.55  0.8   1     1     0.8   0.9   0.5   0.67  0.67  1     1     1     1     1
2.5    35   1     1     0.8   0.8   1     1     1     1     1     1     1     1     1     1     1     1
3      24   1     0.75  0.8   0.83  1     1     1     1     -     -     1     1     -     -     1     1
3.5    28   1     1     0.62  0.89  0     1     0.5   1     -     1     0.67  0.83  -     -     0.5   1
4      16   1     1     1     0.75  1     1     1     1     1     1     1     0.75  1     1     1     1
4.5    3    -     -     0     1     -     -     -     1     -     -     -     1     -     -     -     -
5      2    -     -     -     -     -     -     -     -     -     -     -     -     -     -     -     -
5.5    9    0.75  0.75  0.75  1     1     1     1     1     1     1     1     1     1     1     1     1
Total       0.84  0.77  0.65  0.76  0.89  0.96  0.88  0.89  0.86  0.79  0.86  0.82  1     1     0.83  0.83

For instance, for criterion A, in 30% of patients with TF, it was PTF at 12 months.
Criterion A: increase of 1 EDSS point confirmed at 3 months. Criterion B: increase of 1 EDSS point confirmed at 6 months. Criterion C: increase of 1.5 EDSS points confirmed at 3 months. Criterion D: increase of 1.5 EDSS points confirmed at 6 months. n: number of patients eligible for each EDSS level.
EDSS = Expanded Disability Status Scale.

Differences between Treatment Failure Subgroups
Table 3 shows the baseline data of the different TF subgroups with a follow-up period of 2 years for each criterion used. The characteristics of the different TF subgroups are comparable regardless of the TF criteria. In all definitions of TF, we observed that patients with TTF had a significantly lower EDSS at entry compared with those with PTF or no TF during the first 2 years of treatment. EDSS at entry was the only factor that predicted TTF by logistic regression analysis after 2 years of treatment. The pooled annual number of relapses for the 2 study years in the TF subgroups with the different criteria of TF is shown in Table 4. Patients with PTF and TTF had a significantly higher number of relapses compared with those with no TF for all criteria of TF used. However, it is remarkable that no differences were found between patients with PTF and those with TTF.

Table 3. Baseline Characteristics of the Different Treatment Failure Subgroups at 2 Years (mean ± SD)

Criterion  Subgroup  Age (yr)     Gender (ratio F/M)  Duration (yr)  EDSS at entry  Relapses†
A          No TF     32.5 ± 9.1   2.1                 6.1 ± 5        2.3 ± 1.2      2.7 ± 1
A          TTF       30.9 ± 8.4   1.6                 5.8 ± 5.7      1.7 ± 1.1*     3 ± 1.2
A          PTF       34.2 ± 8.4   2.1                 6.4 ± 4.7      2.6 ± 1.3      3 ± 2.1
B          No TF     32.4 ± 9.1   1.9                 6 ± 5.1        2.3 ± 1.2      2.7 ± 1
B          TTF       32.1 ± 9.4   2.5                 8.1 ± 7.2      1.2 ± 0.6*     2.9 ± 1.2
B          PTF       34.1 ± 8.4   2.1                 6.4 ± 5.2      2.6 ± 1.3      3.0 ± 2.1
C          No TF     32.7 ± 9     2.3                 6.1 ± 5        2.3 ± 1.2      2.7 ± 1.1
C          TTF       36 ± 10.4    4                   7.6 ± 7.1      1.5 ± 1.5*     2.4 ± 0.7
C          PTF       32.2 ± 7.3   1                   6.2 ± 5.6      2.7 ± 1.2      3.1 ± 2.4
D          No TF     32.8 ± 9     2.2                 5.9 ± 5        2.3 ± 1.2      2.7 ± 1.1
D          TTF       34.3 ± 8.6   3                   9 ± 11.4       0.8 ± 0.8*     2.3 ± 0.6
D          PTF       32.2 ± 7.7   1.1                 7 ± 5.6        2.7 ± 1.2      3.3 ± 2.5

Criterion A: increase of 1 EDSS point confirmed at 3 months. Criterion B: increase of 1 EDSS point confirmed at 6 months. Criterion C: increase of 1.5 EDSS points confirmed at 3 months. Criterion D: increase of 1.5 EDSS points confirmed at 6 months.
* p < 0.05. Differences were observed between TTF and PTF/no TF for all the different criteria of TF used. No differences were observed between patients with PTF and those with no TF.
† Number of relapses in the 2 years before the treatment onset.
TF = treatment failure; TTF = transient treatment failure; PTF = permanent treatment failure; EDSS = Expanded Disability Status Scale.

Extent of Disability
One hundred seven of the 252 patients included in the study were followed up for at least 4 years. The baseline characteristics of these 107 patients did not differ from those of the whole population. The evolution of disability, expressed as the median of the EDSS scores, is shown in Figure 2. The graph shows that after 48 months of follow-up the median of the EDSS scores in the PTF group was 6 for all the different TF criteria, whereas in the TTF and no TF groups the medians were significantly lower. Regardless of the TF criterion we used, the behavior of the different TF subgroups was highly comparable; patients with PTF reached significant disability after 4 years of follow-up, and patients with TTF were stable within the same period.
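The percentages in Table 1 can be checked against the patient counts. The sketch below is an illustration rather than the authors' computation: it assumes that TF (%) is taken over the 252 analyzed patients and TTF (%) over the patients with TF, and it combines the 24-month TF counts from Table 1 with the TTF group sizes listed in Table 4. The names are hypothetical.

```python
COHORT_SIZE = 252  # patients analyzed at 2 years

# criterion -> (patients with TF at 24 months [Table 1],
#               patients with TTF at 24 months [group sizes in Table 4])
COUNTS_AT_24_MONTHS = {"A": (70, 18), "B": (59, 7), "C": (43, 8), "D": (33, 3)}


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def table1_percentages(criterion):
    """Return (TF % of the cohort, TTF % of the patients with TF)."""
    n_tf, n_ttf = COUNTS_AT_24_MONTHS[criterion]
    return percent(n_tf, COHORT_SIZE), percent(n_ttf, n_tf)


for criterion in "ABCD":
    print(criterion, table1_percentages(criterion))
```

Under these assumptions the four pairs are (28, 26), (23, 12), (17, 19), and (13, 9), which agree with the 24-month entries of Table 1.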
Validity of the Different Treatment Failure Criteria
The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the different TF criteria at 2 years of follow-up with regard to outcome at 4 years are shown in Table 5. The most stringent criterion (criterion D) showed the best specificity (97%; 87 of 90 patients) and positive predictive value (73%; 8 of 11 patients), although with the worst sensitivity (47%; 8 of 17 patients). However, a confirmed increase of 1 EDSS point at 6 months (criterion B) showed the best sensitivity (76.5%; 11 of 15 patients) with satisfactory specificity (89%; 80 of 90 patients). The analysis performed at the other follow-up time points (9, 12, and 18 months) showed good specificity but very weak sensitivity (data not shown).

The correlation coefficients between the different TF criteria at 2 years and the outcome measures of disability at 4 years are shown in Table 6. The different disability measures at 4 years correlate moderately with the occurrence of TF after 2 years. Criterion B at 2 years showed the greatest correlations.

Table 4. Number of Relapses during the First 2 Years of Treatment

Criterion  Subgroup  n    Relapses
A          No TF     181  0.6 ± 0.9*
A          PTF       48   1.7 ± 1.7
A          TTF       18   1.4 ± 1.4
B          No TF     192  0.6 ± 1*
B          PTF       48   1.7 ± 1.7
B          TTF       7    1.3 ± 1.8
C          No TF     207  0.7 ± 1*
C          PTF       32   2 ± 2
C          TTF       8    1.6 ± 1.7
D          No TF     217  0.7 ± 1*
D          PTF       27   2.2 ± 2.1
D          TTF       3    2.7 ± 2.5

Criterion A: increase of 1 EDSS point confirmed at 3 months. Criterion B: increase of 1 EDSS point confirmed at 6 months. Criterion C: increase of 1.5 EDSS points confirmed at 3 months. Criterion D: increase of 1.5 EDSS points confirmed at 6 months.
* p < 0.05 (no TF vs TTF and PTF).
TF = treatment failure; TTF = transient treatment failure; PTF = permanent treatment failure.
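As a worked check of the validity measures, the counts reported for criterion D (8 of 17 patients for sensitivity, 87 of 90 for specificity, and 8 of 11 for positive predictive value) imply 8 true-positives, 3 false-positives, 9 false-negatives, and 87 true-negatives among the 107 patients followed for 4 years. The following Python sketch, with a hypothetical function name, applies the definitions from the Statistical Methods to those counts.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Validity measures from the four cells of a 2 x 2 table.

    tp: TF and EDSS 6 or higher at 4 years
    fp: TF and EDSS less than 6 at 4 years
    fn: no TF and EDSS 6 or higher at 4 years
    tn: no TF and EDSS less than 6 at 4 years
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Counts derived from the figures reported for criterion D.
criterion_d = diagnostic_metrics(tp=8, fp=3, fn=9, tn=87)
for name, value in criterion_d.items():
    print(f"{name}: {100 * value:.0f}%")
```

Rounded to whole percentages, these counts give 47, 97, 73, 91, and 89, the values listed for criterion D in Table 5.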
Association of Treatment Failure with Predictive Factors
We examined the association between TF and different potential prognostic factors. These factors were chosen because previous studies had found them to be associated with the long- and short-term course of MS.10,14,15 Multiple logistic regression did not show any significant association between the above-mentioned factors and sustained progression after 2 years of treatment with interferon. However, after 4 years of follow-up, EDSS at entry was the only predictive factor of sustained progression to EDSS 6.

Fig 2. The graph shows the evolution of the Expanded Disability Status Scale (EDSS) medians during the 48-month follow-up in the different subgroups with the different treatment failure criteria used. (A) 1 EDSS point increase confirmed at 3 months; (B) 1 EDSS point increase confirmed at 6 months; (C) 1.5 EDSS points increase confirmed at 3 months; (D) 1.5 EDSS points increase confirmed at 6 months. (solid lines) permanent treatment failure; (long dashed lines) transient treatment failure; (short dashed lines) no treatment failure.

Table 5. Validity of Different Treatment Failure Criteria for Treatment Failure at 2 Years

Criterion  Sensitivity (%) (CI)  Specificity (%) (CI)  NPV (%) (CI)    PPV (%) (CI)    Accuracy (%) (CI)
A          77 (0.68–0.84)        84 (0.77–0.91)        95 (0.91–0.99)  48 (0.39–0.58)  83 (0.76–0.9)
B          77 (0.68–0.84)        89 (0.83–0.95)        95 (0.91–0.99)  57 (0.47–0.66)  87 (0.8–0.93)
C          47 (0.38–0.57)        93 (0.89–0.98)        90 (0.85–0.96)  57 (0.48–0.66)  86 (0.79–0.93)
D          47 (0.38–0.57)        97 (0.93–1)           91 (0.85–0.96)  73 (0.64–0.81)  89 (0.83–0.95)

Criterion A: increase of 1 EDSS point confirmed at 3 months. Criterion B: increase of 1 EDSS point confirmed at 6 months. Criterion C: increase of 1.5 EDSS points confirmed at 3 months. Criterion D: increase of 1.5 EDSS points confirmed at 6 months.
CI = confidence interval; NPV = negative predictive value; PPV = positive predictive value.

Table 6. Correlation between Treatment Failure Criteria at 2 Years and Disability Measures at 4 Years

Criterion  EDSS 6 at 4 yr  Δ EDSS >4 yr  EDSS at 4 yr
A          0.51            0.47          0.35
B          0.58            0.53          0.44
C          0.44            0.43          0.34
D          0.48            0.46          0.42

Criterion A: increase of 1 EDSS point confirmed at 3 months. Criterion B: increase of 1 EDSS point confirmed at 6 months. Criterion C: increase of 1.5 EDSS points confirmed at 3 months. Criterion D: increase of 1.5 EDSS points confirmed at 6 months.
EDSS = Expanded Disability Status Scale.

Discussion
We have shown that a large proportion of the patients in our study cohort who were considered to have experienced TF, defined by a confirmed increase of 1 or 1.5 points in EDSS at either 3 or 6 months, had transient TF. Because phase III trials in MS are costly and long-lasting, an end point must be (1) clinically relevant, (2) stable, (3) not prone to fluctuations, and (4) sensitive or responsive.10 Our study shows that patients with TTF, currently considered definite TF in trials, do not present any clinically relevant changes in disability compared with patients without TF. We also demonstrate that, depending on the definition chosen, TF can be unstable, prone to fluctuations, and lacking in sensitivity. Conversely, 2- to 3-year treatment trials in RRMS are probably not long enough to demonstrate meaningful effects on irreversible disability. In this respect, the Mayo Clinic Sulfasalazine Study demonstrated that short-term clinical measures of efficacy may not accurately predict an important, sustained impact on clinical disability.16

Several authors have shown that, depending on the definition of TF considered, the proportion of patients experiencing TF ranges from 9% to 51%.10,17 In our study, the proportion of interferon-treated patients with TF at 2 years ranged from 13% to 28%. Depending on the TF criteria used, failure was transient in up to one third of such patients.
Similarly, another study found that 40 of 84 patients who originally exhibited progressive trends subsequently improved (47% of patients with TF had a TTF).17 Data from our study show that patients with lower EDSS scores are more susceptible to TTF. Whereas the median EDSS in patients with TTF ranged from 1 to 1.5 depending on the criterion used, the median in patients with PTF was 3 (p < 0.05). This observation is probably the result of interrater variability in EDSS scoring.12,18–20 We also analyzed the proportion of patients experiencing TTF stratified according to baseline EDSS at different follow-up time points. Fifty percent of patients in our cohort had entry EDSS levels below 2.5. A considerable proportion of this group experienced TTF at the different time points, mainly with criteria requiring confirmation at 3 months. Although there were no important differences in the proportion of patients with TTF across the time points studied, the sensitivity to detect clinically relevant changes was very poor for time points shorter than 24 months. For the group of patients with lower entry EDSS, the criteria that best discriminated between TF and TTF were those requiring confirmation at 6 months. However, considering the low sensitivity of criterion D (47% vs 77%), the best criterion for reducing the noise in this group of patients was criterion B. Moreover, criterion B had the best correlations with outcome disability measures at 4 years. Consequently, given that many patients included in trials have EDSS levels below 2.5, and that EDSS at entry is the only predictive factor for short-term disability, our observation might provide a rationale for predictive enrolment in future trials in RRMS.

Different TF criteria are currently used in clinical trials to determine whether the use of interferon or other immunomodulatory drugs has an impact on disability.
However, as we have shown, the use of different criteria may lead to different results. Using the criterion with longer confirmation of the EDSS increase and longer study follow-up, we could significantly reduce the proportion of patients with TTF. Another point of interest that most clinical trials fail to address is the magnitude of change in disability. We have observed that groups with TTF behave similarly to patients with no TF. In contrast, the evolution of patients with PTF after the first 2 years of treatment is compatible with a real TF because the EDSS median in these patients after 4 years of treatment reaches a value of up to 6. Thus, by considering PTF as an end point in the design of clinical trials in RRMS, we obtain a more realistic picture of the behavior of these patients during the trial and a more realistic assessment of the actual effect of the drug. However, other aspects, such as the proportion of dropouts or patients lost to follow-up during the course of a clinical trial, could negatively influence measures such as PTF.

If the efficacy definition does not identify unremitting disability, any drug reducing attacks might appear to improve a measure of disability. In our study, it is clear that the number of relapses in patients with PTF was significantly higher than in patients with no TF. Furthermore, despite the limited information on the duration of exacerbations from natural history series, it has been documented in patients with early MS that 22% of initial episodes last between 3 and 12 months and 10% last between 6 and 12 months.21 In this regard, it is also remarkable that patients with TTF did not differ from those with PTF in the number of relapses. This observation leads us to consider that a proportion of patients with TF (or confirmed progression) will include an unidentified number of individuals who subsequently recover from a lengthy relapse.
In conclusion, in our cohort of patients treated with interferon, we have found that an important proportion of patients experience TTF. In addition, the use of confirmed progression end points can be misleading, and, consequently, measures of short-term efficacy need to be interpreted with caution. Using a stricter criterion in clinical trials (longer time to confirmation of EDSS deterioration) and increasing the time of follow-up will likely decrease this proportion of patients with TTF and improve the validity of results. Further analysis is needed to elucidate which efficacy measure is most reliable in short-term trials with new immunomodulatory drugs for treatment of early RRMS.

We thank J. A. Graells for language editing.

References
1. The IFNB Multiple Sclerosis Study Group. Interferon beta-1b is effective in relapsing–remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. Neurology 1993;43:655–661.
2. Jacobs LD, Cookfair DL, Rudick RA, Herndon RM, et al. Intramuscular interferon β-1a for disease progression in relapsing multiple sclerosis. Ann Neurol 1996;39:285–294.
3. PRISMS Study Group. Randomised double-blind placebo-controlled study of interferon β-1a in relapsing–remitting multiple sclerosis. Lancet 1998;352:1498–1504.
4. Johnson KP, Brooks BR, Cohen JA, Ford CC, et al. Copolymer I reduces relapse rate and improves disability in relapsing–remitting multiple sclerosis: results of a phase III multicenter, double-blind, placebo-controlled trial. Neurology 1995;45:1268–1276.
5. Fazekas F, Deisenhammer F, Strasser-Fuchs S, et al. Randomised placebo-controlled trial of monthly intravenous immunoglobulin therapy in relapsing–remitting multiple sclerosis. Austrian Immunoglobulin in Multiple Sclerosis Study Group. Lancet 1997;349:589–593.
6. Liu C, Wan Po AL, Blumhardt LD. “Summary measure” statistic for assessing the outcome of treatment trials in relapsing–remitting multiple sclerosis. J Neurol Neurosurg Psychiatry 1998;64:726–729.
7. Goodkin DE. Interferon beta therapy for multiple sclerosis. Lancet 1998;352:1486–1487.
8. McAlpine D, Compston N. Some aspects of the natural history of disseminated sclerosis. Q J Med 1952;82:135–167.
9. Kurtzke JF, Beebe GW, Nagler B, et al. Studies of the natural history of multiple sclerosis. 7. Correlates of clinical change in an early bout. Acta Neurol Scand 1973;49:379–395.
10. Weinshenker BG, Issa M, Baskerville J. Meta-analysis of the placebo-treated groups in clinical trials of progressive MS. Neurology 1996;46:1613–1619.
11. Poser CM, Paty DW, Scheinberg L, et al. New diagnostic criteria for multiple sclerosis: guidelines for research protocols. Ann Neurol 1983;13:227–231.
12. Montalban X, Tintoré M, Río J, et al. Interobserver variability in the evaluation of functional systems and Kurtzke expanded disability status scale in multiple sclerosis patients. Rev Neurol 1996;24:630–632.
13. Weinshenker BG, Bass B, Rice GPA, et al. The natural history of multiple sclerosis: a geographically based study. 1. Clinical course and disability. Brain 1989;112:133–146.
14. Weinshenker BG, Rice GPA, Noseworthy JH, et al. The natural history of multiple sclerosis: a geographically based study. 4. Applications to planning and interpretation of clinical therapeutic trials. Brain 1991;114:1057–1067.
15. Weinshenker BG, Rice GPA, Noseworthy JH, et al. The natural history of multiple sclerosis: a geographically based study. 3. Multivariate analysis of predictive factors and models of outcome. Brain 1991;114:1045–1056.
16. Noseworthy JH, O’Brien P, Erickson BJ, et al. The Mayo Clinic Canadian-Cooperative Trial of sulfasalazine in active multiple sclerosis. Neurology 1998;51:1342–1352.
17. Liu C, Blumhardt LD. Disability outcome measures in therapeutic trials of relapsing–remitting multiple sclerosis: effects of heterogeneity of disease course in placebo cohorts. J Neurol Neurosurg Psychiatry 2000;68:450–457.
18. Goodkin DE, Cookfair D, Wende K, et al. Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke Expanded Disability Status Scale (EDSS). Neurology 1992;42:859–863.
19. Noseworthy JH, Vandervoort MK, Wong CJ, Ebers GC. Interrater variability with the Expanded Disability Status Scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group. Neurology 1990;40:971–975.
20. Verdier-Taillefer MH, Zuber M, Lyon-Caen, et al. Observer disagreement in rating neurologic impairment in multiple sclerosis: facts and consequences. Eur Neurol 1991;31:117–119.
21. Weinshenker BG, Issa M, Baskerville J. Long-term and short-term outcome of multiple sclerosis. A 3-year follow-up study. Arch Neurol 1996;53:353–358.