Validation of the Spondyloarthritis Research Consortium of Canada Magnetic Resonance Imaging Spinal Inflammation Index: Is It Necessary to Score the Entire Spine?
Arthritis & Rheumatism (Arthritis Care & Research)
Vol. 57, No. 3, April 15, 2007, pp 501–507
DOI 10.1002/art.22627
© 2007, American College of Rheumatology
ORIGINAL ARTICLE
WALTER P. MAKSYMOWYCH,¹ SUHKVINDER S. DHILLON,¹ ROY PARK,¹ DAVID SALONEN,² ROBERT D. INMAN,² AND ROBERT G. W. LAMBERT¹
Objective. The Spondyloarthritis Research Consortium of Canada (SPARCC) magnetic resonance imaging (MRI) spinal inflammation index has been developed to objectively measure inflammation in ankylosing spondylitis (AS) and to assess change in response to therapeutic intervention. Scoring of the entire spine limits feasibility, and a scoring method that records inflammation in only the more severely affected spinal segments may improve feasibility without sacrificing performance.
Methods. MRI films of 68 patients with AS were assessed in random order by 2 blinded readers. Interreader reliability was assessed by the intraclass correlation coefficient. Pre- and posttreatment MRI films of 29 patients randomized to placebo or anti–tumor necrosis factor α (anti-TNFα) therapy were read by readers blinded to chronology, and responsiveness was assessed by the effect size and the standardized response mean. The performance of scores based on 6, 8, 10, and all 23 spinal discovertebral units (DVU) was compared.
Results. The median number of affected spinal levels per patient was 6.0, and 62% of all affected levels were included when analysis was limited to only the 6 most severely affected levels per patient. Comparison of DVU scores limited to only the more severely affected DVU (6-, 8-, and 10-DVU scores) with scores for all 23 spinal DVU showed excellent interreader reliability for status and change scores (Spearman's correlation >0.90), as well as similar construct validity.
Responsiveness to anti-TNFα therapy was greater when the more limited scoring methods were used and was greatest with the 6-DVU score.
Conclusion. The SPARCC MRI spinal inflammation index performs better when analysis is limited to a maximum of the 6 most severely affected levels than when the entire spine is assessed. This should improve its feasibility in clinical trials and research.
KEY WORDS. Magnetic resonance imaging; Ankylosing spondylitis; SPARCC method; Validation.
Dr. Maksymowych is a Senior Scholar of the Alberta Heritage Foundation for Medical Research.
¹Walter P. Maksymowych, FRCP(C), Suhkvinder S. Dhillon, FRCR(C), Roy Park, FRCP(C), Robert G. W. Lambert, FRCP: University of Alberta, Edmonton, Alberta, Canada; ²David Salonen, FRCP(C), Robert D. Inman, FRCP(C): University Health Network, University of Toronto, Toronto, Ontario, Canada.
Address correspondence to Walter P. Maksymowych, FRCP(C), 562 Heritage Medical Research Building, University of Alberta, Edmonton, Alberta, Canada, T6G 2S2. E-mail: firstname.lastname@example.org.
Submitted for publication February 8, 2006; accepted in revised form June 23, 2006.
INTRODUCTION
Magnetic resonance imaging (MRI) is the most sensitive imaging modality for detection of inflammatory lesions in the spine and sacroiliac joints of patients with ankylosing spondylitis (AS) (1). This has been made possible through the use of MRI sequences, such as short tau inversion recovery (STIR), that suppress the signal from marrow fat. Elimination of the fat signal on T2-weighted sequences promotes visualization of the abnormally increased water content due to the underlying bone marrow edema that is associated with inflammation. Typical appearances in the spine include increased T2 signal at the anterior corners of the vertebrae, reflecting inflammation at the attachment of the annulus fibrosus to the vertebral corner, and increased signal in the subchondral bone adjacent to the vertebral end plate (2).
Furthermore, these lesions have been shown to resolve after the institution of anti–tumor necrosis factor α (anti-TNFα) therapies, and it has therefore been suggested that MRI can be used to assess the efficacy of treatment, particularly because clinical outcome measures are largely based on patient self-reported questionnaires (3). Accordingly, scoring systems have been developed to facilitate the evaluation of inflammatory lesions observed on MRI (3,4). However, there is currently no consensus on the optimal approach to scoring MRI lesions, and this is the subject of further evaluation by investigators using the Outcome Measures in Rheumatology Clinical Trials (OMERACT) approach to the validation of outcome instruments in musculoskeletal disorders (5). In particular, OMERACT has proposed that newly developed instruments meet the criteria of feasibility, truth, and discrimination, the last of which is a function of both reproducibility and responsiveness to change. Two methods have been reported for scoring inflammatory lesions in the spine (3,4). Both rely on assessment of the signal on fat-suppressed images (STIR, T2-weighted fat saturation) in the anterior segment of the spine (vertebral body) and do not score lesions in the posterior elements of the spine. Both methods also use the discovertebral unit (DVU) as the primary anatomic region for scoring inflammation. The DVU is defined as the region between 2 imaginary lines drawn through the middle of adjacent vertebrae, including the adjacent vertebral end plates and the intervening disc. The Spondyloarthritis Research Consortium of Canada (SPARCC) MRI spinal inflammation index takes advantage of the ability of MRI to visualize lesions in several dimensions (4). The developers of this method have proposed that scoring be limited to a maximum of the 6 most severely affected levels, on the basis that the mean number of affected DVU per patient in a prior study was 3.2 (95% confidence interval 1.2–5.2).
Limiting the assessment to only the most severely affected levels improves feasibility, because evaluation takes less time than for the entire spine. Although this approach may introduce measurement error because readers may differ in their selection of levels for scoring, the alternative, assessment of the entire spine, is subject to significant problems. Being forced to score the entire spine results in the inclusion of less discernible lesions, which may reduce sensitivity to change, and forces the reader to score levels that are affected by signal artifact. This is not a trivial issue, as some degree of phase-encoding artifact occurs in almost every case when the entire spine is scanned with large fields of view. Consequently, it is not clear how many levels should be assessed to maximize sensitivity to change without compromising interobserver reproducibility. In this study we used the OMERACT filter to compare the performance of the SPARCC scoring method for all 23 spinal levels with that of a scoring scheme limited to only the most severely affected DVU. Our objective was to determine how many levels should be analyzed for optimal feasibility and discrimination.
PATIENTS AND METHODS
Patients and study protocol. We studied 2 cohorts of patients with AS as defined by the modified New York criteria (6). Cohort A was a cross-sectional cohort of 39 patients with AS (29 men; mean age 42.3 years [range 22–68 years]; mean disease duration 13.4 years [range 2–41 years]; mean Bath Ankylosing Spondylitis Disease Activity Index [BASDAI] score 5.5 [range 3.0–8.6]) who attended the outpatient clinic of the Rheumatic Disease Unit at the University of Alberta. All patients had been recruited to a prospective, longitudinal observational cohort, the Follow up Research Cohort of AS study (FORCAST), in which clinical and laboratory data are systematically collected every 6 months and plain radiographs and MRI are obtained annually.
Most patients (83%) receive nonsteroidal antiinflammatory drugs and/or physical therapy. Cohort B comprised 29 patients with severe, active disease, defined as a BASDAI score ≥4, who had been randomized to receive either an anti-TNFα agent or placebo in a 24-week double-blind placebo-controlled trial of either adalimumab (n = 11; 1:1 randomization; adalimumab 40 mg subcutaneously on alternate weeks) or infliximab (n = 18; 3:8 randomization of placebo:infliximab; infliximab 5 mg/kg at 0, 2, and 6 weeks and every 6 weeks thereafter). Nineteen patients in cohort B were recruited at the University of Alberta (14 men and 5 women; mean age 43.4 years [range 33–65 years]; mean disease duration 18.7 years [range 9–42 years]). Nine patients in cohort B were recruited at the University of Toronto (8 men and 1 woman; mean age 40.2 years; mean disease duration 16.1 years). The mean BASDAI score for the entire group of 29 patients was 6.1. Pre- and posttreatment MRI films from the 18 patients recruited to the infliximab trial had been scored 18 months before the current exercise by 1 (SSD) of the 2 readers (4). Cohort A underwent MRI at a single time point, whereas cohort B underwent MRI at baseline and either 12 weeks (adalimumab trial) or 24 weeks (infliximab trial) after randomization. We also included 6 controls with nonspecific back pain who underwent MRI at a single time point. The study was approved by the ethics committees of the University of Alberta and the University Health Network (Toronto).
Magnetic resonance imaging. MRI of the spine was performed with 1.5-T Siemens (Munich, Germany) or GE (Waukesha, WI) systems using appropriate surface coils. Sagittal sequences were obtained with 3–4-mm slice thickness, and 11–15 slices were acquired.
Sequence parameters were as follows: T1-weighted spin echo (repetition time [TR] 517–618 msec, echo time [TE] 13 msec) and STIR (TR 2,720–3,170 msec, inversion time 140 msec, TE 38–61 msec). The spine was imaged in 2 parts: the upper half, comprising the entire cervical spine and most of the thoracic spine, and the lower half, comprising the lower portion of the thoracic spine and the entire lumbar spine. The specific MRI parameters for acquiring spine images are provided on our Web site (available at: www.arthritisdoctor.ca).
Scoring of MRI lesions. Scoring of MRI lesions has been described previously (4). Briefly, our scoring method for active inflammatory lesions in the spine relies on the STIR sequence, which suppresses the normal marrow fat signal that frequently obscures signal emanating from bone marrow edema associated with inflammation. T1-weighted spin-echo images were included for anatomic reference only and were not scored. For each DVU, 3 consecutive sagittal slices were scored, which allowed evaluation of the coronal extent of lesions as well as assessment in the sagittal and anteroposterior planes. Discal lesions were not scored.
Definition of abnormal STIR signal. Bone marrow signal in the center of the vertebra or of an adjacent normal vertebra constituted the reference for designating normal signal. A set of reference AS cases was included to facilitate designation of abnormal signal on STIR.
Scoring of depth and intensity. Signal from cerebrospinal fluid constituted the reference for designating an inflammatory lesion as intense. A lesion was graded as deep if there was a homogeneous and unequivocal increase in signal extending at least 1 cm from the vertebral end plate. Assessment of depth was made possible by including a scale on the image.
Scoring method. Each DVU was divided into 4 quadrants: upper anterior, upper posterior, lower anterior, and lower posterior.
The presence of increased T2 signal in each of these 4 quadrants was scored on a dichotomous basis (1 = increased signal, 0 = normal signal). This was repeated for each of 3 consecutive sagittal slices, giving a maximum score of 12 per DVU. On each slice, the presence of a lesion exhibiting intense signal in any quadrant was given an additional score of 1. Similarly, the presence of a lesion exhibiting a depth ≥1 cm in any quadrant was given an additional score of 1, resulting in a maximum additional score of 6 for that level and bringing the total score to 18 per DVU.
MRI reading exercises. A unique MRI study number was allocated to each patient and control, thereby ensuring blinding to all patient demographics. Allocation was done by a technologist unconnected with the study. Assessment was performed on a 3-monitor review station by 2 readers using computer software optimized for this type of review (Merge eFilm, Milwaukee, WI). Each patient was identified only by the MRI study number, and films were read in random order. Pre- and posttreatment images were scored concurrently, with the reader blinded to time sequence. No instructions were provided as to how the reader should select the most severely affected DVU for the 6-, 8-, and 10-DVU scores, and scoring was done from C2 to L5 in all cases. The 3-monitor review station readily permits simultaneous visualization of all segments of the spine on pre- and posttreatment images. Readers, trained in use of the SPARCC system, were instructed to identify the 6, 8, and 10 worst levels based on the scans from both time points; no other specific guidance was necessary as to how the reader should select the most severely affected DVU. The same DVU were scored on pre- and posttreatment images in the assessment of the 6, 8, and 10 most severely affected DVU.
Statistical analysis.
Descriptive statistics (mean, median, interquartile range, standard deviation, and maximum and minimum values) were used to describe the overall distribution of scores. The distribution of affected levels and DVU scores for the entire spine and according to spinal segment was based on the mean scores of the 2 readers. Interobserver reproducibility of status and change scores was calculated using analysis of variance to provide an intraclass correlation coefficient (ICC), with a two-way mixed effects model in which observer was a fixed factor. Values >0.6 represented good reproducibility, >0.8 very good reproducibility, and >0.9 excellent reproducibility. Reproducibility was also examined using Bland-Altman plots and 95% limits of agreement. Construct validity was assessed by comparing changes in the index score with changes in disease activity as quantified by the BASDAI (7), nocturnal back pain, and C-reactive protein (CRP) levels, using Spearman's correlation coefficient. Two statistical methods were used to assess responsiveness: the effect size and the standardized response mean. Values of 0.20, 0.50, and ≥0.80 were considered to represent small, moderate, and large degrees of responsiveness, respectively. Discrimination was not assessed because the open-label phase of the clinical trial is still ongoing and treatment codes remain unbroken at this time.
RESULTS
Descriptive data. The mean number of affected levels for the entire spine was 6.9 (median 6.0), and the majority of affected levels were in the thoracic spine (mean 4.2, median 4.0) (Table 1). The highest DVU scores were also recorded in the thoracic spine, where 65.2% of all affected levels were located. Only 15.0% and 19.8% of affected levels were located in the cervical and lumbar spines, respectively.
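The reliability and construct-validity statistics named under Statistical analysis can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis code, and the function names are hypothetical. It assumes the consistency form of the two-way mixed-effects ICC for single measures (ICC(3,1) in Shrout and Fleiss notation), because the text does not say whether consistency or absolute agreement was used; Bland-Altman limits taken as the mean inter-reader difference ± 1.96 SD of the differences; and Spearman's coefficient computed as the Pearson correlation of average ranks.

```python
from statistics import mean, stdev


def icc_3_1(ratings):
    """Two-way mixed-effects, consistency, single-measure ICC (Shrout-Fleiss ICC(3,1)).

    ratings: one row per patient, one column per reader (readers as a fixed factor).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def bland_altman_limits(reader_a, reader_b):
    """Bias (mean difference) and 95% limits of agreement between two readers."""
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def average_ranks(values):
    """1-based ranks in which tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Small illustrative ratings table: 6 subjects (rows) x 4 raters (columns).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_3_1(example), 3))  # 0.715
```

The ICC function accepts any number of readers (this study used 2); with real data, each row would hold one patient's status or change score from each reader, and the same paired scores would feed the Bland-Altman limits.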
The percentages of patients assessed by both observers as having no affected level in the cervical, thoracic, and lumbar spines were 33.8% (23 of 68), 13.2% (9 of 68), and 26.5% (18 of 68), respectively. The percentages of patients assessed as having no affected level by at least 1 observer in the cervical, thoracic, and lumbar spines were 57.4% (39 of 68), 29.4% (20 of 68), and 50% (34 of 68), respectively. Median scores for the 6-, 8-, 10-, and 23-DVU scores were similar. Approximately half of the patients (51.5%) had ≤6 affected levels, and the percentages of patients assessed as having more than 6, 8, and 10 affected levels were 48.5% (33 of 68), 41.2% (28 of 68), and 26.5% (18 of 68), respectively. Of the 473 affected levels, 292 (61.7%) were scored when analysis was limited to only the 6 most severely affected DVU per patient. When scoring was limited to only the 8 and 10 most severely affected levels, the number of analyzed DVU increased to 352 (74.4%) and 409 (86.5%), respectively (Figure 1). The sum total DVU score for all 68 patients with AS was 1,992. Analysis limited to only the 6 most severely affected levels captured 73.7% of the total DVU score, whereas analysis limited to 8 and 10 levels captured 84% and 90.8% of the total DVU score, respectively. Mean scores for controls were 4.4, 4.5, 4.5, and 4.5 for the 6-DVU, 8-DVU, 10-DVU, and 23-DVU scores, respectively (data not shown).

Table 1. Descriptive statistics for numbers of affected DVU and DVU scores per patient according to region of spine examined and by the number of affected DVU scored in 68 patients with ankylosing spondylitis*

| Parameter | Mean ± SD | Median (IQR) | Range |
|---|---|---|---|
| No. of affected DVU | | | |
| Total | 6.9 ± 5.5 | 6.0 (2–11) | 0–19 |
| Cervical spine | 1.1 ± 1.3 | 1.0 (0–2) | 0–5 |
| Thoracic spine | 4.2 ± 3.7 | 4.0 (1–7) | 0–12 |
| Lumbar spine | 1.5 ± 1.6 | 1.0 (0–3) | 0–5 |
| DVU score | | | |
| Total (23 DVU) | 29.3 ± 32.5 | 20 (3–43) | 0–175 |
| Cervical spine | 4.4 ± 7.7 | 1 (0–6) | 0–57 |
| Thoracic spine | 19.1 ± 21.9 | 11 (1–32) | 0–94 |
| Lumbar spine | 5.8 ± 8.4 | 2 (0–8) | 0–41 |
| 10-DVU score | 26.6 ± 27.9 | 19 (3–41) | 0–135 |
| 8-DVU score | 24.6 ± 24.8 | 18 (3–37) | 0–112 |
| 6-DVU score | 21.6 ± 20.4 | 18 (3–32) | 0–86 |

* DVU = discovertebral unit; IQR = interquartile range.

Figure 1. Percentages of all affected discovertebral units (DVU; n = 473) and total DVU score (n = 1,992) recorded in 68 patients with ankylosing spondylitis when scoring was limited to a maximum of 6, 8, or 10 of the most severely affected DVU per patient. Shaded bar = affected DVU; solid bar = DVU score.

Reliability. The mean percentage agreement for selection of the 6 most severely affected DVU was 67.6% (range 33.4–100%). Interobserver reliability for status scores was good to very good for detection of affected levels and excellent for scoring of affected levels in the thoracic and lumbar spines (Table 2). Reliability was only moderate for scoring affected levels in the cervical spine. Reliability of both status and change scores was excellent regardless of whether all or only a limited number of levels were analyzed. Bland-Altman plots showed that measurement differences between the 2 observers were evident across the entire range of scores (Figure 2). This was similarly noted for the 6-, 8-, and 10-DVU scores (data not shown).

Table 2. Interobserver reliability of status scores in 68 patients with ankylosing spondylitis and change scores in 29 patients who received anti–tumor necrosis factor therapy*

| Parameter | Interobserver ICC, status (n = 68) | Interobserver ICC, change (n = 29) |
|---|---|---|
| Affected DVU | | |
| Total | 0.89 | 0.81 |
| Cervical spine | 0.77 | 0.54 |
| Thoracic spine | 0.83 | 0.82 |
| Lumbar spine | 0.78 | 0.63 |
| DVU score | | |
| Total (23 DVU) | 0.93 | 0.91 |
| Cervical spine | 0.70 | 0.62 |
| Thoracic spine | 0.94 | 0.92 |
| Lumbar spine | 0.90 | 0.89 |
| 10-DVU score | 0.95 | 0.93 |
| 8-DVU score | 0.95 | 0.93 |
| 6-DVU score | 0.95 | 0.92 |

* ICC = intraclass correlation coefficient; DVU = discovertebral unit.

Construct validity.
Significant and similar correlations were noted between changes in 6-, 8-, 10-, and 23-DVU scores and changes in CRP level in the 29 patients who received anti-TNF therapies (Table 3). No significant correlations were observed between changes in either nocturnal pain or BASDAI score and any DVU score.
Responsiveness. Analysis of changes in response to anti-TNF therapy demonstrated that responsiveness was most readily apparent in the thoracic spine (Table 4). Responsiveness was minimal in the cervical spine. The more limited scoring systems were more responsive than assessment of all 23 levels. Moreover, a scoring system limited to a maximum of the 6 most severely affected levels demonstrated the greatest degree of responsiveness.

Figure 2. Bland-Altman plot illustrating the difference in 23-discovertebral unit scores between 2 observers (y-axis) in relation to the mean scores (x-axis). Horizontal lines represent the 95% limits of agreement.

Table 3. Spearman's correlations between changes in clinical parameters and changes in Spondyloarthritis Research Consortium of Canada magnetic resonance imaging spinal DVU scores for 6, 8, 10, and all 23 spinal DVU in 29 patients with ankylosing spondylitis following treatment with anti–tumor necrosis factor therapy*

| Parameter | Δ23 DVU | Δ10 DVU | Δ8 DVU | Δ6 DVU |
|---|---|---|---|---|
| Δ Nocturnal pain | 0.26 | 0.26 | 0.27 | 0.26 |
| Δ BASDAI | 0.36 | 0.36 | 0.33 | 0.34 |
| Δ CRP level | 0.68† | 0.66† | 0.66† | 0.65† |

* DVU = discovertebral unit; BASDAI = Bath Ankylosing Spondylitis Disease Activity Index; CRP = C-reactive protein.
† P < 0.0001.

DISCUSSION
Our analyses of the SPARCC scoring method for the assessment of spinal inflammation by MRI demonstrated that limiting scoring to only the 6 most severely affected levels captures 62% of all affected DVU and 74% of the total DVU score.
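The per-DVU arithmetic described under Scoring method, and the restriction of a score to the N most severely affected DVU, can be sketched as below. The data layout (4 quadrant readings plus intensity and depth flags for each of 3 slices), the DVU labels, and all function names are hypothetical. Ranking DVU by the larger of their pre- and posttreatment scores is an assumed stand-in: in the study, readers picked the worst levels by eye from both time points without further rules, and the same DVU were then scored at both time points. `fraction_captured` mirrors the pooled calculation behind the 73.7%, 84%, and 90.8% figures, under the assumption that the selected DVU are each patient's highest-scoring ones.

```python
from typing import NamedTuple, Sequence, Tuple


class SliceReading(NamedTuple):
    """One sagittal slice through one DVU (hypothetical data layout)."""
    # Upper anterior, upper posterior, lower anterior, lower posterior; 1 = increased STIR signal.
    quadrants: Tuple[int, int, int, int]
    intense: bool = False  # lesion signal comparable to cerebrospinal fluid
    deep: bool = False     # homogeneous increased signal extending >= 1 cm from the end plate


def dvu_score(slices: Sequence[SliceReading]) -> int:
    """SPARCC score for one DVU read on 3 consecutive sagittal slices (range 0-18)."""
    if len(slices) != 3:
        raise ValueError("each DVU is scored on 3 consecutive sagittal slices")
    total = 0
    for s in slices:
        total += sum(s.quadrants)  # at most 4 per slice, 12 per DVU
        if any(s.quadrants):       # assumption: bonuses only apply to a slice with a lesion
            total += int(s.intense) + int(s.deep)  # at most 2 per slice, 6 per DVU
    return total


def n_worst_scores(pre: dict, post: dict, n: int):
    """Pre- and posttreatment totals over the same n DVU.

    pre/post map a DVU label to its dvu_score. Ranking by the larger of the two
    time points is an assumed stand-in for the readers' visual selection.
    """
    chosen = sorted(pre, key=lambda d: max(pre[d], post[d]), reverse=True)[:n]
    return sum(pre[d] for d in chosen), sum(post[d] for d in chosen)


def fraction_captured(patients, n):
    """Share of the pooled total DVU score held by each patient's n highest-scoring DVU."""
    total = sum(sum(scores) for scores in patients)
    captured = sum(sum(sorted(scores, reverse=True)[:n]) for scores in patients)
    return captured / total if total else 0.0


lesion = SliceReading((1, 0, 1, 0), intense=True, deep=False)
print(dvu_score([lesion] * 3))  # 9: (2 quadrants + 1 intensity point) on each of 3 slices
```

A DVU with all 4 quadrants abnormal, intense, and deep on every slice scores 12 quadrant points plus 6 bonus points, which matches the stated maximum of 18 per DVU.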
Furthermore, interobserver reliability was excellent whether analysis was limited to only the most severely affected levels or included the entire spine, whereas responsiveness was optimal when scoring was limited to only the 6 most severely affected levels. These observations, together with improved feasibility, support the notion that although the entire spine is reviewed in patients with AS, scoring every affected DVU is unnecessary; this may facilitate acceptance of the approach in clinical research and clinical trials. These findings are not entirely surprising. Scoring of the entire spine, as opposed to only the more severely affected DVU, will include more subtle lesions that may be less responsive to change and more difficult to assess. If readers are permitted to select levels for scoring, some reading error due to signal artifact may be eliminated, because the reader can choose not to select levels that are clearly subject to phase-encoding, partial-volume, or other artifacts. In addition, reliability of assessment is not as good in the cervical spine, and responsiveness to change in this region is poor. This finding likely reflects both a relative lack of involvement and the large field of view required to image the entire spine in 2 halves. In the lumbar spine, reader reliability in selection of affected DVU and reliability of change scores were also only moderate. The majority of affected levels and the greatest contribution to the total DVU score came from the thoracic spine. Accordingly, interreader reliability for status and change scores was maximal in this spinal segment. It is premature to conclude, however, that scoring should be confined to the thoracic spine, because the distribution of spinal inflammatory lesions may vary according to disease duration and other demographic variables such as sex.
Although stratification of our data according to disease duration and sex did not significantly influence the distribution of affected DVU and DVU scores in our cohorts (data not shown), this issue will require further study in larger data sets. This scoring method cannot be recommended for diagnostic evaluation at this time. Its primary purpose is to record change in inflammatory lesions for clinical and therapeutic trials research, and no method for scoring MRI scans in clinical practice has yet shown consistent results. A potential source of bias, which may primarily affect the reliability of the 23-DVU score, is introduced if the reader selects the 6 most severely affected DVU before the remaining DVU are scored. This may reduce the variability in identification and scoring of the remaining, less severely affected DVU, leading to higher ICC values for the 23-DVU score. In fact, readers were not provided with any instructions as to when the most severely affected DVU should be selected, and they may, for instance, have scored all 23 DVU first and then chosen the 6 worst DVU.

Table 4. Changes in the number of affected DVU and DVU scores in 29 patients with ankylosing spondylitis following treatment with anti–tumor necrosis factor therapy*

| Parameter | Pretreatment, mean ± SD | Posttreatment, mean ± SD | Effect size | Standardized response mean |
|---|---|---|---|---|
| Affected DVU | | | | |
| Total | 8.8 ± 5.6 | 6.0 ± 4.4 | 0.50 | 0.66 |
| Cervical spine | 1.5 ± 1.5 | 1.3 ± 1.6 | 0.13 | 0.16 |
| Thoracic spine | 5.5 ± 3.6 | 3.5 ± 3.1 | 0.55 | 0.67 |
| Lumbar spine | 1.8 ± 1.6 | 1.3 ± 1.4 | 0.35 | 0.49 |
| DVU score | | | | |
| Total (23 DVU) | 41.1 ± 37.8 | 20.9 ± 27.6 | 0.53 | 0.80 |
| Cervical spine | 6.8 ± 10.2 | 4.8 ± 10.9 | 0.19 | 0.30 |
| Thoracic spine | 27.3 ± 24.7 | 12.0 ± 15.9 | 0.62 | 0.82 |
| Lumbar spine | 7.1 ± 9.3 | 3.9 ± 6.0 | 0.34 | 0.43 |
| 6-DVU score | 28.6 ± 21.3 | 13.4 ± 15.3 | 0.71 | 0.86 |
| 8-DVU score | 33.4 ± 26.7 | 16.5 ± 19.3 | 0.64 | 0.84 |
| 10-DVU score | 36.6 ± 30.8 | 18.3 ± 22.6 | 0.60 | 0.86 |

* DVU = discovertebral unit.
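Effect size and standardized response mean can be sketched as follows. The definitions used here (mean change divided by the SD of baseline scores, and mean change divided by the SD of the individual change scores) are the conventional ones and are assumed rather than quoted from the text, and the function names are hypothetical. Applying the effect-size formula to the Table 4 summary rows reproduces the tabulated effect sizes to within about 0.01 (the inputs are rounded) and the posttreatment/pretreatment percentages quoted in the Discussion; the standardized response mean cannot be recomputed from Table 4 because the SD of the change scores is not reported.

```python
from statistics import mean, stdev


def effect_size(pre, post):
    """Mean change divided by the SD of the pretreatment (baseline) scores."""
    change = [a - b for a, b in zip(pre, post)]  # positive = lower score after treatment
    return mean(change) / stdev(pre)


def standardized_response_mean(pre, post):
    """Mean change divided by the SD of the individual change scores."""
    change = [a - b for a, b in zip(pre, post)]
    return mean(change) / stdev(change)


def responsiveness_label(value):
    """0.20 / 0.50 / 0.80 cut points from the Statistical analysis section."""
    v = abs(value)
    if v >= 0.80:
        return "large"
    if v >= 0.50:
        return "moderate"
    if v >= 0.20:
        return "small"
    return "below small"  # the name for values < 0.20 is an assumption


# Summary rows from Table 4: (pretreatment mean, pretreatment SD, posttreatment mean).
table4 = {"6-DVU": (28.6, 21.3, 13.4), "8-DVU": (33.4, 26.7, 16.5),
          "10-DVU": (36.6, 30.8, 18.3), "23-DVU": (41.1, 37.8, 20.9)}
for name, (pre_mean, pre_sd, post_mean) in table4.items():
    es = (pre_mean - post_mean) / pre_sd
    print(f"{name}: ES {es:.2f} ({responsiveness_label(es)}), "
          f"post/pre {100 * post_mean / pre_mean:.1f}%")
```

For the 6-DVU score this gives an effect size of 0.71 and a posttreatment score of 46.9% of baseline; with patient-level data, `effect_size` and `standardized_response_mean` would be applied to the paired pre/post lists directly.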
Alternatively, the reader may have made the selection first but could still have changed the selection of the most severely affected DVU after scoring the entire spine. The impact of this study design on the reliability of the 23-DVU score is therefore not readily apparent. Conversely, it is an open question whether a study design in which readers score 6, 8, 10, and 23 DVU in independent reads would be feasible, given the increasing likelihood of recall, particularly for severely affected DVU. The selection of the most severely affected DVU when assessing pre- and posttreatment images is based on a simultaneous assessment of these images using a 3-monitor review station, which readily permits simultaneous assessment of all spinal segments at both time points. Although limiting the selection to only the most severely affected DVU potentially adds to measurement error, our data demonstrate that reliability of change scores is no different whether all 23 DVU or only the most severely affected DVU are scored. We consider it very important that viewing conditions be organized to permit simultaneous visualization of both pre- and posttreatment images. One other scoring method for assessment of spinal inflammation by MRI has been published (3). This approach is also based on assessment of the spinal DVU, and it scores bone edema and erosion in a single dimension on a sagittal image according to the proportion of the anteroposterior length of the DVU involved. Scores are weighted toward the presence of erosion and range from 0 to 6 per DVU. This approach uses both T2-weighted and gadolinium-enhanced MRI sequences. This index was shown to be reliable and responsive to change in patients receiving anti-TNFα therapy. Recently, the scoring approach has been modified to include the evaluation of edema only, and the range of scores per DVU has accordingly been reduced to 0–3 (8).
There has been no further work to determine whether a more focused approach to scoring the most severely affected levels might perform as well as scoring the entire spine. Systematic examination of spinal lesions by MRI using this latter scoring method concurred with our observation that the majority of affected levels were located in the thoracic spine, although that examination revealed somewhat more lesions in the cervical spine than the present study (8). Involvement of cervical DVU was evident in 16–26% of patients, although the number of affected cervical DVU per patient was not provided. There were no obvious differences in disease duration or severity that might account for these differences from our observations. Our analyses were based entirely on the assessment of STIR MRI sequences, and it is recognized that gadolinium-enhanced MRI may reveal distinct lesions that score differently, although the likelihood of this affecting the total score for a patient is low (9,10). Both approaches to scoring omit lesions in the posterior segment of the spine, including the facet joints, processes, and interspinous ligaments, which have not yet been systematically evaluated by MRI. Whether inclusion of these regions will improve the metrologic properties of MRI-based scoring systems requires further study. Assessment of construct validity demonstrated that changes in spinal inflammation MRI scores primarily paralleled changes in CRP level, regardless of the scoring method used in our study. The lack of correlation with the BASDAI may reflect the fact that the latter instrument is a self-reported measure of patient symptoms such as pain, stiffness, and fatigue; it is therefore not necessarily specific for AS and may equally reflect the symptomatology of nonspecific causes of back pain.
Additional sources of back pain other than inflammation are possible in patients with AS of long disease duration, who may develop secondary structural damage and/or concomitant spinal disorders unrelated to AS. Two reports have now shown that anti-TNF therapy for >2 years reduces MRI scores for disease activity in the spine as recorded by the Ankylosing Spondylitis Spinal MRI score, although persisting disease amounts to 25–30% of the baseline score (11,12). This could raise concerns that a scoring system limited to only the most severely affected DVU might not capture residual disease, limiting its ability to demonstrate the benefit of more effective treatment strategies. However, our data show that posttreatment scores are 46.9%, 49.4%, and 50% of the pretreatment 6-, 8-, and 10-DVU scores, respectively, which is no different from an analysis of the entire spine (23-DVU score), where the posttreatment score is 50.9% of the pretreatment score; this leaves ample opportunity for assessment of more effective treatment strategies. In conclusion, the SPARCC MRI spinal inflammation index requires assessment of the entire spine but performs better with respect to responsiveness when analysis is limited to a maximum of the 6 most severely affected levels than when the entire spine is scored. Interreader reliability is excellent for both status and change scores with either scoring approach. The use of the 6-DVU scoring method should improve the feasibility of this tool in clinical trials and research.
AUTHOR CONTRIBUTIONS
Dr. Maksymowych had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Maksymowych, Dhillon, Lambert.
Acquisition of data. Maksymowych, Dhillon, Park, Salonen, Inman, Lambert.
Analysis and interpretation of data. Maksymowych, Inman, Lambert.
Manuscript preparation. Maksymowych, Inman, Lambert.
Statistical analysis.
Maksymowych.
REFERENCES
1. Battafarano DF, West SG, Rak KM, Fortenbery EJ, Chantelois AE. Comparison of bone scan, computed tomography, and magnetic resonance imaging in the diagnosis of active sacroiliitis. Semin Arthritis Rheum 1993;23:161–76.
2. Hermann KG, Bollow M. Magnetic resonance imaging of the axial skeleton in rheumatoid disease [review]. Best Pract Res Clin Rheumatol 2004;18:881–907.
3. Braun J, Baraliakos X, Golder W, Brandt J, Rudwaleit M, Listing J, et al. Magnetic resonance imaging examinations of the spine in patients with ankylosing spondylitis, before and after successful therapy with infliximab: evaluation of a new scoring system. Arthritis Rheum 2003;48:1126–36.
4. Maksymowych WP, Inman RD, Salonen D, Dhillon SS, Krishnananthan R, Stone M, et al. Spondyloarthritis Research Consortium of Canada magnetic resonance imaging index for assessment of spinal inflammation in ankylosing spondylitis. Arthritis Rheum 2005;53:502–9.
5. Van der Heijde DM, Landewe RB, Hermann KG, Jurik AG, Maksymowych WP, Rudwaleit M, et al. Application of the OMERACT filter to scoring methods for magnetic resonance imaging of the sacroiliac joints and the spine: recommendations for a research agenda at OMERACT 7. J Rheumatol 2005;32:2042–7.
6. Van der Linden S, Valkenburg HA, Cats A. Evaluation of diagnostic criteria for ankylosing spondylitis: a proposal for modification of the New York criteria. Arthritis Rheum 1984;27:361–8.
7. Garrett S, Jenkinson T, Kennedy LG, Whitelock H, Gaisford P, Calin A. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol 1994;21:2286–91.
8. Baraliakos X, Rudwaleit M, Listing J, Hermann KG, Brandt J, Sieper J, et al. Magnetic resonance imaging in ankylosing spondylitis: a detailed analysis [abstract]. Ann Rheum Dis 2005;64 Suppl 3:324.
9. Hermann KG, Landewe RB, Braun J, van der Heijde DM. Magnetic resonance imaging of inflammatory lesions in the spine in ankylosing spondylitis clinical trials: is paramagnetic contrast medium necessary? J Rheumatol 2005;32:2056–60.
10. Baraliakos X, Hermann KG, Landewe R, Listing J, Golder W, Brandt J, et al. Assessment of acute spinal inflammation in patients with ankylosing spondylitis by magnetic resonance imaging: a comparison between contrast enhanced T1 and short tau inversion recovery (STIR) sequences. Ann Rheum Dis 2005;64:1141–4.
11. Sieper J, Baraliakos X, Listing J, Brandt J, Haibel H, Rudwaleit M, et al. Persistent reduction of spinal inflammation as assessed by magnetic resonance imaging in patients with ankylosing spondylitis after 2 yrs of treatment with the anti-tumour necrosis factor agent infliximab. Rheumatology (Oxford) 2005;44:1525–30.
12. Baraliakos X, Brandt J, Listing J, Haibel H, Sorensen H, Rudwaleit M, et al. Outcome of patients with active ankylosing spondylitis after two years of therapy with etanercept: clinical and magnetic resonance imaging data. Arthritis Rheum 2005;53:856–63.