Using the health assessment questionnaire to estimate preference-based single indices in patients with rheumatoid arthritis.
Arthritis & Rheumatism (Arthritis Care & Research) Vol. 57, No. 6, August 15, 2007, pp 963–971 DOI 10.1002/art.22885 © 2007, American College of Rheumatology
ORIGINAL ARTICLE
Using the Health Assessment Questionnaire to Estimate Preference-Based Single Indices in Patients With Rheumatoid Arthritis
NICK BANSBACK,1 CARLO MARRA,2 AKI TSUCHIYA,3 ASLAM ANIS,4 DAPHNE GUH,5 TONY HAMMOND,6 AND JOHN BRAZIER3
Objective. To estimate the relationship between preference-based measures, EuroQol (EQ-5D) and SF-6D, and the Health Assessment Questionnaire (HAQ) disability index (DI) in patients with rheumatoid arthritis (RA), and to characterize components that are predictors of health utility.
Methods. Patients with RA participating in 2 studies in the UK (n = 151) and Canada (n = 319) completed the HAQ, EQ-5D, and Short Form 36 (SF-36). The SF-36, a generic measure of quality of life, was converted into the preference-based SF-6D. From these results we developed models of the relationship between the HAQ and SF-6D and EQ-5D using various regression analyses.
Results. The optimal model developed for the EQ-5D entered levels for each item as independent variables (model 5). A root mean square error (RMSE) of 0.18 suggested relatively good predictive ability. For the SF-6D, RMSEs were lower (0.09), suggesting better predictions than for the EQ-5D, but models with more explanatory variables did not improve results (model 2 or 4 optimal). The models were able to predict actual SF-6D and EQ-5D across the range of the HAQ DI.
Conclusion. Our approach enabled calculations of quality-adjusted life years from existing trials where only the HAQ was measured. All aspects of the HAQ may not be reflected in the preference-based measures, and this method is suboptimal to direct measurement of health state utility in clinical trials.
Given this limitation, our approach provides an alternative for researchers who need health-state utility values, but had not included a preference-based measure in their clinical study because of resource constraints or a desire to limit patient burden.
KEY WORDS. Economics; Utility theory; Rheumatoid arthritis; Quality-adjusted life years.
1Nick Bansback, MSc: St. Paul’s Hospital, Vancouver, British Columbia, Canada, and the University of Sheffield, Sheffield, UK; 2Carlo Marra, PharmD, PhD: University of British Columbia, and the Vancouver Coastal Research Institute, Vancouver, British Columbia, Canada; 3Aki Tsuchiya, PhD, John Brazier, PhD: University of Sheffield, Sheffield, UK; 4Aslam Anis, PhD: University of British Columbia, and St. Paul’s Hospital, Vancouver, British Columbia, Canada; 5Daphne Guh, MSc: St. Paul’s Hospital, Vancouver, British Columbia, Canada; 6Tony Hammond, MD: Maidstone Hospital, Kent, UK.
Address correspondence to Nick Bansback, MSc, Centre for Health Evaluation and Outcome Sciences, St. Paul’s Hospital, 570-24 1081 Burrard Street, Vancouver, British Columbia, Canada V6Z 1Y6. E-mail: firstname.lastname@example.org.
Submitted for publication April 24, 2006; accepted in revised form January 11, 2007.
INTRODUCTION
Given the scarcity of health care resources, public and private agencies have become interested in both the effectiveness and cost-effectiveness of health care interventions (1). The preferred approach toward measuring benefits in cost-effectiveness analyses is to value health status in a single unit of measurement known as utilities, which are used to derive quality-adjusted life years (QALYs). Instead of receiving full credit for each year of life, QALYs weight the impact of morbidity. For example, patients with severe disability (Health Assessment Questionnaire [HAQ] score > 2) may receive credit for living 5 months of good health for each year they are alive (2).
QALYs in cost-effectiveness analyses (known as cost-utility analyses [CUAs]) are particularly informative for health policy decisions because they allow direct comparison of the efficiency of health care resource expenditure across a wide variety of conditions and treatments (3).
Utilities are obtained by asking patients to make judgments or reveal preferences about changes in particular health states or outcomes. Preference-based instruments are formal methods for quantifying these judgments. These instruments fall into 2 groups: direct measures such as a standard gamble (SG) or time trade-off (TTO) questionnaire, or indirect measures where a generic instrument (such as the EuroQol [EQ-5D] or Health Utilities Index) has previously been populated with preference values from general population samples (1). Utilities obtained by indirect methods are recommended by the US Panel on Cost-Effectiveness in Health and Medicine and the Outcome Measures in Rheumatology Clinical Trials (OMERACT) Consensus-Based Reference Case for Economic Evaluation in Rheumatoid Arthritis (3,4).
Many clinical studies do not use a preference-based measure due to lack of resources or time, or because the commonly used generic preference-based measures are regarded as unsuitable for the condition (5). In a majority of rheumatoid arthritis (RA) clinical trials, the HAQ is the primary and often sole measure of quality of life (6). Although the HAQ was primarily designed to measure only aspects of physical function and pain, it has been shown to be highly correlated with many generic and disease-specific measures of health-related quality of life (7). Subsequently, linear transformations between the HAQ and utility have previously been used in CUA (8,9). While other disease-specific measures such as the Rheumatoid Arthritis Quality of Life questionnaire have been developed, only more recent clinical trials have used a preference-based measure (10).
Consequently, the results of many clinical studies are not amenable to populating CUA.
Because new programs and treatments in RA are competing alongside other disease areas for funding, it is important for the rheumatology community to be able to demonstrate the value of their interventions to policy makers. Estimating a relationship between the HAQ and a preference-based measure would make it possible to estimate QALY scores from existing clinical data where the HAQ has been measured but preference-based instruments have not (5,11). Moreover, in trials where one such preference-based instrument has been measured, it could also be possible to evaluate another. Such analyses have previously been attempted for outcomes in asthma and obesity (11,12). In the present study, we used data from the UK and Canada to map 2 preference-based instruments, the EQ-5D and the SF-6D, from the HAQ. We went on to demonstrate how the results can be used in practice.
MATERIALS AND METHODS
Instruments. Health Assessment Questionnaire. The HAQ is a self-completed questionnaire, developed as a comprehensive measure of outcome in patients with a wide variety of rheumatic diseases, including RA, osteoarthritis, juvenile RA, lupus, scleroderma, ankylosing spondylitis, fibromyalgia, and psoriatic arthritis. Although the complete form of the HAQ includes an assessment of mortality, disability, pain and symptom levels, drug side effects, and resource utilization, most studies in practice only use the physical disability scale. This scale assesses upper and lower limb function in relation to the degree of difficulty encountered in performing daily living tasks, which include walking, dressing, bathing, and shopping. The HAQ contains 20 items distributed across 8 components. The scores range from 0 (without any difficulty) to 3 (unable to do). The highest score on any item within 1 component represents the dimension score.
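The dimension scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring code. The article goes on to say that dimension scores are corrected for aids or devices and then combined into a disability index between 0 and 3, without spelling out the rule; the sketch assumes the conventional HAQ scoring, in which a reported aid or help raises a dimension score of 0 or 1 to 2, and the index is the mean of the 8 dimension scores. The function names and the example patient are hypothetical.

```python
def dimension_score(item_scores, uses_aid_or_help=False):
    """Score one HAQ component: the highest item score (0-3) within it."""
    score = max(item_scores)
    # Assumption (conventional HAQ scoring, not stated in the article):
    # a reported aid, device, or help lifts a score of 0 or 1 to 2.
    if uses_aid_or_help and score < 2:
        score = 2
    return score


def haq_di(components):
    """Disability index from 8 (item_scores, uses_aid_or_help) pairs.

    Assumes the index is the mean of the 8 dimension scores, which keeps
    the result on the 0-3 scale described in the text.
    """
    if len(components) != 8:
        raise ValueError("the HAQ DI is built from 8 components")
    return sum(dimension_score(items, aid) for items, aid in components) / 8


# Hypothetical patient: 20 item scores grouped into the 8 components.
example_patient = [
    ([0, 1], False),     # dressing and grooming
    ([1, 0], True),      # arising, with an aid reported
    ([0, 0, 0], False),  # eating
    ([2, 1], False),     # walking
    ([0, 0, 0], False),  # hygiene
    ([3, 1], False),     # reach
    ([1, 1, 0], False),  # grip
    ([0, 0, 1], False),  # activities
]
```

For the hypothetical patient above, `haq_di(example_patient)` averages the dimension scores 1, 2, 0, 2, 0, 3, 1, and 1.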
The respondent also indicates whether he or she uses aids or devices (14 items) or help from other individuals (8 items), totaling 42 individual items. The scores for each dimension are corrected for the use of aids or devices, summated, and transformed to give an overall disability index (DI) score between 0 and 3. A score of 0 represents no disability and 3 represents very severe, high-dependency disability (6).
EQ-5D. The EQ-5D has 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has 1 item, and each item has 3 levels with 1 denoting no problems and 3 denoting extreme problems (13). The number of theoretically possible health states is 3^5 = 243. The EQ-5D can be reported in terms of a 5-digit profile indicating the level on each dimension, or in terms of a preference-based single index number. The latter is obtained by applying algorithms that link the 5-digit health state description with average valuations obtained from members of the public using the TTO method or a visual analog scale. In this study, EQ-5D indices were obtained using the so-called Measurement and Valuation of Health (MVH A1) value set, derived from a population survey in the UK using 10-year TTOs (14).
SF-6D. The SF-6D was derived from the Short Form 36 (SF-36) (15). The SF-36 is a generic measure of health that generates scores across 8 dimensions of health (16). It has become one of the most widely used generic measures of health throughout the world, but it was not originally designed for use in economic evaluation. A research team at the University of Sheffield in collaboration with Dr. John Ware estimated a preference-based single index measure of health from the SF-36 (15). The index is estimated via a health state classification called the SF-6D derived from the SF-36 and is composed of 6 multilevel dimensions of health.
It was constructed from a sample of 11 items selected from the SF-36 to minimize the loss of descriptive information and defines 18,000 health states. A selection of 249 states defined by the SF-6D has been valued by a representative sample of the UK general population (n = 611) using the SG valuation technique. As with the EQ-5D, regression models were estimated to predict single index scores for all health states defined by the SF-6D. The resultant algorithm can be used to convert SF-36 data at the individual level to a preference-based index.
Study populations. Participants from 2 locations were recruited. In Vancouver, Canada, 319 patients from 8 private rheumatology offices with a clinical diagnosis of RA were followed up quarterly between October 2001 and September 2002 during 3 periods. In Maidstone Hospital, UK, 151 patients with a clinical diagnosis of RA from the department of rheumatology who were under routine treatment were assessed in 2001. All patients self-administered the HAQ, the SF-36, and the EQ-5D, in no particular order, at each clinic visit. We recruited 2 samples in order to generate an algorithm more generalizable to external populations.
Statistical analysis. For the primary analysis, the relationships between scores on the EQ-5D, SF-6D, and HAQ DI were examined by fitting linear regression models estimated by generalized estimating equation algorithms with a first-order autoregressive working correlation structure. We evaluated 5 different regression models. Model 1 regressed the EQ-5D and SF-6D on the HAQ DI only. Model 2 used all 8 domain scores, treating each as a continuous variable. Model 3 incorporated all 42 items of the HAQ (the 20 items that make up the domain scores along with the 22 questions surrounding aids or devices or help from other individuals), treating each as a continuous variable.
Model 4 was the same as model 2 but treated each domain as a categorical variable with 4 levels, whereas model 5 was the same as model 3 but treated each item as a categorical variable. Each successive model required fewer assumptions about items, and about intervals between response choices, carrying equal weight, but also increased the chances of incorporating arbitrary associations. The significance or sign of the beta coefficients was not of primary interest in this exercise given that we were interested in predictive ability rather than explanatory power of the variables. Because most data sets will collect all items of the HAQ, all coefficients were included in the final versions of models 1, 2, and 3. This was not practical for models 4 and 5 because of their large number of dummy variables. Instead, models 4 and 5 were developed using a backwards stepwise selection procedure, systematically removing the least significant variable until only significant variables remained (P < 0.05).
The criterion for judging the performance of each model is the difference between observed and predicted outcomes as reported in terms of the root mean square error (RMSE) (11). Although there are a number of alternative measures for accuracy of prediction (e.g., mean absolute error or intraclass correlation), because the RMSE favors prediction models that do not produce particularly large errors, it was considered to be the most indicative measure given that the objective of the analyses was to predict the mean EQ-5D and SF-6D scores for a cohort based on the individual HAQ DI scores, and not to predict individual scores or look for explanatory relationships (12). The goodness of fit for each model was also reported in terms of the marginal R², which accounts for the multiple observations from individuals (17). Residual plots were examined for nonlinear patterns and nonconstant error variance. Three-fold cross validation was then used to evaluate models.
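The RMSE criterion and the three-fold cross validation used in this section can be illustrated with a short Python sketch. It is a simplified stand-in and not the authors' analysis: an ordinary least squares line replaces the generalized estimating equation fit, records are treated as independent, and the held-out errors are pooled into one RMSE rather than combined across the 3 validations as the article describes. All function names and any data passed in are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error: large misses are penalised quadratically."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error, shown only for comparison with the RMSE."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (assumed stand-in)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def stratified_folds(strata, k=3, seed=0):
    """Assign each record to one of k folds, shuffling within each stratum."""
    rng = random.Random(seed)
    folds = [0] * len(strata)
    for stratum in sorted(set(strata)):
        members = [i for i, s in enumerate(strata) if s == stratum]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[i] = position % k
    return folds


def cross_validated_rmse(x, y, strata, k=3, seed=0):
    """Fit on k-1 folds, predict the held-out fold, pool the errors."""
    folds = stratified_folds(strata, k, seed)
    observed, predicted = [], []
    for held_out in range(k):
        train = [i for i, f in enumerate(folds) if f != held_out]
        test = [i for i, f in enumerate(folds) if f == held_out]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        observed += [y[i] for i in test]
        predicted += [intercept + slope * x[i] for i in test]
    return rmse(observed, predicted)
```

Two sets of four errors show why the RMSE favors models without particularly large errors: errors of 1, 1, 1, 1 and errors of 0, 0, 0, 4 have the same mean absolute error (1.0), but the RMSE is 1.0 for the first set and 2.0 for the second.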
Data were randomly split into 3 subsets stratified by country. Of the 3 subsets, 2 subsets were used as training data and the remaining subset was retained as the validation data for testing the model. The process was then repeated 3 times so that each of the 3 subsets was used as the validation data exactly once. The root of the summation of mean squared errors over the 3 validations was then compared among models. The predictive performance in the UK and Canadian samples was also assessed. The generalizability of the final models to alternative patient populations was also examined by including the covariates age, sex, RA duration, tender joint count, and swollen joint count into the multivariate regression to determine whether they were important additional predictors.
RESULTS
Patient demographics. At baseline, patients in the Canadian cohort were slightly older (61 years versus 56 years; P < 0.001) and a greater percentage of patients were women (78% versus 67%; P < 0.01) (Table 1). The mean HAQ score in the UK patients was substantially higher (1.41 versus 1.11; P < 0.01). This was reflected in both the EQ-5D scores, where UK patients had a significantly different mean score of 0.51 versus 0.63 in the Canadian patients, and in the SF-6D, where UK patients had a mean score of 0.62 versus 0.68 in the Canadian sample. These scores compare with age- and sex-adjusted general population values of 0.79 and 0.77 for the EQ-5D (UK and Canadian, respectively) and 0.78 and 0.77 for the SF-6D (UK and Canadian, respectively) (18).
When the complete HAQ DI and either the SF-6D or EQ-5D were available, they were included in the analysis; otherwise the record was excluded. In total, 131 records were included from the UK cohort, and 308, 258, and 226 records were included from the Canadian cohort at baseline, 3 months, and 6 months, respectively.
Prediction models.
Each of the candidate models was evaluated and those with the smallest RMSE in the cross-validation analysis were chosen as the optimal prediction models. The coefficients from the combined data source for the optimal models are shown in Table 2. Regardless of which model was used, elements of arising, eating, walking, hygiene, and grip were consistent statistically significant predictors of both health utility measures. All coefficients were negative except for hygiene. Examination of the residual plots (Figure 1) suggested relatively linear models with constant error variance.
Models 2 and 4 were equally the best performing models for predictions of the SF-6D, with RMSEs equal to 0.09. Model 2 regressed the SF-6D indices onto the 8 HAQ DI dimension scores, with each dimension treated as a continuous variable. This assumes that the 42 items of the HAQ DI carry equal weight within a given domain and the intervals between response choices for each item are equal. Model 4 was less restrictive by entering each level of the domain as a dummy variable with level 1 as the baseline (i.e., 3 × 8 dummy codes representing the 4 possible responses for each dimension), allowing each dimension to have ordinal properties. In the final model, 13 of the 24 variables were included in the SF-6D (Table 2). Both models had marginal R² values > 0.5.
The model with the most covariates (model 5) was considered the optimal model for the EQ-5D, with an RMSE equal to 0.18. In model 5, the EQ-5D indices were regressed on the individual levels of the HAQ DI item scores, where each level was entered as a dummy variable with level 1 as the baseline (i.e., 3 × 20 dummy codes representing the 4 possible responses for each item of the 8 domains, and 1 × 22 dummy codes representing the dichotomous parameters). This model made the least stringent assumptions and did not assume that the response choices have ordinal properties (Table 2). Again, the marginal R² value of the model was > 0.5.
While the RMSE can be used to choose which of the candidate models performs best, no definition exists of what level of RMSE should be considered acceptable for fitting purposes. Figure 2 demonstrates that across the range of the HAQ DI, the optimal model predictions for both the EQ-5D and SF-6D were close to that observed. Only in the first group (HAQ 0–0.5) was the prediction significantly different from the actual utility (P < 0.01). Even in the higher HAQ groups where there were fewer patients, the predictions appeared to be robust.

Table 1. Summary statistics of baseline characteristics in the 2 cohorts*
Characteristic | UK (n = 131) | Canada (n = 308) | Total (n = 439) | P†
Female sex, % | 67 | 78 | 76 | 0.01
Age, years | 55.98 ± 13.68 (17–82) | 61.35 ± 13.71 (19–90) | 60.76 ± 13.61 (17–90) | < 0.01
RA duration, years | ‡ | 13.98 ± 11.64 (0–57) | ‡ | –
Tender joint count | ‡ | 15.01 ± 12.08 (0–52) | ‡ | –
Swollen joint count | ‡ | 9.13 ± 9.66 (0–43) | ‡ | –
HAQ disability | | | |
Number | 131 | 308 | 439 |
Index | 1.41 ± 0.80 (0–3) | 1.11 ± 0.77 (0–3) | 1.15 ± 0.78 (0–3) | < 0.01
Domains, modal level (% of total) (range) | | | |
Dressing and grooming | 2 (35) (0–3) | 0 (46) (0–3) | 0 (39) (0–3) | < 0.01
Rising | 1 (41) (0–3) | 0 (54) (0–3) | 0 (44) (0–3) | < 0.01
Eating | 1 (35) (0–3) | 0 (40) (0–3) | 0 (35) (0–3) | < 0.01
Walking | 2 (41) (0–3) | 0 (45) (0–3) | 0 (41) (0–3) | 0.02
Hygiene | 2 (43) (0–3) | 3 (30) (0–3) | 0 (31) (0–3) | < 0.01
Reach | 3 (30) (0–3) | 0 (31) (0–3) | 2 (30) (0–3) | < 0.01
Grip | 2 (57) (0–3) | 2 (61) (0–3) | 2 (59) (0–3) | 0.01
Activities | 2 (35) (0–3) | 2 (28) (0–3) | 2 (30) (0–3) | 0.47
SF-6D | | | |
Number | 129 | 302 | 431 |
Index | 0.62 ± 0.11 (0.27–0.92) | 0.68 ± 0.13 (0.26–1) | 0.68 ± 0.13 (0.26–1) | < 0.01
Domains, modal level (% of total) (range) | | | |
Physical functioning | 4 (31) (1–6) | 5 (30) (1–6) | 5 (28) (1–6) | < 0.01
Role limitation | 4 (46) (1–4) | 2 (63) (1–4) | 2 (54) (1–4) | < 0.01
Social functioning | 3 (36) (1–5) | 3 (43) (1–5) | 3 (40) (1–5) | < 0.01
Pain | 5 (33) (1–6) | 4 (27) (1–6) | 4 (27) (1–6) | < 0.01
Mental health | 3 (36) (1–5) | 2 (41) (1–5) | 2 (38) (1–5) | 0.01
Energy and vitality | 5 (34) (1–5) | 3 (35) (1–5) | 3 (33) (1–5) | < 0.01
EQ-5D | | | |
Number | 131 | 308 | 439 |
Index | 0.51 ± 0.31 (−0.35–1) | 0.63 ± 0.25 (−0.48–1) | 0.62 ± 0.27 (−0.48–1) | < 0.01
Domains, modal level (% of total) (range) | | | |
Mobility | 2 (78) (1–2) | 2 (62) (1–3) | 2 (66) (1–3) | < 0.01
Self-care | 1 (52) (1–3) | 1 (71) (1–3) | 1 (65) (1–3) | < 0.01
Usual activities | 2 (71) (1–3) | 2 (66) (1–3) | 2 (63) (1–3) | < 0.01
Pain | 2 (77) (1–3) | 2 (79) (1–3) | 2 (79) (1–3) | 0.01
Anxiety | 1 (52) (1–3) | 1 (64) (1–3) | 1 (60) (1–3) | < 0.01
* Values are the mean ± SD (range) unless otherwise indicated. RA = rheumatoid arthritis; HAQ = Health Assessment Questionnaire; EQ-5D = EuroQol.
† Ordinal data compared using independent sample t-tests, categorical data compared using chi-square test.
‡ Missing data.

Generalizability. We attempted to assess the generalizability of the prediction models by including other characteristics of the study populations. We found that the Canadian population had a small but significantly higher estimated utility score compared with the UK cohort, above what was explained by the HAQ DI (B = 0.06 for the EQ-5D and B = 0.04 for the SF-6D, P < 0.05). However, because the estimated effect of HAQ elements was not changed when a country was added to the models as a covariate, an estimated utility gain using these algorithms would not be affected by which country patients in the population were from. Of the other clinical variables examined in the Canadian baseline data, none were statistically significant for the EQ-5D, whereas only the number of tender joints was found to be a significant predictor for the SF-6D (B = −0.0016, P < 0.05). However, the inclusion of clinical variables did not improve the predictive performance of the final models for either the EQ-5D or the SF-6D.
Application.
A simple example of how to use the algorithms is given in Figure 3 (a downloadable Excel sheet is available at http://www.pharmacoeconomics.ubc.ca/downloads.html). For each patient in each strategy, the HAQ DI should be translated to either the EQ-5D or SF-6D at pre- and postintervention. The average utility can then be calculated for each time interval. A way of calculating a QALY would be to then calculate an appropriate area under the curve (e.g., [pre-utility + post-utility] / 2 × elapsed time [in years]) and multiply by the annual survival percentage. This describes a simple method for developing just one component in a cost-effectiveness model. The incorporation of costs, extrapolation of costs and benefits, and a comparison between at least 2 strategies are just a few additional requirements before an incremental cost-effectiveness ratio can be derived (1).
An example of how the models would predict utility gains is given in Figure 4. We divided the patients in the Canadian sample into responders and nonresponders based on whether they had an improvement in HAQ DI by what is defined as a minimum important difference equal to 0.25 (19). The difference between estimated and observed mean utility gains was small (0.13 versus 0.10 for the EQ-5D and 0.05 versus 0.06 for the SF-6D, for the estimated and observed gains, respectively).

Table 2. Optimal regression equations for the SF-6D (models 2 and 4) and EQ-5D (model 5)*
Model 2 SF-6D
Domain/item | B | SE | P
Dressing and grooming | −0.01 | 0.00 | 0.09
Arising | −0.03 | 0.00 | < 0.01
Eating | −0.02 | 0.00 | < 0.01
Walking | −0.01 | 0.00 | < 0.01
Hygiene | 0.01 | 0.00 | 0.07
Reach | −0.01 | 0.00 | 0.01
Grip | −0.01 | 0.00 | < 0.01
Activities | −0.02 | 0.00 | < 0.01
Constant | 0.79 | 0.01 | < 0.01
RMSE develop 0.089; RMSE cross validation 0.085; RMSE Canada 0.082; RMSE UK 0.099; Marginal R² 0.50
Model 4 SF-6D
Domain/item | B | SE | P
Arising = 1 | −0.03 | 0.01 | < 0.01
Arising = 2 | −0.05 | 0.01 | < 0.01
Arising = 3 | −0.11 | 0.04 | < 0.01
Eating = 1 | −0.02 | 0.01 | < 0.01
Eating = 2 | −0.04 | 0.01 | < 0.01
Eating = 3 | −0.06 | 0.02 | < 0.01
Walking = 2 | −0.02 | 0.01 | < 0.01
Walking = 3 | −0.07 | 0.02 | < 0.01
Hygiene = 1 | −0.02 | 0.01 | < 0.01
Reach = 1 | −0.02 | 0.01 | 0.01
Reach = 2 | −0.02 | 0.01 | 0.02
Reach = 3 | −0.04 | 0.01 | < 0.01
Grip = 2 | −0.02 | 0.01 | < 0.01
Constant | 0.78 | 0.01 | < 0.01
RMSE develop 0.089; RMSE cross validation 0.084; RMSE Canada 0.081; RMSE UK 0.099; Marginal R² 0.51
Model 5 EQ-5D
Domain/item | B | SE | P
Dressing and grooming: H1 = 2 | −0.15 | 0.04 | < 0.01
Arising: H4 = 1 | −0.08 | 0.02 | < 0.01
Arising: H4 = 2 | −0.12 | 0.05 | 0.02
Arising: H4 = 3 | −0.59 | 0.08 | < 0.01
Eating: H6 = 2 | −0.15 | 0.05 | 0.01
Eating: H7 = 1 | −0.04 | 0.02 | 0.02
Eating: H7 = 2 | −0.08 | 0.03 | 0.01
Walking: H8 = 2 | −0.10 | 0.04 | 0.03
Walking: H9 = 3 | 0.12 | 0.05 | 0.02
Aids or devices: H13 = 2 | −0.14 | 0.04 | < 0.01
Aids or devices: H16 = 1 | 0.07 | 0.03 | 0.01
Hygiene: H23 = 1 | −0.05 | 0.02 | < 0.01
Hygiene: H24 = 1 | −0.05 | 0.02 | 0.01
Hygiene: H24 = 2 | −0.11 | 0.04 | < 0.01
Reach: H26 = 2 | −0.14 | 0.04 | < 0.01
Reach: H26 = 3 | −0.13 | 0.06 | 0.03
Grip: H27 = 2, H27 = 3, H28 = 3 | −0.08, −0.20 | 0.04, 0.07 | 0.04, < 0.01
Activities: H30 = 1 | −0.05 | 0.02 | < 0.01
Activities: H31 = 1 | −0.07 | 0.02 | < 0.01
Activities: H31 = 2 | −0.08 | 0.04 | 0.03
Activities: H32 = 3 | −0.09 | 0.03 | < 0.01
Constant | 0.80 | 0.01 | < 0.01
RMSE develop 0.183; RMSE cross validation 0.178; RMSE Canada 0.161; RMSE UK 0.241; Marginal R² 0.57
* RMSE = root mean square error; H1 = dress yourself, including tying shoelaces and doing buttons; H4 = get in and out of bed; H6 = lift a full cup or glass to your mouth; H7 = open a new milk carton; H8 = walk outdoors on flat ground; H9 = climb up 5 steps; H13 = wheelchair; H16 = chair; H23 = take a tub bath; H24 = get on and off the toilet; H26 = bend down to pick up clothing from the floor; H27 = open car doors; H28 = open jars that have been previously opened; H30 = run errands and shop; H31 = get in and out of a car; H32 = do chores such as vacuuming or yardwork.

Figure 1. Predicted versus actual EuroQol (EQ-5D; model 5) and SF-6D (model 4) scores.
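The application steps described in this section (translate each patient's HAQ scores to a utility before and after the intervention, average over the cohort, then take the area under the curve and weight by survival) can be sketched as follows. This is an illustrative sketch, not the authors' spreadsheet. The coefficients are the model 2 SF-6D values as read from Table 2, rounded to 2 decimal places there, so predictions are approximate; the dictionary keys, function names, and default arguments are hypothetical. As the article stresses, the prediction is intended for cohort means and not for individual patients.

```python
# Model 2 SF-6D coefficients as read from Table 2 (2-decimal rounding).
MODEL2_SF6D = {
    "dressing": -0.01,
    "arising": -0.03,
    "eating": -0.02,
    "walking": -0.01,
    "hygiene": 0.01,
    "reach": -0.01,
    "grip": -0.01,
    "activities": -0.02,
}
MODEL2_CONSTANT = 0.79


def predict_sf6d(domain_scores):
    """Predicted SF-6D index from the 8 HAQ domain scores (each 0-3)."""
    return MODEL2_CONSTANT + sum(
        coefficient * domain_scores[domain]
        for domain, coefficient in MODEL2_SF6D.items()
    )


def cohort_mean_utility(patients):
    """Mean predicted utility over a cohort of domain-score dictionaries."""
    return sum(predict_sf6d(p) for p in patients) / len(patients)


def qaly(pre_utility, post_utility, elapsed_years, survival=1.0):
    """Area under a straight line between two utilities, weighted by survival."""
    return (pre_utility + post_utility) / 2 * elapsed_years * survival
```

Under these assumptions, a cohort scoring 3 on every domain has a predicted SF-6D of 0.79 + 3 × (−0.10) = 0.49, and a cohort scoring 0 on every domain has a predicted SF-6D of 0.79; moving between those two states over 1 year with full survival would give (0.49 + 0.79) / 2 × 1 = 0.64 QALYs.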
DISCUSSION
We anticipated that models with more available predictors would account for a higher proportion of the variance and would therefore perform better as measured by the RMSE. Although this hypothesis was accurate for the EQ-5D where model 5 proved to be the best performing, it was not the case for the SF-6D. The results from the cross validation are conceivably the most important because they predict how generalizable the models will be to external populations. From this we found model 5 to be the most appropriate model for estimating the EQ-5D, whereas model 2 or model 4 was the most appropriate for the SF-6D. Models for the SF-6D always outperformed models for the EQ-5D (i.e., had lower RMSEs) due to the smaller scale range of the SF-6D. Although the benefits of using the later models versus the simple estimate in model 1 would seem small in terms of the improvement in RMSEs, overall these models will provide more accurate estimates, partly due to their ability to account for the small nonlinearity seen in the relationship between the HAQ and utility, particularly at severe states of disability (Figure 2).

Figure 2. Predicted and actual EuroQol (EQ-5D; model 5) and SF-6D (model 4) scores and confidence intervals across Health Assessment Questionnaire (HAQ) groups for all observations.
Figure 3. Example of calculation required for estimating a preference-based index. HAQ = Health Assessment Questionnaire.
Figure 4. Example of the algorithm’s performance for predicting change in utility for patients in the Canadian cohort achieving a minimally important difference in Health Assessment Questionnaire (HAQ) score (or not) from 0 to 6 months (model 5 for EuroQol [EQ-5D] and model 4 for SF-6D).

There are a number of important issues that need further consideration, the first being whether these results would be generalizable to external populations.
To address this issue, we developed the models using data from 2 different sources, one from the UK and the other from Canada. Patients in the Canadian data set were older but had less severe RA. Patient heterogeneity within the cohorts is important because it means the models can be used for estimation across a wider range of patients. The country effect that was discovered appears not to be due to age because the Canadian population was older, but could have been due to characteristics not measured in our cohorts. The models were tested on both the UK and Canadian samples. The RMSEs were always higher for the UK population because the models were developed based on more observations from the Canadian sample. We also found that including some additional clinical variables did not improve the predictions, further suggesting that the algorithms should be as applicable in patients with only a few joints involved as in persons with multiple joint involvement. Although the populations in our sample have similar disease characteristics to patients in many studies recently published (20,21), external validation would add assurance to the results, particularly in patients with milder or more severe disease.
Second, it has been argued that the HAQ DI does not adequately measure aspects of quality of life that are measured by the preference-based instruments, such as mental health and pain (22). We did not have sufficient data to examine the additive influence of other components in the HAQ, such as the pain score. Nevertheless, the models demonstrate that the HAQ DI does explain much of the preference-based measures we have studied, with relatively small RMSEs. Perhaps aspects of quality of life such as pain are highly correlated with the HAQ domains and therefore are indirectly covered. Such complex interactions might be the reason for the positive correlation between worsening hygiene and improvement in health utility.
The purpose of this study was not to explain why there is a relationship between the 2 measures, but rather to explore if there is a translation between the 2 measures. Importantly, the method described in this report is not designed to predict the utility of an individual, and would not do so accurately; it would only predict the average utility of a cohort. In this respect the models seem to perform well (Figures 2 and 4).
Conversely, it is plausible that aspects of RA captured by the HAQ DI might not be covered in the preference-based measures. Concerns about the EQ-5D and SF-6D in patients with RA have previously been demonstrated (23). The purpose of this report is not to make claims on the superiority or defects of different preference-based measures, but to give researchers a method of estimating what are now frequently used instruments.
Last, this exercise provides a method that will always be suboptimal in comparison with a trial that uses a preference-based questionnaire directly. Given the objectives of the study, there are other approaches that could be used to derive a single index from the HAQ DI. A survey of the general population could be used to value a sample of states defined by the HAQ DI using a preference-elicitation technique such as SG or TTO. This would not only generate an enormous number of health states but more importantly each state would contain 42 pieces of information, which most respondents would find impossible to process. Instead, a selection of the most important items of the HAQ DI could be selected, similar to how the SF-6D uses only 11 questions from the SF-36. Another approach is to administer the HAQ alongside a preference-elicitation technique such as TTO and SG. Regression techniques could then estimate preference weights for each of the items of the HAQ DI using the SG or TTO response as the dependent variable.
However, results from such a study would not meet the reference case for either the National Institute for Health and Clinical Excellence or the Washington Panel on Cost Effectiveness in Medicine who prefer social preferences elicited using a choice-based method (3,24). This exercise could act as a precursor to such studies, but given limited resources, we have undertaken a more pragmatic approach. Much of this report has concentrated on studies in which no preference-based measure has been administered. Given that the SF-6D does not perform well in patients with severe RA due to a ﬂoor effect, there is a potential use when only 1 preference-based questionnaire is administered (21,25). This is the case in the British Society for Rheumatology Biologics Registry, which measured only the SF-36 (26). The algorithms in this study allowed an estimate of EQ-5D utility to also be calculated (27). The approach examined in this article is intended to empirically map the relationship between a non–preference-based health-related quality of life instrument and a preference-based measure. This approach has the advantage of being able to utilize existing valuation data and offers a shortcut for researchers who need health-state utility values, but have not used a preference-based measure in their clinical study because of resource constraints or a desire to limit the patient burden. This could be used to estimate the improvement in utility in important trials such as the Anti–Tumor Necrosis Factor Trial in Rheumatoid Arthritis with Concomitant Therapy (ATTRACT) trial of inﬂiximab or the Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes (TEMPO) where no preference-weighted instrument was used (18,19). The results presented here suggest that such a model can be useful in predicting preference-based values and that the developed models have reasonable predictive ability. AUTHOR CONTRIBUTIONS Mr. 
Bansback had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Bansback, Marra, Tsuchiya, Anis, Brazier.
Acquisition of data. Bansback, Marra, Anis, Hammond.
Analysis and interpretation of data. Bansback, Marra, Tsuchiya, Anis, Guh, Brazier.
Manuscript preparation. Bansback, Marra, Tsuchiya, Anis, Guh, Hammond, Brazier.
Statistical analysis. Bansback, Guh.

REFERENCES

1. Drummond MF, Stoddart GL, Torrance GW. Methods for the economic evaluation of health care programmes. 2nd ed. Oxford: Oxford Medical Publications; 1987.
2. Wong JB. Cost-effectiveness of anti-tumor necrosis factor agents [review]. Clin Exp Rheumatol 2004;22:S65–70.
3. Gold MR, Siegel JE, Russell LB, Weinstein MC. Cost-effectiveness in health and medicine. Oxford: Oxford University Press; 1996.
4. Maetzel A, Tugwell P, Boers M, Guillemin F, Coyle D, Drummond M, et al. Economic evaluation of programs or interventions in the management of rheumatoid arthritis: defining a consensus-based reference case. J Rheumatol 2003;30:891–6.
5. Brazier J, Deverill M, Green C, Harper R, Booth A. A review of the use of health status measures in economic evaluation [review]. Health Technol Assess 1999;3:i–iv, 1–164.
6. Fries JF, Spitz PW, Young DY. The dimensions of health outcomes: the Health Assessment Questionnaire, disability and pain scales. J Rheumatol 1982;9:789–93.
7. Scott DL, Garrood T. Quality of life measures: use and abuse [review]. Baillieres Best Pract Res Clin Rheumatol 2000;14:663–87.
8. Brennan A, Bansback N, Reynolds A, Conway P. Modelling the cost-effectiveness of etanercept in adults with rheumatoid arthritis in the UK. Rheumatology (Oxford) 2004;43:62–72.
9. Barton P, Jobanputra P, Wilson J, Bryan S, Burls A. The use of modelling to evaluate new drugs for patients with a chronic condition: the case of antibodies against tumour necrosis factor in rheumatoid arthritis. Health Technol Assess 2004;8:1–91.
10. Torrance GW, Tugwell P, Amorosi S, Chartash E, Sengupta N. Improvement in health utility among patients with rheumatoid arthritis treated with adalimumab (a human anti-TNF monoclonal antibody) plus methotrexate. Rheumatology (Oxford) 2004;43:712–8.
11. Tsuchiya A, Brazier J, McColl E, Parkin D. Deriving preference-based single indices from non-preference based condition-specific instruments: converting the AQLQ into EQ5D indices. URL: http://www.shef.ac.uk/content/1/c6/01/87/47/02_1FT.pdf.
12. Brazier JE, Kolotkin RL, Crosby RD, Williams GR. Estimating a preference-based single index for the Impact of Weight on Quality of Life-Lite (IWQOL-Lite) instrument from the SF-6D. Value Health 2004;7:490–8.
13. Brooks R. EuroQol: the current state of play. Health Policy 1996;37:53–72.
14. Dolan P. Modeling valuations for EuroQol health states. Med Care 1997;35:1095–108.
15. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271–92.
16. Ware JE, Snow K, Kosinski M, Gandek B. SF-36 Health survey: manual and interpretation guide. Boston: The Health Institute, New England Medical Center; 1993.
17. Zheng B. Summarizing the goodness of fit of generalized linear models for longitudinal data. Stat Med 2000;19:1265–75.
18. Hanmer J, Lawrence WF, Anderson JP, Kaplan RM, Fryback DG. Report of nationally representative values for the noninstitutionalized US adult population for 7 health-related quality-of-life scores. Med Decis Making 2006;26:391–400. URL: http://www.ohsu.edu/epc/mdm.
19. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier DA. Minimum important difference between patients with rheumatoid arthritis: the patient's perspective. J Rheumatol 1993;20:557–60.
20. Maini R, St Clair EW, Breedveld F, Furst D, Kalden J, Weisman M, et al, for the ATTRACT Study Group. Infliximab (chimeric anti-tumour necrosis factor α monoclonal antibody) versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomised phase III trial. Lancet 1999;354:1932–9.
21. Klareskog L, van der Heijde D, de Jager JP, Gough A, Kalden J, Malaise M, et al. Therapeutic effect of the combination of etanercept and methotrexate compared with each treatment alone in patients with rheumatoid arthritis: double-blind randomised controlled trial. Lancet 2004;363:675–81.
22. Wolfe F, Michaud K. HAQ-based utilities and SF6D systematically overvalue quality of life (QOL) in RA patients with severe RA, pain and psychological distress [abstract]. Arthritis Rheum 2003;48 Suppl 9:S399.
23. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Koehler BE, et al. A comparison of four indirect methods of assessing values in rheumatoid arthritis. Med Care 2004;42:1125–31.
24. National Institute for Clinical Excellence. Guide to the methods of technology appraisal. URL: http://www.nice.org.uk/download.aspx?0=201973.
25. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ 2004;13:873–84.
26. Hyrich KL, Symmons DP, Watson KD, Silman AJ, on behalf of the British Society for Rheumatology Biologics Register. Comparison of the response to infliximab or etanercept monotherapy with the response to cotherapy with methotrexate or another disease-modifying antirheumatic drug in patients with rheumatoid arthritis: results from the British Society for Rheumatology Biologics Registry. Arthritis Rheum 2006;54:1786–94.
27. Brennan A, Bansback N, Nixon RM, Madan J, Harrison M, Watson K, et al. Modeling the cost effectiveness of TNF alpha antagonists in the management of rheumatoid arthritis: results from the British Society for Rheumatology Biologics Registry. Rheumatology (Oxford). In press.