Using the health assessment questionnaire to estimate preference-based single indices in patients with rheumatoid arthritis.

Arthritis & Rheumatism (Arthritis Care & Research)
Vol. 57, No. 6, August 15, 2007, pp 963–971
DOI 10.1002/art.22885
© 2007, American College of Rheumatology
ORIGINAL ARTICLE
Using the Health Assessment Questionnaire to
Estimate Preference-Based Single Indices in
Patients With Rheumatoid Arthritis
NICK BANSBACK,1 CARLO MARRA,2 AKI TSUCHIYA,3 ASLAM ANIS,4 DAPHNE GUH,5
TONY HAMMOND,6 AND JOHN BRAZIER3
Objective. To estimate the relationship between preference-based measures, EuroQol (EQ-5D) and SF-6D, and the Health
Assessment Questionnaire (HAQ) disability index (DI) in patients with rheumatoid arthritis (RA), and to characterize
components that are predictors of health utility.
Methods. Patients with RA participating in 2 studies in the UK (n = 151) and Canada (n = 319) completed the HAQ,
EQ-5D, and Short Form 36 (SF-36). The SF-36, a generic measure of quality of life, was converted into the preference-based SF-6D. From these results we developed models of the relationship between the HAQ and SF-6D and EQ-5D using
various regression analyses.
Results. The optimal model developed for the EQ-5D entered levels for each item as independent variables (model 5). A
root mean square error (RMSE) of 0.18 suggested relatively good predictive ability. For the SF-6D, RMSEs were lower
(0.09), suggesting better predictions than for the EQ-5D, but models with more explanatory variables did not improve
results (model 2 or 4 optimal). The models were able to predict actual SF-6D and EQ-5D across the range of the HAQ DI.
Conclusion. Our approach enabled calculations of quality-adjusted life years from existing trials where only the HAQ
was measured. Not all aspects of the HAQ may be reflected in the preference-based measures, and this method is
inferior to direct measurement of health state utility in clinical trials. Given this limitation, our approach provides an
alternative for researchers who need health-state utility values, but had not included a preference-based measure in their
clinical study because of resource constraints or a desire to limit patient burden.
KEY WORDS. Economics; Utility theory; Rheumatoid arthritis; Quality-adjusted life years.
1Nick Bansback, MSc: St. Paul’s Hospital, Vancouver, British Columbia, Canada, and the University of Sheffield,
Sheffield, UK; 2Carlo Marra, PharmD, PhD: University of British Columbia, and the Vancouver Coastal Research Institute,
Vancouver, British Columbia, Canada; 3Aki Tsuchiya, PhD, John Brazier, PhD: University of Sheffield, Sheffield, UK;
4Aslam Anis, PhD: University of British Columbia, and St. Paul’s Hospital, Vancouver, British Columbia, Canada;
5Daphne Guh, MSc: St. Paul’s Hospital, Vancouver, British Columbia, Canada; 6Tony Hammond, MD: Maidstone Hospital, Kent, UK.
Address correspondence to Nick Bansback, MSc, Centre for Health Evaluation and Outcome Sciences, St. Paul’s Hospital,
570-24 1081 Burrard Street, Vancouver, British Columbia, Canada V6Z 1Y6. E-mail: nbansback@cheos.ubc.ca.
Submitted for publication April 24, 2006; accepted in revised form January 11, 2007.
INTRODUCTION
Given the scarcity of health care resources, public and
private agencies have become interested in both the effectiveness and cost-effectiveness of health care interventions
(1). The preferred approach toward measuring benefits in
cost-effectiveness analyses is to value health status in a
single unit of measurement known as utilities, which are
used to derive quality-adjusted life years (QALYs). Instead
of receiving full credit for each year of life, QALYs weight
the impact of morbidity. For example, patients with severe
disability (Health Assessment Questionnaire [HAQ] score
>2) may receive credit for living 5 months of good health
for each year they are alive (2). QALYs in cost-effectiveness analyses (known as cost-utility analyses [CUAs]) are
particularly informative for health policy decisions because they allow direct comparison of the efficiency of
health care resource expenditure across a wide variety of
conditions and treatments (3). Utilities are obtained by
asking patients to make judgments or reveal preferences
about changes in particular health states or outcomes.
Preference-based instruments are formal methods for
quantifying these judgments. These instruments fall into 2
groups: direct measures such as a standard gamble (SG) or
time trade-off (TTO) questionnaire, or indirect measures
where a generic instrument (such as the EuroQol [EQ-5D]
or Health Utilities Index) has previously been populated
with preference values from general population samples
(1). Utilities obtained by indirect methods are recommended by the US Panel on Cost-Effectiveness in Health
and Medicine and the Outcome Measures in Rheumatology Clinical Trials (OMERACT) Consensus-Based Reference Case for Economic Evaluation in Rheumatoid Arthritis (3,4).
Many clinical studies do not use a preference-based
measure due to lack of resources or time, or because the
commonly used generic preference-based measures are regarded as unsuitable for the condition (5). In a majority of
rheumatoid arthritis (RA) clinical trials, the HAQ is the
primary and often sole measure of quality of life (6). Although the HAQ was primarily designed to measure only
aspects of physical function and pain, it has been shown to
be highly correlated with many generic and disease-specific measures of health-related quality of life (7). Subsequently, linear transformations between the HAQ and utility have previously been used in CUA (8,9). While other
disease-specific measures such as the Rheumatoid Arthritis Quality of Life questionnaire have been developed,
only more recent clinical trials have used a preference-based measure (10).
As a result, the findings of many clinical studies are not
amenable to populating CUAs. Because new programs and
treatments in RA are competing alongside other disease
areas for funding, it is important for the rheumatology
community to be able to demonstrate the value of their
interventions to policy makers. Estimating a relationship
between the HAQ and a preference-based measure would
make it possible to estimate QALY scores from existing
clinical data where the HAQ has been measured but preference-based instruments have not (5,11). Moreover, in
trials where one such preference-based instrument has
been measured, it could also be possible to evaluate another. Such analyses have previously been attempted for
outcomes in asthma and obesity (11,12). In the present
study, we used data from the UK and Canada to map 2
preference-based instruments, the EQ-5D and the SF-6D,
from the HAQ questionnaire. We went on to demonstrate
how the results can be used in practice.
MATERIALS AND METHODS
Instruments. Health Assessment Questionnaire. The
HAQ is a self-completed questionnaire, developed as a
comprehensive measure of outcome in patients with a
wide variety of rheumatic diseases, including RA, osteoarthritis, juvenile RA, lupus, scleroderma, ankylosing
spondylitis, fibromyalgia, and psoriatic arthritis. Although
the complete form of the HAQ includes an assessment of
mortality, disability, pain and symptom levels, drug side
effects, and resource utilization, most studies in practice
only use the physical disability scale. This scale assesses
upper and lower limb function in relation to the degree of
difficulty encountered in performing daily living tasks,
which include walking, dressing, bathing, and shopping.
The HAQ contains 20 items distributed across 8 components. The scores range from 0 (without any difficulty) to 3
(unable to do). The highest score on any item within 1
component represents the dimension score. The respondent also indicates whether he or she uses aids or devices
(14 items) or help from other individuals (8 items), totaling
42 individual items. The scores for each dimension are
corrected for the use of aids or devices, summated, and
transformed to give an overall disability index (DI) score
between 0 and 3. A score of 0 represents no disability and
3 represents very severe, high-dependency disability (6).
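The scoring steps described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function names and the dictionary layout are hypothetical, and the rule that a component score below 2 is raised to 2 when an aid, device, or help is used is the conventional HAQ correction, assumed here because the text only says that scores are "corrected".

```python
from typing import Dict, List


def haq_domain_score(item_scores: List[int], uses_aid_or_help: bool = False) -> int:
    """Score one HAQ component: the highest item score, corrected for aids or help.

    Assumption: the correction raises a score below 2 to 2 (conventional HAQ rule).
    """
    if not item_scores or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    score = max(item_scores)
    if uses_aid_or_help and score < 2:
        score = 2
    return score


def haq_disability_index(
    components: Dict[str, List[int]], aids_or_help: Dict[str, bool]
) -> float:
    """Sum the 8 corrected component scores and rescale to the 0-3 range."""
    if len(components) != 8:
        raise ValueError("the HAQ DI is defined over 8 components")
    domain_scores = [
        haq_domain_score(items, aids_or_help.get(name, False))
        for name, items in components.items()
    ]
    return sum(domain_scores) / len(domain_scores)


# Hypothetical respondent: unable to get out of bed, walks with some difficulty using an aid.
example = {
    "dressing": [0, 0], "arising": [3, 1], "eating": [0, 0, 0], "walking": [1, 0],
    "hygiene": [0, 0, 0], "reach": [0, 0], "grip": [0, 0, 0], "activities": [0, 0, 0],
}
print(haq_disability_index(example, {"walking": True}))  # (3 + 2) / 8 = 0.625
```

The component sizes in the example (2, 2, 3, 2, 3, 2, 3, 3) add up to the 20 items mentioned in the text.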
EQ-5D. The EQ-5D has 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has 1 item, and each item has 3
levels with 1 denoting no problems and 3 denoting extreme problems (13). The number of theoretically possible
health states is 3^5 = 243. The EQ-5D can be reported in
terms of a 5-digit profile indicating the level on each dimension, or in terms of a preference-based single index
number. The latter is obtained by applying algorithms that
link the 5-digit health state description with average valuations obtained from members of the public using the
TTO method or a visual analog scale. In this study, EQ-5D
indices were obtained using the so-called Measurement
and Valuation of Health (MVH A1) value set, derived from
a population survey in the UK using 10-year TTOs (14).
SF-6D. The SF-6D was derived from the Short Form 36
(SF-36) (15). The SF-36 is a generic measure of health that
generates scores across 8 dimensions of health (16). It has
become one of the most widely used generic measures of
health throughout the world, but it was not originally
designed for use in economic evaluation. A research team
at the University of Sheffield in collaboration with Dr.
John Ware estimated a preference-based single index measure of health from the SF-36 (15). The index is estimated
via a health state classification called the SF-6D derived
from the SF-36 and is composed of 6 multilevel dimensions of health. It was constructed from a sample of 11
items selected from the SF-36 to minimize the loss of
descriptive information and defines 18,000 health states. A
selection of 249 states defined by the SF-6D has been
valued by a representative sample of the UK general population (n = 611) using the SG valuation technique. As with
the EQ-5D, regression models were estimated to predict
single index scores for all health states defined by the
SF-6D. The resultant algorithm can be used to convert
SF-36 data at the individual level to a preference-based
index.
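The sizes of the two descriptive systems follow directly from the number of levels per dimension. The short check below is only an illustration: the EQ-5D levels are those stated in the text, and the SF-6D levels per dimension are taken from the ranges reported in Table 1 (1–6, 1–4, 1–5, 1–6, 1–5, 1–5); the dictionary keys are hypothetical labels.

```python
from math import prod

# Number of response levels per dimension.
EQ5D_LEVELS = {
    "mobility": 3, "self_care": 3, "usual_activities": 3,
    "pain_discomfort": 3, "anxiety_depression": 3,
}
SF6D_LEVELS = {
    "physical_functioning": 6, "role_limitation": 4, "social_functioning": 5,
    "pain": 6, "mental_health": 5, "vitality": 5,
}


def count_health_states(levels_per_dimension):
    """Every combination of one level per dimension defines one health state."""
    return prod(levels_per_dimension.values())


print(count_health_states(EQ5D_LEVELS))  # 3**5 = 243
print(count_health_states(SF6D_LEVELS))  # 6 * 4 * 5 * 6 * 5 * 5 = 18000
```

Because only 249 of the 18,000 SF-6D states were valued directly, a regression model is needed to assign an index to the remaining states.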
Study populations. Participants from 2 locations were
recruited. In Vancouver, Canada, 319 patients from 8 private rheumatology offices with a clinical diagnosis of RA
were followed up quarterly between October 2001 and
September 2002 during 3 periods. In Maidstone Hospital,
UK, 151 patients with a clinical diagnosis of RA from the
department of rheumatology who were under routine
treatment were assessed in 2001. All patients self-administered the HAQ, the SF-36, and the EQ-5D, in no particular order, at each clinic visit. We recruited 2 samples in
order to generate an algorithm more generalizable to external populations.
Statistical analysis. For the primary analysis, the relationships between scores on the EQ-5D, SF-6D, and HAQ
DI were examined by fitting linear regression models estimated by generalized estimating equations
with a first-order autoregressive (AR[1]) working correlation structure, which accounts for repeated
observations from the same patient. We evaluated 5 different regression models. Model 1 regressed the EQ-5D and
SF-6D on the HAQ DI alone. Model 2 used all 8 domain scores,
treating each as a continuous variable. Model 3 incorporated all 42 items of the HAQ (the 20 items that make up
the domain scores along with the 22 questions surrounding aids or devices or help from other individuals), treating
each as a continuous variable. Model 4 was the same as
model 2 but treated each domain as a categorical variable
with 4 levels, whereas model 5 was the same as model 3
but treated each item as a categorical variable. Each successive model required fewer assumptions surrounding
items and response choices between intervals carrying
equal weight, but also increased the chances of incorporating arbitrary associations.
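The difference between the continuous and categorical specifications comes down to how each HAQ score is coded as a predictor. The sketch below is illustrative only; the helper names are hypothetical, and it assumes the lowest response (score 0, without any difficulty) is the baseline level, which is how we read the level labels in Table 2.

```python
from typing import List


def dummy_code(score: int, n_levels: int = 4) -> List[int]:
    """Indicator columns for every non-baseline level of one 0-3 score."""
    if not 0 <= score < n_levels:
        raise ValueError("score outside the response scale")
    return [1 if score == level else 0 for level in range(1, n_levels)]


def continuous_row(scores: List[int]) -> List[float]:
    """Models 2 and 3: one column per score, assuming equally spaced responses."""
    return [float(s) for s in scores]


def categorical_row(scores: List[int]) -> List[int]:
    """Models 4 and 5: three indicator columns per 4-level score."""
    row: List[int] = []
    for s in scores:
        row.extend(dummy_code(s))
    return row


domain_scores = [2, 0, 1, 3, 0, 0, 2, 1]   # hypothetical respondent, 8 domains
print(len(continuous_row(domain_scores)))   # 8 columns (model 2)
print(len(categorical_row(domain_scores)))  # 3 x 8 = 24 columns (model 4)
print(3 * 20 + 1 * 22)                      # 82 candidate columns (model 5)
```

The categorical coding lets each response level carry its own coefficient, at the cost of many more parameters to estimate.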
The significance or sign of the beta coefficients was not
of primary interest in this exercise given that we were
interested in predictive ability rather than explanatory
power of the variables. Because most data sets will collect
all items of the HAQ, all coefficients were included in the
final versions of models 1, 2, and 3. This was not practical for
models 4 and 5 because of their large number of dummy variables. Instead, these models were
developed using a backward stepwise selection procedure, systematically removing the least significant variable
until only significant variables remained (P < 0.05).
The criterion for judging the performance of each model
is the difference between observed and predicted outcomes as reported in terms of the root mean square error
(RMSE) (11). Although there are a number of alternative
measures for accuracy of prediction (e.g., mean absolute
error or intraclass correlation), because the RMSE favors
prediction models that do not produce particularly large
errors, it was considered to be the most indicative measure
given that the objective of the analyses was to predict the
mean EQ-5D and SF-6D scores for a cohort based on the
individual HAQ DI scores, and not to predict individual
scores or look for explanatory relationships (12). The goodness of fit for each model was also reported in terms of the
marginal R², which accounts for the multiple observations
from individuals (17).
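To illustrate why the RMSE favors models without particularly large errors, the sketch below compares it with the mean absolute error on two hypothetical sets of predictions that have the same average absolute error. The data are invented for illustration and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: squaring weights large errors more heavily."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mae(observed, predicted):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


observed = [0.6, 0.7, 0.5, 0.8]          # hypothetical observed utilities
evenly_off = [0.7, 0.6, 0.6, 0.7]        # every prediction off by 0.1
one_large_miss = [0.6, 0.7, 0.5, 0.4]    # three exact, one off by 0.4

for name, predicted in [("evenly off", evenly_off), ("one large miss", one_large_miss)]:
    print(name, round(mae(observed, predicted), 3), round(rmse(observed, predicted), 3))
```

Both prediction sets have a mean absolute error of 0.1, but the RMSE doubles (from 0.1 to 0.2) when the error is concentrated in a single large miss.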
Residual plots were examined for nonlinear patterns
and nonconstant error variance. Three-fold cross validation was then used to evaluate models. Data were randomly split into 3 subsets stratified by country. Of the 3
subsets, 2 subsets were used as training data and the remaining subset was retained as the validation data for
testing the model. The process was then repeated 3 times
so that each of the 3 subsets was used as the validation
data exactly once. The root of the summation of mean
squared errors over the 3 validations was then compared
among models. The predictive performance in the UK and
Canadian samples was also assessed. The generalizability
of the final models to alternative patient populations was
also examined by including the covariates age, sex, RA
duration, tender joint count, and swollen joint count into
the multivariate regression to determine whether they
were important additional predictors.
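The 3-fold cross validation stratified by country can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually run: an ordinary least-squares line stands in for the generalized estimating equation models, the squared errors are pooled over the 3 validation folds before taking the root (one reading of the description above), and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_folds(strata, k=3, seed=0):
    """Assign record indices to k folds, spreading each stratum evenly."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for index, stratum in enumerate(strata):
        by_stratum[stratum].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_stratum.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def fit_line(x, y):
    """Least-squares intercept and slope (stand-in for the study's GEE models)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(x, y, strata, k=3, seed=0):
    """Train on k-1 folds, predict the held-out fold, pool the squared errors."""
    squared_error, n_predicted = 0.0, 0
    for held_out in stratified_folds(strata, k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        squared_error += sum((y[i] - (intercept + slope * x[i])) ** 2 for i in held_out)
        n_predicted += len(held_out)
    return math.sqrt(squared_error / n_predicted)
```

Each record is used for validation exactly once, so the pooled RMSE reflects out-of-sample error for the whole data set.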
RESULTS
Patient demographics. At baseline, patients in the Canadian cohort were slightly older (61 years versus 56
years; P < 0.001) and a greater percentage of patients were
women (78% versus 67%; P < 0.01) (Table 1). The mean
HAQ score in the UK patients was substantially higher
(1.41 versus 1.11; P < 0.01). This was reflected in both the
EQ-5D scores, where UK patients had a significantly different mean score (0.51 versus 0.63 in the Canadian patients), and in the SF-6D, where UK patients had a
mean score of 0.62 versus 0.68 in the Canadian sample.
These scores compare with age- and sex-adjusted general
population values of 0.79 and 0.77 for the EQ-5D (UK and
Canadian, respectively) and 0.78 and 0.77 for the SF-6D
(UK and Canadian, respectively) (18). When the complete
HAQ DI and either the SF-6D or EQ-5D were available,
they were included in the analysis; otherwise the record
was excluded. In total, 131 records were included from the
UK cohort, and 308, 258, and 226 records were included
from the Canadian cohort at baseline, 3 months, and 6
months, respectively.
Prediction models. Each of the candidate models was
evaluated and those with the smallest RMSE in the cross-validation analysis were chosen as the optimal prediction
models. The coefficients from the combined data source
for the optimal models are shown in Table 2. Regardless of
which model was used, elements of arising, eating, walking, hygiene, and grip were consistent statistically significant predictors of both health utility measures. All coefficients were negative except for hygiene. Examination of
the residual plots (Figure 1) suggested relatively linear
models with constant error variance.
Models 2 and 4 were equally the best performing models
for predictions of the SF-6D, with RMSEs equal to 0.09.
Model 2 regressed the SF-6D indices onto the 8 HAQ DI
dimension scores, with each dimension treated as a continuous variable. This assumes that the 42 items of the
HAQ DI carry equal weight within a given domain and the
intervals between response choices for each item are
equal. Model 4 was less restrictive by entering each level
of the domain as a dummy variable with level 1 as the
baseline (i.e., 3 × 8 dummy codes representing the 4
possible responses for each dimension), allowing each dimension to have ordinal properties. In the final model, 13
of the 24 variables were included in the SF-6D model (Table 2).
Both models had marginal R² values >0.5.
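As a rough illustration of how the model 2 equation is applied, the sketch below multiplies each of the 8 HAQ domain scores by the corresponding SF-6D coefficient reported in Table 2 and adds the constant. The coefficients are the 2-decimal values printed in the table, so predictions are approximate; the function and variable names are hypothetical.

```python
# Model 2 coefficients for the SF-6D as printed in Table 2 (rounded to 2 decimals).
MODEL2_SF6D_COEFFICIENTS = {
    "dressing_and_grooming": -0.01,
    "arising": -0.03,
    "eating": -0.02,
    "walking": -0.01,
    "hygiene": 0.01,
    "reach": -0.01,
    "grip": -0.01,
    "activities": -0.02,
}
MODEL2_SF6D_CONSTANT = 0.79


def predict_sf6d_model2(domain_scores):
    """Predicted SF-6D index from the 8 HAQ domain scores (each 0-3)."""
    missing = set(MODEL2_SF6D_COEFFICIENTS) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    return MODEL2_SF6D_CONSTANT + sum(
        coefficient * domain_scores[name]
        for name, coefficient in MODEL2_SF6D_COEFFICIENTS.items()
    )


no_disability = {name: 0 for name in MODEL2_SF6D_COEFFICIENTS}
severe_disability = {name: 3 for name in MODEL2_SF6D_COEFFICIENTS}
print(round(predict_sf6d_model2(no_disability), 2))      # 0.79
print(round(predict_sf6d_model2(severe_disability), 2))  # 0.79 + 3 * (-0.10) = 0.49
```

As the text emphasizes, such predictions are intended for estimating the mean utility of a cohort rather than the utility of an individual patient.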
The model with the most covariates (model 5) was considered the optimal model for the EQ-5D, with an RMSE
equal to 0.18. In model 5, the EQ-5D indices were regressed on the individual levels of the HAQ DI item scores,
where each level was entered as a dummy variable with
level 1 as the baseline (i.e., 3 × 20 dummy codes representing the 4 possible responses for each item of the 8
domains, and 1 × 22 dummy codes representing the dichotomous parameters). This model made the least stringent assumptions and did not assume that the response
choices have ordinal properties (Table 2). Again, the marginal R² value of the model was >0.5.

Table 1. Summary statistics of baseline characteristics in the 2 cohorts*

Characteristic                UK (n = 131)             Canada (n = 308)         Total (n = 439)          P†
Female sex, %                 67                       78                       76                       0.01
Age, years                    55.98 ± 13.68 (17–82)    61.35 ± 13.71 (19–90)    60.76 ± 13.61 (17–90)    < 0.01
RA duration, years            ‡                        13.98 ± 11.64 (0–57)     ‡                        –
Tender joint count            ‡                        15.01 ± 12.08 (0–52)     ‡                        –
Swollen joint count           ‡                        9.13 ± 9.66 (0–43)       ‡                        –
HAQ disability
  Number                      131                      308                      439
  Index                       1.41 ± 0.80 (0–3)        1.11 ± 0.77 (0–3)        1.15 ± 0.78 (0–3)        < 0.01
  Domains, modal level (% of total) (range)
    Dressing and grooming     2 (35) (0–3)             0 (46) (0–3)             0 (39) (0–3)             < 0.01
    Rising                    1 (41) (0–3)             0 (54) (0–3)             0 (44) (0–3)             < 0.01
    Eating                    1 (35) (0–3)             0 (40) (0–3)             0 (35) (0–3)             < 0.01
    Walking                   2 (41) (0–3)             0 (45) (0–3)             0 (41) (0–3)             0.02
    Hygiene                   2 (43) (0–3)             3 (30) (0–3)             0 (31) (0–3)             < 0.01
    Reach                     3 (30) (0–3)             0 (31) (0–3)             2 (30) (0–3)             < 0.01
    Grip                      2 (57) (0–3)             2 (61) (0–3)             2 (59) (0–3)             0.01
    Activities                2 (35) (0–3)             2 (28) (0–3)             2 (30) (0–3)             0.47
SF-6D
  Number                      129                      302                      431
  Index                       0.62 ± 0.11 (0.27–0.92)  0.68 ± 0.13 (0.26–1)     0.68 ± 0.13 (0.26–1)     < 0.01
  Domains, modal level (% of total) (range)
    Physical functioning      4 (31) (1–6)             5 (30) (1–6)             5 (28) (1–6)             < 0.01
    Role limitation           4 (46) (1–4)             2 (63) (1–4)             2 (54) (1–4)             < 0.01
    Social functioning        3 (36) (1–5)             3 (43) (1–5)             3 (40) (1–5)             < 0.01
    Pain                      5 (33) (1–6)             4 (27) (1–6)             4 (27) (1–6)             < 0.01
    Mental health             3 (36) (1–5)             2 (41) (1–5)             2 (38) (1–5)             0.01
    Energy and vitality       5 (34) (1–5)             3 (35) (1–5)             3 (33) (1–5)             < 0.01
EQ-5D
  Number                      131                      308                      439
  Index                       0.51 ± 0.31 (−0.35–1)    0.63 ± 0.25 (−0.48–1)    0.62 ± 0.27 (−0.48–1)    < 0.01
  Domains, modal level (% of total) (range)
    Mobility                  2 (78) (1–2)             2 (62) (1–3)             2 (66) (1–3)             < 0.01
    Self-care                 1 (52) (1–3)             1 (71) (1–3)             1 (65) (1–3)             < 0.01
    Usual activities          2 (71) (1–3)             2 (66) (1–3)             2 (63) (1–3)             < 0.01
    Pain                      2 (77) (1–3)             2 (79) (1–3)             2 (79) (1–3)             0.01
    Anxiety                   1 (52) (1–3)             1 (64) (1–3)             1 (60) (1–3)             < 0.01

* Values are the mean ± SD (range) unless otherwise indicated. RA = rheumatoid arthritis; HAQ = Health Assessment Questionnaire; EQ-5D = EuroQol.
† Ordinal data compared using independent sample t-tests, categorical data compared using chi-square test.
‡ Missing data.

While the RMSE can be used to choose which of the
candidate models performs best, no definition exists of
what level of RMSE should be considered acceptable for
fitting purposes. Figure 2 demonstrates that across the
range of the HAQ DI, the optimal model predictions for
both the EQ-5D and SF-6D were close to that observed.
Only in the first group (HAQ 0–0.5) was the prediction
significantly different from the actual utility (P < 0.01).
Even in the higher HAQ groups where there were fewer
patients, the predictions appeared to be robust.
Generalizability. We attempted to assess the generalizability of the prediction models by including other characteristics of the study populations. We found that the
Canadian population had a small but significantly higher
estimated utility score compared with the UK cohort,
above what was explained by the HAQ DI (B = 0.06 for the
EQ-5D and B = 0.04 for the SF-6D, P < 0.05). However,
because the estimated effect of HAQ elements was not
changed when country was added to the models as a
covariate, an estimated utility gain using these algorithms
would not be affected by which country patients in the
population were from. Of the other clinical variables examined in the Canadian baseline data, none were statistically significant for the EQ-5D, whereas only the number
of tender joints was found to be a significant predictor for
the SF-6D (B = −0.0016, P < 0.05). However, the inclusion
of clinical variables did not improve the predictive performance of the final models for either the EQ-5D or the
SF-6D.
Table 2. Optimal regression equations for the SF-6D (models 2 and 4) and EQ-5D (model 5)*

Domain/item                   B         SE      P

Model 2, SF-6D (RMSE develop 0.089; RMSE cross validation 0.085; RMSE Canada 0.082; RMSE UK 0.099; marginal R² 0.50)
  Dressing and grooming       −0.01     0.00    0.09
  Arising                     −0.03     0.00    < 0.01
  Eating                      −0.02     0.00    < 0.01
  Walking                     −0.01     0.00    < 0.01
  Hygiene                     0.01      0.00    0.07
  Reach                       −0.01     0.00    0.01
  Grip                        −0.01     0.00    < 0.01
  Activities                  −0.02     0.00    < 0.01
  Constant                    0.79      0.01    < 0.01

Model 4, SF-6D (RMSE develop 0.089; RMSE cross validation 0.084; RMSE Canada 0.081; RMSE UK 0.099; marginal R² 0.51)
  Arising = 1                 −0.03     0.01    < 0.01
  Arising = 2                 −0.05     0.01    < 0.01
  Arising = 3                 −0.11     0.04    < 0.01
  Eating = 1                  −0.02     0.01    < 0.01
  Eating = 2                  −0.04     0.01    < 0.01
  Eating = 3                  −0.06     0.02    < 0.01
  Walking = 2                 −0.02     0.01    < 0.01
  Walking = 3                 −0.07     0.02    < 0.01
  Hygiene = 1                 −0.02     0.01    < 0.01
  Reach = 1                   −0.02     0.01    0.01
  Reach = 2                   −0.02     0.01    0.02
  Reach = 3                   −0.04     0.01    < 0.01
  Grip = 2                    −0.02     0.01    < 0.01
  Constant                    0.78      0.01    < 0.01

Model 5, EQ-5D (RMSE develop 0.183; RMSE cross validation 0.178; RMSE Canada 0.161; RMSE UK 0.241; marginal R² 0.57)
  Dressing and grooming
    H1 = 2                    −0.15     0.04    < 0.01
  Arising
    H4 = 1                    −0.08     0.02    < 0.01
    H4 = 2                    −0.12     0.05    0.02
    H4 = 3                    −0.59     0.08    < 0.01
  Eating
    H6 = 2                    −0.15     0.05    0.01
    H7 = 1                    −0.04     0.02    0.02
    H7 = 2                    −0.08     0.03    0.01
  Walking
    H8 = 2                    −0.10     0.04    0.03
    H9 = 3                    0.12      0.05    0.02
  Aids or devices
    H13 = 2                   −0.14     0.04    < 0.01
    H16 = 1                   0.07      0.03    0.01
  Hygiene
    H23 = 1                   −0.05     0.02    < 0.01
    H24 = 1                   −0.05     0.02    0.01
    H24 = 2                   −0.11     0.04    < 0.01
  Reach
    H26 = 2                   −0.14     0.04    < 0.01
    H26 = 3                   −0.13     0.06    0.03
  Grip
    H27 = 2, H27 = 3, H28 = 3 (values for only 2 of these 3 rows are available): B −0.08, −0.20; SE 0.04, 0.07; P 0.04, < 0.01
  Activities
    H30 = 1                   −0.05     0.02    < 0.01
    H31 = 1                   −0.07     0.02    < 0.01
    H31 = 2                   −0.08     0.04    0.03
    H32 = 3                   −0.09     0.03    < 0.01
  Constant                    0.80      0.01    < 0.01

* RMSE = root mean square error; H1 = dress yourself, including tying shoelaces and doing buttons; H4 = get in and out of bed; H6 = lift a full cup
or glass to your mouth; H7 = open a new milk carton; H8 = walk outdoors on flat ground; H9 = climb up 5 steps; H13 = wheelchair; H16 = chair; H23 =
take a tub bath; H24 = get on and off the toilet; H26 = bend down to pick up clothing from the floor; H27 = open car doors; H28 = open jars that have
been previously opened; H30 = run errands and shop; H31 = get in and out of a car; H32 = do chores such as vacuuming or yardwork.

Application. A simple example of how to use the
algorithms is given in Figure 3 (a downloadable Excel
sheet is available at http://www.pharmacoeconomics.ubc.ca/downloads.html). For each patient in each strategy, the
HAQ DI should be translated to either the EQ-5D or SF-6D
at pre- and postintervention. The average utility can then
be developed for each time interval. A way of calculating
a QALY would be to then calculate an appropriate area
under the curve (e.g., [pre-utility + post-utility] / 2 ×
elapsed time [in years]) and multiply by the annual survival percentage. This describes a simple method for developing just one component in a cost-effectiveness
model. The incorporation of costs, extrapolation of costs
and benefits, and a comparison between at least 2 strategies are just a few additional requirements before an incremental cost-effectiveness ratio can be derived (1).
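The area-under-the-curve calculation described above can be written out as a short function. This is a minimal sketch of the arithmetic only; the function name, the parameter names, and the example utilities are hypothetical, and survival is expressed as a fraction rather than a percentage.

```python
def qalys(pre_utility, post_utility, elapsed_years, survival_fraction=1.0):
    """Trapezoidal area under the utility curve for one interval, weighted by survival."""
    if not 0.0 <= survival_fraction <= 1.0:
        raise ValueError("survival_fraction must be between 0 and 1")
    mean_utility = (pre_utility + post_utility) / 2.0
    return mean_utility * elapsed_years * survival_fraction


# Hypothetical cohort means predicted from the HAQ at baseline and at 6 months.
treated = qalys(pre_utility=0.50, post_utility=0.70, elapsed_years=0.5)
comparator = qalys(pre_utility=0.50, post_utility=0.55, elapsed_years=0.5)
print(round(treated, 4))               # (0.50 + 0.70) / 2 * 0.5 = 0.3
print(round(treated - comparator, 4))  # incremental QALYs = 0.0375
```

The incremental QALY figure is only the denominator of an incremental cost-effectiveness ratio; costs and longer-term extrapolation still have to be modeled separately, as noted above.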
An example of how the models would predict utility
gains is given in Figure 4. We divided the patients in the
Canadian sample into responders and nonresponders
based on whether their HAQ DI improved by
the minimum important difference, defined as
0.25 (19). The difference between estimated and observed mean utility gains was small (0.13 versus 0.10 for
the EQ-5D and 0.05 versus 0.06 for the SF-6D, for the estimated and observed
gains, respectively).
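The responder comparison can be sketched as follows. The records are invented for illustration, the names are hypothetical, and the sketch assumes that a decrease in HAQ DI of at least 0.25 counts as a response, which is one reading of the minimum important difference criterion.

```python
from statistics import mean

MINIMUM_IMPORTANT_DIFFERENCE = 0.25


def is_responder(haq_baseline, haq_followup, mid=MINIMUM_IMPORTANT_DIFFERENCE):
    """A lower HAQ DI is better, so improvement is baseline minus follow-up."""
    return (haq_baseline - haq_followup) >= mid


def mean_utility_gain(records, responders=True):
    """Mean change in utility among responders (or nonresponders).

    Each record is (haq_baseline, haq_followup, utility_baseline, utility_followup).
    Returns None when the group is empty.
    """
    gains = [
        u1 - u0
        for haq0, haq1, u0, u1 in records
        if is_responder(haq0, haq1) == responders
    ]
    return mean(gains) if gains else None


# Hypothetical patients, with utilities predicted from the HAQ at each visit.
records = [
    (1.5, 1.0, 0.5, 0.7),
    (1.0, 1.0, 0.6, 0.6),
    (2.0, 1.75, 0.4, 0.5),
    (0.5, 0.625, 0.8, 0.7),
]
print(round(mean_utility_gain(records, responders=True), 3))   # 0.15
print(round(mean_utility_gain(records, responders=False), 3))  # -0.05
```

Running the same function on observed and on predicted utilities gives the two mean gains that are compared in the text.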
Figure 1. Predicted versus actual EuroQol (EQ-5D; model 5) and
SF-6D (model 4) scores.
DISCUSSION
We anticipated that models with more available predictors
would account for a higher proportion of the variance and
would therefore perform better as measured by the RMSE.
Although this hypothesis was accurate for the EQ-5D
where model 5 proved to be the best performing, it was not
the case for the SF-6D. The results from the cross validation are conceivably the most important because they predict how generalizable the models will be to external populations. From this we found model 5 to be the most
appropriate model for estimating the EQ-5D, whereas
model 2 or model 4 was the most appropriate for the
SF-6D. The models for the SF-6D always
outperformed those for the EQ-5D (i.e., had lower RMSEs)
because of the smaller scale range of the SF-6D. Although the
benefits of using the later models versus the simple estimate in model 1 would seem small in terms of the improvement in RMSEs, overall these models will provide
more accurate estimates, partly due to their ability to account for the small nonlinearity seen in the relationship
between the HAQ and utility, particularly at severe states
of disability (Figure 2).
Figure 2. Predicted and actual EuroQol (EQ-5D; model 5) and SF-6D (model 4) scores and confidence intervals across Health Assessment Questionnaire (HAQ) groups for all observations.
Figure 3. Example of calculation required for estimating a preference-based index. HAQ = Health Assessment Questionnaire.
Figure 4. Example of the algorithm’s performance for predicting
change in utility for patients in the Canadian cohort achieving a
minimally important difference in Health Assessment Questionnaire (HAQ) score (or not) from 0 to 6 months (model 5 for
EuroQol [EQ-5D] and model 4 for SF-6D).
There are a number of important issues that need further
consideration, the first being whether these results would
be generalizable to external populations. To address this
issue, we developed the models using data from 2 different
sources, one from the UK and the other from Canada.
Patients in the Canadian data set were older but had less
severe RA. Patient heterogeneity within the cohorts is
important because it means the models can be used for
estimation across a wider range of patients. The country
effect that was discovered appears not to be due to age
because the Canadian population was older, but could
have been due to characteristics not measured in our cohorts. The models were tested on both the UK and Canadian samples. The RMSEs were always higher for the UK
population because the models were developed based on
more observations from the Canadian sample. We also
found that including some additional clinical variables
did not improve the predictions, further suggesting that
the algorithms should be as applicable in patients with
only a few joints involved as in persons with multiple joint
involvement. Although the populations in our sample
have similar disease characteristics to patients in many
studies recently published (20,21), external validation
would add assurance to the results, particularly in patients
with milder and with more severe disease.
Second, it has been argued that the HAQ DI does not
adequately measure aspects of quality of life, such as mental health
and pain, that are measured by the preference-based
instruments (22). We did not have sufficient data to examine
the additive influence of other components in the HAQ
questionnaire, such as the pain score. Nevertheless, the
models demonstrate that the HAQ DI does explain much of
the preference-based measures we have studied, with relatively small RMSEs. Perhaps aspects of quality of
life such as pain are highly correlated with the HAQ domains and
therefore are indirectly covered. Such complex interactions might be the reason for the positive correlation between worsening hygiene and improvement in health utility. The purpose of this study was not to explain why there
is a relationship between the 2 measures, but rather to
explore if there is a translation between the 2 measures.
Importantly, the method described in this report is not
designed to predict the utility of an individual, and would not do so accurately;
rather, it predicts only the average utility of a cohort. In this respect the models seem to perform
well (Figures 2 and 4).
Conversely, it is plausible that aspects of RA captured by
the HAQ DI might not be covered in the preference-based
measures. Concerns about the EQ-5D and SF-6D in patients with RA have previously been demonstrated (23).
The purpose of this report is not to make claims on the
superiority or defects of different preference-based measures, but to give researchers a method of estimating what
are now frequently used instruments.
Last, this exercise provides a method that will always be
suboptimal in comparison with a trial that uses a preference-based questionnaire directly. Given the objectives of
the study, there are other approaches that could be used to
derive a single index from the HAQ DI. A survey of the
general population could be used to value a sample of
states defined by the HAQ DI using a preference-elicitation
technique such as SG or TTO. This would not only generate an enormous number of health states but more importantly each state would contain 42 pieces of information,
which most respondents would find impossible to process.
Instead, a selection of the most important items of the
HAQ DI could be chosen, similar to how the SF-6D uses
only 11 items from the SF-36. Another approach is to
administer the HAQ alongside a preference-elicitation
technique such as TTO and SG. Regression techniques
could then estimate preference weights for each of the
items of the HAQ DI using the SG or TTO response as the
dependent variable. However, results from such a study
would not meet the reference case for either the National
Institute for Health and Clinical Excellence or the Washington Panel on Cost Effectiveness in Medicine who prefer
social preferences elicited using a choice-based method
(3,24). This exercise could act as a precursor to such studies, but given limited resources, we have undertaken a
more pragmatic approach.
Much of this report has concentrated on studies in
which no preference-based measure has been administered. Given that the SF-6D does not perform well in
patients with severe RA due to a floor effect, there is a
potential use when only 1 preference-based questionnaire
is administered (21,25). This is the case in the British
Society for Rheumatology Biologics Registry, which measured only the SF-36 (26). The algorithms in this study
allowed an estimate of EQ-5D utility to also be calculated
(27).
The approach examined in this article is intended to
empirically map the relationship between a non–preference-based health-related quality of life instrument and a
preference-based measure. This approach has the advantage of being able to utilize existing valuation data and
offers a shortcut for researchers who need health-state
utility values, but have not used a preference-based measure in their clinical study because of resource constraints
or a desire to limit the patient burden. This could be used
to estimate the improvement in utility in important trials
such as the Anti–Tumor Necrosis Factor Trial in Rheumatoid Arthritis with Concomitant Therapy (ATTRACT)
of infliximab or the Trial of Etanercept and Methotrexate
with Radiographic Patient Outcomes (TEMPO) where no
preference-weighted instrument was used (20,21). The results presented here suggest that such a model can be
useful in predicting preference-based values and that the
developed models have reasonable predictive ability.
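As a rough sketch of how such a mapping might be applied to a trial that recorded only the HAQ DI, the functions below compute a root mean square error for checking predictions against observed utilities and a mean utility gain between baseline and followup. The linear form and its `intercept` and `slope` parameters are assumptions made for illustration only; they are not the models or coefficients fitted in this study, and the function names are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between observed and predicted utilities."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def mean_utility_gain(haq_baseline, haq_followup, intercept, slope):
    """Mean change in mapped utility, assuming utility = intercept + slope * HAQ DI.

    The coefficients must come from a fitted mapping model; none are supplied here.
    """
    u_baseline = intercept + slope * np.asarray(haq_baseline, dtype=float)
    u_followup = intercept + slope * np.asarray(haq_followup, dtype=float)
    return float(np.mean(u_followup - u_baseline))
```

Under this linear assumption the intercept cancels out of the gain, so only the slope and the change in HAQ DI drive the estimated improvement. A model with item-level terms would not simplify in this way.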
AUTHOR CONTRIBUTIONS
Mr. Bansback had full access to all of the data in the study and
takes responsibility for the integrity of the data and the accuracy
of the data analysis.
Study design. Bansback, Marra, Tsuchiya, Anis, Brazier.
Acquisition of data. Bansback, Marra, Anis, Hammond.
Analysis and interpretation of data. Bansback, Marra, Tsuchiya,
Anis, Guh, Brazier.
Manuscript preparation. Bansback, Marra, Tsuchiya, Anis, Guh,
Hammond, Brazier.
Statistical analysis. Bansback, Guh.
REFERENCES
1. Drummond MF, Stoddart GL, Torrance GW. Methods for the
economic evaluation of health care programmes. 2nd ed.
Oxford: Oxford Medical Publications; 1987.
2. Wong JB. Cost-effectiveness of anti-tumor necrosis factor
agents [review]. Clin Exp Rheumatol 2004;22:S65–70.
3. Gold MR, Siegel JE, Russell LB, Weinstein MC. Cost-effectiveness in health and medicine. Oxford: Oxford University Press;
1996.
4. Maetzel A, Tugwell P, Boers M, Guillemin F, Coyle D, Drummond M, et al. Economic evaluation of programs or interventions in the management of rheumatoid arthritis: defining a
consensus-based reference case. J Rheumatol 2003;30:891–6.
5. Brazier J, Deverill M, Green C, Harper R, Booth A. A review of
the use of health status measures in economic evaluation
[review]. Health Technol Assess 1999;3:i–iv, 1–164.
6. Fries JF, Spitz PW, Young DY. The dimensions of health
outcomes: the Health Assessment Questionnaire, disability
and pain scales. J Rheumatol 1982;9:789–93.
7. Scott DL, Garrood T. Quality of life measures: use and abuse
[review]. Baillieres Best Pract Res Clin Rheumatol 2000;14:663–87.
8. Brennan A, Bansback N, Reynolds A, Conway P. Modelling
the cost-effectiveness of etanercept in adults with rheumatoid
arthritis in the UK. Rheumatology (Oxford) 2004;43:62–72.
9. Barton P, Jobanputra P, Wilson J, Bryan S, Burls A. The use of
modelling to evaluate new drugs for patients with a chronic
condition: the case of antibodies against tumour necrosis factor in rheumatoid arthritis. Health Technol Assess 2004;8:1–91.
10. Torrance GW, Tugwell P, Amorosi S, Chartash E, Sengupta N.
Improvement in health utility among patients with rheumatoid arthritis treated with adalimumab (a human anti-TNF
monoclonal antibody) plus methotrexate. Rheumatology (Oxford) 2004;43:712–8.
11. Tsuchiya A, Brazier J, McColl E, Parkin D. Deriving preference-based single indices from non-preference based condition-specific instruments: converting the AQLQ into EQ5D
indices. URL: http://www.shef.ac.uk/content/1/c6/01/87/47/02_1FT.pdf.
12. Brazier JE, Kolotkin RL, Crosby RD, Williams GR. Estimating
a preference-based single index for the Impact of Weight on
Quality of Life-Lite (IWQOL-Lite) instrument from the SF-6D.
Value Health 2004;7:490–8.
13. Brooks R. EuroQol: the current state of play. Health Policy
1996;37:53–72.
14. Dolan P. Modeling valuations for EuroQol health states. Med
Care 1997;35:1095–108.
15. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ
2002;21:271–92.
16. Ware JE, Snow K, Kosinski M, Gandek B. SF-36 Health
survey: manual and interpretation guide. Boston: The Health
Institute, New England Medical Center; 1993.
17. Zheng B. Summarizing the goodness of fit of generalized
linear models for longitudinal data. Stat Med 2000;19:1265–75.
18. Hanmer J, Lawrence WF, Anderson JP, Kaplan RM, Fryback
DG. Report of nationally representative values for the noninstitutionalized US adult population for 7 health-related quality-of-life scores. Med Decis Making 2006;26:391–400. URL: http://www.ohsu.edu/epc/mdm.
19. Wells GA, Tugwell P, Kraag GR, Baker PR, Groh J, Redelmeier
DA. Minimum important difference between patients with
rheumatoid arthritis: the patient’s perspective. J Rheumatol
1993;20:557–60.
20. Maini R, St Clair EW, Breedveld F, Furst D, Kalden J, Weisman M, et al, for the ATTRACT Study Group. Infliximab
(chimeric anti-tumour necrosis factor α monoclonal antibody)
versus placebo in rheumatoid arthritis patients receiving concomitant methotrexate: a randomised phase III trial. Lancet
1999;354:1932–9.
21. Klareskog L, van der Heijde D, de Jager JP, Gough A, Kalden J,
Malaise M, et al. Therapeutic effect of the combination of
etanercept and methotrexate compared with each treatment
alone in patients with rheumatoid arthritis: double-blind randomised controlled trial. Lancet 2004;363:675–81.
22. Wolfe F, Michaud K. HAQ-based utilities and SF6D systematically overvalue quality of life (QOL) in RA patients with
severe RA, pain and psychological distress [abstract]. Arthritis Rheum 2003;48 Suppl 9:S399.
23. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Koehler
BE, et al. A comparison of four indirect methods of assessing
values in rheumatoid arthritis. Med Care 2004;42:1125–31.
24. National Institute for Clinical Excellence. Guide to the methods of technology appraisal. URL: http://www.nice.org.uk/download.aspx?0=201973.
25. Brazier J, Roberts J, Tsuchiya A, Busschbach J. A comparison
of the EQ-5D and SF-6D across seven patient groups. Health
Econ 2004;13:873–84.
26. Hyrich KL, Symmons DP, Watson KD, Silman AJ, on behalf of
the British Society for Rheumatology Biologics Register. Comparison of the response to infliximab or etanercept monotherapy
with the response to cotherapy with methotrexate or another
disease-modifying antirheumatic drug in patients with rheumatoid arthritis: results from the British Society for Rheumatology
Biologics Registry. Arthritis Rheum 2006;54:1786–94.
27. Brennan A, Bansback N, Nixon RM, Madan J, Harrison M,
Watson K, et al. Modeling the cost effectiveness of TNF alpha
antagonists in the management of rheumatoid arthritis: results from the British Society for Rheumatology Biologics
Registry. Rheumatology (Oxford). In press.