Arthritis & Rheumatism (Arthritis Care & Research)
Vol. 59, No. 4, April 15, 2008, pp 598 – 602
© 2008, American College of Rheumatology
DOI 10.1002/art.23520
Comparison of the Health Assessment
Questionnaire disability index and the
Short Form 36 physical functioning
subscale using Rasch analysis: comment on
the article by Taylor and McPherson
To the Editors:
In a recent article in Arthritis Care & Research, Taylor
and McPherson (1) compared the Health Assessment
Questionnaire disability index (HAQ DI) and the Short
Form 36 (SF-36) physical functioning subscale (PF) using
Rasch analysis in a small cross-sectional study, suggesting
that the analysis favors the SF-36 PF over the HAQ DI in
psoriatic arthritis (PsA). Studies such as this bring item
response theory approaches to analyses of patient-reported
outcomes. Although this effort is by itself meritorious, it
carries the hazard that relatively unfamiliar terminology
may obscure rather than illuminate. Under some circumstances, Rasch analysis has posed unacceptable threats to
content through trimming of items to a more unidimensional construct, which then lacks face and content validity (2). Some of us would argue that sensitivity to change,
face and content validity, and reliability, not studied by
Taylor and McPherson, are among the most essential attributes of an outcome assessment instrument, and that
item separation, ceiling and floor effects, and differential
item functioning, although not unimportant, are less essential.
Furthermore, the authors’ analyses and interpretations
misunderstand the construction of the HAQ DI, which was
designed to balance content across categories into a single
score, not to be disaggregated into subdimensions (profiles). HAQ DI categories were not designed to be ranked or
separately reported, but to ensure attention to all major
content areas of disability. On average, the PsA patients
had much better physical functioning than the rheumatoid
arthritis patients (HAQ DI score 0.5 versus 1.23), raising
issues of different performance in different
populations. This cross-sectional study cannot get at the
most critical outcome assessment issues nor lead to definitive conclusions.
That being said, there are useful insights here. First, an
unresolved clinical issue with PsA is whether we should
assess only the arthritis or some sum of the skin and the
joint disease. If it is the latter, a health-related quality of
life instrument might perform strongly. Second, where
disability is near the population norm, an instrument designed for more normal populations (SF-36 PF) might perform well. Third, we agree that floor and ceiling effects
have received less attention than warranted.
Figure 1. Measurement precision (SE). SF-12 = Short Form 12;
SF-36 = Short Form 36; HAQ = Health Assessment Questionnaire; CAT = computer-adaptive testing; WOMAC = Western
Ontario and McMaster Universities Osteoarthritis Index.
Figure 1 shows measurement precision (standard error)
graphed against theta values, normalized so that 50
represents the average functioning of a normal
population and each set of 10 units represents 1 standard
deviation (3). On this scale, a standard error of 2.3
corresponds to a reliability (Cronbach's alpha) of 0.95.
The best instrument would have the lowest
and broadest curve; the lowest point shows the degree of
physical functioning where information content is maximal, and greater breadth reduces floor and ceiling effects.
These data confirm the authors’ belief that the HAQ DI has
its greatest item information content in sicker populations
and the SF-36 PF in more normalized ones, and that floor
effects are more common with the HAQ DI than the SF-36
PF. Most importantly, use of computer-adaptive testing,
where items are dynamically selected for the individual
based upon prior responses, can readily outperform static
instruments using a similar number of items.
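The correspondence between a standard error of 2.3 and a reliability of 0.95 can be checked with a short calculation. The sketch below assumes the classical relation reliability = 1 - (SE / SD)^2 on a metric with a standard deviation of 10; that formula and the function name `reliability_from_se` are illustrative assumptions, not something stated in the letter.

```python
def reliability_from_se(se: float, sd: float = 10.0) -> float:
    """Reliability implied by a standard error on a scale with the given SD.

    Assumes the classical relation: reliability = 1 - (SE / SD)^2.
    The default SD of 10 matches the normalization described in the text.
    """
    return 1.0 - (se / sd) ** 2


# A standard error of 2.3 on a metric where 10 units equal 1 SD
print(round(reliability_from_se(2.3), 3))  # 0.947, i.e. roughly 0.95
```

Under this assumption, any point on a precision curve that lies below 2.3 would correspond to a reliability above roughly 0.95 at that level of functioning.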
The National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS) is approaching these issues with qualitative as well as quantitative item review and calibration, with the best of the
HAQ DI and the SF-36 PF together with other items used
in dynamic (computer-adaptive testing) rather than static
instruments. These instruments will clearly supersede our
present standards. PROMIS item banks are in the public
domain and may be accessed at and
at for the PROMIS HAQ. We have
entered an era of higher performance standards for patient-reported outcomes and better outcome measures for studies.
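The idea of dynamically selecting items based on prior responses can be sketched as follows. This is a minimal illustration and not the PROMIS algorithm: it assumes a dichotomous Rasch model, picks the unused item with the greatest information at the current estimate of functioning, and uses a hypothetical item bank whose names and difficulty values are invented for the example.

```python
import math


def item_information(theta: float, difficulty: float) -> float:
    """Information of a dichotomous Rasch item at a given theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)


def next_item(theta_estimate: float, item_bank: dict, administered: set) -> str:
    """Return the not-yet-administered item that is most informative at theta_estimate."""
    candidates = [name for name in item_bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta_estimate, item_bank[name]))


# Hypothetical bank: item name -> difficulty in logits (illustrative values only)
bank = {"walk_block": -1.0, "climb_stairs": 0.0, "run_mile": 2.0}

first = next_item(0.0, bank, set())              # item closest to the starting estimate
second = next_item(1.8, bank, {"climb_stairs"})  # estimate revised upward after a response
print(first, second)
```

A full adaptive test would also re-estimate theta after each response and stop once a target standard error is reached; both steps are omitted here.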
James F. Fries, MD
Bonnie Bruce, DrPH, MPH, RD
Stanford University
Palo Alto, CA
Matthias Rose, MD, PhD
Health Assessment Lab
Waltham, MA
1. Taylor WJ, McPherson KM. Using Rasch analysis to compare
the psychometric properties of the Short Form 36 physical
function score and the Health Assessment Questionnaire disability
index in patients with psoriatic arthritis and rheumatoid arthritis. Arthritis Rheum 2007;57:723–9.
2. Fries JF. New instruments for assessing disability: not quite
ready for prime time [editorial]. Arthritis Rheum 2004;50:3064–7.
3. Rose M, Bjorner JB, Becker J, Fries JF, Ware JE. Evaluation of a
preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol 2008;61:
DOI 10.1002/art.23521
To the Editors:
We are grateful for the interest shown by Fries et al in
our work comparing the SF-36 PF with the HAQ DI in PsA
using the Rasch model. We agree with many of their comments, especially the usefulness of the information function plot that is displayed. This provides a useful insight
actually quantified by the Rasch measurement model: increased precision of the estimate of the attribute is
achieved by including items that are as difficult as the
ability of the sample. In other words, items that are targeted to the sample of interest lead to more precise measurement. We agree that computer-adaptive testing is an
excellent approach to address this. We also agree that
sensitivity to change and other psychometric properties
are crucially important in measurement. However, each of
these has already been comprehensively evaluated in the
measures we addressed.
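The targeting point can be illustrated numerically. The sketch below assumes a dichotomous Rasch model, in which an item's information at ability theta is p(1 - p) and the standard error of the ability estimate is the inverse square root of the summed information. The HAQ DI and SF-36 PF items are polytomous, so this is a simplification, and the difficulty values are hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta: float, difficulty: float) -> float:
    """Item information p(1 - p); largest (0.25) when difficulty equals theta."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def standard_error(theta: float, difficulties: list) -> float:
    """Standard error of the theta estimate from the summed item information."""
    total_information = sum(item_information(theta, b) for b in difficulties)
    return 1.0 / math.sqrt(total_information)


# Hypothetical item sets (difficulties in logits) for a respondent at theta = 0
targeted = [-0.5, 0.0, 0.5]    # items near the respondent's ability
off_target = [2.5, 3.0, 3.5]   # items far too difficult for the respondent

se_targeted = standard_error(0.0, targeted)
se_off = standard_error(0.0, off_target)
print(se_targeted < se_off)  # True: targeted items give the more precise estimate
```

With the same number of items, the set whose difficulties sit near the respondent's ability yields the smaller standard error, which is the sense in which targeted items lead to more precise measurement.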
It is still fundamental that items within a scale fit the
Rasch model for rational computation of scores and appropriate statistical evaluation of the effects of interventions
and the impact of conditions. Summation absolutely demands unidimensionality among other characteristics and
it is not true, as Fries et al seem to assert, that unidimensionality is somehow less important than content validity.
Content validity is essentially a question of targeting as
described above, but unless the items fit the Rasch model,
any apparent targeting and validity in the content of items
as part of a single scale is challenged. We find the emphasis that the HAQ DI is supposed to be an aggregated score
reflecting functional difficulties across major disability
areas to be irrelevant. Aggregated scores are the product of
all multiple-item instruments, but if any item happens to
behave very differently than the others, then such aggregation leads to scores that have little useful meaning. Improving such items, moving them to a separate scale if they
seem fundamentally important and meaningful, or at times
removing them altogether, improves the usefulness of the
scale. However, we are not suggesting unthinking or arbitrary discarding of items. It also seems to us that there is
some inconsistency between an argument in favor of computer-adaptive testing and a lament about trimming items to
improve the measurement properties of the instrument.
The ultimate aim of computer-adaptive testing is to
achieve precise measurement with as few items as possible, clearly an approach that utilizes trimming of items. In
the absence of such technology in the clinic it seems
reasonable to try and achieve appropriately targeted and
psychometrically sound static instruments by reexamination of item behavior and revision of the instrument if
We agree that the language of Rasch analysis remains for
many somewhat mysterious and obscure, but the science
is very definitely robust. Rasch analysis may be a relatively
new approach to considering measurement in clinical
practice. However, it can be very illuminating, by highlighting scales that are not actually scales and measures
that are rather short on fundamental measurement properties. Given that it is not that long since responsiveness,
minimally important clinical difference, and the merits of
agreement as opposed to correlation became part of the
modern measurement lexicon, we argue that we should
persist with Rasch analysis, use it wisely, and question
assumptions accordingly.
W. J. Taylor, MBChB, PhD
Wellington School of Medicine and Health Sciences,
University of Otago
Wellington, New Zealand
K. M. McPherson, PhD
Auckland University of Technology
Auckland, New Zealand
DOI 10.1002/art.23519
Early aggressive care and symptomatic
recovery from whiplash: comment on the
article by Côté et al
To the Editors:
I am writing in response to an article recently published
by Côté et al in Arthritis Care & Research (1). The title,
“Early aggressive care and delayed recovery from whiplash:
isolated finding or reproducible result?” would suggest that
the authors were able to determine that “early aggressive
care,” a term that was not operationally defined, in some
way delayed patient recovery in this cohort. This category
of “aggressive care” indicated only that the patients were
seen by both general practitioners and chiropractors.
Since, according to the authors, the group treated with
aggressive care had more serious whiplash injuries, this
does not seem surprising.
In their conclusion, the authors inform the reader that
“combining chiropractic and general practitioner care appears to confer no benefit to patients” (1). This conclusion,
however, is based entirely on claim duration, which was
defined as the time from the injury to the time of claim
closure. No objective measure of “benefits” was actually
collected. Moreover, claim closure is an administrative
function of the insurer. Owing to the retrospective nature
of this study and the artificial reference frame (claim closure) used as a proxy for recovery, the analysis of data and
the conclusions drawn from it are potentially skewed. This
proxy was justified by the authors by referring to a previous study in which they found that claim closure roughly
coincided with lower (but not absent) neck pain. In addition to the information bias introduced by misclassifying