Human Brain Mapping 4:23-46 (1996)
Reliability of PET Activation Across Statistical
Methods, Subject Groups, and Sample Sizes
T.J. Grabowski, R.J. Frank, C.K. Brown, H. Damasio,
L.L. Boles Ponto, G.L. Watkins, and R.D. Hichwa
Department of Neurology, Division of Behavioral Neurology and Cognitive Neuroscience (T.J.G., R.J.F.,
H.D.); Department of Preventive Medicine and Environmental Health (C.K.B.); and Positron Emission
Tomography Imaging Center (L.L.B.P., G.L.W., R.D.H.), University of Iowa, Iowa City, Iowa 52242
Abstract: Four pixel-based methods for estimating regional activation in positron emission tomography
(PET) images were implemented so as to allow the comparison of their performances in the same dataset.
Change distribution analysis, Worsley's method, a pixelwise general linear model, a nonparametric
method, and several methods derived from them were investigated. Important technical factors,
including the degree of smoothing, stereotactic transform, coregistration algorithm, search volume, and
the volumetric alpha level, were held constant. The dataset, which was obtained with a verb generation
paradigm, was large enough to permit assessment of concordance between independent samples of
conventional size, as well as assessment of within-cohort replicability. (Eighteen normal subjects performed
four GENERATE-READ pairs each.) Same-task (noise) images were also analyzed.
In noise datasets, type I errors (false positives) occurred at the nominal rate (in 5% of datasets). Detected
regions of activation were highly likely to be internally replicated (93%). The detected activations were a
superset of activations previously reported using the same paradigm. The methods were chiefly
distinguished by type II error rates and by the stability of the location of activation clusters. Those
methods dependent on local variance estimates were less powerful with small sample sizes and less stable
with respect to the attributed location of task-induced changes. The use of pooled variance (Worsley's
method) reduced these problems, but variance was not stationary. Overall, the power of all analyses was
modest with samples of conventional size (nine subjects x one or two task-pairs). Modeling of the sources
of variance, particularly improvement of anatomical standardization, is likely to improve the power of
pixel-based analyses. © 1996 Wiley-Liss, Inc.
Key words: functional imaging, regional cerebral blood flow, language, human brain, magnetic resonance
imaging, statistical parametric mapping, change distribution analysis, nonparametric statistics, rerandomization, replication
Received for publication June 20, 1995; revision accepted September 29, 1995.
Address reprint requests to Thomas J. Grabowski, M.D., Department of Neurology, University of Iowa, 200 Hawkins Drive, Iowa City, IA 52242-1053.
INTRODUCTION
The techniques that permit in vivo experimental
analysis of the physiological correlates of cognition
have been applied to numerous problems in cognitive
neuroscience over the last decade. Progress in functional neuroimaging was greatly accelerated by the
advent of computerized image analysis, especially the
application of "pixel-based" methods of statistical
analysis to activation images generated by positron
emission tomography (PET). These methods, in which
the fundamental unit of analysis is the pixel (picture
element) rather than a region of interest (ROI), were
attractive because they did not require a priori anatomical decisions and, perhaps most importantly, because
they offered an automated approach that could be
delegated to computers. Additional incentive came
from the required stereotactic transformation, which
evolved into a standard notation for the communication of results between centers. A considerable body of
data has been obtained with these methods, and more
data are continuously being reported.
In general, reports have been compatible with observations previously made in subjects with acquired
brain lesions, and, in some cases, PET results have
been cross-replicated. On the other hand, as activation
studies with similar paradigms are undertaken in
different centers, numerous discrepancies in the results require explanation [e.g., see Petersen et al., 1988,
vs. Raichle et al., 1992; Pardo et al., 1990, vs. Bench et
al., 1993; and Petersen et al., 1990, vs. Howard et al.,
1992, and Price et al., 1994]. Rather than calling into
question the validity of pixel-based techniques, these
discrepancies may simply be calling attention to the
sensitivity of these methods to either technical image
analysis factors or subtle differences in paradigm
design. Be that as it may, the field of functional
imaging has reached a stage where it is necessary to
reconcile brain activation maps that have been generated using different hardware, different technical
methods, and different paradigms. This need provides
an opportunity to investigate and understand the
effect of technical factors on PET image analysis, and
to use such knowledge to evaluate the extant literature on functional brain activation, inform the design
of new studies, and improve the performance of
existing techniques.
A number of pixel-based approaches to PET image
analysis have been proposed [e.g., Fox et al., 1988;
Friston et al., 1991a; Worsley et al., 1992; Roland et al.,
1993; Poline and Mazoyer, 1993; Holmes et al., 1995].
These involve complex manipulations of PET data. An
understanding of these manipulations is facilitated by
considering them in the following general framework:
1) acquisition and tomographic reconstruction; 2)
anatomical standardization; 3) noise reduction; 4) test
statistic generation; and 5) inference. From the point
of view of digital image analysis methods, methodological differences in any of these categories might
significantly affect the results of activation studies. The
following survey of procedural differences is meant to
give an idea of the complexity of the analyses to which
PET data are submitted and the consequent difficulty
with which data from different centers might be
compared.
1. Acquisition and tomographic reconstruction of
data. Some tomographs have limited Z-axis sampling, and some have virtually isotropic resolution. Counting statistics vary as a function of
dose, tomograph sensitivity, and other technical
factors. Some centers calculate parametric images
(e.g., regional cerebral blood flow); most do not.
Certain procedures, such as the method of bolus
administration, reconstruction filter, and scatter
correction are not standardized, but may influence the results.
2. Anatomical standardization of voxels. Most pixel-based analysis is done in a universal standard
space: Talairach space [Talairach and Tournoux,
1988]. However, there is no standard implementation of this transformation. There is documented variability of this "standard" space across
centers [Senda et al., 1993]. Moreover, some
centers perform further nonlinear standardization to Talairach's atlas [Friston et al., 1991b, 1995;
Minoshima et al., 1994]. Coregistration of PET
images with one another is accomplished by
head immobilization or post hoc mathematical
algorithms [e.g., Woods et al., 1992]. Coregistration of PET and magnetic resonance (MR) images
is an integral part of the analysis in some centers,
and relies on either fiducials, post hoc mathematical algorithms, or a combination of both [Pellizari
et al., 1989; Woods et al., 1993; Neelin et al., 1993;
Evans et al., 1994; Grabowski et al., 1995a].
3. Noise reduction techniques. Virtually all centers
performing activation studies remove global fluctuations in rCBF or activity (i.e., PET counts),
which are believed to be task-independent. However, some centers perform proportional normalization [Fox, 1991; Worsley et al., 1992] and others
a pixelwise ANCOVA adjustment [Friston et al.,
1991a]. Moreover, some centers estimate and
remove other sources of variability (e.g., PCO2
[Roland et al., 1993] or a block [subject] effect
[Friston et al., 1991a]). Most centers apply spatial
filters to smooth the data in order to improve the
signal-to-noise ratio and to overcome residual
anatomical variability in the standard space, with
final image resolutions of 16-20 mm full width at
half maximum (FWHM).
4. Test statistic generation. Pixel-wise statistical
analysis usually requires estimates of both local
mean change and variability. However, methods
differ in both the generation and the application
of this estimate of variability. Some methods
[Worsley et al., 1992; Minoshima et al., 1994]
average variance across all PET voxels in the
search volume. Others [Friston et al., 1991a;
Roland et al., 1993; Holmes et al., 1995] use a local
variance estimate, but differ fundamentally
among themselves in the way in which this
estimate is generated. In some methods, the test
statistic is based on response intensity [Fox et al.,
1988; Worsley et al., 1992; Friston et al., 1991a],
whereas in others it is based on the spatial extent
of the response [Poline and Mazoyer, 1993; Roland et al., 1993; Friston et al., 1994]. Some
proposed methods take into account both signal
intensity and spatial extent [Poline and Mazoyer,
1994].
5. Inference. Once test statistics have been mapped
to every pixel or local extremum, a decision must
be made at every pixel (or extremum) to accept or
reject the presence of activation. The decision can
be made on the basis of signal intensity and/or
spatial extent. Intensity thresholding has been
performed at a predetermined alpha level, without correction for multiple comparisons; with a
correction for multiple comparisons based on
theoretical results [Friston et al., 1991a; Worsley
et al., 1992]; or with a nonparametric approach
[Holmes et al., 1995]. Decisions based on the
spatial extent of a cluster have been based on
empirically assessed null distributions [Roland et
al., 1993], or with reference to expectations generated by theoretical results [Poline and Mazoyer,
1994; Friston et al., 1994]. Nonparametric spatial
extent thresholding has also been proposed [Holmes et al., 1995]. The size of the search volume
(all intracranial voxels vs. gray matter voxels) and
the alpha level (both its magnitude and whether
it is applied slicewise or volumetrically) vary
across investigations, and even between studies
from the same center.
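Item 3 of the survey quotes final image resolutions as FWHM values. As a minimal sketch (not drawn from any of the cited implementations), the arithmetic linking FWHM to the standard deviation of a Gaussian, and the width of the filter needed to reach a target resolution, might be written as follows. It assumes that both the intrinsic point-spread function and the filter are Gaussian, so that their widths add in quadrature; the function names and the 10 mm intrinsic resolution are illustrative only.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm_mm / FWHM_PER_SIGMA

def kernel_fwhm_for_target(intrinsic_fwhm_mm, target_fwhm_mm):
    """FWHM of the Gaussian filter that takes an image from its intrinsic
    resolution to the target resolution (assumes widths add in quadrature)."""
    if target_fwhm_mm < intrinsic_fwhm_mm:
        raise ValueError("smoothing cannot improve resolution")
    return math.sqrt(target_fwhm_mm ** 2 - intrinsic_fwhm_mm ** 2)

# Hypothetical example: 10 mm intrinsic resolution, 20 mm target resolution.
kernel = kernel_fwhm_for_target(10.0, 20.0)
sigma = fwhm_to_sigma(kernel)
print(round(kernel, 2), round(sigma, 2))
```

Under these assumptions, two centers that both report a "20 mm" final resolution may still have applied quite different filters if their tomographs differ in intrinsic resolution.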
Given the number of relevant procedural differences, it is unlikely that pixel-based analysis methods
can be compared to each other on the basis of the
existing literature. Such comparisons should be performed by submitting the same dataset to each of the
several methods.
Here we report such an experiment, in which we
compare four methods of analysis for PET activation
images. These methods are: Change distribution analysis (CDA), Worsley’s method (WOR), the pixelwise
general linear model (GLM), and a nonparametric
(rerandomization) approach (RER). We submitted the
same PET dataset to each method for analysis. Thus,
images were subjected to the same image coregistration algorithm, the same degree of spatial filtering,
and the same stereotactic transformation. The same
search volume and volumetric alpha level were used.
In this way, we attempted to directly compare the
performance of these methods in terms of their inferential statistics.
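To make the rerandomization idea concrete, the following is a minimal sketch of a sampled rerandomization test that controls the volumetric alpha level through the distribution of the image maximum. It is a simplification under stated assumptions, not the implementation used in this study: it operates on one difference image per subject, relabels tasks by flipping the sign of each subject's difference image, and uses an ordinary one-sample t statistic rather than a pseudo T statistic built on locally smoothed variance. All function and parameter names are hypothetical.

```python
import math
import random

def t_map(diffs):
    """One-sample t statistic at each pixel; diffs[s][p] is subject s, pixel p."""
    n = len(diffs)
    out = []
    for p in range(len(diffs[0])):
        col = [d[p] for d in diffs]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        # Guard against zero variance (returns 0 rather than dividing by zero).
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def max_stat_threshold(diffs, n_rerand=1000, alpha=0.05, seed=0):
    """Approximate (sampled) rerandomization threshold for the image maximum."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_rerand):
        # Under the null hypothesis the task labels are exchangeable, so each
        # subject's difference image may have its sign flipped at random.
        signs = [rng.choice((-1.0, 1.0)) for _ in diffs]
        flipped = [[s * x for x in d] for s, d in zip(signs, diffs)]
        maxima.append(max(t_map(flipped)))
    maxima.sort()
    k = max(0, math.ceil((1.0 - alpha) * len(maxima)) - 1)
    return maxima[k]
```

A pixel would then be declared active when its observed statistic exceeds the returned threshold; because the threshold is derived from the maximum over the whole image, no further correction for multiple comparisons is applied in this sketch.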
We incorporated several other forms of control in
this experiment: 1) we chose a paradigm that has been
replicated and independently studied with other imaging modes [Petersen et al., 1988; Raichle et al., 1993;
Bartenstein et al., 1994]; 2) we studied a large enough
sample to allow measures of within- and between-cohort variability; 3) we carried out a parallel analysis
with same-task pairs in order to test the methods for
type I errors; 4) we performed quantitative imaging to
allow monitoring for systematic changes in global
blood flow.
METHODS
Image acquisition
Paradigm
We used the verb generation paradigm of Petersen
et al. [1988] because it lends itself easily to repetitive
performance, because it activates association cortex,
and because its results have been shown to be replicable by its original investigators. Moreover, it has also
been employed in functional studies with other imaging modalities, including event-related potentials and
functional magnetic resonance imaging (MRI) with
consistent results [Hinke et al., 1993; Snyder and
Raichle, 1993]. Although there can be disagreement
over the interpretation of the results, there is a general
consensus as to the location of the main areas activated by the tasks.
We used only the verb generation and noun repetition tasks of the Petersen paradigm. Subjects performed GENERATE VERB and READ NOUN four
times each, using a new list of common concrete
nouns for each GENERATE-READ task pair. Before
the PET study, stimuli were piloted by ten normal
volunteers. Nouns which could be used as verbs (e.g.,
“RAKE”) were eliminated. The pilot data were used to
assemble five lists which were equivalent in terms of
median latency to voice onset and the number of
unique responses which normals gave.
The subjects of the PET study were briefed on the
tasks immediately before the scanning session and
practiced the GENERATE VERB task once with 40
items (which were not used during the experiment)
immediately before the first [15O]H2O injection. GENERATE VERB and READ NOUN tasks were performed in a GRGRGRGR order. GENERATE VERB
was always performed before READ NOUN in order
to minimize the tendency of subjects to covertly
generate verbs during the READ trials, since subjects
performed both tasks using the same lists. The stimuli
were presented visually at a rate of one word per 2 sec
(500 msec on, 1,500 msec off). Subjects were cued
visually by the disappearance of a white box 5 sec
before the first word appeared, and continued performance until all 90 stimuli had been delivered. During
the scanning session, subjects’ spoken responses were
taped. These recordings were digitized, and latencies
to voice onset determined for each item, using custom
software.
Subjects
Eighteen normal right-handed volunteers were studied after giving informed consent. Subjects were studied
in two imaging sessions on separate days. In the first, they underwent three-dimensional MRI. In the
second, they underwent a single PET imaging session during which they received 8-10 injections of [15O]H2O.
Since some subjects had only eight injections, only the first eight injections from each subject were analyzed
for purposes of this study. The data therefore comprise 144 injections of labeled water. In order to
simulate conventional sample sizes, the 18 subjects were divided at random into two independent cohorts
of nine subjects (cohort A and cohort B). Each cohort comprised five women and four men. Each
cohort had four GENERATE-READ pairs per subject (sample size 9 x 4). The data from each cohort were
further fractionated into two datasets of sample size 9 x 2 pairs and four datasets of sample size 9 x 1 pair
(see Fig. 1). Thus, there was a total of eight independent datasets at the 9 x 1 sample size, four datasets at 9 x 2,
and two at 9 x 4. The use of this design allowed us to assess the reproducibility of the results in independent
cohorts of reasonable size (36 G - R subtraction pairs), and the stability of activation foci within subject
and method. It also permitted estimation of the power of the methods in samples of smaller ("conventional")
size. A completely parallel analysis was designed with same-task (GENERATE-GENERATE and READ-READ)
pairs, which we refer to as "noise images." In all, 30 datasets were analyzed using the pixel-based
methods described below: 15 activation datasets and 15 noise datasets.
Magnetic resonance imaging
MR images were obtained with a General Electric Signa scanner operating at 1.5 T, using the following
protocol: SPGR/50, TR 24, TE 5, NEX 2, FOV 24 cm, matrix 256 x 192. We obtained 124 contiguous coronal
slices with thickness 1.5-1.7 mm and interpixel distance 0.94 mm. The slice thickness was adjusted to the
size of the brain so as to sample the entire brain while avoiding wrap artifacts.
Positron emission tomography
PET data were acquired with a General Electric 4096 Plus body tomograph, yielding 15 transaxial slices
with a nominal interslice interval of 6.5 mm.
Parameter (regional cerebral blood flow [rCBF]) estimation followed the [15O]H2O autoradiographic
method [Herscovitch et al., 1983], as follows. For each injection, 50-75 mCi of [15O]H2O in 8 mL saline was
administered as a bolus through a venous catheter. Images were acquired in twenty 5-sec frames, beginning
at the time of injection. Arterial blood sampling was performed manually (0.5 cc/5 sec) via a radial
artery catheter, beginning at injection. The input function was derived from the radial artery blood
curve, after deadtime and decay-correction. To determine the time course of bolus transit from the cerebral
arteries, time-activity curves (TACs) were generated for regions of interest (ROIs) placed over major vessels
at the base of the brain. The eight frames representing the first 40 sec immediately after transit of the bolus
from the arterial pool in the brain were summed to make an integrated 40 sec count image. This summed
image was reconstructed into 2 mm pixels in a 128 x 128 matrix using a Butterworth reconstruction filter
(order = 6, cutoff frequency = 0.35 Nyquist), yielding a reconstruction resolution of 10.6 mm in the XY plane
and 5.7 mm in the Z axis. Using an assumed brain tissue/blood partition coefficient of 0.90, rCBF was
calculated pixel-by-pixel with the lookup table method [see also Hichwa et al., 1995].
Image analysis
MR and PET images were transferred on digital tape
to the Human Neuroanatomy and Neuroimaging
Laboratory of the Division of Behavioral Neurology
and Cognitive Neuroscience. All image processing
was performed with networked Silicon Graphics workstations (Silicon Graphics, Mountain View, CA).
Three-dimensional neuroanatomical analysis of MR
images was performed using Brainvox, a three-dimensional interactive rendering package [Damasio
and Frank, 1992]. Analysis of PET data was performed
using an extension of this package designed for
functional imaging ("PET-Brainvox" [Damasio et al.,
1993]). PET-Brainvox comprises an interactive module
(MPFIT), supporting a priori PET slice selection and
MR-PET coregistration, and a suite of modular software utilities that support pixelwise image computations [Grabowski et al., 1995a].
[Figure 1 (schematic): 144 injections in total (18 subjects, 8 injections each, order GRGRGRGR), partitioned into 18 x 4 trials (1 dataset), 9 x 4 trials (2 datasets), 9 x 2 trials (4 datasets), and 9 x 1 trial (8 datasets).]
Figure 1.
Partitioning of data into cohorts and datasets. GENERATE and
READ were performed four times by each subject in the order
GRGRGRGR. Subject cohorts A and B (N = 9) were formed by
random assignment, without replacement. Within cohort, datasets
of 9 subjects x 1, 2, or 4 task-pairs were formed, assigning
GENERATE and READ injections at random, without replacement.
(Original GR pairing was not respected.) A parallel analysis was
designed for same-task pairs (not illustrated).
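The partitioning summarized in Figure 1 can be illustrated with a short sketch. This is a hypothetical reconstruction for counting purposes only: it assigns subjects to cohorts purely at random (it does not balance cohorts by sex, as the study cohorts were), pairs GENERATE and READ injections at random within subject, and slices the pairs into non-overlapping datasets. The function name and data layout are assumptions.

```python
import random

def partition_datasets(n_subjects=18, n_pairs=4, seed=0):
    """Return (cohorts, datasets) for a two-cohort design.

    Each dataset is (label, {subject: [(generate_idx, read_idx), ...]}).
    """
    rng = random.Random(seed)
    subjects = list(range(n_subjects))
    rng.shuffle(subjects)  # random assignment, without replacement
    half = n_subjects // 2
    cohorts = {"A": subjects[:half], "B": subjects[half:]}

    datasets, pooled = [], {}
    for name, members in cohorts.items():
        # Pair GENERATE and READ injections at random within each subject
        # (the original GR pairing is not respected).
        pairs = {
            s: list(zip(rng.sample(range(n_pairs), n_pairs),
                        rng.sample(range(n_pairs), n_pairs)))
            for s in members
        }
        pooled.update(pairs)
        # Non-overlapping slices give datasets that share no injections.
        for size in (1, 2, 4):
            for start in range(0, n_pairs, size):
                subset = {s: p[start:start + size] for s, p in pairs.items()}
                datasets.append((f"{name} {len(members)}x{size}", subset))
    datasets.append((f"{n_subjects}x{n_pairs}", pooled))
    return cohorts, datasets
```

With the default arguments this yields eight 9 x 1 datasets, four 9 x 2 datasets, two 9 x 4 datasets, and one 18 x 4 dataset, i.e., the 15 activation datasets described in the text.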
Data preparation
Image coregistration
During the three-dimensional MR acquisition, subjects wore fiduciary glasses which allowed us to use
MPFIT to predict a PET gantry orientation producing
slices parallel to the intercommissural line. The technique gives a priori MR-PET coregistration, with
three-dimensional errors at the level of the cerebral
cortex averaging 3.3 mm [Grabowski et al., 1994].
Residual misregistration was estimated and removed
with a mathematical post hoc algorithm (Automated
Image Registration [Woods et al., 1993]).
Spatial filtering
All parametric images were smoothed to 20 mm
FWHM in the XY plane, using a Gaussian kernel, prior
to Talairach transformation. No axial smoothing was
performed. This is the only processing step in this
study which was not performed in three dimensions.
Stereotactic transformation
Talairach transformation was based on user-identification of the anterior and posterior commissures in
coregistered three-dimensional MR datasets, which
had been manually segmented to remove extracerebral structures. A third point in the midsagittal plane
was also marked by the user. These points are sufficient to construct the axes of the coordinate system.
The bounding box was found with an automated
planar search routine. Spatial scaling occurred in three
segments along the Y axis, two in the X axis, and two
in the Z axis, as Talairach described. The voxel dimension in Talairach space was 1 x 1 x 1 mm. Talairach
coordinates reported here use the millimetric scale of
the 1988 atlas [Talairach and Tournoux, 1988].
Search volume
The volume of tissue which was searched for significant responses was that portion of Talairach space
which was common to all subjects. The search volume
was determined by the intersection of the stereotactically transformed three-dimensional MRI volumes,
inclusive of ventricles, and was approximately 1,000
cm3. No attempt was made to restrict the search
volume to gray matter pixels. In the case of change
distribution analysis, the search volume consisted of
the union rather than the intersection of the transformed three-dimensional MR volumes.
Description of implemented pixel-based methods
Change distribution analysis (CDA)
This method was implemented after the description
of Fox and colleagues [1988], with the following small
differences. As originally described, CDA datasets
were integer datasets. In this analysis, secondary
smoothing was performed at floating point resolution.
This had the effect of eliminating pixels that were
"tied" with their neighbors. Since our datasets have a
much finer matrix size than those analyzed with the
original CDA, only every second pixel (in all three
dimensions) was analyzed, and a radius of four pixels
used to define local extrema. The analysis of leptokurtosis was performed in a two-tailed sense for the
primary analysis, and with two independent one-tailed tests as a derived analysis designated "CDA(1)."
Worsley's method (WOR)
This method was implemented after the description
of Worsley et al. [1992]. Although implemented with
local software, it is faithful to Worsley's description.
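The practical difference between a pooled and a local variance estimate can be seen in a few lines. The sketch below is illustrative only and is not the software used in this study: it takes one difference image per subject, computes a pixelwise mean change, and scales it either by each pixel's own standard error or by a standard error derived from the variance averaged over all pixels in the search volume. Function names are hypothetical, and the sketch assumes every pixel has nonzero variance.

```python
import math

def mean_and_variance(diffs):
    """Pixelwise mean and sample variance; diffs[s][p] is subject s, pixel p."""
    n = len(diffs)
    means, variances = [], []
    for p in range(len(diffs[0])):
        col = [d[p] for d in diffs]
        m = sum(col) / n
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in col) / (n - 1))
    return means, variances

def local_t(diffs):
    """Statistic using each pixel's own variance estimate."""
    n = len(diffs)
    means, variances = mean_and_variance(diffs)
    return [m / math.sqrt(v / n) for m, v in zip(means, variances)]

def pooled_z(diffs):
    """Statistic using the variance averaged over all pixels."""
    n = len(diffs)
    means, variances = mean_and_variance(diffs)
    pooled = sum(variances) / len(variances)
    return [m / math.sqrt(pooled / n) for m in means]

# Hypothetical data: 4 subjects, 2 pixels. Pixel 0 has a small change with
# very low variance; pixel 1 has a large change with larger variance.
example = [[0.9, 2.0], [1.0, 6.0], [1.1, 4.0], [1.0, 4.0]]
print(local_t(example), pooled_z(example))
```

In this toy example the local statistic ranks pixel 0 highest, because its variance happens to be small, whereas the pooled statistic ranks pixel 1 highest. With few subjects, local variance estimates are themselves noisy, which is one way such reversals can arise.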
General linear model (GLM)
In essence, GLM is a method for estimating pixelwise T statistics using a local variance estimate by
partitioning variance to effects of global flow, block
(subject), or task, or else to residual error, using a
linear model. Our implementation of the general
linear model was cross-validated with reference to
single pixel datasets analyzed with the SAS statistical
package (SAS Institute, Inc., SAS/STAT User's Guide,
Release 6.03 Ed., Cary, NC: SAS Institute, Inc., 1988).
For this study we used a model which estimated
coefficients for global flow (covariable) and task and
block/subject effects (classification variables). These
are the same coefficients estimated by SPM95 [Friston
et al., 1995], although previous versions of SPM implemented a one-way ANCOVA and treated task replicates as separate levels of task [Friston et al., 1991a;
Friston, 1994]. Adjusted mean images were prepared
for each task. When the analysis proceeded using only
a subset of injections, all injections were used to
calculate the ANCOVA adjustment, but only the
subset was used to calculate adjusted means.
Our implementation differs from SPM in two other
respects. First, once pixelwise T statistics were generated as a planned comparison of means and transformed to the normal distribution, we determined the
threshold Z values according to the theory of differential topology, as articulated by Worsley et al. [1992].
Past versions of SPM used a two-dimensional empirical estimator of smoothness, and the threshold was
drawn so as to meet a certain alpha level on a slicewise
basis [Friston et al., 1991]. SPM95 also estimates
smoothness empirically, but in three dimensions. We
used Worsley's approach because of its more rigorous
theoretical foundation and because it did not seem
desirable to introduce another variable into the comparison.¹
The second difference between our implementation
and SPM was that no attempt was made to limit the
search volume to gray matter pixels for this or for any
of the other methods.
¹In fact, however, the difference between thresholds
from the two approaches would have been small in
this study. The Z threshold dictated by Worsley's
equation was 4.55 [alpha 0.05, two-tailed] whereas our
implementation of the two-dimensional empirical estimator would have given, on average, 4.51 for the
entire volume. A three-dimensional version of the
empirical estimator would have given 4.55, on average.
Rerandomization method using locally smooth
variance (RER)
This method was implemented based on the description of Holmes and colleagues [1995], with minor
modifications. Because our datasets are much larger
(by virtue of the finer matrix) than those of Holmes et
al., it proved necessary to limit computation to every
other slice of data in order to accelerate the analysis. In
the smaller datasets, inclusion of every slice resulted in
essentially identical results, the difference in the calculated threshold being on the order of 0.01. Therefore,
we believe the results are also quite accurate for the 18-subject
analysis. The largest dataset (18 subjects) was
analyzed with an "approximate" (i.e., sampled) method
using 1,000 rerandomizations. Computation time in
this case was approximately 24 hr on an SGI Indigo
workstation. We did not implement Holmes' "step-down" procedure. Locally smoothed variance was
obtained using a 12 x 12 x 10 mm Gaussian kernel.
Division of local mean change by a function of the
locally smoothed variance generated "pseudo T statistic" images.
Derived methods
The primary analysis of methods in this manuscript
concerns CDA, WOR, GLM, and RER. We also investigated four modifications of these methods ("derived
methods") which included: 1) change distribution
analysis, with separate one-tailed tests for positive and
negative leptokurtosis, CDA(1); 2) Worsley's method
using a voxelwise (local) variance estimate, WOR(L)
[Worsley et al., 1993; Worsley, 1994]; 3) T statistic maps
based on pooled voxel standard deviation, but thresholded on the basis of spatial extent rather than T
statistic magnitude [WOR(E)]. Because of the high
effective degrees of freedom of the denominator, these
T maps were assumed to have a Gaussian distribution,
and the method of Friston et al. [1994] was applied.
For this analysis and the next, an arbitrary Z statistic
threshold of 2.8 was chosen; and 4) T statistic maps
generated by the general linear model, transformed to
the normal distribution, and then thresholded by
spatial extent [GLM(E)] [Friston et al., 1994].
Analysis of data
Cluster detection and description
"Activation clusters" were detected by thresholding
the test statistic images of WOR, GLM, and RER at a
value corresponding to a volumetric alpha of P < 0.05.
Foci of positive or negative activation were described
in terms of their volume (mm3), and the stereotactic
coordinates of their centers of mass. For CDA, the
centers of mass were determined by seeding the
cluster detection algorithm with the significant local
extrema; the threshold was set at the value of change
in normalized rCBF that corresponded to a Z score of
±1.96 (P = 0.05, two-tailed). The centers of mass of
these clusters were used in the subsequent analysis.
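Cluster detection of this kind can be sketched as thresholding followed by a flood fill. The example below is a minimal illustration under several assumptions that the text does not specify: voxels are treated as connected when they share a face (6-connectivity), the center of mass is the unweighted mean of the voxel coordinates, and the statistic image is represented as a dictionary keyed by integer coordinates. With 1 x 1 x 1 mm voxels, the voxel count equals the volume in mm3. Negative foci could be found by applying the same function to the negated image.

```python
from collections import deque

def find_clusters(stat, threshold, voxel_volume_mm3=1.0):
    """Group suprathreshold voxels into face-connected clusters.

    stat maps (x, y, z) integer voxel coordinates to a test statistic.
    Returns a list of clusters sorted by decreasing volume.
    """
    supra = {v for v, value in stat.items() if value >= threshold}
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    clusters = []
    while supra:
        seed = supra.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra:
                    supra.remove(nb)  # mark as visited
                    queue.append(nb)
                    members.append(nb)
        n = len(members)
        centre = tuple(sum(c[i] for c in members) / n for i in range(3))
        clusters.append({"volume_mm3": n * voxel_volume_mm3,
                         "centre_of_mass": centre})
    return sorted(clusters, key=lambda c: -c["volume_mm3"])
```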
Neuroanatomical interpretation
A neuroanatomical interpretation of each activation
cluster was made by the investigators by projecting
the activation clusters and their centers of mass onto
an image of stereotactically transformed and averaged
MRI scans of the corresponding subjects, using Brainvox (Table IV). Activation clusters with anatomical
interpretations, which we call "activation regions,"
were the fundamental unit of analysis in this study. In
some datasets we decided that two or more activation
clusters corresponded to the same activation region.
We refer to these associated clusters as "fragmented
activation regions." Fragmentation was observed most
often in the cerebellum and right lateral temporal
regions. Conversely, we recognized that some activation clusters in the larger datasets spanned more than
one activation region, especially when spatial extent
thresholding was used [WOR(E) and GLM(E)]. These
activation regions were represented by separate clusters in smaller datasets. Activation clusters were given
more than one anatomical label when suprathreshold
pixels in a cluster, labeled as one region, were within a
7-mm three-dimensional distance of the estimated
center of mass of another region. We refer to these
clusters as "confluent activation clusters." Methodological factors, such as smoothing, may account for
this phenomenon in which distinct activation regions
are represented by connected clusters. Confluence
chiefly affected the cerebellar vermis-right cerebellar
regions and the left inferior frontal gyrus-left dorsolateral prefrontal regions.
Classification of regions
Activation regions were classified according to the
following scheme: 1) A replicated response was one
which was found in independent datasets, i.e., in at
least two datasets which did not share any injections
of [15O]H2O. These images could come from the same
or different cohorts of subjects; 2) A nonreplicated
response was one which was not found in at least two
independent datasets; 3) A consensus response was
one which all four primary methods detected in the
18 x 4 dataset. All consensus responses were also
replicated responses; 4) A marginal response was one
which was not replicated, but which was found by at
least one method in the 18 x 4 dataset (i.e., when all
available data were pooled). In order to qualify as a
marginal response, the activated region also had to be
a visible local extremum in the mean difference image
of the 18 x 4 dataset. The reason for the "marginal"
classification was to distinguish, approximately, those
responses which were real, but weak (nonreplicated
marginal), from those which were false positives
(nonreplicated nonmarginal).
Measurement of concordance
Concordance between methods was assessed using
the kappa statistic for interrater reliability. Each of the
four primary methods was considered to be an independent rater. The methods were polled as to the
presence or absence of activation in each activation
region listed in Table IV, in each of the 15 activation
datasets.
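The text does not specify which form of the kappa statistic was used. For several raters and a present/absent judgment, one common choice is Fleiss' kappa; the sketch below implements that form as an assumption, with each row holding the four methods' verdicts (1 = detected, 0 = not detected) for one region in one dataset. The function name and input layout are hypothetical.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings[i] is the list of category labels assigned to item i.
    Raises ZeroDivisionError if every rating falls in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({r for row in ratings for r in row})
    # counts[i][j]: number of raters assigning item i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement for each item, then averaged over items
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_bar = sum(p_item) / n_items
    # Chance agreement from the overall category proportions
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(len(categories))]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1.0 - p_exp)
```

Values near 1 indicate that the methods agree far more often than expected by chance, values near 0 indicate chance-level agreement, and negative values indicate systematic disagreement.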
Replicability
We were interested in knowing the rate at which
any activation region would be found again if another
study were performed using the same subjects and
method or different subjects and/or methods. This
analysis of replicability was carried out in terms of
activation regions, not activation clusters. A pixel-based measure of replication was not attempted.
Fragmented activation regions were considered as
one region. We considered the following dimensions
to be relevant to the analysis: 1) sample size (9 x 1,
9 x 2, or 9 x 4); 2) primary method of analysis (CDA,
WOR, GLM, RER); 3) subject cohort (A or B). Parallel,
separate analyses were done for each sample size.
Replication rates (total number of replications divided
by the total number of possible replications) were
determined for the following three situations: 1) in the
same dataset, but using a different primary method of
analysis; 2) in the same cohort, using the same method of analysis; 3) in the other cohort, using the same
method of analysis. These relationships and an example are illustrated in Figure 2.
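The counting rule can be sketched as follows. The detection grid below is hypothetical: it was chosen only so that the counts match the worked example in the Figure 2 legend (2/3, 3/3, and 2/4) and it is not the grid shown in the figure. The assumption of four 9 x 1 datasets per cohort follows from those denominators.

```python
from itertools import product

METHODS = ("CDA", "WOR", "GLM", "RER")
COHORTS = ("A", "B")
DATASETS = (1, 2, 3, 4)  # assumed: four 9 x 1 datasets per cohort

def replication_counts(detected, cohort, dataset, method):
    """Return (replications, possible replications) for one detected region."""
    # Same dataset, each of the other primary methods.
    within_dataset = [detected[(cohort, dataset, m)] for m in METHODS if m != method]
    # Same cohort and method, each of the other datasets.
    within_cohort = [detected[(cohort, d, method)] for d in DATASETS if d != dataset]
    # Other cohort, same method, every dataset.
    between_cohort = [
        detected[(c, d, method)] for c in COHORTS if c != cohort for d in DATASETS
    ]
    return {
        "wDbM": (sum(within_dataset), len(within_dataset)),
        "wCwM": (sum(within_cohort), len(within_cohort)),
        "bCwM": (sum(between_cohort), len(between_cohort)),
    }

# Hypothetical detections of one region (e.g., LIFG) in the 9 x 1 datasets.
grid = {key: False for key in product(COHORTS, DATASETS, METHODS)}
for hit in [("A", 1, "CDA"), ("A", 1, "WOR"), ("A", 1, "GLM"),
            ("A", 2, "CDA"), ("A", 3, "CDA"), ("A", 4, "CDA"),
            ("B", 1, "CDA"), ("B", 3, "CDA")]:
    grid[hit] = True

counts = replication_counts(grid, "A", 1, "CDA")
for name, (hits, possible) in counts.items():
    print(name, f"{hits}/{possible}")
```

An overall rate for a given sample size would then be the sum of the replications over all detections divided by the sum of the possible replications.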
Spatial stability of the locations of activation regions
The best estimates of the locations of the centers of
mass for each activated region were determined by
averaging the centroid coordinates of all nonconfluent
clusters in these regions which were detected in
datasets of any sample size. Note that this is the only
aspect of our analysis in which data were pooled
across sample size. In the case of fragmented activation regions, only the largest cluster was used for this
determination. The average centers of mass were
determined separately for cohorts A and B. The vector
distances between corresponding centers in cohorts A
and B were calculated.
In order to determine how well the centers of mass
of individual activation regions estimate the locations
of these average centers of mass, we calculated a
"radius of 95% confidence" in the following manner.
The three-dimensional distances between the center
of mass of each nonconfluent activation cluster and
the average center of mass of the corresponding
region were tabulated. The radius of 95% confidence
was defined as the mean of this distance plus 1.65
standard deviations.
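As a minimal sketch of this definition, the radius can be computed from cluster centroids or directly from a tabulated mean and standard deviation. Whether the sample or the population standard deviation was used is not stated; the sample form is assumed here, and the function names are illustrative.

```python
import math
import statistics

def radius_from_summary(mean_distance, sd_distance, z=1.65):
    """Radius of 95% confidence from a tabulated mean and SD (mm)."""
    return mean_distance + z * sd_distance

def confidence_radius(cluster_centroids, region_center, z=1.65):
    """Radius of 95% confidence from individual cluster centroids (x, y, z in mm)."""
    distances = [math.dist(c, region_center) for c in cluster_centroids]
    mean_distance = statistics.mean(distances)
    # Assumption: sample SD; this needs at least two clusters, otherwise the
    # radius is not defined.
    sd_distance = statistics.stdev(distances)
    return radius_from_summary(mean_distance, sd_distance, z)

# Mean 8.1 mm and SD 3.7 mm give 8.1 + 1.65 * 3.7 = 14.205 mm.
print(round(radius_from_summary(8.1, 3.7), 1))
```

The factor 1.65 is the one-sided 95% point of a normal distribution, so the radius is the distance within which about 95% of cluster centroids would fall if the distances were normally distributed.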
RESULTS
Technical data
Median latency to voice onset did not differ for the five word lists used in the study (two-way ANOVA, F4,67 = 0.91, P = 0.46). An analysis of covariance using scan order as a regression covariate indicated no significant effect of scan order on response latency (t53 = 0.22, P = 0.83).
Global rCBF was not significantly different for the two tasks [GENERATE 56.1 (SE 1.0) mL/min/100 g vs. READ 57.6 (SE 1.1) mL/min/100 g (F1,17 = 2.93, P = 0.105, two-way ANOVA)]. Within task, global rCBF did not differ significantly with respect to scan order (ANCOVA: GENERATE: t53 = 0.01, P = 0.99; READ: t53 = 0.24, P = 0.81).
The search volume of cohort A was 1,043 cm3 (336 resels), Talairach 1988 Z level -40 < Z < +38. The search volume of cohort B was 1,060 cm3 (342 resels), -44 < Z < +39. The search volume of the combined cohort was 991 cm3 (319 resels), -40 < Z < +38.
Mean voxel standard deviation for the datasets in the study was on average 3.54 normalized flow units in 9 x 1 analyses, 2.54 in 9 x 2 analyses, and 1.83 in 9 x 4 and 18 x 4 analyses. The standard deviation declined as expected by a factor of root two as replicates were doubled.
Significant spatial structure was found in the variance images. This structure had at least three aspects. First, variance appeared to be higher in gray matter than in white matter (see Fig. 3). Second, variance in activated regions was often higher than both the average variance and the average gray matter variance (Table I and Fig. 3). Third, simple visual inspection of the variance images indicates that visual cortex, not differently activated by the tasks reported here, has a higher variance than other gray matter areas (Fig. 3).
Figure 2.
Calculation of rates of replication. Detection of activation of LIFG in datasets of size 9 x 1 is denoted by the presence of an "X" in the figure. Consider the activation region LIFG detected by CDA in cohort A, dataset 1 (circled). Within dataset, between method replication occurs two of three possible times (67%). Within cohort, within method replication is 3/3 (100%), and between cohort, within method replication is 2/4 (50%).
Noise images
Each of the 15 same-task datasets was analyzed with the four primary pixel-based methods, making a total of 60 analyses. The volumetric alpha level was nominally P = 0.05. Three spatially distinct "activations" were detected in these datasets. (Two by WOR and one by RER; see Table II). False positives therefore occurred at the nominal rate (3/60 = 5% of datasets).
In derived analyses [WOR(L), CDA(1), GLM(E), WOR(E)], two datasets gave false positives with WOR(E). These two false positives corresponded to the same brain region as one of the WOR false positives and occurred in a 9 x 2 dataset and one of its subsidiary 9 x 1 datasets.
Description of activations
Total number of activated regions
The operation of all four primary methods on all 15 activation datasets detected a total of 474 activation
clusters. There were 81 fragmented activation regions and 45 confluent clusters. After adjustment for fragmentation and confluence, there was a total of 438 detected activation regions. Whether analyzed in terms of activation clusters or activation regions, the great majority (87%) of activations were classified as replicated, consensus responses (Table IIIA). In the cluster counting analysis, 5.5% of "activations" were not replicated, the majority of which (19/26) were marginal. Only 7/26 (1.5% of the total) of the activation clusters were nonreplicated, nonmarginal changes that were likely to be false positives. Similarly, when analyzed in terms of activation regions, 6.9% of regions were nonreplicated, of which 7/30 (1.6% of the total) were nonreplicated nonmarginal responses (Table IIIB).

TABLE I. Average variance in activated regions

Region         Direction   Mean SD (WOR)   Mean adj. error var. (GLM)
RCBLM/CVERM    increase    2.01            0.506
LCBLM          increase    1.98            0.544
LTPOL          increase    1.81            0.371
LIFG/LDPF      increase    2.20            0.517
RAINS          increase    2.43            0.535
LMTG           increase    1.75            0.460
ACING          increase    2.35            0.636
RMPSYL         decrease    1.84            0.462
RMORB          decrease    2.39            0.500
LMPSYL         decrease    2.10            0.534
MPAR           decrease    2.26            0.590
Mean                       2.10            0.516

Activation regions for WOR 18 x 4 and GLM 18 x 4 are listed, together with the mean voxelwise measure of variability (standard deviation for WOR and adjusted error variance for GLM) for each region. Region name abbreviations are the same as for Table VI. Over the entire search volume, the mean voxelwise standard deviation for WOR was 1.87 and the mean adjusted error variance for GLM was 0.480. See also Figure 3.

Figure 3.
The effect of tissue type on the magnitude of voxelwise variance. A: Left: Gray level histogram for the average MR image of the 18 contributing subjects. Right: Average MR image at Z = +6. Two gray level intervals, corresponding approximately to the cortical mantle and the deep hemispheric white matter, have been color-coded. B: Left: Plot of average voxel standard deviation for WOR, 18 x 4, as a function of average MR gray level. Right: Image of voxel standard deviation at Z = +6. C: Left: Plot of adjusted voxel error variance for GLM, 18 x 4, as a function of average MR gray level. Right: Image of adjusted voxel error variance at Z = +6. In B and C, note the much lower variance for pixels corresponding to white matter. In B and C, the black horizontal lines correspond to the magnitude of the average voxelwise standard deviation and adjusted error variance, respectively, and the horizontal blue lines to the average voxelwise standard deviation and adjusted error variance in the excursion set of voxels (activated volume). (See also Table I.)

Anatomical classification of activations
A total of 27 anatomically distinct regions were identified as activated at least once (Table IV). These regions included: 12 replicated consensus regions, 3 replicated nonconsensus regions, 6 nonreplicated marginal regions, and 6 nonreplicated nonmarginal regions. Figure 4 shows the activation maps for the 18 x 4 dataset, using GLM and WOR. Both blood flow increases and decreases were found. There were six positive and six negative consensus regions (Table IV, Table V).
All activations with absolute mean changes (in the grand dataset) of at least 2.2 normalized rCBF units were found to be consensus regions. All consensus regions were also detected in both cohorts of subjects, though not by all methods (Table VI). Both nonreplicated and replicated foci were found in the range of 1.0-2.0 normalized flow units.
This roster of activated regions is quite consistent with that reported by Petersen and Raichle and their colleagues (see Table VII and Petersen et al. [1988, 1989] and Raichle et al. [1994]). These investigators reported all of the consensus increases reported here except VTHAL. They have also reported the nonconsensus foci LCBLM, LMTG, and LMOCC in specific contexts [Raichle et al., 1992, 1993] and the decreases LMPSYL and RMPSYL [Raichle et al., 1994]. The RAINS and LTPOL foci have not been previously reported, nor have any of the extrasylvian decreases detected in our study. The average three-dimensional vector distance separating our foci from their counterparts in the Washington University studies was about 10 mm (Table VII).
Thus, the foci we report are a superset of those reported by the original investigators. The activation map has proven to be more extensive than was originally suspected, particularly with respect to blood flow decreases.
Concordance among methods
There was a remarkable concordance among the
four primary pixel-based methods, as assessed by the
kappa statistic (Table VIII). Kappa values of 0.6-0.7
TABLE II. Summary of false positive activations

Method    Cohort   Sample   Direction   Size (mm3)    x     y     z    Threshold   Magnitude
Main methods
WOR       A        9 x 1    decrease       443       +15   -61   +19     -4.55       -4.72
WOR       A        9 x 1    decrease        15        +6   +43   -18     -4.55       -4.59
RER       B        9 x 1    increase         6        -7   -52   -26     +6.30        6.43
Derived methods
WOR(E)    A        9 x 1    decrease      7856       +13   -62   +16     -2.80       -3.50
WOR(E)    A        9 x 2    decrease      5195       +13   -61   +18     -2.80       -3.13

All false positive activations are reported here. Coordinates (mm) conform to Talairach and Tournoux (1988) conventions. The two false positives with WOR at the 9 x 1 sample size were in different datasets. Two of the false positive clusters were very small (6, 15 voxels).
were found for 9 x 2 and 9 x 4 analyses. The values were significantly lower for 9 x 1 analyses (approx. 0.3). The kappa values were also significantly lower for negative than for positive activations at 9 x 1 and 9 x 2.
Rates of replication
Three hundred seventy-two activation regions were considered in this analysis, since the analysis was not applicable to activation regions found in the 18 x 4 dataset. Each sample size was analyzed separately. Replication rates were a function of sample size and were on the order of 50% at 9 x 1, 70% at 9 x 2, and 85% at 9 x 4. Increases were replicated more often than decreases in samples of size 9 x 1 and 9 x 2 (Table IX).
Spatial stability of responses
The average three-dimensional vector distance separating the average centers of mass of the consensus
TABLE III. Classification of activation: Clusters and regions
Replicated
Sample
size
Consensus
Nonreplicated
Nonconsensus
Marginal
Nonmarginal
Total
A. Number of activation clusters
9 x 1
9 x 2
9 x 4
18 x 4
Total
Percent
143
146
89
33
411
86.7%
7
8
11
11
37
7.8%
4
2
6
9
19
4.0%
4
0
2
1
7
8
8
6
29
6.6%
4
2
6
11
23
5.3%
4
0
2
7
1.5%
474
100%
B. Number of activation regions
9 x 1
9 x 2
9 x 4
18 x 4
Total
Percent
123
123
85
48
379
86.5%
1
7
1.6%
438
100%
Regional activation was classified as replicated consensus, replicated nonconsensus, nonreplicated
marginal, or nonreplicated nonmarginal. The analysis was carried out in terms of A) the number of
activation clusters and B) the number of activation regions. (A total of 45 activation clusters were
determined to be confluent, and a total of 81 fragmented activation regions were identified.) See
Methods for a definition of terms.
TABLE IV. Anatomical interpretation of activations, with classification
Talairach and Tournoux
(1988) coordinates
Region
Increases
Left inferior frontal gyrus
(consensus)
Right cerebellum
Left dorsolateral prefrontal
(Consensus)
Cerebellar vermis
Ventral thalamus
Anterior cingulate
Right anterior insula
Left middle temporal gyrus
(marginal)
Left temporal pole
Left cerebellum
Left caudate head
Left mesial occipital
Left mesial temporal
Right orbitoinferior frontal
Right mesial deep parietal
Decreases
Right mesial post sylvian
Right lateral parietal
Left mesial posterior sylvian
(consensus)
Mesial parietal
Right lateral temporal
Frontal polar
Right frontopolar
Right precentral gyrus
Right mesial temporal
Left lateral parietal
Left frontopolar
Right middle frontal gyrus
(marginal)
Magnitude
avt;
X
Y
LIFG
RCBLM
- 39
+21
+5
- 65
-33
5.24
5.56
LDPF
CVERM
VTHAL
ACING
RAINS
-44
-1
-1
-1
+33
- 57
+ 25
-21
-1
+ 31
-1
5.02
4.82
3.87
3.82
2.98
LMTG
LTPOL
LCBLM
LCAUD
LMOCC
LMEDT
ROIFG
RDPAR
- 56
-46
+5
- 38
-35
-8
2.00
1.97
3.36
1.73
4.59
4.06
3.93
1.84
Abbreviation
RMPSYL
RLPAR
LMPSYL
MPAR
RLTEMP
RMORB
RFPOL
RPCG
RMEDT
LLPAR
LFPOL
RMFG
+27
Z
+ 15
- 14
+21
+ 17
Magnitude
18 x 4
+4.66
+4.61
+4.08
+3.38
+3.15
+2.88
f2.12
Replicated
yes
yes (consensus)
yes
yes (consensus)
yes (consensus)
yes (consensus)
no (marginal)
+2.12
+1.86
+1.67
+1.59
+1.32
+0.93
+9
-34
-28
+9
- 73
+8
- 25
- 37
-7
-12
129
-13
-55
+11
+26
-4.27
-4.07
-3.34
-3.20
yes (consensus)
yes (consensus)
+ 12
+ 30
- 15
- 16
- 56
- 25
37
+57
+5
-8
- 43
62
+
- 10
-11
+3
+22
-19
+27
12
+
-3.90
-4.58
-3.56
-3.55
-3.15
-4.02
-2.15
-1.63
-1.51
-2.77
-2.69
-2.38
-2.27
-1.99
-1.59
- 1.45
-1.30
-1.20
Yes
yes (consensus)
yes (consensus)
yes (consensus)
no (marginal)
no
no
no (marginal)
no (marginal)
+44
+ 19
+ 33
-1.32
-1.18
no
-49
-15
-13
-16
34
18
+
+
+ 38
+45
+49
-45
+2
55
+2
+
+
+9
+48
+20
- 49
+0.85
+0.10
All activation foci were classified anatomically by referring to an average 3D MRI dataset constructed from the MR scans of the participating
subjects. All foci detected in the activation datasets are accounted for here. Talairach and Tournoux (1988) conventions are used to express the
average centroid for each region. Avg mag: Average magnitude of change in blood flow attributed to a region (normalized rCBF); 18 x 4
mag: Magnitude of change in blood flow (normalized rCBF) in the 18 x 4 dataset within a radius of 5 mm of the coordinates of the mean
centroid.
regions of activation in cohort A from their counterparts in cohort B was 6.0 mm (Table X).
The radius of 95% confidence, which estimates the confidence with which the center of mass of any given activation cluster predicts the true center of mass for that activation region, varied with respect to method, sample size, and direction (increase or decrease) of the response over a range of 5-28 mm. Higher values were associated with 9 x 1 sample size, blood flow decreases, and a method's dependence on local variance estimates (Table XI). For example, increases detected in a 9 x 2 dataset with CDA were associated with a 95% confidence radius of about 5 mm, while increases in 9 x 2 datasets detected with RER had a 95% confidence radius of 16.4 mm. These findings corroborate and extend the observations of Taylor et al. [1993] regarding the effect of local perturbations in variance on the localization of activation foci.
Figure 4.
Activation maps for GLM and WOR, 18 x 4 dataset. Activation images are thresholded at P = 0.05 (two-tailed) and fused with average MR images. Both decreases (blue) and increases (red) are shown. Talairach Z-levels are indicated (subjects' left on reader's right). The concordance of these activation maps is apparent.
Comparative performance of four methods
Descriptive statistics: Number of activation regions and volume of activation
The number of activation regions detected in a dataset, both blood flow increases and decreases, increased with sample size for all methods. However, the rate of increase was not the same for all methods, being slowest for CDA, and fastest for GLM and RER (Fig. 5). The cumulative activated volume had an analogous relationship to method and sample size (Fig. 5).
Relative sensitivity
We were interested in the relative power of the primary pixel-based methods to detect activation regions. No gold standard activation map is known for this paradigm. Since the roster of consensus regions was our best estimate of the "true" pattern of activation, we used the number of detected consensus regions as an index of the relative sensitivities of these methods in the smaller samples (Fig. 5). RER and GLM, the methods dependent on local variance, were not powerful in small samples. At the lowest sample size, only CDA and WOR detected 50% or more of the consensus regions. At the 9 x 2 sample size, GLM and RER remained substantially less sensitive than CDA and WOR. At the 9 x 4 and 18 x 4 sample sizes, performances were similar for all four methods. Once again, CDA did not show as steep a gain as the other methods when sample size increased (see Discussion, below).
TABLE V. Activation regions in the 18 x 4 analysis, by method
CDA
RER
WOR(E)
GCM(E)
WOR(L)
CDA(1)
ACING
CVEIm
ACING
LDPF
RCBLM
VTHAL
LDPF
RCBLM
VTHAL
LDPF
RCBLM
VTHAL
ACING
CVERM
LIFG
LDPF
RCBLM
VTHAL
ACING
CVERM
LIFG
LDPF
RCBLM
VTHAL
CVERM
LIFG
LDPF
RCBLM
VTHAL
ACING
CVERM
LIFG
LDPF
RCBLM
VTHAL
LCBLM
LMTG
LTPOL
RAINS
LCBLM
LMTG
LTPOL
RAINS
LCBLM
LMTG
LTPOL
LCBLM
LCBLM
LMTG
LTPOL
LCBLM
LCBLM
WOR
GLM
Blood flow increases
ACING
ACING
LDPF
RCBLM
VTHAL
~
LTPOL
LTPOL
LCAUD
LCAUD
LMOCC
Blood flow decreases
LMPSYL
MPAR
RLPAR
RLTEMP
RMORB
RMPSYL
LMPSYL
MPAR
RLPAR
RLTEMP
RMORB
RMPSYL
RFPOL
LMPSYL
MPAR
RLPAR
RLTEMP
RMORB
RMPSYL
LMPSYL
MPAR
RLPAR
RLTEMP
RMORB
RMPSYL
LMPSYL
MPAR
RLPAR
LLPAR
LFPOL
RFPVL
RMFG
RMFG
RMFG
RMORB
RMPSYL
LMPSYL
MPAR
RLTEMP
RMORB
LMPSYL
MPAR
RLPAR
RLTEMP
RMORB
RMPSYL
For each method, all significant areas of activation are listed for the 18 x 4 dataset. The 12 regions common to CDA, WOR, GLM, and RER
were defined as consensus activations (boxes).
Replicability of results
When the methods were compared to one another in terms of replicability of activation foci within method, there were modest trends in favor of CDA and WOR in small samples (9 x 1) and against CDA in large samples. However, the data are more remarkable for similarities between the methods than for differences. Replication rates between cohorts, within method were similar in magnitude to the within cohort rates (Table XII).
Additional comments and derived methods
WOR(L)
We prepared t-statistic images based on local (voxel) standard deviation. This approach was clearly the least powerful we investigated (Table V and Fig. 6). Although performance was similar to the other methods in the 18 x 4 dataset, essentially no activation was detected in smaller datasets.
CDA
We attempted to restrict the analysis with CDA to voxels in common to all subjects, but no significant activation was found in any dataset with this approach. Nor did we find that one-tailed analyses of leptokurtosis increased the sensitivity of the method, as had been reported [Fox et al., 1988; Poline and Mazoyer, 1993]. These discrepancies appeared to be due primarily to the smaller number of local extrema in our datasets than those reported previously. We confirmed visually that each local extremum in the search volume was identified with CDA's search routine. We found an average of 65 local extrema in each tail of the analysis using all voxels, but only 30 using voxels common to all subjects. The number of extrema in the distribution fell about 10% with each doubling in sample size.
TABLE VI. Replication of consensus regions in both cohorts
Cohort A
Region
CDA
WOR
Cohort B
RER
GLM
CDA
WOR
GLM
RER
Blood flow increases
ACING
CVERM
LDPF
LIFG
RCBLM
VTHAL
-
X
X
X
-
X
X
X
X
X
X
X
X
X
X
X
X
-
Blood flow decreases
LMPSYL
MPAR
RLPAR
RMORB
RMPSYL
RLTEMP
X
X
X
-
X
X
X
X
X
X
X
X
X
X
X
-
-
X
X
X
X
The detection of the 12 consensus regions (denoted by "x") is displayed for the 9 x 4 analyses, by
cohort and method. Note the symmetry of results across cohorts.
RER
The computing resource requirements of the rerandomization method are quite significant. Computing
time is an order of magnitude longer than that
required by the other methods. Analyses for nine
subjects took approximately 5 hr on an SGI Indigo
workstation. Analyses of 18 subjects took approximately 24 hr when the data were downsampled to
TABLE VII. Comparison of activation regions to Washington University studies

Our nomenclature               Washington Univ. nomenclature
Region     X     Y     Z       Region                                    X     Y     Z     3D difference (mm)
ACING      -1   +21   +31      Inferior anterior cingulate (a)           +2   +21   +30     3.2
LIFG      -39   +21    +5      Lateral prefrontal cortex (a)            -34   +23    +9     6.7
                               Inferior prefrontal cortex (a)           -25   +38    -6    22.0
LDPF      -44   +15   +25      Dorsolateral prefrontal cortex (a)       -38   +23   +21    10.8
RCBLM     +27   -65   -33      Right cerebellar hemisphere (b)          +25   -65   -18    15.1
                               Right inferior lateral cerebellum (a)    +34   -62   -24    11.8
CVERM      +1   -57   -21      Anterior cerebellum/colliculus (c)         0   -53   -17     5.7
LMTG      -56   -46    +5      Left posterior temporal cortex (b)       -46   -47    -1    11.7
LMOCC     -13   -73    +8      Left medial extrastriate cortex (b)       -6   -74   +13     8.7
LMPSYL    -45   -16   +12      Left sylvian-insular cortex (b)          -37   -11   +16    10.2
RMPSYL    +45   -13   +11      Right sylvian-insular cortex (b)         +45    -7   +17     8.5
Mean                                                                                       10.4

The table compares the findings of the present study, when possible, to those of the Washington University group. Talairach coordinates of their foci were converted to Talairach and Tournoux (1988) Atlas coordinates according to the following formulae: X88 = (130/145) * X67; Y88 = (172/163) * (Y67 - 14); Z88 = (75/70) * Z67 (Julie Fiez, personal communication). The final column shows the computed 3D difference in mm between the foci in the two studies.
(a) Petersen et al., 1988.
(b) Raichle et al., 1994.
(c) Petersen et al., 1989.
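The coordinate conversion and the 3D difference column of Table VII can be sketched as follows. Reading the subscripts in the footnote formulae as 67 (earlier atlas space) and 88 (Talairach and Tournoux, 1988) is an assumption, as are the function names. The example distance uses the ACING row, whose two foci differ by 3 mm in X and 1 mm in Z.

```python
import math

def convert_67_to_88(x67, y67, z67):
    """Apply the footnote formulae (subscripts 67 and 88 are assumed readings)."""
    x88 = (130 / 145) * x67
    y88 = (172 / 163) * (y67 - 14)
    z88 = (75 / 70) * z67
    return x88, y88, z88

def distance_3d(a, b):
    """Euclidean distance (mm) between two foci given as (x, y, z)."""
    return math.dist(a, b)

# ACING in the present study versus the converted inferior anterior cingulate focus.
ours = (-1, 21, 31)
theirs = (2, 21, 30)
print(round(distance_3d(ours, theirs), 1))  # sqrt(3**2 + 0**2 + 1**2) = 3.16..., i.e. 3.2
```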
TABLE VIII. Concordance among methods for classification of activation regions

Regions     Size      Kappa    Z        N
All         9 x 1     0.33     4.80     208
All         9 x 2     0.61     11.05    104
All         9 x 4     0.65     11.38    52
All         18 x 4    0.61     6.78     27
Increase    9 x 1     0.37     4.62     120
Increase    9 x 2     0.70     10.52    60
Increase    9 x 4     0.67     8.74     30
Increase    18 x 4    0.63     5.34     15
Decrease    9 x 1     0.23     1.87     88
Decrease    9 x 2     0.48     4.79     44
Decrease    9 x 4     0.67     7.26     22
Decrease    18 x 4    0.59     4.19     12

All regions shown in Table VI were included in this analysis. For each sample size, the kappa statistic was calculated, considering each of the four primary methods to be an independent rater.

TABLE X. Intercohort replication of consensus regions
Cohort
every other slice (i.e., 42 slices were analyzed) and the
analysis was stopped at 1,000 rerandomizations. Without downsampling, this analysis would have taken 50
hr on an SGI R4000 Indigo workstation.
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
A
B
Thresholding by spatial extent [WOR(E), GLM(E)]
Region
N
X
Y
Z
ACING
ACING
CVERM
CVERM
LIFG
LIFG
LDPF
LDPF
RCBLM
RCBLM
VTHAL
VTHAL
5
10
12
9
16
-3
-1
0
-1
-41
-39
-44
-44
+32
+26
0
-3
+24
+19
-57
-59
22
+22
+18
+14
-60
-66
-12
-16
+28
+34
-21
-22
0
+2
+24
+26
-30
-34
0
+2
RMORB
RMORB
LMPSYL
LMPSYL
MPAR
MPAR
RLPAR
RLPAR
RLTEMP
RLTEMP
RMPSY L
RMPSYL
10
4
12
2
13
9
13
10
12
17
17
8
+2
+2
-43
+39
+28
-17
-15
-54
-61
-55
-53
-27
-24
-12
-18
-11
-15
+13
+12
+32
+30
+30
+23
-11
-8
+12
+15
1n
11
12
11
19
13
7
-46
+2
+3
+49
t50
+55
+54
+45
+45
+
Mean
t-Statistic images prepared with Worsley's method
and Z-transformed t-statistic images prepared with
GLM were thresholded on the basis of spatial extent,
as described by Friston and colleagues [1994]. For this
analysis, the nominal threshold was placed at the
arbitrarily low value of Z = 2.8. These analyses
TABLE IX. Replication rates of individual responses,
collapsed across methods
Regions
Increases
Increases
Increases
Decreases
Decreases
Decreases
Sample
size
N
wDbM
wCwM
bCwM
9x 1
9x2
9x4
9X 1
9X2
9X4
92
85
56
45
48
45
49%
83%
87%
34%
63%
82%
57%
78%
47%
80%
93 %
23 %
52%
80 %
-
39%
54%
-
The table gives the overall probability (percent) that an activation
region was replicated within the same dataset with another method
(wDbM), within cohort and within method (wCwM), between
cohorts within method (bCwM). The analysis was carried out
separately for each sample size. N denotes the number of activation
regions which were assessed. The number of possible replication
events was determined as described in Figure 2.
3D(mm)
7.6
2.8
2.3
3.9
9.7
5.6
11.5
3.6
7.2
6.9
4.6
6.1
6.0 (2.7)
The coordinates of the centers of mass for the 12 consensus activation
regions are given for both cohorts A and B. Mean 3D vector distance
between homologous centroids was 6.0 mm (SD 2.7 mm).
produced the following results: 1) cumulative activation volume approximately doubled for both methods
(Fig. 6); 2) some relatively small areas of activation
were not detected in the 18 x 4 dataset (e.g., LTPOL,
ACING, Table V); 3) no new areas of activation
emerged in the 18 x 4 dataset that were not detected
elsewhere in the study (Table V); 4) sensitivity of the
analyses in conventional sample sizes improved (Fig. 6).
DISCUSSION
The data presented here have several important
implications for the evaluation of the results of pixel-based analyses of PET activation images. Most importantly, the results indicate a high degree of specificity
and concordance for activation detected by any of the
four examined methods. Kappa statistics were in the
range of 0.6-0.7 for sample sizes 9 x 2 and larger.
There was a consensus on the presence of 12 regions
of activation in the 18 x 4 dataset.
TABLE XI. Calculation of a radius of confidence for activation regions

Method   Sample size   Direction   N     Mean    SD     95% CR
CDA      9 x 1         increase    17    4.7     3.2    9.9
WOR      9 x 1         increase    32    8.1     3.7    14.2
GLM      9 x 1         increase    7     8.0     3.3    13.5
RER      9 x 1         increase    19    10.6    5.1    19.0
CDA      9 x 1         decrease    20    5.9     3.7    12.0
WOR      9 x 1         decrease    16    6.2     4.4    13.4
GLM      9 x 1         decrease    1     6.1     NA     NA
RER      9 x 1         decrease    7     13.5    8.8    28.0
CDA      9 x 2         increase    10    2.9     1.4    5.2
WOR      9 x 2         increase    17    4.6     3.0    9.5
GLM      9 x 2         increase    16    4.7     3.2    10.1
RER      9 x 2         increase    18    9.5     4.2    16.4
CDA      9 x 2         decrease    13    4.0     4.4    11.2
WOR      9 x 2         decrease    17    6.3     5.2    14.9
GLM      9 x 2         decrease    9     8.4     4.9    16.6
RER      9 x 2         decrease    9     9.6     4.8    17.5

The mean and standard deviation of the distance (mm) between the center of mass of any detected activation region and the average location of the center of mass for that region were used to calculate the 95% confidence radius (95% CR).
None of the methods produced false positive activations at greater than the nominal rate. Only three false positive activation clusters were seen in 60 analyzed same-task datasets. In contrast, in the same number of GENERATE-READ datasets, 474 activation clusters were found. Of these 474 foci, 411 corresponded to regions that were classified as consensus responses in the 18 x 4 dataset. Only seven of the 474 were nonreplicated nonmarginal responses likely to have been false positives (Table IIIA). Thus, when an activation cluster was detected, it was highly likely to represent a replicable, consensus response. This conclusion held for all sample sizes.
On the other hand, our results raise a serious concern about the power of these analyses in small samples. In making this assessment we relied on the 12 consensus regions of activation, since there is no gold standard activation map. These regions were detected by all methods in the 18 x 4 dataset; moreover they were detected by most methods in both halves of this dataset (i.e., cohorts A and B, the 9 x 4 datasets) (Table VI). The overall rate of detection of these consensus regions was 33% in 9 x 1 datasets, 62% in 9 x 2 datasets, and 91% in 9 x 4 datasets. Although the
Figure 5.
Primary pixel-based methods compared by the number of activation regions, the cumulative volume of activation, and the percentage of consensus regions detected. Data for regions of blood flow increase are shown in the top tier and those for regions of blood flow decrease in the bottom tier. Graphs on the left show the average number of activation regions detected in datasets of each sample size. The center graphs show average cumulative volume of activation (mL) in datasets of each sample size. Graphs on the right show the percentage of the 12 consensus regions detected in datasets of each sample size.
TABLE XII. Replication rates within method
Regions
Increases
Increases
Increases
Decreases
Decreases
Decreases
Sample
size
9
9
9
9
9
9
x
x
x
x
x
x
1
2
4
1
2
4
within cohort, within method
between cohort, within method
CDA
WOR
GLM
RER
CDA
WOR
58%
91%
69%
82%
38%
78%
41%
56%
46%
62%
42%
59%
-
44%
19%
44%
48%
82%
80%
25%
46%
50%
64%
85%
93%
31%
71%
100%
GLM
RER
14%
26%
83%
100%
67%
93%
0%
44%
83%
-'
33%
73%
The table shows the percentage of instances in which activation regions, detected by the given method
at the given sample size, were replicated within cohort, within method or between cohort, within
method. The number of activation regions for which replication was assessed was on average 19
(range 7-29) for regions of increase and 12 (7-21) for regions of decrease.
"Only one region (not replicated).
methods differed on this index of sensitivity (see
below), the best performance in the 9 x 1 datasets was
only about 50% (WOR). The use of spatial-extent
rather than intensity thresholding improved sensitivity in small sample sizes. However, some small but
reproducible foci went undetected with spatial extent
thresholding. The use of a smaller smoothing filter or a
two-dimensional (intensity and extent) activation
threshold [e.g., Poline and Mazoyer, 1994] might
improve this approach.
A complementary index of sensitivity is the within
cohort, within method rate of replicability, which
expresses the rate at which a detected focus of activation would be found by the same method in the same
cohort of subjects performing the same task. In a 9 x 1
sample the within cohort, within method rate of
Figure 6.
Derived pixel-based methods compared by the number of activation regions, the cumulative volume
of activation and the percentage of consensus regions detected, as a function of sample size. Please
refer to the legend of Figure 5.
replicability was about 50% for blood flow increases
and 30% for blood flow decreases (Table IX), and in
9 X 2 samples 80% for increases and 55% for decreases.
Again, there were differences among the methods; the
best replication rate at 9 x 1 was WOR (69% positive,
42% negative). Substantial improvement in power is
brought about by repeating each task at least once.
A further observation relevant to the determination
of the power of these analyses is the consistent
emergence of additional foci of activation in larger
sample sizes (some of which we classified as "marginal
responses"). In particular, we found increases in the
LCBLM, RAINS, LTPOL, and LMTG and a decrease in
the RMFG. In our study these foci were associated
with mean changes in rCBF of 1.0-2.0 mL/min/100 g
in the grand average. Larger sample sizes led to lower
variance and more power. It appears likely that an
even larger sample size might have detected additional foci.
In our study we found numerous "negative activations." In general, these blood flow decreases appear
to have lower rates of detection (Figs. 5 and 6),
replication (Table IX), and concordance (Table VIII) in
conventional sample sizes than do the increases. The
radii of 95% confidence are also larger for regions of
blood flow decrease (Table XI). However, the regions
of decrease are spatially discrete. Although negative
changes in functional images are sometimes thought
to be an artifact of global normalization, this cannot be
the case in our study, since global flow for READ
NOUN is on average higher (though not significantly)
than that for GENERATE VERB. Had we not normalized our images, the decreases would have been more
intense, not less.
Comparison of the methods is facilitated by their
concordance and by the fact that all avoid committing
type I errors. Distinctions emerge chiefly on the basis
of type II error rates and the stability of activation foci.
The consensus activations provide the basis for estimating and comparing the power of these analyses.
The replicability of responses and the power of the
methods to detect consensus responses were functions
of sample size for all four methods. However, the
effect of sample size was not the same for all the
methods. The methods dependent on local variance
estimates (GLM, RER) were affected most dramatically
by sample size. In 9 x 1 datasets, these methods had
low power, but showed rapid gains, reaching equality
with WOR at the 9 x 4 sample size. Questions of the
validity of parametric assumptions aside, RER did not
perform quite as well as GLM at any sample size. In
contrast, CDA performed quite well at the 9 x 1
sample size, but showed only modest gains with
increasing sample size. We will discuss this phenomenon further below.
WOR performed as well as CDA in the small
datasets and as well as GLM in larger datasets. However, the crucial assumption in Worsley's method,
namely that variance is spatially homogeneous (stationary), is called into question by our results. In fact,
several forms of structure were found to be present in
the variance: rCBF and changes in rCBF were more
variable in gray matter than in white matter, in visual
cortex than in the remainder of the cortical mantle [see
also Holmes et al., 1995], and in activated regions than
in nonactivated regions. The use of pooled standard
deviation therefore introduces bias because it underestimates the variability in gray matter, where the
activation signal arises. One might suggest the use of a
pooled gray matter variance. This stratagem would
preserve an important benefit of the pooled estimate:
the high effective degrees of freedom of the resulting
T statistic. The pooled gray matter estimate will be a
larger number than the whole brain pooled estimate.
Criticism of the inappropriate assumption of stationary variance might be tempered by the fact that WOR
did not commit type I errors and was not associated
with lower rates of replication than the other methods.
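The contrast between pooled and local variance estimates can be made concrete with a minimal sketch. It assumes a paired design in which each subject contributes one difference image, and it forms the pooled estimate by averaging the pixelwise variances over the search volume; that pooling rule, the function name `t_maps`, and the toy data are illustrative assumptions, not the implementation used in this study.

```python
import math
import statistics


def t_maps(diff_images):
    """Return (t_local, t_pooled) for per-subject difference images.

    diff_images: list of equal-length lists, one per subject, holding the
    task-minus-control difference at every pixel. Every pixel is assumed
    to have nonzero variance across subjects.
    """
    n = len(diff_images)
    n_pix = len(diff_images[0])
    columns = [[img[p] for img in diff_images] for p in range(n_pix)]
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]
    # Assumed pooling rule: average the pixelwise variances over all pixels.
    pooled_var = statistics.fmean(variances)
    t_local = [m / math.sqrt(v / n) for m, v in zip(means, variances)]
    t_pooled = [m / math.sqrt(pooled_var / n) for m in means]
    return t_local, t_pooled


# Hypothetical data: pixels 0 and 1 are quiet, pixel 2 is variable and active.
diffs = [
    [0.1, 0.0, 3.0],
    [-0.1, 0.2, 1.0],
    [0.1, -0.2, 4.0],
    [-0.1, 0.0, 0.0],
]
t_local, t_pooled = t_maps(diffs)
print("local t :", [round(t, 2) for t in t_local])
print("pooled t:", [round(t, 2) for t in t_pooled])
```

In this toy example the pooled statistic at the variable pixel is larger than the local one, because the pooled denominator is pulled down by the quiet pixels. This is the direction of bias described above for gray matter.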
The higher local variance in activated regions asks
for an explanation. In WOR, it is likely to be the result
of interindividual variability in the location of activated tissue. Imperfect overlap of the foci in the
standard space results in higher local variance. In
GLM, this sort of variability is generally partitioned as
part of the subject ("block") effect and therefore it
does not affect the adjusted error variance. However,
within-subject, within-task variability (e.g., effects of
practice or habituation) may affect the error variance
in the GLM unless it is specifically modeled. On the
other hand, WOR is less sensitive to order effects such
as habituation, since each subject's images are averaged with one another before entering the analysis. A
relevant question is whether systematic task-order
effects occur (e.g., habituation). Investigators at Washington University reported such effects in this paradigm when subjects generated verbs repeatedly for
the same list of nouns [Raichle et al., 1994]. We
attempted to minimize these effects by using fresh lists
with each performance. We were unable to demonstrate performance differences or rCBF differences in
the left inferior frontal gyrus with respect to scan
order [see Grabowski et al., 1994]. The randomization
of injection assignment was expected to minimize any
order effects which did occur. Nevertheless such
effects are likely to contribute to variability in the
detection and the location of foci of activation. We are
continuing to investigate and quantify such order
effects.
We are aware that the pixelwise linear model, as the
statistical engine in SPM, is reported to be more
powerful with standard sample sizes than our results
would imply. However, we purposefully omitted
implementation of several integral components of the
SPM package, in order to put all methods on equal
footing. The most important of these components may
be the anatomical standardization procedures in SPM,
involving nonlinear (plastic) resampling of the PET
datasets, which cause a reduction in local variance,
leading to improved statistical power. In our experience, this reduction is, on average, about 35%. On the
other hand, all the statistical methods might be expected to benefit from such variance reduction. Details
regarding the effect of the SPM94 anatomical transforms on these methods can be found in Grabowski et
al. [1995c]. Another important difference is that the
nominal alpha level that has been applied in many
existing studies in which SPM was used was a slicewise, rather than volumetric, criterion. Since SPM
analyzes 26 slices, this significance level is liberal,
relative to the one we employed. (This point does not
apply to analyses with SPM95, in which the nominal
alpha level is no longer slicewise.) Finally, in SPM, the
search volume is restricted to pixels whose intensity is
more than 80% of the mean search volume pixel
intensity (“gray matter”), a decision which reduces
the effective number of comparisons and allows further liberalization of the criterion. Thus, the relatively
poor performance of the GLM in the smaller datasets
may underestimate the performance of the SPM package in datasets of conventional size. But the relative
performances of the methods, given the same data, are
likely to be unaffected.
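The effect of a slicewise criterion on the volumetric error rate can be illustrated with a short calculation. The sketch below assumes, purely for illustration, a slicewise alpha of 0.05 and statistically independent slices; neither assumption is taken from the studies discussed, and smoothing across slices would make the true volumetric rate lower than this figure. The function names are hypothetical.

```python
def familywise_rate(alpha_per_slice, n_slices):
    """Chance of at least one false positive in the volume, assuming
    independent slices each tested at alpha_per_slice."""
    return 1.0 - (1.0 - alpha_per_slice) ** n_slices


def sidak_per_slice_alpha(alpha_volume, n_slices):
    """Slicewise alpha that yields alpha_volume over the whole volume,
    under the same independence assumption."""
    return 1.0 - (1.0 - alpha_volume) ** (1.0 / n_slices)


# 26 slices, as analyzed by SPM; 0.05 is an assumed slicewise level.
print(round(familywise_rate(0.05, 26), 3))
print(round(sidak_per_slice_alpha(0.05, 26), 5))
```

Under these assumptions, a slicewise level of 0.05 corresponds to a volumetric false positive rate of roughly 0.74, which is why a slicewise criterion is liberal relative to a volumetric one.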
CDA was the first automated approach to be successfully used in PET activation experiments. However, it
did not see widespread use, partly as a result of
criticism of the validity of the two-tiered statistical test,
which was seen by some as offering little or no control
over type I error, and partly because some of the
original investigators turned to a hypothesis-testing/
replication approach, which they demonstrated to be
more sensitive than CDA [Videen et al., 1991; Raichle
et al., 1994; Buckner et al., 1995]. CDA was included for
two reasons: because the verb generation paradigm
used here was originally investigated with CDA, and
because there are many studies in the literature which
used CDA. In our hands, CDA appears unlikely to
commit type I errors. The principal difficulty we had
with CDA stemmed from the fact that we found fewer
local extrema in the datasets than others have reported in their datasets [Fox et al., 1988]. The reasons
for this discrepancy are not entirely clear. CDA was
originally performed on data that had a spatial resolution of 23 mm, rather than data convolved to a lower
resolution as a post-processing step. Therefore, our
data, although at a resolution of 20 mm FWHM, may
be “smoother” than theirs. This may not be the whole
explanation, since CDA involves a secondary smoothing step, in which the image is convolved with a
14-mm spherical kernel. Other differences include our
much finer matrix and the use of floating point
resolution, which eliminated “ties” between neighboring voxels (which were not explicitly dealt with in the
original description of CDA). The small number of
extrema in the change distributions led to two interesting effects. First, the analysis did not tolerate restriction to common voxels, since about half of the extrema
were in the rim of the search volume. Their removal
led to prohibitively high gamma-2 statistical criteria.
This effect is in contrast to other methods, in which the
inclusion of the highly variable rims tends to degrade
rather than improve performance. However, CDA
may depend on these rims in our data for its “error
estimate.” Second, the number of regions of activation
detected did not rise as a function of sample size as
quickly as it did for other methods.
The rerandomization method, described and implemented by Holmes and colleagues [1995], is an interesting compromise between the use of voxelwise and
pooled standard deviations, with the virtue of avoiding
the assumption of stationary variance. Not surprisingly, its performance lies somewhere between
WOR(L) and WOR. However, the GLM, which models
more sources of variability and generates T statistics
with higher degrees of freedom, is as powerful as RER,
and requires fewer computational resources. Contrary
to our results, Holmes reported that his method found
larger volumes of activation in Watson’s V5 dataset
than SPM did [Holmes et al., 1995]. This apparent
discrepancy might be explained by Holmes’ use of
datasets that had been spatially normalized with
SPM’s plastic resampling algorithm.
Choosing between GLM and RER may not be a
simple matter of power or computational efficiency.
Although GLM is used here to analyze what is essentially a subtraction paradigm, one of its strengths is its
flexibility; it can model more complex relationships
among tasks, e.g., analyzing pixels that correlate with
continuous measures of performance. On the other
hand, the RER approach is meant to be a generic
technique, applicable to any test statistic which can be
generated from a functional dataset, without the need
to make distributional assumptions [Holmes et al.,
1995]. This advantage is potentially large, and deserves further study.
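The generic character of a rerandomization test can be sketched as follows. The example assumes a paired subtraction design, relabels the data by flipping the sign of each subject's difference image, and uses an ordinary pixelwise t statistic; these are illustrative choices, not a description of the statistic or relabeling scheme of Holmes and colleagues. The names `mean_t` and `max_stat_threshold` and the toy data are hypothetical.

```python
import itertools
import math
import statistics


def mean_t(diffs):
    """One-sample t statistic at each pixel of a list of difference images."""
    n = len(diffs)
    out = []
    for p in range(len(diffs[0])):
        vals = [d[p] for d in diffs]
        sd = statistics.stdev(vals)
        out.append(statistics.fmean(vals) / (sd / math.sqrt(n)) if sd > 0 else 0.0)
    return out


def max_stat_threshold(diffs, statistic=mean_t, alpha=0.05):
    """Critical value for the image maximum from all 2**n sign flips.

    Any statistic image can be plugged in; no distributional form is assumed.
    Exhaustive enumeration is practical only for small numbers of subjects.
    """
    maxima = []
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = [[s * v for v in d] for s, d in zip(signs, diffs)]
        maxima.append(max(statistic(flipped)))
    maxima.sort()
    # Value exceeded by at most a fraction alpha of the relabelings.
    k = math.ceil((1 - alpha) * len(maxima)) - 1
    return maxima[k]


# Hypothetical data: six subjects, pixel 0 active, pixel 1 noise.
diffs = [[2.0, 0.5], [2.2, -0.4], [1.8, 0.3], [2.1, -0.6], [1.9, 0.2], [2.0, -0.1]]
threshold = max_stat_threshold(diffs)
observed = mean_t(diffs)
print([t > threshold for t in observed])
```

Because the threshold is taken from the distribution of the image maximum, comparing every pixel against it controls the error rate over the whole search volume. Swapping `mean_t` for another statistic requires no other change, which is the flexibility referred to above.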
We performed this study using a benchmark paradigm, analyzing leading methods of pixel-based analysis under conditions that allowed a direct comparison
of their statistical components. It would be desirable to
perform similar work with other paradigms and experimental designs, to study additional methods, and to
isolate other components of the data analysis stream
(for example the degree of smoothing and the method
of anatomical standardization). Examination of all
these factors was beyond the scope of a single study.
Nevertheless, our data provide information on the
overall and relative specificity and sensitivity of the
more commonly used methods, a basis on which to
interpret published findings, and parameters to consider in the design of future studies.
We offer the following conclusions:
1. The activation map for the verb generation paradigm is more extensive than was originally reported. In particular there are blood flow decreases which are spatially discrete and are not
artifacts of the global normalization step.
2. The leading methods for pixel-based PET image
analysis commit type I errors infrequently.
3. The leading methods for pixel-based analysis are
generally concordant, differing most significantly
in type II error rates, and the stability of the
attributed location of activation. Methods depending on local variability have higher rates of type II
errors and larger 95% confidence radii, especially
in small datasets. Their successful use probably
depends on modeling of sources of variability and on nonlinear anatomical standardization.
4. At low sample sizes, most activations, both increases and decreases, are replicable. In this
study, they most often corresponded to consensus activations.
5. Analyses of low sample sizes are not powerful.
The absence of detection of activation at a particular
location, especially when a small sample size
is used, must be interpreted cautiously. Experimental designs which repeat tasks at least once
lead to dramatic gains in power, rate of replication, and stability of activation foci.
6. Some weaker (marginal) foci of activation are
only detected in large datasets. These are likely to
remain undetected with conventional sample
sizes unless they are targeted by hypotheses.
7. Variance in activation images is spatially heterogeneous. Effects of tissue compartment and other
regional effects can be demonstrated. An assumption that variance is stationary is probably not
valid, although such an assumption did not lead
to spurious activations in this study.
8. Foci of activation that are less than 10-15 mm
apart, whether they are detected by the same or
different methods, cannot be confidently distinguished.
ACKNOWLEDGMENTS
We thank Andrew Holmes for providing information on the rerandomization method and for helpful
correspondence, Julie Fiez for many helpful discussions about the verb generation paradigm, Jon Spradling and Kathy Jones for dedicated technical assistance, and Ann Reedy for preparing the manuscript.
Supported by a grant from the Mathers Foundation.
REFERENCES
Bartenstein P, Weiller C, Eulitz C, Muller SP, Rijntjes M, Geworski L,
Dutschka K, Elbert T, Schober O (1994): Evaluation of brain
activity during verb generation with PET and MEG. J Nucl Med
Abstr 35:33P.
Bench CJ, Frith CD, Grasby PM, Friston KJ, Paulesu E, Frackowiak
RSJ, Dolan RJ (1993): Investigations of the functional anatomy of
attention using the Stroop test. Neuropsychologia 31:907-922.
Buckner RL, Petersen SE, Ojemann JG, Miezin FM, Squire LR,
Raichle ME (1995): Functional anatomical studies of explicit and
implicit memory retrieval tasks. J Neurosci 15:12-29.
Clark C (1993): Analysis of covariance in statistical parametric
mapping. J Cereb Blood Flow Metab 13:1038-1042.
Damasio H, Frank R (1992): Three-dimensional in vivo mapping of
brain lesions in humans. Arch Neurol 49:137-143.
Damasio H, Grabowski TJ, Frank R, Knosp B, Hichwa RD, Watkins
GL, Boles Ponto LL (1993): PET-Brainvox, a technique for
neuroanatomical analysis of positron emission tomography images. In: Uemura K, Lassen NA, Jones T, Kanno I (eds): Quantification of Brain Function. Tracer Kinetics and Image Analysis in
Brain PET. Amsterdam: Elsevier.
Evans AC, Collins DL, Neelin P, MacDonald D, Kamber M, Marrett
TS (1994): Three-dimensional correlative imaging: Applications
in human brain mapping. In: Thatcher RW, Hallett M, Zeffiro T,
John ER, Huerta M (eds): Functional Neuroimaging. San Diego:
Academic.
Pardo JV, Pardo PJ, Janer KW, Raichle ME (1990): The anterior
cingulate cortex mediates processing selection in the Stroop
attentional conflict paradigm. Proc Natl Acad Sci USA 87:256-259.
Fox PT, Mintun MA, Reiman EM, Raichle ME (1988): Enhanced
detection of focal brain responses using intersubject averaging
and change-distribution analysis of subtracted PET images. J
Cereb Blood Flow Metab 8:642-653.
Fox PT (1991): Physiological ROI definition by image subtraction. J
Cereb Blood Flow Metab 11:A79-A82.
Friston KJ, Frith CD, Liddle PF, Frackowiak RSJ (1991a): Comparing
functional (PET) images: The assessment of significant change.
J Cereb Blood Flow Metab 11:690-699.
Friston KJ, Frith CD, Liddle PF, Frackowiak RSJ (1991b): Plastic
transformation of PET images. J Comp Assist Tomogr 15:634-639.
Friston KJ (1994): Statistical parametric mapping. In: Thatcher RW,
Hallett M, Zeffiro T, John ER, Huerta M (eds): Functional
Neuroimaging. San Diego: Academic, pp. 79-93.
Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC, Evans AC
(1994): Assessing the significance of focal activations using their
spatial extent. Hum Brain Mapp 1:210-220.
Friston KJ, Holmes AP, Worsley KJ, Poline J-B, Frith CD, Frackowiak RSJ (1995): Statistical parametric maps in functional imaging:
A general linear approach. Hum Brain Mapp 2:189-210.
Grabowski TJ, Damasio H, Frank RJ, Brown CK, Spradling J, Boles
Ponto LLB, Watkins GL, Hichwa RD (1994): An investigation of
methodological factors in brain activation paradigms using the
Generate paradigm. J Nucl Med 35:182P.
Grabowski TJ, Damasio H, Frank R, Hichwa RD, Boles Ponto LL,
Watkins GL (1995a): A new technique for PET slice orientation
and MRI-PET coregistration. Hum Brain Mapp 2:123-133.
Grabowski TJ, Frank RJ, Brown CK, Damasio H, Boles Ponto LL,
Watkins GL, Hichwa RD (1996~): A comparison of four pixel-based analyses for PET. In: Jones T, Myers R, Cunningham VJ,
Bailey DL, Jones T (eds): Quantification of Brain Function using
PET (Brain PET 95). San Diego: Academic (in press).
Herscovitch P, Markham J, Raichle ME (1983): Brain blood flow
measured with intravenous H2(15)O. I. Theory and error analysis.
J Nucl Med 24:782-789.
Hichwa RD, Ponto LLB, Watkins GL (1995): Clinical blood flow
measurement with [15O]water and positron emission tomography (PET). In: Emran AM (ed): Chemists’ Views of Imaging
Centers, Symposium Proceedings of the International Symposium on “Chemists’ Views of Imaging Centers.” New York:
Plenum.
Hinke RM, Hu X, Stillman AE, Kim S-G, Merkle H, Salmi R, Ugurbil
K (1993): Functional magnetic resonance imaging of Broca’s area
during internal speech. NeuroReport 4:675-678.
Holmes AP, Blair RC, Watson JDG, Ford I (1996): Non-parametric
analysis of statistic images from functional mapping experiments. J Cereb Blood Flow Metab (in press).
Howard D, Patterson K, Wise R, Brown WD, Friston K, Weiller C,
Frackowiak RSJ (1992): The cortical localization of lexicons:
Positron emission tomography evidence. Brain 115:1769-1782.
Minoshima S, Koeppe RA, Frey KA, Kuhl DE (1994): Anatomic
standardization: Linear scaling and nonlinear warping of functional brain images. J Nucl Med 35:1528-1537.
Neelin P, Crossman J, Hawkes DJ, Ma Y, Evans AC (1993): Validation of an MRI/PET landmark registration method using 3D
simulated PET images and point simulations. Comput Med
Imaging Graph 17:351-356.
Pelizzari CA, Chen GTY, Spelbring DR, Weichselbaum RR, Chen
C-T (1989): Accurate three-dimensional registration of CT, PET,
and/or MR images of the brain. J Comput Assist Tomogr
13:20-26.
Petersen SE, Fox PT, Posner MI, Mintun M, Raichle ME (1988):
Positron emission tomographic studies of the cortical anatomy of
single-word processing. Nature 331:585-589.
Petersen SE, Fox PT, Posner MI, Mintun M, Raichle ME (1989):
Positron emission tomographic studies of the processing of
single words. J Cogn Neurosci 1:153-170.
Petersen SE, Fox PT, Snyder AZ, Raichle ME (1990): Activation of
extrastriate and frontal cortical areas by visual words and
word-like stimuli. Science 249:1041-1044.
Price CJ, Wise RSJ, Watson JDG, Patterson K, Howard D, Frackowiak RSJ (1994): Brain activity during reading; the effects of
exposure duration and task. Brain 117:1255-1269.
Poline J-B, Mazoyer BM (1993): Analysis of individual positron
emission tomography activation maps by detection of high
signal-to-noise-ratio pixel clusters. J Cereb Blood Flow Metab
13:425-437.
Poline J-B, Mazoyer B (1994): Cluster analysis in individual functional brain images: Some new techniques to enhance the
sensitivity of activation detection methods. Hum Brain Mapp
2:103-111.
Raichle ME, Fiez JA, Videen TO, Petersen SE (1992): Activation of
left posterior temporal cortex in a verbal response selection task
is rate dependent. Soc Neurosci Abstr 18.
Raichle ME, MacLeod A-K, Videen TO, Fiez JA, Petersen SE (1993):
Practice affects left medial extrastriate responses to visually
presented words. Soc Neurosci Abstr 19:790.
Raichle ME, Fiez JA, Videen TO, MacLeod A-MK, Pardo JV, Fox PT,
Petersen SE (1994): Practice-related changes in human brain
functional anatomy during nonmotor learning. Cereb Cortex
4:8-26.
Roland PE, Levin B, Kawashima R, Akerman S (1993): Three-dimensional analysis of clustered voxels in 15O-butanol brain
activation images. Hum Brain Mapp 1:3-19.
Senda M, Kanno I, Yonekura Y, Fujita H, Ishii K, Lyshkow H, Miura
S, Oda K, Sadato N, Toyama H (1993): Comparison of three
anatomical standardization methods regarding foci localization
and its between subject variation in the sensorimotor activation.
In: Uemura K et al. (eds): Quantification of Brain Function.
Tracer Kinetics and Image Analysis in Brain PET. Amsterdam:
Elsevier.
Snyder AZ, Raichle ME (1993): Combined PET and evoked potential
study of lexical access. J Cereb Blood Flow Metab 13(suppl.
1):S259.
Talairach J, Tournoux P (1988): Co-Planar Stereotaxic Atlas of the
Human Brain. 3-Dimensional Proportional System: An Approach to Cerebral Imaging. New York: Thieme.
Taylor SF, Minoshima S, Koeppe RA (1993): Instability of localization
of cerebral blood flow activation foci with parametric maps.
J Cereb Blood Flow Metab 13:1040-1042.
Videen TO, Snyder AZ, Raichle ME (1991): Optimization of region-of-interest definition for detecting regional CBF activation with
positron emission tomography. J Cereb Blood Flow Metab
11(Suppl 2):S571.
Woods RP, Cherry SR, Mazziotta JC (1992): Rapid automated
algorithm for aligning and reslicing PET images. J Comput Assist
Tomogr 16:620-633.
Woods RP, Mazziotta JC, Cherry SR (1993): MRI-PET registration
with automated algorithm. J Comput Assist Tomogr 17:536-546.
Worsley KJ, Evans AC, Marrett S, Neelin P (1992): A three-dimensional statistical analysis for CBF activation studies in
human brain. J Cereb Blood Flow Metab 12:900-918.
Worsley KJ, Evans AC, Marrett S, Neelin P (1993): Detecting and
estimating the regions of activation in CBF activation studies in
human brain. In: Uemura K et al. (eds): Quantification of Brain
Function, Tracer Kinetics and Image Analysis in Brain PET.
Amsterdam: Elsevier.
Worsley KJ (1994): Local maxima and the expected Euler characteristic of
excursion sets of χ², F and t fields. Adv Appl Prob 26:13-42.