Efficient Coding of Natural Scenes in the Rhesus Macaque

LSUHSC-NO School of Graduate Studies
Dissertation/Thesis Defense Final Examination Report

Candidate: Alexander Yang
Examination date: June 25, 2010
Degree: PhD
Department: Cell Biology and Anatomy
Major field: Neuroscience
Minor field:
Dissertation/Thesis title: Efficient Coding of Natural Scenes in the Rhesus Macaque

The undersigned members of the Graduate Faculty have examined the candidate and
accept his/her Dissertation/Thesis.

Examination Committee (Typed Name, Department)
Theodore Weyand, PhD (Advisor), Cell Bio and Anat
Ranney Mize, PhD, Cell Bio and Anat
Thomas Lallier, PhD, Cell Bio and Anat
Hamilton Farris, PhD, Neuroscience
Carmen Canavier, PhD, Neuroscience

Approvals: Signature of Dean of the School of Graduate Studies
Form: Dissertation_Thesis_Defense_&_Final_Exam_Report, revised 6/23/2009
EFFICIENT CODING OF NATURAL SCENES
IN THE RHESUS MACAQUE
A Dissertation
Submitted to the Graduate Faculty of the
Louisiana State University Health
Sciences Center at New Orleans
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
in
The Department of Cell Biology and Anatomy
By
Alexander Cheung Yang
B.S., Armstrong Atlantic State University, 1999
M.S., University of Pennsylvania, 2001
July 2010
ProQuest Number: 10791423
All rights reserved
INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted.
In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if material had to be removed,
a note will indicate the deletion.
Published by ProQuest LLC (2018). Copyright of the Dissertation is held by the Author.
All rights reserved.
This work is protected against unauthorized copying under Title 17, United States Code
Microform Edition © ProQuest LLC.
ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
ACKNOWLEDGEMENT
For his exceptional guidance, latitude, and patience, I would like to thank Dr. Theodore
Weyand. He has graciously allowed me to study under his tutelage without constrictive
expectations, and in doing so guided me in an excellent understanding of both the
reasoning of research and its associated methods. It is certain that without his
mentorship my endeavors into the understanding of the fundamentals of research would
have been a much longer and a much more arduous task.
I would not be capable of proceeding with the confidence that I have the understanding
in the field of neuroscience and the necessary associated abilities without my
accomplished committee members. I thank them for their guidance and prodding to
ensure that I am sufficiently meritorious of being called a “scientist.”
I would also like to thank Dr. Yumei Feng for her support throughout the happiest times
and the darkest hours. Without her backing I would have surrendered to the seemingly
insurmountable difficulties associated with research. She encouraged me to press
forward and break through difficult barriers with determination and purpose! She
continuously inspires me each and every day.
And most of all, I would especially like to thank my parents for standing beside me
regardless of my career goals and their associated trials and tribulations, and for
continually assuring me that whether I fail or succeed, they will always be there to
support and help me with whatever difficulties may present themselves! Thanks Mom and Dad!
TABLE OF CONTENTS
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
ABSTRACT
INTRODUCTION
  1.1 - Does the activity of LGN neurons in the awake, behaving monkey whiten when the animal views time-varying natural images?
  1.2 - Does the information rate in the spike train generated during time-varying natural images exceed that observed during presentation of noise?
  1.3 - How do saccades alter the information flow during natural images?
METHODS
  2.1 - Surgery: Post and Coil Implantation
  2.2 - Surgery: Microelectrode Cannula Implantation
  2.3 - Eye coil calibration and behavioral training
  2.4 - Visual stimuli
  2.5 - Neuronal Recording
  2.6 - Analysis
    2.6.1 - Whitening
    2.6.2 - Information
    2.6.3 - Perisaccadic Activity
RESULTS
  3.1 - Time-varying natural images produce broad ranges of interspike intervals
  3.2 - LGN neurons transmit higher rates of information in response to natural images
  3.3 - Stimulus conditions differentially modulate perisaccadic activity
DISCUSSION
  4.1 - Spike train modulation
  4.2 - Natural images produce higher information rates
  4.3 - Saccades alter the flow of information through the LGN
  4.4 - Conclusions
REFERENCE PAGE
APPENDIX A: A BRIEF REVIEW OF VISUAL SYSTEM DESIGN
  A.1 - Vision is representational
  A.2 - The receptive field is the metric to be understood
  A.3 - Vision is modular
  A.5 - Vision is massively parallel and possesses channels
  A.6 - Receptive fields progress from ‘dense’ to ‘sparse’
APPENDIX B: THE LATERAL GENICULATE NUCLEUS (LGN)
  B.1 - ‘Gating’ by state
  B.2 - Temporal decorrelation
  B.3 - Saccadic suppression
  B.4 - Gain fields
  B.5 - Increased sparseness
  B.6 - Multiplexing of retinal signals
APPENDIX C: APPROACHES TO CAPTURING THE RECEPTIVE FIELD
APPENDIX D: METHODS OF ANALYZING SPIKES
  D.1 - Image Correlation
  D.2 - Autocorrelogram and FFT
  D.3 - Kurtosis
  D.4 - Pre versus Post Interspike Intervals
LIST OF FIGURES
Figure 1.) Image Correlation Plot
Figure 2.) Diagram of the On-Center and Off-Center Surround Cell
Figure 3.) Example of Stimuli Images
Figure 4.) Example of 40sec of Excerpt Data
Figure 5.) Example Cell Autocorrelograms and FFTs
Figure 6.) Compiled FFT and Comparison of Natural Image Video to Noise Stimuli
Figure 7.) Kurtosis Analysis for an Example Cell and Comparisons of Significant Kurtosis to Non-significant
Figure 8.) Pre/Post Interspike Interval Plots
Figure 9.) Pre/Post Interspike Interval Plots Analyzing for Low Frequencies
Figure 10.) Comparison of 10Bin Words to 8Bin Words
Figure 11.) Analysis of Unique Word Usage Over Time for Different Stimuli
Figure 12.) Example Data Showing Alignment of Spike Data to Saccade Onset
Figure 13.) Perisaccadic Suppression Followed by Enhancement
Figure 14.) Presaccadic Suppression Tied to Stimuli Type
Figure 15.) Presaccadic Suppression Irrespective of Stimuli Type, but Postsaccadic Enhancement Only in Response to Natural Images
ABSTRACT
Early vision refers to the first structures of the visual system that process the incoming
patterns of reflected light, and include the retina and the lateral geniculate nucleus (LGN)
of the thalamus. Both of these structures have been extensively studied and modeled.
However, the activity of these structures has never been monitored in awake animals
free-viewing natural images. Here we describe the statistical properties of the activity of
neurons in the LGN in the awake monkey free-viewing time-varying natural images
(videos of animals and people at a zoo), conditions closely mimicking real-life. An
attractive idea is that the natural environment shaped response sensitivities of visual
neurons, and the visual system most efficiently encodes the statistical properties of such
natural images. One prediction is that a rich sampling of natural images should yield a
spike interval response ensemble that contains many different frequencies, a
phenomenon known as ‘whitening.’ Despite observations of whitening in the LGN in the
paralyzed, anesthetized cat by other investigators, we never observed equivalent
responses among LGN neurons in the awake, behaving monkey. What we did observe
was that over half the cells (14/19) exhibited increased spectral power for low frequencies
(< 3 Hz) in response to time-varying natural images versus noise patterns whose
spatiotemporal profile was white. This observation was corroborated by additional
analysis of the kurtosis of spike interval distribution. The response profile of
time-varying natural images was ‘flatter’ (more kurtotic) than the response profile generated
to noise patterns for those same neurons (8/19). When the responses were analyzed for
information transmission, we also observed that time-varying natural images generated
a higher information rate (9/19) than observed during presentation of the noise pattern.
This is interesting because the noise pattern contained significantly greater information
than the natural images. On balance, although a ‘whitening’ pattern was not observed,
the increased power of low-frequency responses, the increased kurtosis, and the
increased information transmission rate, are all consistent with the idea that the LGN is
biased to efficiently transmit information about natural images.
Yang, Alexander
B.S., Armstrong Atlantic State University 1999
M.S., University of Pennsylvania, 2001
Doctor of Philosophy, Commencement 2012
Major, Cell Biology and Anatomy
Efficient Coding of Natural Scenes in the Rhesus Macaque
Dissertation directed by Professor Theodore G. Weyand
Pages in dissertation, 79. Words in Abstract 333
INTRODUCTION
From a statistical perspective, the structure of natural scenes (scenery of the real
environment) is highly redundant. Consider a picture of a mountain meadow. Although
one is likely impressed with the overall texture of the scene with many contrasts, in fact,
the luminance value of any given pixel (the smallest point in an image) is likely to be
identical or nearly identical to the adjacent pixel. Contrasts (the ratios of local light
levels), particularly sharp contrasts of the scene where there are rapid changes in pixel
values, form a very small fraction of the total pixels. This observation has been quantified.
Of all potential images possible, natural scenes exhibit high correlations with adjacent
pixels, and these correlations fall with distance to reach an asymptotic value of ~40%
correlation within 20 pixels (e.g., Figure 1; cf. Simoncelli and Olshausen 2001). This
means that the initial receptors of vision, the rods and cones, are exposed to large
amounts of redundant information.
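The correlation-versus-distance measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ‘natural-like’ frame is a synthetic stand-in (a bounded random walk along each row), and the function names are hypothetical; neither is the image data or the analysis code used in this study.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlation_vs_separation(image, max_sep):
    """Correlate every pixel with the pixel d columns to its right, for d = 1..max_sep."""
    result = {}
    for d in range(1, max_sep + 1):
        left = [row[c] for row in image for c in range(len(row) - d)]
        right = [row[c + d] for row in image for c in range(len(row) - d)]
        result[d] = pearson(left, right)
    return result

def smooth_image(rows, cols, rng):
    """Assumed stand-in for a natural frame: each row is a random walk clipped to [0, 1]."""
    image = []
    for _ in range(rows):
        value, row = rng.random(), []
        for _ in range(cols):
            value = min(1.0, max(0.0, value + rng.gauss(0.0, 0.05)))
            row.append(value)
        image.append(row)
    return image

def noise_image(rows, cols, rng):
    """Spatially white frame: every pixel drawn independently."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
natural_like = correlation_vs_separation(smooth_image(64, 128, rng), 20)
noise = correlation_vs_separation(noise_image(64, 128, rng), 20)
print("separation 1:", round(natural_like[1], 2), "vs noise", round(noise[1], 2))
print("separation 20:", round(natural_like[20], 2), "vs noise", round(noise[20], 2))
```

In this toy case the smooth frame stays highly correlated at short separations and decays with distance, whereas the white-noise frame is essentially uncorrelated even between neighboring pixels.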
Because of these redundancies, an important job of early visual processing would seem
to be to eliminate or minimize redundancies. The most common design of the retinal
ganglion cell (the sole output of the retina) is a concentric center with an opposite
polarity/inhibitory surround receptive field, ideal for eliminating redundancies.
Concentric receptive fields are characterized by spatially contiguous on and off regions, a
center of one polarity (‘on’ or ‘off’) and a ‘surround’ of opposite polarity (e.g. Figure 2).
The consequence of such organization is that when the receptive field is over regions of
natural scenes that are highly redundant (homogeneous with near zero contrast), there is
very little change in the neuron’s output signal. However, when the receptive field is over
regions in which pixel values are locally quite different (i.e. a region of high contrast), the
output shifts dramatically. Thus, retinal ganglion cells perform an important job of
eliminating spatial correlations (sometimes referred to as ‘de-correlating’ the image).
[Figure 1: pixel correlation (0.0 to 1.0) versus spatial separation (0 to 60 pixels) for a noise frame and a natural image frame]
Figure 1.) Plot of pixel correlations versus pixel distance (Top). The natural image video frame used in the
correlation plot (Left). The noise video frame used in the correlation plot (Right). Natural images have a
correlation fall off rate that begins to asymptote around 40% within 20 pixels of spatial separation. The
noise frame here shows the correlation goes to 0.0 at a range of 30 pixels. However, because in an actual
video the noise image would be temporally changing the correlation is expected to approach ~0.0 within a 1-2
pixel range.
Note: Although the correlation plot of this natural image frame remains higher than 0.4 at a range of >20
pixels, two factors must be considered. First, this is a correlation analysis of a single frame and not the
compilation of multiple frames, and second the resolution of the captured images contributes to differences
in the correlation measurement.
[Figure 2: spike responses of an On-Center cell and an Off-Center cell to central, surround, and diffuse illumination]
Figure 2.) Diagram and spike shifts of the concentric On-Center Surround cell (Left) and concentric
Off-Center Surround cell (Right) to various levels and areas of illumination. The On-Center cell responds with
an increase in spike rate when an annulus of light the approximate size of the center of the receptive field
covers the center of the receptive field. The Off-Center cell responds in opposite fashion such that a bright
annulus over the center of the receptive field will decrease the spike rate. It is notable that a diffuse
illumination across both the center and surround would yield little change in spike rate across both types of
concentric cells, thus indicating the concentric cell to be primarily a contrast detector.
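The behavior of a concentric receptive field can be illustrated with a one-dimensional difference-of-Gaussians filter, a common textbook model of center-surround organization. The sketch below is an assumption-laden illustration: the kernel widths, the step-edge stimulus, and the function names are chosen for clarity and are not parameters measured in this study.

```python
import math

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def dog_kernel(center_sigma=1.0, surround_sigma=3.0, radius=9):
    """On-center kernel: narrow excitatory Gaussian minus broad inhibitory Gaussian.

    Both Gaussians sum to 1, so the kernel sums to 0 and ignores uniform light.
    """
    center = gaussian_kernel(center_sigma, radius)
    surround = gaussian_kernel(surround_sigma, radius)
    return [c - s for c, s in zip(center, surround)]

def filter_valid(signal, kernel):
    """Apply the kernel at every position where it fits entirely inside the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Hypothetical stimulus: a dark region followed by a bright region (one sharp edge).
luminance = [0.2] * 30 + [0.8] * 30
response = filter_valid(luminance, dog_kernel())

print("response over uniform dark region:", round(response[0], 6))
print("largest response magnitude (near the edge):", round(max(abs(r) for r in response), 3))
```

Over the uniform regions the excitatory center and inhibitory surround cancel, so the output is essentially zero; only positions whose receptive field straddles the luminance edge produce a substantial response.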
The strong spatial correlations of natural images discussed above apply to temporal
correlations as well. This should be obvious to one fixated on a static scene; if the eye
does not move the temporal correlations would be maintained near 100%. For time-varying
natural images (e.g. movies of moving through the world), the correlations fall as
scenes change (or the eye moves), but remain high during fixations (when the eyes do
not move; Dong and Atick 1995). Again, retinal ganglion cells show they can remove
many of these temporal correlations as their output drops dramatically if the scene is not
temporally modulated.
Two other important observations regarding encoding by retinal ganglion cells in the
temporal domain are worth noting. First, significantly less than 1% of retinal ganglion
cells are ‘luxotonic’, i.e. neurons that provide a steady output based on local luminance
(Gur 1987). The vast majority of the concentric cells transiently shift their output with
shifts in illumination, returning to a base firing rate within a few seconds. Thus, scene
dynamics, not static properties account for most activity. Second, gaze shifts (e.g., eye
movements) are the single greatest contributor to disrupting temporal correlations
because they bring the receptive field to an entirely different region of a scene that likely
possesses significantly different luminance values.
Removing temporal and spatial correlations in natural images appears to be an
important function of the concentric neurons, and this contributes to efficient coding of
the early visual system. From a computational perspective, concentric neurons obey the
rule of ‘least commitment’. Neurons early in the visual pathway respond to many
different stimulus patterns without committing to any special structure or feature of the
scene. As such, one could say they have high entropy (which, besides being the
thermodynamic principle of randomness, can also be equally described as ‘uncertainty’)
and low information output. For example, a given action potential from a concentric
neuron provides very little information about the nature of any visual scene or part of a
visual scene. In contrast, neurons in visual cortex typically possess receptive fields that
require the stimulus to possess a feature structure or orientation. Thus, an action
potential from cortical neurons typically contains much more information.
In the parlance of computational neuroscience, concentric neurons are referred to as
‘dense’ and the feature/orientation-selective neurons as ‘sparse’ (Field 1987). The ‘sparse’
or selective neuron is highly committed relative to the concentric neuron. Predictably, as
a monkey views time-varying natural scenes, concentric neurons produce ~20-70
spikes/sec, whereas an orientation-selective neuron in visual cortex produces ~0.2
spikes/sec under the same conditions (Weyand and Dong, unpublished). The operating
assumption of how our visual system works is based on the idea that there is a
hierarchical process in which these dense coders provide the driving input to the sparse
coders. Dense coding exists in the early visual system because it represents an economic
solution. The retina has limited physical space for computation, and can ill-afford to
possess ganglion cells with sparse properties. Every point in visual space is covered by
~15 ganglion cells (Peichl and Wassle 1979). Such a small number does not afford the
luxury of sparse coding. Additionally, the human optic nerve, which carries all
information out of the retina, is described as an information ‘bottleneck’, containing ~1
million axons. The prevailing view is that sparse coding in cats and monkeys¹ does not
emerge until visual signals reach cortex, whose volume is several hundred times that of the
retina.
¹ Other animals, such as the rabbit, apparently do possess sparse coders at the level of the retina. Directionally-selective
neurons, which effectively do not appear in monkey and cat until primary visual cortex, comprise ~25% of rabbit ganglion
cells. Caldwell, J. H. and N. W. Daw (1978). "New properties of rabbit retinal ganglion cells." The Journal of Physiology
276(1): 257-276.
It is informative to test if neurons are specifically tuned to the statistical properties of the
natural environment. Interest emerged shortly after Shannon’s treatise on information
theory (Shannon 1948). Both Attneave (Attneave 1954) and Barlow (Barlow 1961)
speculated that an important achievement of the early visual system would be to
efficiently code information and reduce, if not eliminate redundancies. Atick and his
colleagues performed a theoretical analysis of how visual scenes should be encoded, and
using minimal assumptions arrived at a ‘unit’ that was a concentric cell capable of
decorrelating spatial relations. Based on several other assumptions, Dong and Atick
(Dong and Atick 1995) proposed that the lateral geniculate nucleus (LGN), the major
recipient of retinal input, was particularly important for recoding information so as to
minimize the temporal correlations of natural images. Since the visual system evolved in
the natural environment, it is reasonable to assume it was inherently designed to operate
most efficiently in coding the statistical properties of the natural world. An efficient
encoder, exposed to a rich sample of events to which it is sensitive, would respond with a
‘rich’ spike train, that is, one whose interspike intervals are highly variable. Broad
variance of the interspike intervals in the time domain would yield correspondingly
broad representation in the frequency domain, and thus the distribution can be
described as ‘whitened’. This assertion was tested by Dan and her colleagues (Dan, Atick
et al. 1996). Using paralyzed, anesthetized cats, they recorded the activity of neurons in
the LGN as the animal was exposed to time-varying natural images (video clips from the
movie, ‘Casablanca’).
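One way to make the notion of a ‘whitened’ spike train concrete is to compare how evenly spectral power is spread across frequencies for an irregular versus a strictly periodic train. The sketch below is a toy illustration, not the analysis used by Dan et al. or in this study: the spike trains are synthetic, the peak-fraction measure is an assumed stand-in for a flatness index, and all names are hypothetical.

```python
import cmath
import math
import random

def power_spectrum(counts):
    """One-sided power spectrum (frequency indices 1..n/2) of a mean-subtracted count series."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def peak_fraction(spectrum):
    """Share of total power held by the single strongest frequency (lower means flatter)."""
    return max(spectrum) / sum(spectrum)

n_bins = 256
rng = random.Random(1)

# Irregular train: each bin independently holds a spike with probability 1/8.
irregular = [1 if rng.random() < 0.125 else 0 for _ in range(n_bins)]
# Regular train: exactly one spike every 8 bins (same mean rate, no interval variability).
regular = [1 if t % 8 == 0 else 0 for t in range(n_bins)]

irregular_peak = peak_fraction(power_spectrum(irregular))
regular_peak = peak_fraction(power_spectrum(regular))
print("irregular train peak fraction:", round(irregular_peak, 3))
print("regular train peak fraction:", round(regular_peak, 3))
```

The periodic train concentrates all of its power in a handful of harmonics, while the irregular train, whose interspike intervals vary widely, spreads power over many frequencies and therefore has a much smaller peak fraction.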
LGN neurons, which project to visual cortex, are driven by the retina, and are also mostly
concentric neurons, similar to the retina. Dan et al showed that the responses of the LGN
neurons to time-varying natural images were indeed ‘whitened’. As a control, they used
a time-varying stimulus composed of pseudo-randomly presented noise patterns. This
noise stimulus was rich in the space-time domain (i.e. many different and varying spatial
and temporal frequencies present), but it did not have the biased spatiotemporal
structure of time-varying natural images.
The noise pattern effectively drove the neurons and produced even more spikes/sec than
the natural images. However, the spike train was not whitened. These results lend
support to the idea that neurons in the early visual system are specifically designed to
efficiently encode natural images. Despite the fact that the noise stimulus was
structurally ‘white’ (broad power in the spatial and temporal domain), the neuron did
not respond with a whitened spike train. Further analysis revealed that the whitening to
natural stimuli was largely a result of the bias in the spatiotemporal filtering properties
of the neuron, i.e. a bias towards the spatiotemporal energy possessed by the natural
images.
These observations suggest that the responses of neurons in the early visual system are
designed to most efficiently encode the statistical properties of natural images. However,
the conditions under which the study was done may compromise the generalizability of
the result. To control the stimuli presented over the receptive fields, the cats were
paralyzed. Because of humane concerns the paralyzed animals were also anesthetized.
Finally, the contrast and luminance of the video were narrowly constrained so that both
of these variables would allow the neuron to operate in a single linear range. The narrow
range is not consistent with the normal contrast/luminance values during natural vision,
but effectively allowed the investigators to avoid non-linearities introduced by light and
contrast adaptation (Mante, Frazor et al. 2005; Weyand 2007; Mante, Bonin et al. 2008).
Unfortunately, the limitations placed on the animal condition constrain the generality of
the results. By eliminating eye movements, paralysis disrupts the normal flow of
information. As previously indicated, eye movements are the single greatest source of
decorrelation in the temporal domain because they bring the receptive field to a different
location of the scene whose image statistics are likely much different than those present
prior to the movement. Second, eye movements contribute to modulations by sweeping
the receptive field across an image generating a visual smearing response.
In addition, anesthesia alters the membrane properties of neurons. As such, it alters both
spike timing and the probability that the neuron will even produce a spike. Anesthesia
takes an even greater toll on visual responsiveness when synaptically further from the
retina (e.g. in the barbiturate anesthetized cat, only cortical areas 17 and 18 are reliably
visually responsive).
Even at the level of the retina one can observe the effects of anesthesia. McIlwain
(McIlwain 1964) showed that the influence of peripheral stimuli well outside of the
classically defined receptive field (the receptive field that easily responds to spots of light)
was quite sensitive to barbiturates. Weyand (Weyand 2007) observed that not only is
retinogeniculate transmission more efficient in wakefulness, but the success of
transmission for retinal intervals > 40 msec averaged 13% in wakefulness (even
exceeding 40% efficacy in several cells), compared to < 0.01% when under anesthesia
(Sincich, Adams et al. 2007). Thus, although Dan et al’s results are useful in identifying a
difference of function for LGN neurons to natural versus noise stimuli, they do not
accurately represent how a neuron responds under natural conditions.
The following study was undertaken to investigate the pattern of neural activity in the
early visual system (LGN) in an animal (rhesus monkey) that is free of anesthesia and
freely viewing natural images whose luminance and contrast profiles conform to those of
natural scenes. As important and provocative as the results by Dan et al (Dan, Atick et al.
1996) are, it would be prudent to ensure that the results hold under the natural condition.
We address the following 3 questions:
1.1 - Does the activity of LGN neurons in the awake, behaving monkey whiten
when the animal views time-varying natural images?
Similar to the Dan et al (Dan, Atick et al. 1996) study, it is of interest to determine the
degree to which time-varying natural images whiten the spike train. The hypothesis is
that natural scenes will yield a spike interval distribution that is white. However, in this
study, this idea will be tested in the awake, behaving monkey. A positive result will
strongly reinforce the view that the visual system evolved to efficiently encode the
natural environment. We provide two controls: (1) a blank background screen (contrast
near zero) set at the mean luminance of the natural images video to follow, and (2) a
noise pattern composed of randomly chosen pieces (0.5° squares) taken from the natural
images video that follows.
1.2 - Does the information rate in the spike train generated during time-varying
natural images exceed that observed during presentation of noise?
In sensory systems, the temporal stream of spikes, also known as the ‘spike code’, holds a
relation to the spatiotemporal properties of the stimuli in the receptive field. In most
studies, because the spatiotemporal attributes of the stimuli are available, explicit
predictions can be made regarding spike probability. In statistical approaches such as
those used here, one can still make significant inferences about the stimulus without explicit
knowledge of the receptive field.
From information theory, entropy refers to the number of bits required to specify a
stimulus exactly. Given our three stimulus sets, because the background screen
represents the least amount of change and thus theoretically takes the fewest bits to
represent, variance of the neuronal response to this stimulus can be considered ‘neural
noise’. On the other hand, the noise pattern, because of its complexity in both spatial and
temporal domains, would be expected to require the most bits to specify. Therefore, the
variability of the response to the natural image stimuli is expected to be between the
response to the background and the response to the noise stimuli yet much more similar
to that of the noise.
Ideally, an encoder’s response entropy (characteristics of the spike train) should match
the entropy of the stimulus, i.e. the variability of the response is related to the variability
of the stimuli. By such logic, it is expected that the greatest response entropy would arise
from the noise pattern. However, if our encoder is specifically designed to encode the
spatiotemporal attributes of time-varying natural images, the response entropy will be
greatest for natural images, even though the noise patterns may contain more stimulus
entropy. This analysis should be complementary to the first question regarding
whitening, implying that the neuron is particularly efficient at encoding natural image
stimuli.
1.3 - How do saccades alter the information flow during natural images?
As previously stated, eye movements are the single greatest source of decorrelation in
image statistics, as they move the receptive field to a new region of the scene which is
highly likely to have a different spatial, if not temporal, structure. In addition, the eye
movement alone stimulates the receptive field by moving the scene across the receptive
field at a high velocity. We are not usually conscious of this because there are active
mechanisms to suppress this type of stimulation (saccadic suppression; see, e.g., Ross et
al 2001 for review).
The source of this suppression can be oculomotor efferent copy signals (Helmholtz) or
proprioceptive feedback from the extraocular muscles. Both have been shown to operate
in the LGN (proprioceptive: (Lai and Friedlander 1990); oculomotor: (Schmidt 1996; Lee
and Malpeli 1998)). More recent experiments have provided the surprising result that
saccadic suppression begins in the retina. Olveczky et al (Olveczky, Baccus et al. 2003)
have described suppression of visual signals associated with global shifts of the visual
field (as would occur during eye movements) but not with local shifts (as would occur
when something moves in the visual field).
Two recent papers in cat (Lee and Malpeli 1998) and monkey (Reppas, Usrey et al. 2002)
have described patterns of suppression followed by excitation in the LGN that are
associated with eye movements. Both observed a measurable suppression of excitation
that precedes the saccade by at least 50 msec, implicating an efferent copy signal since
the eye is stable at that time, followed by an excitation that peaks at the end of the
saccade. Both of these studies used background stimuli that were either of zero contrast
(including darkness, in the case of Lee and Malpeli), or simple gratings. What is
unknown is how a stimulus of natural images activates LGN cells during movement.
Here, we examine whether the response to saccades observed by these two groups is
modulated differently when the stimulus is natural images. As controls we use background
illumination of zero contrast and noise patterns. Our expectation during saccades is that
the background should cause minimal modulation and the noise pattern maximal
modulation. However, because the expectation is that the LGN neurons are tuned to
natural images, it would not be unexpected if strong suppression to natural images is
observed during the saccade. Regardless, these experiments will provide us with the first
glimpse of how eye movements perturb LGN activity when the stimuli are close to the
realm of the natural experience.
To assist the reader, we provide further background in several subjects relevant to this
dissertation in the form of appendices in the back of this document. Appendix A is
devoted to visual system design and assumptions. Appendix B provides some specific
background on the LGN, and Appendix C provides a historic background on methods of
analyzing the receptive fields.
METHODS
The majority of the data were obtained from one 17 year-old female rhesus monkey
(Macaca mulatta, ‘Mattie’) who currently resides in Animal Care at LSUHSC. Additional
confirmatory data were obtained from another rhesus monkey (‘Wily’) while the senior
investigator was at Baylor College of Medicine following Hurricane Katrina.
2.1 - Surgery: Post and Coil Implantation
All procedures were approved by IACUC at LSUHSC, and done using sterile techniques.
In the first surgery, a titanium post was attached to the skull using titanium orthopedic
screws, and a Teflon-coated stainless steel wire coil was attached to the eye. The purpose
of the post was to rigidly fix the head during the recording sessions, and the purpose of
the wire coil was to determine gaze using the scleral search coil technique (Robinson
1963).
After fasting, the monkey was tranquilized with ketamine (10 mg/kg), the
head and arm shaved, an intubation tube inserted into the trachea, and an i.v. catheter
inserted into the cephalic vein. Anesthesia was then induced with isoflurane (1-2%) for
the duration of the surgery, and lactated Ringer’s solution was dripped through the
catheter. During surgery, heart-rate and respiratory rate were continuously monitored.
The eyes were protected with an ophthalmic ointment, and the animal placed in a
stereotaxic frame.
Using a scalpel and periosteal elevators, a flap of skin was cut and retracted to expose the
skull. Fascia and muscle were also retracted until the skull was sufficiently exposed to
allow insertion of the post. The post had five mounting arms that were custom-bent
during surgery to conform to the skull surface. Orthopedic screws secured the post
through the mounting arms to the skull. Thirteen orthopedic screws (2.7 mm) were
secured to the skull through tapped holes. With the post in place, the flap was brought
over the post by making a slit in the flap to allow the post to emerge from the skin. The
skin flap was then closed with suture.
The sclera of one eye was exposed by slitting and blunt dissecting (using iris scissors) the
conjunctiva ~5 mm from the corneal margins of the globe. The coil (diameter ~18 mm)
was then placed onto the scleral surface and secured by placing a drop of isocyanate
(tissue glue) onto the sclera at 4 diagonal locations around the globe. The leads from the
coil were then run through the conjunctiva, and a small loop (~5 mm in diameter) of
wire was placed into a ‘pouch’ in the conjunctiva at the lateral margins before exiting
the orbit area with a needle. The loop functions to relieve tension, thus allowing free
movement of the eye with the coil attached. The lead was then run under the skin to exit
~10 mm behind the eye. At this point, a much larger loop (~25 mm diameter) of wire was
inserted into a second ‘pouch’ before bringing the lead under the skin to terminate at the
crown of the head. The lead was forced into a small hole in the titanium post and led to a
small box attached to the side of the post. The lead was then soldered to a pair of
electrical contacts within the box, and the box was subsequently closed. The animal was
then removed from the stereotaxic device, the anesthesia discontinued, and the animal
returned to her home cage. An opioid analgesic (Buprenex) was given over the next 72
hours. To ensure that the post was well-secured to the skull, the animal was given at
least 4 weeks to recover before any weight was put on the post.
2.2 - Surgery: Microelectrode Cannula Implantation
The goal of the second surgery was to unilaterally attach a stainless-steel swiveling base
(Malpeli, Weyand et al. 1992) containing a protective guide cannula over the LGN.
Through this cannula would pass the microelectrode. Again, the animal was fasted
overnight, and tranquilized the next morning with ketamine. An intubation tube was
then placed in the trachea, and the animal anesthetized with isoflurane before being
placed in the stereotaxic frame. A region of skull overlying the LGN on one side was then
exposed, and a titanium ring (10 mm dia. x ~5 mm) anchored to the skull using 2 size
2-56 titanium screws and dental acrylic. The center of this ring corresponded to a point in
vertical alignment with the LGN determined from an atlas of the monkey brain. A hole
was then drilled through the skull at the center of the ring to expose dura mater. The
hole in the skull was expanded to at least 2 mm across, the dura slit, and the base and
cannula lowered until the cannula was several mm into brain and the base approximately
flush with the skull. The base was then affixed to the skull and the titanium ring using
dental acrylic. The brain was protected from the liquid acrylic by prepacking the opening
with Gelfoam prior to lowering the cannula into place.
Once the acrylic set, a custom, protective cap made of delrin was attached to the
previously attached post by a single 8-32 screw. The monkey was then removed from the
stereotaxic frame, anesthesia discontinued, and the intubation tube removed. To protect
the brain from the outside through the open cannula, a stylus with an internally threaded
cap was screwed down onto the base. Again, the monkey was treated for 72 hours postoperatively with analgesics.
2.3 - Eye coil calibration and behavioral training
Following recovery from surgery, the eye coil position was calibrated. Simultaneously,
the monkey was trained to fixate on selected targets. Following recovery from the first
surgery, the monkey had been acclimated to a primate chair that confined the animal during all
subsequent recording sessions. In addition, the animal had also been acclimated to
having the head rigidly fixed for several hours per day.
Because the monkey can be motivated to perform tasks through a juice reward, the
monkey was water-deprived for ~24 hr prior to the first behavioral training session.
With the monkey’s head fixed in place and facing a viewing screen, a target (small red
dot) spontaneously appeared in the middle of the screen. With this appearance, there
was a sudden shift in the signal from the eye coil. This was interpreted as the monkey
moving its eyes to look at the target. The targeting movement was rewarded with a shot
of dilute apple juice, and the offsets on the eye coil electronics adjusted such that the
signal in relation to a centering dot was set to an output of 0 volts. This process was
repeated several times to ensure reliability before moving the target to a new location.
When the target was set at the new location, the gain of the signal was then set so that
0.5 V corresponded to a 10 deg eye movement. This strategy was used to calibrate an
initial array of 5 targets (center, left, right, up, and down) displaced by 10 deg. The
monkey was trained over several hundred trials to ensure the behavior was reliable.
Behavior was further refined by requiring that the monkey fixate for at least 1 sec within
1 deg of the target. This method of training was completed within 2 days.
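The calibration arithmetic described above can be sketched in a few lines of Python. This is
an illustration only, not the routine used in the study: the function names and the offset
argument are hypothetical, a linear coil response is assumed, and the only values taken from
the text are the 0.5 V per 10 deg gain and the 1 deg fixation window.

```python
import math

def volts_to_degrees(volts, offset_volts=0.0, volts_per_10_deg=0.5):
    """Convert an eye-coil voltage to gaze angle in degrees (assumed linear)."""
    # After calibration, 0 V is the central target and 0.5 V equals 10 deg.
    return (volts - offset_volts) * 10.0 / volts_per_10_deg

def within_window(eye_h_deg, eye_v_deg, target_h_deg, target_v_deg, window_deg=1.0):
    """True if gaze lies within window_deg of the target (Euclidean distance)."""
    distance = math.hypot(eye_h_deg - target_h_deg, eye_v_deg - target_v_deg)
    return distance <= window_deg
```

A fixation criterion such as the 1 sec hold would then require within_window to remain true
for every sample over that period.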
2.4 - Visual stimuli
The monkey viewed a rear-projection screen from a distance of 54 cm such that the
screen encompassed 60 x 40 degrees of visual space onto which images were projected
using an Optoma 720P image projector refreshed at 120 Hz. At this distance, each pixel
subtended ~0.1 degrees. Illumination was adjusted through values ranging from 0.001 to
10 cd/m². There were 3 stimulus sets presented to the monkey; examples of each are
presented in Figure 3. The first stimulus, ‘Background,’ consisted of a near zero
contrast gray screen set at the mean luminance of the natural images video to follow (the
phrase ‘near zero’ is used because neither the screen nor the projection lens are optically
pure, but zero would be close). To ensure accurate eye position calibration, small red dots
(~0.1 degrees in diameter) would appear pseudorandomly on the screen, which the
monkey was then expected to fixate upon for ~1 second. This calibration routine
included presenting 9 pseudo-random targets sequentially and lasted ~22 seconds. The
second stimulus, ‘Noise,’ consisted of pseudorandomly arranged 0.5 degree square
pieces of the natural images video to follow, set at the same mean luminance and changed
at a rate of 60 Hz. A calibration routine similar to the prior one was presented with the
noise as a background; however, the red dots were now presented centered on a black 1
degree square to increase visibility. The noise stimulus was presented for ~15 seconds.
The presentation of the noise pattern was followed by a 5 sec epoch set at the same
luminance and zero contrast as the ‘Background’ routine, except there was a single target
present in the center that remained over the 5 sec period. The monkey was reinforced for
maintaining gaze on the target. The third stimulus, ‘Natural Images Videos,’ followed
this 5 sec fixation period and was presented for ~30 seconds. Altogether the presentation
schema was as follows: Background (25 sec), Noise (15 sec), Background (5 sec), and
Natural Images Video (30 sec).
Figure 3.) Example frames of each of the three stimuli types: background (Top), noise (Middle), and
natural images (Bottom). The background is a diffuse illumination with no structural profile, set at the
mean luminance of the natural images video that follows in a set. The noise images are spatiotemporally
diverse stimuli of 0.5° pseudorandomly presented squares (displayed at 60 Hz) taken from the natural
images video to follow. The mean luminance profile of the noise video is also set to the mean luminance
profile of the natural images video that follows in a set. The natural images videos are videos of people and
animals taken in a zoo setting recorded at a 40 msec rate (25 Hz).
Natural images videos were obtained from a library of videos taken of animals and
people at a wild animal park in Orlando, FL, using a hand-held high-speed digital video
camera (JVC DVL9800). Camera shutter speed was 0.04 seconds. Focus, exposure, and
white balance were adjusted to the given scene and then fixed. The videos were in black
and white, and their contrast and luminance adjusted such that the monkey encountered
a range of both. This sequence of background, noise, background, and natural images
video was repeated over 20 cycles.
The 20 natural images videos used for each of the cycles were created by splitting 10
different videos into 2 parts each. The second set of 10 involved a continuation of the
first set of 10 but set at a different mean luminance or contrast. Should the neuron still
be isolated after 20 cycles, additional repeats of the 20 cycles were run until the cell was
lost, or the monkey’s behavior became unreliable. Recording sessions typically ran 3-4
hr.
2.5 - Neuronal Recording
For recording neural activity, the monkey was placed in a primate chair, the head rigidly
fixed using the previously attached titanium post, and the protective stylus in the
cannula replaced with a tungsten-in-glass microelectrode (tip exposure between 15 and
60 µm) inserted into a microdrive (Malpeli, Weyand et al. 1992). Both the microdrive and
microelectrode were sterilized immediately prior to use. The recording session was
begun by slowly advancing the electrode until the LGN was found, determined by
position and visual responsiveness.
Neural activity was amplified in 2 stages of 100X each (total gain of 10,000) and the
signals filtered between either 0.15-10 kHz or 0.04-10 kHz using an active filter (24
dB/octave). The signals were split to a computer, an oscilloscope, and an audio monitor.
During the electrode advance, the monkey viewed arbitrary videos that were not used in
the data collection routines (scenes of ballroom dancing, walking through a park in
Brugge, Belgium), and the monkey was sporadically rewarded with juice. Calibration of
eye position was done at the beginning of the recording session, and usually several more
times before encountering LGN cells. The LGN was tentatively identified by entering an
area that was easily activated by visual stimulation, was located at a depth which would be
appropriate as determined by extrapolation from published brain atlases, and responded
vigorously to the noise pattern. In addition, because of the neural density of the LGN, a
rise in background activity is also indicative of entering the LGN. Once a neuron was
deemed isolatable, a data file was opened and the sequence of showing 20 stimulus sets
(described above) begun. The data files consisted of 4 channels: horizontal eye position,
vertical eye position, a sync signal indicating which stimulus type was present on the
screen, and the unit activity (the actual spike activity). The data were collected at 25
kHz/channel using National Instruments (Austin, TX) hardware (DIO-‘E’ series) and
software (LabView). Subsequent analysis including spike discrimination was done using
custom routines written in MatLab (Mathworks, Natick, MA).
2.6 - Analysis
Initial analysis was to isolate the spikes in the analog records, and create a listing of
spike times, eye position, and event status (which stimulus was present when the spike
occurred). With this record established, we could then perform further analysis. Detailed
descriptions of the various analysis methods have been included in Appendix D.
2.6.1 - Whitening
The extent to which the spike train whitened was assessed in several ways. The first was
to construct an autocorrelogram of the spike train and obtain the corresponding Fast
Fourier Transform (FFT; e.g., Dan et al 1996). This proved less than perfect in the sense
that we often obtained shifting power spectra across the very same region that Dan et al
(Dan, Atick et al. 1996) observed as flat. Second, we observed significantly more
variability in our noise versus their noise (Dan, Atick et al. 1996). As a result, we were
concerned that some of the responses to our natural images videos were ‘whitened’ but not
detectably so from analysis of the FFT alone. Because of these discrepant results, we considered other
methods to assess efficient coding.
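The first approach (autocorrelogram followed by FFT) can be sketched in Python with NumPy.
This is a minimal illustration under stated assumptions, not the MatLab routines used in the
study: the function names, the 1 msec bin width, and the +/-500 msec lag range are assumptions
chosen for the example.

```python
import numpy as np

def binned_train(spike_times_ms, duration_ms, bin_ms=1.0):
    """Spike counts per bin over [0, duration_ms)."""
    n_bins = int(np.ceil(duration_ms / bin_ms))
    counts, _ = np.histogram(spike_times_ms, bins=n_bins,
                             range=(0.0, n_bins * bin_ms))
    return counts.astype(float)

def autocorrelogram(train, max_lag_bins):
    """Mean-subtracted autocorrelation for lags -max_lag_bins..+max_lag_bins."""
    x = train - train.mean()
    n = len(x)
    # Non-negative lags: dot product of the train with a shifted copy of itself.
    pos = np.array([np.dot(x[:n - k], x[k:]) for k in range(max_lag_bins + 1)])
    # Mirror to obtain the negative lags (autocorrelation is symmetric).
    return np.concatenate([pos[:0:-1], pos])

def power_spectrum(acg, bin_ms=1.0):
    """Magnitude of the FFT of the autocorrelogram and its frequency axis (Hz)."""
    power = np.abs(np.fft.rfft(acg))
    freqs = np.fft.rfftfreq(len(acg), d=bin_ms / 1000.0)
    return freqs, power
```

Under this sketch, a whitened response would appear as a roughly flat stretch of power
across the frequency range of interest.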
To further explore other ways of documenting whitening, we used a modification from
Baddeley et al (Baddeley, Abbott et al. 1997) on theoretical distributions of spike activity
by dense and sparse neurons efficiently coding natural images. In particular, simply
tallying spike counts every 250 msec creates distributions (examples below) that can
deviate from Gaussian in different ways. Here, our interest was in the degree to which
the movies created a platykurtic distribution. Because the hypothesis is that efficient
coding of a rich sample of natural images will yield a broad distribution of interspike
intervals, plotting spike counts should yield a broad distribution of counts best
approximated by a platykurtic (flattened Gaussian) distribution. Kurtosis was easily
measured, and we could measure kurtosis on each epoch of each of the stimuli types to
obtain a mean and standard deviation (reliability) for statistical purposes. Under these
conditions, kurtosis was a direct measure of whitening.
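A minimal Python sketch of the spike-count kurtosis measure follows. The function names are
hypothetical, and the use of excess kurtosis (Gaussian = 0, platykurtic < 0) is an assumption;
the text does not state which kurtosis convention was used.

```python
import numpy as np

def spike_counts(spike_times_ms, duration_ms, window_ms=250.0):
    """Number of spikes in each consecutive window of window_ms."""
    n_windows = int(duration_ms // window_ms)
    counts, _ = np.histogram(spike_times_ms, bins=n_windows,
                             range=(0.0, n_windows * window_ms))
    return counts

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (negative means platykurtic)."""
    v = np.asarray(values, dtype=float)
    d = v - v.mean()
    m2 = np.mean(d ** 2)  # variance (population form)
    m4 = np.mean(d ** 4)  # fourth central moment
    return m4 / m2 ** 2 - 3.0
```

Applying excess_kurtosis to the counts of each epoch would give one value per epoch, from
which a mean and standard deviation per stimulus type can be formed.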
A third measure for potentially assessing efficient coding emerged from plotting each
spike in a space in which position on one axis was the time since the previous spike,
and position along the second axis was the time to the next spike. ‘Whitening’ should be
directly proportional to the degree to which the resulting distribution filled the space
(examples given below). For statistical purposes, we tested the fraction of the sample
which had intervals (either before or after) greater than 100 msec. A 100 msec limit
corresponds to 10 Hz in the frequency domain. It had been noted that the power spectrum
of natural images is inversely related to the spatial frequency. Therefore, the lower
frequency, longer interspike intervals should dominate when encoding natural images.
In fact, this was the rationale for the whitening measured by Dan et al (Dan, Atick et al.
1996). They observed the whitening was best at low frequency i.e. less than 15 Hz.
Empirically, we simply tallied the fraction of each of our epochs (background, noise,
natural images video) occupied by interspike intervals greater than 100 msec. Statistical
analysis of these fractions was done using simple parametric statistics.
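The interval-pair measure can be sketched as follows. This is one reading of the description
above, stated as an assumption: each interior spike is assigned the interval before it and the
interval after it, and the statistic is the fraction of spikes for which either interval
exceeds 100 msec. The function names are hypothetical.

```python
import numpy as np

def isi_pairs(spike_times_ms):
    """Preceding and following interspike interval for each interior spike."""
    t = np.sort(np.asarray(spike_times_ms, dtype=float))
    isi = np.diff(t)
    # The first and last spikes lack one neighbour and are dropped.
    return isi[:-1], isi[1:]

def long_interval_fraction(spike_times_ms, limit_ms=100.0):
    """Fraction of interior spikes with a neighbouring interval above limit_ms."""
    pre, post = isi_pairs(spike_times_ms)
    if pre.size == 0:
        return 0.0
    return float(np.mean((pre > limit_ms) | (post > limit_ms)))
```

Plotting pre against post gives the two-dimensional interval space described above.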
Finally, as discussed below, application of information theory should complement
‘whitening’ of the spike distribution. As will be asserted below, a whitened distribution
for a complex scene (as those associated with natural images or noise) should also
correspond to transmission of the greatest amount of information.
2.6.2 - Information
Analysis of the structure of the spike train alone can offer insight into the transmission
of information (e.g., Strong, Koberle et al. 1998). The underlying principle is that when
an encoder is given an information-rich input, it is expected that the encoder will yield an
output that is also correspondingly rich. ‘Rich’ in terms of the spike train argues, at the
very least, that the interspike intervals should be highly variable. In information theory,
this translates to the concept of entropy when the stimulus has not been specifically tied
to a response, but is related to the response. For these experiments, entropy is the
amount of information required to specify that stimulus exactly given some response.
There is both stimulus entropy and response entropy. For our purposes, we assume a
transparency between the stimulus and the response, a concept that lies at the very heart
of information theory. Therefore, unless there is bias in the encoder’s ability to encode
information (which we are proposing there is), we expect the more complicated the
stimulus, the higher the entropy. Response entropy H(R) is given as:
$$H(R) = -\sum_{i} p(r_i) \log_2 p(r_i)$$
[3.1]
where p(r_i) is the probability of a response r_i. Practically speaking, a response that is
highly probable with all other responses improbable would have low entropy. Entropy is
maximal when there are many responses all at equal probability.
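The behavior of the response entropy can be checked numerically with a short Python sketch.
The function name is hypothetical, and the inputs are assumed to be occurrence counts of each
distinct response.

```python
import numpy as np

def entropy_bits(counts):
    """Shannon entropy (bits) of a response distribution given occurrence counts."""
    c = np.asarray(counts, dtype=float)
    p = c[c > 0] / c.sum()  # drop unused responses; 0 * log2(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))
```

Four equally likely responses give 2 bits, whereas one dominant response with three rare
alternatives (counts of 97, 1, 1, 1) gives roughly a quarter of a bit.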
If we treat the response to the Background stimuli as signal noise (reasonable, as the
contrast is near zero and should result in no stimulus-driven variability), we can further
obtain an estimate of the information for a stimulus type as a whole:
$$I(R,S) = H(R) - H(R \mid S)$$
[3.2]
The variable H(R|S) can be considered the signal noise. The resulting calculation of
I(R,S) is the total information derived from a stimulus associated with some response,
implying a total information rate of that stimulus type. This should result in a close
approximation of the true information rate when the specific stimulus to some response
is unknown.
Strong and colleagues (Strong, Koberle et al. 1998) introduced some methods to measure
both entropy and information in spike trains. Without any explicit knowledge of the
stimuli, and especially ignoring any contribution of signal noise, a spike train with
significant variance contains more information (and under this method of analysis,
greater entropy).
Similar to how Strong et al approached measuring information, one method of
measuring the variability of a signal without consideration of signal noise is to parse the
spike train into ‘words’, with each word composed of m bits (binary digits). The
encompassing time length of a word can be arbitrary but should converge on a size that
yields the most information. At one extreme would be a word of 1 bit, which would
border on meaningless (spike or no spike, i.e. ‘1’ or ‘0’). At the other extreme would be a
word encompassing seconds represented by such a large array of bits that the word
would likely occur only once and have little utility. After extensive testing of various word
sizes we found the optimal length to be 10 bits of 4msec for each bit (total 40msec). This
yielded the greatest variability in allowing separation of the responses to the different
stimuli without exhausting the possible combinations of words. In our data, it is rare to
have 2 spikes occupy one 4 msec bin. The 10 bits/word was chosen as a reasonable
approximation of continuous updates (see discussion).
Using 10 bit words generates a ‘vocabulary’ of 1024 words (2^10). As an example, we also
tried 8 bit words (8 x 4 msec, 32 msec lengths) which generate a vocabulary of 256 words
(2^8). In several cells the 32 msec word length of the 8 bit words resulted in exhaustion of
the variations, thus causing overlaps or repeats in the word representation, resulting in a
decrease in separation among the responses to different stimuli. Since this would
compromise both our measure of information and discrimination, we favored the
vocabulary of 1024 words, which was never exhausted (max used: 850 words in a movie
sequence) and as such would not compromise our measures of entropy or information.
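The word construction can be sketched in Python as follows. This is a sketch under stated
assumptions rather than the analysis code itself: the function names are hypothetical, words
are taken as consecutive non-overlapping blocks (the text does not say whether a sliding
window was used), and a bin holding more than one spike is coded as ‘1’.

```python
from collections import Counter
import numpy as np

def spike_words(spike_times_ms, duration_ms, bin_ms=4.0, bits_per_word=10):
    """Parse a spike train into binary words of bits_per_word bins of bin_ms each."""
    n_bins = int(duration_ms // bin_ms)
    counts, _ = np.histogram(spike_times_ms, bins=n_bins,
                             range=(0.0, n_bins * bin_ms))
    bits = (counts > 0).astype(int)            # spike present / absent per bin
    n_words = n_bins // bits_per_word          # incomplete trailing word is dropped
    blocks = bits[:n_words * bits_per_word].reshape(n_words, bits_per_word)
    return ["".join(map(str, block)) for block in blocks]

def word_entropy_bits(words):
    """Entropy (bits per word) of the observed word distribution."""
    freq = np.array(list(Counter(words).values()), dtype=float)
    p = freq / freq.sum()
    return float(-np.sum(p * np.log2(p)))
```

The number of distinct words in an epoch, len(set(words)), corresponds to the portion of the
1024-word vocabulary that was used.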
Another important requirement of word design is that it differentiates each stimulus
condition. In an ideal scheme, certain words would uniquely differentiate among the
background, noise, and natural images videos. Given this method, we obtained a Chi
square value for each response (a ‘word’) based on the frequency of each word tied to
stimulus condition. From this, we generated a discrimination coefficient. Taking an
arbitrary sequence of words generated from the responses of different stimuli, it is
feasible to make an educated ‘guess’ as to whether that sequence came from background,
noise, or natural images videos. By repetition, we could rapidly develop confidence levels
for sequences, and determine how many words, on average, it took to reliably distinguish
the source of the words.
2.6.3 - Perisaccadic Activity
Saccadic events were marked in the files and time-stamped. A saccade was defined as a
change in eye position of at least 0.25 degrees in 10msec (Schlag-Rey and Schlag 1977).
The activity around the time of the saccade could then be easily referenced. For
statistical analysis, we obtained a perisaccadic interval of +/-200 msec around either the
beginning or the end of the saccade. The most useful plot was then to bin the spike times
at bin widths of 2-25 msec and plot the activity around the saccades in standard
deviation units. The overall profile remained the same when compared among the bin
widths of 2-25 msec. Thus most analysis was performed with 4 msec bins to allow
visual clarity of the data. Given the relatively smooth nature of the plots, deviations
exceeding 2 standard deviations were considered statistically significant.
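The saccade criterion and the perisaccadic histogram can be sketched in Python. Several
choices here are assumptions made for illustration: the function names, the use of combined
horizontal and vertical displacement, marking a saccade at the start of the first 10 msec
window that meets the criterion, and expressing activity in standard deviation units relative
to the mean and standard deviation across the bins of the histogram itself.

```python
import numpy as np

def detect_saccade_onsets(h_deg, v_deg, fs_hz, min_shift_deg=0.25, window_ms=10.0):
    """Times (ms) where eye position first shifts >= min_shift_deg within window_ms."""
    h = np.asarray(h_deg, dtype=float)
    v = np.asarray(v_deg, dtype=float)
    w = max(1, int(round(window_ms * fs_hz / 1000.0)))  # window length in samples
    shift = np.hypot(h[w:] - h[:-w], v[w:] - v[:-w])
    above = shift >= min_shift_deg
    # Keep only rising edges so each saccade is marked once.
    previous = np.concatenate(([False], above[:-1]))
    onsets = np.flatnonzero(above & ~previous)
    return onsets * 1000.0 / fs_hz

def perisaccadic_z(spike_times_ms, saccade_times_ms, half_width_ms=200.0, bin_ms=4.0):
    """Perisaccadic spike histogram in standard deviation units."""
    edges = np.arange(-half_width_ms, half_width_ms + bin_ms, bin_ms)
    spikes = np.asarray(spike_times_ms, dtype=float)
    counts = np.zeros(len(edges) - 1)
    for s in saccade_times_ms:
        c, _ = np.histogram(spikes - s, bins=edges)  # spike times relative to saccade
        counts += c
    sd = counts.std()
    z = (counts - counts.mean()) / sd if sd > 0 else np.zeros_like(counts)
    return edges[:-1], z
```

Bins with an absolute value above 2 in the returned array correspond to the deviations
treated as significant above.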
RESULTS
3.1 - Time-varying natural images produce broad ranges of interspike
intervals
Natural image videos and noise videos modulate the neural activity differently when
compared to each other or the background stimuli. Figure 4A shows the response over a
40 second epoch of a representative LGN neuron that was well-modulated by the
different stimuli presented. It should be readily apparent that as the stimulus conditions
change, the activity also changes. Figures 4B and 4C provide greater detail of transition
points showing a close up of the change in activity. Figure 4B shows greater detail as
the screen shifts from the blank background to the noise pattern, while Figure 4C shows
greater detail as the screen shifts from a blank background to the natural images video.
Although it is clear that a shift in activity occurs, the goal is to determine if the temporal
distribution of spikes shifted in ways that would be consistent with efficient coding
during the presentation of the time-varying natural images. Our hypothesis is that the
encoder, an LGN neuron, is optimally designed to encode the statistical properties of
natural images; therefore, a rich sample of the natural images should manifest as the
greatest usage of the encoder’s ‘vocabulary.’
Figure 5 shows examples of autocorrelograms and the associated FFTs generated from
5 epochs each of Background (Fig. 5A and 5D), Noise (Fig. 5B and 5E), and Natural
Images Videos (Fig. 5C and 5F) for an example cell. The power spectra of the FFT
show that at the lowest frequencies the natural images stimulus has the greatest power,
consistent with the results obtained by Dan and her colleagues (Dan, Atick et al. 1996) in
the LGN of the paralyzed, anesthetized cat. However, different from their result is the
absence of flattening in the frequency ranges below 15 Hz.
[Figure 4: Example Data: 40 Second Excerpt. Trace labels: Background, Natural Images Videos.]
Figure 4.) Excerpt of 40 sec of data for both eye position and spike activity (A). Vertical tick marks indicate
transition points among 3 stimulus conditions in the sequence of background, noise videos, background, and
natural images videos. A 5 sec sample centered on the transition point from background to noise videos (B)
shows a shift in spike activity from a base activity rate to a shifting and dynamic rate at noise video
onset. Another 5 sec sample centered on the transition point from the background to the natural images
video (C) shows a shift in spike activity from a base activity rate during the background stimuli to a shifting
and dynamic rate at the natural images video onset. However, this spike activity rate has a profile that is
unlike that which occurred during the noise video and shows a greater dynamic range.
By combining the ISIs of multiple epochs of the same stimuli (e.g. 20 cycles of
background, noise, or natural image videos) an FFT can be generated that represents the
frequency profiles of the cell’s total response to that stimulus. Figure 6A shows the FFT
power spectra from another cell in which the spectra have been combined for a complete
series of 20 epochs of background (black trace), noise (red trace), and natural images
video (blue trace). Again, similar to Figure 5, except that this is for a complete series,
there is significantly more power in the low frequencies for the natural image videos, and
the power spectrum is relatively higher for the lowest frequencies. Figure 6B shows the
cumulative results (18 of 19 cells: One was removed due to the z-score indicating a
possible outlier) of the relative increase in power for low frequencies during natural
image videos versus the noise pattern. When all cells are considered, the dominance is
only found at the lowest frequencies. For 14/19 cells, the natural images contain more
spectral power at <3Hz than the noise.
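As a rough illustration of the spectral comparison described above, the following Python sketch bins a spike train, computes a power spectrum with a direct discrete Fourier transform, averages the spectra of several epochs, and sums the power below a cutoff frequency. The function names, the bin width, and the subtraction of the mean are assumptions made for illustration; they are not taken from the analysis code used in this study.

```python
import cmath
import math


def bin_spikes(spike_times_ms, duration_ms, bin_ms):
    """Count spikes per bin over the interval [0, duration_ms)."""
    n_bins = int(duration_ms // bin_ms)
    counts = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // bin_ms)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def power_spectrum(counts, bin_ms):
    """Power at each non-negative frequency (direct DFT, O(n^2), sketch only)."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]  # remove the 0 Hz component
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * 1000.0 / (n * bin_ms))  # frequency in Hz
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def mean_spectrum(epochs, duration_ms, bin_ms):
    """Average the power spectra of several epochs of the same stimulus."""
    freqs, spectra = [], []
    for spikes in epochs:
        freqs, power = power_spectrum(bin_spikes(spikes, duration_ms, bin_ms), bin_ms)
        spectra.append(power)
    return freqs, [sum(col) / len(spectra) for col in zip(*spectra)]


def band_power(freqs, power, max_hz):
    """Total power at frequencies above 0 Hz and below max_hz."""
    return sum(p for f, p in zip(freqs, power) if 0.0 < f < max_hz)
```

Under these assumptions, comparing `band_power(freqs, power, 3.0)` between the natural images and noise conditions would correspond to the <3Hz comparison reported for the 19 cells.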
Although the FFT profile resembles that of Dan et al in the lowest frequency ranges, the results
were not equivalent. Dan et al found flat frequency profiles between 5 and
15Hz, whereas our results show a higher power representation of the low
frequencies up to 15Hz, with a variable degree of flatness. Thus,
Figure 6B indicates that, overall, the evidence of whitening is relatively weak in the
natural condition. As will be discussed, the combinations of natural conditions - i.e.
[Figure 5 plots: autocorrelograms (horizontal axis -500 to 500) and FFT power spectra (horizontal axis 0 to 50) for the Background, Noise Video, and Natural Images Video conditions.]
Figure 5.) Comparison of the autocorrelograms of spike activity across 5 epochs of each stimulus condition:
Background (A), Noise Video (B), and Natural Images Video (C). The associated FFT of each
autocorrelogram is also plotted for each stimulus condition: Background (D), Noise Video (E), and
Natural Images Video (F), demonstrating that the neuron’s response to Natural Images Video shows
greater low frequency power than the response to Noise Video stimuli.
wakefulness, eye movements, and light/contrast adaptation - may be inhibiting our
ability to detect an underlying whiteness.
Figure 7 shows the results of the kurtosis analysis. Figure 7A shows the distribution
of spike counts per 250msec generated across 5 epochs of zero contrast background for a
representative LGN neuron. In this case, most counts were below 15 spikes/250 msec.
Figure 7B shows the distribution when the screen is replaced with the noise stimulus.
The distribution is surprisingly similar to the background but shifted to the right, and
most of the data were between 5 and 20 spikes/250 msec. Figure 7C shows the
distribution during the natural image videos; the distribution is clearly broadened. This
is representative of whitening of the spike distributions. Figure 7D shows an example
of one of the most significantly platykurtic (‘flat’) distributions when considering a
complete set of 20 cycles of background (kurtosis: 6.84±1.49), noise (kurtosis:
8.15±2.18), and natural images video (kurtosis: 3.85±1.2). This example demonstrates
the differences in the variability of the spike response profiles for each of the three
stimuli.
Statistical significance was then measured by obtaining the kurtosis for each
epoch (of 20) under each condition and computing the t-value (p < 0.01). Figures 7E
and 7F show the significant distribution with the lowest t-value (7E; kurtosis t-value of
noise versus natural images video of 3.247, critical value 2.861, p < 0.01) and the distribution that
was not significant but closest to being significant (7F; kurtosis t-value of noise versus natural
images video of 2.796, critical value 2.861 at p < 0.01). In Figure 7E, the slight tail on the right was apparently
consistent and had a strong effect on kurtosis; otherwise, the distributions were
strongly Gaussian. Figure 7F did not qualify due to the apparent noise in the
distributions. Overall, of the 19 cells analyzed, 8/19 (42%) were significantly platykurtic
(t-value > 2.861, p < 0.01).
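The kurtosis comparison described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code used here: the function names are hypothetical, the kurtosis is assumed to be the Pearson form (for which a Gaussian distribution gives 3), and the t-statistic is assumed to be paired across epochs. The paired form is an inference from the critical value of 2.861, which matches the two-tailed p < 0.01 threshold for 19 degrees of freedom (20 epochs).

```python
import math


def spike_counts(spike_times_ms, duration_ms, window_ms=250.0):
    """Number of spikes in each consecutive window of window_ms."""
    n = int(duration_ms // window_ms)
    counts = [0] * n
    for t in spike_times_ms:
        i = int(t // window_ms)
        if 0 <= i < n:
            counts[i] += 1
    return counts


def kurtosis(values):
    """Pearson kurtosis m4 / m2^2 (assumed form; a Gaussian gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def paired_t(a, b):
    """Paired t-statistic for per-epoch values a and b (assumed test form)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)
```

With one kurtosis value per epoch for the noise and natural images conditions, `paired_t(noise_kurtosis, natural_kurtosis)` would then be compared against the critical value of 2.861.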
[Figure 6 plots: (A) Complete Series of 20 Sets, power (log scale) versus Frequency (Hz) for Background, Noise Video, and Natural Images Video; (B) All Cells Compiled, ratio versus Frequency (Hz).]
Figure 6.) Compiled FFT of 20 runs of each stimulus showing greater power for the Natural Images Videos
at the lower frequency ranges (A), and a normalized ratio of the Natural Images Video FFT across all cells to
their corresponding Noise Video FFT demonstrating the trend of higher power for the Natural Images Videos
at the lower frequencies. The red trace shows the ratio of power for natural image videos versus the noise at
each frequency. The vertical bars in the figure are standard deviations, indicating significant scatter, with the
strongest frequency representation of natural images videos at ~3 Hz (B). *One outlier removed based on
critical value.
Another method of analyzing specifically the temporal distribution of spikes is to plot the time
preceding a spike (pre) against the time to the next spike (post). Figure 8 shows
examples of these distributions. Figure 8A shows the distribution of spikes
during the zero contrast background for a single epoch. In these plots, a narrow
distribution indicates a single frequency, expressed as an interval rather
than a frequency. As such, the plots should correspond to the FFT, but lack the 0 Hz
component that consistently emerges with FFT plots. Figure 8B shows the same format,
but the spikes were obtained during the noise sequence. Figure 8C shows the
distribution of spikes evoked during the natural image videos. When plotted this way,
the broadest, least focal distribution implies a whitened distribution, or increased variability in the
distribution.
For this example, the broadest distribution is that for the natural images videos. Figures 8D,
8E, and 8F are the same as Figures 8A, B, and C, except that they represent a compilation of
a complete set across 20 cycles. The broader distribution, apparent in the single case of
natural videos (Fig. 8C), applies for the compiled record across 20 cases as well.
Statistically, we noted that what appears to separate the responses to natural images
video from the noise are the long interspike intervals (low frequencies), in particular
those of 100 msec or more (10 Hz or less). Figure 9 uses the same convention as Figure 8
except that the 100 msec line is indicated, and a different cell is displayed. We compared the
fraction of spikes whose before and after intervals were greater than 100msec during the noise
pattern versus the natural images video. Of 19 cells analyzed, 13/19 (68%) showed a
greater ratio of the long intervals (low frequency) for the natural image videos than for
the noise patterns. As with the other 2 parameters (FFT, kurtosis) examined, only a
fraction of the cells showed a tendency to whiten, indicating that whitening, though
efficient in the sense of bandwidth utilization, may not be a defining factor of efficiency of
the LGN neuron.
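The pre/post interval analysis above can be illustrated with a short Python sketch. The function names are hypothetical, and the rule that both the preceding and the following interval must exceed the threshold is an assumption about how the long-interval fraction was defined.

```python
def pre_post_intervals(spike_times_ms):
    """(pre, post) interval pairs for every spike that has both neighbours."""
    t = sorted(spike_times_ms)
    return [(t[i] - t[i - 1], t[i + 1] - t[i]) for i in range(1, len(t) - 1)]


def long_interval_fraction(pairs, threshold_ms=100.0):
    """Fraction of spikes whose pre and post intervals both exceed threshold_ms."""
    if not pairs:
        return 0.0
    n_long = sum(1 for pre, post in pairs
                 if pre > threshold_ms and post > threshold_ms)
    return n_long / len(pairs)
```

Plotting the pairs on logarithmic axes would give scatter plots of the kind described for Figures 8 and 9, and comparing `long_interval_fraction` between conditions corresponds to the 100 msec comparison.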
[Figure 7 plots: Probability versus Spike Counts for Background, Noise Video, and Natural Images Video; compiled panels titled "Compiled 20 Sets - Highly Significant", "Compiled 20 Sets - Least Significant", and "Compiled 20 Sets - Closest Not Significant".]
Figure 7.) Kurtosis plots of the number of spike counts occurring within 250msec, normalized to integrate to 1. The
first 5 epochs of each stimulus presentation: Background (A), Noise Video (B), and Natural Images Video (C),
showing as an example that the whitening effect occurred for each set in a series of 20. Also displayed are the
compiled series spike counts for the cell with the most significant kurtosis (D), the cell with the least significant
kurtosis (E), and a cell that is the closest to being significant (F).
[Figure 8 plots: pre- versus post-interspike interval scatter plots for Background, Noise Video, and Natural Images Video.]
Figure 8.) Plots of pre-interspike interval vs post-interspike interval for 5 epochs of each stimulus
condition: Background (A), Noise Video (B), and Natural Images Video (C). Also displayed are
the compiled pre/post interspike intervals for a full series of 20 sets: Background (D), Noise
Video (E), and Natural Images Video (F). Because of the concentration of spikes with almost the same pre and
post ISIs, it appears that the response to the Background stimuli has a tendency to approach one specific
frequency. The noise has a concentration towards high frequency spike intervals, while it is apparent that the
response to the natural image stimuli is broad, indicating a varied spectrum of interspike intervals.
[Figure 9 plots: pre- versus post-interspike interval scatter plots (axes 1 to 1000) for Background (A, D), Noise Video (B, E), and Natural Images Video (C, F).]
Figure 9.) Plots of pre-interspike interval vs post-interspike interval for 5 epochs of each stimulus
presentation: Background (A), Noise Video (B), and Natural Images Video (C). Also displayed are the
compiled pre/post interspike intervals for a full series of 20 sets: Background (D), Noise Video
(E), and Natural Images Video (F). Although these are plots of a different cell from Figure 8, it is readily
apparent that the two cells share a similar response profile to the different stimuli. Here a line is drawn in
each condition indicating the separation of interspike intervals which were beyond 100ms (<10Hz). The
ISIs > 100ms represented 9.1% of spike intervals for Natural Images Videos, but only 2.9% of
the spike intervals for the Noise Videos, further demonstrating the greater power of the low frequencies for
the Natural Images Videos.
3-2 - LGN neurons transmit higher rates of information in response to natural images
We calculated the entropy and related information rates that occur when the monkey
was viewing the blank screen, the noise patterns, or natural image videos. Figure 10
provides some justification for using 10 bit, 40msec words (1024 possible words, 2^10).
Figure 10A shows the frequency distribution of 10 bit, 40msec words (10 x 4msec) used
during 20 epochs of background, noise, or natural images video (log scale). The words
used during the background have the smallest vocabulary (412 words), the noise stimuli
the next smallest vocabulary (503 words), and the natural images videos the largest
vocabulary (842 words). In Figure 10B, when the same spike trains are converted to 8
bit/32msec words (8 x 4msec) under the same conditions, natural image videos exhaust
the 256 word vocabulary. Ideally the vocabulary should be used, but not exhausted.
Figure 10A also shows that more information is transmitted during the natural images
videos; the integral difference between the words produced by the noise pattern and
the natural images videos should be directly proportional to information. Among the 19
cells analyzed, 9 had the longest tails (greatest information) provided by the natural
images stimuli.
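The conversion of a spike train into 10 bit, 40msec words can be sketched in Python as below. The function names are hypothetical, and two choices are assumptions: a bin is set to 1 if it contains at least one spike, and words are taken from consecutive, non-overlapping groups of bins.

```python
def spike_words(spike_times_ms, duration_ms, bin_ms=4.0, word_bins=10):
    """Encode a spike train as integer words of word_bins binary bins each."""
    n_bins = int(duration_ms // bin_ms)
    bits = [0] * n_bins
    for t in spike_times_ms:
        i = int(t // bin_ms)
        if 0 <= i < n_bins:
            bits[i] = 1  # assumed: 1 if the bin holds at least one spike
    words = []
    # assumed: consecutive, non-overlapping words
    for start in range(0, n_bins - word_bins + 1, word_bins):
        word = 0
        for b in bits[start:start + word_bins]:
            word = (word << 1) | b
        words.append(word)
    return words


def vocabulary_size(words):
    """Number of distinct words used (at most 2 ** word_bins)."""
    return len(set(words))
```

Calling `spike_words(spikes, duration_ms, 4.0, 8)` would give the 8 bit, 32msec encoding, whose vocabulary is capped at 256 words.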
It was useful to determine how often and how rapidly unique words were formed. Figure
11 tracks word usage for individual epochs and combined sets (10 bit, 40msec words).
Figure 11A plots the cumulative count of new words that appear over time for each of 5
movies in response to the presentation of the zero contrast background. There is some
variance, including one in which not even 100 unique words appear. Figures 11B and
11C use the same convention, but track the appearance of new words for the noise
patterns and natural images respectively. Interestingly, the production of new words is
not much different between noise and natural image videos. Figures 11D-F show the
cumulative effects on new word production across epochs (20 each), yielding quite
different results. Figure 11D shows the cumulative effects for the background stimuli,
Figure 11E shows cumulative effects for the noise stimuli, and Figure 11F shows
cumulative effects for the natural image videos. The responses to natural images stand
out, exhausting the greatest portion of the vocabulary. Combining the left and right
graphs leads to the following: it appears natural images exhaust more words not
because they use more words in each epoch (Figure 11C), but because with each epoch they
generate new words that were not represented in the previous epochs. The effect is
strong for the natural image videos, but much less pronounced for the noise patterns
despite the increased complexity and uniqueness (9/19 cells).
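The two views of word production contrasted above (within a single epoch versus accumulated across epochs) can be sketched in Python as follows. The function names are hypothetical, and words are assumed to be integer codes for each 40msec window.

```python
def cumulative_unique(words):
    """Running count of distinct words seen within one epoch."""
    seen = set()
    curve = []
    for w in words:
        seen.add(w)
        curve.append(len(seen))
    return curve


def cumulative_unique_across_epochs(epochs):
    """Running count of distinct words when the vocabulary carries over epochs."""
    seen = set()
    curve = []
    for words in epochs:
        for w in words:
            seen.add(w)
            curve.append(len(seen))
    return curve
```

If two conditions give similar single-epoch curves but one keeps rising in the across-epoch curve, that condition is producing words in later epochs that did not occur in earlier ones.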
Using the algorithms for entropy, and using the zero contrast background to estimate
our neural ‘noise’, we determined entropy (information transmission) for each of 19 cells
under either noise or natural image videos. Response entropy was highest for the natural
image videos in 8/19 cells (42%). This result implies that in those 8 cells, the natural
images videos produced the greatest variation in responses when compared to the noise
stimuli.
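For reference, a minimal Python sketch of a response entropy computed from a word distribution is given below. It uses the standard Shannon formula with a plug-in probability estimate; the function names are hypothetical, and the sketch omits the step in which the zero contrast background is used to estimate neural noise, since the exact form of that step is not specified here. Plug-in estimates of this kind are known to be biased downward when the number of samples is small relative to the vocabulary.

```python
import math
from collections import Counter


def word_entropy_bits(words):
    """Shannon entropy (bits per word) of the observed word distribution."""
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_rate_bits_per_sec(words, word_ms=40.0):
    """Entropy per word divided by the word duration in seconds."""
    return word_entropy_bits(words) / (word_ms / 1000.0)
```

A spike train that uses four words equally often carries 2 bits per word, or 50 bits/sec with 40msec words.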
[Figure 10 plots: word usage counts (log scale) versus unique word for Background, Noise Video, and Natural Images Video; panel "Word Size - 10 Bins" with axis Unique Word (1024) and panel "Word Size - 8 Bins" with axis Unique Word (256).]
Figure 10.) Plots of word usage counts relative to unique word usage when using 10 bins (A) vs. 8 bins
(B). Usage of 8 bins resulted in an exhaustion of the unique word vocabulary, which results in equal
probability representation for vastly different stimuli and decreases the deviations among stimulus types,
thus resulting in less predictive power when using individual word combinations to predict which stimulus
produced a response.
[Figure 11 plots: unique words versus Time (Word Size * Bin Size = 40ms) for Background, Noise Videos, and Natural Images Video (horizontal axis 0 to 400), and for Background Total, Noise Videos Total, and Natural Images Video Totals (horizontal axis 0 to 8000).]
Figure 11.) Plots of unique words used over time by stimulus condition. Example of word consumption
rates of 5 runs of each stimulus type: Background (A), Noise Videos (B), and Natural Images Videos (C).
Compiled unique word usage across all 20 runs for a series of Background stimuli (D), Noise Video
stimuli (E), and Natural Images Video stimuli (F). Although across all conditions the time was limited to
15sec samples per stimulus per set, the unique word usage of the neuron during Natural Images Video
presentation was the greatest among the different stimulus presentations.
3-3 - Stimulus conditions differentially modulate perisaccadic activity
It is reasonable to assume that eye movements alter activity of LGN neurons since the
movement causes the receptive field to be swept across a textured screen. The visual
texture (stimulus) on the screen should influence the response observed. It may seem
surprising that the blank screen elicits a response, but this may in part reflect imperfect
homogeneity of illumination or effects of the extraclassical receptive field (Alitto and
Usrey 2008). Figure 12A shows a sample of 20 horizontal (red traces) and vertical
(green traces) eye movement traces aligned to the onset of an eye movement (onset
indicated by arrow). Figure 12B shows the action potentials (raster ticks) for
each of the 20 eye movements with the activity aligned to movement onset. For 20
saccades, it is difficult to discern any modulation associated with the eye movement.
However, Figure 12C shows the perisaccadic response histogram (2msec bins)
constructed from 2207 eye movements. With this much greater number, modulation is
apparent. Following saccade onset, there is a depression in activity followed by an
excitation; what is not readily apparent, however, is the presaccadic suppression trend.
Most neurons generated responses to eye movements (15/19 cells), though most had
significant variations in activity from which no single trend emerged. The presence of the
zero contrast background screen generated the least amount of activity. This is
predictable, since the amount of visual texture is minimal and the typical response to the
background presentation has the lowest rate.
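A perisaccadic histogram of the kind described above can be sketched in Python as follows. The function name and the ±200msec window are assumptions made for illustration (the window matches the range of the perisaccadic time axes in the figures); spike and saccade times are assumed to be in milliseconds on a common clock.

```python
def perisaccadic_histogram(spike_times_ms, saccade_onsets_ms,
                           window_ms=200.0, bin_ms=2.0):
    """Mean firing rate (spikes/sec) in bins aligned to saccade onset."""
    if not saccade_onsets_ms:
        raise ValueError("at least one saccade is required")
    n_bins = int(2 * window_ms // bin_ms)
    counts = [0] * n_bins
    for onset in saccade_onsets_ms:
        for t in spike_times_ms:
            rel = t - onset  # spike time relative to saccade onset
            if -window_ms <= rel < window_ms:
                counts[int((rel + window_ms) // bin_ms)] += 1
    # convert summed counts to spikes/sec averaged over saccades
    scale = 1000.0 / (bin_ms * len(saccade_onsets_ms))
    return [c * scale for c in counts]
```

Bin `n_bins // 2` starts at saccade onset. Aligning to saccade end instead would only require passing the saccade end times as the second argument.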
As others have reported, there is a suppression of activity preceding the saccade
followed by an enhancement that peaks ~100msec following saccade onset (Lee and
Malpeli 1998; Reppas, Usrey et al. 2002). Figure 13 shows the responses of an LGN
neuron that was modulated around the time of the saccade, but showed similar patterns
independent of background. Figure 13A shows a histogram (4msec bins) of the
perisaccadic activity (~1000 saccades) with the zero contrast background present. The
dotted lines indicate where the activity is modulated 2 standard deviations from the mean.
Figure 13B shows the equivalent activity associated with the noise stimulus, and
Figure 13C shows the activity associated with the natural images videos. This cell
showed the interesting and consistent property of decreased activity associated with
saccade onset, followed by excitation that emerged following the end of the saccade.
Particularly noteworthy is the downshift in activity that precedes the eye movement
(7/19 cells), especially noticeable during the natural images videos (Figure 13C).
Figures 13D-F are identical to Figures 13A-C except that the histograms are aligned
to saccade end rather than the beginning.
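Expressing a histogram in standard deviation units, with bins beyond 2 standard deviations flagged, can be sketched in Python as below. The function names are hypothetical, and the use of the histogram's own mean and population standard deviation as the reference is an assumption; the baseline actually used for the figures is not specified here.

```python
import math


def z_score(hist):
    """Histogram expressed in standard deviations from its own mean."""
    n = len(hist)
    mean = sum(hist) / n
    sd = math.sqrt(sum((h - mean) ** 2 for h in hist) / n)
    if sd == 0.0:
        return [0.0] * n  # flat histogram: no modulation
    return [(h - mean) / sd for h in hist]


def modulated_bins(hist, threshold_sd=2.0):
    """Indices of bins deviating from the mean by more than threshold_sd."""
    return [i for i, z in enumerate(z_score(hist)) if abs(z) > threshold_sd]
```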
Although some cells show a basic suppression (12/19 cells) across different stimuli, other
cells demonstrate subtle stimulus preferences in the response profile. Figure 14 shows an
example of another LGN cell with a similar profile, but one in which the downshift
preceding the saccade began sooner and was much more pronounced when the monkey
was watching natural image videos than either the zero contrast background or the noise
pattern.
Other cells demonstrate a saccadic enhancement (3/19 cells) instead of saccadic
suppression such as previously reported (Lee and Malpeli 1998; Reppas, Usrey et al.
2002). Figure 15, same convention as Figure 14, provides the strongest evidence for a
pre-saccadic depression, but only the natural images show a significant post-saccadic
enhancement. There are two interesting points on this profile. First, the depression
precedes the saccade. Because the eye is not moving, such depression can only be
accounted for if there is an oculomotor component actively suppressing the signal.
Second, and perhaps most interesting, is the fact that saccadic enhancement was only
associated with eye movements made when the background was time-varying natural
images.
[Figure 12 plots: eye position traces, spike raster, and perisaccadic histogram; scale bars 50ms.]
Figure 12.) Saccadic eye movement effects on spike activity during natural image video stimuli. Eye
position is aligned to saccade onset (A - Red Trace: Horizontal Eye Movement, A - Green Trace:
Vertical Eye Movement). Raster plot of spike activity during the first 20 saccades aligned to the eye position
trace (B), showing little in the way of saccadic modulation. Histogram of compiled spike activity of all
saccades (2207) during 20 Natural Images Video presentations aligned to the eye position trace (C), showing a
presaccadic and a significant saccadic suppression with a post-saccadic enhancement.
[Figure 13 plots: histograms in standard deviation units (-4 to 4) versus Perisaccadic Time (-200 to 200), aligned to onset and aligned to end, for Background, Noise Video, and Natural Images Video.]
Figure 13.) Typical histogram plots, in standard deviation units of the spike activity mean, during saccadic
eye movement under all three stimulus conditions, aligned to both saccade onset and saccade end and binned to 4msec
time intervals: Background - Saccade Onset (A), Noise Videos - Saccade Onset (B), Natural Images Videos
- Saccade Onset (C), Background - Saccade End (D), Noise Videos - Saccade End (E), Natural Images
Videos - Saccade End (F). All stimulus types show a saccadic suppression effect. Presaccadic suppression,
though not explicitly significant, appears as a trend in this cell among all three stimulus conditions.
[Figure 14 plots: histograms in standard deviation units (-4 to 4) versus Perisaccadic Time (-200 to 200) for Background, Noise Video, and Natural Images Video.]
Figure 14.) Histogram plots, in standard deviation units of the spike activity mean, aligned to saccadic eye
movement onset under all three stimulus conditions: Background (A), Noise Videos (B), and Natural Image
Videos (C), binned to 4msec (same cell as Figure 13). A presaccadic suppression
trend is noticeable only in the response to natural image videos.
[Figure 15 plots: histograms in standard deviation units (-4 to 4) versus Perisaccadic Time (-200 to 200) for Background, Noise Video, and Natural Images Video.]
Figure 15.) Histogram plots, in standard deviation units of the spike activity mean, during saccadic eye
movement under all three stimulus conditions: Background (A), Noise Videos (B), and Natural Images
Videos (C), binned to 4msec. All stimulus types show a presaccadic suppression effect. Presaccadic
suppression is significant and does not seem related to stimulus type. However, a significant post-saccadic
excitation in this cell appears to be evident only in the natural images video presentation.
DISCUSSION
The above results are surprisingly novel. Normally, the first rule in science is
observation in the natural habitat. Unfortunately, for the early visual system, this idea
has been sidetracked due to technical difficulties. Failure to accommodate these
conditions prevents accurate descriptions of normal activity, including assessing an LGN
neuron’s encoding potential. Here we have attempted to describe the LGN neuron in as
natural an environment as is currently feasible.
The general approach used here is ecologically sensitive, and attempts to develop models
of early vision derived from animals that actually perceive in a pseudo-natural
environment. The overall project has two parts, a quantitative approach in which we seek
to recover the receptive field from the flow of spikes as the animal views time-varying
natural images (Theunissen, Sen et al. 2000; Theunissen, David et al. 2001; Dong, Kelso
et al. 2002; Dong and Weyand 2007), and a qualitative approach presented here in
which we capture some of the statistical properties of early visual neurons as the monkey
views time-varying natural images. Although qualitative descriptions lack the high level
prediction potentially available from quantitative receptive field models, their utility is
that they can capture some of the design principles by which information is transmitted
through the nervous system (e.g. (Strong, Koberle et al. 1998)).
Analysis of the temporal properties of LGN spikes demonstrates the variability of the
spike train under different stimulus conditions. Application of ideas from information theory
indicates a complementary result: information transmission is highest during
presentation of time-varying natural images, and there is evidence that saccades alter
information flow differently when the eyes move across natural scenes. The importance
of this qualitative study is enhanced by the fact that we currently have no data on
information flow through the LGN in the awake animal viewing time-varying natural
images. The data obtained here provide ample evidence for the necessity of developing
models in a more natural environment.
Carandini and his colleagues (Mante, Bonin et al. 2008) derived a model of LGN activity
under ‘natural conditions’ from data obtained from paralyzed, anesthetized cats. An
audio clip of the predicted activity of an off-center cell synchronized to a ‘cat cam’ (video
camera mounted on a cat walking through high grass) can be obtained at
http://ww w .neuron.or2/cgi/content/full/A 874/625/D C i/. Ignoring the fact that gaze
shifts in cats are much less common, the activity generated in the LGN of the awake
monkey (e.g. Figure 4) is easily 4 times greater than what is observed in the cat. It is
especially important to note that saccade-related activity in the above mentioned cat
video was derived from the temporal response properties of the neuron in the paralyzed,
anesthetized animal.
In the discussion below, we focus on three issues related to the results: the significance
of whitening in the awake, behaving animal viewing time-varying natural images; the
complementary observation that information rate is highest when the stimuli are natural
images even if the noise pattern has greater entropy; and the observation that in the
presence of natural images, saccadic eye movements can alter information flow in ways
that are different from other stimulus sets.
4.1 - Spike train modulation
We were not able to replicate prior evidence of whitening, gleaned from paralyzed,
anesthetized cats. However, several factors that are not present in the paralyzed,
anesthetized animal could contribute to the absence of whitening. Wakefulness may
introduce non-linearities not appreciated in anesthesia. In addition, contrast and
luminance gain control may contribute to non-linearities that would be present in the
response to unaltered images. Because the study by Dan et al limited the contrast and
luminance ranges, this effect would not be observable. Finally, eye movements clearly
alter the power of both low and high frequencies, further altering
response profiles. This contribution would, of course, be absent in the paralyzed,
anesthetized condition.
As discussed above, we know that wakefulness affects transmission, such that both the
numbers of spikes generated and the temporal distribution of those spikes are perturbed.
The previously identified effect of whitening could be associated with anesthesia, in that
the cells behave more linearly, are depressed, and resort to some baseline activity that is
strongly biased towards the whitened distribution. It is certain that the spike rates under
the awake condition are several times greater than while under anesthesia.
That wakefulness might introduce more non-linearities in the response is suggested by
an intracellular study in the LGN of the cat (Hirsch, Fourment et al. 1983). These
investigators wanted to measure the membrane potential of LGN cells in different states.
They commented that the level of synaptic bombardment in wakefulness was so high
that the membrane potential was completely unstable and impossible to measure. Such
unstable behavior would not be associated with the linear properties found in anesthesia.
It is significant that they were able to obtain membrane potentials during both slow-wave
and REM sleep, further implying that the LGN neuron during wakefulness is
significantly different from one that is under anesthesia.
Eye movements, because they appear to both depress and facilitate transmission (results
above; (Lee and Malpeli 1998; Reppas, Usrey et al. 2002)), would appear to disrupt the
spike distribution, and thus alter the kurtosis. However, kurtosis analysis of spike
distributions restricted to fixation periods surprisingly showed that the significance of a
platykurtic distribution was the same whether the analysis was restricted to fixation
or included saccades, even though there is a non-significant trend towards being less
platykurtic when considering only fixational periods.
It may be that the general pattern of depression preceding and during the saccade, and
enhancement after the saccade, introduces additional low and high frequencies which
contribute to a slightly more flattened distribution. Transmission gain in the context of
whitening was not discussed by Dan et al (Dan, Atick et al. 1996). The problem is that gaze
shifts are especially common in primates, and gaze shifts obviously alter excitability (e.g.,
Figs. 13, 14, 15, 16) and spike train behavior. What is not clear is how such shifts in
activity should be factored into a scheme of whitening which deals with efficient coding
of the image only.
In real life, the range of luminance and contrast values associated with natural scenes is
sufficiently variable that luminance and contrast adaptation mechanisms (both
non-linear) are operating. When artificially constrained, as done by Dan et al (Dan, Atick
et al. 1996), these variables remain linear, likely increasing the probability of whitening.
Although Dan et al (Dan, Atick et al. 1996) provided some evidence for whitening at the
level of the LGN, it has not been shown that the retina is not also whitened by time-varying
natural images. It would be of interest to know if the retinal ganglion cells are
temporally decorrelated, and if they are, to what extent they are decorrelated. From what
is known of the fidelity of retinogeniculate transmission and efficacy of transfer (e.g.
(Weyand 2007)), it is difficult to extrapolate how temporal decorrelation would suddenly
emerge at the level of the LGN.
4.2 - Natural images produce higher information rates
The idea that one can read a ‘spike code’ is a relatively recent extension of approaches to
sensory processing. Traditional studies, such as those discussed above and in the
Appendix, do this indirectly, as they focus on the transform relating stimulus attributes
to spike probability. Bialek and his colleagues (Bialek, Rieke et al. 1991; Strong,
Koberle et al. 1998) have made significant contributions to Information Theory by
exploring the behavior of spike trains and their potential information content. Here, we
constructed our own ‘vocabulary’ of 40 msec words and then tested how this vocabulary
was used under different stimulus conditions. For the most part, natural image videos
exploited the vocabulary the most. This observation complements the observed
increased variability of the distribution of the spike train. It can be considered another
way of indicating that the interspike intervals were variable: the more variability one
observes in interspike intervals, the more ‘words’ should be produced. The results
here complement recent work in primary visual cortex of the awake monkey indicating
that information transmission is increased using static natural image frames over artificial
image frames (e.g. gratings; (Vinje and Gallant 2002)).
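The notion of a ‘vocabulary’ can be made concrete with a minimal sketch. It assumes that each 40 msec word is a string of binary letters, one per time bin, and that words do not overlap. The 5 msec bin width and the function names are illustrative assumptions, not values or routines taken from this study. The quantity computed is the entropy of the word distribution, which is only an upper bound on information; a full estimate in the manner of Strong et al. would also subtract the entropy of responses to repeated stimuli.

```python
import math
from collections import Counter

def spike_words(spike_times_ms, duration_ms, bin_ms=5, word_ms=40):
    """Cut a spike train into non-overlapping binary words.

    bin_ms is an assumed letter width; word_ms follows the 40 msec words in the text.
    """
    n_bins = int(duration_ms // bin_ms)
    letters = [0] * n_bins
    for t in spike_times_ms:
        i = int(t // bin_ms)
        if 0 <= i < n_bins:
            letters[i] = 1  # 1 means at least one spike fell in this bin
    per_word = word_ms // bin_ms
    n_words = n_bins // per_word
    return [tuple(letters[k * per_word:(k + 1) * per_word]) for k in range(n_words)]

def word_entropy_bits(words):
    """Shannon entropy (bits per word) of the observed word distribution."""
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A perfectly regular train uses one word; a jittered train uses several.
regular = spike_words([0, 40, 80, 120], duration_ms=160)
jittered = spike_words([0, 45, 90, 135], duration_ms=160)
print(word_entropy_bits(regular), word_entropy_bits(jittered))
```

Dividing the entropy per word by the word duration gives an entropy rate, which is the sense in which a more variable spike train ‘exploits the vocabulary’ more.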
An idea related to information transmission is stimulus prediction based on expected
information transmission. This relates to the ease with which one can examine the spike
train and determine which of the three stimulus sets (background of zero contrast, noise,
or natural images) was on the screen. From the 40 msec words produced, we computed a
Chi square value for each response and used that to construct coefficients that weighted
the occurrence of that word with that stimulus set. When tested, random sampling of 3
words in sequence (120 msec) was sufficient in most cases to identify which stimulus was
present. Success or failure varied with the cell, and ‘magic’ words that specifically
indicated which stimulus was present were highly limited and even then related to only a
select range of cells. In other words, reading the spike code with this method typically
took at least 3 words to indicate which stimulus was present, and there was no evidence
of specific words associated with particular stimuli that survived across cells.
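One possible reading of this procedure is sketched below. It is an assumption-laden illustration rather than the analysis code used here: each (stimulus set, word) cell of a contingency table receives a signed Chi square contribution as its coefficient, and a short sequence of words is assigned to the stimulus set with the largest summed coefficient. The function names, the signing convention, and the toy word labels are all hypothetical.

```python
from collections import Counter

def chi_square_weights(words_by_stimulus):
    """Signed chi-square contribution for every (stimulus, word) cell.

    Positive weight: the word occurs more often under that stimulus than
    expected if words were independent of stimulus; negative: less often.
    Assumes every stimulus set contributes at least one word.
    """
    counts = {s: Counter(ws) for s, ws in words_by_stimulus.items()}
    row_total = {s: sum(c.values()) for s, c in counts.items()}
    col_total = Counter()
    for c in counts.values():
        col_total.update(c)
    grand_total = sum(row_total.values())
    weights = {}
    for s in counts:
        for w in col_total:
            expected = row_total[s] * col_total[w] / grand_total
            diff = counts[s][w] - expected
            sign = 1.0 if diff >= 0 else -1.0
            weights[(s, w)] = sign * diff * diff / expected
    return weights

def identify_stimulus(weights, observed_words, stimuli):
    """Pick the stimulus whose summed weights over the observed words is largest."""
    scores = {s: sum(weights.get((s, w), 0.0) for w in observed_words) for s in stimuli}
    return max(scores, key=scores.get)

# Toy data: words are labelled 'a', 'b', 'c' instead of binary patterns.
training = {
    "blank": ["a"] * 8 + ["b"] * 2,
    "noise": ["b"] * 8 + ["c"] * 2,
    "natural": ["c"] * 8 + ["a"] * 2,
}
weights = chi_square_weights(training)
print(identify_stimulus(weights, ["c", "c", "a"], list(training)))
```

With three words drawn mostly from the vocabulary favored by one stimulus set, the summed coefficients single out that set, which mirrors the observation that about 3 words (120 msec) were usually needed.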
4.3 - Saccades alter the flow of information through the LGN
Traditional approaches to visual perception subscribe to the idea that we pick up
information with each fixation, and then shift gaze to a new location to pick up our next
morsel of information. The role of the gaze shifts is minimized. Psychophysical
experiments indicate that visual perception is minimal within the peri-saccadic interval.
The observation that saccades appear to alter the gain of retinogeniculate efficacy would
be consistent with the psychophysics.
The basic functions relating neural activity to saccade timing observed in recent studies
in the cat and monkey LGN (Lee and Malpeli 1998; Reppas, Usrey et al. 2002) were
evident here (Figs. 13, 14, 15, 16). This function includes a depression that precedes the
saccade by up to 100 msec, and an excitation that outlasts the end of the saccade by
another 100 msec (Lee and Malpeli 1998). Further, in our case, saccades appeared to
have little influence on whether whitening was or was not observed. These observations
seem to create more questions than answers. Given the fact that saccades alone alter the
excitability even when the head is fixed (Lee and Malpeli 1998; Reppas, Usrey et al.
2002), saccade timing should be incorporated into the basic models of LGN neurons.
Ignoring saccades would not necessarily kill prediction but, based on the response
histograms, would appear to create an error term that could exceed 10%.
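The response histograms referred to above are peri-saccadic time histograms. A minimal sketch of how such a histogram can be built follows, assuming spike times and saccade onset times in milliseconds on a common clock. The window, the bin width, and the function name are illustrative choices, not the parameters used for Figs. 13-16.

```python
def perisaccadic_histogram(spike_times_ms, saccade_onsets_ms,
                           window_ms=(-200, 300), bin_ms=10):
    """Average firing rate (spikes/sec) in bins aligned to saccade onset.

    window_ms and bin_ms are assumed values chosen for illustration.
    """
    lo, hi = window_ms
    n_bins = (hi - lo) // bin_ms
    counts = [0] * n_bins
    for onset in saccade_onsets_ms:
        for t in spike_times_ms:
            rel = t - onset  # spike time relative to this saccade
            if lo <= rel < hi:
                counts[int((rel - lo) // bin_ms)] += 1
    # Convert counts to a rate, averaged over saccades.
    scale = 1000.0 / (bin_ms * len(saccade_onsets_ms))
    return [c * scale for c in counts]

# One spike 50 msec after a single saccade lands in the bin starting at +50 msec.
hist = perisaccadic_histogram([1050.0], [1000.0])
print(hist[25])
```

A pre-saccadic depression would appear as bins below baseline to the left of time zero, and a post-saccadic excitation as bins above baseline to the right.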
The other aspect of saccades perturbing information flow was whether the presence of
natural image videos affected spike probability in ways that were different than that
observed under other conditions. The type of background present when the eyes move
has obvious influence on excitability. To our knowledge, no one has reported shifts in
excitability among retinal ganglion cells when the eyes move in the dark. This is not the
case in the LGN. Lee and Malpeli (Lee and Malpeli 1998) observed small changes in
excitability among LGN cells to movements in the dark. Interestingly, the pattern of
changes was exactly the same as in the light, but on a much smaller scale. As long as the
environment is truly dark, such observations prove an oculomotor influence on
excitability. Once light is added, excitability goes up significantly, and providing a high
contrast background of grating patterns provides an excellent backdrop for maximizing
activity (e.g. (Noda 1975)).
Since the background matters, what is the consequence of a natural image background?
We found several examples (e.g. Figures 14 and 15) where natural image backgrounds
appeared to selectively influence activity. The extent to which other cells show
such selectivity is yet another factor that should be incorporated into any model.
Currently, the influence of saccades on excitability is not included in any
models of LGN cells (e.g. (Mante, Bonin et al. 2008)). Recent work in our laboratory has
demonstrated that including saccade timing in models of the LGN improved prediction
(Dong and Weyand 2009).
4.4 - Conclusions
Perhaps surprisingly, the general approach here is pioneering in the field of vision. If one
wants to understand the neuronal substrates for vision, it behooves one to study these
substrates in animals that are conscious and perceiving, and we would argue that the
stimulus sets should at least include natural images. Although such approaches have
become more popular, there is a continuing inclination to use paralyzed,
anesthetized subjects or, alternatively, to use awake subjects yet create artificial scan
patterns that do not appropriately replicate the unique features of natural images (e.g.,
(Vinje and Gallant 2002)).
Both Theunissen and his colleagues (Theunissen, Sen et al. 2000; Theunissen, David et
al. 2001) and Dong and his colleagues (Dong, Kelso et al. 2002; Dong and Weyand 2007;
Dong and Weyand 2009) have shown that time-varying natural stimuli can be used to
recover the receptive field of geniculate neurons using the powerful technique of reverse
correlation. Most important, both groups have demonstrated that models derived from
artificial stimulus sets, or simple linear convolutions, are inferior to
those using natural stimulus sets.
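In its simplest form, reverse correlation reduces to a spike-triggered average: the mean stimulus that preceded each spike. The sketch below assumes a one-dimensional stimulus sampled once per time bin and a matching list of spike counts; the function name and data layout are illustrative. For correlated stimuli such as natural images, the raw average would additionally have to be corrected for the stimulus autocorrelation, a step omitted here.

```python
def spike_triggered_average(stimulus, spike_counts, n_lags):
    """Mean stimulus value at each lag (0 .. n_lags-1 bins) before a spike.

    stimulus and spike_counts are equal-length sequences, one entry per time bin.
    """
    sta = [0.0] * n_lags
    n_spikes = 0
    for t in range(n_lags - 1, len(stimulus)):
        k = spike_counts[t]
        if k:
            for lag in range(n_lags):
                sta[lag] += k * stimulus[t - lag]  # weight by spikes in this bin
            n_spikes += k
    if n_spikes == 0:
        return sta  # no spikes: return zeros rather than divide by zero
    return [v / n_spikes for v in sta]

# Each spike here follows a bright frame by exactly one bin.
stimulus = [0, 1, 0, 0, 1, 0]
spikes = [0, 0, 1, 0, 0, 1]
print(spike_triggered_average(stimulus, spikes, n_lags=2))
```

The peak at a lag of one bin recovers the latency built into the toy data; with a spatial stimulus the same average taken per pixel gives a spatiotemporal receptive field estimate.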
In this study, we analyzed the statistical properties of LGN neurons in the
awake, behaving monkey as it viewed time-varying natural images. We found evidence
that the spike train was distributed more variably during presentation of natural images, but
not during noise patterns, which were actually richer in the spatiotemporal domain. This
observation is consistent with the idea that the early visual system is designed to
efficiently code natural images. Second, we found that information rates were greater
during those epochs when the monkey viewed natural images versus the noise patterns.
This result complements the observation of unique variability during observation of
natural images, and further bolsters the idea that the nervous system developed to most
efficiently code natural images. Finally, we observed exactly the same temporal
excitability functions other investigators have reported associated with the peri-saccadic
interval. Peri-saccadic excitability shifts in some cells are just as prevalent with natural
images as with other stimuli. However, it appears that natural image presentation provides a
unique saccadic response profile not evident with other stimulus conditions. It is essential
that these shifts associated with stimulus features be incorporated into future models of
LGN neurons.
REFERENCE PAGE
Alitto, H. J. and W. M. Usrey (2008). "Origin and Dynamics of Extraclassical Suppression in the
Lateral Geniculate Nucleus of the Macaque Monkey." 57(1): 135-146.
Alonso, J. M., W. M. Usrey, et al. (1996). "Precisely correlated firing in cells of the lateral
geniculate nucleus." Nature 383(6603): 815-9.
Attneave, F. (1954). "Some informational aspects of visual perception." Psychol Rev 61(3): 183-93.
Baddeley, R., L. F. Abbott, et al. (1997). "Responses of neurons in primary and inferior temporal
visual cortices to natural scenes." Proc Biol Sci 264(1389): 1775-83.
Barlow, H. (1961). "Possible principles underlying the transformation of sensory messages."
Sensory Communication: 217-234.
Barlow, H. B. (1995). The neuron doctrine in perception, MIT Press.
Bialek, W., F. Rieke, et al. (1991). "Reading a neural code." Science 252(5014): 1854-7.
Cai, D., G. C. Deangelis, et al. (1997). "Spatiotemporal Receptive Field Organization in the Lateral
Geniculate Nucleus of Cats and Kittens." J Neurophysiol 78(2): 1045-1061.
Caldwell, J. H. and N. W. Daw (1978). "New properties of rabbit retinal ganglion cells." The
Journal of Physiology 276(1): 257-276.
Coenen, A. M. L. and A. J. H. Vendrik (1972). "Determination of the transfer ratio of cat's
geniculate neurons through quasi-intracellular recordings and the relation with the level
of alertness." Exp Brain Res 14: 227-242.
Dan, Y., J. M. Alonso, et al. (1998). "Coding of visual information by precisely correlated spikes in
the lateral geniculate nucleus." Nat Neurosci 1(6): 501-7.
Dan, Y., J. J. Atick, et al. (1996). "Efficient Coding of Natural Scenes in the Lateral Geniculate
Nucleus: Experimental Test of a Computational Theory." J. Neurosci. 16(10): 3351-3362.
Dong, D. and J. Atick (1995). "Statistics of natural time-varying images." Network: Computation
in Neural Systems 6: 345-358.
Dong, D., J. A. S. Kelso, et al. (2002). Spatio-Temporal Decorrelated Activity Patterns in
Functional MRI Data during Real and Imagery Motor Tasks.
Dong, D. and T. Weyand (2009). The efficient coding of dynamic signals: the effect of scenes and
saccades on the receptive fields of the lateral geniculate nucleus during free-viewing
natural time-varying images. Chicago, Society for Neuroscience.
Dong, D. W. and J. J. Atick (1995). "Temporal decorrelation: a theory of lagged and nonlagged
responses in the lateral geniculate nucleus." Network: Computation in Neural Systems
6(2): 159-178.
Dong, D. W. and T. G. Weyand (2007). Receptive fields of awake animals free-viewing natural
time varying images. Salt Lake City, COSYNE.
Enroth-Cugell, C. and J. G. Robson (1966). "The contrast sensitivity of retinal ganglion cells of the
cat." The Journal of Physiology 187(3): 517-552.
Erisir, A., S. C. Van Horn, et al. (1997). "Relative numbers of cortical and brainstem inputs to the
lateral geniculate nucleus." Proceedings of the National Academy of Sciences of the
United States of America 94(4): 1517-1520.
Field, D. J. (1987). "Relations between the statistics of natural images and the response
properties of cortical cells." J Opt Soc Am A 4(12): 2379-94.
Gur, M. (1987). "Intensity coding and luxotonic activity in the ground squirrel lateral geniculate
nucleus." Vision Research 27(12): 2073-2079.
Hartline, H. K. (1938). "The response of single optic nerve fibers of the vertebrate eye to
illumination of the retina." Am J Physiol 121(2): 400-415.
Hirsch, J. C., A. Fourment, et al. (1983). "Sleep-related variations of membrane potential in the
lateral geniculate body relay neurons of the cat." Brain Research 259(2): 308-312.
Hubel, D. H. and T. N. Wiesel (1961). "Integrative action in the cat's lateral geniculate body." J
Physiol 155: 385-98.
Kaplan, E. and R. Shapley (1984). "The origin of the S (slow) potential in the mammalian lateral
geniculate nucleus." Exp Brain Res 55(1): 111-6.
Lal, R. and M. J. Friedlander (1989). "Gating of retinal transmission by afferent eye position and
movement signals." Science 243(4887): 93-6.
Lal, R. and M. J. Friedlander (1990). "Effect of passive eye position changes on retinogeniculate
transmission in the cat." J Neurophysiol 63(3): 502-522.
Lee, D. and J. G. Malpeli (1998). "Effects of saccades on the activity of neurons in the cat lateral
geniculate nucleus." J Neurophysiol 79(2): 922-36.
Malpeli, J. G. and F. H. Baker (1975). "The representation of the visual field in the lateral
geniculate nucleus of Macaca mulatta." J Comp Neurol 161(4): 569-94.
Malpeli, J. G., T. G. Weyand, et al. (1992). "A new method of mounting and directing chronically
implanted microdrives." J Neurosci Methods 44(1): 19-26.
Mante, V., V. Bonin, et al. (2008). "Functional mechanisms shaping lateral geniculate responses
to artificial and natural stimuli." Neuron 58(4): 625-38.
Mante, V., R. A. Frazor, et al. (2005). "Independence of luminance and contrast in natural scenes
and in the early visual system." Nat Neurosci 8(12): 1690-7.
McAlonan, K., J. Cavanaugh, et al. (2006). "Attentional Modulation of Thalamic Reticular
Neurons." J. Neurosci. 26(16): 4444-4450.
McClurkin, J. W., T. J. Gawne, et al. (1991). "Lateral geniculate neurons in behaving primates. I.
Responses to two-dimensional stimuli." J Neurophysiol 66(3): 777-793.
McIlwain, J. T. (1964). "Receptive fields of optic tract axons and lateral geniculate cells:
peripheral extent and barbiturate sensitivity." J Neurophysiol 27: 1154-73.
Noda, H. (1975). "Discharges of relay cells in lateral geniculate nucleus of the cat during
spontaneous eye movements in light and darkness." J Physiol 250(3): 579-95.
O'Connor, D. H., M. M. Fukui, et al. (2002). "Attention modulates responses in the human lateral
geniculate nucleus." Nat Neurosci 5(11): 1203-9.
Olveczky, B. P., S. A. Baccus, et al. (2003). "Segregation of object and background motion in the
retina." Nature 423(6938): 401-408.
Peichl, L. and H. Wassle (1979). "Size, scatter and coverage of ganglion cell receptive field
centres in the cat retina." The Journal of Physiology 291(1): 117-141.
Pillow, J. W., L. Paninski, et al. (2005). "Prediction and Decoding of Retinal Ganglion Cell
Responses with a Probabilistic Spiking Model." J. Neurosci. 25(47): 11003-11013.
Quiroga, R. Q., L. Reddy, et al. (2005). "Invariant visual representation by single neurons in the
human brain." Nature 435(7045): 1102-1107.
Reid, R. C., J. D. Victor, et al. (1997). "The use of m-sequences in the analysis of visual neurons:
linear receptive field properties." Vis Neurosci 14(6): 1015-27.
Reppas, J. B., W. M. Usrey, et al. (2002). "Saccadic eye movements modulate visual responses in
the lateral geniculate nucleus." Neuron 35(5): 961-74.
Robinson, D. A. (1963). "A method of measuring eye movement using a scleral search coil in a
magnetic field." IEEE Trans Biomed Eng 10: 137-45.
Rodieck, R. W. (1965). "Quantitative analysis of cat retinal ganglion cell response to visual
stimuli." Vision Res 5(11): 583-601.
Sakakura, H. (1968). "Spontaneous and evoked unitary activities of cat lateral geniculate
neurons in sleep and wakefulness." Jpn J Physiol 18(1): 23-42.
Sawai, H., K. Morigiwa, et al. (1988). "Effects of EEG synchronization on visual responses of the
cat's geniculate relay cells: a comparison among Y, X and W cells." Brain Research 455(2):
394-400.
Schlag-Rey, M. and J. Schlag (1977). "Visual and presaccadic neuronal activity in thalamic
internal medullary lamina of cat: a study of targeting." J Neurophysiol 40(1): 156-173.
Schmidt, M. (1996). "Neurons in the cat pretectum that project to the dorsal lateral geniculate
nucleus are activated during saccades." J Neurophysiol 76(5): 2907-2918.
Shannon, C. (1948). A Mathematical Theory of Communication. CSLI Publications.
Simoncelli, E. P. and B. A. Olshausen (2001). "Natural image statistics and neural
representation." Annu Rev Neurosci 24: 1193-216.
Sincich, L. C., D. L. Adams, et al. (2007). "Transmission of spike trains at the retinogeniculate
synapse." J Neurosci 27(10): 2683-92.
Strong, S. P., R. Koberle, et al. (1998). "Entropy and Information in Neural Spike Trains." Physical
Review Letters 80(1): 197.
Swadlow, H. A. and T. G. Weyand (1987). "Corticogeniculate neurons, corticotectal neurons, and
suspected interneurons in visual cortex of awake rabbits: receptive-field properties,
axonal properties, and effects of EEG arousal." J Neurophysiol 57(4): 977-1001.
Theunissen, F. E., S. V. David, et al. (2001). "Estimating spatio-temporal receptive fields of
auditory and visual neurons from their responses to natural stimuli." Network 12(3):
289-316.
Theunissen, F. E., K. Sen, et al. (2000). "Spectral-Temporal Receptive Fields of Nonlinear Auditory
Neurons Obtained Using Natural Sounds." J. Neurosci. 20(6): 2315-2331.
Toyama, K. and Y. Komatsu (1984). "Integration of retinal and motor signals of eye movements
in striate cortex cells of the alert cat." Journal of Neurophysiology 51: 649-665.
Vinje, W. E. and J. L. Gallant (2002). "Natural stimulation of the nonclassical receptive field
increases information transmission efficiency in V1." Journal of Neuroscience 22: 2904-2915.
Weyand, T. G. (2007). "Retinogeniculate transmission in wakefulness." J Neurophysiol 98(2):
769-85.
Weyand, T. G. and J. G. Malpeli (1993). "Responses of neurons in primary visual cortex are
modulated by eye position." J Neurophysiol 69(6): 2258-2260.
Yeh, C.-I., C. R. Stoelzel, et al. (2009). "Functional Consequences of Neuronal Divergence Within
the Retinogeniculate Pathway." J Neurophysiol 101(4): 2166-2185.
APPENDIX A: A brief review of visual system design
Vision has probably been the most studied of sensory systems, and ideas of how the
visual system worked can be traced back to Greek and Arab philosophers. Today, to the
extent there is consensus, the modern era of vision research began with Hartline
(Hartline 1938) who is credited for first using the term ‘receptive field’. Out of this and
several other studies has emerged a paradigm that can best be captured by the following
principles:
A.1 - Vision is representational
We live in a world external to us, and make inferences about the world based on the
patterns of reflected light that enter our eyes and activate photoreceptors, rods and
cones. These patterns of reflected light provide the ‘proximal stimulus’ from which we
construct our representation of the outside world.
A.2 - The receptive field is the metric to be understood
The receptive field as the metric to be understood is the neuron doctrine (e.g. (Barlow
1995)). A neuron is the major functional unit of the nervous system and the key
component for understanding how the brain works. Neurons produce action potentials
(spikes) only under certain conditions. Specification of those conditions constitutes the
cell’s receptive field. The neuron acquires its receptive field characteristics as a result of
integrating its inputs to shape its output.
A.3 - Vision is modular
The light to the eye activates neurons in the retina. The retina is the 1st of many
structures, or modules which process this pattern of light to construct a representation of
the world. The 2nd module, which the retina directly connects to, is the LGN. Each
module contains at least one representation of the world. For the ‘early’ modules (closest
to the retina), the representation is topographic, i.e. adjacent points within the module
correspond to adjacent points in the world. The word ‘retinotopic’ is commonly used in
place of topographic in referencing adjacent points on the retina. Topographic
representations are 1st order topological representations; the rule is adjacent points map
onto adjacent points. Local expansion is allowed (e.g. central magnification), but not
such things as ‘shearing’ of the representation. Higher order (i.e. non-topographic)
representations occur in modules more removed from the retina. The major concept for
the module is that each module contributes in some unique way to vision.
A.4 - Vision is mostly hierarchical
Visual perception is achieved by running through a chain of modules. Generally
speaking it is hierarchical, with the response of each module depending on the
integrity of the modules below it (i.e. closer to the retina). Thus, interruption of the
preceding module should cause a serious disruption of activity. The retina and LGN are
strongly coupled, and the loss of the retina eliminates activity in the LGN (e.g. (Kaplan
and Shapley 1984)).
A.5 - Vision is massively parallel and possesses channels
The visual system is massively parallel, and this feature originates at the level of the
retina. An individual neuron is only privy to a small portion of the visual field.
Whatever processing it might do is repeated by many other neurons at this same level.
For example, if there are 40,000 on-center cells at the level of the retina, the assumption
is that these 40,000 cells perform a similar process, but for different locations of the
visual field.
The idea of channels is that on-center and off-center cells can be further divided on the
basis of linearity of spatial summation, chromatic properties, or linear response to
changes in contrast. A strong statement is that one channel, parvocellular, is associated
with form vision (identification of ‘what’ is out there), and a second channel,
magnocellular, is associated with movement (very loosely, ‘where’ the ‘what’ is).
Parvocellular neurons linearly sum light and dark across their receptive fields, are sensitive
to wavelength, and saturate easily over a relatively narrow range of contrasts.
Magnocellular neurons include neurons that summate linearly across space and those that are
non-linear, but magnocellular neurons of early vision are ‘broad-band’ (do not
discriminate on the basis of wavelength), and have a linear output for a much greater
range of contrast than parvocellular neurons. These magnocellular/parvocellular
differences hold at least into primary visual cortex, and form the basis of a ‘what’ and
‘where’ pathway.
A.6 - Receptive fields progress from ‘dense’ to ‘sparse’
Within the hierarchy, neurons close to the retina typically have high firing rates and are
poorly discriminating on the basis of spatial or temporal structure. Such neurons are
referred to as ‘dense’. As one moves deeper into the hierarchy, the neurons become much
more selective for stimulus parameters. ‘Sparse’ is a term first used by Field (Field 1987)
to describe neurons with increased selectivity, and at that time he was assigning that
label to the orientation-selective simple cells of primary visual cortex. With firing rates of
20-50 spikes/sec during movies, it would be difficult to designate LGN neurons as
‘sparse’. On the other hand, with firing rates below 1 per second when viewing the same
time-varying natural images, it seems reasonable to describe visual cortical neurons as
‘sparse’. The limit of sparse coding is often grist for philosophy. Relatively recent studies
indicate neurons in remote visual areas (i.e. well removed from retina) can acquire
amazing selectivity to respond to movie stars such as Jennifer Aniston or Halle Berry
(Quiroga, Reddy et al. 2005). Nearly no one believes we were born with such cells, but
one can raise the question of whether the visual system can or should be any
sparser in its design. Careful examination of the responses indicates that such cells are
tuned to specific features; however, they do respond to other features, and it is unlikely
that such cells maintain such fixed properties, i.e. they likely shift to adjust their tuning to
some other feature that the person will become enamored with next month.
APPENDIX B: The lateral geniculate nucleus (LGN)
The essential feature of the LGN is that it receives input from the retina (a necessary ‘drive’)
and projects to primary visual cortex. In the monkey, nearly all neurons are concentric.
The LGN of many mammals, including primates, is laminated. In the rhesus monkey,
the central 20° (i.e. to the representation of the optic disc) contains 6 layers. Each of
these layers contains a complete or partial topographic (retinotopic) representation of
the contralateral visual field. Three layers (1, 4, and 6) receive from the contralateral eye,
and three layers receive from the ipsilateral eye (2, 3, and 5). From 20° to ~45° (the
limits of the binocular field) there are 4 layers, and beyond 45°, 2 layers (Malpeli and
Baker 1975). The ventral pair of layers (1 and 2) are described as ‘magnocellular’, and the
dorsal 4 layers are ‘parvocellular’ (properties described above). The LGN is topographic,
and, therefore, retinotopic. Adjacent regions of the LGN in a particular layer receive from
adjacent regions of the retina.
The functional significance of the LGN continues to be debated. At one extreme lies
the argument that it does nothing other than relay retinal output to cortex. The
evidence that it does exactly that is minimal. However, it is easy to understand why that
is a reasonable conjecture. Many, if not most, LGN cells are dominated by input from a
single retinal ganglion cell which provides a necessary but not sufficient drive for that
LGN cell. When the LGN cell and the retinal axon are simultaneously studied, it appears
that the only difference is that the LGN cell has a more powerful surround than the
driving retinal ganglion cell (Hubel and Wiesel 1961). Thus, by that logic, success and
failure of retinogeniculate transmission turns on the homogeneity of contrast across the
common receptive field center and surround. As a simple matter, this difference
translates functionally into the LGN cell being slightly more finicky, sparser, than its
retinal drive. Beyond this, other potential functions of the LGN are listed as follows:
B.1 - ‘Gating’ by state
This is the most popular explanation for the functional significance of the LGN. The
simplest version is that inputs from the brainstem associated with arousal/sleep and
wakefulness gate retinogeniculate efficiency. Thus, in a sleeping animal efficiency is poor;
in an aroused animal, efficiency is high (Sakakura 1968; Coenen and Vendrik 1972). One
of the curious wrinkles is that within wakefulness, arousal per se has little or no effect on
efficacy by itself (Swadlow and Weyand 1987; Sawai, Morigiwa et al. 1988; Weyand 2007);
increased efficacy only appears when the retinal output increases. Until it was tested recently,
most had believed wakefulness would yield very high retinogeniculate efficacy (~100%).
In fact, efficacy in wakefulness hovers around 50% (Weyand 2007). It appears likely that
other dynamic factors dealing with vision itself are responsible for altering the remaining
gain on retinogeniculate efficacy. Although one fMRI study (O'Connor, Fukui et al. 2002)
indicated the LGN was capable of showing alterations in efficacy associated with
selective attention, such results have never been observed at the level of individual
neurons. Finally, a recent study by McAlonan and Wurtz (McAlonan, Cavanaugh et al.
2006) indicates that the adjacent thalamic reticular nucleus, which provides recurrent
inhibitory input back to the LGN, can be selectively modulated by attention. Beyond
knowing that it can alter gain, it is still unclear how much ‘state’ influences retinogeniculate
transmission.
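Efficacy in this sense can be read as the fraction of retinal spikes that are followed by a spike of the target LGN cell. The sketch below assumes that reading and assumes that both spike trains are available as lists of times in milliseconds. The 2 msec acceptance window and the function name are illustrative assumptions, not values from the studies cited.

```python
from bisect import bisect_left

def retinogeniculate_efficacy(retinal_spikes_ms, lgn_spikes_ms, max_delay_ms=2.0):
    """Fraction of retinal spikes followed by an LGN spike within max_delay_ms.

    max_delay_ms is an assumed acceptance window, chosen only for illustration.
    """
    lgn = sorted(lgn_spikes_ms)
    relayed = 0
    for t in retinal_spikes_ms:
        i = bisect_left(lgn, t)  # first LGN spike at or after the retinal spike
        if i < len(lgn) and lgn[i] - t <= max_delay_ms:
            relayed += 1
    if not retinal_spikes_ms:
        return float("nan")
    return relayed / len(retinal_spikes_ms)

# Two of four retinal spikes are relayed, i.e. an efficacy of 0.5.
print(retinogeniculate_efficacy([10.0, 20.0, 30.0, 40.0], [10.8, 30.9]))
```

Computed within separate epochs (for example asleep versus awake, or low versus high retinal firing rate), the same ratio gives the kind of state-dependent comparison discussed above.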
B.2 - Temporal decorrelation
As discussed in the main text, an important role of the concentric design is to recode the
significant correlations associated with natural images into a decorrelated form. The
concentric cell is well-designed for minimizing spatial correlations, and may well do a
good job at minimizing temporal correlations as well. However, theoretical
considerations and the observation of diversity in response timing among LGN cells lead
to a hypothesis that temporal decorrelation would occur in the LGN (Dong and Atick
1995). Again, as described above, Dan and her colleagues (Dan, Atick et al. 1996) provided
strong evidence that this occurs in the LGN of the paralyzed, anesthetized cat, and the
current project demonstrates that the LGN spike train is also decorrelated by natural
images in the awake, behaving monkey. The remaining major issue in ascribing temporal
decorrelation to the LGN is that no one has demonstrated that the retinal spike train is
not decorrelated as well.
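Temporal decorrelation is equivalent to a flat (‘white’) power spectrum. As a rough illustration of how flatness might be checked, the sketch below computes the power spectrum of a binned signal with a direct discrete Fourier transform and compares mean power in the lower and upper halves of the frequency range. The summary ratio and the function names are assumptions made for illustration; they are not the whitening measure used in the main text.

```python
import cmath
import math

def power_spectrum(signal):
    """Power at frequencies k = 1 .. n//2 of a mean-subtracted signal (direct DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coef) ** 2 / n)
    return spectrum

def low_to_high_power_ratio(spectrum):
    """Mean power in the lower half of frequencies over mean power in the upper half.

    Near 1 for a flat (whitened) spectrum; much greater than 1 when low
    frequencies dominate, as in time-varying natural images.
    """
    half = len(spectrum) // 2
    low = sum(spectrum[:half]) / half
    high = sum(spectrum[half:]) / (len(spectrum) - half)
    return float("inf") if high == 0 else low / high

# A single impulse has a flat spectrum; a slow sinusoid does not.
impulse = [1.0] + [0.0] * 31
slow = [math.sin(2 * math.pi * t / 16) for t in range(32)]
print(low_to_high_power_ratio(power_spectrum(impulse)))
print(low_to_high_power_ratio(power_spectrum(slow)))
```

Applying the same measure to the stimulus and to the binned spike train shows whether the response spectrum is flatter than the input; applied to retinal spike trains, it would address the open question raised above.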
B.3 - Saccadic suppression
Under normal conditions, we generate shifts in gaze several times per second. There is
the potential need for these self-generated shifts in gaze to be dissociated from
movement of the world (i.e. ‘Did I move, or did the world move?’), as gaze shifts generate
visual signals by sweeping the retina across visual texture. Because so many of our gaze
shifts are initiated by eye movements or saccades, the context is usually one of saccadic
suppression. This can be demonstrated psychophysically by showing that people
can barely see letters when the letters are presented around the time of eye movements
(> 90% drop in visibility). It is uncertain where this suppression begins. Recent studies
have indicated it may begin at the level of the retina by demonstrating some neurons are
selectively tuned for the kinds of global shifts associated with saccades (Olveczky, Baccus
et al. 2003). Alternatively, it appears that the LGN may be involved with this process,
since it can be demonstrated that there are both proprioceptive and oculomotor
projections into the LGN (Schmidt 1996). Additional evidence emerges from the
demonstration that LGN neurons can respond to eye movements in the dark (Lee and
Malpeli 1998). Two recent studies, one in cat and one in monkey (Lee and Malpeli 1998;
Reppas, Usrey et al. 2002), demonstrated similar changes in excitability around the time
of eye movements. They both described a suppression that precedes the eye movement by
~100 msec, recovers by onset, and exhibits a peak just after the end of the eye movement. It would
appear that saccadic suppression is likely a function of the LGN through perhaps a
cortical downflow, but effects of eye movements on visual excitability are more
pronounced in visual cortex (Toyama and Komatsu 1984).
B.4 - Gain fields
Information about our visual world comes into our brain from the retina, and is therefore in
retinal coordinates. To know where things are relative to us, we need to translate these
retinal coordinates into an egocentric frame. This is first achieved by constructing a
head-centered frame in which visual excitability is gated by eye position. Such a frame is
still retinotopic, but the gain of transmission is influenced by the position of the eye in
the orbit.
In awake, behaving animals, such head-centered effects have been found as early as
primary visual cortex (Weyand and Malpeli 1993). In the paralyzed, anesthetized cat, Lal
and Friedlander (Lal and Friedlander 1989) showed that moving one eye relative to a
fixed position of the other eye altered visual excitability in the eye whose position was
fixed. They made the argument that this could be used to construct a head-centered
coordinate system. This would suggest that such gain fields are as close to the retina as
possible. However, such disjunctive eye movements as used in their preparation would
never occur in nature. Gating of visual response by eye position among LGN neurons has
yet to be reported in awake animals.
B.5 - Increased sparseness
It would seem quite reasonable that the receptive fields of LGN neurons were not simply
copies of their retinal drives. Hubel and Wiesel’s (Hubel and Wiesel 1961) original
description of enhanced surround would be one example of a more sophisticated sparser
receptive field.
Richmond and his colleagues (McClurkin, Gawne et al. 1991) made some remarkable
observations to suggest LGN neurons could be quite sophisticated. Awake monkeys were
trained to fixate on a central target as the receptive field of the neuron under study was
probed with a series of checkerboard patterns that were only briefly displayed. These
patterns were specifically designed so that the receptive field center and surround were
always covered by a single patch that could be either light or dark. There were 64
patterns, and the patterns were presented pseudo-randomly at least 12 times (i.e. a minimum of
768 stimulus presentations). The response to each stimulus was then analyzed. The
remarkable observation was that the investigators were, more often than not, able to
predict which of the 64 patterns was present by evaluation of the response alone. They
looked specifically at the evolved response, which was activity evaluated over the first
250 msec.
Such sophistication would not be expected of retinal ganglion fibers, although this has
yet to be tested. The source of this increased sparseness is likely the result of the massive
projection back from visual cortex. An interesting observation about the LGN is that
although essential to driving the LGN cell, the retina comprises only about 10% of the
synapses on the LGN cell. The so-called ‘feedback’ projection from visual cortex accounts
for over 50% of the synapses (Erisir, Van Horn et al. 1997). Curiously, this observation,
and the methods developed to evaluate some of the temporal aspects of the visual
responses, have never been revisited. As with the temporal decorrelation hypothesis,
there should certainly be strong interest in comparing these data to those generated by
the responses of the retinal ganglion cells.
B.6 - Multiplexing of retinal signals
The retina is finite, and has limited room for growth, and the output, the optic nerve, is
often described as a bottleneck. Several investigators have noted how individual retinal
axons act to drive not just a single LGN neuron, but many (e.g., up to 30; (Yeh, Stoelzel
et al. 2009)). One idea is that such multiplexing can serve to amplify the signal strength
of individual retinal ganglion cells. Thus, it would mean that the diverging axon could
now simultaneously activate a small group of cells. Synchronicity from a group of LGN
cells could greatly enhance the probability of activating visual cortical neurons. Further,
with just minimal mixing (e.g. convergence of retinal signals from 2 retinal fibers) onto
another LGN cell, one introduces a 3rd receptive field with properties that would not be
found in the retina (e.g. (Dan, Alonso et al. 1998)).
APPENDIX C: Approaches to capturing the receptive field
The receptive field is central to the neuron doctrine. Some of the first descriptions of the
receptive field came from the visual system (e.g. (Hartline 1938)). Originally, they were
simply qualitative descriptions of the spatial distributions of ‘on’ and ‘off’ or ‘on/off’
regions. They were valuable in the sense that one could inventory neural structures for
the degree to which they had a particular type of receptive field. Some descriptions
became more sophisticated as a result of trying to categorize various receptive fields. For
example, Enroth-Cugell and Robson (Enroth-Cugell and Robson 1966) categorized
retinal ganglion cells on the basis of whether the neuron simply summed light and dark
over a delimited region (‘linear’, the ‘X’ cells) or did not (‘non-linear’, the ‘Y’ cells).
Rodieck (Rodieck 1965) is credited with creating the first model of a ganglion cell
receptive field, and as such did a great service in moving visual physiology forward. If
one understands something, one can either build it, or account for its properties by
modeling it.
Rodieck’s model was relatively simple and surprisingly successful. He simply fashioned
the receptive field as two overlapping Gaussian functions, one for the center, one for the
surround; convolved the stimulus with each; and took the difference. The resulting
model is often referred to as the ‘difference-of-gaussians’ or DOG model. The model
treats the receptive field as linear, so that simple summation applies, and he assumed that the
center and surround operated with the same latency to visual activation, also referred to as
spatiotemporal separability.
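The center-minus-surround arithmetic described above can be illustrated with a minimal one-dimensional sketch. It is not Rodieck's fitted model: the widths (`sigma_c`, `sigma_s`), the gains (`k_c`, `k_s`), and the function names are hypothetical values chosen only for illustration. Because the operation is linear, filtering the stimulus with each Gaussian and taking the difference is the same as filtering once with the difference of the two Gaussians, which is what the sketch does.

```python
import math

def gaussian(x, sigma):
    """Unit-area Gaussian profile evaluated at position x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def dog_kernel(half_width, sigma_c, sigma_s, k_c=1.0, k_s=0.8):
    """1-D difference of Gaussians: narrow center minus broad surround.
    All parameter values are illustrative assumptions."""
    return [k_c * gaussian(x, sigma_c) - k_s * gaussian(x, sigma_s)
            for x in range(-half_width, half_width + 1)]

def dog_response(stimulus, kernel):
    """Linear response at each position: weighted sum of the stimulus under
    the kernel (zero padding at the edges). The kernel is symmetric, so this
    weighted sum equals a convolution."""
    half = len(kernel) // 2
    out = []
    for i in range(len(stimulus)):
        total = 0.0
        for j, w in enumerate(kernel):
            p = i + j - half
            if 0 <= p < len(stimulus):
                total += w * stimulus[p]
        out.append(total)
    return out

kernel = dog_kernel(half_width=15, sigma_c=1.0, sigma_s=3.0)

# A small spot confined to the center versus a full-field stimulus.
spot = [0.0] * 41
for p in range(19, 22):
    spot[p] = 1.0
full_field = [1.0] * 41

r_spot = dog_response(spot, kernel)[20]
r_full = dog_response(full_field, kernel)[20]
print(r_spot, r_full)
```

With these assumed parameters the small spot drives the model more strongly than the full field, because the full field also engages the antagonistic surround.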
Rodieck’s model was a big step forward. However, there were two assumptions that
limited the general usefulness (robustness) of the model: the stimuli used and the fact
that the animal was paralyzed and anesthetized. Significant efforts have been made to
use stimuli that allow for more robust models, but little has been done to correct for the
fact that the animal is paralyzed and anesthetized.
Rodieck’s stimuli were spots, bars, and gratings. These were useful for driving the cells
and were relatively good for modeling when testing for responses with spots, bars, and
gratings. However, in the ideal case, one wants to know what the response would be to
arbitrary stimuli. Two solutions are used today. One is to use gratings but use gratings
that probe the receptive field parametrically in the spatial and temporal domains (i.e. use
sinusoidal gratings that vary in spatial and temporal frequencies and in contrast; e.g.
(Pillow, Paninski et al. 2005; Mante, Bonin et al. 2008)). Using such stimulus sets,
investigators typically take advantage of Fourier synthesis (the idea that any arbitrary
image can be represented as a series of sine waves) to then predict a response to an
arbitrary stimulus or visual event.
A second method is to probe the receptive field with some fixed stimulus and recover the
field through reverse correlation. Because the receptive field is recovered from responses
over time and space, the receptive field is known as the space-time receptive field (STRF).
The stimulus can be a simple bar that probes the receptive field along a single linear
dimension in space (e.g. (Cai, Deangelis et al. 1997)), or noise patterns (e.g. (Reid, Victor
et al. 1997)). Noise patterns have the primary advantage of not carrying any stimulus
correlations with them. As such, the receptive field recovered using reverse correlation is
the STRF without any necessary correction. Interestingly, the receptive field recovered
this way need not even be linear (cf. (Reid, Victor et al. 1997)). The STRF represents the
best linear approximation of the receptive field.
Noise patterns are potentially powerful because they are correlation-free; hence, the
predictions they make (probability of a spike) can be extended to any arbitrary stimulus
and be reasonably successful (e.g. (Dan, Atick et al. 1996)). However, noise patterns do
not effectively drive all neurons. They seem good for retina and many LGN neurons, but
are not nearly as effective in driving the orientation-selective neurons of visual cortex
(Alonso, Usrey et al. 1996; Reid, Victor et al. 1997).
Natural images are avoided because they are statistically complicated and carry large
space-time correlations. Investigators, of course, are interested in them as they represent
the ultimate stimulus set. Recently, both Theunissen and his colleagues (Theunissen, Sen
et al. 2000; Theunissen, David et al. 2001) and Dong and his colleagues (Dong,
Kelso et al. 2002) have developed methods for recovering the STRF using natural images.
Theunissen and colleagues (Theunissen, Sen et al. 2000; Theunissen, David et al. 2001)
used sample cells from both the auditory and from the visual systems.
All methods are basically the same, and yield some fascinating results. The method is to
present time-varying natural stimuli, and then use reverse correlation to make the initial
recovery of the space-time receptive field. This simple recovery can be referred to as a
spike-triggered average (STA), but is still obtained by reverse correlation. The logic of
reverse correlation is straightforward: spikes occur for a reason; they were evoked by the
antecedent stimuli. Therefore, for each spike, one looks backwards in time (reverse
correlation) at the antecedent stimuli. When this is done several thousand times, a
pattern emerges. For stimuli that are uncorrelated, this is the space-time receptive field
(STRF). Armed with the companion polynomial equation that captures this, one can then
predict the response of that cell to any arbitrary stimulus. However, with natural images,
the generated STA carries all of the correlations, and predictions inevitably are poor.
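The look-back-and-average logic of reverse correlation can be sketched for the uncorrelated case as follows. This is a toy example under stated assumptions, not the analysis used in the cited studies: the stimulus is one-dimensional Gaussian white noise, the model cell is a hypothetical linear filter (`true_filter`) followed by a fixed threshold, and all names and values are invented for illustration.

```python
import random

def spike_triggered_average(stimulus, spikes, n_lags):
    """For each spike, look back over the preceding n_lags stimulus frames
    and average those stimulus segments across all spikes."""
    sta = [0.0] * n_lags
    count = 0
    for t, n in enumerate(spikes):
        if n == 0 or t < n_lags - 1:
            continue  # no spike, or not enough stimulus history yet
        for lag in range(n_lags):
            sta[lag] += n * stimulus[t - lag]
        count += n
    return [v / count for v in sta] if count else sta

random.seed(0)

# Hypothetical temporal kernel of the model cell (index = lag in frames).
true_filter = [0.0, 1.0, 0.5, -0.5, 0.0]

# Uncorrelated (white) Gaussian noise stimulus.
stimulus = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Assumed spike generator: filter the stimulus, then apply a threshold.
spikes = []
for t in range(len(stimulus)):
    drive = sum(w * stimulus[t - lag]
                for lag, w in enumerate(true_filter) if t - lag >= 0)
    spikes.append(1 if drive > 1.0 else 0)

sta = spike_triggered_average(stimulus, spikes, n_lags=len(true_filter))
print([round(v, 2) for v in sta])
```

For white noise the average has the same shape as the assumed filter (up to a scale factor), which is why no correction is needed in that case.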
Both Theunissen and Dong have developed methods to subtract the statistical properties
of natural images or sounds from the raw STA and recover an STRF. Interestingly, both
comment on similar issues. Theunissen and his colleagues (Theunissen, Sen et al. 2000)
comment that when using auditory ‘noise’ patterns (random phonemes), the predictive
power of the reverse correlation model was inferior to that constructed from birdsongs.
Similarly, Dong and Weyand (Dong and Weyand 2007) found that the linear STRF
recovered from noise patterns was inferior at predicting response compared to the STRF
recovered from natural images.
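One generic way to remove stimulus correlations from a raw reverse-correlation estimate is to divide it by the stimulus autocovariance, that is, to solve a small linear system. The sketch below illustrates that general idea only; it is not claimed to be the specific procedure of Theunissen and colleagues or of Dong and colleagues. The correlated stimulus (first-order autoregressive noise), the noiseless linear response, the filter values, and the helper names are all assumptions made for illustration.

```python
import random

def autocovariance(x, n_lags):
    """Autocovariance of a zero-mean sequence at lags 0..n_lags-1."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) / (n - k)
            for k in range(n_lags)]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

random.seed(0)
true_filter = [0.0, 1.0, 0.5, -0.5, 0.0]  # hypothetical kernel
n_lags = len(true_filter)

# Correlated stimulus: each frame resembles the previous one.
raw = [0.0]
for _ in range(19999):
    raw.append(0.8 * raw[-1] + random.gauss(0.0, 1.0))
mean = sum(raw) / len(raw)
stim = [s - mean for s in raw]
n = len(stim)

# Assumed noiseless linear response of the model cell.
resp = [sum(w * stim[t - k] for k, w in enumerate(true_filter) if t - k >= 0)
        for t in range(n)]

# Response-weighted stimulus average: the analogue of the raw STA.
raw_sta = [sum(resp[t] * stim[t - k] for t in range(k, n)) / (n - k)
           for k in range(n_lags)]

# Correct the raw estimate by the stimulus autocovariance matrix.
acov = autocovariance(stim, n_lags)
cov_matrix = [[acov[abs(i - j)] for j in range(n_lags)] for i in range(n_lags)]
estimate = solve(cov_matrix, raw_sta)
print([round(v, 2) for v in raw_sta])
print([round(v, 2) for v in estimate])
```

In this toy case the raw average is smeared across lags by the stimulus correlations (it is clearly nonzero at lag 0, where the assumed filter is zero), while the corrected estimate is close to the assumed filter.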
Only Dong and his colleagues (Dong and Weyand 2007) have recovered the STRF from
awake animals free-gazing time-varying natural images. Given the observations by
Theunissen and his colleagues (Theunissen, Sen et al. 2000) on the superiority of
modeling using natural sounds, and what we know about the costs of anesthesia and
paralysis, it is readily apparent that these are the most accurate models that represent
the true functioning of the neuron. For basic quantitative analysis, current models in
paralyzed, anesthetized preparations are a good exercise in dealing with complications of
vision, but will never adequately capture the behavior of these neurons in a truly natural
condition.
APPENDIX D: Methods of analyzing spikes
Several methods of spike analysis were employed to study the distribution and variability
of the spike train. Below are general descriptions of the methods employed to
perform these analyses, as well as the analysis of the statistics of different stimuli (natural
images and noise images).
D.1 - Image Correlation
To derive the correlational structure of an image, the correlations of each pixel to all
other pixels in a horizontal or vertical range were derived. Thus, a pixel’s value at
position (x,y) was aligned to pixel values at (x+1,y)..(x+n,y) and (x,y+1)..(x,y+m), where n
equals the horizontal range of pixel positions and m equals the vertical range of pixel
positions. For each pixel distance, a two-dimensional vector was formed of pixel values
that were a specific distance away from a reference pixel value. Once the vector for a
particular pixel distance was completed for all pixels in the image, the R-squared value of
the relationship was determined to arrive at the correlation value for that distance.
This was repeated up to a distance of 200 pixels away.
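The pairing procedure above can be sketched as follows. Two details are assumptions rather than statements from the text: horizontal and vertical pairs at the same distance are pooled into one set, and the R-squared value is taken to be the squared Pearson correlation of the paired pixel values. The function names and the small test images are hypothetical.

```python
import math
import random

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def image_correlation(image, max_distance):
    """R-squared between each reference pixel and the pixel d positions to
    its right or below it, for d = 1..max_distance (pairs pooled)."""
    height, width = len(image), len(image[0])
    result = {}
    for d in range(1, max_distance + 1):
        ref, other = [], []
        for y in range(height):
            for x in range(width):
                if x + d < width:   # horizontal neighbor at distance d
                    ref.append(image[y][x])
                    other.append(image[y][x + d])
                if y + d < height:  # vertical neighbor at distance d
                    ref.append(image[y][x])
                    other.append(image[y + d][x])
        result[d] = r_squared(ref, other)
    return result

random.seed(0)
size = 32
# Noise image: every pixel drawn independently.
noise = [[random.random() for _ in range(size)] for _ in range(size)]
# Smooth image: slowly varying intensity, a crude stand-in for the
# strong short-range correlations of natural images.
smooth = [[math.sin(x / 8.0) + math.cos(y / 8.0) for x in range(size)]
          for y in range(size)]

noise_corr = image_correlation(noise, 8)
smooth_corr = image_correlation(smooth, 8)
print(noise_corr[1], smooth_corr[1], smooth_corr[8])
```

The noise image shows essentially no correlation even between adjacent pixels, whereas the smooth image is highly correlated at short distances and less so at longer ones.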
D.2 - Autocorrelogram and FFT
The autocorrelogram is a common method of presenting interspike intervals (ISI) when
one is interested in the underlying power in the frequency domain (e.g. rhythmic signals). By
compiling and plotting all interspike intervals of a response to a particular stimulus
type, an autocorrelogram can be derived (i.e. compile the distance from each spike to every other
spike, such as the 1st spike to the 2nd spike, 1st to 3rd, ... 1st to Nth, 2nd to 1st, 2nd to 3rd, ... Nth
to (N-1)th).
A Fast Fourier Transform (FFT) can be performed on the autocorrelogram. The result of
the FFT indicates the strength of representation of each frequency. Thus, the larger the
FFT magnitude at a particular frequency, the more power that frequency has, indicating that the
corresponding ISI had a higher occurrence in the autocorrelogram.
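A minimal sketch of the two steps, under several assumptions: only positive lags are compiled (the autocorrelogram is symmetric about zero), lags are binned at 1 msec within a 200 msec window, and a plain discrete Fourier transform stands in for the FFT (an FFT computes the same quantity faster). The synthetic spike train, which fires roughly every 25 msec, and all names are hypothetical.

```python
import cmath
import random

def autocorrelogram(spike_times_ms, window_ms, bin_ms=1.0):
    """Histogram of the lags from each spike to every later spike,
    keeping lags shorter than window_ms."""
    times = sorted(spike_times_ms)
    counts = [0] * int(window_ms / bin_ms)
    for i, t0 in enumerate(times):
        for t1 in times[i + 1:]:
            lag = t1 - t0
            if lag >= window_ms:
                break  # times are sorted, so later lags are longer still
            counts[int(lag / bin_ms)] += 1
    return counts

def power_spectrum(counts, bin_ms=1.0):
    """(frequency in Hz, power) pairs from a discrete Fourier transform
    of the mean-subtracted autocorrelogram."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(centered))
        spectrum.append((k * 1000.0 / (n * bin_ms), abs(coeff) ** 2))
    return spectrum

random.seed(1)
# Rhythmic train: one spike about every 25 msec, with 2 msec timing jitter.
spikes = [25.0 * i + random.gauss(0.0, 2.0) for i in range(200)]

acg = autocorrelogram(spikes, window_ms=200.0)
spectrum = power_spectrum(acg)
peak_hz = max(spectrum, key=lambda fp: fp[1])[0]
print(peak_hz)
```

For this train the autocorrelogram has peaks near multiples of 25 msec, and the spectrum has its largest power at the corresponding frequency of 40 Hz.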
D.3 - Kurtosis
Another way of looking at the variability or distribution of spikes is to count the number
of spikes that occurred across an interval of 250 msec. If there is low variability in the
spike distribution, the number of spikes occurring during each 250 msec epoch would
remain relatively constant, and the histogram of spike counts would form a peak. However, should the
variability be large, there would be a spreading of the spike counts such that the counts of
spikes per 250 msec epoch would be more evenly distributed. For example, instead of a
consistent rate of 10 spikes per 250 msec resulting in a peak, there would be an
equivalent representation of 5 spikes per 250 msec as 10 spikes per 250 msec and all
other counts in between, resulting in a more flattened distribution. This flatness is
measured as kurtosis. The flatter or more evenly distributed the spike counts are, the more
platykurtic the distribution is. Likewise, the more peaked a distribution is, the more
leptokurtic it is.
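The counting and the kurtosis measure can be sketched as below. The text does not give the formula that was used, so the standard moment-based excess kurtosis (fourth central moment divided by the squared variance, minus 3) is an assumption; positive values indicate a leptokurtic distribution and negative values a platykurtic one. The example counts and function names are hypothetical.

```python
def spike_counts(spike_times_ms, duration_ms, epoch_ms=250.0):
    """Number of spikes falling in each consecutive epoch of epoch_ms."""
    n_epochs = int(duration_ms // epoch_ms)
    counts = [0] * n_epochs
    for t in spike_times_ms:
        idx = int(t // epoch_ms)
        if 0 <= idx < n_epochs:
            counts[idx] += 1
    return counts

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

# Counting example: four spikes over three 250 msec epochs.
example_counts = spike_counts([10.0, 260.0, 270.0, 700.0], duration_ms=750.0)

# Peaked case: almost every epoch holds 10 spikes.
peaked = [10] * 16 + [9, 11, 5, 15]
# Flat case: counts of 5 through 10 spikes are equally represented.
flat = [5, 6, 7, 8, 9, 10] * 4

print(example_counts)
print(excess_kurtosis(peaked), excess_kurtosis(flat))
```

The peaked set of counts yields a positive excess kurtosis and the evenly spread set a negative one, matching the leptokurtic and platykurtic cases described above.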
D.4 - Pre versus Post Interspike Intervals
Alternatively, the variability of the spike distribution can be observed qualitatively
through plotting the pre-interspike interval against the post-interspike interval. For each
spike at time T, we plot a point where the abscissa is the time since the previous spike and
the ordinate is the time to the next spike. The visual sparseness of the plot would imply a
greater degree of variability; whereas, a relatively concentrated spot with minimal
spreading would imply a tendency of the response to lock into a particular firing rate.
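The points of such a plot can be computed as in the following minimal sketch; the function name and the two example spike trains are hypothetical, and the plotting itself is left out.

```python
def pre_post_intervals(spike_times):
    """For every spike that has both a predecessor and a successor, return
    (time since previous spike, time to next spike)."""
    times = sorted(spike_times)
    return [(times[i] - times[i - 1], times[i + 1] - times[i])
            for i in range(1, len(times) - 1)]

# A perfectly regular train: every point falls on the same spot.
regular = pre_post_intervals([0.0, 10.0, 20.0, 30.0, 40.0])

# A bursty train: points scatter widely across the plot.
bursty = pre_post_intervals([0.0, 2.0, 4.0, 50.0, 52.0])

print(regular)
print(bursty)
```

Each pair gives the abscissa and ordinate of one point, so the regular train collapses onto a single location while the bursty train spreads out.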