

Comparability of methods used in the sampling of primate behavior.

American Journal of Primatology 5:1-15 (1983)
Comparability of Methods Used in the Sampling of
Primate Behavior
Department of Psychology, University of California, Riverside, and Graduate School of
Education, University of California, Los Angeles
Two measures of a behavior are defined as comparable if one is predictable
from the other. The comparability of one-zero, instantaneous, actual-frequency, and actual-duration sampling was investigated using a Monte
Carlo simulation to extend the range of sampling intervals beyond those
ordinarily found in the literature. Several combinations of bout time, rate
of response, and observation interval were simulated. Comparability between one-zero or instantaneous scores and actual frequency or actual
duration was higher for higher behavior rates, decreased with longer observation intervals, and was usually higher for longer bout times. Curves of
β²s from multiple regression analyses revealed a fundamental difference in
the source of one-zero and instantaneous predictability. Both actual frequency
and actual duration contributed substantially to one-zero predictability,
indicating high comparability between one-zero scores and a weighted
combination of actual frequency and actual duration. Once the contribution
of actual duration to instantaneous scores was accounted for, actual frequency contributed virtually nothing additional. Long-standing meanings
of validity, reliability, and comparability were applied to the findings,
resulting in the conclusion that all of the sampling methods are useful,
depending upon the researcher's approach, resources, and problem.
Key words: simulation, sampling methods, primate behavior
Rhine and Ender
Received January 7, 1982; revision accepted December 7, 1982.
Address reprint requests to Ramon J. Rhine, Psychology Department, University of California, Riverside,
CA 92521.
© 1983 Alan R. Liss, Inc.
Findings from empirical studies of several widely used methods of sampling
spontaneous behavior [Altmann, 1974] have been evaluated and summarized by
Rhine & Linville [1980]. The four types of scores most commonly compared in
empirical studies are actual frequencies, actual durations, one-zero scores, and
instantaneous scores [Dunbar, 1976; Kraemer, 1979; Leger, 1977; Simpson & Simpson, 1977; Rhine & Flanigon, 1978; Rhine & Linville, 1980]. Actual frequency is the
total number of observed bouts of a sampled behavior. Actual duration is the
proportion of the total observation time during which the behavior is recorded as
occurring. A one-zero score is the proportion of observation intervals, such as 30-second intervals, during which one or more bouts of the behavior are observed. An
instantaneous score is the proportion of defined instants in time, such as the ends of
30-second intervals, at which the behavior is observed in progress. Two additional
quantities used in this paper are rate and bout time. Rate is the frequency of a
behavior’s occurrence per specified unit of observation time, such as frequency per
minute. Bout time is the time elapsing from the onset of a single occurrence of a
behavior to its termination.
The above four measures of spontaneous behavior, like all behavioral measures,
may be characterized in terms of their validity, reliability, and comparability, which
have long-standing meanings in behavioral measurement [eg, Guilford, 1936; Hilgard, 1962; Kimble et al, 1980]. These meanings have not always been applied
consistently in discussions of the relative merits of the four sampling procedures
[Rhine & Linville, 1980]. In keeping with the long-standing meanings, a measure
will be considered valid if it actually does index the phenomenon being studied. For
example, one-zero scores will be regarded as a valid measure of social affinity if, in
fact, higher one-zero scores tend to occur when social affinity is higher and lower
one-zero scores tend to occur when social affinity is lower. Reliability refers to a
measure’s precision or accuracy in the sense of consistency. For example, actual
duration is a reliable measure of self-grooming if two researchers observing simultaneously and independently obtain equivalent actual durations of self-grooming.
Reliable measures are not necessarily valid measures. A reliable measure of self-grooming probably will not be valid if the actual duration of self-grooming is used
as a measure of social attractiveness and may or may not be valid if used as a
measure of social isolation. Valid measures must be reliable. Thus, if researchers
observing simultaneously and independently obtain highly inconsistent durations
of self-grooming, such durations cannot be a valid measure of social isolation or
anything else. Available data indicate that the four types of scores usually are about
equally reliable, and no empirical data exist demonstrating that one of these measures is more or less valid than others as measures of primate phenomena such as
social affinity [Rhine & Linville, 1980].
Two measures are comparable if one is predictable from the other. Comparable
measures may yield different numbers, eg, kilograms versus pounds as measures of
a man’s weight. Measures yielding different numbers are comparable if they are
related by a transformation equation from which one measure is predictable from
the other (eg, 2.2 times kilograms equals pounds). Comparable measures need not
look alike or have the same format. For example, two intelligence tests with completely different items are comparable if they are related in such a way that a score
on one is predictable from a score on the other. Indeed, it is common practice to
validate a new test by demonstrating that its scores are well related to (correlated
with) scores from accepted tests [eg, Kimble et al, 1980].
The meaning of a high linear correlation coefficient is that one variable is well
predicted from the other; hence, in behavioral science, the comparability of two
measures is commonly indexed by linear regression for which the simple transformation equation is well known and from which the degree of comparability can be
conveniently expressed using a single unitless number, the correlation coefficient.
Since the correlation coefficient can be used to determine the degree of predictability from one measure to another, it can be used also as an indicator of the degree of
comparability of these measures. Different measures used as somewhat comparable
indices of the same phenomenon may not be perfectly correlated because of unreliability and because they may measure somewhat different aspects of the phenomenon. For example, social affinity between two animals may be measured in one
study by the frequency of association and in another by the duration of association,
even though in several cases actual frequency and actual duration are only moderately correlated [eg, Rhine & Linville, 1980]. It is impossible to have a high linear correlation without a high degree of predictability, and, therefore, a high degree of comparability.
The Monte Carlo simulation reported in this paper bears directly upon this
conception of comparability and only indirectly upon reliability and validity. The
simulation bears upon comparability because the data are correlations between
methods, indicating the degree to which one type of score is predictable from one or
more of the others. These data by themselves tell nothing about reliability or
validity; however, if measure A is comparable to measure B, and if A is reliable
(valid), then B is also reliable (valid).
The four sampling methods have been examined previously in studies which
were limited in scope. A massive effort would be required to conduct an empirical
study in which all combinations of several relevant dimensions are varied simultaneously over a wide range of values. In addition, factors such as bout time are not
under the researcher’s control in empirical studies of spontaneous behavior. These
limitations can be alleviated by a computer simulation which extends the range of
observation intervals beyond those commonly studied and which allows comparisons of the four types of scores while systematically varying combinations of observation intervals, rates, and bout times.
A successful simulation is an analytic approximation of reality, but specific
details of empirical and simulated findings are not expected to be identical. The
simulation focuses upon a limited number of main variables, whereas empirical
results can be influenced by nuances of living conditions, group composition, health,
individual experience, reproductive condition, food supply, weather, etc. Nevertheless, broad correspondence can be judged by determining if known empirical trends
tend to be reproduced by the simulation. If the basic forms of empirical trends are
found for those portions of simulated relationships for which empirical counterparts
exist, then confidence in the overall simulation is increased.
Behavior observations were simulated using a computer program developed to
manipulate independently the rates and bout times of simulated behavior. Twenty
simulated observation periods of four hours each were generated for each of several
combinations of rate and bout time. During the development of the program, trial
runs of 50 and 100 replications were also tested. They did not differ significantly
from the 20 replication trials. As shown in Table I, 42 combinations of rates and
bout times were employed; these combinations yielded a total of 840 simulated
observation periods of four hours each.
TABLE I. Rates and Bout Times Used in the Simulation
[Table body not recoverable from the source; columns give average bout time in sec.]
In Table I, a rate of 1/10, for example, indicated that the behavior occurred an
average of once every ten minutes. The upper right-hand portion of Table I is blank
because some combinations of rates and bout times were not used, namely, combinations where bout times were long in relation to rates. For example, an average
rate of once per minute (1/1) with an average bout time of two minutes was not
meaningful. Rates in Table I are grouped into higher, middle, and lower rates.
Similarly, bout times are grouped into shorter and longer times.
The simulation program created an array of 14,400 elements representing a
four-hour observation period of 14,400 seconds. Every element of the array was
initially set to zero, indicating that the behavior was not occurring. Next, the
expected number of behavior bouts was computed by multiplying the minutes (4 hr
x 60 = 240 min) in the observation period by the appropriate rates. For example, if
rate was set at 1/5, then the expected number of behavior bouts would have been
240 x 1/5 = 48. A random number generator was used to generate 48 starting
positions which were random uniform numbers between one and 14,400. For each
starting location, the element in the array corresponding to the random number was
changed from zero to one, indicating the beginning of a behavior bout. Bout times
were randomly generated about each specified mean bout time, with the standard
deviation changing in proportion to the bout time and with longer bout times having
greater variability. For example, if a mean bout time of ten seconds was specified,
then the random number generator would generate bout times with a mean of ten
which were then randomly assigned to each of the starting locations in the array.
Following each starting location, the number of elements equal to the bout time
minus one were changed from zero to two, which indicated the continuation of a
behavior bout. If a “one” was encountered during the course of changing zeros to
twos, the program left it in place and generated a new random time for the next
behavior bout.
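The array-building procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the original program: the function name `simulate_period`, the normal distribution for bout times, and the `sd_fraction` parameter (standard deviation as a fixed fraction of the mean) are hypothetical choices, since the text specifies only that variability increased in proportion to bout time. The sketch also uses zero-based positions and collapses duplicate starting positions.

```python
import random

def simulate_period(rate_per_min, mean_bout_sec, total_sec=14_400,
                    sd_fraction=0.25, seed=None):
    """Return a per-second list: 0 = no behavior, 1 = bout onset, 2 = bout continuing."""
    rng = random.Random(seed)
    timeline = [0] * total_sec                 # behavior initially absent everywhere

    # Expected number of bouts = minutes observed x rate (e.g., 240 x 1/5 = 48).
    n_bouts = round(total_sec / 60 * rate_per_min)

    # Uniform random starting positions; duplicates collapse (an assumption).
    starts = sorted({rng.randrange(total_sec) for _ in range(n_bouts)})
    for s in starts:
        timeline[s] = 1                        # mark every onset first

    for s in starts:
        # Assumed: normally distributed bout time, SD proportional to the mean.
        bout = max(1, round(rng.gauss(mean_bout_sec, sd_fraction * mean_bout_sec)))
        # Mark (bout - 1) following seconds as continuation.
        for t in range(s + 1, min(s + bout, total_sec)):
            if timeline[t] == 1:               # next onset encountered: leave it in place
                break
            timeline[t] = 2
    return timeline

period = simulate_period(rate_per_min=1 / 5, mean_bout_sec=10, seed=1)
print(len(period), period.count(1))
```

Marking all onsets before filling in continuations keeps a later onset from being overwritten by an earlier bout, which corresponds to the rule that an encountered "one" is left in place.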
The end product was an array of 14,400 elements with the starting times
randomly distributed and with behavior bouts of varying lengths centered upon a
prescribed mean. Once the above array of simulated observations was created, it
was a simple matter to obtain actual frequencies, actual durations, instantaneous
scores, and one-zero scores. These latter two scores were obtained for each of the
following seven observation intervals: 5, 15, 30, 60, 120, 300, and 600 seconds.
Observation intervals most commonly used in studies of spontaneous primate behavior fall within the range of five seconds to ten minutes.
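As an illustration of how the four scores follow from such an array, the sketch below scores a record coded 0 (absent), 1 (bout onset), and 2 (bout continuing). The function name `score_period` and the returned dictionary keys are hypothetical, and taking the instant of instantaneous sampling to be the last second of each interval is an assumption based on the definition of instantaneous scores given earlier.

```python
def score_period(timeline, interval_sec):
    """Score one simulated period coded 0 (absent), 1 (onset), 2 (continuing)."""
    total_sec = len(timeline)
    n_intervals = total_sec // interval_sec

    actual_frequency = timeline.count(1)                       # number of bouts
    actual_duration = sum(code != 0 for code in timeline) / total_sec

    one_zero_hits = 0
    instantaneous_hits = 0
    for i in range(n_intervals):
        window = timeline[i * interval_sec:(i + 1) * interval_sec]
        if any(window):              # behavior seen at any time in the interval
            one_zero_hits += 1
        if window[-1] != 0:          # behavior in progress at the end of the interval
            instantaneous_hits += 1

    return {
        "actual_frequency": actual_frequency,
        "actual_duration": actual_duration,
        "one_zero": one_zero_hits / n_intervals,
        "instantaneous": instantaneous_hits / n_intervals,
    }

# Toy 20-second record: a 3-second bout and a 1-second bout, 5-second intervals.
toy = [0, 0, 1, 2, 2] + [0] * 5 + [0, 0, 0, 0, 1] + [0] * 5
print(score_period(toy, 5))
```

Applying the same function with each of the seven interval lengths yields the one-zero and instantaneous scores for one simulated period; actual frequency and actual duration do not depend on the interval.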
Fit Between Simulated and Empirical Results
Correlations were calculated from the entire data base to determine if the
simulation yielded a reasonable approximation to available empirical results from
primates for which observation intervals usually varied from ten to 120 seconds.
Figure 1 presents curves for all rates and bouts combined, comparing over the seven
observation periods trends of correlations for one-zero/actual-frequency, instantaneous/actual-frequency, one-zero/actual-duration, and instantaneous/actual-duration. In Figures 1-5, correlations of .19 or higher were significant at the .01 level,
and correlations of .27 or higher were significant at the .001 level.
The left half of Figure 1 shows correlations between actual frequency and one-zero or instantaneous scores. For each observation interval, the one-zero/actual-frequency correlation was always higher than the instantaneous/actual-frequency
correlation, and the differences between these pairs of correlations were quite large.
This simulated finding is consistent with empirical studies, and the initial increasing trend in one-zero/actual-frequency correlations is also comparable to the empirical trend [Leger, 1977; Rhine & Linville, 1980]. The empirical curves for
instantaneous/actual-frequency correlations were almost horizontal lines in the 10-120-second range, centering around .53 in one study and .65 in another. For simulated data, the slightly declining trend within this range was not markedly different
from horizontal. Similarly, the right half of Figure 1, which shows trends of correlations of one-zero or instantaneous scores with actual duration, is also consistent with
empirical findings [Leger, 1977; Rhine & Linville, 1980]. Such consistency lends
credence to simulated trends extending beyond available empirical data.
Fig. 1. Overall correlations of one-zero and instantaneous scores with actual frequency and actual
duration, as a function of size of observation interval.
Fig. 2. Correlations for three rate groups of one-zero and instantaneous scores with actual frequency and
actual duration, as a function of size of observation interval.
As will be seen from Table I, it is possible to compare the three rate groups with
bout time held constant by making the comparison for shorter bout times only. This was
done in Figure 2, which contains curves of correlations comparable to those of Figure
1 but with the data subdivided by rate. Most of the increasing or decreasing trends
in curves of Figure 2 paralleled those of Figure 1. (In Fig. 2 and subsequent figures,
points not shown for a given observation interval would be overlaid by those plotted.)
Of the 84 correlations in Figure 2, there was only one case (upper left graph,
600 seconds) where a correlation for a higher rate may have been meaningfully
lower than a counterpart from middle or lower rates. Except for this one case, one-zero correlations with actual frequency or actual duration were as large or larger
for higher rates than for middle and lower rates; similarly, instantaneous correlations with either actual frequency or actual duration were always as large or larger
for higher rates than for middle or lower rates. Except for the upper left-hand curves
of Figure 2 (one-zero/actual-frequency), all correlations for middle and lower rates
obtained from intervals of 120 seconds or more were substantially lower than the
corresponding correlations for higher rates.
Fig. 3. Correlations for two bout-time groups and for middle rates of one-zero and instantaneous scores
with actual frequency and actual duration, as a function of size of observation interval.
Fig. 4. Correlations for two bout-time groups and for lower rates of one-zero and instantaneous scores
with actual frequency and actual duration, as a function of size of observation interval.
Fig. 5. Correlations for two bout-time groups and for all rates of one-zero and instantaneous scores with
actual frequency and actual duration, as a function of size of observation interval.
Bout Times
Longer and shorter bout times may be compared for middle rates and again for
lower rates (see Table I). Longer and shorter bout times for middle rates are shown
in Figure 3 and for lower rates in Figure 4. The general shapes of all curves of these
two figures were similar to corresponding sampling-method curves of Figures 1 and
2. Except for one-zero/actual-frequency correlations, all other pairs of curves in
Figures 3 and 4 had approximately equal or higher correlations for longer bout
lengths than for shorter. The reverse occurred for one-zero/actual-frequency, and this
is especially clear in Figure 4.
An interesting reversal occurred when bout times were compared for all rates
combined. As shown in Figure 5, instantaneous correlations for shorter bout times
were essentially equal to or substantially greater than corresponding correlations
for longer bout times, which was the reverse of the trends in Figures 3 and 4. This
change was due to the inclusion in Figure 5 of higher rates which, for reasons stated
above, involved only shorter bout times (see Table I). Higher rates yielded higher
correlations, and adding higher rates to middle and lower ones tended to elevate
correlations for shorter bout times.
Multiple Correlations
Figure 6 shows multiple correlations (R’s) of actual frequency and actual duration regressed upon one-zero or instantaneous scores. All of these R’s were statistically significant at beyond the .001 level. The left-hand graph is for different rates,
and the right hand graph is for bout times. The graphs in Figure 6 have magnified
slopes compared to those in previous figures, because in Figure 6 the left-hand scale
is from .60 to 1.00 whereas in previous figures it is from .00 to 1.00. The magnified
scale is necessary to achieve visual separation among many points, especially at
short observation intervals. Even so, because of crowding, several points at the
shortest observation intervals had to be left out; for a given interval, omitted points
are always very close to the ones shown.
Fig. 6. Multiple correlations for rates and bout times of one-zero or instantaneous scores predicted from
actual frequency and actual duration, as a function of observation interval.
All of the 50 R's for the most commonly used observation intervals (up to 120
seconds) of Figure 6 were .92 or higher, and only five of these 50 were less than .95.
In other words, up to an observation interval of 120 seconds, both instantaneous and
one-zero scores were predictable with high precision from knowledge of actual
frequency and actual duration. Furthermore, up to an observation interval of five
minutes, one-zero scores were still well predicted from actual frequency and duration, the 30 applicable R's of Figure 6 all being .84 or higher. An equivalent level of
predictability did not occur for instantaneous R's based upon middle or lower rates
or upon longer bout times. For instantaneous multiple correlations, shorter bout
times yielded equal or higher correlations than longer bout times, which was consistent with the trend of the correlations in Figure 5. The most consistently high
multiple correlations (Fig. 6, upper curve, right-hand graph) were for prediction of
one-zero scores using longer bout times. For the seven observation intervals, one R
was .92 and the remainder were .95 or higher.
The reasons for the trends of R's in Figure 6 are clarified by Figure 7, which
contains curves of squared standardized regression weights (β²s) reflecting the relative contribution to the R's of the two predictor variables, actual frequency and
actual duration. The instantaneous β²s are shown in the two right-hand graphs of
Figure 7. These β²s for actual frequency were zero or close to zero for all rates and
bout lengths; on the other hand, the contribution of actual duration was large,
though it gradually decreased as the observation interval increased. The trend of
the β²s due to actual duration paralleled that of the instantaneous R's (Fig. 6)
because, leaving unreliability and sampling error aside, instantaneous scores provide a direct estimate of actual duration [Altmann, 1974; Kraemer, 1979; Rhine &
Linville, 1980]. The instantaneous/actual-frequency correlations in Figures 1 through
5 could still reflect actual duration since actual duration and actual frequency are
themselves moderately correlated [Rhine & Linville, 1980]. When the contribution
of actual duration was taken into account in the regression analysis, actual frequency contributed little or nothing additional to the instantaneous R's.
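For readers who wish to reproduce this kind of analysis, the sketch below computes standardized regression weights and the multiple correlation for one criterion and two predictors directly from the three pairwise correlations, using the standard two-predictor formulas. The function names `pearson` and `standardized_weights` and the small data set are hypothetical; the analysis program used for the paper is not described in the text.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardized_weights(y, x1, x2):
    """Return (beta1, beta2, R) for y regressed on x1 and x2 in standardized form."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2            # requires predictors that are not perfectly correlated
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    multiple_r = math.sqrt(beta1 * r_y1 + beta2 * r_y2)
    return beta1, beta2, multiple_r

# Hypothetical scores from five observation periods (illustrative values only).
frequency = [1, 2, 3, 4, 5]
duration = [2, 1, 4, 3, 6]
score = [2 * f + 3 * d for f, d in zip(frequency, duration)]   # exact linear combination
b_freq, b_dur, R = standardized_weights(score, frequency, duration)
print(b_freq ** 2, b_dur ** 2, R)
```

Here `score` stands in for a one-zero or instantaneous score; squaring the two weights gives quantities of the kind plotted as β²s.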
The situation for one-zero R's is fundamentally different, as may be seen from
the two left-hand graphs of Figure 7. Not only did actual frequency and actual
duration both contribute substantially to one-zero multiple correlations, but their
relative contribution differed for different observation intervals. At the shortest
observation intervals, one-zero R's reflected primarily actual duration. As the interval size increased, the contribution of actual frequency became increasingly prominent, with the curves of β²s intersecting and the contribution of actual frequency
being much greater than that of actual duration for the longer observation intervals. Similar trends for curves of β²s occurred in empirical studies using intervals up to 120
seconds. The simulated trend beyond 120 seconds suggests that, for observation
intervals beyond five minutes, one-zero R's reflect primarily actual frequency. Because of the combined contributions of actual duration and actual frequency, one-zero R's in Figure 6 in every case except one were equal to, nearly equal to, or higher
than their instantaneous counterparts. As the β²s show, when the contribution of
actual duration decreased, an equally precipitous drop in the one-zero R's was
avoided by a balancing increase in the contribution of actual frequency.
Fig. 7. Squared standardized regression weights (β²s) for the multiple correlations of Figure 6.
Although the numerical values of empirical and simulated correlations occasionally diverged, the trends of the simulated data were regularly consistent with
available empirical information for which observation intervals from ten to 120
seconds have been investigated. This correspondence between simulated and available empirical results suggests that simulated trends beyond 120 seconds probably
are reasonable approximations of reality.
Why are intervals within the range of ten to 120 seconds the ones most commonly chosen? Perhaps research designers judge very short or very long intervals
as inconvenient, impractical, or less reliable than others. Very short intervals can
strain an observer's information-processing capacities and require intense observer
concentration, possibly leading to inaccuracies due to fatigue [Altmann, 1974]. Very
long intervals will usually yield smaller samples, which tend to magnify the influence of a few interobserver inconsistencies. Whatever the basis of choice, correlations
reported in empirical studies indicate that the various methods are largely interchangeable up to 120 seconds [Dunbar, 1976; Rhine & Linville, 1980].
The size of such correlations is an index of the degree to which one measure is
comparable to another. To say that comparability is indicated by the unitless
correlation coefficient is to say also that comparability does not require sameness
of absolute values, the actual magnitude of the numbers obtained [Rhine & Flanigon, 1978; Rhine & Linville, 1980]. Inches and centimeters are comparable measures of length, even though for a given object they yield different numbers. Similarly,
(4,12), (5,13), and (6,14) are pairs of scores which yield the same correlation as (1,1),
(2,2), and (3,3); therefore, the numbers in the second set of pairs, as two measures of
behavior, are no more or less comparable than those in the first set. Furthermore,
absolute values of behavior frequencies obtained from any of the sampling procedures are neither “true” nor “natural” in a meaningful general sense because these
frequencies are dependent upon factors such as group composition, ecology, the
definition of behavior the researcher decides to use, and the properties of the
sampling procedures employed [Rhine & Linville, 1980].
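The claim that the two sets of pairs yield the same correlation can be checked with the standard definition of the product-moment correlation. The derivation below is a worked illustration added for clarity; the symbols a, b, c, and d are introduced here and do not appear in the original text.

```latex
\[
r_{XY} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
\]
% Rescale both measures: x'_i = a x_i + b and y'_i = c y_i + d, with a > 0 and c > 0.
% Deviations from the mean become a (x_i - \bar{x}) and c (y_i - \bar{y}), so
\[
r_{X'Y'} = \frac{ac \sum_i (x_i - \bar{x})(y_i - \bar{y})}
                {\sqrt{a^2 c^2 \sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
         = r_{XY}.
\]
% Example from the text: (1,1), (2,2), (3,3) has r = 1. Taking a = c = 1, b = 3, d = 11
% gives (4,12), (5,13), (6,14), which therefore also has r = 1.
```

The additive constants drop out of the deviations and the positive multipliers cancel, which is why a change of units (inches to centimeters, kilograms to pounds) leaves comparability unchanged.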
Comparability is an ingredient in the choice of one procedure over another.
Factors such as validity and reliability cease to be major considerations in choosing
between two measures if the measures are highly comparable. This leaves more
room to weigh practical matters such as cost, convenience, administrative feasibility,
etc. Information on comparability is available from the simulation and from the
above-cited empirical studies. This information is summarized below, with existing
empirical data weighted more heavily if there is any discrepancy between it and
simulated data:
1. One-zero frequency was a reasonable index of actual frequency or vice versa
for most observation intervals a researcher was likely to choose, and the two
were quite comparable for intervals of 15 seconds to 5 minutes.
2. For lower or middle rates of responding, actual frequency and one-zero frequency appeared to be moderately comparable for observation intervals of 30
seconds or more, but not for the shortest intervals. However, it is unlikely that a
researcher would choose to employ the highly inefficient method of using, for
example, five-second observation periods with a behavior that does not occur very often.
3. Available data from primates indicate that instantaneous frequencies and
actual frequencies are unlikely to be more than modestly comparable; consequently, they are probably best treated as not interchangeable unless there exists
situation-specific evidence to the contrary.
4. The empirical data indicated that one-zero and actual-duration scores were
reasonably comparable up to about 60-120-second intervals, and the trend of the
simulation suggests a decrease in their comparability thereafter.
5. Instantaneous scores and actual duration were almost completely comparable up to 120 seconds, with decreasing comparability thereafter.
6. The empirical data [Rhine & Linville, 1980: Table 2] suggested that up to
120 seconds, and probably well beyond, one-zero scores from one observation
period were comparable to those from another; the same was true of paired sets
of instantaneous scores. One-zero scores were comparable to instantaneous scores
if the same observation intervals were used for both. If not, the correlations
between one-zero and instantaneous scores tended to be lower the greater the
difference between the interval lengths, but these correlations never fell below
7. Actual frequency and actual duration were only moderately correlated in
studies summarized by Rhine and Linville [1980]; therefore, they should rarely,
if ever, be treated as interchangeable without situation-specific evidence of high
correlation, such as that supplied by Chamove [1974].
8. One-zero scores were highly predictable from a weighted combination of
actual frequency and actual duration.
Number of Hits
The number of hits in a sampling session is the number of times the behavior is
recorded as occurring. The number of hits is important because a low number tends
to yield low reliability and low comparability. The number of hits is partially
determined by the research design. Hits depend upon the length of the observation
intervals chosen by the researcher and upon behavior definitions. Sometimes the
expected range of hits can be estimated from previous research or pilot studies.
The degree of comparability of sampling procedures under different conditions
of the simulation was indicated by the curves of correlations in Figures 1-5, in
which three main trends were associated with variation in the number of hits. One
of these main trends was that higher rates, which yielded more hits, tended to
produce higher correlations among sampling procedures (Fig. 2). A second main
trend was lower correlations with larger observation intervals (Figs. 1-5). Using larger
observation intervals per unit time (four hours in the simulation) is equivalent to using
fewer observation intervals and consequently to obtaining fewer hits. A third main trend of
the simulation was the relationship of bout time to trends of correlations among
sampling procedures (Figs. 3, 4). This relationship, which was also modulated by the
number of hits, will be discussed below for instantaneous and one-zero sampling.
Instantaneous sampling. The number of hits from instantaneous sampling
depends upon the ratio of bout time to the observation interval. For example, if all
bout times are shorter than the observation interval, the maximum number of
instantaneous hits in the total observation session is equal to the actual frequency,
and the minimum is zero. The number of such instantaneous hits will usually be
larger than zero and smaller than the actual frequency. If all bout times are longer
than the observation interval, the minimum number of hits is the actual frequency;
the maximum usually will be greater than the actual frequency because bout times
longer than the observation interval can be sampled more than once. For example,
in stump-tailed macaques sampled every 30 seconds [Rhine, 1973], adult grooming
bouts of several minutes occurred. A ten-minute bout yielded 20 instantaneous hits
from a single actual frequency. In contrast, instantaneous sampling with a ten-minute observation interval, instead of every 30 seconds, will yield not 20 grooming
hits, but only one.
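The arithmetic of the grooming example can be checked with a short sketch. The function name `instantaneous_hits`, the bout's starting time of 15 seconds, and the convention that a bout occupies the half-open span from its onset up to but not including its termination are assumptions made for illustration.

```python
def instantaneous_hits(bout_start, bout_len, interval, session_len):
    """Count sampling instants (ends of intervals) that fall within one bout."""
    instants = range(interval, session_len + 1, interval)
    return sum(bout_start <= t < bout_start + bout_len for t in instants)

session = 4 * 60 * 60    # a four-hour session, in seconds
ten_minutes = 600

print(instantaneous_hits(15, ten_minutes, 30, session))    # sampled every 30 seconds
print(instantaneous_hits(15, ten_minutes, 600, session))   # sampled every ten minutes
```

Under these assumptions the first call counts 20 hits and the second counts one, matching the figures given for a single ten-minute bout.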
Instantaneous scores are a good index of actual duration unless the number of
hits is too small. Large observation intervals can easily yield few or zero
instantaneous hits even though the behavior occurs several times. Few or no hits
from large intervals are especially likely if the behavior's rate is low and its bout
time is short. If a correlation is calculated from two sets of scores, and if the scores
in one set are always the same, for example, zero, then the correlation will be zero
regardless of the values of the other set of scores. Because of approximations to this
condition, the instantaneous correlations in Figure 2 were smaller for middle and
lower rates than for higher rates and were smaller in Figures 3 and 4 for shorter
bout times than for longer. The least comparable should be the combination of
shorter bouts, lower rates, and the longest observation interval, which yielded the
lowest instantaneouslactual-durationcorrelation in Figure 4.
Rhine and Ender
This last effect upon instantaneous/actual-duration comparability should be
alleviated by increasing the number of observation intervals. In the simulation, the
smaller the observation interval, the more there were and the higher the instantaneousiactual-duration correlations in Figures 1-5. As the time of the observation
interval approaches zero, the number of intervals approaches infinity. As the
number of intervals approaches infinity, the correlation between actual duration
and instantaneous scores necessarily approaches 1.00 because the proportion of
time the behavior is sampled, regardless of rate or bout length, will become equal
to the proportion of time the behavior occurs, which is the definition of actual
duration.
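
The limiting argument can be illustrated with a minimal sketch. The session length, the four bouts, and the function names below are hypothetical; they are not the parameters of the simulation reported here. The sketch compares the proportion of sample points that hit the behavior with the proportion of time the behavior occurs, for three interval lengths.

```python
def proportion_sampled(bouts, interval, session_length):
    """Proportion of instantaneous sample points that fall inside a bout."""
    points = range(interval, session_length + 1, interval)
    hits = sum(any(start <= t < end for start, end in bouts) for t in points)
    return hits / len(points)


def proportion_occurring(bouts, session_length):
    """Proportion of the session during which the behavior occurs."""
    return sum(end - start for start, end in bouts) / session_length


# Hypothetical one-hour session (3600 s) with four bouts of unequal length.
session = 3600
bouts = [(100, 130), (700, 1000), (2000, 2045), (3000, 3010)]
actual = proportion_occurring(bouts, session)

# Absolute error of the instantaneous estimate for each interval length.
errors = {
    interval: abs(proportion_sampled(bouts, interval, session) - actual)
    for interval in (600, 60, 1)
}
print(errors)
```

In this example the error shrinks as the interval shortens and vanishes at one-second sampling, because the hypothetical bouts start and end on whole seconds. With so few bouts, the decrease need not be strictly monotonic for every choice of interval.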
For middle and lower rates, longer bout times yielded higher correlations with
instantaneous scores than did shorter bouts (Figs. 3, 4). In Figures 3 and 4, rates
were the same for both longer and shorter bouts. Given the same number of
occurrences, the probability of an instantaneous hit is greater with longer bout
times than with shorter; therefore, the number of hits would be larger for the
longer bout times of Figures 3 and 4. The reverse becomes possible if a change
occurs which makes the number of hits for shorter bout times larger than those for
longer times, and that is what happened when high rates (all shorter bout times)
were added into the data (Fig. 5). In Figure 5, rate and bout time were confounded,
with the rate effect seen in Figure 2 outweighing the bout-time effect of Figures 3
and 4.
One-zero scores. Putting aside unreliability, one-zero sampling, unlike instantaneous
sampling, can yield zero hits only if the behavior does not occur at all during
the total observation session. For a given observation session, the number of hits
from one-zero sampling is typically larger than the number from instantaneous
sampling. Even short, infrequent bouts, which can easily yield zero instantaneous
hits, especially with long observation intervals, will each tend to contribute one or
more one-zero hits.
One-zero scores and actual frequency will be comparable, as indicated by linear
correlation, if the magnitude of actual-frequency scores is proportional to the
magnitude of paired one-zero scores. Linear proportionality tends to break down
when the time of the observation interval is small relative to bout time, which is
most likely for the shortest observation intervals of the simulation. For example,
with an observation interval of five seconds a single bout of three minutes’ duration
will yield 36 hits if it starts precisely at the beginning of a five-second interval,
or otherwise, 37. Thus, a single actual frequency is usually associated with 37 one-zero
hits. If the next bout is also three minutes, proportionality obtains, since that
yields an actual frequency of two and one-zero hits of 74, ie, 2 x 37. However, if
these longer bouts vary among themselves, as occurred in the simulation and as is
likely in real life, then the proportionality can be, and typically is, reduced. For
example, if the third bout is one minute long, then it yields 12 or, more likely, 13
hits, giving an actual frequency of three and total one-zero hits of 74 + 13 = 87.
One is to 37 as two is to 74, but not as three is to 87. A similar breakdown in
proportionality probably accounts for the lower one-zero/actual-frequency correlations
for the shortest observation intervals, which often produce multiple hits from
a single bout.
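
The hit counts in this example can be reproduced with a brief sketch. The function name, the half-open bout convention, and the specific start times are assumptions chosen for illustration; only the bout lengths and the five-second interval come from the example above.

```python
def one_zero_hits(bouts, interval):
    """Count intervals in which the behavior occurs at least once.

    bouts: (start, end) pairs in whole seconds; a bout covers start <= t < end.
    Interval k covers k * interval <= t < (k + 1) * interval.
    """
    scored = set()
    for start, end in bouts:
        first = start // interval
        past_last = -(-end // interval)  # ceiling division on integers
        scored.update(range(first, past_last))
    return len(scored)


aligned = one_zero_hits([(0, 180)], 5)   # three-minute bout starting on a boundary
offset = one_zero_hits([(2, 182)], 5)    # same bout starting mid-interval
# Two three-minute bouts followed by a one-minute bout, none aligned.
bouts = [(2, 182), (302, 482), (602, 662)]
cumulative = [one_zero_hits(bouts[:n], 5) for n in (1, 2, 3)]
print(aligned, offset, cumulative)
```

With these start times the aligned bout scores 36 hits, the offset bout 37, and the cumulative totals after one, two, and three bouts are 37, 74, and 87.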
While proportionality of one-zero scores with actual frequency may break down
somewhat for observation intervals which are short in relation to bout times, such
intervals are a favorable condition for correlations of one-zero scores with actual
duration, accounting for the highest one-zero/actual-duration correlations occurring
for the shortest observation intervals (Figs. 1-5). Consider what happens if
the bout time is longer than the observation interval, in which case actual duration
is likely to remain closely proportional to one-zero hits. To continue the above
example, after one three-minute bout, there are 37 hits and three minutes of actual
duration; after two three-minute bouts there are 74 hits and six minutes of duration; and after three bouts there are 87 hits and seven minutes of duration.
Proportionality is nearly the same for 3-37, 6-74, and 7-87. Thus, if bout lengths
are longer than the observation interval, a high one-zero/actual-duration correlation
is expected.
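
The contrast between the two kinds of proportionality can be made concrete with the cumulative totals from the worked example. The `relative_spread` index below is an ad hoc measure introduced only for this sketch; it is not a statistic used in the simulation.

```python
# Cumulative totals after each of the three bouts in the worked example.
frequency = [1, 2, 3]          # actual frequency (bouts)
duration_min = [3, 6, 7]       # actual duration (minutes)
hits = [37, 74, 87]            # one-zero hits, five-second intervals


def relative_spread(values):
    """(max - min) / min, a crude index of departure from proportionality."""
    return (max(values) - min(values)) / min(values)


hits_per_bout = [h / f for h, f in zip(hits, frequency)]
hits_per_minute = [h / d for h, d in zip(hits, duration_min)]
print(relative_spread(hits_per_bout), relative_spread(hits_per_minute))
```

Hits per bout fall from 37 to 29 once the shorter bout is added, whereas hits per minute of duration stay within about one percent of each other, which is the sense in which actual duration remains closely proportional to one-zero hits.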
A rule of thumb for optimizing both one-zero and instantaneous correlations
with actual duration follows from the above analyses: Use the shortest observation
interval feasible and the longest bout times. The length of the observation interval
is determined by the researcher. Although researchers rarely, if ever, make deliberate
attempts to control bout time, it is often at least partially determined by a
researcher’s decisions. Just as there are no “true” or “natural” frequencies of
spontaneous behavior, so also there are no absolute “true” or “natural” bout times.
Bout time depends upon the definition a researcher uses, even for such inevitable
behaviors as feeding. In one case, eating was defined as the act of placing food into
the mouth, which made bout times so short that feeding bouts were rarely hit by
30-second instantaneous sampling [Rhine & Linville, 1980]. By redefining feeding
as the entire sequence of preparing, placing in the mouth, and chewing, long
enough bout times would probably be available to obtain an adequate number of
instantaneous hits to estimate the actual duration of feeding. In this case, bout
time would be deliberately increased by manipulating the definition, with the
second definition being no less meaningful than the first.
A problem occurs in analyses of sampling procedures if validity is confused with
reliability or comparability [Rhine & Linville, 1980]. When a researcher is concerned
with the measurement of phenomena such as dominance, social affinity, dependency,
etc, validity refers to the relationship between a measure and the phenomenon being
studied and not to the relationship between one measure (eg, one-zero) and another
(eg, actual frequency) which is arbitrarily labeled “true.” The degree of relationship
between two measures indicates their comparability, but not necessarily their validity. “Valid” is equal to “comparable” in the special case where one measure is being
obtained for the express purpose of estimating another. Thus, under most circumstances, instantaneous scores can be used to provide a valid estimate of actual
duration. Reliability, comparability, and validity are not all-or-none concepts. There
are degrees of useful reliability, comparability, and validity; therefore, even if one
wishes to use, for example, one-zero scores as a comparable and valid estimate of
actual frequency, a perfect correlation between the two sets of measures is not
necessary.
Some authors have recommended that one-zero sampling never be used because
one-zero scores may be considered a combination of actual frequency and actual
duration [Altmann, 1974; Kraemer, 1979]. An alternative view has been developed
[Rhine & Flanigon, 1978; Rhine & Linville, 1980]. Briefly, if actual frequency and
actual duration are both valid measures of spontaneous social behavior as claimed
by the critics of one-zero sampling, why should a measure which is a combination of
the two be regarded as inferior to each separately? Kraemer argues that the differing
contributions of actual frequency and actual duration for different observation
intervals is a disadvantage, though why it should be so is not clear if both are valid
measures. Why is a combination of less of valid measure A and more of valid
measure B (or more of A and less of B) better or worse than A or B alone?
Both actual frequency and actual duration have been used in the measurement
of social behavior because both frequent association and long association have been
considered plausible (seemingly valid) indicators of, for example, social affinity. Yet
in several studies, actual frequency and actual duration were only moderately
correlated (.5 to .6), indicating, as noted above, a marginal degree of comparability
[Leger, 1977; Rhine & Flanigon, 1978; Rhine & Linville, 1980]. With only a moderate
correlation between actual frequency and actual duration, results using one measure
may differ considerably from findings based upon the other. For example, if the
actual duration of proximity is used as a measure of social affinity, an interactant
with a few long bouts of proximity could receive a higher social-affinity score than
an interactant with many briefer bouts. With actual frequency instead of actual
duration as the measure, the reverse scoring would occur. Yet there is no a priori
reason to assume that frequency of association is more or less meaningful than
duration of association.
A solution to this seeming paradox is a single measure, such as one-zero scores,
that is a convenient weighted combination of actual frequency and actual duration,
yielding high scores from frequent long bouts, low scores from infrequent short
bouts, and in-between scores from infrequent long bouts or frequent short bouts. We
do not suggest that this solution is always optimal, but we do suggest that there are
occasions when it is worthy of serious consideration and that it should not, therefore,
be arbitrarily rejected.
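
A minimal sketch of how one-zero scores order the four cases just described follows. The session length, interval, bout counts, and bout lengths are hypothetical values chosen for illustration, and the function names are assumptions of the sketch.

```python
def one_zero_hits(bouts, interval):
    """Count intervals that contain any part of a bout (whole seconds)."""
    scored = set()
    for start, end in bouts:
        # Intervals touched by the half-open bout start <= t < end.
        scored.update(range(start // interval, -(-end // interval)))
    return len(scored)


def make_bouts(starts, length):
    """Build (start, end) pairs of equal length from a list of start times."""
    return [(s, s + length) for s in starts]


INTERVAL = 30                                  # seconds; 600 s session assumed
frequent = [0, 100, 200, 300, 400, 500]        # six bouts
infrequent = [100, 400]                        # two bouts

scores = {
    "frequent long": one_zero_hits(make_bouts(frequent, 60), INTERVAL),
    "infrequent long": one_zero_hits(make_bouts(infrequent, 60), INTERVAL),
    "frequent short": one_zero_hits(make_bouts(frequent, 5), INTERVAL),
    "infrequent short": one_zero_hits(make_bouts(infrequent, 5), INTERVAL),
}
print(scores)
```

With these values the frequent long bouts score highest, the infrequent short bouts lowest, and the two mixed cases fall in between. That the two mixed cases happen to tie is a consequence of the numbers chosen, not a general property.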
In making a choice among alternative sampling methods, a critical question
remains unanswered: What properties of spontaneous social behavior are reflected
by actual duration, and how, if at all, do these properties differ from those measured
by actual frequency? That the properties differ in some circumstances is likely
because of modest correlations found between actual frequency and actual duration
and because both have contributed substantially to high one-zero multiple correlations. An understanding of possible differences in the properties of spontaneous
social behavior measured by actual duration and those measured by actual frequency is probably the most important next step in laying the foundation for the
rational selection of the best sampling procedure for the problem at hand.
No simple set of rules is available for choosing one sampling procedure over
another. That choice is a matter of judgment, in which knowledge of the research
problem and design is weighed together with factors such as reliability, validity,
feasibility, comparability, and cost in time, energy and money. On these multiple
grounds, none of the sampling procedures (actual frequency, actual duration, one-zero
scores, and instantaneous scores) can be regarded as inherently superior to the
others. All have their uses, and none should be rejected. All four of the procedures
have yielded satisfactory interobserver reliability, and there are no empirical data
supporting one procedure as more valid than the others as a measure of widely
studied primate phenomena such as social affinity [Rhine & Linville, 1980]. In the
measurement of social affinity, for example, a lower value for any of the four
measures has been taken to represent a lesser degree of social relatedness and a
higher value to represent a greater degree. That lower (higher) values in one
measure tend also to be lower (higher) in another is clearly indicated by the present
research and even more so by empirical studies [Dunbar, 1976; Kraemer, 1979;
Leger, 1977; Rhine & Flanigon, 1978; Rhine & Linville, 1980].
This research was supported by Intramural and Intercampus Research Opportunity grants from the University of California, Riverside, to the first author. We
thank M. Hauser, H. Kraemer, D. Leger, and K. Widaman for critical reviews of the
manuscript. A summary of this paper was presented at the April 1982 meetings of
the Western Psychological Association.
Altmann, J. Observational study of behavior: Sampling methods. BEHAVIOUR 48:1-41, 1974.
Chamove, A.S. A new primate social behaviour category system. PRIMATES 15:85-99, 1974.
Dunbar, R.I.M. Some aspects of research design and their implications in the observational study of behaviour. BEHAVIOUR
Guilford, J.P. PSYCHOMETRIC METHODS. New York, McGraw-Hill, 1936.
Hilgard, E.R. INTRODUCTION TO PSYCHOLOGY. New York, Harcourt, Brace and World, 1962.
Kimble, G.A.; Garmezy, N.; Zigler, E. PRINCIPLES OF GENERAL PSYCHOLOGY. New York, Wiley, 1980.
Kraemer, H.C. One-zero sampling in the study of primate behavior. PRIMATES 20:237-244, 1979.
Leger, D.W. An empirical evaluation of instantaneous and one-zero sampling of chimpanzee behavior. PRIMATES 18:387-393, 1977.
Rhine, R.J.; Flanigon, M. An empirical comparison of one-zero, focal-animal and instantaneous methods of sampling spontaneous primate social behavior. PRIMATES 19:353-361, 1978.
Rhine, R.J.; Linville, A.K. Properties of one-zero scores in observational studies of primate social behavior: The effect of assumptions on empirical analyses. PRIMATES 21:111-122, 1980.
Rhine, R.J. Variation and consistency in the social behavior of two groups of stumptail macaques (Macaca arctoides). PRIMATES 14:21-35, 1973.
Simpson, M.J.A.; Simpson, A.E. One-zero and scan methods for sampling behavior. ANIMAL BEHAVIOUR 25:726-731, 1977.