American Journal of Primatology 5:1-15 (1983)

Comparability of Methods Used in the Sampling of Primate Behavior

RAMON J. RHINE¹ AND PHILIP B. ENDER²
¹Department of Psychology, University of California, Riverside, and ²Graduate School of Education, University of California, Los Angeles

Two measures of a behavior are defined as comparable if one is predictable from the other. The comparability of one-zero, instantaneous, actual-frequency, and actual-duration sampling was investigated using a Monte Carlo simulation to extend the range of sampling intervals beyond those ordinarily found in the literature. Several combinations of bout time, rate of response, and observation interval were simulated. Comparability between one-zero or instantaneous scores and actual frequency or actual duration was higher for higher behavior rates, decreased with longer observation intervals, and was usually higher for longer bout times. Curves of β²s from multiple regression analyses revealed a fundamental difference in the source of one-zero and instantaneous predictability. Both actual frequency and actual duration contributed substantially to one-zero predictability, indicating high comparability between one-zero scores and a weighted combination of actual frequency and actual duration. Once the contribution of actual duration to instantaneous scores was accounted for, actual frequency contributed virtually nothing additional. Long-standing meanings of validity, reliability, and comparability were applied to the findings, resulting in the conclusion that all of the sampling methods are useful, depending upon the researcher's approach, resources, and problem.

Key words: simulation, sampling methods, primate behavior

INTRODUCTION

Findings from empirical studies of several widely used methods of sampling spontaneous behavior [Altmann, 1974] have been evaluated and summarized by Rhine & Linville.
The four types of scores most commonly compared in empirical studies are actual frequencies, actual durations, one-zero scores, and instantaneous scores [Dunbar, 1976; Kraemer, 1979; Leger, 1977; Simpson & Simpson, 1977; Rhine & Flanigon, 1978; Rhine & Linville, 1980]. Actual frequency is the total number of observed bouts of a sampled behavior. Actual duration is the proportion of the total observation time during which the behavior is recorded as occurring. A one-zero score is the proportion of observation intervals, such as 30-second intervals, during which one or more bouts of the behavior are observed. An instantaneous score is the proportion of defined instants in time, such as the ends of 30-second intervals, at which the behavior is observed in progress. Two additional quantities used in this paper are rate and bout time. Rate is the frequency of a behavior's occurrence per specified unit of observation time, such as frequency per minute. Bout time is the time elapsing from the onset of a single occurrence of a behavior to its termination.

Received January 7, 1982; revision accepted December 7, 1982. Address reprint requests to Ramon J. Rhine, Psychology Department, University of California, Riverside, CA 92521. © 1983 Alan R. Liss, Inc.

The above four measures of spontaneous behavior, like all behavioral measures, may be characterized in terms of their validity, reliability, and comparability, which have long-standing meanings in behavioral measurement [eg, Guilford, 1936; Hilgard, 1962; Kimble et al, 1980]. These meanings have not always been applied consistently in discussions of the relative merits of the four sampling procedures [Rhine & Linville, 1980]. In keeping with the long-standing meanings, a measure will be considered valid if it actually does index the phenomenon being studied.
For example, one-zero scores will be regarded as a valid measure of social affinity if, in fact, higher one-zero scores tend to occur when social affinity is higher and lower one-zero scores tend to occur when social affinity is lower. Reliability refers to a measure's precision or accuracy in the sense of consistency. For example, actual duration is a reliable measure of self-grooming if two researchers observing simultaneously and independently obtain equivalent actual durations of self-grooming. Reliable measures are not necessarily valid measures. A reliable measure of self-grooming probably will not be valid if the actual duration of self-grooming is used as a measure of social attractiveness and may or may not be valid if used as a measure of social isolation. Valid measures must be reliable. Thus, if researchers observing simultaneously and independently obtain highly inconsistent durations of self-grooming, such durations cannot be a valid measure of social isolation or anything else. Available data indicate that the four types of scores usually are about equally reliable, and no empirical data exist demonstrating that one of these measures is more or less valid than others as measures of primate phenomena such as social affinity [Rhine & Linville, 1980].

Two measures are comparable if one is predictable from the other. Comparable measures may yield different numbers, eg, kilograms versus pounds as measures of a man's weight. Measures yielding different numbers are comparable if they are related by a transformation equation from which one measure is predictable from the other (eg, 2.2 times kilograms equals pounds). Comparable measures need not look alike or have the same format. For example, two intelligence tests with completely different items are comparable if they are related in such a way that a score on one is predictable from a score on the other.
Indeed, it is common practice to validate a new test by demonstrating that its scores are well related to (correlated with) scores from accepted tests [eg, Kimble et al, 1980]. The meaning of a high linear correlation coefficient is that one variable is well predicted from the other; hence, in behavioral science, the comparability of two measures is commonly indexed by linear regression for which the simple transformation equation is well known and from which the degree of comparability can be conveniently expressed using a single unitless number, the correlation coefficient. Since the correlation coefficient can be used to determine the degree of predictability from one measure to another, it can be used also as an indicator of the degree of comparability of these measures. Different measures used as somewhat comparable indices of the same phenomenon may not be perfectly correlated because of unreliability and because they may measure somewhat different aspects of the phenomenon. For example, social affinity between two animals may be measured in one study by the frequency of association and in another by the duration of association, even though in several cases actual frequency and actual duration are only moderately correlated [eg, Rhine & Linville, 1980]. It is impossible to have a high linear correlation without a high degree of predictability, and, therefore, a high degree of comparability.

The Monte Carlo simulation reported in this paper bears directly upon this conception of comparability and only indirectly upon reliability and validity. The simulation bears upon comparability because the data are correlations between methods, indicating the degree to which one type of score is predictable from one or more of the others. These data by themselves tell nothing about reliability or validity; however, if measure A is comparable to measure B, and if A is reliable (valid), then B is also reliable (valid).
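The use of the correlation coefficient as a unitless index of comparability can be illustrated with a minimal sketch. The body weights below are hypothetical values chosen only for illustration, and the helper name `pearson_r` is an assumption of this example; the transformation (2.2 times kilograms equals pounds) is the one given in the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical body weights in kilograms (illustrative values only).
kilograms = [61.0, 72.5, 80.0, 68.2, 90.3]

# Transformation equation from the text: 2.2 times kilograms equals pounds.
pounds = [2.2 * kg for kg in kilograms]

# The two measures yield different numbers, yet one is perfectly
# predictable from the other, so the correlation is 1 (up to rounding).
r_kg_lb = pearson_r(kilograms, pounds)
print(round(r_kg_lb, 6))
```

Measures related by a less exact transformation would give a correlation below 1, which is the sense in which the coefficient grades the degree of comparability.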
The four sampling methods have been examined previously in studies which were limited in scope. A massive effort would be required to conduct an empirical study in which all combinations of several relevant dimensions are varied simultaneously over a wide range of values. In addition, factors such as bout time are not under the researcher's control in empirical studies of spontaneous behavior. These limitations can be alleviated by a computer simulation which extends the range of observation intervals beyond those commonly studied and which allows comparisons of the four types of scores while systematically varying combinations of observation intervals, rates, and bout times.

A successful simulation is an analytic approximation of reality, but specific details of empirical and simulated findings are not expected to be identical. The simulation focuses upon a limited number of main variables, whereas empirical results can be influenced by nuances of living conditions, group composition, health, individual experience, reproductive condition, food supply, weather, etc. Nevertheless, broad correspondence can be judged by determining if known empirical trends tend to be reproduced by the simulation. If the basic forms of empirical trends are found for those portions of simulated relationships for which empirical counterparts exist, then confidence in the overall simulation is increased.

METHOD

Behavior observations were simulated using a computer program developed to manipulate independently the rates and bout times of simulated behavior. Twenty simulated observation periods of four hours each were generated for each of several combinations of rate and bout time. During the development of the program, trial runs of 50 and 100 replications were also tested. They did not differ significantly from the 20-replication trials. As shown in Table I, 42 combinations of rates and bout times were employed; these combinations yielded a total of 840 simulated observation periods of four hours each. In Table I, a rate of 1/10, for example, indicated that the behavior occurred an average of once every ten minutes. The upper right-hand portion of Table I is blank because some combinations of rates and bout times were not used, namely, combinations where bout times were long in relation to rates. For example, an average rate of once per minute (1/1) with an average bout time of two minutes was not meaningful. Rates in Table I are grouped into higher, middle, and lower rates. Similarly, bout times are grouped into shorter and longer times.

TABLE I. Rates and Bout Times Used in the Simulation

                            Average bout time in sec
Rate      Frequency/min     Shorter          Longer
Higher    1/1               1    5    10
Higher    1/3               1    5    10
Higher    1/5               1    5    10
Middle    1/10              1    5    10     30   60
Middle    1/12              1    5    10     30   60
Middle    1/15              1    5    10     30   60
Lower     1/20              1    5    10     30   60   120
Lower     1/25              1    5    10     30   60   120
Lower     1/30              1    5    10     30   60   120

The simulation program created an array of 14,400 elements representing a four-hour observation period of 14,400 seconds. Every element of the array was initially set to zero, indicating that the behavior was not occurring. Next, the expected number of behavior bouts was computed by multiplying the minutes (4 hr x 60 = 240 min) in the observation period by the appropriate rates. For example, if rate was set at 1/5, then the expected number of behavior bouts would have been 240 x 1/5 = 48. A random number generator was used to generate 48 starting positions which were random uniform numbers between one and 14,400. For each starting location, the element in the array corresponding to the random number was changed from zero to one, indicating the beginning of a behavior bout.
Bout times were randomly generated about each specified mean bout time, with the standard deviation changing in proportion to the bout time and with longer bout times having greater variability. For example, if a mean bout time of ten seconds was specified, then the random number generator would generate bout times with a mean of ten which were then randomly assigned to each of the starting locations in the array. Following each starting location, the number of elements equal to the bout time minus one were changed from zero to two, which indicated the continuation of a behavior bout. If a "one" was encountered during the course of changing zeros to twos, the program left it in place and generated a new random time for the next behavior bout. The end product was an array of 14,400 elements with the starting times randomly distributed and with behavior bouts of varying lengths centered upon a prescribed mean.

Once the above array of simulated observations was created, it was a simple matter to obtain actual frequencies, actual durations, instantaneous scores, and one-zero scores. These latter two scores were obtained for each of the following seven observation intervals: 5, 15, 30, 60, 120, 300, and 600 seconds. Observation intervals most commonly used in studies of spontaneous primate behavior fall within the range of five seconds to ten minutes.

RESULTS

Fit Between Simulated and Empirical Results

Correlations were calculated from the entire data base to determine if the simulation yielded a reasonable approximation to available empirical results from primates for which observation intervals usually varied from ten to 120 seconds. Figure 1 presents curves for all rates and bouts combined, comparing, across the seven observation intervals, trends of correlations for one-zero/actual-frequency, instantaneous/actual-frequency, one-zero/actual-duration, and instantaneous/actual-duration.
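The simulation procedure described in the METHOD section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the paper does not name the distribution of bout times, so a normal distribution with a standard deviation of one quarter of the mean (`sd_fraction`) is assumed here, and all function and variable names are hypothetical.

```python
import random

SESSION_SECONDS = 14_400  # four-hour observation period, one element per second

def simulate_session(rate_per_min, mean_bout_sec, rng, sd_fraction=0.25):
    """Code each second as 0 (absent), 1 (bout onset), or 2 (bout continuing)."""
    session = [0] * SESSION_SECONDS
    # Expected number of bouts, e.g. 240 min x 1/5 per min = 48.
    n_bouts = round((SESSION_SECONDS / 60) * rate_per_min)
    starts = [rng.randrange(SESSION_SECONDS) for _ in range(n_bouts)]
    for start in starts:
        session[start] = 1
    for start in starts:
        # Assumption: normal bout times, SD proportional to the mean, minimum 1 s.
        bout_sec = max(1, round(rng.gauss(mean_bout_sec, sd_fraction * mean_bout_sec)))
        for t in range(start + 1, min(start + bout_sec, SESSION_SECONDS)):
            if session[t] == 1:  # another onset is reached: leave it, end this bout
                break
            session[t] = 2
    return session

def actual_frequency(session):
    """Total number of bout onsets."""
    return sum(1 for code in session if code == 1)

def actual_duration(session):
    """Proportion of the session during which the behavior occurs."""
    return sum(1 for code in session if code != 0) / len(session)

def one_zero_score(session, interval_sec):
    """Proportion of intervals containing at least one second of the behavior."""
    n_intervals = len(session) // interval_sec
    hits = sum(
        1 for i in range(n_intervals)
        if any(session[i * interval_sec:(i + 1) * interval_sec])
    )
    return hits / n_intervals

def instantaneous_score(session, interval_sec):
    """Proportion of interval-end instants at which the behavior is in progress."""
    instants = range(interval_sec - 1, len(session), interval_sec)
    return sum(1 for t in instants if session[t] != 0) / len(instants)

rng = random.Random(1983)  # fixed seed so the sketch is repeatable
session = simulate_session(rate_per_min=1 / 5, mean_bout_sec=10, rng=rng)
for interval in (5, 15, 30, 60, 120, 300, 600):
    print(interval, one_zero_score(session, interval), instantaneous_score(session, interval))
```

Repeating `simulate_session` 20 times for each rate and bout-time combination and correlating the resulting scores across sessions would give curves of the kind reported below; colliding start positions are counted as a single onset in this sketch.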
In Figures 1-5, correlations of .19 or higher were significant at the .01 level, and correlations of .27 or higher were significant at the .001 level.

The left half of Figure 1 shows correlations between actual frequency and one-zero or instantaneous scores. For each observation interval, the one-zero/actual-frequency correlation was always higher than the instantaneous/actual-frequency correlation, and the differences between these pairs of correlations were quite large.

Fig. 1. Overall correlations of one-zero and instantaneous scores with actual frequency and actual duration, as a function of size of observation interval.

Fig. 2. Correlations for three rate groups of one-zero and instantaneous scores with actual frequency and actual duration, as a function of size of observation interval.

This simulated finding is consistent with empirical studies, and the initial increasing trend in one-zero/actual-frequency correlations is also comparable to the empirical trend [Leger, 1977; Rhine & Linville, 1980]. The empirical curves for instantaneous/actual-frequency correlations were almost horizontal lines in the 10-120-second range, centering around .53 in one study and .65 in another. For simulated data, the slightly declining trend within this range was not markedly different from horizontal. Similarly, the right half of Figure 1, which shows trends of correlations of one-zero or instantaneous scores with actual duration, is also consistent with empirical findings [Leger, 1977; Rhine & Linville, 1980].
Such consistency lends credence to simulated trends extending beyond available empirical data.

Rates

As will be seen from Table I, it is possible to compare the three rate groups with bout time held constant by making the comparison for shorter bout times only. This was done in Figure 2, which contains curves of correlations comparable to those of Figure 1 but with the data subdivided by rate. Most of the increasing or decreasing trends in curves of Figure 2 paralleled those of Figure 1. (In Fig. 2 and subsequent figures, points not shown for a given observation interval would be overlaid by those plotted.) Of the 84 correlations in Figure 2, there was only one case (upper left graph, 600 seconds) where a correlation for a higher rate may have been meaningfully lower than a counterpart from middle or lower rates.

Fig. 3. Correlations for two bout-time groups and for middle rates of one-zero and instantaneous scores with actual frequency and actual duration, as a function of size of observation interval.

Fig. 4. Correlations for two bout-time groups and for lower rates of one-zero and instantaneous scores with actual frequency and actual duration, as a function of size of observation interval.
Except for this one case, one-zero correlations with actual frequency or actual duration were as large or larger for higher rates than for middle and lower rates; similarly, instantaneous correlations with either actual frequency or actual duration were always as large or larger for higher rates than for middle or lower rates. Except for the upper left-hand curves of Figure 2 (one-zero/actual-frequency), all correlations for middle and lower rates obtained from intervals of 120 seconds or more were substantially lower than the corresponding correlations for higher rates.

Fig. 5. Correlations for two bout-time groups and for all rates of one-zero and instantaneous scores with actual frequency and actual duration, as a function of size of observation interval.

Bout Times

Longer and shorter bout times may be compared for middle rates and again for lower rates (see Table I). Longer and shorter bout times for middle rates are shown in Figure 3 and for lower rates in Figure 4. The general shapes of all curves of these two figures were similar to corresponding sampling-method curves of Figures 1 and 2. Except for one-zero/actual-frequency correlations, all other pairs of curves in Figures 3 and 4 had approximately equal or higher correlations for longer bout lengths than for shorter. The reverse occurred for one-zero/actual-frequency, and this is especially clear in Figure 4. An interesting reversal occurred when bout times were compared for all rates combined.
As shown in Figure 5, instantaneous correlations for shorter bout times were essentially equal to or substantially greater than corresponding correlations for longer bout times, which was the reverse of the trends in Figures 3 and 4. This change was due to the inclusion in Figure 5 of higher rates which, for reasons stated above, involved only shorter bout times (see Table I). Higher rates yielded higher correlations, and adding higher rates to middle and lower ones tended to elevate correlations for shorter bout times.

Multiple Correlations

Figure 6 shows multiple correlations (R's) for one-zero or instantaneous scores regressed upon actual frequency and actual duration. All of these R's were statistically significant beyond the .001 level. The left-hand graph is for different rates, and the right-hand graph is for bout times. The graphs in Figure 6 have magnified slopes compared to those in previous figures, because in Figure 6 the left-hand scale is from .60 to 1.00 whereas in previous figures it is from .00 to 1.00. The magnified scale is necessary to achieve visual separation among many points, especially at short observation intervals. Even so, because of crowding, several points at the shortest observation intervals had to be left out; for a given interval, omitted points are always very close to the ones shown.

Fig. 6. Multiple correlations for rates and bout times of one-zero or instantaneous scores predicted from actual frequency and actual duration, as a function of observation interval.

All of the 50 R's for the most commonly used observation intervals (up to 120 seconds) of Figure 6 were .92 or higher, and only five of these 50 were less than .95.
In other words, up to an observation interval of 120 seconds, both instantaneous and one-zero scores were predictable with high precision from knowledge of actual frequency and actual duration. Furthermore, up to an observation interval of five minutes, one-zero scores were still well predicted from actual frequency and duration, the 30 applicable R's of Figure 6 all being .84 or higher. An equivalent level of predictability did not occur for instantaneous R's based upon middle or lower rates or upon longer bout times. For instantaneous multiple correlations, shorter bout times yielded equal or higher correlations than longer bout times, which was consistent with the trend of the correlations in Figure 5. The most consistently high multiple correlations (Fig. 6, upper curve, right-hand graph) were for prediction of one-zero scores using longer bout times. For the seven observation intervals, one R was .92 and the remainder were .95 or higher.

The reasons for the trends of R's in Figure 6 are clarified by Figure 7, which contains curves of squared standardized regression weights (β²s) reflecting the relative contribution to the R's of the two predictor variables, actual frequency and actual duration. The instantaneous β²s are shown in the two right-hand graphs of Figure 7. These β²s for actual frequency were zero or close to zero for all rates and bout lengths; on the other hand, the contribution of actual duration was large, though it gradually decreased as the observation interval increased. The trend of the β²s due to actual duration paralleled that of the instantaneous R's (Fig. 6) because, leaving unreliability and sampling error aside, instantaneous scores provide a direct estimate of actual duration [Altmann, 1974; Kraemer, 1979; Rhine & Linville, 1980].
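The standardized regression weights and multiple correlation discussed here can be computed, for two predictors, from the three pairwise correlations alone. The sketch below uses that standard closed form; it is not taken from the paper, which does not describe its regression software, and the six-session scores and all names are hypothetical illustrations.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standardized_weights(y, x1, x2):
    """Return (beta1, beta2, R) for y regressed upon two predictors."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, math.sqrt(max(0.0, r_squared))

# Hypothetical scores for six sessions (illustrative numbers, not data from the paper).
actual_frequency = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
actual_duration = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# A score that is an exact weighted combination of both predictors.
combined_score = [2.0 * f + 3.0 * d for f, d in zip(actual_frequency, actual_duration)]
# A score that tracks duration alone.
duration_only_score = list(actual_duration)

b_freq, b_dur, big_r = standardized_weights(combined_score, actual_frequency, actual_duration)
b_freq2, b_dur2, big_r2 = standardized_weights(duration_only_score, actual_frequency, actual_duration)

# Squared weights, as plotted in curves of beta-squared values.
print(b_freq ** 2, b_dur ** 2, big_r)
print(b_freq2 ** 2, b_dur2 ** 2, big_r2)
```

In the second case the weight for frequency vanishes even though frequency and the score are correlated, because that correlation is carried entirely by duration.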
The instantaneous/actual-frequency correlations in Figures 1 through 5 could still reflect actual duration since actual duration and actual frequency are themselves moderately correlated [Rhine & Linville, 1980]. When the contribution of actual duration was taken into account in the regression analysis, actual frequency contributed little or nothing additional to the instantaneous R's.

The situation for one-zero R's is fundamentally different, as may be seen from the two left-hand graphs of Figure 7. Not only did actual frequency and actual duration both contribute substantially to one-zero multiple correlations, but their relative contribution differed for different observation intervals. At the shortest observation intervals, one-zero R's reflected primarily actual duration. As the interval size increased, the contribution of actual frequency became increasingly prominent, with the curves of β²s intersecting and the contribution of actual frequency being much greater than actual duration for the longer observation intervals. Similar trends for curves of β²s occurred in empirical studies using intervals up to 120 seconds. The simulated trend beyond 120 seconds suggests that, for observation intervals beyond five minutes, one-zero R's reflect primarily actual frequency.

Fig. 7. Squared standardized regression weights (β²s) for the multiple correlations of Figure 6.

Because of the combined contributions of actual duration and actual frequency, one-zero R's in Figure 6 in every case except one were equal to, nearly equal to, or higher than their instantaneous counterparts.
As the β²s show, when the contribution of actual duration decreased, an equally precipitous drop in the one-zero R's was avoided by a balancing increase in the contribution of actual frequency.

DISCUSSION

Although the numerical values of empirical and simulated correlations occasionally diverged, the trends of the simulated data were regularly consistent with available empirical information for which observation intervals from ten to 120 seconds have been investigated. This correspondence between simulated and available empirical results suggests that simulated trends beyond 120 seconds probably are reasonable approximations of reality.

Why are intervals within the range of ten to 120 seconds the ones most commonly chosen? Perhaps research designers judge very short or very long intervals as inconvenient, impractical, or less reliable than others. Very short intervals can strain an observer's information-processing capacities and require intense observer concentration, possibly leading to inaccuracies due to fatigue [Altmann, 1974]. Very long intervals will usually yield smaller samples, which tend to magnify the influence of a few interobserver inconsistencies. Whatever the basis of choice, correlations reported in empirical studies indicate that the various methods are largely interchangeable up to 120 seconds [Dunbar, 1976; Rhine & Linville, 1980].

Comparability

The size of such correlations is an index of the degree to which one measure is comparable to another. To say that comparability is indicated by the unitless correlation coefficient is to say also that comparability does not require sameness of absolute values, the actual magnitude of the numbers obtained [Rhine & Flanigon, 1978; Rhine & Linville, 1980]. Inches and centimeters are comparable measures of length, even though for a given object they yield different numbers.
Similarly, (4,12), (5,13), and (6,14) are pairs of scores which yield the same correlation as (1,1), (2,2), and (3,3); therefore, the numbers in the second set of pairs, as two measures of behavior, are no more or less comparable than those in the first set. Furthermore, absolute values of behavior frequencies obtained from any of the sampling procedures are neither "true" nor "natural" in a meaningful general sense because these frequencies are dependent upon factors such as group composition, ecology, the definition of behavior the researcher decides to use, and the properties of the sampling procedures employed [Rhine & Linville, 1980].

Comparability is an ingredient in the choice of one procedure over another. Factors such as validity and reliability cease to be major considerations in choosing between two measures if the measures are highly comparable. This leaves more room to weigh practical matters such as cost, convenience, administrative feasibility, etc. Information on comparability is available from the simulation and from the above-cited empirical studies. This information is summarized below, with existing empirical data weighted more heavily if there is any discrepancy between it and simulated data:

1. One-zero frequency was a reasonable index of actual frequency or vice versa for most observation intervals a researcher was likely to choose, and the two were quite comparable for intervals of 15 seconds to 5 minutes.

2. For lower or middle rates of responding, actual frequency and one-zero frequency appeared to be moderately comparable for observation intervals of 30 seconds or more, but not for the shortest intervals. However, it is unlikely that a researcher would choose to employ the highly inefficient method of using, for example, five-second observation periods with a behavior that does not occur very often.

3. Available data from primates indicate that instantaneous frequencies and actual frequencies are unlikely to be more than modestly comparable; consequently, they are probably best treated as not interchangeable unless there exists situation-specific evidence to the contrary.

4. The empirical data indicated that one-zero and actual-duration scores were reasonably comparable up to about 60-120-second intervals, and the trend of the simulation suggests a decrease in their comparability thereafter.

5. Instantaneous scores and actual duration were almost completely comparable up to 120 seconds, with decreasing comparability thereafter.

6. The empirical data [Rhine & Linville, 1980: Table 2] suggested that up to 120 seconds, and probably well beyond, one-zero scores from one observation period were comparable to those from another; the same was true of paired sets of instantaneous scores. One-zero scores were comparable to instantaneous scores if the same observation intervals were used for both. If not, the correlations between one-zero and instantaneous scores tended to be lower the greater the difference between the interval lengths, but these correlations never fell below .82.

7. Actual frequency and actual duration were only moderately correlated in studies summarized by Rhine and Linville [1980]; therefore, they should rarely, if ever, be treated as interchangeable without situation-specific evidence of high correlation, such as that supplied by Chamove .

8. One-zero scores were highly predictable from a weighted combination of actual frequency and actual duration.

Number of Hits

The number of hits in a sampling session is the number of times the behavior is recorded as occurring. The number of hits is important because a low number tends to yield low reliability and low comparability. The number of hits is partially determined by the research design.
Hits depend upon the length of the observation intervals chosen by the researcher and upon behavior definitions. Sometimes the expected range of hits can be estimated from previous research or pilot studies.

The degree of comparability of sampling procedures under different conditions of the simulation was indicated by the curves of correlations in Figures 1-5, in which three main trends were associated with variation in the number of hits. One of these main trends was that higher rates, which yielded more hits, tended to produce higher correlations among sampling procedures (Fig. 2). A second main trend was lower correlations with larger observation intervals (Figs. 1-5). Using larger observation intervals per unit time (four hours in the simulation) is equivalent to using fewer observation intervals and consequently obtaining fewer hits. A third main trend of the simulation was the relationship of bout time to trends of correlations among sampling procedures (Figs. 3, 4). This relationship, which was also modulated by the number of hits, will be discussed below for instantaneous and one-zero sampling.

Instantaneous sampling. The number of hits from instantaneous sampling depends upon the ratio of bout time to the observation interval. For example, if all bout times are shorter than the observation interval, the maximum number of instantaneous hits in the total observation session is equal to the actual frequency, and the minimum is zero. The number of such instantaneous hits will usually be larger than zero and smaller than the actual frequency. If all bout times are longer than the observation interval, the minimum number of hits is the actual frequency; the maximum usually will be greater than the actual frequency because bout times longer than the observation interval can be sampled more than once. For example, in stump-tailed macaques sampled every 30 seconds [Rhine, 1973], adult grooming bouts of several minutes occurred.
A ten-minute bout yielded 20 instantaneous hits from a single actual frequency. In contrast, instantaneous sampling with a ten-minute observation interval, instead of every 30 seconds, will yield not 20 grooming hits, but only one.

Instantaneous scores are a good index of actual duration unless the number of hits is too small. Large observation intervals are not unlikely to yield few or zero instantaneous hits even though the behavior occurs several times. Few or no hits from large intervals are especially likely if the behavior's rate is low and its bout time is short. If a correlation is calculated from two sets of scores, and if the scores in one set are always the same, for example, zero, then the correlation will be zero regardless of the values of the other set of scores. Because of approximations to this condition, the instantaneous correlations in Figure 2 were smaller for middle and lower rates than for higher rates and were smaller in Figures 3 and 4 for shorter bout times than for longer. The least comparable should be the combination of shorter bouts, lower rates, and the longest observation interval, which yielded the lowest instantaneous/actual-duration correlation in Figure 4.

This last effect upon instantaneous/actual-duration comparability should be alleviated by increasing the number of observation intervals. In the simulation, the smaller the observation interval, the more intervals there were and the higher the instantaneous/actual-duration correlations in Figures 1-5. As the time of the observation interval approaches zero, the number of intervals approaches infinity. As the number of intervals approaches infinity, the correlation between actual duration and instantaneous scores necessarily approaches 1.00 because the proportion of time the behavior is sampled, regardless of rate or bout length, will become equal to the proportion of time the behavior occurs, which is the definition of actual duration.
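The dependence of instantaneous hits on the ratio of bout time to observation interval can be checked with a short sketch. It assumes a one-second time grid with sampling instants at the ends of intervals, as in the simulation; the function name and the bout start time of 100 seconds are hypothetical choices for illustration.

```python
def instantaneous_hits(bout_start, bout_sec, interval_sec, session_sec=14_400):
    """Count sampling instants (ends of intervals) that fall inside one bout.

    The bout occupies seconds bout_start .. bout_start + bout_sec - 1 (0-based).
    """
    bout_end = min(bout_start + bout_sec, session_sec)
    instants = range(interval_sec - 1, session_sec, interval_sec)
    return sum(1 for t in instants if bout_start <= t < bout_end)

# A ten-minute (600 s) grooming bout sampled every 30 seconds.
hits_30s = instantaneous_hits(bout_start=100, bout_sec=600, interval_sec=30)

# The same bout sampled with a ten-minute observation interval.
hits_600s = instantaneous_hits(bout_start=100, bout_sec=600, interval_sec=600)

# A short 10 s bout can be missed entirely by a long interval.
hits_short = instantaneous_hits(bout_start=100, bout_sec=10, interval_sec=600)

# With a one-second interval every second of the bout is sampled.
hits_1s = instantaneous_hits(bout_start=100, bout_sec=600, interval_sec=1)

print(hits_30s, hits_600s, hits_short, hits_1s)  # 20 1 0 600
```

The last case is the discrete counterpart of the limiting argument: when every second is an instant, the proportion of hits equals the proportion of time the behavior occurs.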
For middle and lower rates, longer bout times yielded higher correlations with instantaneous scores than did shorter bouts (Figs. 3, 4). In Figures 3 and 4, rates were the same for both longer and shorter bouts. Given the same number of occurrences, the probability of an instantaneous hit is greater with longer bout times than with shorter; therefore, the number of hits would be larger for the longer bout times of Figures 3 and 4. The reverse becomes possible if a change occurs which makes the number of hits for shorter bout times larger than that for longer times, and that is what happened when high rates (all shorter bout times) were added into the data (Fig. 5). In Figure 5, rate and bout time were confounded, with the rate effect seen in Figure 2 outweighing the bout-time effect of Figures 3 and 4.

One-zero scores. Putting aside unreliability, one-zero sampling, unlike instantaneous sampling, can yield zero hits only if the behavior does not occur at all during the total observation session. For a given observation session, the number of hits from one-zero sampling is typically larger than the number from instantaneous sampling. Even short, infrequent bouts, which can easily yield zero instantaneous hits, especially with long observation intervals, will each tend to contribute one or more one-zero hits.

One-zero scores and actual frequency will be comparable, as indicated by linear correlation, if the magnitude of actual-frequency scores is proportional to the magnitude of paired one-zero scores. Linear proportionality tends to break down when the time of the observation interval is small relative to bout time, which is most likely for the shortest observation intervals of the simulation. For example, with an observation interval of five seconds, a single bout of three minutes’ duration will yield 36 hits if it starts precisely at the beginning of a five-second interval, or otherwise, 37.
Thus, a single actual frequency is usually associated with 37 one-zero hits. If the next bout is also three minutes, proportionality obtains, since that yields an actual frequency of two and one-zero hits of 74, ie, 2 × 37. However, if these longer bouts vary among themselves, as occurred in the simulation and as is likely in real life, then the proportionality can be, and typically is, reduced. For example, if the third bout is one minute long, then it yields 12 or, more likely, 13 hits, giving an actual frequency of three and total one-zero hits of 74 + 13 = 87. One is to 37 as two is to 74, but not as three is to 87. A similar breakdown in proportionality probably accounts for the lower one-zero/actual-frequency correlations for the shortest observation intervals, which often produce multiple hits from a single bout.

While proportionality of one-zero scores with actual frequency may break down somewhat for observation intervals which are short in relation to bout times, such intervals are a favorable condition for correlations of one-zero scores with actual duration, accounting for the highest one-zero/actual-duration correlations occurring for the shortest observation intervals (Figs. 1-5). Consider what happens if the bout time is longer than the observation interval, in which case actual duration is likely to remain closely proportional to one-zero hits. To continue the above example, after one three-minute bout, there are 37 hits and three minutes of actual duration; after two three-minute bouts there are 74 hits and six minutes of duration; and after three bouts there are 87 hits and seven minutes of duration. Proportionality is nearly the same for 3-37, 6-74, and 7-87. Thus, if bout lengths are longer than the observation interval, a high one-zero/actual-duration correlation is expected.
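
The arithmetic of the five-second example can be checked with a short Python sketch. The bout start times, the whole-second resolution, and the helper name `one_zero_hits` are assumptions made for illustration; only the bout lengths (three minutes, three minutes, one minute) and the five-second interval come from the text.

```python
def one_zero_hits(bouts, interval_s):
    """Count observation intervals that contain any part of at least one bout.

    Assumption of this sketch: a bout (start, end) occupies the half-open
    span [start, end), with times in whole seconds, so its last occupied
    second is end - 1.
    """
    hit_intervals = set()
    for start, end in bouts:
        first = start // interval_s
        last = (end - 1) // interval_s
        hit_intervals.update(range(first, last + 1))
    return len(hit_intervals)


INTERVAL_S = 5

# A three-minute bout aligned with an interval boundary, then one that is not.
print("aligned bout:", one_zero_hits([(0, 180)], INTERVAL_S))
print("unaligned bout:", one_zero_hits([(2, 182)], INTERVAL_S))

# Two three-minute bouts followed by a one-minute bout (hypothetical onsets).
BOUTS = [(2, 182), (302, 482), (602, 662)]

cumulative_hits = []
for n_bouts in range(1, len(BOUTS) + 1):
    seen = BOUTS[:n_bouts]
    hits = one_zero_hits(seen, INTERVAL_S)
    minutes = sum(end - start for start, end in seen) / 60
    cumulative_hits.append(hits)
    print(
        f"frequency {n_bouts}, duration {minutes:.0f} min, hits {hits}: "
        f"{hits / n_bouts:.2f} hits per bout, {hits / minutes:.2f} hits per minute"
    )
```

Hits per bout drop once the shorter third bout is added, while hits per minute stay nearly constant, which is the contrast between the frequency and duration relationships drawn in the text.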
A rule of thumb for optimizing both one-zero and instantaneous correlations with actual duration follows from the above analyses: Use the shortest observation interval feasible and the longest bout times. The length of the observation interval is determined by the researcher. Although researchers rarely, if ever, make deliberate attempts to control bout time, it is often at least partially determined by a researcher’s decisions. Just as there are no “true” or “natural” frequencies of spontaneous behavior, so also there are no absolute “true” or “natural” bout times. Bout time depends upon the definition a researcher uses, even for such inevitable behaviors as feeding. In one case, eating was defined as the act of placing food into the mouth, which made bout times so short that feeding bouts were rarely hit by 30-second instantaneous sampling [Rhine & Linville, 1980]. By redefining feeding as the entire sequence of preparing, placing in the mouth, and chewing, long enough bout times would probably be available to obtain an adequate number of instantaneous hits to estimate the actual duration of feeding. In this case, bout time would be deliberately increased by manipulating the definition, with the second definition being no less meaningful than the first.

Validity

A problem occurs in analyses of sampling procedures if validity is confused with reliability or comparability [Rhine & Linville, 1980]. When a researcher is concerned with the measurement of phenomena such as dominance, social affinity, dependency, etc, validity refers to the relationship between a measure and the phenomenon being studied and not to the relationship between one measure (eg, one-zero) and another (eg, actual frequency) which is arbitrarily labeled “true.” The degree of relationship between two measures indicates their comparability, but not necessarily their validity.
“Valid” is equal to “comparable” in the special case where one measure is being obtained for the express purpose of estimating another. Thus, under most circumstances, instantaneous scores can be used to provide a valid estimate of actual duration.

Reliability, comparability, and validity are not all-or-none concepts. There are degrees of useful reliability, comparability, and validity; therefore, even if one wishes to use, for example, one-zero scores as a comparable and valid estimate of actual frequency, a perfect correlation between the two sets of measures is not required.

Some authors have recommended that one-zero sampling never be used because one-zero scores may be considered a combination of actual frequency and actual duration [Altmann, 1974; Kraemer, 1979]. An alternative view has been developed [Rhine & Flanigon, 1978; Rhine & Linville, 1980]. Briefly, if actual frequency and actual duration are both valid measures of spontaneous social behavior as claimed by the critics of one-zero sampling, why should a measure which is a combination of the two be regarded as inferior to each separately? Kraemer [1979] argues that the differing contributions of actual frequency and actual duration for different observation intervals are a disadvantage, though why this should be so is not clear if both are valid measures. Why is a combination of less of valid measure A and more of valid measure B (or more of A and less of B) better or worse than A or B alone?

Both actual frequency and actual duration have been used in the measurement of social behavior because both frequent association and long association have been considered plausible (seemingly valid) indicators of, for example, social affinity. Yet in several studies, actual frequency and actual duration were only moderately correlated (.5 to .6), indicating, as noted above, a marginal degree of comparability [Leger, 1977; Rhine & Flanigon, 1978; Rhine & Linville, 1980].
With only a moderate correlation between actual frequency and actual duration, results using one measure may differ considerably from findings based upon the other. For example, if the actual duration of proximity is used as a measure of social affinity, an interactant with a few long bouts of proximity could receive a higher social-affinity score than an interactant with many briefer bouts. With actual frequency instead of actual duration as the measure, the reverse scoring would occur. Yet there is no a priori reason to assume that frequency of association is more or less meaningful than duration of association. A solution to this seeming paradox is a single measure, such as one-zero scores, that is a convenient weighted combination of actual frequency and actual duration, yielding high scores from frequent long bouts, low scores from infrequent short bouts, and in-between scores from infrequent long bouts or frequent short bouts. We do not suggest that this solution is always optimal, but we do suggest that there are occasions when it is worthy of serious consideration and that it should not, therefore, be arbitrarily rejected.

In making a choice among alternative sampling methods, a critical question remains unanswered: What properties of spontaneous social behavior are reflected by actual duration, and how, if at all, do these properties differ from those measured by actual frequency? That the properties differ in some circumstances is likely because of modest correlations found between actual frequency and actual duration and because both have contributed substantially to high one-zero multiple correlations. An understanding of possible differences in the properties of spontaneous social behavior measured by actual duration and those measured by actual frequency is probably the most important next step in laying the foundation for the rational selection of the best sampling procedure for the problem at hand.
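
The way a one-zero score blends frequency and duration can be sketched in Python for four contrived bout patterns. Everything numeric here is hypothetical (a one-hour session, 30-second intervals, 20 versus 5 evenly spaced bouts, 60-second versus 5-second bout times), as are the function names; the sketch only illustrates the ordering described above and says nothing about real data.

```python
def one_zero_score(bouts, session_s, interval_s):
    """Proportion of observation intervals containing any part of a bout.

    Assumption of this sketch: a bout (start, end) occupies the half-open
    span [start, end), with times in whole seconds.
    """
    hit_intervals = set()
    for start, end in bouts:
        first = start // interval_s
        last = (end - 1) // interval_s
        hit_intervals.update(range(first, last + 1))
    return len(hit_intervals) / (session_s // interval_s)


def evenly_spaced_bouts(n_bouts, bout_s, session_s):
    """Place n_bouts of equal length at evenly spaced onsets (a simplification)."""
    spacing = session_s // n_bouts
    return [(k * spacing, k * spacing + bout_s) for k in range(n_bouts)]


SESSION_S = 3600   # hypothetical one-hour session
INTERVAL_S = 30    # hypothetical observation interval

# name -> (actual frequency, bout time in seconds); all values hypothetical
PATTERNS = {
    "frequent long": (20, 60),
    "frequent short": (20, 5),
    "infrequent long": (5, 60),
    "infrequent short": (5, 5),
}

scores = {}
for name, (n_bouts, bout_s) in PATTERNS.items():
    bouts = evenly_spaced_bouts(n_bouts, bout_s, SESSION_S)
    scores[name] = one_zero_score(bouts, SESSION_S, INTERVAL_S)
    print(
        f"{name:<17} frequency {n_bouts:>2}, "
        f"duration {n_bouts * bout_s:>4} s, one-zero score {scores[name]:.3f}"
    )
```

With these particular values, actual duration ranks the infrequent long pattern above the frequent short one, actual frequency ranks them the other way, and the one-zero score places both between the two extremes. Which of the two in-between patterns scores higher depends on the interval and bout times chosen.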
CONCLUSIONS

No simple set of rules is available for choosing one sampling procedure over another. That choice is a matter of judgment, in which knowledge of the research problem and design is weighed together with factors such as reliability, validity, feasibility, comparability, and cost in time, energy, and money. On these multiple grounds, none of the sampling procedures (actual frequency, actual duration, one-zero scores, and instantaneous scores) can be regarded as inherently superior to the others. All have their uses, and none should be rejected. All four of the procedures have yielded satisfactory interobserver reliability, and there are no empirical data supporting one procedure as more valid than the others as a measure of widely studied primate phenomena such as social affinity [Rhine & Linville, 1980]. In the measurement of social affinity, for example, a lower value for any of the four measures has been taken to represent a lesser degree of social relatedness and a higher value to represent a greater degree. That lower (higher) values in one measure tend also to be lower (higher) in another is clearly indicated by the present research and even more so by empirical studies [Dunbar, 1976; Kraemer, 1979; Leger, 1977; Rhine & Flanigon, 1978; Rhine & Linville, 1980].

ACKNOWLEDGMENTS

This research was supported by Intramural and Intercampus Research Opportunity grants from the University of California, Riverside, to the first author. We thank M. Hauser, H. Kraemer, D. Leger, and K. Widaman for critical reviews of the manuscript. A summary of this paper was presented at the April 1982 meetings of the Western Psychological Association.

REFERENCES

Altmann, J. Observational study of behavior: Sampling methods. BEHAVIOUR 48:1-41, 1974.
Chamove, A.S. A new primate social behaviour category system. PRIMATES 15:85-99, 1974.
Dunbar, R.I.M. Some aspects of research design and their implications in the observational study of behaviour. BEHAVIOUR 58:78-98, 1976.
Guilford, J.P. PSYCHOMETRIC METHODS. New York, McGraw-Hill, 1936.
Hilgard, E.R. INTRODUCTION TO PSYCHOLOGY. New York, Harcourt, Brace and World, 1962.
Kimble, G.A.; Garmezy, N.; Zigler, E. PRINCIPLES OF GENERAL PSYCHOLOGY. New York, Wiley, 1980.
Kraemer, H.C. One-zero sampling in the study of primate behavior. PRIMATES 20:237-244, 1979.
Leger, D.W. An empirical evaluation of instantaneous and one-zero sampling of chimpanzee behavior. PRIMATES 18:387-393, 1977.
Rhine, R.J.; Flanigon, M. An empirical comparison of one-zero, focal-animal and instantaneous methods of sampling spontaneous primate social behavior. PRIMATES 19:353-361, 1978.
Rhine, R.J.; Linville, A.K. Properties of one-zero scores in observational studies of primate social behavior: The effect of assumptions on empirical analyses. PRIMATES 21:111-122, 1980.
Rhine, R.J. Variation and consistency in the social behavior of two groups of stumptail macaques (Macaca arctoides). PRIMATES 14:21-35, 1973.
Simpson, M.J.A.; Simpson, A.E. One-zero and scan methods for sampling behavior. ANIMAL BEHAVIOUR 25:726-731, 1977.