# Approximate record length constraints for experimental identification of dynamical fractals

Ann. Phys. (Berlin) 17, No. 12, 955–969 (2008) / DOI 10.1002/andp.200810329

Sean W. Fleming¹,²,∗

¹ BC Hydro, Generation Resource Management, E15, 6911 Southpoint Drive, Burnaby, BC V3N 4X8, Canada
² Department of Geography, University of British Columbia, 1984 West Mall, Vancouver, BC V6T 1Z2, Canada

Received 21 May 2008, accepted 22 October 2008 by U. Eckern
Published online 24 November 2008

Key words: Nonlinear dynamics, statistical mechanics, power spectrum, time series analysis, scale invariance.
PACS: 05.40.-a, 05.45.Tp, 05.45.Df, 87.23.Cc, 92.40.Zg, 89.65.Gh

The ambiguity that can exist, for short datasets, between the observational power spectra of dynamical fractals and low-order linear memory processes is demonstrated and explained. It is argued that it could be broadly useful to have a highly practical rule-of-thumb for assessing whether a data record is sufficiently long to permit distinguishing the two types of processes and, if it is not, for producing an approximate estimate of the amount of additional data that would be required to do so. Such an expression is developed using the AR(1) process as a loose benchmark. Various aspects of the technique are successfully tested using synthetic time series generated by a range of prescribed models, and its application and relevance to observational datasets are then demonstrated using examples from mathematical ecology (wild steelhead population size), geophysics (river flow volume), and econophysics (stock price volatility).

© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

## 1 Introduction

A time series, x(t), t = 1, ..., N, is commonly deemed a dynamical fractal if its spectral behaviour is described by S(f) ∝ 1/f^β, i.e., if its power spectrum scales linearly in log space with slope −β.
This scaling exponent is related to the fractal dimension as D = (5 − β)/2, and in turn to the Hurst exponent as H = 2 − D. The intuitive meaning of such behaviour is a temporal equivalent to spatial fractals: small, frequent variations are superimposed on incrementally larger and less frequent variations, which are in turn superimposed upon still bigger and slower variations, and so forth. The result is self-affinity (or, under certain conditions, self-similarity), which can loosely be taken to mean that if one magnified the dataset to obtain a fine-scale view, or took a step back for a broader-scale view, those data sections would have much the same overall look and feel.

A core implication¹ is that one of the most basic concepts in the quantitative analysis, design, and management of both natural and human dynamical systems – a characteristic time scale – can have little meaning for fractal processes. Some additional key consequences can include very long-range memory, very large potential fluctuation magnitudes, and temporal clustering of extreme events. Such scale-free dynamics have been encountered in fields of study as diverse as environmental contaminant transport, stock price fluctuations, meteorological variability, solar and black hole physics, ecosystem dynamics, electronic component design, automobile traffic flow, and human physiology, to name just a few. Some of these applications may be esoteric, but others are quite practical.

∗ E-mail: fleming sean@hotmail.com, sean.fleming@bchydro.com
¹ With respect to the noise signature itself. Observational spectra for complex open systems commonly reflect the linear or nonlinear combination of several or many processes; some of these will have well-defined timescales. Superposition of an annual cycle and fractal noise would be an obvious and common example.
The latter range from general insights into such broadly relevant phenomena as crashes in both biological populations and stock markets, to specific design criteria for noise suppression algorithms for PLANCK satellite radiometer measurements of the cosmic microwave background [1] and flood frequency analysis in support of civil works planning in the Red River valley [2].

Various methods are available for identifying fractal dynamics in observational time series. The most common approach – likely because of the relative familiarity and computational accessibility of the mathematical and software tools involved (facilitating analysis), and because it is phrased clearly and explicitly in terms of the fundamental definition of fractal dynamics (facilitating interpretation) – is to empirically evaluate a double-log plot of the numerical spectrum for linear behaviour. Many variations on this theme exist, but typically the fast Fourier transform (FFT) is used to obtain X(f) from observed x(t), and some measure of the power spectral density or periodogram is calculated from X(f), such as S(f) = (N/2)|X(f)|². Then a least-squares regression fit, ln(S) = b − β ln(f), usually with the associated significance test on the estimated slope value, is performed to assess linearity. Attention is restricted here to this very common, and generally well-performing [3], technique. Given the reasonably fundamental nature of the issues raised below, however, it seems reasonable to at least conjecture that other techniques (detrended fluctuation analysis, rescaled range (R/S, Hurst) analysis, temporal box-counting, maximum likelihood estimation, etc.) may be affected by similar or analogous considerations.

Specifically, for short time series records, there can be ambiguities between the log-space power spectral behaviour of dynamical fractals and that of simpler, linear autocorrelated processes.
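The spectral-regression procedure just described can be sketched in a few lines. This is an illustrative sketch only: the periodogram normalization below is one common convention (the (N/2)|X(f)|² form in the text depends on the FFT normalization assumed), and the function name is not from the paper.

```python
import numpy as np

def log_log_spectrum_fit(x):
    """Estimate the spectral scaling exponent beta by least-squares
    regression of ln(S) on ln(f); an illustrative sketch, not the
    paper's exact implementation."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()                  # standardize, as in the text
    N = len(x)
    S = (2.0 / N) * np.abs(np.fft.rfft(x)) ** 2   # one periodogram convention
    f = np.fft.rfftfreq(N, d=1.0)                 # sampling interval Delta t = 1
    keep = f > 0                                  # drop the zero-frequency bin
    slope, intercept = np.polyfit(np.log(f[keep]), np.log(S[keep]), 1)
    return -slope, intercept                      # slope of ln S vs ln f is -beta

# white noise should yield an estimated beta near zero
rng = np.random.default_rng(0)
beta_hat, _ = log_log_spectrum_fit(rng.standard_normal(4096))
```

For a genuinely scale-free series the same fit would return the estimated β together with the intercept b of ln(S) = b − β ln(f).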
Following the lead of others, I use here the linear first-order autoregressive, or AR(1), process (see, e.g., [4–6]; a summary is provided below) as a widely familiar, meaningful, and relevant benchmark, and present numerical experiments that demonstrate the nature of this interpretational ambiguity. The results echo findings presented much earlier in the literature, but which do not appear to be broadly and consistently recognized in practical applications. I then introduce a straightforward analytical expression which may serve as a widely useful rule-of-thumb for constraining such ambiguity in practice. The utility of this simple calculation is subsequently demonstrated through numerical experiments on synthetic fractal and low-order linear time series, and by application to three observational records drawn from population biology, hydroclimatology, and quantitative finance. The emphasis here is on the development of readily applied knowledge for the analyst with a pragmatic interest in quickly and conveniently assessing observational time series – in particular, but not exclusively, short and noisy records generated by complex, open systems – for fractal dynamics.

## 2 Experimental ambiguities between AR(1) and fractal spectra

The first-order linear autoregressive, or AR(1), process is defined as:

    x_t = α x_{t−1} + ε_t    (1)

where 0 < α < 1 is the lag-1 serial correlation coefficient, and ε is white (serially uncorrelated, but not necessarily Gaussian) driving noise. Note that α ultimately amounts to a damping parameter, controlling how quickly the system recovers from random external shocks (also called interventions). The AR(1) model is a Markov process which can arise from time discretization of linear ordinary differential equations of the form:

    a₁ dx/dt + a₀ x(t) = z(t)    (2)

where x is the state variable; z is a stochastic forcing function, equal to ε(a₀ + a₁); and a₁ and a₀ are coefficients related to the AR(1) parameter as α = a₁/(a₀ + a₁) [4–6].
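For concreteness, Eq. (1) is trivial to simulate directly. In the sketch below the burn-in length and seed are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

def simulate_ar1(alpha, N, seed=None, burn_in=500):
    """Generate x_t = alpha * x_{t-1} + eps_t (Eq. 1) with standard
    normal driving noise; the burn-in discards the transient from
    the arbitrary initial condition."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(N + burn_in)
    x = np.empty(N + burn_in)
    x[0] = eps[0]
    for t in range(1, N + burn_in):
        x[t] = alpha * x[t - 1] + eps[t]
    return x[burn_in:]

x = simulate_ar1(0.7, 2000, seed=42)
r1 = float(np.corrcoef(x[:-1], x[1:])[0, 1])  # sample lag-1 autocorrelation
```

For a long enough record the sample lag-1 autocorrelation r1 recovers α to within sampling variability, which is exactly the estimate the rule-of-thumb of Sect. 3 relies upon.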
Some form of Eq. (2) appears in many branches of theoretical and applied physics, ranging from a type of Langevin equation in statistical mechanics [6] to the linear reservoir equation for short-term river flow variations in geophysics and engineering [4], and thus so too does the physical basis for AR(1) processes. By the same token, the AR(1) process is a basic building block of most statistical time series models, including the ubiquitous Box-Jenkins approach, and is employed throughout the physical, biological, and social sciences due to its simplicity, interpretability, and effectiveness. Additionally, and perhaps of particular note for our purposes here, the AR(1) model is frequently used as a straightforward and tractable measure or parameterization of linear dynamical memory during the course of, and to facilitate, analyses of other kinds of behaviour (see [7] and citations therein).

The theoretical spectrum of an AR(1) process is [6]:

    S(f) = σ_ε² / [π (1 − 2α cos(2πf) + α²)]    (3)

where σ_ε² = σ_x² (1 − α²); σ_ε² is the variance of the forcing noise, and σ_x² is the variance of the AR(1) time series. Double-logarithmic plots of Eq. (3) for three values of α are given in Fig. 1. The log-space spectrum contains two near-linear regions: one at high frequencies with a negative slope, and one at low frequencies with zero slope. Nevertheless, nowhere is the log-space spectrum strictly linear, and the aforementioned slope break is obvious. In principle, then, there should be no ambiguity between the spectra of AR(1) and fractal time series. In practice, matters can be less clear.

Fig. 1 Theoretical spectra of AR(1) processes from Eq. (3), using three different serial correlation coefficients and σ_ε = 1. Triangles denote values of f∗ provided by the approximate expression proposed here.
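Equation (3) is easy to evaluate numerically, e.g., to reproduce curves like those in Fig. 1 (a sketch under the paper's definitions; the plotting itself is omitted):

```python
import numpy as np

def ar1_spectrum(f, alpha, sigma_x=1.0):
    """Theoretical AR(1) power spectrum, Eq. (3), with
    sigma_eps^2 = sigma_x^2 * (1 - alpha^2)."""
    sigma_eps2 = sigma_x ** 2 * (1.0 - alpha ** 2)
    denom = np.pi * (1.0 - 2.0 * alpha * np.cos(2.0 * np.pi * f) + alpha ** 2)
    return sigma_eps2 / denom

# evaluate from a low, Rayleigh-like frequency up to f_Nyq = 0.5
f = np.logspace(-4, np.log10(0.5), 200)
S = ar1_spectrum(f, 0.7)
```

The evaluated curve shows the structure described in the text: a flat plateau at low f, a negatively sloped region at high f, and a smooth (never strictly linear) transition between them.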
Spectra obtained using the FFT for individual realizations of synthetic AR(1) processes with the same three values of α, generated using Eq. (1) with ε drawn from the standard normal distribution, are illustrated in Fig. 2. This far more closely approximates the real experimental situation of analyzing an observed time series. As for all numerical spectrum estimates presented in this article, the time series were each standardized to zero mean and unit variance prior to analysis. For each model, three dataset lengths were considered. For N = 100, the spectra are indistinguishable from those of 1/f^β processes. Best-fit linear regression lines to the observational spectra are also illustrated, with the corresponding estimated values of β. In each N = 100 case, a test under the standard assumptions suggests a statistically significant (P < 0.05) linear relationship between ln(f) and ln(S). At a record length 10 times greater, the ambiguity begins to resolve itself, but it is not until roughly N ∼ 10,000 that the spectrum is revealed with complete clarity to be non-fractal.

The reasons for this ambiguity seem straightforward. The frequency range sampled by spectral analysis of a finite-length observational record (taking the sampling interval Δt = 1 for notational convenience throughout) extends from the Nyquist frequency, f_Nyq = 0.5, down to the Rayleigh frequency, f_Ray = 1/N. Thus, the lowest (non-zero) frequency sampled is simply the inverse of the dataset length, and for short records, that frequency is not sufficiently low to capture the slope break so obvious in Fig. 1. Further, the presence of substantial random variability in the observational spectrum of an individual realization is such that departures from log-space linearity over the high-frequency, negative-slope portion of the theoretical AR(1) spectrum are for all practical purposes undetectable.

Fig. 2 Numerical spectrum (periodogram) estimates for nine synthetic AR(1) time series, and corresponding log-space linear regressions (estimated slope is −β; P < 0.05 indicates a significant relationship). Linear regression is performed over all f_Ray ≤ f ≤ f_Nyq for N = 100, but only (for comparison purposes) over the pseudo-fractal high-frequency region for N = 1000 and 10,000.

What to do about this problem, however, is not quite as straightforward, particularly if the dataset is not amenable to significant extension for purposes of analysis. That some minimum time series length is required to differentiate the spectra of 1/f^β and simple linear autocorrelated processes was recognized in the theoretical literature some time ago [8], but a survey of applications papers suggests that this fact is not often accommodated. A couple of techniques might be employed to address the issue, but they appear to have some practical limitations. One potential criterion is whether the sample autocorrelation function for the finite-length observed time series has decayed to zero by some lag ≪ N [8]. If the empirical result is that it has not decayed to zero, however, the method provides little useful information: we would not know how much more data would be required to make the distinction between the spectra of a fractal vs. a simple linear process. Further, the criterion means little if the process is actually fractal (and we do not know whether it is fractal a priori, or there would be no need for analysis), as in principle the autocorrelation function will in this case never decline to zero; we would always be led to believe that more data are required to make the distinction [8], which is clearly not true.
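The autocorrelogram criterion just described can itself be sketched in code. In the sketch below, the ±2/√N band is the usual large-sample white-noise guide (an assumption added here, not a prescription from the paper), and deciding whether a short, noisy record's ACF has "reached zero" relative to such a band is exactly the kind of judgment call at issue:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function out to max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return np.array([1.0] + [float(np.dot(x[:-k], x[k:])) / denom
                             for k in range(1, max_lag + 1)])

# for pure white noise, all lags >= 1 should sit inside the guide band
rng = np.random.default_rng(1)
acf = sample_acf(rng.standard_normal(5000), 20)
band = 2.0 / np.sqrt(5000)   # rough white-noise confidence band
```

For an AR(1) series the same function returns values decaying roughly as α^τ (Eq. 4 below), while for a fractal series the decay is far slower, which is what makes the visual "has it reached zero yet?" test ambiguous for small N.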
Another practical problem is that it can be difficult to robustly determine whether the sample autocorrelogram of a short, noisy time series has indeed decayed to zero.

An alternative approach could be to go a step further and conduct a statistical hypothesis test to directly determine whether the observed log-space linear spectrum is attributable to a simple, low-order, linear memory process. However, it is unclear how well this would work when the time series length is too short to sample the slope break in an AR(1) spectrum. Additionally, such tests can be easy to do badly but challenging or time-consuming to do well², and cutting corners conceptually or procedurally in their implementation or interpretation can be problematic given the perception of rigour, precision, and objectivity often associated with their results.

It is expressly not my intent to discourage use of any of the foregoing techniques. Rather, I make the more modest argument that it could be widely useful – particularly in the context of analyzing short, noisy records generated by complex, open systems – to have a simple and convenient rule of thumb available for determining: (i) whether N is sufficiently large, and f_Ray sufficiently low, to distinguish between fractal spectra and those outwardly showing 1/f^β scaling but generated by a simpler, linear process; and (ii) if N is too low for this distinction to be made, a ballpark estimate of how much more data would be needed to do so. Such an expression is suggested in the next section, nominally using the AR(1) process as a benchmark. As a primary design goal, the method is carefully constructed to be very easy to use. Additionally, it does not involve formal hypothesis tests and the complications associated with them (see footnote 2). The price paid for these attributes is that the outcome does not constitute, in the conventional and strict sense, a statistically rigorous or definitive result.
The technique might be employed by physics and applied physics practitioners who require only a quick, approximate answer, or as a rapid screening method prior to investing the additional time and resources required to implement more sophisticated tests.

Before proceeding, a slightly more philosophical point is addressed regarding the suitability and desirability of comparing fractal and low-order linear processes. Many fractal observational spectra show 1/f^β scaling down only to some lower cutoff frequency, at which the spectrum becomes white; such frequency cutoffs to fractal S(f) are additionally predicted or explained by theoretical studies [8, 10]; and in principle, 1/f^β scaling ∀f cannot exist (the so-called ultraviolet and infrared catastrophes³ [8, 11]). Comparing these observations against Figs. 1 and 2, it might be argued that the differences between fractal and (for instance) AR(1) spectra are not so important, or that a low-order linear spectrum is just as "fractal" as many other things we normally call fractal. However, the interpretations (in terms of both underlying physical process and broader practical implications) are fundamentally different between a low-order linear process showing de facto 1/f^β scaling over a narrow high-frequency bandwidth, and a fractal process showing scale invariance over many (or effectively all) octaves.

² Owing to the complications encountered in practice. One example is non-Gaussian distributions, which in general require data transformations, non-parametric or rank-based tests, or Monte Carlo resampling (randomization or bootstrap) techniques. Another issue is the manner in which hypothesis tests are logically configured, including in particular the question of identifying the most meaningful hypotheses to test under a particular set of experimental circumstances. Conventional formal statistical hypothesis testing forces a strictly binary choice between specific and pre-selected null (H0) and alternative (H1) hypotheses; further, the a priori choice of which of the two competing models is assigned null or alternative status largely determines which must bear the burden of proof (the onus is on the alternative hypothesis to demonstrate its superiority). Thus, "I don't know which model is best – the data are insufficient to tell" is not an explicitly permitted answer, yet it would seem the appropriate answer when records are too short to distinguish the spectra of 1/f^β vs. linear low-order processes. Rather, this "insufficient-data" response is typically but implicitly embedded in a failure to reject whatever one happened to choose as the null hypothesis. This issue might be problematic in fractal analyses. Classically, the null hypothesis consists of a simple or no-effect model, which in our case might be a low-order linear memory process, such as the AR(1) model; but in ecology (for instance) it has been suggested that temporal scale invariance is so common it should be viewed as the routine null model [9]. Similar considerations likely apply to experimental time series from many other fields of study.

³ These limits are in fact extremely liberal, with timescales ranging essentially from the cosmological (e.g., the age of the universe) to the quantum (e.g., the time required to cross the Compton wavelength of an electron at the speed of light), and thus likely have little practical relevance to observational time series analysis [11, 12].
Further, one might wonder whether some of the experimental studies (particularly those involving analysis of complex open systems with short observational records) nominally identifying a dynamical fractal with a low-frequency cutoff might reflect misinterpretation of a simpler spectrum, particularly given that the low-order linear alternative often remains unmentioned and untested. Conversely, comparisons between more sophisticated traditional time series models and the fractal hypothesis, though no doubt important in some contexts, seem generally less meaningful and relevant. For instance, both ARIMA(1,0,1) models, and ffGn generators consisting of weighted sums of AR(1) processes, can be used to approximate a fractal time series [13], so the differences are minimal by construction. Overall, then, developing and applying a back-of-the-envelope calculation to support differentiation between fractal and low-order linear memory processes seems worthwhile and prudent.

## 3 An approximate rule for minimum record length

Red noise⁴ corresponds by definition to autocorrelation, and white noise to serial independence. Hence, a 1/f^β process is autocorrelated over a very wide range of (strictly, all) timescales, whereas the white-to-red transition in the AR(1) spectrum (the slope break in Fig. 1) corresponds to autocorrelation only over small lags. The proposed technique involves using the autocorrelation function of an AR(1) process to develop a simple analytical expression for estimating the dataset length required to yield a sufficiently broad frequency range for sampling this spectral transition. The theoretical autocorrelation function of the AR(1) process is [6]:

    ρ(τ) = α^τ    (4)

where τ is the lag. Although ρ(τ) → 0 only as τ → ∞, it typically decays to a negligible value, δ, fairly rapidly. Define⁵ τ∗ = τ | ρ(τ) = δ, or from Eq. (4):

    τ∗ = ln(δ) / ln(α)    (5)

Then f∗ = 1/τ∗ is the frequency at which the AR(1) spectrum begins to transition from white noise to an inverse frequency-scaled spectrum. Clearly, f∗ is the minimum Rayleigh frequency required to sample that slope break, and its inverse (τ∗) is the corresponding minimum record length, N∗.

To produce a usable rule-of-thumb, a specific value for δ must be selected. There is no obvious single right number. Also, in practice α must be estimated from the observational dataset (as the sample correlation coefficient between the time series and a lagged copy of itself), and will therefore be subject to some random sampling variability. Thus, it seems appropriate to set δ conservatively, i.e., to yield an upper-end τ∗ (lower-end f∗) estimate. Experimentation suggested δ = 10⁻³⁰ is a reasonable and robust value in practice. Corresponding theoretical values of f∗ for α = 0.4, 0.7, and 0.9 are illustrated in Fig. 1 for reference.

⁴ By way of nomenclatural caveats: (i) a wide range of noise colour definitions have been suggested by various authors in different branches of the scientific literature. Here, I use a simple and parsimonious analogy to the visible light spectrum. Specifically, "red noise" is used generically to denote a preponderance of signal power at low f and a dearth at high f, and "white noise" to indicate S ≠ g(f), i.e., a flat spectrum in the mean. (ii) One person's noise is another person's signal; strictly speaking, then, the appropriateness of the term "noise" is context-dependent. Here, I follow convention and use the term broadly. (iii) Although 1/f^β components are often referred to as noises, it should be noted that the genesis of fractal processes may span fundamentally stochastic behaviour; effectively stochastic behaviour arising from deterministic systems with high degrees of freedom; or fully deterministic behaviour arising from low-DOF but nonlinear processes.
⁵ This can be viewed as a measure of the "decorrelation time" (or sometimes "correlation time"). Another commonly invoked linear memory metric is the e-folding time of the autocorrelation function; however, experimentation suggested that this produces τ∗ estimates far too small (f∗ too high) for reliable practical discrimination between the observational spectra of fractal and low-order linear processes.

The rule is applied as follows. If observed S(f) ∝ 1/f^β and N ≲ N∗ (or equivalently, f_Ray ≳ f∗), then in general insufficient data are available to safely determine whether the generating dynamics are fractal or low-order linear. Further, at an absolute minimum (in fact, more than; see immediately below), N∗ − N additional measurements would in general be required to capture the diagnostic white-to-red spectral transition of a simple autocorrelated linear process. Alternatively, if observed S(f) ∝ 1/f^β and N ≫ N∗ (or f_Ray ≪ f∗), then the process can (perhaps preliminarily, but reasonably) be identified as fractal. Note that for clear empirical identification of the slope break in an observed, and therefore noisy, AR(1) spectrum, it is obviously inadequate to have S(f) estimates only over f ≳ f∗. Rather, N must be sufficiently large that S(f) extends for some distance on either side of f∗, extensively sampling both the white and red noise domains; hence the N ≫ N∗ and f_Ray ≪ f∗ constraints above.

An alternative method was additionally developed on the basis of Eq. (3). Given that an obvious criterion for a rule-of-thumb is maximum operational simplicity, this second approach is felt to be inferior to that outlined above. However, brief documentation seems appropriate (see Appendix).

## 4 Applications

### 4.1 Controlled tests (synthetic data)

Application of the foregoing rule-of-thumb is first explored using numerical experiments.
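Before turning to the examples, note that the complete rule-of-thumb of Sect. 3 reduces to a few lines of code. The sketch below uses the δ = 10⁻³⁰ value suggested above and estimates α as the sample lag-1 autocorrelation; the worked check reproduces the N∗ ≈ 192 value quoted for α = 0.698 in the first synthetic example:

```python
import numpy as np

def min_record_length(x, delta=1e-30):
    """Rule-of-thumb from Eq. (5): estimate alpha as the sample lag-1
    autocorrelation, then tau* = ln(delta)/ln(alpha_hat), with
    N* = tau* and f* = 1/tau*."""
    x = np.asarray(x, dtype=float)
    alpha_hat = float(np.corrcoef(x[:-1], x[1:])[0, 1])
    tau_star = np.log(delta) / np.log(alpha_hat)
    return alpha_hat, tau_star, 1.0 / tau_star   # alpha_hat, N*, f*

# worked check against the text: alpha = 0.698 should give N* of about 192
n_star = np.log(1e-30) / np.log(0.698)
```

If the observed record satisfies N ≫ N∗ while still displaying 1/f^β scaling, the fractal identification is (preliminarily) supported; otherwise roughly N∗ − N further observations would be needed at a minimum.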
The first two cases are straightforward; the second two numerical examples consider somewhat more complicated circumstances. For conciseness and consistency, in each case the same two record lengths (short, N = 100; long, N = 10,000) are considered.

The first synthetic example (top row, Fig. 3) consists of an AR(1) process with α = 0.7 and ε drawn from the exponential (Boltzmann) distribution with λ = 0.5 (μ = 1/λ = 2, σ² = μ² = 4). For N = 100, S(f) appears to scale as 1/f^β. However, for the observed sample autocorrelation coefficient of 0.698, (N∗ = 192) > (N = 100); or equivalently, (f∗ = 4.99 × 10⁻³) < (f_Ray = 1 × 10⁻²). Thus, had this time series and its spectrum been encountered observationally, we would know from our simple calculation that the dataset length is insufficient to determine whether the process is fractal, and that more than N∗ − N additional observations would have been required to do so. The observational spectrum for the long record confirms, of course (as N ≫ N∗), that the process is non-fractal; note also that the estimated f∗ corresponds reasonably well to the initiation of the white-to-red spectral transition with increasing f, as expected. Random sampling variability between the two time series accounts for the slight difference in estimated α (and therefore N∗ and f∗) between the N = 100 and N = 10,000 datasets. Overall, the rule-of-thumb performs exactly as desired.

The second example is a dynamical fractal consisting of the (fully deterministic) first-order nonlinear iterative map suggested by Procaccia and Schuster [10, 11]:

    x_t = (x_{t−1} + x_{t−1}²) mod 1    (6)

Equation (6) gives 1/f^β spectra with β = 1, i.e., classical "flicker noise." As in the first example, at N = 100 scale invariance is observed, but the rule-of-thumb indicates the record length is too short for reliably discriminating between a genuinely fractal vs. low-order linear generating process (second row, Fig. 3).
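The map of Eq. (6) is trivial to iterate; in the sketch below the seed and burn-in are arbitrary illustrative choices (the map is intermittent, so trajectories linger near zero for long stretches between reinjections):

```python
import numpy as np

def procaccia_schuster(N, x0=0.321, burn_in=1000):
    """Iterate x_t = (x_{t-1} + x_{t-1}^2) mod 1 (Eq. 6), whose power
    spectrum scales approximately as 1/f. The seed x0 and burn-in
    are arbitrary illustrative choices, not values from the paper."""
    x = x0
    for _ in range(burn_in):
        x = (x + x * x) % 1.0
    out = np.empty(N)
    for t in range(N):
        x = (x + x * x) % 1.0
        out[t] = x
    return out

series = procaccia_schuster(10_000)
```

The resulting fully deterministic series can then be standardized and passed through the same periodogram-regression and rule-of-thumb machinery as the stochastic examples.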
At least as importantly, however, at N = 10,000, 1/f^β scaling is again apparent and (N∗ = 385) ≪ (N = 10⁴), or equivalently (f∗ = 2.5 × 10⁻³) ≫ (f_Ray = 10⁻⁴). Thus, had this latter time series and its spectrum been encountered experimentally, and one did not know its fundamental character and origins a priori (as would often be the case in experimental practice), application of the proposed simple construction would have correctly informed us that the dataset is indeed sufficiently long to identify the process as fractal. Again, the suggested record length constraint performs as planned.

Fig. 3 Numerical tests for the rule-of-thumb. Grey dashed line illustrates ln(f∗).

The third example considers a slightly more complicated scenario. Although cellular automata were devised and refined by von Neumann and others, the subsequent Bak–Tang–Wiesenfeld (sandpile) CA [14, 15] played an important role specifically in the field of fractal dynamics, by helping to phrase the question in terms of self-organized criticality and emergent behaviour. A BTW cellular automaton was constructed⁶ following Turcotte [16], using the average number of particles within the grid at the end of each time step as the output time series. Application of the rule-of-thumb (third row, Fig. 3) yields anomalous results, with very large N∗ and very low f∗. This is easily explained, but illustrates an important point. As in other studies (see [16] for a review), this CA was found to give 1/f^β noise with β = 2, which is consistent with a random walk:

    x_t = x_{t−1} + ε_t    (7)

Equation (7) is in turn simply Eq. (1) with α = 1⁷.
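The α → 1 degeneracy can be seen directly: for a synthetic random walk, the sample lag-1 autocorrelation sits just below unity, so τ∗ from Eq. (5) becomes enormous rather than formally infinite (cf. footnote 7). A minimal check, with arbitrary seed and length:

```python
import numpy as np

# random walk, Eq. (7): cumulative sum of white noise
rng = np.random.default_rng(5)
walk = np.cumsum(rng.standard_normal(100_000))

# sample lag-1 autocorrelation is very slightly below 1 ...
alpha_hat = float(np.corrcoef(walk[:-1], walk[1:])[0, 1])

# ... so Eq. (5) with delta = 1e-30 returns a very large but finite tau*
tau_star = np.log(1e-30) / np.log(alpha_hat)
```

This reproduces, in miniature, the "anomalous" third-row behaviour in Fig. 3: very large N∗ and correspondingly tiny f∗.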
Thus, in a highly restricted but important sense, there is no ambiguity between an AR(1) process and a random walk, and there is neither need nor applicability for our rule-of-thumb when β = 2. The range of application is therefore evidently limited to 0 < β < 2. Nevertheless, many observed and interesting 1/f^β spectra fall within this range, including the quintessential case of 1/f flicker noise discussed above⁸.

Our overall approach is to develop a very easily applied scoping calculation, by assuming that an AR(1) model can be used as a benchmark to approximately but adequately represent low-order linear memory processes in general. The AR(1) process was specifically selected because it is generally meaningful and relevant, and has previously been used as a general linear memory parameterization (Sect. 2); and because it readily yields the desired straightforward construction (Sect. 3). In our fourth example, we assess how sensitive the results are to that assumption by considering a time series generated by an autoregressive–moving average (ARMA) process having second-order autoregressive, and first-order moving average, components:

    x_t = α₁ x_{t−1} + α₂ x_{t−2} + θ₁ ε_{t−1} + ε_t    (8)

and then applying the rule-of-thumb, again pretending that we do not know the actual nature of the dataset. Driving noise was drawn from the standard Lorentzian (Cauchy) distribution, i.e., with location and shape parameters of zero and unity, respectively. A few ARMA(2,1) model parameter values were explored. The results illustrated in Fig. 3 (bottom row) seem generally representative, and use α₁ = 0.5, α₂ = 0.2, and θ₁ = 0.3. The performance of the rule-of-thumb is essentially identical to that for an AR(1) process (Fig. 3, top row). Presumably, a parameterization for an ARMA(2,1) (or another, non-AR(1), low-order linear memory process) could be contrived such that our approximate calculation might fail.
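Equation (8) with the quoted parameters can be simulated as follows. This is a sketch: the burn-in and seed are arbitrary choices, and the quoted coefficients give a stationary AR part (the roots of 1 − 0.5z − 0.2z² lie outside the unit circle):

```python
import numpy as np

def simulate_arma21(a1, a2, th1, N, seed=None, burn_in=500):
    """ARMA(2,1) of Eq. (8), x_t = a1*x_{t-1} + a2*x_{t-2}
    + th1*eps_{t-1} + eps_t, driven by standard Cauchy (Lorentzian)
    noise as in the fourth synthetic example."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_cauchy(N + burn_in)
    x = np.zeros(N + burn_in)
    for t in range(2, N + burn_in):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + th1 * eps[t - 1] + eps[t]
    return x[burn_in:]

# parameter values used in Fig. 3 (bottom row)
x = simulate_arma21(0.5, 0.2, 0.3, 5000, seed=7)
```

The resulting series can be fed to the rule-of-thumb exactly as if it were an AR(1) record, which is the sensitivity test performed in the text.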
However, the results provide some reassurance that minor to moderate violations of its underlying assumptions do not automatically incur disaster.

⁶ Avoiding finite-size effects over the full frequency range effectively spanned by an N = 10⁴ time series required a large grid; 400 × 400 proved sufficient. Note that this in turn seemed to require a long burn-in time before a representative "observed" time series could be extracted.
⁷ For which τ∗ is undefined. In practice, the sample autocorrelation coefficients were very slightly less than unity, giving very large but non-infinite τ∗.
⁸ This limitation may further diminish in importance if one chooses not to view the random walk as a fractal, its 1/f^β spectrum notwithstanding. It is perhaps a matter of preference whether fractional and classical Brownian motions are regarded as distinct process classes, or instead the former as a general theory with the latter as a special case. There appears to be precedent in the literature for both interpretations.

### 4.2 Uncontrolled tests (observational data)

Small N is frequently encountered in the passive observation and subsequent analysis of the net dynamics generated by large-scale open systems consisting of multiple, often nonlinearly interacting, processes. There are two main reasons for this. First, and most fundamentally, we must make do with whatever data happen to have been collected to date, and this is often not much. Second, it is commonly useful or necessary to manipulate the data to isolate a characteristic or phenomenon of interest, but in such a way (e.g., by aggregation and/or downsampling) that the record length available is further reduced (though not, of course, the actual timeframe spanned by it); or alternatively, the measured quantity only has meaning at a low sampling frequency. River flow data clearly exemplify some of these issues.
Streamflow records are rarely more than a century in duration, and typically very much less, particularly outside of Europe. Further, streamflow data are usually available as a daily time series, but for design purposes an engineer will often perform flood frequency analysis using a yearly maximum daily discharge record,

Q^(max) = {sup(Q_{j,i=1,365}), j = 1, N},

where Q_{j,i} is measured streamflow on the ith day of the jth year, and N is the number of complete years of record and thus the length of the final Q^(max) time series; whereas for assessing fish habitat availability changes, a biologist will often consider a record of annual minimum 7-day mean flows,

Q^(7) = {inf(Q^(MA)_{j,i=1,365}), j = 1, N}   with   Q^(MA)_{j,i} = (1/7) Σ_{k=i−3}^{i+3} Q_{j,k}.   (7)

These two metrics not only present different practical implications, but also speak to different sets of physical conditions and processes, long-term (interannual or interdecadal) variabilities in which are often difficult to study by considering the raw continuous daily flow time series. The peak flows captured by Q^(max) reflect a wide variety of runoff-generating mechanisms, whereas the low flows captured by Q^(7) will typically be supported solely by watershed storage depletion; day-to-day terrestrial hydrologic variations are often driven by different dynamics than year-to-year variations; and the annual flood and drought will in most regions occur in different seasons, which has consequences for climate variability and change impact analysis. El Niño-Southern Oscillation (ENSO), for instance, is phase-locked to the annual cycle, with northern hemisphere teleconnections generally being strongest in winter, coinciding with the freshet peak (and not late-summer low flows) in rainfall-driven rivers.
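The two derived records can be computed directly from a daily flow matrix. A minimal sketch (the array layout and function name are our own choices, and leap days are ignored for simplicity):

```python
import numpy as np

def annual_flow_metrics(q_daily):
    """From daily flows Q_{j,i} arranged as an (N_years, 365) array,
    return the annual maximum daily flow Q^(max)_j and the annual
    minimum 7-day moving-average flow Q^(7)_j, per Eq. (7)."""
    q_daily = np.asarray(q_daily, dtype=float)
    q_max = q_daily.max(axis=1)
    # centred 7-day mean Q^(MA)_{j,i} = (1/7) * sum_{k=i-3}^{i+3} Q_{j,k};
    # 'valid' mode drops the 3 edge days at each end of the year
    kernel = np.full(7, 1.0 / 7.0)
    q7 = np.array([np.convolve(year, kernel, mode="valid").min()
                   for year in q_daily])
    return q_max, q7
```

Each input year of 365 daily values thus collapses to a single Q^(max) and a single Q^(7) value, which is exactly the de facto downsampling from daily to annual sampling frequency described above.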
Other important and widely used water quantity and quality metrics that are most (or indeed only) meaningful at a yearly sampling frequency, though typically derived from high-frequency data, include freshet timing, total annual flow volume, and the calendar date by which a certain number of degree-days (corresponding to salmon alevin emergence, for instance) has been seasonally accumulated. The list goes on, and similar lists can be drawn up for other fields of study, particularly but by no means exclusively in the geophysical and ecological sciences. The net effect of limited historical records and de facto downsampling to a derived quantity is often a time series length about on par with the smaller values of N* seen in the numerical examples above. A third general cause of (effectively) short records is the use of data windowing in the analysis process, to track longer-term variations in the dynamical characteristics of the dataset. Three observational examples of the foregoing general types are considered. It is to be understood that the purpose is solely to illustrate practical application of the proposed approximate expression for minimum record length; although a few words of context are provided in each case, no attempt whatsoever is made to thoroughly study these datasets or the dynamical systems they capture. Time series were linearly detrended and standardized to zero mean and unit variance prior to spectral analysis. Because we are not strongly concerned in these examples with the precise value of the estimated fractal dimension, and also because it is unclear whether all of the particular observational time series considered are defined for timescales shorter than the sampling interval, correction for the β underestimation due to aliasing [17] that can occur under certain circumstances is not undertaken here (though the general value of such corrections is in no way questioned).
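The preprocessing and spectral-slope estimation just described can be sketched as follows. This is a simple illustrative version (an unsmoothed periodogram with an ordinary least-squares log-log fit; the published analyses used a Matlab script, so the details here are our own assumptions):

```python
import numpy as np

def beta_estimate(x):
    """Linearly detrend and standardize a series, then estimate the
    spectral scaling exponent beta from a least-squares fit to the
    log-log periodogram (S ~ 1/f^beta implies a log-space slope of -beta)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    # linear detrend, then standardization to zero mean and unit variance
    x = x - np.polyval(np.polyfit(t, x, 1), t)
    x = (x - x.mean()) / x.std()
    # one-sided periodogram, excluding the zero frequency
    f = np.fft.rfftfreq(n)[1:]
    S = np.abs(np.fft.rfft(x)[1:]) ** 2 / n
    slope, _ = np.polyfit(np.log(f), np.log(S), 1)
    return -slope
```

A white noise series should return β near 0, and an integrated (random-walk) series β near 2, consistent with the limiting cases discussed earlier.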
Application of our back-of-the-envelope calculation involves only obtaining an estimate for the lag-1 linear serial correlation coefficient (readily accomplished in a spreadsheet application, for instance), and plugging this value into Eq. (5); the task can be completed in such a manual fashion in under a minute. The spectral analyses presented here were performed using a Matlab script written for the purpose, within which the N* estimation procedure was readily automated. This convenience is a key purpose and advantage of the technique.

The steelhead is a variety of rainbow trout (Oncorhynchus mykiss) in Pacific coastal North America, including British Columbia, Canada. Though not formally considered a distinct subspecies, steelhead show a number of significant differences, most notably in terms of life history: like Pacific salmon, steelhead are anadromous, splitting their lives between fresh and salt water environments. Population size variations arise from geophysical, ecological, and anthropogenic sources, such as the impacts of coherent modes of atmosphere-ocean circulation (climate) variability upon both the fresh water and marine phases of steelhead life history, internal ecological system dynamics, habitat degradation or restoration, and accidental by-catch in the commercial salmon fishery. Across parts of its range, steelhead populations have collapsed. As noted earlier, fractal scaling is commonplace in ecological time series. Its general implications for population variability and crashes thus seem potentially relevant to British Columbia steelhead stock fluctuations, but the latter question does not appear to have been broached. For obvious reasons, fish in general (and perhaps anadromous and migratory species like salmon and steelhead in particular) are not easy to track and count.
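Before turning to the steelhead results, note that the single input the back-of-the-envelope calculation described above requires, the sample lag-1 serial correlation coefficient, is one line of arithmetic (Eq. (5) itself, into which this estimate is plugged, is given earlier in the paper and not repeated here):

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 linear serial correlation coefficient r1,
    the single statistic the rule-of-thumb requires as input."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))
```

This is the same quantity a spreadsheet CORREL-style formula on the series and its one-step lag would return, up to the normalization convention.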
Catch data are therefore often used as an approximate but useful metric of relative variations in population over time. For both fundamental and practical reasons (footnote 9), data are normally collected, presented, and/or only meaningful as an annual time series. With no commercial harvest, steelhead catches occur in the (limited and heavily regulated) freshwater recreational fishery and are monitored primarily using surveys. Here I consider 1968–2002 wild (i.e., excluding hatchery) steelhead annual catch per angler-day (footnote 10) [18], as inferred from data made openly available online for the Capilano River [19], a south coastal British Columbia watershed under significant anthropogenic environmental pressures. Results (Fig. 4, top row) indicate 1/f^β spectral behaviour, but N* = 86 whereas N = 35. Thus, the apparent power-law scaling over the sampled bandwidth is consistent with both fractal dynamics and a low-order linear memory process, the dataset length being insufficient to discriminate between the two potential source types. As the criterion for record length sufficiency is N ≫ N*, the present exploratory results suggest that we could need well over a century of steelhead population data to reliably make that distinction. Clearly, the example is provided only for methodological demonstration purposes, and more work would be required before reaching any firm conclusions; nevertheless, the result may serve as a poignant reminder of the central importance of long-term monitoring to the science and successful management of complex environmental systems [20].

Total annual flow volume is a commonly used measure of the overall water resource productivity of a river basin. Loosely speaking, rivers show variability at daily, seasonal, and interannual timescales, each with somewhat different (though often overlapping) controls.
Time-integration of high-frequency streamflow data over the course of a year means that this metric speaks directly to total water supply availability, irrespective of the detailed dynamics of individual meteorological events, terrestrial hydrologic responses to those events, or seasonal timing variations. If the catchment is pristine, or if the data have been suitably processed to account for time-variation in watershed-specific anthropogenic flow alterations, then annual flow volumes in general reflect the net water supply effects of longer-term climatic variability and change. River flows have long been the subject of fractal analyses, including the early work of Hurst and Mandelbrot. Here, we consider annual flow volumes of the Columbia River as measured at The Dalles, Oregon. The Columbia is listed as the world's 18th largest river by flow [21], and its flow variability has strong implications for water supply, fisheries, flooding, and hydropower generation through large areas of southwest Canada and the northwest contiguous US. The river is also the subject of international treaty by virtue of its transboundary geography. Data were obtained from a Bonneville Power Administration report [22] which corrected 1929–1998 historical observations to conditions in 2000, dynamically compensating for such anthropogenic effects as water withdrawals, land use change, and damming. The results (Fig. 4, second row) indicate 1/f^β spectral behaviour. Application of the proposed calculation for minimum record length suggests that this dataset lies in somewhat of a grey area, with N > N* but not N ≫ N*. Such are the limitations of a simple rule-of-thumb, its advantages notwithstanding.
Nevertheless, on the basis of these results and the large body of prior work providing widespread evidence for fractal dynamics in river flows, it seems reasonable to conclude that there is some tentative observational support for scale invariance in the total annual flow volume time series for the Columbia River.

9 For steelhead and salmon, these might include a number of related issues: the discrete-time nature of biological generations; phase-locking of multi-year anadromous salmonid life cycles to the seasonal cycle, and in particular the approximately fixed seasonal timing of the return to fresh water to spawn; the generally seasonal nature of the harvest and therefore of catch data, perhaps especially for recreational freshwater steelhead angling; and the resources involved in generating each set of official statistics.
10 Generally similar results were obtained considering time series of wild steelhead catch numbers, unadjusted for angler effort.

Fig. 4 Observational examples. Notation follows that of Figs. 2 and 3.

Financial time series have similarly long been a focus for fractal studies. A sufficiently large body of work has accumulated to support some firm generalizations regarding scale invariance in stock prices and price volatilities [23]. Specifically, daily prices, p, can be approximated as a geometric random walk, ln(p_t) = ln(p_{t−1}) + ε_t; thus, the power spectrum of ln(p) scales as 1/f^β with β = 2, and the return series (footnote 11), r_t = ln(p_t) − ln(p_{t−1}), is a white noise process. This default model additionally holds that the volatility series, σ_t = |r_t|, is generally fractal with a value of β which varies between datasets. However, many questions remain.
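The standard return and volatility definitions just quoted translate directly into code; a minimal sketch:

```python
import numpy as np

def returns_and_volatility(prices):
    """Daily log returns r_t = ln(p_t) - ln(p_{t-1}) and the
    volatility series sigma_t = |r_t| from a closing-price series p_t."""
    p = np.asarray(prices, dtype=float)
    r = np.diff(np.log(p))
    return r, np.abs(r)

# example: three days of (hypothetical) closing prices
r, sigma = returns_and_volatility([100.0, 101.0, 99.5])
```

Note that the output series are one sample shorter than the input price series, since each return requires a pair of consecutive closes.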
For example, the usefulness of the foregoing random walk view notwithstanding – it underlies the Black-Scholes options pricing model, for instance – both the model and the efficient market hypothesis to which it is related are known to be significantly incomplete descriptions of stock market dynamics. Additionally, the fractal characteristics of commodity markets have been less exhaustively studied, and work on electricity prices in deregulated markets, for instance, suggests that they may follow a different model [25]. Another question might be whether, for a given stock just introduced to the market, its volatility is fractal right out of the starting gate, or whether some spin-up is required. This latter question is preliminarily explored here, in an interesting toy problem intended solely as a final demonstration of the proposed rule-of-thumb. Details of the particular company considered are not strongly relevant for the purposes of this article; it was selected simply because the author happened across the free, publicly available, open-access information necessary to explore the question posed (specifically, the date of its initial public offering, and daily closing prices extending back to that IPO). These price data were processed to form a daily volatility series according to the above definition. Spectral analysis of the first 90 days (footnote 12) of volatility data suggests 1/f^β scaling, but the rule-of-thumb indicates that a much longer time series is required in this case to distinguish between fractal and low-order linear memory processes (Fig. 4, third row). Considering instead the first 365 days (footnote 13) of volatility data, N ≫ N* and we can clearly identify fractal dynamics (Fig. 4, bottom row).
Overall, the proposed rule-of-thumb seems useful for approximately delineating the shortest window appropriate for distinguishing dynamical fractals: in this case, the first quarter appears too brief a period to resolve true 1/f^β behaviour in the volatility of this stock, but the scale invariance generally typical of stock price volatility seems unambiguously evident by the end of the first full year of trading.

5 Conclusions

An analytical expression and decision rule are suggested for assessing whether an observational record is sufficiently long to discriminate between fractal scaling and low-order linear memory on the basis of its numerical power spectrum. The technique is crafted to maximize simplicity of operational implementation. It is anticipated that its primary use will be in circumstances where an approximate estimate will suffice, or as a pre-screening technique prior to investing further time and resources in more detailed analyses. Tests on synthetic datasets demonstrate that the technique functions as intended, and further, that reasonable violations of the assumptions employed to achieve a simple configuration do not appear to severely diminish performance. Observational examples are presented which help demonstrate the range of circumstances in which the rule-of-thumb might prove useful, and illustrate its application.

11 This return definition is standard in econophysics, and is effectively the same as that more broadly employed in finance and business if day-to-day price changes are small relative to the price itself; this criterion is satisfied for many financial time series. The volatility definition used here is also standard in econophysics, and is equivalent to that more broadly employed in finance (the standard deviation of returns over some finite moving window) if the window length is taken to be unity [24].
12 Unlike the foregoing observational examples, where N was fixed by data availability, the precise choice of N here is somewhat arbitrary. A variety of values were experimented with and gave mutually consistent results, but for presentation purposes, 90 and 365 days were selected as they intuitively reflect a fiscal quarter and year, respectively. Of course, the correspondence is far from exact, because the initial public offering did not occur at the start of a fiscal year, and because time is defined here in terms of consecutive trading (rather than calendar) days.
13 Similar results were obtained using all the available daily volatility data, spanning > 3 yr.

Appendix

In essence, the question is to estimate the minimum Rayleigh frequency required to sample the white-to-red transition in the noise spectrum of a low-order linear process. The proposed rule-of-thumb provides an answer on the basis of a theoretical autocorrelation function. This is possible because serial dependence and red noise are essentially synonymous, and thus so too are the decay of the autocorrelation function and the red-white spectral transition. An alternative approach is to determine f* directly from the slope break in the log-space representation of the theoretical AR(1) spectrum. Taking the logarithm of Eq. (3) and substituting x = ln(f) gives:

ln(S) = ln[σ_x^2 (1 − α^2)] − ln{π − 2πα cos[2π exp(x)] + πα^2}.   (A.1)

Using d[ln(u)] = u^(−1) du and reversing the substitution gives the local log-space slope of the AR(1) spectrum, i.e., an explicit form for the first derivative of the curves shown in Fig. 1:

d[ln(S)]/d[ln(f)] = 4παf sin(2πf) / [2α cos(2πf) − α^2 − 1].   (A.2)

The slope break then occurs at:

f* = f such that d[ln(S)]/d[ln(f)] = δ_f,   (A.3)

where δ_f is an arbitrarily small number, akin to δ in Eq. (5), and arising for essentially the same reason: d[ln(S)]/d[ln(f)] → 0 only as f → 0. Again, N* = 1/f*, and α would in practice be estimated as the sample lag-1 serial correlation coefficient of the observed dataset. However, the need to set a value for δ_f makes this approach no more or less objective than that proposed earlier. Further, Eqs. (A.2) and (A.3) are more cumbersome to use than the utterly simple Eq. (5).

References

[1] M. Seiffert, A. Mennella, C. Burigana, N. Mandolesi, M. Bersanelli, P. Meinhold, and P. Lubin, Astron. Astrophys. 391, 1185 (2002).
[2] C. Booy and D. R. Morgan, Can. J. Civ. Eng. 21, 150 (1985).
[3] B. Pilgram and D. T. Kaplan, Physica D 114, 108 (1998).
[4] S. W. Fleming, Can. J. Phys. 85, 279 (2007).
[5] H. von Storch and F. W. Zwiers, Statistical Analysis in Climate Research (Cambridge University Press, Cambridge, UK, 1999).
[6] C. Chatfield, The Analysis of Time Series, 5th edn. (Chapman and Hall, London, 1996).
[7] S. W. Fleming, Acta Geophys. 55, 359 (2007).
[8] J. Theiler, Phys. Lett. A 155, 480 (1991).
[9] J. M. Halley, Trends Ecol. Evol. 11, 33 (1996).
[10] I. Procaccia and H. Schuster, Phys. Rev. A 28, 1210 (1983).
[11] E. Milotti, 1/f Noise: A Pedagogical Review (arXiv:physics/0204033v1 [physics.class-ph], 2002).
[12] I. Flinn, Nature 219, 1358 (1968).
[13] R. L. Bras and I. Rodríguez-Iturbe, Random Functions and Hydrology (Dover, New York, 1993).
[14] P. Bak, C. Tang, and K. Wiesenfeld, Phys. Rev. Lett. 59, 381 (1987).
[15] P. Bak, C. Tang, and K. Wiesenfeld, Phys. Rev. A 38, 364 (1988).
[16] D. L. Turcotte, Rep. Prog. Phys. 62, 1377 (1999).
[17] J. W. Kirchner, Phys. Rev. E 71, 066110 (2005).
[18] B. D. Smith, B. R. Ward, and D. W. Welch, Can. J. Fish. Aquat. Sci. 57, 255 (2000).
[19] BC Conservation Foundation, Greater Georgia Basin Steelhead Recovery Plan, Capilano River Watershed (http://www.bccf.com/steelhead/r2-focus3.htm#cap; accessed March 31, 2008).
[20] E. Nisbet, Nature 450, 789 (2007).
[21] L. B. Leopold, A View of the River (Harvard University Press, Cambridge, USA, 1994).
[22] Bonneville Power Administration, 2000 Level Modified Streamflow, 1928–1999 (BPA, Portland, OR, USA, 2004).
[23] R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics: Correlations and Complexity in Finance (Cambridge University Press, Cambridge, UK, 2000).
[24] G. Qiu, D. Kandhai, and P. M. A. Sloot, Phys. Rev. E 75, 046616 (2007).
[25] R. Weron, Measuring Long-Range Dependence in Electricity Prices (arXiv:cond-mat/0103631v1 [cond-mat.stat-mech], 2001).
