Ann. Phys. (Berlin) 17, No. 12, 955 – 969 (2008) / DOI 10.1002/andp.200810329
Approximate record length constraints for experimental identification of dynamical fractals
Sean W. Fleming¹,²,*
¹ BC Hydro, Generation Resource Management, E15, 6911 Southpoint Drive, Burnaby, BC V3N 4X8 Canada
² Department of Geography, University of British Columbia, 1984 West Mall, Vancouver, BC V6T 1Z2 Canada
Received 21 May 2008, accepted 22 October 2008 by U. Eckern
Published online 24 November 2008
Key words Nonlinear dynamics, statistical mechanics, power spectrum, time series analysis,
scale invariance.
PACS 05.40.-a, 05.45.Tp, 05.45.Df, 87.23.Cc, 92.40.Zg, 89.65.Gh
The ambiguity that can exist, for short datasets, between the observational power spectra of dynamical
fractals and low-order linear memory processes is demonstrated and explained. It is argued that it could be
broadly useful to have a highly practical rule-of-thumb for assessing whether a data record is sufficiently
long to permit distinguishing the two types of processes, and if it is not, to produce an approximate estimate
of the amount of additional data that would be required to do so. Such an expression is developed using the
AR(1) process as a loose benchmark. Various aspects of the technique are successfully tested using synthetic
time series generated by a range of prescribed models, and its application and relevance to observational
datasets are then demonstrated using examples from mathematical ecology (wild steelhead population size),
geophysics (river flow volume), and econophysics (stock price volatility).
© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
1 Introduction
A time series, x(t), t = 1, …, N, is commonly deemed a dynamical fractal if its spectral behaviour is described by S(f) ∝ 1/f^β, i.e., if its power spectrum scales linearly in log space with slope −β. This scaling exponent is related to fractal dimension as D = (5 − β)/2, and in turn to the Hurst exponent as H = 2 − D. The
intuitive meaning of such behaviour is a temporal equivalent to spatial fractals: small, frequent variations
are superimposed on infinitesimally larger and less frequent variations, which are in turn superimposed
upon still bigger and slower variations, and so forth. The result is self-affinity (or under certain conditions,
self-similarity), which can loosely be taken to mean that if one magnified the dataset to obtain a fine-scale
view, or took a step back for a broader-scale view, those data sections would have much the same overall
look and feel. A core implication¹ is that one of the most basic concepts in the quantitative analysis, design,
and management of both natural and human dynamical systems – a characteristic time scale – can have little meaning for fractal processes. Some additional key consequences can include very long-range memory,
very large potential fluctuation magnitudes, and temporal clustering of extreme events. Such scale-free dynamics have been encountered in fields of study as diverse as environmental contaminant transport, stock
price fluctuations, meteorological variability, solar and black hole physics, ecosystem dynamics, electronic component design, automobile traffic flow, and human physiology, to name just a few. Some of these applications may be esoteric, but others are quite practical. The latter range from general insights into such broadly relevant phenomena as crashes in both biological populations and stock markets, to specific design criteria for noise suppression algorithms for PLANCK satellite radiometer measurements of the cosmic microwave background [1] and flood frequency analysis in support of civil works planning in the Red River valley [2].

* E-mail: fleming sean@hotmail.com, sean.fleming@bchydro.com
¹ With respect to the noise signature itself. Observational spectra for complex open systems commonly reflect the linear or nonlinear combination of several or many processes; some of these will have well-defined timescales. Superposition of an annual cycle and fractal noise would be an obvious and common example.
Various methods are available for identifying fractal dynamics in observational time series. The most
common approach – likely because of the relative familiarity and computational accessibility of the mathematical and software tools involved (facilitating analysis), and because it is phrased clearly and explicitly in
terms of the fundamental definition of fractal dynamics (facilitating interpretation) – is to empirically evaluate a double-log plot of the numerical spectrum for linear behaviour. Many variations on this theme exist,
but typically the fast Fourier transform (FFT) is used to obtain X(f) from observed x(t), and some measure of the power spectral density or periodogram is calculated from X(f), such as S(f) = (N/2)|X(f)|².
Then, a least-squares regression fit, ln(S) = b − β ln(f ), usually with the associated significance test on
the estimated slope value, is performed to assess linearity. Attention is restricted here to this very common,
and generally well-performing [3], technique. Given the reasonably fundamental nature of the issues raised
below, however, it seems reasonable to at least conjecture that other techniques (detrended fluctuation analysis, rescaled range (R/S, Hurst) analysis, temporal box-counting, maximum likelihood estimation, etc.)
may be affected by similar or analogous considerations.
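As a concrete illustration of this workflow (and not the implementation used for the analyses reported below, which were performed with a Matlab script), a minimal Python/NumPy sketch of the periodogram-plus-regression step might look as follows. The one-sided periodogram normalization and the function name are illustrative choices; note that any fixed normalization constant changes only the regression intercept, not the fitted slope β.

```python
import numpy as np
from scipy import stats

def loglog_spectral_slope(x):
    """Fit ln(S) = b - beta*ln(f) to the periodogram of x; return (beta, p_value)."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()          # standardize to zero mean and unit variance
    N = len(x)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(N, d=1.0)         # 0, 1/N (Rayleigh), ..., 0.5 (Nyquist)
    S = (2.0 / N) * np.abs(X) ** 2        # one common one-sided periodogram convention
    keep = f > 0                          # drop the zero-frequency (mean) term
    fit = stats.linregress(np.log(f[keep]), np.log(S[keep]))
    return -fit.slope, fit.pvalue         # beta is the negative of the fitted slope
```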
Specifically, for short time series records, there can be ambiguities between the log-space power spectral behaviour of dynamical fractals versus simpler, linear autocorrelated processes. Following the lead
of others, I use here the linear first-order autoregressive, or AR(1), process (see, e.g., [4–6]; summary
provided below) as a widely familiar, meaningful, and relevant benchmark, and present numerical experiments that demonstrate the nature of this interpretational ambiguity. The results echo findings presented
much earlier in the literature, but which do not appear to be broadly and consistently recognized in practical
applications. I then introduce a straightforward analytical expression which may serve as a widely useful
rule-of-thumb for constraining such ambiguity in practice. The utility of this simple calculation is subsequently demonstrated through numerical experiments on synthetic fractal and low-order linear time series,
and by application to three observational records, drawn from population biology, hydroclimatology, and
quantitative finance. The emphasis here is on the development of readily applied knowledge for the analyst
with a pragmatic interest in quickly and conveniently assessing observational time series – in particular but
not exclusively, short and noisy records generated by complex, open systems – for fractal dynamics.
2 Experimental ambiguities between AR(1) and fractal spectra
The first-order linear autoregressive, or AR(1), process is defined as:
x_t = α x_{t−1} + ε_t   (1)
where 0 < α < 1 is the lag-1 serial correlation coefficient, and ε is white (serially uncorrelated, but not
necessarily Gaussian) driving noise. Note that α ultimately amounts to a damping parameter, controlling
how quickly the system recovers from random external shocks (also called interventions). The AR(1)
model is a Markov process which can arise from time discretization of linear ordinary differential equations
of the form:
a₁ dx(t)/dt + a₀ x(t) = z(t)   (2)
where x is the state variable; z is a stochastic forcing function, equal to ε(a₀ + a₁); and a₁ and a₀ are coefficients related to the AR(1) parameter as α = a₁/(a₀ + a₁) [4–6]. Some form of Eq. (2) appears
in many branches of theoretical and applied physics, ranging from a type of Langevin equation in statistical mechanics [6] to the linear reservoir equation for short-term river flow variations in geophysics
and engineering [4], and thus so too does the physical basis for AR(1) processes. By the same token, the
AR(1) process is a basic building block of most statistical time series models, including the ubiquitous
Box-Jenkins approach, and is employed throughout the physical, biological, and social sciences due to its
simplicity, interpretability, and effectiveness. Additionally, and perhaps of particular note for our purposes
here, the AR(1) model is frequently used as a straightforward and tractable measure or parameterization
of linear dynamical memory during the course of, and to facilitate, analyses of other kinds of behaviour
(see [7] and citations therein).
The theoretical spectrum of an AR(1) process is [6]:
S(f) = σ_ε² / [π (1 − 2α cos(2πf) + α²)]   (3)
where σ_ε² = σ_x²(1 − α²); σ_ε² is the variance of the forcing noise, and σ_x² is the variance of the AR(1) time
series. Double logarithmic plots of Eq. (3) for three values of α are given in Fig. 1. The log-space spectrum
contains two near-linear regions, one at high frequencies with a negative slope and one at low frequencies
with zero slope. Nevertheless, nowhere is the log-space spectrum strictly linear, and the aforementioned
slope break is obvious. In principle, then, there should be no ambiguity between the spectra of AR(1) and
fractal time series. In practice, matters can be less clear.
Fig. 1 Theoretical spectra of AR(1) processes from Eq. (3), using three different serial correlation coefficients and σ_ε = 1. Triangles denote values of f* provided by the approximate expression proposed here.
Spectra obtained using the FFT for individual realizations of synthetic AR(1) processes with the same
three values of α, generated using Eq. (1) with ε drawn from the standard normal distribution, are illustrated
in Fig. 2. This far more closely approximates the real experimental situation of analyzing an observed
time series. As for all numerical spectrum estimates presented in this article, the time series were each
standardized to zero mean and unit variance prior to analysis. For each model, three dataset lengths were
considered. For N = 100, the spectra are indistinguishable from those of 1/f β processes. Best-fit linear
regression lines to the observational spectra are also illustrated, with the corresponding estimated values
of β. In each N = 100 case, a test under the standard assumptions suggests a statistically significant
(P < 0.05) linear relationship between ln(f ) and ln(S). At a record length 10 times greater, the ambiguity
begins to resolve itself, but it is not until roughly N ∼ 10,000 that the spectrum is revealed with complete
clarity to be non-fractal.
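For readers wishing to reproduce a Fig. 2-style experiment, a short sketch of the synthetic-data step is given below. It assumes Gaussian forcing as in the text; the burn-in length is an arbitrary choice for discarding start-up transients, and the theoretical curve of Eq. (3) can be overlaid on the periodogram for comparison.

```python
import numpy as np

def simulate_ar1(alpha, N, burn_in=1000, rng=None):
    """One realization of x_t = alpha*x_{t-1} + eps_t (Eq. 1) with Gaussian forcing."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(N + burn_in)
    x = np.zeros(N + burn_in)
    for t in range(1, N + burn_in):
        x[t] = alpha * x[t - 1] + eps[t]
    return x[burn_in:]                     # discard spin-up transient

def ar1_theoretical_spectrum(f, alpha, sigma_eps=1.0):
    """Theoretical AR(1) spectrum of Eq. (3)."""
    return sigma_eps**2 / (np.pi * (1.0 - 2.0 * alpha * np.cos(2.0 * np.pi * f) + alpha**2))
```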
The reasons for this ambiguity seem straightforward. The frequency range sampled by spectral analysis
of a finite-length observational record (taking the sampling interval Δt = 1 for notational convenience
throughout) extends from the Nyquist frequency, fNyq = 0.5, down to the Rayleigh frequency, fRay =
1/N . Thus, the lowest (non-zero) frequency sampled is simply the inverse of the dataset length, and for
short records, that frequency is not sufficiently low to capture the slope break so obvious in Fig. 1. Further,
the presence of substantial random variability in the observational spectrum of an individual realization
is such that departures from log-space linearity over the high-frequency, negative-slope portion of the theoretical AR(1) spectrum are for all practical purposes undetectable.

Fig. 2 Numerical spectrum (periodogram) estimates for nine synthetic AR(1) time series, and corresponding log-space linear regressions (estimated slope is −β; P < 0.05 indicates a significant relationship). Linear regression is performed over all f_Ray ≤ f ≤ f_Nyq for N = 100, but only (for comparison purposes) over the pseudo-fractal high-frequency region for N = 1000 and 10 000.
What to do about this problem, however, is not quite as straightforward, particularly if the dataset is
not amenable to significant extension for purposes of analysis. That some minimum time series length is
required to differentiate the spectra of 1/f β and simple linear autocorrelated processes was recognized in
the theoretical literature some time ago [8], but a survey of applications papers seems to suggest that this
fact is not often accommodated. A couple of techniques might be employed to address the issue but appear
to have some practical limitations. One potential criterion is whether the sample autocorrelation function
for the finite-length observed time series has decayed to zero by some lag N [8]. If the empirical result is
that it has not decayed to zero, however, the method provides little useful information. We would not know
how much more data would be required to make the distinction between the spectra of a fractal vs. simple
linear process. Further, the criterion means little if the process is actually fractal (and we do not know
whether it is fractal a priori, or there would be no need for analysis), as in principle the autocorrelation
function will in this case never decline to zero; we would always be led to believe that more data are
required to make the distinction [8], which is clearly not true. Another practical problem is that it can be
difficult to robustly determine whether the sample autocorrelogram of a short, noisy time series has indeed
decayed to zero or not. An alternative approach could be to go a step further and conduct a statistical
hypothesis test to directly determine whether the observed log-space linear spectrum is attributable to a
simple, low-order, linear memory process. However, it is unclear how well this would work when the time
series length is too short to sample the slope break in an AR(1) spectrum. Additionally, such tests can
be easy to do badly but challenging or time-consuming to do well², and cutting corners conceptually or
procedurally in their implementation or interpretation can sometimes be problematic given the perception
of rigour, precision, and objectivity often associated with their results.
It is expressly not my intent to discourage use of any of the foregoing techniques. Rather, I make the
more modest argument that it could be widely useful – particularly in the context of analyzing short, noisy
records generated by complex, open systems – to have a simple and convenient rule of thumb available
for determining: (i) whether N is sufficiently large, and fRay is sufficiently low, to distinguish between
fractal spectra and those outwardly showing 1/f β scaling but generated by a simpler, linear process; and
(ii) if N is too low for this distinction to be made, a ballpark estimate of how much more data would be
needed to do so. Such an expression is suggested in the next section, nominally using the AR(1) process
as a benchmark. As a primary goal in its design, the method is carefully constructed so as to be very
easy to use. Additionally, it does not involve formal hypothesis tests and the complications associated with
them (see footnote 2). The price paid for these attributes is that the outcome does not constitute, in the
conventional and strict sense, a statistically rigorous or definitive result. The technique might be employed
by physics and applied physics practitioners who require only a quick, approximate answer; or similarly,
as a rapid screening method prior to investing the additional time and resources required for implementing
more sophisticated tests.
Before proceeding, a slightly more philosophical point is addressed regarding the suitability and desirability of comparing fractal and low-order linear processes. Many fractal observational spectra show 1/f^β scaling down only to some lower cutoff frequency, at which the spectrum becomes white; such frequency cutoffs to fractal S(f) are additionally predicted from or explained by theoretical studies [8, 10]; and in principle, 1/f^β scaling ∀f cannot exist (the so-called ultraviolet and infrared catastrophes³ [8, 11]).
Comparing these observations against Figs. 1 and 2, it might be argued that the differences between fractal
and (for instance) AR(1) spectra are not so important, or that a low-order linear spectrum is just as “fractal”
as many other things we normally call fractal. However, the interpretations (in terms of both underlying
physical process and broader practical implications) are fundamentally different between a low-order linear process showing de facto 1/f β scaling over a narrow high-frequency bandwidth, and a fractal process
showing scale invariance over many (or effectively all) octaves. Further, one might wonder whether some of the experimental studies (particularly those involving analysis of complex open systems with short observational records) nominally identifying a dynamical fractal with a low-frequency cutoff, might reflect misinterpretation of a simpler spectrum, particularly given that the low-order linear alternative often remains unmentioned and untested. Conversely, comparisons between more sophisticated traditional time series models and the fractal hypothesis, though no doubt important in some contexts, seem generally less meaningful and relevant. For instance, both ARIMA(1,0,1) models, and ffGn generators consisting of weighted sums of AR(1) processes, can be used to approximate a fractal time series [13], so the differences are minimal by construction. Overall, then, developing and applying a back-of-the-envelope calculation to support differentiation between fractal and low-order linear memory processes seems worthwhile and prudent.

² Under the complications encountered in practice. One example is non-Gaussian distributions, which in general require data transformations, non-parametric or rank-based tests, or Monte Carlo resampling (randomization or bootstrap) techniques. Another issue is the manner in which hypothesis tests are logically configured, including in particular the question of identifying the most meaningful hypotheses to test under a particular set of experimental circumstances. Conventional formal statistical hypothesis testing forces a strictly binary test between specific and pre-selected null (H0) and alternative (H1) hypotheses; further, the a priori choice of which of the two competing models is assigned null or alternative status largely determines which must bear the burden of proof (the onus is on the alternative hypothesis to demonstrate its superiority). Thus, “I don’t know which model is best – the data are insufficient to tell” is not an explicitly permitted answer, yet it would seem the appropriate answer when records are too short to distinguish the spectra of 1/f^β vs. linear low-order processes. Rather, this “insufficient-data” response is typically but implicitly embedded in a failure to reject whatever one happened to choose as the null hypothesis. This issue might be problematic in fractal analyses. Classically, the null hypothesis consists of a simple or no-effect model, which in our case might be a low-order linear memory process, such as the AR(1) model; but in ecology (for instance) it has been suggested that temporal scale invariance is so common it should be viewed as the routine null model [9]. Similar considerations likely apply to experimental time series from many other fields of study.
³ These limits are in fact extremely liberal, with timescales ranging essentially from the cosmological (e.g., age of the universe) to the quantum (e.g., time required to cross the Compton wavelength for an electron at the speed of light), and thus likely have little practical relevance to observational time series analysis [11, 12].
3 An approximate rule for minimum record length
Red noise⁴ corresponds by definition to autocorrelation, and white noise to serial independence. Hence, a 1/f^β process is autocorrelated over a very wide range of (strictly, all) timescales, whereas the white-to-red transition in the AR(1) spectrum (the slope break in Fig. 1) corresponds to autocorrelation only over small lags. The proposed technique involves using the autocorrelation function of an AR(1) process to develop a simple analytical expression for estimating the dataset length required to yield a sufficiently broad frequency range for sampling this spectral transition.
The theoretical autocorrelation function of the AR(1) process is [6]:
ρ(τ) = α^τ   (4)
where τ is the lag. Although ρ(τ) → 0 only as τ → ∞, it typically decays to a negligible value, δ, fairly rapidly. Define⁵ τ* = τ | ρ(τ) = δ, or from Eq. (4):
τ* = ln(δ) / ln(α).   (5)
Then f* = 1/τ* is the frequency at which the AR(1) spectrum begins to transition from white noise to an inverse frequency-scaled spectrum. Clearly, f* is the minimum Rayleigh frequency required to sample that slope break, and its inverse (τ*) is the corresponding minimum record length, N*.
To produce a usable rule-of-thumb, a specific value for δ must be selected. There is no obvious single right number. Also, in practice α must be estimated from the observational dataset (as the sample correlation coefficient between the time series and a lagged copy of itself), and will therefore be subject to some random sampling variability. Thus, it seems appropriate to set δ conservatively, i.e., to yield an upper-end τ* (lower-end f*) estimate. Experimentation suggested δ = 10⁻³⁰ is a reasonable and robust value in practice. Corresponding theoretical values of f* for α = 0.4, 0.7, and 0.9 are illustrated on Fig. 1 for reference.
⁴ By way of nomenclatural caveats: (i) a wide range of noise colour definitions have been suggested by various authors in different branches of the scientific literature. Here, I use a simple and parsimonious analogy to the visible light spectrum. Specifically, “red noise” is used to generically denote a preponderance of signal power at low f and a dearth at high f, and of course “white noise” to indicate S ≠ g(f), i.e., a flat spectrum in the mean; (ii) one person’s noise is another person’s signal. Strictly speaking, then, appropriateness of the term “noise” is context-dependent. Here, I follow convention and use the term broadly; (iii) although 1/f^β components are often referred to as noises, it should be noted that the genesis of fractal processes may span fundamentally stochastic behaviour; effectively stochastic behaviour arising from deterministic systems with high degrees of freedom; or fully deterministic behaviour arising from low-DOF but nonlinear processes.
⁵ This can be viewed as amounting to a measure of the “decorrelation time” (or sometimes “correlation time”). Another commonly invoked linear memory metric is the e-folding time of the autocorrelation function; however, experimentation suggested that this produces τ estimates far too small (f too high) for reliable practical discrimination between the observational spectra of fractal and low-order linear processes.
The rule is applied as follows. If observed S(f) ∝ 1/f^β and N ≲ N* (or equivalently, f_Ray ≳ f*), then in general insufficient data are available to safely determine whether the generating dynamics are fractal or low-order linear. Further, at an absolute minimum (in fact, more than; see immediately below) N* − N additional measurements would in general be required to capture the diagnostic white-to-red spectral transition of a simple autocorrelated linear process. Alternatively, if observed S(f) ∝ 1/f^β and N ≫ N* (or f_Ray ≪ f*), then the process can (perhaps preliminarily but reasonably) be identified as fractal.
Note that for clear empirical identification of the slope break in an observed and therefore noisy AR(1) spectrum, it is obviously inadequate to have S(f) estimates only over f ≥ f*. Rather, N must be sufficiently large that S(f) extends for some distance on either side of f*, extensively sampling both the white and red noise domains, hence the N ≫ N* and f_Ray ≪ f* constraints above.
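In code, the entire rule-of-thumb amounts to a few lines. The sketch below assumes α is estimated as the sample correlation between the series and its one-step-lagged copy, as described above, with δ = 10⁻³⁰; the function name is illustrative, and the expression is only meaningful for 0 < α̂ < 1.

```python
import numpy as np

def minimum_record_length(x, delta=1e-30):
    """Rule-of-thumb of Eq. (5): returns (alpha_hat, N_star, f_star)."""
    x = np.asarray(x, dtype=float)
    alpha_hat = np.corrcoef(x[:-1], x[1:])[0, 1]   # sample lag-1 serial correlation
    n_star = np.log(delta) / np.log(alpha_hat)     # only meaningful for 0 < alpha_hat < 1
    return alpha_hat, n_star, 1.0 / n_star
```

If the observed spectrum appears to scale as 1/f^β but N is not well in excess of the returned N*, the record should be regarded as too short to rule out a low-order linear origin.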
An alternative method was additionally developed on the basis of Eq. (3). Given that an obvious criterion
for a rule-of-thumb is maximum operational simplicity, this second approach is felt to be inferior to that
outlined above. However, brief documentation seems appropriate (see Appendix).
4 Applications
4.1 Controlled tests (synthetic data)
Application of the foregoing rule-of-thumb is first explored using numerical experiments. The first two
cases are straightforward; the second two numerical examples consider somewhat more complicated circumstances. For conciseness and consistency, in each case the same two record lengths (short, N = 100;
long, N = 10, 000) are considered.
The first synthetic example (top row, Fig. 3) consists of an AR(1) process with α = 0.7 and ε drawn from the exponential (Boltzmann) distribution with λ = 0.5 (μ = 1/λ = 2, σ² = μ² = 4). For N = 100, S(f) appears to scale as 1/f^β. However, for the observed sample autocorrelation coefficient of 0.698, (N* = 192) > (N = 100); or equivalently, (f* = 4.99 × 10⁻³) < (f_Ray = 1 × 10⁻²). Thus, had this time series and its spectrum been encountered observationally, we would know from our simple calculation that the dataset length is insufficient to determine whether the process is fractal, and that more than N* − N additional observations would have been required to do so. The observational spectrum for the long record confirms of course (as N ≫ N*) that the process is non-fractal; note also that estimated f* corresponds
reasonably well to the initiation of the white-to-red spectral transition with increasing f , as expected.
Random sampling variability between the two time series accounts for the slight difference in estimated α
(and therefore N ∗ and f ∗ ) between the N = 100 and N = 10, 000 datasets. Overall, the rule-of-thumb
performs exactly as desired.
The second example is a dynamical fractal consisting of the (fully deterministic) first-order nonlinear iterative map suggested by Procaccia and Schuster [10, 11]:
x_t = x_{t−1} + x_{t−1}²  mod(1).   (6)
Equation (6) gives 1/f^β spectra with β = 1, i.e., classical “flicker noise.” As in the first example, at N = 100 scale invariance is observed, but the rule-of-thumb indicates the record length is too short for reliably discriminating between a genuinely fractal vs. low-order linear generating process (second row, Fig. 3). At least as importantly, however, at N = 10,000 1/f^β scaling is again apparent and (N* = 385) ≪ (N = 10⁴), or equivalently (f* = 2.5 × 10⁻³) ≫ (f_Ray = 10⁻⁴). Thus, had this latter time series and its spectrum been encountered experimentally, and one did not know (as would often be the case in experimental practice) its fundamental character and origins a priori, application of the proposed simple construction would have correctly informed us that the dataset is indeed sufficiently long to identify the process as fractal. Again, the suggested record length constraint performs as planned.
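A sketch of the map of Eq. (6) is given below for completeness; the initial condition is an arbitrary illustrative seed (any generic value in (0, 1) should do), and no claim is made that it matches the exact settings used for the figure.

```python
import numpy as np

def procaccia_schuster(N, x0=0.37):
    """Iterate x_t = x_{t-1} + x_{t-1}**2 (mod 1), Eq. (6), which yields roughly 1/f spectra."""
    out = np.empty(N)
    x = x0
    for t in range(N):
        x = (x + x * x) % 1.0
        out[t] = x
    return out
```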
Fig. 3 Numerical tests for the rule-of-thumb. Grey dashed line illustrates ln(f*).

The third example considers a slightly more complicated scenario. Although cellular automata were devised and refined by von Neumann and others, the subsequent Bak-Tang-Wiesenfeld (sandpile) CA [14, 15] played an important role specifically in the field of fractal dynamics, by helping to phrase the
question in terms of self-organized criticality and emergent behaviour. A BTW cellular automaton was constructed⁶ following Turcotte [16], including use of the average number of particles within the grid at the end of each time step as the output time series. Application of the rule-of-thumb (third row, Fig. 3) yields anomalous results, with very large N* and very low f*. This is easily explained but illustrates an important point. As in other studies (see [16] for a review), this CA was found to give 1/f^β noise with β = 2, which is consistent with a random walk:
x_t = x_{t−1} + ε_t.   (7)
Equation (7) is in turn simply Eq. (1) with α = 1 (see footnote 7). Thus, in a highly restricted but important sense, there is no ambiguity between an AR(1) process and a random walk, and there is neither need nor applicability for our rule-of-thumb when β = 2. The range of application is therefore evidently limited to 0 < β < 2. Nevertheless, many observed and interesting 1/f^β spectra fall within this range, including the quintessential case of 1/f flicker noise discussed above⁸.
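A compact sketch of a BTW-type sandpile in the spirit of Turcotte's formulation is given below, with the average grid occupancy recorded at the end of each step as the output series. The small default grid and the toppling threshold of four grains are illustrative simplifications; as noted in footnote 6, avoiding finite-size effects at N = 10⁴ in the actual experiment required a 400 × 400 grid and a long burn-in.

```python
import numpy as np

def btw_sandpile_series(N, L=50, rng=None):
    """Average grid occupancy at the end of each step for a simple BTW sandpile."""
    rng = np.random.default_rng() if rng is None else rng
    grid = np.zeros((L, L), dtype=int)
    series = np.empty(N)
    for t in range(N):
        i, j = rng.integers(0, L, size=2)   # drop one grain at a random site
        grid[i, j] += 1
        # topple until stable: any site holding >= 4 grains sheds one grain to each
        # neighbour; grains toppled over the grid edge are lost (dissipative boundary)
        while True:
            unstable = np.argwhere(grid >= 4)
            if unstable.size == 0:
                break
            for ii, jj in unstable:
                grid[ii, jj] -= 4
                if ii > 0:
                    grid[ii - 1, jj] += 1
                if ii < L - 1:
                    grid[ii + 1, jj] += 1
                if jj > 0:
                    grid[ii, jj - 1] += 1
                if jj < L - 1:
                    grid[ii, jj + 1] += 1
        series[t] = grid.mean()             # output series: mean particles per cell
    return series
```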
Our overall approach is to develop a very easily applied scoping calculation, by assuming that an AR(1)
model can be used as a benchmark to approximately but adequately represent low-order linear memory
processes in general. The AR(1) process was specifically selected because it is generally meaningful and
relevant, and has previously been used as a general linear memory parameterization (Sect. 2); and because
it readily yields the desired straightforward construction (Sect. 3). In our fourth example, we assess how
sensitive the results are to that assumption by considering a time series generated by an autoregressive–moving average (ARMA) process having second-order autoregressive, and first-order moving average, components:
x_t = α₁ x_{t−1} + α₂ x_{t−2} + θ₁ ε_{t−1} + ε_t   (8)
and then applying the rule-of-thumb, again pretending that we do not know the actual nature of the dataset.
Driving noise was drawn from the standard Lorentzian (Cauchy) distribution, i.e., with location and shape
parameters of null and unity respectively. A few ARMA(2,1) model parameter values were explored. The
results illustrated in Fig. 3 (bottom row) seem generally representative, and use α1 = 0.5, α2 = 0.2,
and θ1 = 0.3. The performance of the rule-of-thumb is essentially identical to that for an AR(1) process
(Fig. 3, top row). Presumably, a parameterization for an ARMA(2,1) (or another, non-AR(1), low-order
linear memory process) could be contrived such that our approximate calculation might fail. However, the
results provide some reassurance that minor to moderate violations of its underlying assumptions do not
automatically incur disaster.
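A sketch of the ARMA(2,1) generator of Eq. (8), with standard Cauchy forcing and the parameter values quoted above, might read as follows; the burn-in length is an arbitrary choice for discarding start-up transients.

```python
import numpy as np

def simulate_arma21(N, a1=0.5, a2=0.2, th1=0.3, burn_in=1000, rng=None):
    """ARMA(2,1) of Eq. (8), driven by standard Cauchy (Lorentzian) noise."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_cauchy(N + burn_in)
    x = np.zeros(N + burn_in)
    for t in range(2, N + burn_in):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + th1 * eps[t - 1] + eps[t]
    return x[burn_in:]                     # discard spin-up transient
```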
4.2 Uncontrolled tests (observational data)
Small N is frequently encountered in the passive observation and subsequent analysis of the net dynamics generated by large-scale open systems consisting of multiple, often nonlinearly interacting processes. There are two main reasons for this. First, and most fundamentally, we must make do with whatever data happen to have been collected to date, and this is often not much. Second, it is commonly useful or necessary to manipulate the data to isolate a characteristic or phenomenon of interest, but in such a way (e.g., by aggregation and/or downsampling) that the record length available is further reduced (though not, of course, the actual timeframe spanned by it); or alternatively, the measured quantity only has meaning at a low sampling frequency.

⁶ Avoiding finite-size effects over the full frequency range effectively spanned by an N = 10⁴ time series required a large grid; 400 × 400 proved sufficient. Note that this in turn seemed to require a long burn-in time before a representative “observed” time series could be extracted.
⁷ For which τ* is undefined. In practice, the sample autocorrelation coefficients were very slightly less than unity, giving very large but non-infinite τ*.
⁸ This limitation may further diminish in importance if one chooses not to view the random walk as a fractal, its 1/f^β spectrum notwithstanding. It is perhaps a matter of preference whether fractional and classical Brownian motions are regarded as distinct process classes, or instead the former as a general theory with the latter as a special case. There appears to be precedent in the literature for both interpretations.

River flow data clearly exemplify some of these issues. Streamflow
records are rarely more than a century in duration, and typically very much less, particularly outside of
Europe. Further, streamflow data are usually available as a daily time series, but for design purposes an
engineer will often perform flood frequency analysis using a yearly maximum daily discharge record,
Q_j^(max) = {sup(Q_{j,i=1,365}), j = 1, N}, where Q_{j,i} is measured streamflow on the ith day of the jth year, and N is the number of complete years of record and thus the length of the final Q^(max) time series; whereas for assessing fish habitat availability changes, a biologist will often consider a record of annual minimum 7-day mean flows, Q_j^(7) = {inf(Q_{j,i=1,365}^(MA)), j = 1, N}, with Q_{j,i}^(MA) = (1/7) Σ_{k=i−3}^{i+3} Q_{j,k}. These
two metrics not only present different practical implications, but also speak to different sets of physical conditions and processes, long-term (interannual or interdecadal) variabilities in which are often difficult to
study by considering the raw continuous daily flow time series. The peak flows captured by Q(max) reflect
a wide variety of runoff-generating mechanisms, whereas the low flows captured by Q(7) will typically
be supported solely by watershed storage depletion; day-to-day terrestrial hydrologic variations are often
driven by different dynamics than year-to-year variations; and the annual flood and drought will in most
regions occur in different seasons, which has consequences for climate variability and change impact analysis. El Niño-Southern Oscillation (ENSO), for instance, is phase-locked to the annual cycle with northern
hemisphere teleconnections generally being strongest in winter, coinciding with the freshet peak (and not
late-summer low flows) in rainfall-driven rivers. Other important and widely used water quantity and quality metrics most (or indeed only) meaningful on a yearly sampling frequency, though typically derived
from high-frequency data, include freshet timing, total annual volume, and the calendar date on which a
certain number of degree-days (corresponding to salmon alevin emergence, for instance) has been seasonally accumulated. The list goes on, and similar lists can be drawn for other fields of study, particularly but
by no means exclusively in the geophysical and ecological sciences. The net effect of limited historical
records and de facto downsampling to a derived quantity is often a time series length about on par with
the smaller values of N ∗ seen in the numerical examples above. A third general cause of (effectively)
short records is the use of data windowing in the analysis process, to track longer-term variations in the
dynamical characteristics of the dataset.
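As an illustration of this kind of de facto downsampling, the two flow metrics defined above can be computed from a daily record in a few lines; the sketch below assumes, purely for simplicity, that the record has already been trimmed to complete 365-day years.

```python
import numpy as np

def annual_flow_metrics(q_daily):
    """Q^(max) (annual maximum daily flow) and Q^(7) (annual minimum 7-day mean flow)."""
    # assumes the daily record has been trimmed to complete 365-day years
    q = np.asarray(q_daily, dtype=float).reshape(-1, 365)
    q_max = q.max(axis=1)
    kernel = np.ones(7) / 7.0                       # centred 7-day moving average
    q7 = np.array([np.convolve(year, kernel, mode="valid").min() for year in q])
    return q_max, q7
```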
Three observational examples of the foregoing general types are considered. It is to be understood that
the purpose is solely to illustrate practical application of the proposed approximate expression for minimum record length, and although a few words of context are provided in each case, no attempt whatsoever
is made to thoroughly study these datasets or the dynamical systems they capture. Time series were linearly detrended and standardized to zero mean and unit variance prior to spectral analysis. Because we are
not strongly concerned in these examples with the precise value of the estimated fractal dimension, and
also because it is unclear whether all of the particular observational time series considered are defined for
timescales shorter than the sampling frequency, correction for the β underestimation due to aliasing [17]
that can occur under certain circumstances is not undertaken here (though the general value of such corrections is in no way questioned). Application of our back-of-the-envelope calculation involves only obtaining
an estimate for the lag-1 linear serial correlation coefficient (readily accomplished in a spreadsheet application, for instance), and plugging this value into Eq. (5); the task can be completed in such a manual fashion
in under a minute. The spectral analyses presented here were performed using a Matlab script written for
the purpose, and within which the N ∗ estimation procedure was readily automated. This convenience is a
key purpose and advantage of the technique.
The steelhead is a variety of rainbow trout (Oncorhynchus mykiss) in Pacific coastal North America, including British Columbia, Canada. Though not formally considered a distinct subspecies, steelhead show
a number of significant differences, most notably in terms of life history: like Pacific salmon, steelhead
are anadromous, splitting their lives between fresh and salt water environments. Population size variations
arise from geophysical, ecological, and anthropogenic sources, such as the impacts of coherent modes of
atmosphere-ocean circulation (climate) variability upon both the fresh water and marine phases of steelhead life history, internal ecological system dynamics, habitat degradation or restoration, and accidental
by-catch in the commercial salmon fishery. Through parts of its range, steelhead populations have collapsed. As noted earlier, fractal scaling is commonplace in ecological time series. Its general implications
for population variability and crashes thus seem potentially relevant to British Columbia steelhead stock
fluctuations, but the latter question does not appear to have been broached. For obvious reasons, fish in
general (and perhaps anadromous and migratory species like salmon and steelhead in particular) are not
easy to track and count. Catch data are therefore often used as an approximate but useful metric of relative variations in population over time. For both fundamental and practical reasons,⁹ data are normally
collected, presented, and/or only meaningful as an annual time series. With no commercial harvest, steelhead catches occur in the (limited and heavily regulated) freshwater recreational fishery and are monitored
primarily using surveys. Here I consider 1968–2002 wild (i.e., excluding hatchery) steelhead annual catch
per angler-day¹⁰ [18], as inferred from data made openly available online for the Capilano River [19], a
south coastal British Columbia watershed under significant anthropogenic environmental pressures. Results (Fig. 4, top row) indicate 1/f β spectral behaviour, but N ∗ = 86 whereas N = 35. Thus, the apparent
power-law scaling over the sampled bandwidth is consistent with both fractal dynamics and a low-order
linear memory process, the dataset length being insufficient to discriminate between the two potential
source types. As the criterion for record length sufficiency is N ≫ N*, the present exploratory results suggest that we could need well over a century of steelhead population data to reliably make that distinction.
Clearly, the example is provided only for methodological demonstration purposes, and more work would
be required before reaching any firm conclusions; nevertheless, the result may serve as a poignant reminder
of the central importance of long-term monitoring to the science and successful management of complex
environmental systems [20].
Total annual flow volume is a commonly used measure of the overall water resource productivity of a
river basin. Loosely speaking, rivers show variability at daily, seasonal, and interannual timescales, each
with somewhat different (though often overlapping) controls. Time-integration of high-frequency streamflow data over the course of a year is such that this metric speaks directly to total water supply availability,
irrespective of the detailed dynamics of individual meteorological events, terrestrial hydrologic responses
to those events, or seasonal timing variations. If the catchment is pristine, or if the data have been suitably
processed to account for time-variation in watershed-specific anthropogenic flow alterations, then annual
flow volumes in general reflect the net water supply effects of longer-term climatic variability and change.
River flows have long been the subject of fractal analyses, including the early work of Hurst and Mandelbrot. Here, we consider annual flow volumes of the Columbia River as measured at The Dalles, Oregon.
The Columbia is listed as the world’s 18th largest river by flow [21], and its flow variability has strong
implications for water supply, fisheries, flooding, and hydropower generation through large areas of southwest Canada and the northwest contiguous US. The river is also the subject of international treaty by virtue
of its transboundary geography. Data were obtained from a Bonneville Power Administration report [22]
which corrected 1929–1998 historical observations to conditions in 2000, dynamically compensating for
such anthropogenic effects as water withdrawals, land use change, and damming. The results (Fig. 4, second row) indicate 1/f β spectral behaviour. Application of the proposed calculation for minimum record
length suggests that this dataset lies in somewhat of a grey area, with N > N* but not N ≫ N*. Such
are the limitations of a simple rule-of-thumb, its advantages notwithstanding. Nevertheless, on the basis
of these results and the large body of prior work providing widespread evidence for fractal dynamics in
river flows, it seems reasonable to conclude that there is some tentative observational support for scale invariance in the total annual flow volume time series for the Columbia River.

Fig. 4 Observational examples. Notation follows that of Figs. 2 and 3.

⁹ For steelhead and salmon, these might include a number of related issues: the discrete-time nature of biological generations; phase-locking of multi-year anadromous salmonid life cycles to the seasonal cycle, and in particular the approximately fixed seasonal timing of the return to fresh water to spawn; the generally seasonal nature of the harvest and therefore of catch data, perhaps especially for recreational freshwater steelhead angling; and the resources involved in generating each set of official statistics.
¹⁰ Generally similar results were obtained considering time series of wild steelhead catch numbers, unadjusted for angler effort.
Financial time series have similarly long been a focus for fractal studies. A sufficiently large body of
work has been accumulated to form some firm generalizations regarding scale invariance in stock prices
and price volatilities [23]. Specifically, daily prices, p, can be approximated as a geometric random walk, ln(p_t) = ln(p_{t−1}) + ε_t; thus, the power spectrum of ln(p) scales as 1/f^β with β = 2, and the return series¹¹, r_t = ln(p_t) − ln(p_{t−1}), is a white noise process. This default model additionally holds that the volatility series, σ_t = |r_t|, is generally fractal with a value of β which varies between datasets. However,
many questions remain. For example, the usefulness of the foregoing random walk view notwithstanding
– it underlies the Black-Scholes options pricing model, for instance – both the model and the efficient
market hypothesis to which it is related are known to be significantly incomplete descriptions of stock
market dynamics. Additionally, the fractal characteristics of commodity markets have been less exhaustively studied, and work on electricity prices in deregulated markets, for instance, suggests that they may
follow a different model [25]. Another question might be whether, for a given stock just introduced to the
market, its volatility is fractal right out of the starting gate, or whether some spin-up is required. This latter question
is preliminarily explored here, in an interesting toy problem intended solely as a final demonstration of
the proposed rule-of-thumb. Details of the particular company considered are not strongly relevant for the
purposes of this article; it was selected simply because the author happened across free, publicly available,
open-access information necessary to explore the question posed (specifically, the date of its initial public
offering, and daily closing prices extending back to that IPO). These price data were processed to form a
daily volatility series according to the above definition. Spectral analysis of the first 90 days¹² of volatility data suggests 1/f^β scaling, but the rule-of-thumb indicates that a much longer time series is required in this case to distinguish between fractal vs. low-order linear memory processes (Fig. 4, third row). Considering instead the first 365 days¹³ of volatility data, N ≫ N* and we can clearly identify fractal dynamics
(Fig. 4, bottom row). Overall, the proposed rule-of-thumb seems useful for approximately delineating the
shortest window appropriate for distinguishing dynamical fractals: in this case, the first quarter appears too
brief a period of time to resolve true 1/f β behaviour in the volatility of this stock, but the scale invariance
generally typical of stock price volatility seems unambiguously evident by the end of the first full year of
trading.
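In outline, the price-to-volatility preprocessing and the subsequent screening step used in this example reduce to something like the following sketch. The 90- and 365-day windows are the illustrative choices discussed above, closing_prices is a hypothetical input array, and minimum_record_length refers to the rule-of-thumb sketch given in Sect. 3.

```python
import numpy as np

def volatility_series(prices):
    """sigma_t = |r_t| with r_t = ln(p_t) - ln(p_{t-1}), from daily closing prices."""
    log_p = np.log(np.asarray(prices, dtype=float))
    return np.abs(np.diff(log_p))

# Example screening of the first quarter and first year of trading:
# sigma = volatility_series(closing_prices)
# for n in (90, 365):
#     alpha_hat, n_star, f_star = minimum_record_length(sigma[:n])
#     # the window is adequate only if n is well in excess of n_star (N >> N*)
```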
5 Conclusions
An analytical expression and decision rule is suggested for assessing whether an observational record is
sufficiently long to discriminate between fractal scaling and low-order linear memory on the basis of its
numerical power spectrum. The technique is crafted to maximize simplicity of operational implementation.
It is anticipated that its primary use is in circumstances where an approximate estimate will suffice, or as
a pre-screening technique prior to investing further time and resources in more detailed analyses. Tests on
synthetic datasets demonstrate that the technique functions as intended; and further, that reasonable violations of the assumptions employed to achieve a simple configuration do not appear to severely diminish
performance. Observational examples are presented which help demonstrate the range of circumstances in
which the rule-of-thumb might prove useful, and illustrate its application.
¹¹ This return definition is standard in econophysics, and is effectively the same as that more broadly employed in finance and business if day-to-day price changes are small relative to the price itself. This criterion is satisfied for many financial time series. The volatility definition used here is also standard in econophysics, and is equivalent to that more broadly employed in finance (the standard deviation of returns over some finite moving window) if the window length is taken to be unity [24].
¹² Unlike the foregoing observational examples, where N was fixed by data availability, the precise choice of N here is somewhat arbitrary. A variety of values were experimented with and gave mutually consistent results, but for presentation purposes, 90 and 365 days were selected as they intuitively reflect a fiscal quarter and year, respectively. Of course, the correspondence is far from exact, because the initial public offering did not occur at the start of a fiscal year, and because time is defined here [22] in terms of consecutive trading (rather than calendar) days.
¹³ Similar results were obtained using all the available daily volatility data, spanning > 3 yr.
Appendix
In essence, the question is to estimate the minimum Rayleigh frequency required to sample the white-to-red transition in the noise spectrum of a low-order linear process. The proposed rule-of-thumb provides an
answer on the basis of a theoretical autocorrelation function. This is possible because serial dependence
and red noise are essentially synonymous, and thus so too are the decay of the autocorrelation function
and the red-white spectral transition. An alternative approach is to determine f ∗ directly from the slope
break in the log-space representation of the theoretical AR(1) spectrum. Taking the logarithm of Eq. (3)
and substituting x = ln(f ) gives:
ln(S) = ln[σ_x²(1 − α²)] − ln{π − 2πα cos[2π exp(x)] + πα²}.   (A.1)
Using d[ln(u)] = u⁻¹ du and reversing the substitution gives the local log-space slope of the AR(1) spectrum, i.e., an explicit form for the first derivative of the curves shown in Fig. 1:
d[ln(S)] / d[ln(f)] = 4παf sin(2πf) / [2α cos(2πf) − α² − 1]   (A.2)
The slope break then occurs at:
f* = f | d[ln(S)]/d[ln(f)] = δ_f   (A.3)
where δ_f is an arbitrarily small number, akin to δ in Eq. (5), and arising for essentially the same reason: d[ln(S)]/d[ln(f)] → 0 only as f → 0. Again, N* = 1/f*, and α would in practice be estimated as the sample lag-1 serial correlation coefficient of the observed dataset. However, the need to set a value for δ_f makes this approach no more or less objective than that proposed earlier. Further, Eqs. (A.2) and (A.3) are more cumbersome to use than the utterly simple Eq. (5).
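For completeness, this alternative criterion can also be evaluated numerically in a few lines. The sketch below simply scans a logarithmic frequency grid for the lowest f at which the magnitude of the slope in Eq. (A.2) reaches the chosen δ_f; the default threshold and grid bounds are illustrative only.

```python
import numpy as np

def f_star_from_slope(alpha, delta_f=1e-3, f_grid=None):
    """Slope-break frequency via Eq. (A.2)/(A.3): lowest f where |d lnS / d lnf| reaches delta_f."""
    if f_grid is None:
        f_grid = np.logspace(-8, np.log10(0.5), 2000)   # scan from very low f up to f_Nyq
    slope = 4 * np.pi * alpha * f_grid * np.sin(2 * np.pi * f_grid) / (
        2 * alpha * np.cos(2 * np.pi * f_grid) - alpha**2 - 1
    )
    hit = np.abs(slope) >= delta_f
    return f_grid[np.argmax(hit)] if hit.any() else None  # None if threshold never reached
```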
References
[1] M. Seiffert, A. Mennella, C. Burigana, N. Mandolesi, M. Bersanelli, P. Meinhold, and P. Lubin, Astron. Astrophys. 391, 1185 (2002).
[2] C. Booy and D. R. Morgan, Can. J. Civ. Eng. 21, 150 (1985).
[3] B. Pilgram and D. T. Kaplan, Physica D 114, 108 (1998).
[4] S. W. Fleming, Can. J. Phys. 85, 279 (2007).
[5] H. von Storch and F. W. Zwiers, Statistical Analysis in Climate Research (Cambridge University Press, Cambridge UK, 1999).
[6] C. Chatfield, The Analysis of Time Series 5th edn. (Chapman and Hall, London, 1996).
[7] S. W. Fleming, Acta Geophys. 55, 359 (2007).
[8] J. Theiler, Phys. Lett. A 155, 480 (1991).
[9] J. M. Halley, Trends Ecol. Evol. 11, 33 (1996).
[10] I. Procaccia and H. Schuster, Phys. Rev. A 28, 1210 (1983).
[11] E. Milotti, 1/f Noise: A Pedagogical Review (arXiv:physics/0204033v1 [physics.class-ph], 2002).
[12] I. Flinn, Nature 219, 1358 (1968).
[13] R. L. Bras and I. Rodríguez-Iturbe, Random Functions and Hydrology (Dover, New York, 1993).
[14] P. Bak, C. Tang, and K. Wiesenfeld, Phys. Rev. Lett. 59, 381 (1987).
[15] P. Bak, C. Tang, and K. Wiesenfeld, Phys. Rev. A 38, 364 (1988).
[16] D. L. Turcotte, Rep. Prog. Phys. 62, 1377 (1999).
[17] J. W. Kirchner, Phys. Rev. E 71, 066110-1 (2005).
[18] B. D. Smith, B. R. Ward, and D. W. Welch, Can. J. Fish. Aquat. Sci. 57, 255 (2000).
[19] BC Conservation Foundation, Greater Georgia Basin Steelhead Recovery Plan, Capilano River Watershed
(http://www.bccf.com/steelhead/r2-focus3.htm#cap; accessed March 31, 2008).
[20] E. Nisbet, Nature 450, 789 (2007).
[21] L. B. Leopold, A View of the River (Harvard University Press, Cambridge USA, 1994).
[22] Bonneville Power Administration, 2000 Level Modified Streamflow, 1928–1999 (BPA, Portland, OR, USA,
2004).
[23] R. N. Mantegna and H. E. Stanley, An Introduction to Econophysics: Correlations and Complexity in Finance
(Cambridge University Press, Cambridge UK, 2000).
[24] G. Qiu, D. Kandhai, and P. M. A. Sloot, Phys. Rev. E 75, 046616 (2007).
[25] R. Weron, Measuring Long-Range Dependence in Electricity Prices (arXiv:cond-mat/0103631v1
[cond-mat.stat-mech], 2001).