APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY
Appl. Stochastic Models Bus. Ind., 2005; 21:215–226
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asmb.536
Intrinsic Kriging and prior information
E. Vazquez¹, E. Walter¹ and G. Fleury²
¹Laboratoire des Signaux et Systèmes, CNRS-Supélec-Université Paris-Sud, 91192 Gif-sur-Yvette, France
²Supélec, Service des Mesures, 91192 Gif-sur-Yvette, France
SUMMARY
Kriging, one of the oldest prediction methods based on reproducing kernels, can be used to build black-box models in engineering. In practice, however, it is often necessary to take into account prior information to obtain satisfactory results. First, the kernel (the covariance) can be used to specify properties of the prediction such as its regularity or the correlation distance. Moreover, intrinsic Kriging (viewed as a semi-parametric formulation of Kriging) can be used with an additional set of factors to take into account a specific type of prior information. We show that it is thus very easy to transform a black-box model into a grey-box one. The prediction error is orthogonal in some sense to the prior information that has been incorporated. An application in flow measurement illustrates the interest of the method. Copyright © 2005 John Wiley & Sons, Ltd.
KEY WORDS:
black-box models; grey-box models; Kriging; regularization; semi-parametric models
1. INTRODUCTION
Black-box models are used in engineering to predict the behaviour of non-linear systems or
processes that are hard to describe based on physical knowledge. They can also be used when
the computational cost of a physical model is too high for the intended use. Building a black-box model can be viewed as an approximation or interpolation problem. The aim is to predict the output $f(x)$ of the system (where $x \in \mathbb{R}^d$ is a vector of inputs, or factors) from a finite set of observation data $\{x_i, f(x_i)\}_{i=1}^{n}$.
Several techniques can be used to approximate or interpolate the data. The framework of
reproducing kernel Hilbert spaces (rkhs) is very useful for this task and is shared by four methods
that have proven to be efficient in black-box modelling, namely radial basis functions networks
(RBF) [1], splines [2], support vector machines (SVM) [3, 4] and Kriging [5, 6] (also known under
the name of Gaussian processes [7]). The rkhs framework is also strongly connected with
regularization theory. However, despite well-known connections between these techniques
*Correspondence to: E. Vazquez, Laboratoire des Signaux et Systèmes, CNRS-Supélec, Université Paris-Sud, 91192 Gif-sur-Yvette, France.
E-mail: vazquez@lss.supelec.fr
E-mail: walter@lss.supelec.fr
E-mail: gilles.fleury@supelec.fr
[2, 8, 9], they are not equivalent in practice. A first recurrent problem is the choice of a suitable
kernel for a given application. We wish to show that the framework of Kriging is well suited to
this task, because of its probabilistic foundations (the kernel is seen as a covariance, which can
be estimated from experimental data), and because of the large body of knowledge accumulated
by the geostatisticians who have created, developed and used Kriging during the last 40 years.
A second very important problem forms the main topic of the present paper. It appears that
totally black-box models are often not good enough in applications, because they fail to take
into account some prior knowledge. We point out a simple yet efficient technique based on a
semi-parametric formulation of Kriging and an extended set of factors. This semi-parametric
formulation is well known in kernel theories but does not seem to have been exploited in
engineering applications. The next section will briefly recall the basics of Kriging and more
specifically intrinsic Kriging.
2. INTRINSIC KRIGING
2.1. Regularization and the probabilistic viewpoint
Assume $F(x)$ is a random process with zero mean, which serves to model an output depending on a vector $x$ of factors. Assume also that $F(x) \in L^2(\Omega, \mathcal{A}, P)$ for all $x$, so that $F(x)$ is a second-order random process. A Kriging estimator is the best linear approximation of $F(x)$ in the space generated by the random variables $F(x_1), \ldots, F(x_n)$. The data $f(x_i)$ are assumed to be sample values of $F(x_i)$, $i = 1, \ldots, n$. More precisely, a linear combination $\hat{F}(x) = \sum_{i=1}^{n} \lambda_{i,x} F(x_i)$ is sought, such that $\|\hat{F}(x) - F(x)\|^2$ is minimized. Here, the norm $\|\cdot\|$ is deduced from the classical scalar product $(X, Y) = \mathrm{E}[XY]$ in the probabilistic space $L^2(\Omega, \mathcal{A}, P)$. We shall see later how this estimator can be obtained, based on orthogonal projection.
The link between Kriging, rkhs and regularization theory is now well established, with the
first studies in the context of splines [9, 10]. Let $\mathcal{F}$ be a Hilbert space of real functions on a bounded domain $\mathcal{X} \subset \mathbb{R}^d$ and $A$ be a continuous linear operator on $\mathcal{F}$. Define an interpolant as the solution of the following variational problem:
$$\min_{L_i f = f(x_i),\ \forall i \in \{1,\ldots,n\}} \|Af\| \qquad (1)$$
where the $L_i$'s are continuous linear functionals, in practice evaluation functionals, and $\|\cdot\|$ is the $L^2(\mathcal{X})$ norm. The solution of this constrained problem is unique [10] if
$$\ker(A) \cap \ker(L_i) = 0, \qquad \forall i \in \{1,\ldots,n\} \qquad (2)$$
(This means that a function $f \in \ker(A)$ must be uniquely determined by the constraints $L_i f = f(x_i)$, and generally $\ker(A)$ is a set of low-degree polynomials.)
Minimizing $\|Af\|$ is equivalent to enforcing some regularity condition on the solutions. The regularization operator $A$ is chosen based on mathematical considerations [11], $\|Af\|$ corresponding in practice to a norm in an rkhs (also called the feature space in support vector regression). In the framework of splines, a norm on the derivatives of $f$ is chosen in order to penalize variations of the interpolant. The rkhs may then be a Sobolev space. Kriging, however, makes it possible to avoid choosing a type of regularization without considering the nature of the process to be modelled. This can be achieved using the existence of a random process $F(x)$, such that the Hilbert space spanned by $F(x)$ is isomorphic to the space of functions
$\mathcal{F}_0 = \{f \in \mathcal{F} \ \text{such that} \ \|Af\|^2 < \infty\} \ominus \ker(A)$ [9]. Hence, there is a one-to-one correspondence between a given rkhs (a feature space) and a probabilistic space. As a consequence, the feature space inherits the characteristics of the probabilistic space, in the sense that the properties of the functions $f \in \mathcal{F}_0$ are described by the model $F(x)$. In the framework of Kriging, the kernel, written as $R(x, y)$, has a probabilistic interpretation as the covariance of $F(x)$ and $F(y)$. The structure of the kernel thus indicates how two values of the output of the process are correlated depending on the values taken by the factors. For instance, radial kernels written as $R(\|x - y\|)$ correspond to assuming the stationarity and isotropy of $F(x)$.
2.2. Relevant kernels for prediction
We shall address this topic briefly, with examples illustrating the influence of the choice of a
covariance on the prediction. Most kernels considered in the literature are isotropic and
invariant by translation. Since a kernel can be identified with a covariance, it is also required that the kernel be a positive function, which means that, for any integer $m$, any set of reals $\{a_1, \ldots, a_m\}$ and any set of vectors $\{x_1, \ldots, x_m\}$,
$$\sum_{i,j=1}^{m} a_i R(x_i, x_j) a_j \geq 0 \qquad (3)$$
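As a side illustration (ours, not the paper's), condition (3) is easy to check numerically on any finite set of points, since it amounts to the Gram matrix of the kernel being positive semi-definite. A minimal sketch in Python, assuming NumPy and with illustrative names:

```python
# Numerical check of condition (3): the Gram matrix built from a candidate
# kernel must be positive semi-definite on any finite set of points.
# Sketch for illustration; the function name and tolerance are ours.
import numpy as np

def looks_positive(kernel, X, tol=1e-10):
    """kernel: callable R(x, y); X: (m, d) array of points x_1, ..., x_m."""
    G = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    return np.linalg.eigvalsh(G).min() >= -tol  # all eigenvalues >= 0 up to tol
```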
Most often, because the actual covariance of the data is unknown, the user falls back on
classical structures of covariance comprising some parameters that can be adjusted in order to
adapt the covariance model to the data generating process. Nevertheless, a model of covariance
must be chosen with care and the kernel must be relevant to describe the data. For instance, the
popular Gaussian covariance is far from being adapted to every type of data: problems may arise, for instance, from overly smooth interpolations, wrong confidence intervals or numerical instabilities. Stein [12] stresses the importance of taking care of the behaviour of the covariance
near the origin, which is linked with the quadratic-mean regularity of the random process. For
instance, if the covariance is continuous at the origin, then the process will be continuous in
quadratic mean. The Matérn covariance makes it possible to adjust regularity with only one parameter. Stein advocates the use of the following parameterization of the Matérn model:
$$R(\|x - y\|) = R(h) = \frac{\sigma^2}{2^{\nu-1}\,\Gamma(\nu)} \left( \frac{2\nu^{1/2} h}{\rho} \right)^{\nu} K_{\nu}\!\left( \frac{2\nu^{1/2} h}{\rho} \right) \qquad (4)$$
where $K_{\nu}$ is the modified Bessel function of the second kind. This parameterization allows an easy interpretation of the parameters, as $\nu$ controls regularity, $\sigma^2$ is the variance ($R(0) = \sigma^2$), and $\rho$ represents the range of the covariance, i.e. the characteristic correlation distance. Figure 1 shows the influence of $\nu$.
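Equation (4) is straightforward to evaluate; the following minimal sketch (ours) assumes SciPy's modified Bessel function `kv`, with illustrative default parameter values:

```python
# Matérn covariance in Stein's parameterization (4): regularity nu,
# variance sigma2 = R(0) and range rho. A sketch assuming SciPy.
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function, second kind

def matern(h, nu=1.0, sigma2=1.0, rho=0.5):
    h = np.asarray(h, dtype=float)
    u = 2.0 * np.sqrt(nu) * h / rho
    with np.errstate(invalid="ignore"):          # u = 0 gives 0 * inf, patched below
        r = sigma2 * u**nu * kv(nu, u) / (2.0**(nu - 1.0) * gamma(nu))
    return np.where(h == 0.0, sigma2, r)         # R(0) = sigma2 by continuity
```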
Under the hypothesis that $F(x)$ is Gaussian, the parameters $\nu$, $\sigma^2$ and $\rho$ can be adjusted to the data by an automatic procedure, such as maximum likelihood, which is a natural route in the probabilistic context of Kriging. The maximum-likelihood estimate of the vector $\theta$ of the parameters of the covariance is obtained by maximizing the log-likelihood function
$$l(\theta) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log \det R(\theta) - \frac{1}{2} \mathbf{f}^{\top} R(\theta)^{-1} \mathbf{f} \qquad (5)$$
where $n$ is the number of observed output data, $R(\theta)$ is the covariance matrix of the vector $[F(x_1), \ldots, F(x_n)]^{\top}$, and $\mathbf{f} = [f(x_1), \ldots, f(x_n)]^{\top}$ is the vector of observed outputs.
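As an illustration (ours, with hypothetical helper names), (5) can be computed and handed to a generic optimizer, here with the `matern` sketch above supplying the covariance:

```python
# Log-likelihood (5) for covariance parameters theta = (nu, sigma2, rho),
# using the matern() sketch above. Maximizing it, e.g. by minimizing its
# negative with scipy.optimize.minimize, yields the ML estimate of theta.
import numpy as np

def log_likelihood(theta, X, f):
    n = len(f)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    R = matern(D, *theta)                                       # covariance matrix R(theta)
    sign, logdet = np.linalg.slogdet(R)                         # stable log det R(theta)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + f @ np.linalg.solve(R, f))
```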
Figure 1. Matérn covariances for the parameterization (4) with $\rho = 0.5$, $\sigma = 1$. The solid line corresponds to $\nu = 0.5$, the dashed line to $\nu = 1$ and the dotted line to $\nu = 3$.
When the output to be modelled is corrupted by noise, this noise is taken into account by
modifying the covariance. The simplest case is when the noise is white; the covariance of the
data then possesses a discontinuity at the origin. More generally, the Kriging methodology for choosing a kernel can usefully be carried over to other rkhs methods such as support vector regression (SVR), as illustrated by Figure 2.
2.3. Why intrinsic Kriging?
The stationarity hypothesis seems inadequate for describing systems that are as simple as linear
systems, for instance. More generally, the outputs of systems found in engineering are often smooth functions, probably well described by piecewise polynomials or exponentials, and spline-based models may then be well adapted.
As stated in References [9, 10], intrinsic Kriging [14] and splines are formally equivalent. This
is one of the reasons why we are using the intrinsic version of Kriging to benefit from its
probabilistic framework. The main idea of intrinsic Kriging is to assume that differences such as
$F(x) - F(y)$ are (weakly) stationary, whereas $F(x)$ could be non-stationary, as for a Brownian motion for instance. The stationarity of differences implies that there exist functions $m(h)$, $h \in \mathbb{R}^d$, and $\gamma(h)$, called the variogram, such that
$$\mathrm{E}[F(x) - F(y)] = m(x - y) \qquad (6)$$
$$\mathrm{var}[F(x) - F(y)] = 2\gamma(x - y) \qquad (7)$$
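In practice, the variogram of (7) can be estimated from the data by the classical method-of-moments estimator, which averages half squared increments over distance bins. The sketch below is ours, with an arbitrary binning convention:

```python
# Empirical estimator of the variogram gamma(h) of (7): average half squared
# increments of the observations over distance bins. Binning is ours; empty
# bins yield nan.
import numpy as np

def empirical_variogram(X, f, bin_edges):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = 0.5 * (f[:, None] - f[None, :]) ** 2
    i, j = np.triu_indices(len(f), k=1)          # each pair of points once
    d, g = D[i, j], G[i, j]
    which = np.digitize(d, bin_edges)
    return np.array([g[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bin_edges))])
```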
Figure 2. Kriging lore applied to SVR in a one-factor case. The left plot shows a Matérn covariance with white noise whose parameters have been estimated by maximum likelihood from the data of the right plot (crosses). The circle at the origin of the covariance materializes the discontinuity due to additive white noise on the output. This kernel has then been used in support vector regression applied to the same data. The resulting prediction is provided on the right plot (thick line), together with the data points and support vectors. The dotted lines indicate the margins of the SVR, chosen according to Vazquez and Walter [13].
It is straightforward to note that these differences will filter out any constants, and in particular an unknown constant mean of $F(x)$. More generally, any finite linear combination of $F(x)$ written as
$$F(\lambda) \triangleq \sum_{i=1}^{n} \lambda_i F(x_i) \quad \text{such that} \quad \sum_{i=1}^{n} \lambda_i = 0 \qquad (8)$$
will also filter out the unknown mean of $F(x)$. Even more generally, a linear combination
$$F(\lambda) \triangleq \sum_{i=1}^{n} \lambda_i F(x_i) \qquad (9)$$
will filter out any monomial $x^l$ (since $x \in \mathbb{R}^d$, $l$ is a multi-index $(l_1, \ldots, l_d)$ and $x^l = x_1^{l_1} x_2^{l_2} \cdots x_d^{l_d}$), provided that $\lambda$ satisfies the constraint
$$\sum_{i=1}^{n} \lambda_i x_i^l = 0 \qquad (10)$$
In (8) and (9), $\lambda$ is a finite-support measure that can be written as $\lambda = \sum_i \lambda_i \delta_{x_i}$, where $\delta_{x_i}$ denotes the Dirac measure at $x_i$. The set of measures $\lambda$ satisfying (10) for all basis functions $x^l$ up to degree $k$ (that is, $l_1 + l_2 + \cdots + l_d \leq k$) will be denoted by $\Lambda_k$. It follows that $\Lambda_{k+1} \subset \Lambda_k$. $F(\lambda)$ will be said to be (weakly) stationary if its second-order moments are invariant by translation, and in particular
$$\mathrm{cov}(F(\lambda), F(\mu)) = \mathrm{cov}(F(\tau_h \lambda), F(\tau_h \mu)) \qquad (11)$$
where $\tau_h$ denotes the translation operator ($\tau_h \lambda = \sum_i \lambda_i \delta_{x_i - h}$). If $F(\lambda)$ is stationary, then $F(x)$ is called an intrinsic random function (IRF).
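Measures in $\Lambda_k$ can be constructed explicitly as null-space vectors of the matrix of monomial values at the points $x_i$; a sketch (the multi-index enumeration and names are ours):

```python
# Constructing measures lambda in Lambda_k: weight vectors whose discrete
# measure annihilates every monomial x^l of total degree <= k, obtained from
# the null space of the monomial design matrix. Illustrative sketch.
import numpy as np
from itertools import product
from scipy.linalg import null_space

def lambda_k_basis(X, k):
    """X: (n, d) array of points x_i; returns basis vectors of Lambda_k as columns."""
    n, d = X.shape
    exps = [e for e in product(range(k + 1), repeat=d) if sum(e) <= k]  # multi-indices l
    O = np.column_stack([np.prod(X ** np.array(e), axis=1) for e in exps])  # (n, p)
    return null_space(O.T)  # columns lam satisfy sum_i lam_i x_i^l = 0 for all |l| <= k
```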
Intrinsic Kriging builds the best linear predictor of $F(x)$ [14], written as
$$\hat{F}(x) = \sum_{i=1}^{n} \lambda_{i,x} F(x_i) \qquad (12)$$
under the constraint that the error of approximation $\hat{F}(x) - F(x) = \sum_{i=1}^{n} \lambda_{i,x} F(x_i) - F(x)$, seen as a linear combination of $F(\cdot)$, satisfies a filtering equation similar to (10) (which was written for a linear combination $\sum_{i=1}^{n} \lambda_i F(x_i)$). That is, $\forall l \leq k$,
$$\sum_{i=1}^{n} \lambda_{i,x} x_i^l - x^l = 0 \qquad (13)$$
Equation (13) expresses the orthogonality of the error $\hat{F}(x) - F(x)$ with respect to the functions $x^l$ [14], which is chosen as an optimality criterion in a linear prediction framework. Owing to this fundamental property, it becomes possible to make the error of approximation orthogonal to any prior information that can be expressed using the basis functions $x^l$, as will be shown in Section 3. Intrinsic Kriging can also be used to deal with a possibly non-stationary random process with an unknown polynomial mean, that is, such that
$$\mathrm{E}[F(x)] = m(x) = \sum_{l} \beta_l x^l \qquad (14)$$
with unknown coefficients $\beta_l$.
2.4. Implementation issues
In the context of intrinsic Kriging, the covariance is no longer necessarily positive. Instead, one looks for a function $R(x, y)$, called a generalized covariance of order $k$, where the order $k$ of $\Lambda_k$ has to be specified. This generalized covariance allows (11) to be evaluated for any $(\lambda, \mu) \in \Lambda_k^2$ as
$$\mathrm{cov}(F(\lambda), F(\mu)) = \sum_{i,j} \lambda_i R(x_i, x_j) \mu_j \qquad (15)$$
Since $F(\lambda)$ is stationary, $R(x, y)$ is actually a function of $h = x - y$. $R(x, y)$ is said to be conditionally positive because it is only required that the variance of linear combinations of the type $F(\lambda)$, $\lambda \in \Lambda_k$, be a positive quantity, i.e.
$$\sum_{i,j} \lambda_i R(x_i, x_j) \lambda_j = \mathrm{var}[F(\lambda)] \geq 0 \qquad (16)$$
For instance, $R(h) = -\gamma(h)$, with $\gamma$ defined by (7), is a generalized covariance of order 0. Condition (16) is weaker than positivity, and therefore a more general class of kernels/covariances can be used besides classical covariances. Polynomial kernels turn out to be a useful class of generalized covariances [14]. Given the order $k$, they can be written as
$$R(x, y) = \sum_{p=0}^{k} (-1)^{p+1} a_p \|h\|^{2p+1}, \qquad h = x - y \qquad (17)$$
This very simple expression is linear in its parameters $a_p$. For example, intrinsic Kriging based on a covariance written as $a_0 \|h\|$ gives a piecewise-linear interpolation. Such polynomial covariances are therefore well adapted to describing data that do not specifically show a stationary behaviour. The regularity of the interpolation can be tuned by the user by adjusting the coefficients $a_p$, but it is generally preferred to rely on automatic procedures such as maximum
likelihood. In the probabilistic context of intrinsic Kriging, restricted maximum likelihood (REML, see Reference [15]) can be used, where maximum likelihood is applied to the differences of the data rather than to the data themselves. A trivial but lengthy computation leads to a more complex log-likelihood cost function to be maximized, which can be written as
$$l(\theta) = -\frac{n-p}{2} \log(2\pi) - \frac{1}{2} \log \det R(\theta) - \frac{1}{2} \log \det \tilde{R}(\theta) - \frac{1}{2} \mathbf{v}^{\top} \big( R(\theta)^{-1} - R(\theta)^{-1} O \tilde{R}(\theta)^{-1} O^{\top} R(\theta)^{-1} \big) \mathbf{v} \qquad (18)$$
This result is stated without proof in Reference [12]. In (18), $n$ is the number of observed output data, $p$ is the number of basis functions $x^l$ for $l \leq k$, $O$ is an $n \times p$ matrix whose $(j, l)$th entry is $x_j^l$, $\tilde{R}(\theta) = O^{\top} R(\theta)^{-1} O$ and the vector $\mathbf{v}$ is obtained after transformation of the data as $\mathbf{v} = (I - O(O^{\top} O)^{-1} O^{\top}) \mathbf{f}$.
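A sketch of (18) (ours), forming $\tilde{R}(\theta)$ and $\mathbf{v}$ as defined above and replacing explicit inverses by linear solves:

```python
# Restricted log-likelihood (18). R: n-by-n generalized-covariance matrix for
# the current theta; O: n-by-p matrix of monomial values x_j^l; f: outputs.
import numpy as np

def reml_log_likelihood(R, O, f):
    n, p = O.shape
    Rinv_O = np.linalg.solve(R, O)
    R_tilde = O.T @ Rinv_O                               # R~ = O' R^-1 O
    v = f - O @ np.linalg.lstsq(O, f, rcond=None)[0]     # v = (I - O (O'O)^-1 O') f
    Rinv_v = np.linalg.solve(R, v)
    w = O.T @ Rinv_v
    quad = v @ Rinv_v - w @ np.linalg.solve(R_tilde, w)  # v'(R^-1 - R^-1 O R~^-1 O' R^-1)v
    return (-(n - p) / 2.0 * np.log(2.0 * np.pi)
            - 0.5 * np.linalg.slogdet(R)[1]
            - 0.5 * np.linalg.slogdet(R_tilde)[1]
            - 0.5 * quad)
```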
Once the covariance has been chosen, $F(x)$ can be estimated based on the observations. The squared norm of the error of approximation
$$\sigma_e^2 = \|\hat{F}(x) - F(x)\|^2 = \Big\| F(x) - \sum_i \lambda_{i,x} F(x_i) \Big\|^2 = R(x, x) - 2\sum_i \lambda_{i,x} R(x, x_i) + \sum_{i,j} \lambda_{i,x} R(x_i, x_j) \lambda_{j,x} \qquad (19)$$
is minimized under the constraint $\delta_x - \sum_i \lambda_{i,x} \delta_{x_i} \in \Lambda_k$, i.e.
$$\sum_i \lambda_{i,x} x_i^l - x^l = 0, \qquad \forall l \leq k \qquad (20)$$
This constrained-minimization problem is dealt with by the Lagrange method. A straightforward computation shows that, for any given value $x$ of the vector of factors, the coefficients $\lambda_{i,x}$ are obtained by solving the linear system
$$\begin{pmatrix} R & O \\ O^{\top} & 0 \end{pmatrix} \begin{pmatrix} \boldsymbol{\lambda}_x \\ \boldsymbol{\mu}_x \end{pmatrix} = \begin{pmatrix} \mathbf{r}_x \\ \mathbf{o}_x \end{pmatrix}$$
where $R$ is the $n \times n$ matrix whose $(i, j)$th entry is $R(x_i, x_j)$, $\mathbf{r}_x \in \mathbb{R}^n$ is the column vector whose $i$th entry is $R(x_i, x)$, $\mathbf{o}_x \in \mathbb{R}^p$ is the column vector whose $l$th entry is $x^l$, $\boldsymbol{\lambda}_x$ collects the Kriging coefficients $\lambda_{i,x}$, and $\boldsymbol{\mu}_x$ is a vector of Lagrange coefficients.
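The whole predictor fits in a few lines. The sketch below (ours) uses the polynomial generalized covariance (17) and, for illustration, a degree-1 monomial basis; in one dimension with `a=[1.0]` it reproduces the piecewise-linear behaviour mentioned above:

```python
# Intrinsic Kriging prediction at x: solve the bordered system for the
# weights lambda_x and the Lagrange multipliers, with the polynomial
# generalized covariance (17). Basis of degree k = 1 chosen for illustration.
import numpy as np

def gencov(h, a):  # (17): sum_p (-1)^(p+1) a_p ||h||^(2p+1)
    return sum((-1.0) ** (p + 1) * ap * h ** (2 * p + 1) for p, ap in enumerate(a))

def monomials(P):  # basis 1, x_1, ..., x_d (degree <= 1)
    return np.hstack([np.ones((len(P), 1)), P])

def ik_predict(X, f, x, a):
    n = len(f)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    O = monomials(X)
    p = O.shape[1]
    K = np.block([[gencov(D, a), O], [O.T, np.zeros((p, p))]])
    rhs = np.concatenate([gencov(np.linalg.norm(X - x, axis=1), a),
                          monomials(x[None, :])[0]])
    lam = np.linalg.solve(K, rhs)[:n]        # Kriging weights lambda_{i,x}
    return lam @ f
```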
3. SEMI-PARAMETRIC FORMULATION AND PRIOR KNOWLEDGE
As (14) indicates, an IRF $F(x)$ can be written as the sum of a zero-mean random process $W(x)$ and polynomial parametric terms corresponding to its mean:
$$F(x) = W(x) + \sum_{l} \beta_l x^l \qquad (21)$$
It was shown in Reference [14] that exponential terms could also be considered but we shall limit
ourselves here to polynomial terms. Intrinsic Kriging is thus a semi-parametric method. This
formal decomposition is well known in the theory of splines or radial basis functions [1, 2], and
Copyright # 2005 John Wiley & Sons, Ltd.
Appl. Stochastic Models Bus. Ind., 2005; 21:215–226
222
E. VAZQUEZ, E. WALTER AND G. FLEURY
has also been used in SVM [16]. The intrinsic Kriging theory [14] states that the functions in the
parametric terms must be chosen linearly independent and that the space spanned by these
functions must be stable under translations, which limits these functions to be polynomials or
real/complex exponentials. (Using exponentials in the parametric terms is unusual and this is
why we only consider polynomial terms $x^l$.) In the literature (see, for instance, Reference [17]), it
has often been argued that these parametric terms offer little advantage in practice, and it has
been shown in Reference [14] that an IRF is generally locally stationary (on a bounded domain, an IRF can be identified with a stationary random function), which suggests that there is in principle nothing to be gained by using IRFs.
On the contrary, we found that introducing parametric terms was indeed of interest to take relevant prior information into account by using an extended set of factors. The intrinsic
Kriging framework is obtained when considering a subset $\Lambda_{N^{\perp}}$ of finite-support measures $\lambda$ orthogonal to a set $N$ of functions. This set can be written as
$$\Lambda_{N^{\perp}} = \Big\{ \lambda \ \text{such that} \ \sum_{i=1}^{m} \lambda_i f(x_i) = 0, \ \forall f \in N \Big\}$$
For instance, $\Lambda_k$ is orthogonal to $N = \{x^l,\ l \leq k\}$. From the equivalence between Kriging and
regularization methods, one can show that the parametric terms in (21) are not regularized
because the set $N$ of functions corresponds to $\ker(A)$. This is an interesting way of taking prior
information into account, as one may wish it not to be regularized. Additionally, if prior
information is included in the form of parametric terms, it is orthogonal to the prediction
error.
However, using polynomial parametric terms can be quite restrictive and this is why we shall
distinguish two types of factors. The factors of the first type are the natural model inputs. The
factors of the second type are introduced to take advantage of additional prior information.
They are assumed to be deterministic functions of the factors of the first type, which formally
circumvents the restriction of using polynomial terms only. The resulting method is also known
as Kriging with external drift in geostatistics.
For example, assume that a phenomenon involving spatial factors is to be predicted over a
given region at a given date. If the phenomenon is two dimensional, there would be typically two
factors describing the position x and the output would be predicted based on a finite number of
observations f ðxi Þ of the phenomenon. Kriging would yield an interpolation of the data if the
measurement errors were neglected. If a third factor $z$ associated with the position $x$ turns out to be correlated with the output of the phenomenon, a parametric term such as $\beta z$ can be included in the model, with the regression coefficient $\beta$ to be estimated by intrinsic Kriging; $z$ can be any relevant quantity, the value of which can be evaluated for all $x$. Note that the covariance kernel
remains unchanged and still depends on x only. Including this parametric term into the Kriging
estimator is direct and leads to computing the coefficients $\lambda_{i,x}$ by solving the slightly modified linear system
$$\begin{pmatrix} R & O & \mathbf{z} \\ O^{\top} & 0 & 0 \\ \mathbf{z}^{\top} & 0 & 0 \end{pmatrix} \begin{pmatrix} \boldsymbol{\lambda}_x \\ \boldsymbol{\mu}_x \\ \mu_x^z \end{pmatrix} = \begin{pmatrix} \mathbf{r}_x \\ \mathbf{o}_x \\ z_x \end{pmatrix}$$
where $\mathbf{z}$ is the column vector whose $i$th entry is $z(x_i)$, that is, the value of $z$ at $x_i$, $z_x = z(x)$, and $\mu_x^z$ is the additional Lagrange coefficient.
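In code, the modified system simply borders the earlier sketch with one extra column; treating $\mathbf{z}$ as an additional column of $O$ is algebraically equivalent. The drift function `z_fun` below is an assumed callable, not from the paper:

```python
# Kriging with external drift: append the drift column z = (z(x_1),...,z(x_n))'
# to the parametric block of the bordered system. Reuses gencov() and
# monomials() from the earlier sketch; z_fun is an assumed callable.
import numpy as np

def ik_predict_drift(X, f, x, a, z_fun):
    n = len(f)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    O = np.hstack([monomials(X), z_fun(X)[:, None]])   # drift as an extra column
    p = O.shape[1]
    K = np.block([[gencov(D, a), O], [O.T, np.zeros((p, p))]])
    rhs = np.concatenate([gencov(np.linalg.norm(X - x, axis=1), a),
                          monomials(x[None, :])[0],
                          np.atleast_1d(z_fun(x[None, :]))])
    lam = np.linalg.solve(K, rhs)[:n]
    return lam @ f
```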
Two types of application directly benefit from semi-parametric methods. The first one, illustrated in Section 4, is when a totally black-box model is not accurate enough. Often, though,
the user has some information on the process to be modelled based either on physical
considerations or on past experimentation. This prior knowledge can then be taken into account
via an extended set of factors. As a second type of application, suppose that a phenomenon can
be accurately modelled but that the simulation of this model is so complex and time consuming
that only a few points are accessible in practice. It then becomes interesting to predict the result
of this simulation at any other point by a method such as Kriging. However, due to the very few
data available, not much can generally be expected from a mere interpolation. Suppose, however, that a simpler model, fast but less reliable, is available. The natural idea is then to plug this model into the Kriging procedure, as allowed by intrinsic Kriging.
There is a limit, however, to the number of factors that can be added, which comes from the
fact that the corresponding parametric terms are not regularized, with the attendant risk of an ill-posed problem and of overfitting the data.
4. APPLICATION
We have studied a device for the measurement of flow in water pipes. The flow can be obtained
from the integration of the water speed profile, which can be very perturbed, as shown in Figure 4. The water speed profile is reconstructed from observations of the speed at given points of a cross-section (see Figure 3). Unfortunately, the number of observations is limited and, moreover,
the speed near the inner surface of the pipe is not available.
Figure 3. Cross-section of the pipe. Dots indicate the locations of all speeds computed during the simulation, whereas triangles represent the observations (raw data) in the measurement device. There are only 16 observations.
A simple interpolation neglects the friction phenomenon that slows down water near the inner
surface of the cylinder. This is an obvious consequence of the fact that black-box models do not
incorporate any physical knowledge. The model of the water flow can therefore be refined if
prior information is added. Speed profiles in simple cases are easy to obtain. It is then natural to
incorporate this knowledge in the model as parametric terms, i.e. additional factors. In Figure 5,
we compare the result of a reconstruction of the speed profile with and without such a prior. In
the former case, only one speed profile, corresponding to that in a straight pipe, has been used as
an additional factor. Experiments have shown that this simple procedure drastically improved
the performance of the resulting software sensor. With two additional factors, the results are
even better. However, it is not advisable to add as many factors as there are possible speed
profile configurations, because of overfitting.
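To fix ideas, the straight-pipe prior could enter the sketch of Section 3 as the drift factor. The profile below is an idealized Poiseuille shape, purely illustrative and not the reference profile actually used in the experiments:

```python
# Hypothetical drift factor for the flow sensor: an idealized straight-pipe
# (Poiseuille) speed profile evaluated at cross-section positions P.
# Illustrative only; not the profile used in the paper.
import numpy as np

def straight_pipe_profile(P, v_max=1.0):
    r = np.linalg.norm(P, axis=1)                     # radial position, unit radius
    return v_max * np.clip(1.0 - r ** 2, 0.0, None)   # zero speed at the wall

# speed_hat = ik_predict_drift(X_obs, v_obs, x_new, a=[1.0],
#                              z_fun=straight_pipe_profile)
```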
This example is typical of black-box modelling problems where prior knowledge is available
in the form of a prior output. It can be transposed to other situations in engineering, and two
cases can be distinguished regarding the selection of prior outputs. The first one is when prior
outputs correspond to observations of the real system. This approach was followed for the
measurement of flows, where the speed profile in a straight pipe at a given flow rate has been
used as prior. The alternative approach is to design basic first-order physical models of the
studied phenomenon. In the case of flow measurement, it would have been possible to obtain the
Figure 4. Different speed profiles: (a) in a straight pipe; (b) after a change of the pipe diameter;
(c) and (d) in angled pipes.
Copyright # 2005 John Wiley & Sons, Ltd.
Appl. Stochastic Models Bus. Ind., 2005; 21:215–226
INTRINSIC KRIGING
225
Figure 5. (a) True speed profile; (b) simple black-box reconstruction; (c) corresponding prediction error; (d) interpolation with prior; (e) much improved corresponding prediction error.
speed profile in a straight pipe from fluid mechanics. Carefully selecting priors leads to
satisfactory improvements of black-box models, which thus become grey-box models.
REFERENCES
1. Schaback R. Native Hilbert spaces for radial basis functions I. International Series of Numerical Mathematics 1999;
132:255–282.
2. Wahba G. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics,
vol. 59. SIAM: Philadelphia, PA, 1990.
3. Schölkopf B, Smola A. Learning with Kernels. MIT Press: Cambridge, MA, 2002.
4. Vapnik VN. The Nature of Statistical Learning Theory. Springer: Heidelberg, 1995.
5. Chilès J-P, Delfiner P. Geostatistics: Modeling Spatial Uncertainty. Wiley Series in Probability and Statistics. Wiley-Interscience: New York, 1999.
6. Matheron G. Principles of geostatistics. Economic Geology 1963; 58:1246–1266.
Copyright # 2005 John Wiley & Sons, Ltd.
Appl. Stochastic Models Bus. Ind., 2005; 21:215–226
226
E. VAZQUEZ, E. WALTER AND G. FLEURY
7. Williams CKI. Regression with Gaussian processes. In Mathematics of Neural Networks: Models, Algorithms and
Applications, Ellacott SW, Mason JC, Anderson IJ (eds). Kluwer: Dordrecht, 1997. Presented at the Mathematics of
Neural Networks and Applications Conference, Oxford, 1995.
8. Wahba G. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. In Advances in Kernel Methods: Support Vector Learning, Schölkopf B, Burges CJC, Smola AJ (eds), Chapter 6. MIT Press: Boston, 1998; 69–87.
9. Wahba G, Kimeldorf GS. Spline functions and stochastic processes. Sankhyā: The Indian Journal of Statistics, Series A 1970; 32(2):173–180.
10. Matheron G. Splines and Kriging: their formal equivalence. In Down-to-Earth Statistics: Solutions Looking for
Geological Problems, Merriam DF (ed.). Syracuse University of Geology Contributions Edition, Academic Press:
New York, 1981; 8:77–95.
11. Kybic J, Blu T, Unser M. Generalized sampling: a variational approach. Part I: Theory. IEEE Transactions on Signal Processing 2002; 50(8):1965–1976.
12. Stein ML. Interpolation of Spatial Data: Some Theory for Kriging. Springer: New York, 1999.
13. Vazquez E, Walter E. Multi-output support vector regression. In Proceedings of the 13th IFAC Symposium on System Identification, SYSID-2003, Rotterdam, August 2003; 1820–1825.
14. Matheron G. The intrinsic random functions, and their applications. Advances in Applied Probability 1973; 5:
439–468.
15. Kitanidis PK. Statistical estimation of polynomial generalized covariance functions and hydrologic applications.
Water Resources Research 1983; 19:909–921.
16. Smola A, Friess T, Schölkopf B. Semiparametric support vector and linear programming machines. In Advances in Neural Information Processing Systems, vol. 11, Kearns M, Solla S, Cohn D (eds). MIT Press: Cambridge, MA, 1999; 585–591.
17. Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Statistical Science 1989;
4(4):409–435.