APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY
Appl. Stochastic Models Bus. Ind., 2005; 21:215–226
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asmb.536

Intrinsic Kriging and prior information

E. Vazquez (1,*), E. Walter (1) and G. Fleury (2)
(1) Laboratoire des Signaux et Systèmes, CNRS-Supélec-Université Paris-Sud, 91192 Gif-sur-Yvette, France
(2) Supélec, Service des Mesures, 91192 Gif-sur-Yvette, France

SUMMARY

Kriging, one of the oldest prediction methods based on reproducing kernels, can be used to build black-box models in engineering. In practice, however, it is often necessary to take prior information into account to obtain satisfactory results. First, the kernel (the covariance) can be used to specify properties of the prediction, such as its regularity or the correlation distance. Moreover, intrinsic Kriging (viewed as a semi-parametric formulation of Kriging) can be used with an additional set of factors to take a specific type of prior information into account. We show that it is thus very easy to transform a black-box model into a grey-box one. The prediction error is orthogonal, in a sense made precise below, to the prior information that has been incorporated. An application in flow measurement illustrates the interest of the method. Copyright © 2005 John Wiley & Sons, Ltd.

KEY WORDS: black-box models; grey-box models; Kriging; regularization; semi-parametric models

1. INTRODUCTION

Black-box models are used in engineering to predict the behaviour of non-linear systems or processes that are hard to describe based on physical knowledge. They can also be used when the computational cost of a physical model is too high for the intended use. Building a black-box model can be viewed as an approximation or interpolation problem.
The aim is to predict the output $f(x)$ of the system (where $x \in \mathbb{R}^d$ is a vector of inputs, or factors) from a finite set of observation data $\{x_i, f(x_i)\}_{i=1}^{n}$. Several techniques can be used to approximate or interpolate the data. The framework of reproducing kernel Hilbert spaces (rkhs) is very useful for this task and is shared by four methods that have proven efficient in black-box modelling, namely radial basis function networks (RBF) [1], splines [2], support vector machines (SVM) [3, 4] and Kriging [5, 6] (also known under the name of Gaussian processes [7]). The rkhs framework is also strongly connected with regularization theory. However, despite well-known connections between these techniques [2, 8, 9], they are not equivalent in practice. A first recurrent problem is the choice of a suitable kernel for a given application. We wish to show that the framework of Kriging is well suited to this task, because of its probabilistic foundations (the kernel is seen as a covariance, which can be estimated from experimental data), and because of the large body of knowledge accumulated by the geostatisticians who have created, developed and used Kriging during the last 40 years.

A second, very important problem forms the main topic of the present paper. Totally black-box models are often not good enough in applications, because they fail to take some prior knowledge into account. We point out a simple yet efficient technique based on a semi-parametric formulation of Kriging and an extended set of factors. This semi-parametric formulation is well known in kernel theories but does not seem to have been exploited in engineering applications.

*Correspondence to: E. Vazquez, Laboratoire des Signaux et Systèmes, CNRS-Supélec, Université Paris-Sud, 91192 Gif-sur-Yvette, France. E-mail: vazquez@lss.supelec.fr; walter@lss.supelec.fr; gilles.fleury@supelec.fr
The next section briefly recalls the basics of Kriging and, more specifically, of intrinsic Kriging.

2. INTRINSIC KRIGING

2.1. Regularization and the probabilistic viewpoint

Assume $F(x)$ is a random process with zero mean, which serves to model an output depending on a vector $x$ of factors. Assume also that $F(x) \in L^2(\Omega, \mathcal{A}, P)$ for all $x$, so $F(x)$ is a second-order random process. A Kriging estimator is the best linear approximation of $F(x)$ in the space generated by the random variables $F(x_1), \dots, F(x_n)$. The data $f(x_i)$ are assumed to be sample values of $F(x_i)$, $i = 1, \dots, n$. More precisely, a linear combination $\hat{F}(x) = \sum_{i=1}^{n} \lambda_{i,x} F(x_i)$ is sought, such that $\|\hat{F}(x) - F(x)\|^2$ is minimized. Here, the norm $\|\cdot\|$ is deduced from the classical scalar product $(X, Y) = \mathrm{E}[XY]$ in the probabilistic space $L^2(\Omega, \mathcal{A}, P)$. We shall see later how this estimator can be obtained, based on orthogonal projection.

The link between Kriging, rkhs and regularization theory is now well established, the first studies being in the context of splines [9, 10]. Let $\mathcal{F}$ be a Hilbert space of real functions on a bounded domain $\mathcal{X} \subset \mathbb{R}^d$ and let $A$ be a continuous linear operator on $\mathcal{F}$. Define an interpolant as the solution of the following variational problem:

$$ \min_{L_i f = f(x_i),\ \forall i \in \{1,\dots,n\}} \|Af\| \qquad (1) $$

where the $L_i$'s are continuous linear functionals, in practice evaluation functionals, and $\|\cdot\|$ is the $L^2(\mathcal{X})$ norm. The solution of this constrained problem is unique [10] if

$$ \ker(A) \cap \ker(L_i) = \{0\}, \quad \forall i \in \{1,\dots,n\} \qquad (2) $$

(This means that a function $f \in \ker(A)$ must be uniquely determined by the constraints $L_i f = f(x_i)$; generally, $\ker(A)$ is a set of low-degree polynomials.) Minimizing $\|Af\|$ is equivalent to enforcing some regularity condition on the solutions. The regularization operator $A$ is chosen based on mathematical considerations [11], $\|Af\|$ corresponding in practice to a norm in an rkhs (also called the feature space in support vector regression).
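As a numerical aside, the best linear predictor described above can be sketched for a zero-mean process with a known covariance: the Kriging weights solve $\mathbf{R}\boldsymbol{\lambda} = \mathbf{r}_x$. The following sketch is not part of the paper; the exponential covariance and the data are illustrative assumptions only.

```python
import numpy as np

# Illustrative exponential covariance for one-dimensional factors;
# an assumption, not the covariance advocated in the paper.
def kernel(a, b):
    return np.exp(-np.abs(a - b))

def simple_kriging(x_obs, f_obs, x_new):
    """Best linear predictor sum_i lambda_i f(x_i), with lambda = R^{-1} r_x."""
    R = kernel(x_obs[:, None], x_obs[None, :])  # covariance of the observations
    r = kernel(x_obs, x_new)                    # covariance with the new point
    lam = np.linalg.solve(R, r)                 # Kriging weights
    return lam @ f_obs

x_obs = np.array([0.0, 1.0, 2.5])
f_obs = np.array([0.3, -0.1, 0.7])
# The predictor interpolates: at an observed point it returns the observation.
print(simple_kriging(x_obs, f_obs, 1.0))  # -0.1 up to rounding
```

At an observed point, $\mathbf{r}_x$ is a column of $\mathbf{R}$, so the weight vector is a canonical basis vector and the prediction reproduces the datum exactly, which illustrates the projection viewpoint.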
In the framework of splines, a norm on the derivatives of $f$ is chosen in order to penalize variations of the interpolant. The rkhs may then be a Sobolev space. However, Kriging makes it possible to avoid choosing a type of regularization without considering the nature of the process to be modelled. This can be achieved using the existence of a random process $F(x)$ such that the Hilbert space spanned by $F(x)$ is isomorphic to the space of functions $\mathcal{F}_0 = \{f \in \mathcal{F} : \|Af\|^2 < \infty\} \setminus \ker(A)$ [9]. Hence, there is a one-to-one correspondence between a given rkhs (a feature space) and a probabilistic space. As a consequence, the feature space inherits the characteristics of the probabilistic space, in the sense that the properties of the functions $f \in \mathcal{F}_0$ are described by the model $F(x)$. In the framework of Kriging, the kernel, written $R(x, y)$, has a probabilistic interpretation as the covariance of $F(x)$ and $F(y)$. The structure of the kernel thus indicates how two values of the output of the process are correlated, depending on the values taken by the factors. For instance, radial kernels written $R(\|x - y\|)$ correspond to assuming the stationarity and isotropy of $F(x)$.

2.2. Relevant kernels for prediction

We shall address this topic briefly, with examples illustrating the influence of the choice of a covariance on the prediction. Most kernels considered in the literature are isotropic and invariant by translation. Since a kernel can be identified with a covariance, it is also required that the kernel be a positive function, which means that, for any integer $m$, any set of reals $\{a_1, \dots, a_m\}$ and any set of vectors $\{x_1, \dots, x_m\}$,

$$ \sum_{i,j=1}^{m} a_i R(x_i, x_j) a_j \ge 0 \qquad (3) $$

Most often, because the actual covariance of the data is unknown, the user falls back on classical structures of covariance comprising some parameters that can be adjusted in order to adapt the covariance model to the data-generating process. Nevertheless, a model of covariance must be chosen with care, and the kernel must be relevant to describe the data. For instance, the popular Gaussian covariance is far from being adapted to every type of data: problems may arise from too-smooth interpolations, wrong confidence intervals or numerical instabilities. Stein [12] stresses the importance of taking care of the behaviour of the covariance near the origin, which is linked with the quadratic-mean regularity of the random process. For instance, if the covariance is continuous at the origin, then the process will be continuous in quadratic mean. The Matérn covariance makes it possible to adjust regularity with only one parameter. Stein advocates the use of the following parameterization of the Matérn model:

$$ R(\|x - y\|) = R(h) = \frac{\sigma^2}{2^{\nu - 1}\Gamma(\nu)} \left(\frac{2\nu^{1/2} h}{\rho}\right)^{\nu} K_\nu\!\left(\frac{2\nu^{1/2} h}{\rho}\right) \qquad (4) $$

where $K_\nu$ is the modified Bessel function of the second kind. This parameterization allows an easy interpretation of the parameters: $\nu$ controls regularity, $\sigma^2$ is the variance ($R(0) = \sigma^2$) and $\rho$ represents the range of the covariance, i.e. the characteristic correlation distance. Figure 1 shows the influence of $\nu$.

Under the hypothesis that $F(x)$ is Gaussian, the parameters $\nu$, $\sigma^2$ and $\rho$ can be adjusted to the data by an automatic procedure, such as maximum likelihood, which is a natural route in the probabilistic context of Kriging. The maximum-likelihood estimate of the vector $\theta$ of the parameters of the covariance is obtained by maximizing the log-likelihood function

$$ l(\theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log\det \mathbf{R}(\theta) - \frac{1}{2}\,\mathbf{f}^{\top}\mathbf{R}(\theta)^{-1}\mathbf{f} \qquad (5) $$

where $n$ is the number of observed output data, $\mathbf{R}(\theta)$ is the covariance matrix of the vector $[F(x_1), \dots, F(x_n)]^{\top}$, and $\mathbf{f} = [f(x_1), \dots, f(x_n)]^{\top}$ is the vector of observed outputs.

Figure 1. Matérn covariances for the parameterization (4) with $\rho = 0.5$, $\sigma = 1$. The solid line corresponds to $\nu = 0.5$, the dashed line to $\nu = 1$ and the dotted line to $\nu = 3$.

When the output to be modelled is corrupted by noise, this noise is taken into account by modifying the covariance. The simplest case is when the noise is white; the covariance of the data then possesses a discontinuity at the origin. More generally, it is interesting to apply the Kriging methodology concerning the choice of a kernel to other rkhs methods, such as support vector regression (SVR), as illustrated by Figure 2.

2.3. Why intrinsic Kriging?

The stationarity hypothesis seems inadequate for describing systems as simple as linear systems, for instance. More generally, the outputs of systems found in engineering are often smooth functions, probably well described by piecewise polynomials or exponentials, and spline-based models may then be well adapted. As stated in References [9, 10], intrinsic Kriging [14] and splines are formally equivalent. This is one of the reasons why we use the intrinsic version of Kriging, to benefit from its probabilistic framework. The main idea of intrinsic Kriging is to assume that differences such as $F(x) - F(y)$ are (weakly) stationary, whereas $F(x)$ itself could be non-stationary, as for a Brownian motion for instance. The stationarity of differences implies that there exist functions $m(h)$, $h \in \mathbb{R}^d$, and $\gamma(h)$, called the variogram, such that

$$ \mathrm{E}[F(x) - F(y)] = m(x - y) \qquad (6) $$

$$ \mathrm{var}[F(x) - F(y)] = 2\gamma(x - y) \qquad (7) $$
Figure 2. Kriging lore applied to SVR in a one-factor case. The left plot shows a Matérn covariance with white noise, whose parameters have been estimated by maximum likelihood from the data of the right plot (crosses). The circle at the origin of the covariance materializes the discontinuity due to additive white noise on the output. This kernel has then been used in support vector regression applied to the same data. The resulting prediction is provided on the right plot (thick line). The dotted lines indicate the margins of the SVR, chosen according to Vazquez and Walter [13].

It is straightforward to note that these differences will filter out any constant, and in particular an unknown constant mean of $F(x)$. More generally, any finite linear combination of values of $F$ written

$$ F(\lambda) \triangleq \sum_{i=1}^{n} \lambda_i F(x_i) \quad \text{such that} \quad \sum_{i=1}^{n} \lambda_i = 0 \qquad (8) $$

will also filter out the unknown mean of $F(x)$. Even more generally, a linear combination

$$ F(\lambda) \triangleq \sum_{i=1}^{n} \lambda_i F(x_i) \qquad (9) $$

will filter out any monomial $x^l$ (since $x \in \mathbb{R}^d$, $l$ is a multi-index $(l_1, \dots, l_d)$ and $x^l = x_1^{l_1} x_2^{l_2} \cdots x_d^{l_d}$), provided that $\lambda$ satisfies the constraint

$$ \sum_{i=1}^{n} \lambda_i x_i^l = 0 \qquad (10) $$

In (8) and (9), $\lambda$ is a finite-support measure that can be written $\lambda = \sum_i \lambda_i \delta_{x_i}$, where $\delta_{x_i}$ denotes the Dirac measure at $x_i$. The set of measures $\lambda$ satisfying (10) for all basis functions $x^l$ up to degree $k$ (that is, $l_1 + l_2 + \cdots + l_d \le k$) will be denoted by $\Lambda_k$. It follows that $\Lambda_{k+1} \subset \Lambda_k$. $F(\lambda)$ will be said to be (weakly) stationary if its second-order moments are invariant by translation, and in particular

$$ \mathrm{cov}(F(\lambda), F(\mu)) = \mathrm{cov}(F(\tau_h \lambda), F(\tau_h \mu)) \qquad (11) $$

where $\tau_h$ denotes the translation operator ($\tau_h \lambda = \sum_i \lambda_i \delta_{x_i - h}$). If $F(\lambda)$ is stationary for every $\lambda \in \Lambda_k$, then $F(x)$ is called an intrinsic random function (IRF).
Intrinsic Kriging builds the best linear predictor of $F(x)$ [14], written

$$ \hat{F}(x) = \sum_{i=1}^{n} \lambda_{i,x} F(x_i) \qquad (12) $$

under the constraint that the error of approximation $\hat{F}(x) - F(x) = \sum_{i=1}^{n} \lambda_{i,x} F(x_i) - F(x)$, seen as a linear combination of values of $F(\cdot)$, satisfies a filtering equation similar to (10) (which was written for a linear combination $\sum_{i=1}^{n} \lambda_i F(x_i)$). That is, for all $l$ such that $|l| \le k$,

$$ \sum_{i=1}^{n} \lambda_{i,x} x_i^l - x^l = 0 \qquad (13) $$

Equation (13) expresses the orthogonality of the error $\hat{F}(x) - F(x)$ with respect to the functions $x^l$ [14], which is chosen as an optimality criterion in a linear prediction framework. Owing to this fundamental property, it becomes possible to make the error of approximation orthogonal to any prior information that can be expressed using the basis functions $x^l$, as will be shown in Section 3. Intrinsic Kriging can also be used to deal with a possibly non-stationary random process with an unknown polynomial mean, that is, such that

$$ \mathrm{E}[F(x)] = m(x) = \sum_l \beta_l x^l \qquad (14) $$

with unknown coefficients $\beta_l$.

2.4. Implementation issues

In the context of intrinsic Kriging, the covariance is no longer necessarily positive. Instead, one looks for a function $R(x, y)$, called a generalized covariance of order $k$, where the order $k$ of $\Lambda_k$ has to be specified. This generalized covariance allows (11) to be evaluated for any $(\lambda, \mu) \in \Lambda_k^2$ as

$$ \mathrm{cov}(F(\lambda), F(\mu)) = \sum_{i,j} \lambda_i R(x_i, x_j) \mu_j \qquad (15) $$

Since $F(\lambda)$ is stationary, $R(x, y)$ is actually a function of $h = x - y$. $R(x, y)$ is said to be conditionally positive because it is only required that the variance of linear combinations of the type $F(\lambda)$, $\lambda \in \Lambda_k$, be non-negative, i.e.

$$ \sum_{i,j} \lambda_i R(x_i, x_j) \lambda_j = \mathrm{var}[F(\lambda)] \ge 0 \qquad (16) $$

For instance, $R(h) = -\gamma(h)$, with $\gamma$ defined by (7), is a generalized covariance of order 0. Condition (16) is weaker than positivity, and therefore a more general class of kernels/covariances can be used besides classical covariances.
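Condition (16) can be checked numerically. The sketch below (not from the paper; the random points and weights are arbitrary illustrations) uses the order-0 generalized covariance $R(h) = -\|h\|$: the quadratic form is non-negative whenever the weights sum to zero, although $R$ itself is not a positive kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def gen_cov(xi, xj):
    # Order-0 generalized covariance R(h) = -||h||, h = xi - xj;
    # only conditionally positive, not positive as an ordinary kernel.
    return -np.linalg.norm(xi - xj)

for _ in range(200):
    x = rng.standard_normal((8, 2))   # 8 random points in R^2
    lam = rng.standard_normal(8)
    lam -= lam.mean()                 # enforce sum(lam) = 0, i.e. lam in Lambda_0
    R = np.array([[gen_cov(a, b) for b in x] for a in x])
    q = lam @ R @ lam                 # var[F(lambda)] as in (16)
    assert q >= -1e-10                # non-negative up to rounding

# Without the zero-sum constraint, the quadratic form can go negative:
lam_bad = np.ones(2)                  # sums to 2, not 0
x2 = np.array([[0.0, 0.0], [1.0, 0.0]])
R2 = np.array([[gen_cov(a, b) for b in x2] for a in x2])
print(lam_bad @ R2 @ lam_bad)         # -2.0: R is not positive in the sense of (3)
```

The failing case makes the distinction concrete: (3) requires non-negativity for all weights, whereas (16) requires it only on $\Lambda_k$.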
Polynomial kernels turn out to be a useful class of generalized covariances [14]. Given the order $k$, they can be written

$$ R(x, y) = \sum_{p=0}^{k} (-1)^{p+1} a_p \|h\|^{2p+1}, \quad h = x - y \qquad (17) $$

This very simple expression is linear in its parameters $a_p$. For example, intrinsic Kriging based on a covariance written $-a_0\|h\|$ gives a piecewise-linear interpolation. Such polynomial covariances are therefore well adapted to describing data that do not specifically show a stationary behaviour. The regularity of the interpolation can be tuned by the user by adjusting the coefficients $a_p$, but it is generally preferred to rely on automatic procedures such as maximum likelihood. In the probabilistic context of intrinsic Kriging, restricted maximum likelihood (REML, see Reference [15]) can be used, where maximum likelihood is applied to differences of the data rather than to the data themselves. A trivial but lengthy computation leads to a more complex log-likelihood cost function to be maximized, which can be written

$$ l(\theta) = -\frac{n - p}{2}\log(2\pi) - \frac{1}{2}\log\det \mathbf{R}(\theta) - \frac{1}{2}\log\det \tilde{\mathbf{R}}(\theta) - \frac{1}{2}\,\mathbf{v}^{\top}\!\left(\mathbf{R}(\theta)^{-1} - \mathbf{R}(\theta)^{-1}\mathbf{O}\tilde{\mathbf{R}}(\theta)^{-1}\mathbf{O}^{\top}\mathbf{R}(\theta)^{-1}\right)\mathbf{v} \qquad (18) $$

This result is stated without proof in Reference [12]. In (18), $n$ is the number of observed output data, $p$ is the number of basis functions $x^l$ for $|l| \le k$, $\mathbf{O}$ is an $n \times p$ matrix whose $(j, l)$th entry is $x_j^l$, $\tilde{\mathbf{R}}(\theta) = \mathbf{O}^{\top}\mathbf{R}(\theta)^{-1}\mathbf{O}$, and the vector $\mathbf{v}$ is obtained after transformation of the data as $\mathbf{v} = (\mathbf{I} - \mathbf{O}(\mathbf{O}^{\top}\mathbf{O})^{-1}\mathbf{O}^{\top})\mathbf{f}$.

Once the covariance has been chosen, $F(x)$ can be estimated based on the observations. The squared norm of the error of approximation,

$$ \sigma_e^2 = \|\hat{F}(x) - F(x)\|^2 = \left\| F(x) - \sum_i \lambda_{i,x} F(x_i) \right\|^2 = R(x, x) - 2\sum_i \lambda_{i,x} R(x, x_i) + \sum_{i,j} \lambda_{i,x} R(x_i, x_j) \lambda_{j,x} \qquad (19) $$

is minimized under the constraint $\delta_x - \sum_i \lambda_{i,x} \delta_{x_i} \in \Lambda_k$, i.e.

$$ \sum_i \lambda_{i,x} x_i^l - x^l = 0, \quad \forall\, |l| \le k \qquad (20) $$

This constrained-minimization problem is dealt with by the Lagrange method.
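The claim that the order-0 polynomial covariance $-a_0\|h\|$ yields a piecewise-linear interpolation can be verified directly, by assembling and solving the saddle-point system produced by the Lagrange method for $k = 0$. The following sketch is not from the paper; the one-dimensional data are arbitrary.

```python
import numpy as np

# Intrinsic Kriging of order k = 0 with generalized covariance R(h) = -|h|
# in one dimension; the prediction should be piecewise linear in x.
def ik_predict(x_obs, f_obs, x_new):
    n = len(x_obs)
    R = -np.abs(x_obs[:, None] - x_obs[None, :])  # generalized covariance matrix
    O = np.ones((n, 1))                           # basis functions x^l, here only 1
    A = np.block([[R, O], [O.T, np.zeros((1, 1))]])
    b = np.concatenate([-np.abs(x_obs - x_new), [1.0]])  # [r_x; o_x]
    lam = np.linalg.solve(A, b)[:n]               # weights (last entry: multiplier)
    return lam @ f_obs

x_obs = np.array([0.0, 1.0, 2.0])
f_obs = np.array([1.0, 3.0, 2.0])
print(ik_predict(x_obs, f_obs, 0.5))  # 2.0 up to rounding: midpoint of 1.0 and 3.0
```

At the midpoint of two neighbouring observations, the computed weights are 1/2 on each neighbour and zero elsewhere, so the predictor coincides with the linear interpolant, as announced.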
A straightforward computation shows that, for any given value $x$ of the vector of factors, the coefficients $\lambda_{i,x}$ are obtained by solving the linear system

$$ \begin{pmatrix} \mathbf{R} & \mathbf{O} \\ \mathbf{O}^{\top} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \boldsymbol{\lambda}_x \\ \boldsymbol{\mu}_x \end{pmatrix} = \begin{pmatrix} \mathbf{r}_x \\ \mathbf{o}_x \end{pmatrix} $$

where $\mathbf{R}$ is the $n \times n$ matrix whose $(i, j)$th entry is $R(x_i, x_j)$, $\mathbf{r}_x \in \mathbb{R}^n$ is the column vector whose $i$th entry is $R(x_i, x)$, $\mathbf{o}_x \in \mathbb{R}^p$ is the column vector whose $l$th entry is $x^l$, and $\boldsymbol{\mu}_x$ is a vector of Lagrange coefficients.

3. SEMI-PARAMETRIC FORMULATION AND PRIOR KNOWLEDGE

As (14) indicates, an IRF $F(x)$ can be written as the sum of a zero-mean random process $W(x)$ and polynomial parametric terms corresponding to its mean:

$$ F(x) = W(x) + \sum_l \beta_l x^l \qquad (21) $$

It was shown in Reference [14] that exponential terms could also be considered, but we shall limit ourselves here to polynomial terms. Intrinsic Kriging is thus a semi-parametric method. This formal decomposition is well known in the theory of splines or radial basis functions [1, 2], and has also been used in SVM [16]. The intrinsic Kriging theory [14] states that the functions in the parametric terms must be chosen linearly independent and that the space spanned by these functions must be stable under translation, which limits these functions to polynomials or real/complex exponentials. (Using exponentials in the parametric terms is unusual, and this is why we only consider polynomial terms $x^l$.) In the literature (see, for instance, Reference [17]), it has often been argued that these parametric terms offer little advantage in practice, and it has been shown in Reference [14] that an IRF is generally locally stationary (on a bounded domain, an IRF can be identified with a stationary random function), which suggests that there is in principle nothing to be gained using IRFs.
On the contrary, we found that introducing parametric terms was indeed interesting, in order to take relevant prior information into account via an extended set of factors. The intrinsic Kriging framework is obtained when considering a subset $\Lambda_{N^\perp}$ of finite-support measures $\lambda$ orthogonal to a set $N$ of functions. This set can be written

$$ \Lambda_{N^\perp} = \left\{ \lambda \ \text{such that} \ \sum_{i=1}^{m} \lambda_i f(x_i) = 0, \ \forall f \in N \right\} $$

For instance, $\Lambda_k$ is orthogonal to $N = \{x^l, |l| \le k\}$. From the equivalence between Kriging and regularization methods, one can show that the parametric terms in (21) are not regularized, because the set $N$ of functions corresponds to $\ker(A)$. This is an interesting way of taking prior information into account, as one may wish it not to be regularized. Additionally, if prior information is included in the form of parametric terms, it is orthogonal to the prediction error. However, using polynomial parametric terms can be quite restrictive, and this is why we shall distinguish two types of factors. The factors of the first type are the natural model inputs. The factors of the second type are introduced to take advantage of additional prior information. They are assumed to be deterministic functions of the factors of the first type, which formally circumvents the restriction of using polynomial terms only. The resulting method is also known in geostatistics as Kriging with an external drift. For example, assume that a phenomenon involving spatial factors is to be predicted over a given region at a given date. If the phenomenon is two-dimensional, there would typically be two factors describing the position $x$, and the output would be predicted based on a finite number of observations $f(x_i)$ of the phenomenon. Kriging would yield an interpolation of the data if the measurement errors were neglected.
If a third factor $z$ associated with the position $x$ turns out to be correlated with the output of the phenomenon, a parametric term such as $\beta z$ can be included in the model, with the regression coefficient $\beta$ to be estimated by intrinsic Kriging; $z$ can be any relevant quantity whose value can be evaluated for all $x$. Note that the covariance kernel remains unchanged and still depends on $x$ only. Including this parametric term in the Kriging estimator is direct, and leads to computing the coefficients $\lambda_{i,x}$ by solving the slightly modified linear system

$$ \begin{pmatrix} \mathbf{R} & \mathbf{O} & \mathbf{z} \\ \mathbf{O}^{\top} & \mathbf{0} & \mathbf{0} \\ \mathbf{z}^{\top} & \mathbf{0} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \boldsymbol{\lambda}_x \\ \boldsymbol{\mu}_x \\ \mu_{z,x} \end{pmatrix} = \begin{pmatrix} \mathbf{r}_x \\ \mathbf{o}_x \\ z_x \end{pmatrix} $$

where $\mathbf{z}$ is the column vector whose $i$th entry is $z(x_i)$, that is, the value of $z$ at $x_i$, and $z_x = z(x)$.

Two types of application are directly concerned by semi-parametric methods. The first one, illustrated in Section 4, arises when a totally black-box model is not accurate enough. Often, though, the user has some information on the process to be modelled, based either on physical considerations or on past experimentation. This prior knowledge can then be taken into account via an extended set of factors. As a second type of application, suppose that a phenomenon can be accurately modelled, but that the simulation of this model is so complex and time-consuming that only a few points are accessible in practice. It then becomes interesting to predict the result of this simulation at any other point by a method such as Kriging. However, given the very few data available, not much can generally be expected from a mere interpolation. Suppose now that a simpler model, fast but less reliable, is available. The natural idea is then to plug this model into the Kriging procedure, as allowed by intrinsic Kriging.
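The modified system above can be sketched directly. The quadratic drift $z(x) = x^2$ and the data below are arbitrary illustrations, not from the paper. A useful sanity check follows from the constraint rows alone: whenever the data are exactly of the form $\beta z(x) + c$, the predictor reproduces that function at any new point, whatever the covariance.

```python
import numpy as np

def ik_drift_predict(x_obs, f_obs, z, x_new):
    """Intrinsic Kriging (order 0) with one external-drift factor z(x)."""
    n = len(x_obs)
    R = -np.abs(x_obs[:, None] - x_obs[None, :])   # generalized covariance -|h|
    O = np.ones((n, 1))                            # constant basis function
    zc = z(x_obs).reshape(n, 1)                    # drift column z(x_i)
    A = np.block([[R, O, zc],
                  [O.T, np.zeros((1, 2))],
                  [zc.T, np.zeros((1, 2))]])
    b = np.concatenate([-np.abs(x_obs - x_new), [1.0, z(x_new)]])
    lam = np.linalg.solve(A, b)[:n]                # weights (last entries: multipliers)
    return lam @ f_obs

z = lambda x: x**2                      # hypothetical extra factor, a function of x
x_obs = np.array([0.0, 0.7, 1.3, 2.0])
f_obs = 2.0 * z(x_obs) + 1.0            # data exactly in the span of {1, z}
print(ik_drift_predict(x_obs, f_obs, z, 1.5))  # 5.5 up to rounding: 2*1.5**2 + 1
```

The constraint rows force $\sum_i \lambda_{i,x} = 1$ and $\sum_i \lambda_{i,x} z(x_i) = z(x)$, so the prediction $\sum_i \lambda_{i,x} (2 z(x_i) + 1) = 2 z(x) + 1$ holds exactly, which is the orthogonality-to-the-prior property discussed above.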
There is a limit, however, to the number of factors that can be added, which comes from the fact that the corresponding parametric terms are not regularized, with the risk of an ill-posed problem and of overfitting the data.

4. APPLICATION

We have studied a device for the measurement of flow in water pipes. The flow can be obtained by integrating the water speed profile, which can be strongly perturbed, as shown in Figure 4. The water speed profile is reconstructed from observations of the speed at given points of a cross-section; see Figure 3. Unfortunately, the number of observations is limited and, moreover, the speed near the inner surface of the pipe is not available.

Figure 3. Cross-section of the pipe. Dots indicate the locations of all speeds computed during the simulation, whereas triangles represent the observations (raw data) in the measurement device. There are only 16 observations.

A simple interpolation neglects the friction phenomenon that slows down water near the inner surface of the cylinder. This is an obvious consequence of the fact that black-box models do not incorporate any physical knowledge. The model of the water flow can therefore be refined if prior information is added. Speed profiles in simple cases are easy to obtain. It is then natural to incorporate this knowledge in the model as parametric terms, i.e. additional factors. In Figure 5, we compare the results of a reconstruction of the speed profile with and without such a prior. In the former case, only one speed profile, corresponding to that in a straight pipe, has been used as an additional factor. Experiments have shown that this simple procedure drastically improves the performance of the resulting software sensor. With two additional factors, the results are even better.
However, it is not advisable to add as many factors as there are possible speed-profile configurations, because of overfitting.

This example is typical of black-box modelling problems where prior knowledge is available in the form of a prior output. It can be transposed to other situations in engineering, and two cases can be distinguished regarding the selection of prior outputs. The first one is when prior outputs correspond to observations of the real system. This approach was followed for the measurement of flows, where the speed profile in a straight pipe at a given flow rate was used as a prior. The alternative approach is to design basic first-order physical models of the studied phenomenon. In the case of flow measurement, it would have been possible to obtain the speed profile in a straight pipe from fluid mechanics. Carefully selecting priors leads to satisfactory improvements of black-box models, which thus become grey-box models.

Figure 4. Different speed profiles: (a) in a straight pipe; (b) after a change of the pipe diameter; (c) and (d) in angled pipes.

Figure 5. (a) True speed profile; (b) simple black-box reconstruction; (c) corresponding prediction error; (d) interpolation with prior; (e) much improved corresponding prediction error.

REFERENCES

1. Schaback R. Native Hilbert spaces for radial basis functions I. International Series of Numerical Mathematics 1999; 132:255–282.
2. Wahba G. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. SIAM: Philadelphia, PA, 1990.
3. Schölkopf B, Smola A. Learning with Kernels. MIT Press: Cambridge, MA, 2002.
4. Vapnik VN. The Nature of Statistical Learning Theory. Springer: Heidelberg, 1995.
5. Chilès J-P, Delfiner P. Geostatistics: Modeling Spatial Uncertainty. Wiley Series in Probability and Statistics. Wiley Interscience: New York, 1999.
6. Matheron G. Principles of geostatistics. Economic Geology 1963; 58:1246–1266.
7. Williams CKI. Regression with Gaussian processes. In Mathematics of Neural Networks: Models, Algorithms and Applications, Ellacott SW, Mason JC, Anderson IJ (eds). Kluwer: Dordrecht, 1997. Presented at the Mathematics of Neural Networks and Applications Conference, Oxford, 1995.
8. Wahba G. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. In Advances in Kernel Methods: Support Vector Learning, Schölkopf B, Burges CJC, Smola AJ (eds), Chapter 6. MIT Press: Boston, 1998; 69–87.
9. Kimeldorf GS, Wahba G. Spline functions and stochastic processes. Sankhyā: The Indian Journal of Statistics, Series A 1970; 32(2):173–180.
10. Matheron G. Splines and Kriging: their formal equivalence. In Down-to-Earth Statistics: Solutions Looking for Geological Problems, Merriam DF (ed.). Syracuse University Geology Contributions, Academic Press: New York, 1981; 8:77–95.
11. Kybic J, Blu T, Unser M. Generalized sampling: a variational approach. Part I: theory. IEEE Transactions on Signal Processing 2002; 50(8):1965–1976.
12. Stein ML. Interpolation of Spatial Data: Some Theory for Kriging. Springer: New York, 1999.
13. Vazquez E, Walter E. Multi-output support vector regression. In Proceedings of the 13th IFAC Symposium on System Identification, SYSID-2003, Rotterdam, August 2003; 1820–1825.
14. Matheron G. The intrinsic random functions and their applications. Advances in Applied Probability 1973; 5:439–468.
15. Kitanidis PK. Statistical estimation of polynomial generalized covariance functions and hydrologic applications. Water Resources Research 1983; 19:909–921.
16. Smola A, Friess T, Schölkopf B. Semiparametric support vector and linear programming machines. In Advances in Neural Information Processing Systems, vol. 11, Kearns M, Solla S, Cohn D (eds). MIT Press: Cambridge, MA, 1999; 585–591.
17. Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Statistical Science 1989; 4(4):409–435.
