Environ Sci Pollut Res (2017) 24:24284–24296 DOI 10.1007/s11356-017-0030-2 RESEARCH ARTICLE Coupled Monte Carlo simulation and Copula theory for uncertainty analysis of multiphase flow simulation models Xue Jiang 1 & Jin Na 2 & Wenxi Lu 3 & Yu Zhang 4 Received: 27 June 2017 / Accepted: 24 August 2017 / Published online: 9 September 2017 # Springer-Verlag GmbH Germany 2017 Abstract Simulation-optimization techniques are effective in identifying an optimal remediation strategy. Simulation models with uncertainty, primarily in the form of parameter uncertainty with different degrees of correlation, influence the reliability of the optimal remediation strategy. In this study, a coupled Monte Carlo simulation and Copula theory is proposed for uncertainty analysis of a simulation model when parameters are correlated. Using the self-adaptive weight particle swarm optimization Kriging method, a surrogate model was constructed to replace the simulation model and reduce the computational burden and time consumption resulting from repeated and multiple Monte Carlo simulations. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were employed to identify whether the t Copula function or the Gaussian Copula is the optimal Copula function to match the relevant structure of the parameters. The results show that both the AIC and BIC values of the t Copula function are less than those of the Gaussian Copula function. This indicates that the t Copula function is the optimal function for matching the relevant Responsible editor: Marcus Schulz * Jin Na na_jin@126.com 1 State Key Laboratory of Biogeology and Environmental Geology and School of Environmental Studies, China University of Geosciences, Wuhan 430074, China 2 Institute of Disaster Prevention Science and Technology, Sanhe 065201, China 3 College of Environment and Resources, Jilin University, Changchun 130021, China 4 Songliao Institute of Water Environment Science, Songliao River Basin Water Resources Protection Bureau, Changchun 130021, China structure of the parameters. The outputs of the simulation model when parameter correlation was considered and when it was ignored were compared. The results show that the amplitude of the fluctuation interval when parameter correlation was considered is less than the corresponding amplitude when parameter estimation was ignored. Moreover, it was demonstrated that considering the correlation among parameters is essential for uncertainty analysis of a simulation model, and the results of uncertainty analysis should be incorporated into the remediation strategy optimization process. Keywords Uncertainty analysis . Copula theory . Surrogate model . DNAPL . SAPSOKRG Introduction The leakage of dense nonaqueous phase liquids (DNAPL) has caused serious environmental pollution, which has resulted in worldwide health hazards (Qin et al. 2008; Muff et al. 2016; Hou et al. 2016). DNAPL, with densities greater than that of water, are difficult to remove from groundwater (Qin et al. 2007). Surfactant-enhanced aquifer remediation (SEAR), a chemical enhancement of the conventional pump-and-treat technique, can considerably increase the removal efficiency of DNAPL-contaminated groundwater. However, SEAR is quite expensive. Therefore, it is critical to select a costeffective optimal remediation strategy by simulationoptimization techniques. SEAR simulation models are involved in the simulation-optimization process. When the simulation model involves uncertainty, which is primarily manifested in form of parameter uncertainty, the reliability of a certain optimal strategy could be jeopardized. Previously, the uncertainty of model parameters has been considered in certain studies. For example, He et al. (2008) Environ Sci Pollut Res (2017) 24:24284–24296 established a nonlinear chance-constrained programming model to address porosity and intrinsic permeability uncertainty and optimize remediation strategy for a laboratoryscale SEAR system. Neuman et al. (2012) proposed a multimodel approach for data-worth analysis based on a Bayesian model averaging framework considering parameter uncertainty. Zheng et al. (2011) and Wu et al. (2014) used a probabilistic collocation method to assess parameter uncertainty in watershed water quality models and integrated surface water-groundwater models. Fan et al. (2016a) used a hybrid sequential data assimilation and probabilistic collocation method to analyze the uncertainty of stream flow predictions in the form of parameter uncertainty. Zheng and Han (2016) used Markov chain Monte Carlo algorithms to assess the parameter uncertainty of watershed water quality models. Sepulveda and Doherty (2015) analyzed the parameter uncertainty of east-central Florida groundwater flow models, demonstrating that there is little justification for failing to accompany model predictions with the uncertainty estimate. Hou et al. (2016) analyzed the uncertainty of the simulation model using a Monte Carlo method, where the model parameters were assumed independent. Li et al. (2017a) assessed the performance of two parameter uncertainty analysis techniques (generalized likelihood uncertainty estimation and Bayesian method) for a topographic hydrologic model. Aquifer parameters, such as permeability and porosity, may be statistically correlated (Boving and Grathwohl 2001; Wong and Yeh 2002; Chen and Huang 2003; McPhee and Yeh 2006; Morin 2006). Even though neglecting parameter correlation may influence the reliability of uncertainty analysis, few studies focus on this issue and assume that those parameters are independent. Therefore, there is an urgent need for a method of handling parameter correlation in the process of uncertainty analysis. Copula theory was proposed by Sklar (1959), who pointed out that a multidimensional joint distribution function can be decomposed into a corresponding edge distribution function and a Copula function that determine the correlation among the variables (Sklar 1959; Possolo 2010; Xu et al. 2010; Haslauer et al. 2012). Copulas have been used for depicting correlation of data in biometric data studies (Rukhin and Osmoukhina 2005), financial risk management (Chai et al. 2008), rainfall data generation (Ghosh 2010; Xu et al. 2010), bivariate hydrologic risk analysis and hydrologic data assimilation (Fan et al. 2016b, 2017), structural reliability analysis (Goda 2010; Jiang et al. 2014; Cui et al. 2017), ocean platform design (Zhai et al. 2017), and conditional quantile estimators (Rémillard et al. 2017). However, Copula theory has not been applied in parameter correlation of SEAR simulation models in the context of remediation of groundwater contamination. In this study, a coupled Monte Carlo simulation and Copula theory is proposed for uncertainty analysis of a SEAR simulation model considering parameter correlation. 24285 In uncertainty analysis, Monte Carlo simulation is repeatedly involved, which results in enormous computational burden and time requirements. A surrogate model, that is, an approximation of the simulation model that has similar input-output relationship to the simulation model (Lee and Kang 2007; Raza and Kim 2007; Forrester and Keane 2009; He et al. 2010; Li et al. 2010; Zhang et al. 2014; Nguyen et al. 2014; Li et al. 2017b), is employed to reduce the computational burden and time requirements. Specifically, the tasks of the present study involve the following: (1) constructing a SEAR simulation model of a DNAPL-contaminated site; (2) constructing a selfadaptive weight particle swarm optimization Kriging (SAPSOKRG) surrogate model of the SEAR simulation model; (3) proposing a t Copula function and a Gaussian Copula function to describe parameter correlation, where the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used to select an optimal Copula function for matching the relevant structure of the parameters; (4) generating 5000 samples of parameters that obey a multidimensional joint distribution function, and entering them into the SAPSOKRG surrogate model to obtain and analyze the corresponding output; and (5) generating 5000 additional independent samples of parameters and entering them into the same surrogate model to obtain output that is compared with the output when parameter correlation was considered. Methods SEAR simulation model A SEAR mathematical model was constructed including a mass conservation equation and definite conditions (Mason and Kueper 1996; Coulon et al. 2009; Luo and Lu 2014; Jiang et al. 2015). The University of Texas Chemical Compositional Simulator (UTCHEM) was applied to solve the mathematical model. UTCHEM is a three-dimensional, multiphase, and multicomponent finite-difference model, which was originally developed to simulate surfactantenhanced oil recovery but was subsequently modified for applications involving surfactant-enhanced remediation of nonaqueous phase liquids contaminated aquifers (Delshad et al. 1996; Qin et al. 2007). The mass conservation equation can be expressed as ∼ ∂ ϕC k ρk ∂t 2 0 13 ! → ! !4 3 @ ! þ ∇ ∑ ρk C kl v l −ϕS l K kl ∇ C kl A5 l¼1 ¼ Rk ; ðx; y; zÞ∈Ω; t ≥ 0 ; ð1Þ 24286 Environ Sci Pollut Res (2017) 24:24284–24296 → → where K kl and ! v l are the dispersion tensor (m2 s−1) and Darcy −1 velocity (m s ) of l, respectively, which can be expressed as → → K kl ¼ Ddkl aTl ! aLl −aTl δij þ vl δij þ τ ϕS l ϕS l → → k rl k ! vl ¼ − μl vli vlj ; ðx; y; zÞ∈Ω; t ≥ 0 ! vl ! ! ∇ Pl þ ρl g ∇ z ; ðx; y; zÞ∈Ω; t ≥ 0 ; ð2Þ ð3Þ where k denotes the component index, i.e., 1, 2, 3 for water, oil, and surfactant, respectively; l denotes the phase index; φ denotes ~ k denotes the overall concentration of k (volume the porosity; C fraction); ρk denotes the density of k (kg m−3); Ckl denotes the concentration of k in l (volume fraction); Sl denotes the saturation of l; Rk denotes the total source/sink of k (kg m−3 s−1); Ddkl denotes the molecular diffusion coefficient of k in l (m2 s−1); τ denotes the tortuosity; δij denotes the Kronecker delta function; αTl and αLl denote the transverse and longitudinal dispersities of l, respectively (m); vli and vlj denote thw Darcy velocities of l in the i and j directions, respectively (m s−1) (Qin et al. 2007); ! k !rl denotes the relative permeability of the porous medium to l; k denotes the intrinsic permeability tensor (m2); Pl denotes the pressure of l (Pa); ρl denotes the density of l (kg m−3); g denotes the gravity acceleration (m s−2); and z denotes the vertical distance (m) (Qin et al. 2007; Jiang et al. 2015). The definite conditions are elaborated in the BCase study^ section. Self-adaptive weight particle swarm optimization Kriging surrogate modeling method The Kriging (KRG) method is expressed as. yðxÞ ¼ f T ðxÞβ þ zðxÞ ð4Þ where f(x) = [f1(x), f2(x), ⋯ , fp(x)]T denotes the regression functions,βdenotes the matrix of regression coefficients to be estimated from the training samples (Hou et al. 2015), and z(x) is the stochastic model with zero mean and variance σ2 (Lee et al. 2002; Sakata et al. 2003; Jiang et al. 2015; Zhao et al. 2016). For m training samples with input X = [(x1, x2,…xm)]T and corresponding output response Y = [(Y1, Y2,…Ym)]T, the response for the predicted point x is ^^ þ rT ðxÞR−1 Y − F β ^^ ð5Þ ^yðxÞ ¼ f T ðxÞβ ^ are where r(x), R, and the scalarβ rðxÞ ¼ ½Rðx; x1 Þ; Rðx; x2 Þ; ⋯; Rðx; xm ÞT 2 3 Rðx1 ; x1 Þ ⋯ Rðx1 ; xm Þ 5 R¼4 ⋮ ⋱ ⋮ Rðxm ; x1 Þ ⋯ Rðxm ; xm Þ ^ ¼ F T R−1 F −1 F T R−1 Y β ð6Þ ð7Þ ð8Þ where r(x) denotes the correlation vector between m sampling points and the prediction point x, and F denotes the response column vector. T F ¼ f T ðx1 Þ; f T ðx2 Þ; ⋯; f T ðxm Þ ð9Þ In this study, the Gaussian function was selected as a correlation function. The correlation function of the n-dimensional vectors xi and xj is n 2 R θ; xi ; x j ¼ exp −θ ∑ xi −x j ð10Þ k¼1 where θ is the parameter of the KRG model to be optimized (Pan and Zhu 2011; Balesdent et al. 2013; Janusevskis and Le Riche 2013; Lu et al. 2013). Particle swarm optimization (PSO) is a random optimization algorithm (Hou et al. 2015). A self-adaptive weight was employed to balance the local improvement ability and global search ability of the PSO algorithm (Hou et al. 2015). The self-adaptive weight can be expressed as 8 * > < wmin þ ðwmax −wmin Þ ð f − f min Þ ; f ≤ f avg f avg − f min w¼ > : wmax ; f > f avg ð11Þ where wmin and wmax denote the minimum and maximum values of the weight, respectively, f denotes the current value of the objective function (minimization of the errors) of a particle, favg and fmin denote the average value and the minimum of all current objective function values, respectively. In this study, the self-adaptive weight particle swarm optimization (SAPSO) was used to optimize the parameters of KRG, resulting in the SAPSOKRG method. The construction of the SAPSOKRG surrogate model is as follows (Fig. 1). A training set of 100 samples and a testing set of 20 samples were collected in feasible regions of the five input variables (parameters of SEAR) using the Latin hypercube sampling (LHS) method, which can capture more variability in the sample space compared with the simple random sampling method (Xu et al. 2005). The corresponding output (average nitrobenzene removal rates) was obtained from the SEAR simulation model. Based on the input and output training sample sets, the surrogate model for approximating the relationship between the input variables and the output in the SEAR simulation model was trained and constructed using the SAPSOKRG method. The testing sample set was used to verify the accuracy of the surrogate model. Environ Sci Pollut Res (2017) 24:24284–24296 24287 function that connects F(x1, x2, ⋯ , xn) with F1(x1) , F2(x2) , ⋯ , Fn(xn) (Nelsen 2006; Tang et al. 2013; Li et al. 2014), namely, F ðx1 ; x2 ; ⋯; xn Þ ¼ C ½ F 1 ðx1 Þ; F 2 ðx2 Þ; ⋯; F n ðxn Þ ¼ C ðu1 ; u2 ; ⋯; un Þ ð12Þ The Sklar theorem converts the multidimensional joint cumulative distribution function of the random variables into a one-dimensional edge cumulative distribution function of the random variables F1(x1) , F2(x2) , ⋯ , Fn(xn) and a Copula function C(u1, u2, ⋯ , un), describing the correlation among the variables. Multidimensional elliptical Copula functions, such as the t Copula function and the Gaussian function, are used to describe positive and negative parameter correlation. Thus, the t Copula function and the Gaussian Copula function were adopted in this study to identify the relationship among the parameters. (1) t Copula function The distribution and density functions of the multidimensional t Copula function are (Li et al. 2012) Fig. 1 Flowchart of SAPSOKRG surrogate model Copula theory F(x1, x2, ⋯xn) is the multidimensional joint cumulative distribution function of the edge of the cumulative distribution function F 1(x 1) , F2 (x2 ) , ⋯ , F n(x n). There is a Copula Fig. 2 Flow chart of uncertainty analysis C ðu1 ; u2 ; ⋯un ; θÞ ¼ T n T −1 ðu1 Þ; T −1 ðu2 Þ; ⋯; T −1 ðun Þ; θ; ν − νþn . ν þ n ν n−1 0 2 Γ 2 1 þ ν1 ξ θ−1 ξ −1 2 Γ 2 Dðu1 ; u2 ; ⋯un ; θÞ ¼ jθj n − νþ1 n 2 Γ νþ1 ξ2 2 ∏ 1 þ νi i¼1 ð13Þ ð14Þ 24288 Environ Sci Pollut Res (2017) 24:24284–24296 where θ is the n-order symmetric positive-definite matrix whose diagonal elements are 1; Tn(•, ⋯ , •; θ) is the n-dimensional t distribution function with correlation coefficient matrix θ and νdegrees of freedom; Tν−1(•) is the inverse function of the one-dimensional t distribution function Tν(•); ξ' = Tν−1(u1) , Tν−1(u2) , ⋯ , Tν−1(un) is the t distribution variable (Nelsen 2006; Lei 2008). (2) Gaussian Copula function The distribution and density functions of the multidimensional Gaussian Copula function are (Li et al. 2012) C ðu1 ; u2 ; ⋯un ; θÞ ¼ Φn Φ−1 ðu1 Þ; Φ−1 ðu2 Þ; ⋯; Φ−1 ðun Þ; θ . −1 2 1 0 −1 exp − ξ θ −I ξ Dðu1 ; u2 ; ⋯un ; θÞ ¼ jθj 2 matrix θ; Φ−1(•) is the inverse function of the onedimensional standard normal distribution function Φ(•); ξ' = Φ−1(u1) , Φ−1(u2) , ⋯ , Φ−1(un) is the standard normal distribution variable; I is the unit matrix (Nelsen 2006; Lei 2008; Li et al. 2012). The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were employed in this study to identify whether the t Copula function or the Gaussian function is the optimal Copula function for matching the related structures of the parameters. N AIC ¼ ‐2 ∑ lnDðu1i ; u2i ; ⋯; uni ; θÞ þ 2k ð17Þ i¼1 N ð15Þ BIC ¼ ‐2 ∑ lnDðu1i ; u2i ; ⋯; uni ; θÞ þ klnN ð16Þ where ∑ lnDðu1i ; u2i ⋯; uni ; θÞ is the likelihood function of where θ is the n-order symmetric positive-definite matrix whose diagonal elements are 1; |θ| is the determinant of θ; Φn(•, ⋯ , •; θ) is the n-dimensional standard normal distribution function with correlation coefficient the Copula function; D(u1i,u2i,...,uni) is the probability density function of the edge distribution function; n is the dimension; N is the number of parameters; θ represents the unknown parameters, whose maximum likelihood estimate is ^θ ¼ argmaxLðθÞ; k is the number of Copula parameters; Fig. 3 a Boundary conditions and b parameter distribution ð18Þ i¼1 N i¼1 Environ Sci Pollut Res (2017) 24:24284–24296 24289 Fig. 4 Contaminant concentration distribution at the initial time of the remediation stage and locations of injection and extraction wells. a Concentration distribution at X–Y plane (10th layer). b Extent of the plume (u1i,u2i,...,uni) is the empirical distribution of the original data (x1i,x2i,...,xni). It can be calculated as follows: 8 rank ðx1i Þ > > ; i ¼ 1; 2; ⋯; N u1i ¼ > > > N þ1 > > < rank ðx2i Þ u2i ¼ ; i ¼ 1; 2; ⋯; N ð19Þ N þ1 > > > ⋮ > > > rank ðxni Þ > : uni ¼ ; i ¼ 1; 2; ⋯; N N þ1 Table 1 Physical and chemical parameters of the case study Parameters Values Water density (g/cm3) Water viscosity (Pa·s) Water residual saturation Nitrobenzene density (g/cm3) Nitrobenzene viscosity (Pa·s) Nitrobenzene residual saturation Nitrobenzene solubility in water (g/L) Nitrobenzene/water interfacial tension (N/m) Porosity Permeability (zoneI) (m2) Permeability (zoneII) (m2) Longitudinal dispersivity (m) Transverse dispersivity (m) 1.000 0.001 0.24 1.205 0.00168 0.17 1.9 0.02566 0.30 5.527 × 10−11 1.776 × 10−11 1.00 0.30 where rank(x1i) denotes the rank of x1i among x1 in an ascending order (Li et al. 2012). Uncertainty analysis based on Copula theory The specific steps of uncertainty analysis for the SEAR simulation model based on Copula theory and Monte Carlo simulation are as follows (Fig. 2): (1) Porosity (n), permeability (k), aqueous phase dispersity (αw), oil phase dispersity (αo) and microemulsion phase dispersity (αm) are the main parameters influencing the output of SEAR. Thus, these five parameters were selected as input variables, with the average nitrobenzene removal rate as the output variable. The surrogate model Table 2 The strategy for uncertainty analysis Surfactant flush duration (day) Injection/extraction rate (m3/day) 88.17 In1 In2 In3 In4 Ex1 28.24 44.67 37.16 59.56 169.63 24290 Table 3 Environ Sci Pollut Res (2017) 24:24284–24296 Probability distribution of input variables Input variables Probability distribution Porosity Normal distribution Permeability (D) Logarithmic normal distribution Aqueous phasedispersity (m) Oil phase dispersity (m) Normal distribution Normal distribution Microemulsion phase dispersity (m) 1D = 9.8697 × 10−13 m2 Normal distribution for approximating the relationship between the input variables and the output in the SEAR simulation model was constructed using the SAPSOKRG method. (2) The number of Monte Carlo simulations was set to 5000 in the present study. Based on the Copula function and the corresponding calculation formula, 5000 parameter samples obeying the multidimensional joint distribution were generated. (3) The samples were entered into the SAPSOKRG surrogate model to obtain the corresponding output. (4) Finally, uncertainty analysis was conducted by analyzing the output. has dimensions of 2, 2, and 2 m in the x, y, and z directions, respectively. The former and border boundaries are no-flux boundaries, and both left and right boundaries are first-type boundary condition. The upward boundary is a water exchange boundary, and the downward boundary is a no-flux boundary (Fig. 3). The initial contaminant plume is shown in Fig. 4. Based on the contaminant plume at the initial time of the remediation stage, a SEAR with sodium lauryl sulfate as the surfactant was designed with one pumping well and four injection wells (Fig. 4). A 5% surfactant solution (volume fraction) was injected into the injection wells. To maintain hydraulic balance, the total injection rate was set equal to the pumping rate. The physical and chemical parameters of the case study are listed in Table 1. The uncertainty analysis focused on the five parameters (n, k, αw, αo, αm) mentioned above under a certain strategy (Table 2). The probability distribution of the five variables was determined empirically (Table 3). Results and discussion Surrogate model The proposed methods were evaluated by applying them to a characteristic nitrobenzene-contaminated site. The objective simulation layer was pore phreatic water. Groundwater flowed from northeast to southwest. The simulation domain was generalized as a heterogeneous and isotropic 3D multiphase flow and transport model. It was discretized into 10 layers, and each layer was discretized into 70 × 30 grid blocks. Each grid block The removal rates of the testing samples obtained from the SAPSOKRG surrogate model were compared with those from the simulation model. The comparison is shown in Fig. 5. Figure 5 shows that the results of the surrogate model results are close to those of the simulation model for the test phase, and the maximum absolute residuals is less than 0.8%, which indicates that the SAPSOKRG surrogate model accurately approximates the simulation model. Therefore, the SAPSOKRG surrogate model may replace SEAR for uncertainty analysis. Fig. 5 Distribution of results and residuals of simulation model vs. SAPSOKRG surrogate model Fig. 6 Comparison of Gaussian and t Copula functions Case study Environ Sci Pollut Res (2017) 24:24284–24296 24291 Fig. 7 Scatter diagram of parameter values. X1: porosity; X2: permeability; X3: aqueous dispersity. In a, c, correlation was ignored; in b and d, correlation was considered Construction of the multidimensional joint distribution model of the parameters The t Copula function and the Gaussian Copula function were proposed for constructing the multidimensional joint distribution model of the parameters of SEAR in the present study. The values of AIC and BIC are shown in Fig. 6. Figure 6 shows that the AIC and BIC values of the t Copula function are less than those of the Gaussian Copula function, indicating that the t Copula function is the optimal Copula function fitting to the related structure of the parameters (n, k, αw, αo, αm). Therefore, the t Copula function was selected to handle parameter correlation. Uncertainty analysis Uncertainty analysis of the simulation model was conducted based on coupled Monte Carlo simulation and Copula theory using the MATLAB software application. Considering the parameter correlation based on the t Copula function, 5000 samples were generated and entered into SAPSOKRG surrogate model of the simulation model. Fig. 8 Output of the simulation model when a parameter correlation was considered and b parameter correlation was ignored 24292 Environ Sci Pollut Res (2017) 24:24284–24296 Fig. 9 Frequency histogram and cumulative distribution function of nitrobenzene removal rates when a parameter correlation was considered, b parameter correlation was ignored. CDF of removal rate is shown in c Subsequently, the corresponding output (nitrobenzene removal rates) was obtained. Furthermore, 5000 additional independent samples of parameters were extracted and entered into the same surrogate model, and the output was compared with the output that was obtained when parameter correlation was considered. The necessity of considering parameter correlation was demonstrated by analyzing these outputs. Comparison of considering and ignoring parameter correlation The scatter diagrams of three representative correlated parameters based on the t Copula function are shown in Fig. 7. It can be seen that the distribution of the parameter samples varies significantly, depending on whether the correlation was considered or not. This variation directly affects uncertainty analysis and remediation strategy optimization. The outputs of the simulation model when parameter correlation was considered and when it was ignored are shown in Fig. 8. It can be seen that the output of SEAR is uncertain when the parameter changes within a reasonable scope regardless of whether parameter correlation was considered. The nitrobenzene removal rates fluctuate in the mean value. Therefore, the uncertainty analysis of the simulation model is of great significance. In addition, the frequency histogram and cumulative distribution function (CDF) of nitrobenzene removal rates when parameter correlation was considered and when it was ignored are shown in Fig. 9. Figure 9a, b shows that the probability distributions under the two conditions are different. The mean (82.35) and the standard deviation (1.51) of the output when parameter correlation was ignored are greater than the mean (81.92) and the standard deviation (1.11) of the output when parameter correlation was considered. In the former case, the mean and Fig. 10 a Distribution of fluctuations in different intervals considering and ignoring parameter correlation. b Cumulative distribution function of the removal rate fluctuation Environ Sci Pollut Res (2017) 24:24284–24296 24293 Fig. 11 Probability of removal rates in different intervals: a CDF of removal rate, b probability in different intervals variance of removal rates are overestimated, affecting remediation strategy optimization. Figure 9c shows that the main distribution interval of nitrobenzene removal rates when parameter correlation was considered is more centralized than when parameter correlation was ignored. Moreover, the significance test (t test) was conducted using SPSS to verify whether the difference between the two groups of data is statistically significant. The results indicated that the two conditions have significant difference. To further analyze the uncertainty analysis results, the distribution of fluctuations in different intervals and the cumulative distribution function of the removal rates fluctuation were drawn when parameter correlation was considered or ignored (Fig. 10). Figure 10a shows that the number of samples in the fluctuation intervals < 1 and 1–2 when parameter correlation was considered is larger than the corresponding number when parameter correlation was ignored, whereas in the fluctuation intervals 2–3, 3–4, and > 4, the number of samples is smaller when parameter correlation was considered than when it was ignored. Figure 10b shows that when fluctuation is less than 2, the frequency when parameter correlation was considered is almost 100%, whereas the frequency when parameter correlation was ignored is slightly greater than 80%. Therefore, Fig. 10 indicates that the amplitude of fluctuation intervals when parameter correlation was considered is less than the corresponding amplitude when parameter correlation was ignored. In conclusion, the comparison analysis results indicated that considering the correlation among parameters is critical Fig. 12 Probability of fluctuation in different intervals: a CDF of fluctuation, b probability in different intervals and more in line with reality in the uncertainty analysis of the SEAR simulation model. If parameter correlation is ignored, the reliability of uncertainty analysis results and optimal remediation strategy can be influenced. Uncertainty analysis when parameter correlation was considered The probability of removal rates and fluctuation in different intervals when parameter correlation was considered is shown in Figs. 11 and 12, respectively. Figure 11 shows that the probability of removal rates in the interval 82–83 is the highest, followed by the intervals 81–82, 80–81, and 83–84. The probability of removal rates in the intervals 78–79, 79–80, and > 84 is considerably low. Figure 12 shows that the probability of the fluctuation interval < 1 is the highest, followed by the interval 1–2, and the probability of the fluctuation interval > 2 is considerably low. Compared with recent work of Hou et al. (2016), who analyzed the uncertainty of the simulation model ignoring parameter correlation, the results of uncertainty analysis in this study, where parameter correlation was considered, are more in line with reality and provide more insightful information, such as the probability of removal rates and fluctuations in different intervals. Therefore, considering parameter correlation is essential for uncertainty analysis, and its results should be incorporated into SEAR strategy optimization. The DNAPL removal rate of the optimal SEAR strategy should 24294 achieve the remediation objective with an acceptable probability considering parameter uncertainty and correlation. Discussion on the relative computational cost The main computational burden results from repeated numerical simulations. It required nearly 300 s of CPU time to run the simulation model once. In total, the uncertainty analysis based on the simulation model would require 10,000 simulation runs (35 days) if the surrogate model was not used. By contrast, the surrogate-based optimization requires only 120 simulation runs (10 h) for training, testing the surrogate model. Thus, the introduction of a surrogate model considerably reduces the computational burden and time requirements. Conclusions In this study, a coupled Monte Carlo simulation and Copula theory was proposed for uncertainty analysis of SEAR simulation models. To reduce the enormous computational burden and time requirements resulting from repeated and multiple Monte Carlo simulation, the SAPSOKRG method was employed to establish a surrogate model of the simulation model. The t Copula function and the Gaussian Copula function were proposed and compared to describe parameter correlation. On this basis, 5000 samples of parameters obeying a multidimensional joint distribution function were generated. The corresponding output was obtained by entering the samples into the SAPSOKRG surrogate model. Subsequently, uncertainty analysis of the simulation model was conducted. Furthermore, 5000 additional independent samples of parameters were extracted and entered into the same surrogate model to obtain output that was compared with the output when parameter correlation was considered. The conclusions of this study are as follows: (1) The t Copula function is more effective than the Gaussian Copula function in handling parameter correlation. The application of Copula theory provides a new tool for depicting the complex nonlinear correlation of parameters in the SEAR simulation model. (2) The mean and the standard deviation of nitrobenzene removal rates when parameter correlation was ignored were greater by 0.43 and 0.4, respectively, than the corresponding figures when parameter correlation was considered. The amplitude of fluctuation intervals when parameter correlation was considered was less than the corresponding amplitude when parameter correlation was ignored. Moreover, the significance test indicated that the two conditions have significant difference. Therefore, neglecting parameter correlation would affect the reliability of remediation strategy optimization. In Environ Sci Pollut Res (2017) 24:24284–24296 conclusion, considering parameter correlation is critical and more in line with reality in the uncertainty analysis of the simulation model. (3) The proposed method provided more insightful information, such as the probability of removal rates and fluctuations in different intervals. Uncertainty analysis based on the t Copula function showed that the output of the simulation model was uncertain when the parameter varied within a reasonable scope, the removal rates of nitrobenzene fluctuated, with mean 81.92 and standard deviation 1.11, and the probability of the fluctuation interval 0–2 was 95.52%. Thus, the results of uncertainty analysis should be incorporated into SEAR strategy optimization. The proposed method provides tools for effective analysis of uncertainty problems in complex simulation models involving nonlinear correlation parameters. However, certain limitations are worth noting. Even though the SAPSOKRG surrogate model accurately approximates the simulation model (the maximum absolute residual is less than 0.8%), the error may affect the results of uncertainty analysis. Therefore, future work should focus on surrogate model uncertainty. Acknowledgements This study was financially supported by Project funded by China Postdoctoral Science Foundation (No. 2016 M602388), the National Key Research and Development Program of China (No. 2016YFC0402803-02), and the National Natural Science Foundation of China (No. 41372237). References Balesdent M, Morio J, Marzat J (2013) Kriging-based adaptive importance sampling algorithms for rare event estimation. Struct Saf 44:1– 10 Boving TB, Grathwohl P (2001) Tracer diffusion coefficients in sedimentary rocks: correlation to porosity and hydraulic conductivity. J Contam Hydrol 53(1–2):85–100 Chai WY, Zhu Y, Hou ZQ (2008) The research of copula theoryin in financial risk management. IEEE Int Conf Mach Learn Cybern 3: 1489–1493 Chen Z, Huang GH (2003) Integrated subsurface model in gand risk assessment of petroleum-contaminated sites in western Canada. J Environ Eng 129(19):858–871 Coulon F, Orsi R, Turner C, Walton C, Daly P, Pollard SJT (2009) Understanding the fate and transport of petroleum hydrocarbons from coal tar within gasholders. Environ Int 35(2):248–252 Cui XY, Hu XB, Zeng Y (2017) A copula-based perturbation stochastic method for fiber-reinforced composite structures with correlations. Comput Methods Appl Mech Eng 322:351–372 Delshad M, Pope GA, Sepehrnoori K (1996) A compositional simulator for modeling surfactant enhanced aquifer remediation, 1 formulation. J Contam Hydrol 23(4):303–327 Fan YR, Huang GH, Baetz BW, Li YP, Huang K, Li Z, Chen X, Xiong LH (2016a) Parameter uncertainty and temporal dynamics of sensitivity for hydrologic models: a hybrid sequential data assimilation Environ Sci Pollut Res (2017) 24:24284–24296 and probabilistic collocation method. Environ Model Softw 86:30– 49 Fan YR, Huang WW, Huang GH, Huang K, Li YP, Kong XM (2016b) Bivariate hydrologic risk analysis based on a coupled entropycopula method for the xiangxi river in the three gorges reservoir area, china. Theor Appl Climatol 125(1–2):381–397 Fan YR, Huang GH, Baetz BW, Li YP, Huang K (2017) Development of a copula-based particle filter (coppf) approach for hydrologic data assimilation under consideration of parameter interdependence. Water Resour Res 53(6):4850–4875 Forrester AIJ, Keane AJ (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1):50–79 Ghosh S (2010) Modelling bivariate rainfall distribution and generating bivariate correlated rainfall data in neighbouring meteorological subdivisions using copula. Hydrol Process 24:3558–3567 Goda K (2010) Statistical modeling of joint probability distribution using copula: application to peak and permanent displacement seismic demands. Struct Saf 32:112–123 Haslauer CP, Guthke P, Bárdossy A, Sudicky EA (2012) Effects of nongaussian copula-based hydraulic conductivity fields on macrodispersion. Water Resour Res 48(7):2360–2368 He L, Huang GH, Lu HW, Zeng GM (2008) Optimization of surfactantenhanced aquifer remediation for a laboratory BTEX system under parameter uncertainty. Environ Sci Technol 42(6):2009–2014 He L, Huang GH, Lu HW (2010) A stochastic optimization model under modeling uncertainty and parameter certainty for groundwater remediation design-part 1. Model development. J Hazard Mater 176(1–3):521–526 Hou ZY, Lu WX, Chu HB, Luo JN (2015) Selecting parameter-optimized surrogate models in DNAPL-contaminated aquifer remediation strategies. Environ Eng Sci 32(12):1016–1026 Hou ZY, Lu WX, Chen M (2016) Surrogate-based sensitivity analysis and uncertainty analysis for dnapl-contaminated aquifer remediation. J Water Resour Plan Manag 142(11):04016043 Janusevskis J, Le Riche R (2013) Simultaneous kriging-based estimation and optimization of mean response. J Glob Optim 55(2):313–336 Jiang C, Zhang W, Wang B, Han X (2014) Structure reliability analysis using a copula-function-based evidence theory model. Comput Struct 143:19–31 Jiang X, Lu WX, Hou ZY, Zhao HQ, Na J (2015) Ensemble of surrogatesbased optimization for identifying an optimal surfactant-enhanced aquifer remediation strategy at heterogeneous DNAPLcontaminated sites. Comput Geosci 84:37–45 Lee JS, Kang SK (2007) GA based meta-modeling of BPN architecture for constrained approximate optimization. Int J Solids Struct 44: 5980–5993 Lee SH, Kim HY, Oh SI (2002) Cylindrical tube optimization using response surface method based on stochastic process. J Mater Process Technol 130:490–496 Lei L (2008) Empirical research on VaR model on Chinese stock market based on GJR-GARCH, FHS, CoPula & EVT. Dissertation, Jinan university. (In Chinese) Li YF, Ng SH, Xie M, Goh TN (2010) A systematic comparison of meta modeling techniques for simulation optimization in decision support systems. Appl Soft Comput 10(4):1257–1273 Li DQ, Tang XS, Zhou CB, Phoon KK (2012) Uncertainty analysis of correlated non-normal geotechnical parameters using Gaussian copula. Sci China Technol Sci 55(11):3081–3089 Li DQ, Tang XS, Zhou CB (2014) Uncertainty characterization and reliability analysis of geotechnical parameters based on copula theory. Science press, Beijing (In Chinese) Li BQ, Liang ZM, He YQ, Hu L, Zhao WM, Acharya K (2017a) Comparison of parameter uncertainty analysis techniques for a topmodel application. Stoch Environ Res Risk Assess 31(5):1045– 1059 24295 Li J, Lu HW, Xing F, Chen YZ (2017b) Human health risk constrained naphthalene-contaminated groundwater remediation management through an improved credibility method. Environ Sci Pollut Res 24:16120–16136 Lu WX, Chu HB, Zhao Y, Luo JN (2013) Optimization of denser nonaqueous phase liquids-contaminated groundwater remediation based on kriging surrogate model. Water Pract Technol 8(2):304– 314 Luo JN, Lu WX (2014) Sobol’ sensitivity analysis of NAPLcontaminated aquifer remediation process based on multiple surrogates. Comput Geosci 67:110–116 Mason AR, Kueper BH (1996) Numerical simulation of surfactant flooding to remove pooled DNAPL from porous media. Environ Sci Technol 30(11):3205–3215 McPhee J, Yeh WG (2006) Experimental design for groundwater modeling and management. Water Resour Res 42(2):336–336 Morin RH (2006) Negative correlation between porosity and hydraulic conductivity in sand-and-gravel aquifers at cape cod, massachusetts, USA. J Hydrol 316(1–4):43–52 Muff J, Mackinnon L, Durant ND, Bennedsen LF, Rügge K, Bondgaard M, Pennell K (2016) The influence of cosolvent and heat on the solubility and reactivity of organophosphorous pesticide DNAPL alkaline hydrolysis. Environ Sci Pollut Res 23(22):22658–22666 Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer, New York Neuman SP, Xue L, Ye M, Lu D (2012) Bayesian analysis of data-worth considering model and parameter uncertainties. Adv Water Resour 36:75–85 Nguyen AT, Reiter S, Rigo P (2014) A review on simulation-based optimization methods applied to building performance analysis. Appl Energy 113:1043–1058 Pan F, Zhu P (2011) Lightweight design of vehicle front-end structure: contributions of multiple surrogates. Int J Veh Des 57(2):124–147 Possolo A (2010) Copulas for uncertainty analysis. Metrologia 47(3): 262–271 Qin XS, Huang GH, Chakma A, Chen B, Zeng GM (2007) Simulationbased process optimization for surfactant-enhanced aquifer remediation at heterogeneous DNAPL-contaminated sites. Sci Total Environ 381(1):17–37 Qin XS, Huang GH, Zeng GM, Chakma A (2008) Simulation-based optimization of dual-phase vacuum extraction to remove nonaqueous phase liquids in subsurface. Water Resour Res 44(4):106–113 Raza W, Kim KY (2007) Evaluation of surrogate models in optimization of wire-wrapped fuel assembly. J Nucl Sci Technol 44(6):819–822 Rémillard B, Nasri B, Bouezmarni T (2017) On copula-based conditional quantile estimators. Stat Probabil Lett 128:14–20 Rukhin AL, Osmoukhina A (2005) Nonparametric measures of dependence for biometric data studies. J Stat Plan Infer 131(1):1–18 Sakata S, Ashida F, Zako M (2003) Structural optimization using kriging approximation. Comput Methods Appl Mech Eng 192(7–8):923– 939 Sepulveda N, Doherty J (2015) Uncertainty analysis of a groundwater flow model in east-Central Florida. Groundwater 53(3):464–474 Sklar M (1959) Fonctions de répartition à $n$ dimensions etleursmarges. Publ Inst Stat Univ Paris 8:229–231 Tang XS, Li DQ, Zhou CB, Zhang LM (2013) Bivariate distribution models using copulas for reliability analysis. J Risk Reliab 227(5): 499–512 Wong HS, Yeh WG (2002) Uncertainty analysis in contaminated aquifer management. J Water Resour Plan Manag 128(1):33–45 Wu B, Zheng Y, Tian Y, Wu X, Yao YY, Han F, Liu J, Zheng CM (2014) Systematic assessment of the uncertainty in integrated surface watergroundwater modeling based on the probabilistic collocation method. Water Resour Res 50(7):5848–5865 Xu C, He HS, Hu Y, Yu C, Li X, Bu R (2005) Latin hypercube sampling and geostatistical modeling of spatial uncertainty in a spatially 24296 explicit forest landscape model simulation. Ecol Model 185(2):255– 269 Xu YP, Booij MJ, Tong YB (2010) Uncertainty analysis in statistical modeling of extreme hydrological events. Stoch Environ Res Risk Assess 24:567–578 Zhai J, Yin Q, Dong S (2017) Metocean design parameter estimation for fixed platform based on copula functions. J Ocean Univ China 16(4):635–648 Zhang J, Chowdhury S, Mehmani A (2014) Characterizing uncertainty attributable to surrogate models. J Mech Des 136(3):252–261 Environ Sci Pollut Res (2017) 24:24284–24296 Zhao Y, Lu WX, Xiao CN (2016) A kriging surrogate model coupled in simulation–optimization approach for identifying release history of groundwater sources. J Contam Hydrol 185-186(Pt2):225–236 Zheng Y, Han F (2016) Markov chain monte carlo (mcmc) uncertainty analysis for watershed water quality modeling and management. Stoch Env Res Risk A 30(1):293–308 Zheng Y, Wang W, Han F, Ping J (2011) Uncertainty assessment for watershed water quality modeling: a probabilistic collocation method based approach. Adv Water Resour 34(7):887–898

1/--страниц