Network-Aware Stochastic Virtual Machine Placement in Geo-Distributed Data Centers (Short Paper) Hana Teyeb1,2(B) , Nejib Ben Hadj-Alouane3 , and Samir Tata4 1 Faculty of Sciences of Tunis, University of Tunis El Manar, OASIS, Tunis, Tunisia hana.teyeb@gmail.com 2 SAMOVAR, Telecom SudParis, CNRS, University of Paris-Saclay, 9 Rue Charles Fourier, 91011 Evry, France 3 National Engineering School of Tunis, OASIS, Tunis, Tunisia nejib bha@yahoo.com 4 IBM Research - Almaden, 650 Harry Rd, San Jose, CA 95120, USA stata@us.ibm.com Abstract. In this work, we focus on the stochastic network-aware virtual machine placement (VM) problem in geodistributed data centers (DCs). We consider the uncertainty of the inter-VMs traﬃc while making placement and migration decisions. First, we propose a stochastic program with the objective of minimizing inter-DCs traﬃc. Then, we propose an equivalent optimization model using sampling methods and we present a two-step approach to solve the problem. Experiments show the eﬀectiveness of the proposed approach. Keywords: Cloud computing · IaaS · VM placement · VM migration Stochastic integer programming 1 · Introduction In order to achieve reliability and serve world wide users, large-scale cloud providers are relying on a geo-distributed infrastructure where data centers (DCs) are built in diﬀerent locations and interconnected within a backbone network. In such a context, network congestion is a crucial issue. The increasing traﬃc exchange of the applications hosted in the VMs may cause bottlenecks in the network resulting in performance degradation of the cloud system. Thus, it is important to minimize the traﬃc circulating within the backbone traﬃc. Recent studies [1–3] have shown that the workload of VMs is highly dynamic which may cause the existent placement and migration schemes to be ineﬃcient. Most of the existent works [4–6] make migration decisions based on deterministic demand estimation and workload characterization without considering stochastic properties. There are several optimization techniques to solve the VM placement problem. Among these techniques, we cite deterministic and Stochastic Integer c Springer International Publishing AG 2017 H. Panetto et al. (Eds.): OTM 2017 Conferences, Part I, LNCS 10573, pp. 37–44, 2017. https://doi.org/10.1007/978-3-319-69462-7_3 38 H. Teyeb et al. Programming (SIP) [7]. In contrast to deterministic approach, the SIP technique considers uncertain parameters. Many existing works have used SIP to deal with resource provision, load balancing and capacity planning problems [1,8,9]. Others have studied VM consolidation problem with stochastic demand [10,11]. In [12], the authors have proposed a joint approach that combines VM placement and bandwidth allocation. The main limitation of the aforementioned works is the fact that they did not consider inter-VMs communication while making the placement decisions. In this paper, we propose stochastic integer programming (SIP) formulation that aims to solve the network-aware VM placement problem in geo-distributed DCs while minimizing the inter-DCs traﬃc. The remainder of this paper is organized as follows. In Sect. 2, it presents the problem formulations as well as the proposed heuristic for solving the network-aware VM placement problem in geo-distributed DCs. As for Sect. 3, presents and discusses the experiment results. Finally, we conclude in Sect. 4. 2 Problem Formulation Table 1 presents the notations used in the presented formulations. DC edge routers are responsible for connecting the DCs to Wide Area Network (WAN) [13]. In the following, we use the decision variables deﬁned below. • ϕhi deﬁnes the amount of traﬃc originated from the VM i ∈ V and sent from the DC h ∈ D (i.e. the traﬃc sent to the backbone network). • xhi is equal to 1 if the VM i ∈ V is placed in the DC h ∈ D, 0 otherwise. • zih is equal to 1 if the VM i is migrated from the DC h ∈ D. i which denotes the amount of traﬃc originated from the VM i ∈ V and • fhk circulating between the DCs h ∈ D and k ∈ D. Table 1. Notations. Symbol Description D The set of data centers V The set of virtual machines R The set of hardware resources (CPU, RAM, storage) Mi The migration cost of the VM i (i ∈ V ). It is the amount of data transferred when the VM i is migrated aki Takes 1 if the VM i can be placed in the DC k, 0 otherwise (i ∈ V , k ∈ D) chr The capacity of the DC h in terms of resource r (h ∈ D, r ∈ R) uir The amount of resource r consumed by the VM i (r ∈ R, i ∈ V ) Eh The bandwidth capacity of the edge router of the DC h ∈ D xh0i is equal to 1 if the VM i is already placed in the DC h. (i ∈ V , h ∈ D) Network-Aware Stochastic Virtual Machine Placement 2.1 39 Stochastic Optimization Model In this formulation, we consider a random variable d ij that describes the amount of traﬃc exchanged between each pair of VMs (i.e. inter-VMs bandwidth demand). The variable follows a probability distribution that can be estimated from runtime measurement. We assume that the distribution can be obtained using statistical process to analyze historical data. Many studies [1,10,11] have shown that the resource demand of VMs follows a Normal distribution N (μ, σ 2 ). Thus, we consider that the inter-VMs bandwidth demand follows also the Normal distribution. Our aim is to minimize the amount of traﬃc circulating between the diﬀerent DCs. Mi .zih + ϕhi (1) min i∈V h∈D Subject to the following constraints: h xh .d d ϕh ij .x − ij i i j∈V k∈D P r( i fhk − i fkh = k∈D Mi .zih + i∈V ahi uir .xhi chr i∈V zih xh0i − zih ∈ {0.1} xki ∈ {0.1} ϕhi 0 i fhk 0 j∈V h d ij xi − h d ij .xj ∀i ∈ V, ∀h ∈ D (2) ∀i ∈ V, ∀h ∈ D (3) ∀h ∈ D (4) ∀i ∈ V (5) ∀i ∈ V, ∀h ∈ D (6) ∀r ∈ R, ∀h ∈ D (7) ∀i ∈ V, h ∈ D (8) j∈V ϕhi Eh ) 1 − i∈V xhi = 1 h∈D xhi j j∈V xhi ∀i ∈ V, h, k ∈ D ∀i ∈ V, k ∈ D ∀i ∈ V, h, k ∈ D ∀i ∈ V, h, k ∈ D The constraints (2) and (3) are both ﬂow conservation constraints. They are both stochastic due to the random variable d ij which refers to the inter-VMs communication traﬃc. The constraint (4) ensures that for each DC edge router, the total traﬃc, which includes the inter-VMs communication traﬃc and the migration traﬃc, does not exceed the bandwidth capacity of the edge router with a high probability (1 − ). The constraint (4) ensures the service quality guarantee with a predeﬁned threshold ε. The constraint (5) is a demand satisfaction constraint. As for the constraint (6) it restricts the placement of VMs in a particular number of DCs. The constraint (7) represents the capacity constraint on the DCs. It ensures that the amount of resources consumed by diﬀerent VMs 40 H. Teyeb et al. placed in a given DC does not exceed the resource capacities of the DC. The constraint (8), ensures that only already existing VMs can be considered as candidates for the migration. Suppose that it has ﬁnite support, hence, we can enumerate the set of all diﬀerent scenarios. We can then, formulate an equivalent deterministic optimization problem that can be solved as a Mixed Integer Linear Program (MILP). However, the size of the problem space can grow very large as the number of scenarios increases. Therefore, in the next section, we propose an alternative solution using sampling-based methods. 2.2 Equivalent Optimization Formulation In order to solve the stochastic problem, we propose an equivalent formulation h using sampling methods. Let us consider the function g(x) = j∈V dij .xi − h ∀i ∈ V, ∀h ∈ D It is clearly impossible to enumerate all the posj∈V xj .dij , sible outcomes. Hence, sampling techniques are a commonly used tool. In order to discretize the stochastic function g(.), we apply Sample Average Approximation (SAA) method [14]. Sampling-based methods often approximate well, with a small number of samples, problems that have a very large number of scenarios [15]. In this work, we use Monte Carlo methods to generate samples of N = {1, . . . , n} replications of the random variable d ij using the Normal distrib2 2 ), where μij is the mean and σij is the variance. Let us consider ution N (μij , σij of the stochastic function g(.) by applythe function gn (.), as the discretization n ing SAA methods, gn (x) = n1 i=1 g(x, ξi ) Where ξi is a random element such n 1 k that: d ij = n k=1 ξij and n is the number of iterations. Hence, the equivalent deterministic constraint of (2) is obtained by replacing g(x) by gn (x). gn (x) = n n 1 1 k k ( ξij ).xhi − ( ξij ).xhj n n j∈V j∈V k=1 ∀i ∈ V, ∀h ∈ D (9) k=1 Thus, the constraint (2) becomes ϕhi gn (x) ∀i ∈ V, ∀h ∈ D. As mentioned above, we consider that the inter-VMs bandwidth demand d ij follows 2 the Normal distribution N (μij , σij ). Hence, we can estimate dij by its mean h i μij (d ∀h ∈ ij μij ). Let us consider the following equation, ϕi = k∈D fhk , i D, ∀i ∈ V . The value of fhk can be obtained from the ﬂow conservation constraint i i h h ∀i ∈ V, ∀h ∈ D. If we (3) k∈D fhk = k∈D fkh + j∈V d ij xi − j∈V dij .xj , replace (Sect. 2.2) in (4), we obtain: i h h Mi .zih + ( fkh + d d P r( ij xi − ij .xj ) Eh ) 1 − ∀h ∈ D i∈V i∈V k∈D j∈V j∈V (10) 2 Since, d ij follows the Normal distribution N (μij , σij ), then, because we assume that the traﬃc of each pair of VM (i, j) ∈ V , is independent if i = j, the aggregate traﬃc demand i∈V j∈V dij follows the Normal distribution Network-Aware Stochastic Virtual Machine Placement 41 2 N ( i∈V j∈V μij , i∈V j∈V σij ) according to the property of normal distri bution and Central Limit Theorem (CLT). Note that, the term i∈V Mi .zih is deterministic, thus, it does not follow a probability distribution. Let us denote h h by αh = i∈V j∈V dij xi − j∈V dij .xj . We need to estimate the Normal distribution parameters μαh and σα2 h . Since, xhi ∈ {0, 1}, ∀i ∈ V, h ∈ D, and because we assume that the traﬃc of each pair of VM (i, j) ∈ V , is independent if i = j, then,by applying SAA methods, we can estimate μαh and σα2 h as follows μαh = i∈V j∈V μij xhi − i∈V j∈V μij .xhj , ∀h ∈ D, 2 h 2 xi + i∈V j∈V σij .xhj , ∀h ∈ D. Hence, it easy to show σα2 h = i∈V j∈V σij that the constraint (10) is equal to the overloading probability constraint presented in (11), where φ−1 (.) is the inverse of the cumulative distribution function of the Standard Normal distribution. In this work, we consider that 0.5 and φ−1 (1 − ) 0. Eh − i∈V k∈D i fkh + i∈V i∈V Mi .zih + 2 h j∈V σij xi i∈V + i∈V j∈V μij xh i − 2 h j∈V σij .xj i∈V j∈V μij .xh j −1 φ (1 − ) (11) The constraint can be written as follows: i Eh Mi .zih + fkh + μαh + φ−1 (1 − ). σα2 h ∀h ∈ D i∈V (12) i∈V k∈D The objective function as well as the rest of constraints are the same. We replace d ij by μij . One of the challenges of this formulation is the non-linearity of the constraint (12). The linearization of this constraint leads to a very large number of variable which will enlarge the research space. To cope with this problem, we propose, in the next section, an iterative two-step approach. 2.3 Network-Aware Stochastic VM Placement Algorithm h Let us denote by β h = φ−1 (1 − ). σα2 h and γ h = i∈V Mi .zi + i i∈V k∈D fkh + μαh . We denote by SIPγ+β , the optimization program presented in Sect. 2.2. We deﬁne SIPγ , the optimization program where the constraint (12) is replaced by Eh γ h ∀h ∈ D. Let us deﬁne the list P ref List which contains the diﬀerent values of ε. P ref List is sorted by increasing value of ε. Smaller ε means that the risk of overloading must be very low. The algorithm tries to solve the stochastic optimization problem within two iterations. In the ﬁrst iteration, it solves the SIPγ model without considering the non-linear term β h . In fact, the optimization program SIPγ provides a deterministic VM placement scheme as it does not consider the variance of inter-VMs communication traﬃc in the future. In addition, the term γ ensures that the overall traﬃc sent via the DC edge router does not exceed its capacity. Afterword, the algorithm evaluates the term β h by considering the solution provided by SIPγ . In the second iteration, it tries to solve the SIPγ+β model by adding the term β h to the overloading probability constraint. If the SIPγ+β model is feasible, then the new placement scheme is the solution provided by SIPγ+β , otherwise, we relax 42 H. Teyeb et al. Algorithm 1. Stochastic VM Placement Algorithm. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Input: Initial Placement scheme, stochastic traﬃc matrix, P ref List Output: New VMs Placement Plan ε ← P ref List[0] Solve SIPγ Record solutions Calculate β h with the solutions provided by SIPγ Solve SIPγ+β if SIPγ+β is feasible then New Placement Plan ← Solutions of SIPγ+β else i←i+1 end while SIPγ+β is not feasible and i P ref List.size() do ε ← P ref List[i] Check feasibility of SIPγ+β i←i+1 end if SIPγ+β is feasible then New Placement Plan ← Solutions of SIPγ+β else New Placement Plan ← Solutions of SIPγ Add SLA violation end return New Placement Plan the value of ε and try to solve it again. The new placement scheme is either the solution provided by SIPγ+β , if it exists, or the solution provided by SIPγ with an SLA violation due to the high risk of network overload. 3 Performance Evaluation The diﬀerent experiments were carried out on a machine that has an Intel Xeon 3; 3 GHz CPU and 8 GB of RAM. We have used the commercial solver CPLEX 12.5 [16] to solve the diﬀerent formulations. In all experiments, we have ﬁxed the number of DCs to six. We generated for each pair of VMs traﬃc a sample of 10000 replications according to Normal distribution. Then, we applied SAA to approximate the values of the traﬃc matrix. We randomly generated groups of (mean, variance range) for the inter-VM traﬃc and set each pair of VMs traﬃc to a value generated by a randomly chosen group. We assume that client’s demands are independent. We have implemented the deterministic equivalent model in Java with the above listed parameters. We studied ﬁrst the performance of the proposed algorithm in terms of computational time. The results depicted in Fig. 1 show that the value of ε has no considerable impact on the execution time of the model. We note that the execution time does not exceed 10 s for a total Network-Aware Stochastic Virtual Machine Placement 43 Fig. 1. Execution Time versus the total number of VMs. Fig. 2. Variantion of the number of migrations for ε ∈ {0.1, 0.01, 0.001}. number of VM |V | = 1800. The value of the parameter ε aﬀects the bandwidth utilization. Smaller ε requires the system to reserve more bandwidth in order to accommodate the possible variance of inter-VM traﬃc. To ensure the non violation of the overloading probability with smaller ε, some VMs may have to be migrated to another DC. We have varied the value of and we plotted the number of migrations performed for each value. We have considered that the bandwidth capacity of the DC edge router are large enough to satisfy the bandwidth demand for all values of ε. The results depicted in Fig. 2 show that smaller ε produces smaller number of migration. This can be explained by the fact that smaller ε means that the risk of network overloading is very small. Since, migration produces additional traﬃc, SIPγ+β tries to minimize the number of migration in order to prevent from extensive migrations. 4 Conclusion In this work, we proposed a SIP formulation that aims to solve the network-aware VM placement problem in geo-distributed DCs. We considered the uncertainty of inter-VMs traﬃc. Our objective was to minimize the overall traﬃc circulating in the backbone network in order to prevent from congestion problems. In order to solve the problem, we proposed an equivalent optimization model based on sampling methods as well as an eﬃcient algorithm to cope with non-linearity problem. 44 H. Teyeb et al. References 1. Yu, L., Chen, L., Cai, Z., Shen, H., Liang, Y., Pan, Y.: Stochastic load balancing for virtual resource management in datacenters. IEEE Trans. Cloud Comput. PP(99), 1 (2016) 2. Benson, T., Akella, A., Maltz, D.A.: Network traﬃc characteristics of data centers in the wild. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, pp. 267–280 (2010) 3. Kandula, S., Sengupta, S., Greenberg, A., Patel, P., Chaiken, R.: The nature of data center traﬃc: measurements & analysis. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement Conference, pp. 202–208 (2009) 4. Beloglazov, A., Buyya, R.: Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints. IEEE Trans. Parallel Distrib. Syst. 24(7), 1366–1379 (2013) 5. Xiao, Z., Song, W., Chen, Q.: Dynamic resource allocation using virtual machines for cloud computing environment. IEEE Trans. Parallel Distrib. Syst. 24(6), 1107– 1117 (2013) 6. Gong, Z., Gu, X., Wilkes, J.: Press: Predictive elastic resource scaling for cloud systems. In: International Conference on Network and Service Management, pp. 9–16 (2010) 7. Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming - Modeling and Theory, vol. 16, 2nd edn. SIAM, Philadelphia (2014) 8. Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balancing and scheduling in cloud computing clusters. In: INFOCOM Proceedings IEEE, pp. 702–710 (2012) 9. Ghosh, R., Longo, F., Xia, R., Naik, V.K., Trivedi, K.S.: Stochastic model driven capacity planning for an infrastructure-as-a-service cloud. IEEE Trans. Serv. Comput. 7(4), 667–680 (2014) 10. Wang, M., Meng, X., Zhang, L.: Consolidating virtual machines with dynamic bandwidth demand in data centers. In: INFOCOM Proceedings IEEE, pp. 71–75 (2011) 11. Jin, H., Pan, D., Xu, J., Pissinou, N.: Eﬃcient VM placement with multiple deterministic and stochastic resources in data centers. In: 2012 IEEE Global Communications Conference, pp. 2505–2510 (2012) 12. Chase, J., Niyato, D.: Joint optimization of resource provisioning in cloud computing. IEEE Trans. Services Comput. PP(99), 1 (2015) 13. Corporate Headquarters: Data center networking: enterprise distributed data centers solutions reference nework design. In: Solutions Reference Network Design, Cisco Systems Inc (2003) 14. Kim, S., Pasupathy, R., Henderson, S.G.: A Guide to Sample Average Approximation. Handbook of Simulation Optimization. International Series in Operations Research & Management Science, pp. 207–243. Springer, New York (2015). doi:10. 1007/978-1-4939-1384-8 8 15. Homem-de Mello, T., Bayraksan, G.: Monte carlo sampling-based methods for stochastic optimization. Surveys Oper. Res. Manag. Sci. 19(1), 56–85 (2014) 16. IBM Corporation ILOG CPLEX: http://www.ilog.com/products/cplex/. Accessed 04 Feb 2013

1/--страниц