An Improved Firefly Algorithm for Permutation Routing in Baseline Multistage Interconnection network Abhay B. Rathod Sanjay M. Gulhane Department of Electronics and Telecomm. Engg Jawaharlal Darda Institute of Engg. and Tech. Yavatmal, India. Department of Electronics and Telecomm. Engg Jawaharlal Darda Institute of Engg. and Tech. Yavatmal, India. Abstract— This paper present the parallelization of firefly algorithm for permutation routing in baseline multistage interconnection network. The proposed parallel firefly algorithm employs a self routing approach to route the permutation in baseline network. The objective of this paper is as follows (i) to improve serial firefly algorithm to parallel one for better utilization of multi core/ many core processor. (ii) to use speedup, quality of solution and efficiency as performance metric in improved firefly algorithm for optimal permutation routing in baseline network. (iii) the analysis of the simulation results on firefly algorithm is compared with other algorithms. Simulation results demonstrate that the proposed parallel firefly algorithm for permutation routing in baseline network outperforms known sequential and parallel algorithm in terms of speedup, efficiency and quality solution. Index Terms— Firefly algorithm, permutation, multistage interconnection network, self-routing, Latin square, baseline network, multi core processor, optimization, parallel processing mutation, simulation. I. INTRODUCTION To gain the advantage of multi/ many core processors and to explore thread level parallelism of an application one of the best solution is to shift from sequential algorithm to parallel one. Parallel implementation of sequential algorithm is becoming more popular in order to improve the efficiency of sequential algorithm. In this paper a parallelization of firefly algorithm is introduced for permutation routing in baseline multistage interconnection network. Firefly algorithm is one of the latest bio-inspired algorithms developed by Yang [21]. It is highly suitable for parallel implementation as different firefly work independently [5]. Firefly algorithms are simple, distributed and do not have central control mechanism which allows the system to become more scalable. This motivated us to check the permutation realization capabilities of baseline networks using fire fly algorithm. Firefly algorithm is an optimization method and it is found to be efficient in solving the permutation combinatorial optimization problems [4]. The main aim of the optimization method is to find the most suitable value of a function. The optimization methods can be divided into two major groups: deterministic and stochastic methods. The majority of deterministic methods always produce the same set of solutions (optimal solution) if the algorithm start under the same initial conditions [21]. In turn, the stochastic methods under same initial condition may not generate the same optimal solutions. Firefly algorithm belongs to the second group, that is, stochastic algorithms [1]. To the best of our knowledge, this is the first time that an improved firefly algorithm is applied for permutation routing in baseline multistage interconnection network. Bigger is the routing problem in multistage interconnection network (MIN), permutations is one of the fundamental solution. Permutation is an important communication pattern in parallel and distributed computing systems [6]. Permutation represents a one-to-one mapping among the processors for communication in a processor group. To keep the communication step overhead minimum, parallel algorithms are often designed with permutation type data transfers [15]. We can realize a permutation operation in MIN as it has N inputs and N outputs. There has been much research work in the literature on the permutation capability of various multistage networks [9]. However none of the paper published on permutation realization capability of MIN using firefly algorithms running on multi-core processor. A class of unique-path, self-routable multistage interconnection networks, such as baseline, omega, and Banyan networks, have been proposed and widely used in parallel processing systems [6]. Due to their easy routing capability and uniform communication latency between any network inputs and outputs, these types of MIN has been used in several parallel computers, such as NEC Cenju-3 and IBM SP [11]. In this paper, we are realizing permutation routing in baseline network because this class of multistage network have unique-path, low hardware cost O(n log n), low network latency, self-routing capability [8] and the routing for permutations is efficient (in O(log n) time complexity) [9]. The paper is organized as follows: Section 1 gives introduction about the paper topic. Section 2 describes the structure of baseline network and the concepts of realizing permutations in such networks. In Section 3, self routing in baseline network and application of Latin square for permutation routing is presented. The section further provides the detailed algorithm for implementing parallel firefly algorithm in network under consideration. Simulation results will be presented in Section 4. Finally, Section 5 concludes the paper. II. PRELIMINARIES A. Baseline Network Structure A switch is the basic elements in a network. A baseline network consists of several stages of switching. In switching network the outputs of one stage are connected to the inputs of the next stage. A switch with N inputs, and N outputs can realize N!. For a class of unique-path, self-routable multistage network, such as a baseline network, which consists of log n stages of 2 X 2 switches, routing for permutations can be performed in the most efficient way [6]. Parker in [6] showed that the set of admissible permutations of a baseline network is equal to that of its reverse network. This shows that any permutation admissible to a baseline network is also admissible to a reverse baseline network, and vice versa. A network belonging to this class satisfies the following basic properties [7]: 1. The switching elements are 2 × 2 crossbar switches. 2. There is a unique path between each input and each output. 3. There are n stages of switches, each stage contains L switches or L levels, and each level contains n switches, where n = log2 N, L = N/2, and N = number of inputs or outputs. Because of the above three properties (short connection diameter, unique connection path, uniform modularity, etc.), baseline network is very attractive for constructing switching networks. As disused above baseline network posses a common feature that there is as unique path between each pair of network inputs and network outputs. In some literature, it is also called banyan property. This property enables the baseline network to have self routing capability which makes a very fast routing possible [8]. We will elaborate self routing algorithm in the next subsection. B. Self-Routing in Baseline Network In this paper the proposed firefly algorithm employs self routing approach to route the permutation in baseline network. In self routing concept each switching elements operates independently. The characteristic of firefly algorithm is also self–organization and decentralized decision making [1]. These characteristic is highly suitable for self-routing (i.e. setting switches while routing) approaches to route permutations in a baseline network (or a reverse baseline network). A switching network is a self-routing network if any connection can be established only by the addresses of its source and destination regardless of other connections. A MIN that can determine the path of a message from source to destination on the fly, using simple computational steps within the switches, is known as a self-routing MIN [12]. To establish connection self routing is an attractive feature of MIN in that no complicated control mechanism is needed for establishing connection [7]. Fig. 1. Demonstration of 8 X 8 baseline network with self routing between S=000 and D=101. Figure 1 shows the connection between Source address = 000 and Destination address = 101. A binary routing tag of D can be used to achieve self-routing, with the ith bit of the tag used to select the upper output (0) or the lower output (1). In next section we will discuss Permutation as data routing function and its realization in baseline network. We will also discuss the application of Latin square for conflict free data routing in baseline network and at the end of Section 3 we will describe the implementation of parallel firefly algorithm. III. PERMUTATION AS DATA ROUTING FUNCTION A. Permutation realized in Baseline Network Permutation is a one-to-one mapping between the network inputs and outputs. It refers to the connection of a set of sources to a set of destinations such that each source is connected to a single destination. A permutation [(s0, d0), (s1, d1), ......, (s7,d7)] means that source s0 is connected to d0, s1 to d1, and so on. In permutation notation, there are two rows of values. The values in the upper row are mapped onto the elements in the lower row. For example, given a set of values {0, 1} for which 0 maps to 1 and 1 maps to 0, the permutation, α, of the set is written as in Eq. (1) 0 1 (1) 1 0 Identity permutation β can be represented as in Eq. (2) 0 1 (2) 0 1 The switch setting for α is called as cross and β is called as parallel which are illustrated in figure 2. Fig. 2. Switch Settings in a 2 X 2 switch As mentioned before, permutation notation is one of the common methods used to describe the mappings in a switch, stage, and network. To realize a permutation operation in a parallel or distributed computing system is to use a multistage interconnection network (MIN) [8] such as baseline network. A baseline network consists of multiple stages of 2×2 switches with adjacent stages connected by a mapping function. Consider an 8×8 baseline network with 2×2 switches and three stages. We refer permutation σi as a stage permutation, permutation πi as an inter-stage permutation, and the permutation realized by the entire multistage network as an admissible permutation of the network. Clearly, an admissible permutation can be expressed by a composition of stage permutations and inter-stage permutations. For examples, an admissible permutation of an n × n network can be expressed as Eq. (3). σm-1 πm-2 σm-2…… π0 σ0 (3) TABLE I. THE ALGORITHM FOR SELF-ROUTING IN BASELINE NETWORK where σi(0≤i≤m-1) denote the permutation represented by stage i, and πi(0≤i≤m-2) denote the permutation represented by the set of interstage links between stage i and stage i+1. Fig. 3.An example of Permutation Routing in an 8X 8 baseline MIN In general, inter-stage permutations πis are fixed by the network topology. However, stage permutation σis are not fixed since each switch can be set to either parallel (for switch setting value 0) or crossing (for switch setting value 1) as shown in Figure 2. Since, in a unique-path multistage network, routing can be determined solely by the source and destination addresses [6] we also denote the switch setting for a switch in stage i on the routing path as σi(s,d). For input 1, we can obtain the following transformation to reach output 1 as shown in Eq. (4). 1 σ0 0 π0 0 σ1 0 π1 0 σ2 1 (4) Figure 3 illustrate the permutation routing from source 1 to destination 1 in 8 x 8 baseline multistage interconnection networks. Similarly, we can obtain transformation from 2 to 4, 4 to 2 and 6 to 6 applying composition operations on stage permutation (σi) and inter-stage permutation (πi). The algorithm for self routing and composition operation of stage permutation (σi) and inter-stage permutation (πi) for realization of admissible permutation is presented in Table 1. B. Latin Square In this paper we use Latin squares to achieve conflict free access to various subsets of data. Latin squares are well known combinatorial objects for centuries [16]. Parallel communication in baseline multistage network is realized by routing a set of N permutations forming a Latin square [10]. The Latin square is generated by means of an off-line algorithm run at the time the baseline network is built. The key idea of the method in [9] is to build a Latin square matrix for a given multistage network i.e. baseline network and it can be routed using the tag information (self routing) represented by the elements in the Latin square matrix. A Latin square of order n is an n x n square composed with symbols from 0 to n - 1 such that no symbol appears more than once in any rows or in any columns [16] as given in Eq. (5). 0 1 2 3 1 0 3 2 A (5) 2 3 0 1 3 2 1 0 2 2 We assume that for a self-routable multistage network under consideration, there exists a Latin square such that any permutation formed by each row of the matrix is admissible to the network; that is, this permutation is self-routable from the inputs to the outputs of the network [11]. Although in general there are many ways to construct a Latin square [11][14][16], in this section we describe one of the method of constructing the Latin square of an array 2nx 2n which will be suitable to the class of multistage networks we consider in this paper. Theorem: If we take Latin square matrix A of order n x n and add n in each number row wise and column wise we get Latin square matrix B. Proof: Take the matrix of order n x n as show in Eq. (6). a0,1 ... a0, n 1 a0 , 0 a a1,1 ... a1, n 1 1,0 (6) A : : : : an 1,0 an 1,1 ... an 1, n 1 n n Add n in each row we get a new matrix B of order n x n and repeat this process. Latin square matrix B is given in Eq. (7). n a0, n n a0,1 n a0, 2 ... n a n a1, 2 ... n a1, n 1,1 B : : : : n an,1 n an, 2 ... n an 1, n 1 n n (7) It is easy to construct Latin square matrix B from A. We give an algorithm to generate Latin square matrix B from A in Table 2. Its correctness can be seen from (9). Notice that we only need to build B once (as for A) for a given multistage network and both A and B can be considered as system parameters used in our permutation algorithm. The algorithm to generate Latin square is given in Table 2. TABLE II. THE ALGORITHM TO GENERATE LATIN SQUARE B FROM A To construct Latin square C of order 2n×2n place the matrix A and B in such a way that A and B will appear more than once in any row or in any column. It is given in Eq. (8). A B C (8) B A 2 n 2 n From above described method we can construct Latin square C for 8x8 baseline network given in (9). 0 1 2 3 4 1 0 3 2 5 2 3 0 1 6 3 2 1 0 7 C 4 5 6 7 0 5 4 7 6 1 6 7 4 5 2 7 6 5 4 3 Latin square C is generated only network, and can be viewed as a network. 5 6 7 4 7 6 7 4 5 6 5 4 (9) 1 2 3 0 3 2 3 0 1 2 1 0 once for a given multistage system parameter [9] of the We can design baseline network under the assumption that for a self routable multistage networks, there exists a Latin square such that any permutation formed by each row of the matrix is admissible to the network; that is, this permutation is self routable from the inputs to the outputs of the network. However, baseline MIN is not rearrange-able [10] that is it cannot realize the entire N! possible permutations, but only few of them. C. Firefly Algorithm Firefly algorithm is efficient and an easy to implement algorithm. It is also suitable for parallel implementation with few modifications [1]. However, original firefly algorithm is slow in convergence and easily gets trapped in local optimum for multimodal problems [26]. In addition, since the parameters are fixed, the search behavior remains to be the same for any condition in all iterations. Hence modifying the standard firefly algorithm to boost its performance has been one of the research issues. In this paper we are modifying two parameters one is parallelization of firefly algorithm and second is modifying the Euclidean distance to squared Euclidean distance. The detailed procedure of the proposed algorithm is as follow and it is described in Table 3. All fireflies are unisex. All fireflies are of same size. Their attractiveness is proportional to their light intensity. The light intensity of a firefly is affected or determined by the landscape of the fitness function. 1) Objective function Each solution in the search space is associated with a numeric objective values. So, the quality of a solution is proportional to the value of the objective function. In the case of the permutation routing in baseline network, the quality of solution is related to the Latin square. The best solution is the one that replicate the Latin square. 2) Attractiveness The attractiveness of a firefly is determined by its light intensity. Light intensity is a value that represents the objective function of a problem. The objective function of permutation routing is Latin square and admissible permutation. 3) Distance There are two possible ways to measure the distance between two permutations fireflies: a) Squared Euclidean Distance The Square Euclidean distance between two points, a and b, with k order matrix is given in Eq. (10). k ri j a b i i TABLE III. THE ALGORITHM FOR PARALLEL FIREFLY ALGORITHM 2 (10) i 1 b) Swap distance It is number of the required swaps of the first solution in order to get the second one. The swap distance is an edit-distance, counting how many edit operation (here: swaps, i.e., transposition of two adjacent elements) have to be performed to transform permutation a into permutation b. We need such measurement of the distances; so that the closer permutations/solutions are to one another, the smaller difference of their objective functions is produced. 4) Movement The movement of a firefly i attracted to another brighter (more attractive) firefly j is determined by Eq. (11). = (2, rij ) (11) Where, xi is the step that must be taken by firefly i to move towards j, and rij is distance between firefly i and j. The length of movement (step) of a firefly will be randomly selected from 2 to rij using uniform distribution. When a firefly moves, existing solutions in the firefly is changed. Since the representation of firefly is a permutation representation, then we use inversion mutation to represent the movement. Mutation prevents the algorithm to be trapped in a local minimum. With inversion mutation, the solution that has been obtained can be maintained so the good solution formed previously is not changed. The objective function of each firefly population obtained in the first generation replaces its previous value (P i.e. permutation value), if its current solution is better than the previously stored firefly solution. The best solution of the first iteration replaces the global best firefly solution (Pbest i.e. Latin square value) if it is better than the previously stored global best solution. The procedure is repeated until the termination criterion is satisfied. In this study, the termination criterion is the total number of generations. In this paper implementation of improved firefly algorithm in baseline interconnection network for permutation routing is based on the static information associated with Latin square matrices and self routing property of MIN. The performance of improved firefly algorithm is compared with conventional sequential and parallel algorithm. IV. PERFORMANCE EVALUATION AND RESULTS The performance of MIN is usually determined by modeling, simulation [3], mathematical [8], [15], [20] and graphical methods [18]. The graph models are very helpful in comparing the permutation capabilities of various MIN's. They provide an easy way of resolving the conflicts and for designing a suitable MIN [18]. The graph model is also useful in testing of MIN, and the network partitioning could be easily done for the multiple SIMD(single-instruction stream, multiple data stream)/MIMD(multiple instruction stream, multiple data stream) [18] computation modes. However, if the network size in MIN increases it becomes difficult to represent it graphically. In mathematical modeling exact performance results can be obtained from it if the models are simple enough. As the network model becomes more complex and large the performance results obtained by those models may become inaccurate [3]. Additionally, we cannot investigate many network parameters because they have to be neglected and they are not included into the model. In consequence, simulation gives an alternative to graphical and mathematical methods. Its advantages are a more detailed network description. Nevertheless, simulations of very large networks suffer from long simulation run times. However, for reasonable network sizes and in all cases, where mathematical and graphical methods cannot be applied, simulation is a good option [3]. Still, to our knowledge, no one has presented simulation result on permutation routing in baseline MINs using multi-core environment, which is the main focus of this paper. In the proposed work, we improved the distance formula from Euclidean to Square Euclidean distance for a nature – inspired firefly algorithm so that in large network search space, distance between two fireflies can be calculated. The simulation environment and the performance results are described in next sub section. A. Simulation Result The Simulation results obtained based on effectiveness of an algorithm and speedup of an algorithm. Effectiveness is defined as a high probability of finding a high quality solution [5][13]. Here, the quality of a solution (Q) is measured by how close the solution is to the known global solution as shown in Eq. (12). Solution KnownSolui on % (12) KnowSoluti on In parallel processing speedup emphasizes time reduction of a given problem and it is defined as sequential execution time over parallel execution time [2]. It is given in Eq. (13). Sequential Time of Solving Scaled Workload (13) Speedup (S ) Parallel Time of Solving Scaled Workload The Simulation is carried out on Intel(R) Core(TM) i3 CPU with 2.53 GHz, 3.00 GB RAM. We have implemented improved firefly algorithm using Java 7 language under 32 bit Windows 7 operating system, and Eclipse 3.7 as an integrated development environment (IDE). As Java support multithreading, here we considered one thread as one core. Q TABLE V. RESULT OF COMPUTATION EXPERIMENTS OF FIREFLY ALGORITHM Simulations of very large networks suffer from long simulation run times, particularly if stochastic events are involved [3] and confidence levels must be fulfilled. The simulation results, as presented in Table 5, shows that as the network size increases above 7x7 the speedup decreases but quality of solution gets improved. In Figure 5, the horizontal axis represents the number of cores, scaled from 2 to 8. The vertical axis represents the speedup value. As shown clearly from Figure 4, the speedup illustrates a very limited scalability of a multi-core architecture, and the speedup is quickly restricted by the sequential portion of a problem under study i.e. parallelization of firefly algorithm for permutation routing in baseline multistage interconnection network. TABLE IV. RESULT OF COMPUTATION EXPERIMENTS OF PARALLEL ALGORITHM FIG. 5. ILLUSTRATES THE SPEEDUP OF MULTI-CORE ARCHITECTURES FOR PARALLEL AND PARALLEL FIREFLY ALGORITHM. V. As we can see from Table 4, result obtained by parallel approach is better than results generated by traditional serial approach. CONCLUSION In this paper, we have presented an improved parallel firefly algorithm for permutation routing in baseline multistage interconnection network. Self-routing is used to route the permutations. The performance of improved parallel firefly algorithm is measured on the basis of speedup and quality of solution. We examined speed improvement, as well as the quality of solution improvement. Comparisons with sequential results has demonstrated that routing with parallel firefly algorithm could be a better choice for routing permutations in baseline network due to its better quality solution and it achieve speedup up to 4 times with reasonable size network 7 X 7. A recent trend in computer architecture is the move from traditional, single-core processors to multi-core processors and further to many-core or massively multi-core processors. The result of this trend is that research community can take advantage of this technology by modifying sequential algorithm to parallel one and performing parallel computation on GPUs that has traditionally been done on CPUs. As a plan for further research, we will try to enhance the original Genetic algorithm and parallelize it in the same way we did with the original firefly algorithm. Also, to get better result we will test our algorithms on more number of cores. ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers of this article for their comments and suggestions. REFERENCES [1] I. Fister, I. Fister Jr., X. Yang, and J. Brest, "A comprehensive review of firefly algorithms," Swarm and Evolutionary Computation 13, pp. 34-46, June 2013. [2] X. Sun and Y. Chen, "Reevaluating Amdahl’s law in the multicore era," Journal of Parallel and Distributed Computing, pp.183-188, May 2009. [3] D.Lüdtke and D. Tutsch, "The modeling power of CINSim: Performance evaluation of interconnection networks," Computer Networks, pp.1274-1288, March 2009. [4] P. Vidal and A. C. Olivera, "A Parallel Discrete Firefly Algorithm on GPU for Permutation Combinatorial Optimization Problems," Latin American High Performance Computing Conference Springer Berlin Heidelberg, pp.191-205, 2014. [5] X. Yang, “Nature-inspired metaheuristic algorithms,” Luniver press, 2010. [6] Y. Yuanyuan, and J. Wang, "Routing permutations on baseline networks with node-disjoint paths," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8, pp.737-746, August 2005. [7] E. Lu, and S. Q. Zheng, "Parallel routing algorithms for nonblocking electronic and photonic switching networks," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8, pp.702-713, August 2005. [8] Y. Yuanyuan, and J. Wang,"A class of multistage conference switching networks for group communication," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 3, pp.228-243, March 2004. [9] Y. Yuanyuan, and J. Wang,"Routing permutations with linkdisjoint and node-disjoint paths in a class of self-routable interconnects," IEEE Transactions on Parallel and Distributed Systems , vol. 14, no. 4, pp. 383-393, April 2003. [10] A. Massini, "All-to-all personalized communication on multistage interconnection networks," Discrete applied mathematics, pp.435-446, June 2002. [11] Y. Yuanyuan, and J. Wang,"Optimal all-to-all personalized exchange in self-routable multistage networks," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 3, pp.261-274, March 2000. [12] B. Parhami, “Introduction to parallel processing: algorithms and architectures,” Springer Science & Business Media, 2006. [13] R.Hassan, B. Cohanim, O. de Weck, and G. Venter, "A comparison of particle swarm optimization and the genetic algorithm," In Proceedings of the 1st AIAA multidisciplinary design optimization specialist conference, pp.18-21, 2005. [14] K. Heinrich, K. Kim, and V.K. Prasanna Kumar, "Perfect Latin squares," Discrete applied mathematics, pp.281-286, April 1990. [15] C. S.Raghavendra, and R. V. Boppana. "On self-routing in Benes and Shuffle-exchange networks," IEEE Transactions on Computers vol.40, no.9, pp.1057-1064, September1991. [16] K. Kim, and V. K. Prasanna-Kumar, "Perfect Latin squares and parallel array access," ACM SIGARCH Computer Architecture pp.372-379,1989. [17] Ted H. Szymanski, and V. Carl Hamacher, "On the permutation capability of multistage interconnection networks," IEEE Transactions on Computers, vol. C-36, no. 7, pp. 810-822, July 1987. [18] D. P. Agrawal, “Graph Theoretical Analysis and Design of Multistage Interconnection Networks,” IEEE Transaction on Computers, vol. c-32, no. 7,pp. 637-648, July 1983. [19] D. Nassimi and S. Sahni, "A self-routing Benes network and parallel permutation algorithms," IEEE Transactions on Computers, vol. c-30, no. 5, pp.332-340, May 1981. [20] Y. Yuanyuan, and J. Wang,"Routing permutations on optical baseline networks with node-disjoint paths," Tenth International Conference on Parallel and Distributed Systems,2004. [21] S. Arora, and S. Singh, "The firefly optimization algorithm: convergence analysis and parameter selection," International Journal of Computer Applications, vol.69, no.3, pp.48-52, May 2013. [22] K. Day, N. Alzeidi, B. Arafeh, and A. Touzene, “A Parallel Routing Algorithm for Torus NoCs” 2012 International Conference on Computer Networks and Communication Systems, vol.35 IACSIT Press, Singapore, 2012. [23] F. Bistouni1, and M. Jahanshahi, “ Pars Network: A Multistage Interconnection Network with Fault-Tolerance Capability”, J. Parallel Distrib. Comput., in press. [24] M. Borkar and Nitin, “Realizing frequently used permutations on gamma interconnection network’s family networks with the help of alternate source”, Springer Science Business Media New York, 2015. [25] T. Jain, K. Schneider, and A. Bhagyanath, “The Selector-Tree Network: A New Self-Routing and Non-Blocking Interconnection Network”, IEEE, 2016. [26] W. A. Khan, N. N. Hamadneh, S. L. Tilahun and J. M. T. Ngnotchouye, “A Review and Comparative Study of Firefly Algorithm and its Modified Versions”, in Optimization Algorithms- Methods and Applications, Intech, 2016, pp. 282313.

1/--страниц