Th02-1 / 207 Capacity-Efficient Restoration for Optical Networks Muriel MCdard ECE Department, U. of Illinois, CoordinatedScience Lab, 1308 K Main, Urbana,IL 61801 medardacomm.csl.uiuc.edu Steven S. Lumetta ECE Department, U. of Illinois, CoordinatedScience Lab, 1308 K Main, Urbana,IL 61801 firstname.lastname@example.org Yung-Ching Tseng ECE Department, U. ofIllinois, CoordinatedScience Lab, 1308 K Main, Urbana, IL 61801 email@example.com We present a method to provide link recovery for all links in a network without using all links for backup traffic transmission. The method extends generalized loopback to selectively eliminate backup links. The backup capacity on such links can be used to carry unprotected traffic, i.e., traffic that is not recovered in case of a failure, while the primary fibers on those links retain failure protection. Although we can provide recovery for traffic on primary fibers for any single link failure without using all links in a graph, reserving links for unprotected traffic does reduce a network’s ability to recover from multiple failures. We explore the tradeoff between capacity and robustness to two-link failures for several typical optical fiber networks. Our results demonstrate robustness comparable or superior to that available with covers of rings while providing an additional unprotected traffic capacity of roughly 20% of the network’s primary capacity. We consider preplanned link restoration, which replaces a failed link by selecting a backup route between the end nodes of the link. Preplanned restoration offers rapid failure recovery. The main approaches to preplanned link restoration use rings. Ring-based architectures may be more expensive than meshes, and are difficult to scale. Covering mesh topologies with rings provides both mesh topologies and ring-based restoration. One approach is to cover the network nodes by rings [ 5 ] . Thus, a portion of links are covered by rings. If primary routing is restricted to the covered links, link restoration can be effected on each ring. To allow all links to carry protected traffic, every link must be covered by a ring, for example by covering a network with rings so that every link is part of at least one ring . This approach suffers from capacity drawbacks: a link covered by two rings, for example, requires eight fibers. The logical fibers can be physically routed through four physical fibers, but only at the cost of significant network management overhead. A third approach to ring covers, intended to overcome the difficulties of the first two, is to cover every link with exactly two rings , each with two fibers. Such double-cycle covers are conjectured to exist for all networks, but are difficult to find. Double-cycle cover serves as the basis for comparison with our approach. Node U Node U Fig. 1. Generalized loopback using (a) all links for backup (b) only a subset of links. 208 / Th02-2 Fig. 2. Sample networks for our simulations. The dashed lines show non-critical links for a backup graph maximizing the number of non-critical links. Generalized loopback, a new approach to preplanned link restoration [4,61, provides loopback along a secondary graph in a manner akin to loopback in a SONET bi-directional line-switched ring (BLSR). We have four fibers per link. Each fiber carries traffic in a single direction, with two fibers carrying traffic in one direction and two in the opposite direction. We use primary to denote primary working traffic and secondary to denote backup traffic. Each fiber is represented by a directed arc and belongs to a digraph, either primary or secondary. On' each link, a fiber is associated with a single digraph and we have two primary working fibers, one in each direction, and two backup protection fibers, one in each direction. Figure 1.a shows two fibers-one primary working fiber and its backup protection fiber on each link of a simple network. The other primary fiber and secondary fiber are not shown. When the link [Y,XI fails, traffic from the primary digraph is placed upon the secondary digraph at Y.The secondary digraph carries this backup traffic from one endpoint of the failed link to the other, i.e., from Y to X . When the traffic reaches X , it is again placed on the primary fiber. The primary digraph does not restrict routings: a route is associated with a particular digraph on a link by link basis only. We use generalized loopback to provide recovery in a capacity-efficient manner in four-fiber networks. The crux of our discussion is illustrated in Figure 1. Figure 1.a shows generalized loopback in a mesh network. In the event of a failure, flooding occurs on the secondary digraph and the first arriving connection is selected. In Figure l.b, the network is the same as in Figure l.a, except for the addition of an extra link. If any link fails in Figure 1.b, loopback can be performed using the secondary digraph shown in Figure 1.a.. Thus, the extra link is a non-critical link, and the secondary fibers can be used to carry unprotected traffic. This extension to generalized loopback thus enables recovery from single link failures for the primary fibers in every link while providing unprotected capacity through the secondary fibers on some links. Intuitively, removing non-critical links reduces the likelihood of recovery from multiple failures. We use three sample networks, shown in Figure 2, to investigate this tradeoff. Figure 3.a reports the maximum backup path length over all primary arcs as a fiinction of the number of non-critical links carrying unprotected traffic rather than backup traffic. Figure 3 also reports three measures of recovery from two-link failures. A two-linkfailure consists of two independent link failures in a network graph. The second failure occurs long enough aRer the first to allow normal recovery to complete but before any repair can be accomplished. The first recovery measure, robustness, is the percentage of all two-link failures from which a network successfully recovers, restoring both failed links simultaneously through the intact portion of the backup graph. The second and third recovery measures capture a more local notion: the worst-case percentage of other links in a network that, should they fail, do not prevent recovery of a given (directed) primary arc. Two measures are used to capture the effect of time ordering on the failures. Recovery of the first failed link occurs before recovery of the second, and the backup path for the first link may occupy links necessary to restoration of the second. First-failure reliability is thus usually at least as high as second-failure reliability. Th02-3 / 209 I , , D I , , , # , , , 1 2 4 6 I 10 12 14 16 18 Nvrnbw d W o n C M L ! h m l n g Unpmlr*d 1°C (a) Maximum backup path length (e) ARPANET measures Fig. 3. (a) Maximum backup length vs. number of non-critical links carrying unprotected traffic and @)-(d) measures of robustness and reliability for our sample networks. Full lines in (b)-(d) indicate that our method outperforms couble cycle cover using all links. We use generalized loopback to redirect traffic from a failed primary arc through the backup digraph. First-failure reliability is only a function of the longest backup path. Robustness requires both that neither failed link fall on the backup path of the other and that the backup paths for the two failed arcs share no links. Second-failure reliability depends on similar conditions. The second failed arc is certainly restored when the robustness conditions are met, but can also be restored when the second failed link lies on the backup path of the first but does not use the first for backup. In this case, we assume that the network recognizes its inability to restore the first failed link and chooses to recover the second rather than occupying the backup digraph with the first link’s traffic. For double cycle cover, we chose a cover from [ 11 for ARPANET and a cover drawn from a planar embedding for NJ LATA. A double cycle cover restores two failed links if and only if the links do not reside on the same cycle. Robustness can be calculated by counting the number of link pairs that meet this criterion. Reliability is a simple function of the number of links in the longest cycle. Time ordering plays no role in reliability. Figures 3.byc,dshow robustness and reliability for our three sample graphs (except for LATA X, for which no double cycle cover is known). Our method outperforms double cycle cover (solid lines) as long as the number of non-critical links used to cany unprotected traffic is below roughly 20%of the total number of links. Double cycle cover outperforms our method (dashed lines) only for second-failure reliability for ARPANET, but that may be an artifact of the fact we did not optimize directions for all arcs in the digraphs. References 1. G. 2. 3. 4. 5. 6. Ellinas, “Fault Restoration in Optical Networks with Arbitrary Mesh Topologies,” at h t t p : / / m .ctz columbia.edu/-georg‘os/APS.html. G. Ellinas, T.E. Stern, “Automatic Protection Switching for Link Failures in Optical Networks with Bi-Directional Links,” Proc. of Globecom 1996, pp. 152-156. W.D. Grover, “Case Studies of Survivable Ring, Mesh and Mesh-Arc Hybrid Networks,” in Proceedings ofIEEE Globecom 1992, pp. 633-638. M. MBdard, S. G. Finn, R. A. Barry, “WDM Loop-back Recovery in Mesh Networks,” Proc. of INFOCOM 1999. O.J.Wasem, “An Algorithm for Designing Rings for Survivable Fiber Networks,” ZEEE Trans. on Reliability, vol. 40, no. 4, October 1991. S.G. Finn, M. MBdard, R.A. Barry, “A New Algorithm for Bi-directional Link Self-healing for Arbitrary Redundant Networks,”Proc. OFC ’98, ThJ4.