idnits 2.17.00 (12 Aug 2021) /tmp/idnits14007/draft-ietf-rtgwg-mrt-frr-architecture-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 406: '... fast-reroute MUST support Option A ...' RFC 2119 keyword, line 454: '...MRT fast-reroute SHOULD support Option...' RFC 2119 keyword, line 493: '...n the MRT Island MUST agree on a value...' RFC 2119 keyword, line 497: '...n the MRT Island MUST agree on a value...' Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 12, 2012) is 3722 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'R' is mentioned on line 256, but not defined == Missing Reference: 'F' is mentioned on line 872, but not defined == Missing Reference: 'C' is mentioned on line 256, but not defined == Missing Reference: 'I' is mentioned on line 286, but not defined == Missing Reference: 'ABR1' is mentioned on line 721, but not defined == Missing Reference: 'ABR2' is mentioned on line 607, but not defined == Missing Reference: 'A' is mentioned on line 711, but not defined == Missing Reference: 'H' is mentioned on line 872, but not defined == Missing Reference: 'E' is mentioned on line 872, but not defined == Missing Reference: 'G' is mentioned on line 872, but not defined == Outdated reference: A later version (-04) exists of draft-enyedi-rtgwg-mrt-frr-algorithm-01 ** Downref: Normative reference to an Informational draft: draft-enyedi-rtgwg-mrt-frr-algorithm (ref. 'I-D.enyedi-rtgwg-mrt-frr-algorithm') ** Downref: Normative reference to an Informational RFC: RFC 5714 == Outdated reference: A later version (-02) exists of draft-atlas-rtgwg-mrt-mc-arch-00 == Outdated reference: draft-ietf-mpls-ldp-multi-topology has been published as RFC 7307 == Outdated reference: draft-ietf-rtgwg-ipfrr-notvia-addresses has been published as RFC 6981 == Outdated reference: draft-ietf-rtgwg-lfa-applicability has been published as RFC 6571 == Outdated reference: draft-ietf-rtgwg-ordered-fib has been published as RFC 6976 Summary: 3 errors (**), 0 flaws (~~), 17 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Routing Area Working Group A. Atlas, Ed. 3 Internet-Draft R. Kebler 4 Intended status: Standards Track Juniper Networks 5 Expires: September 13, 2012 G. Enyedi 6 A. Csaszar 7 Ericsson 8 M. Konstantynowicz 9 R. White 10 Cisco Systems 11 M. Shand 12 March 12, 2012 14 An Architecture for IP/LDP Fast-Reroute Using Maximally Redundant Trees 15 draft-ietf-rtgwg-mrt-frr-architecture-01 17 Abstract 19 As IP and LDP Fast-Reroute are increasingly deployed, the coverage 20 limitations of Loop-Free Alternates are seen as a problem that 21 requires a straightforward and consistent solution for IP and LDP, 22 for unicast and multicast. This draft describes an architecture 23 based on redundant backup trees where a single failure can cut a 24 point-of-local-repair from the destination only on one of the pair of 25 redundant trees. 27 One innovative algorithm to compute such topologies is maximally 28 disjoint backup trees. Each router can compute its next-hops for 29 each pair of maximally disjoint trees rooted at each node in the IGP 30 area with computational complexity similar to that required by 31 Dijkstra. 33 The additional state, address and computation requirements are 34 believed to be significantly less than the Not-Via architecture 35 requires. 37 Status of this Memo 39 This Internet-Draft is submitted in full conformance with the 40 provisions of BCP 78 and BCP 79. 42 Internet-Drafts are working documents of the Internet Engineering 43 Task Force (IETF). Note that other groups may also distribute 44 working documents as Internet-Drafts. The list of current Internet- 45 Drafts is at http://datatracker.ietf.org/drafts/current/. 47 Internet-Drafts are draft documents valid for a maximum of six months 48 and may be updated, replaced, or obsoleted by other documents at any 49 time. It is inappropriate to use Internet-Drafts as reference 50 material or to cite them other than as "work in progress." 52 This Internet-Draft will expire on September 13, 2012. 54 Copyright Notice 56 Copyright (c) 2012 IETF Trust and the persons identified as the 57 document authors. All rights reserved. 59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (http://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document. Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License. 69 Table of Contents 71 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 72 1.1. Goals for Extending IP Fast-Reroute coverage beyond LFA . 4 73 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 74 3. Maximally Redundant Trees (MRT) . . . . . . . . . . . . . . . 6 75 4. Maximally Redundant Trees (MRT) and Fast-Reroute . . . . . . . 8 76 5. Unicast Forwarding with MRT Fast-Reroute . . . . . . . . . . . 9 77 5.1. LDP Unicast Forwarding - Avoid Tunneling . . . . . . . . . 10 78 5.2. IP Unicast Traffic . . . . . . . . . . . . . . . . . . . . 10 79 6. Protocol Extensions and Considerations: OSPF and ISIS . . . . 12 80 7. Multi-homed Prefixes . . . . . . . . . . . . . . . . . . . . . 14 81 8. Inter-Area and ABR Forwarding Behavior . . . . . . . . . . . . 15 82 9. Issues with Area Abstraction . . . . . . . . . . . . . . . . . 18 83 10. Partial Deployment and Islands of Compatible MRT FRR 84 routers . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 85 11. Network Convergence and Preparing for the Next Failure . . . . 21 86 11.1. Micro-forwarding loop prevention and MRTs . . . . . . . . 21 87 11.2. MRT Recalculation . . . . . . . . . . . . . . . . . . . . 22 88 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 89 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 90 14. Security Considerations . . . . . . . . . . . . . . . . . . . 23 91 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 92 15.1. Normative References . . . . . . . . . . . . . . . . . . . 23 93 15.2. Informative References . . . . . . . . . . . . . . . . . . 23 94 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24 96 1. Introduction 98 There is still work required to completely provide IP and LDP Fast- 99 Reroute[RFC5714] for unicast and multicast traffic. This draft 100 proposes an architecture to provide 100% coverage for unicast 101 traffic. The associated multicast architecture is described in 102 [I-D.atlas-rtgwg-mrt-mc-arch]. 104 Loop-free alternates (LFAs)[RFC5286] provide a useful mechanism for 105 link and node protection but getting complete coverage is quite hard. 106 [LFARevisited] defines sufficient conditions to determine if a 107 network provides link-protecting LFAs and also proves that augmenting 108 a network to provide better coverage is NP-hard. 109 [I-D.ietf-rtgwg-lfa-applicability] discusses the applicability of LFA 110 to different topologies with a focus on common PoP architectures. 112 While Not-Via [I-D.ietf-rtgwg-ipfrr-notvia-addresses] is defined as 113 an architecture, in practice, it has proved too complicated and 114 stateful to spark substantial interest in implementation or 115 deployment. Academic implementations [LightweightNotVia] exist and 116 have found the address management complexity high (but no 117 standardization has been done to reduce this). 119 A different approach is needed and that is what is described here. 120 It is based on the idea of using disjoint backup topologies as 121 realized by Maximally Redundant Trees (described in 122 [LightweightNotVia]); the general architecture can also apply to 123 future improved redundant tree algorithms. 125 1.1. Goals for Extending IP Fast-Reroute coverage beyond LFA 127 Any scheme proposed for extending IPFRR network topology coverage 128 beyond LFA, apart from attaining basic IPFRR properties, should also 129 aim to achieve the following usability goals: 131 o ensure maximum physically feasible link and node disjointness 132 regardless of topology, 134 o automatically compute backup next-hops based on the topology 135 information distributed by link-state IGP, 137 o do not require any signaling in the case of failure and use pre- 138 programmed backup next-hops for forwarding, 140 o introduce minimal amount of additional addressing and state on 141 routers, 143 o enable gradual introduction of the new scheme and backward 144 compatibility, 146 o and do not impose requirements for external computation. 148 2. Terminology 150 2-connected: A graph that has no cut-vertices. This is a graph 151 that requires two nodes to be removed before the network is 152 partitioned. 154 2-connected cluster: A maximal set of nodes that are 2-connected. 156 2-edge-connected: A network graph where at least two links must be 157 removed to partition the network. 159 ADAG: Almost Directed Acyclic Graph - a graph that, if all links 160 incoming to the root were removed, would be a DAG. 162 block: Either a 2-connected cluster, a cut-edge, or an isolated 163 vertex. 165 cut-link: A link whose removal partitions the network. A cut-link 166 by definition must be connected between two cut-vertices. If 167 there are multiple parallel links, then they are referred to as 168 cut-links in this document if removing the set of parallel links 169 would partition the network. 171 cut-vertex: A vertex whose removal partitions the network. 173 DAG: Directed Acyclic Graph - a graph where all links are directed 174 and there are no cycles in it. 176 GADAG: Generalized ADAG - a graph that is the combination of the 177 ADAGs of all blocks. 179 Maximally Redundant Trees (MRT): A pair of trees where the path 180 from any node X to the root R along the first tree and the path 181 from the same node X to the root along the second tree share the 182 minimum number of nodes and the minimum number of links. Each 183 such shared node is a cut-vertex. Any shared links are cut-links. 184 Any RT is an MRT but many MRTs are not RTs. 186 network graph: A graph that reflects the network topology where all 187 links connect exactly two nodes and broadcast links have been 188 transformed into the standard pseudo-node representation. 190 Redundant Trees (RT): A pair of trees where the path from any node 191 X to the root R along the first tree is node-disjoint with the 192 path from the same node X to the root along the second tree. 193 These can be computed in 2-connected graphs. 195 3. Maximally Redundant Trees (MRT) 197 In the last few years, there's been substantial research on how to 198 compute and use redundant trees. Redundant trees are directed 199 spanning trees that provide disjoint paths towards their common root. 200 These redundant trees only exist and provide link protection if the 201 network is 2-edge-connected and node protection if the network is 202 2-connected. Such connectiveness may not be the case in real 203 networks, either due to architecture or due to a previous failure. 204 The work on maximally redundant trees has added two useful pieces 205 that make them ready for use in a real network. 207 o Computable regardless of network topology: The maximally redundant 208 trees are computed so that only the cut-edges or cut-vertices are 209 shared between the multiple trees. 211 o Computationally practical algorithm is based on a common network 212 topology database. Algorithm variants can compute in O( e) or O(e 213 + n log n), as given in [I-D.enyedi-rtgwg-mrt-frr-algorithm]. 215 There is, of course, significantly more in the literature related to 216 redundant trees and even fast-reroute, but the formulation of the 217 Maximally Redundant Trees (MRT) algorithm makes it very well suited 218 to use in routers. 220 A known disadvantage of MRT, and redundant trees in general, is that 221 the trees do not necessarily provide shortest detour paths. The use 222 of the shortest-path-first algorithm in tree-building and including 223 all links in the network as possibilities for one path or another 224 should improve this. Modeling is underway to investigate and compare 225 the MRT alternates to the optimal 226 [I-D.enyedi-rtgwg-mrt-frr-algorithm]. Providing shortest detour 227 paths would require failure-specific detour paths to the 228 destinations, but the state-reduction advantage of MRT lies in the 229 detour being established per destination (root) instead of per 230 destination AND per failure. 232 The specific algorithms to compute MRTs as well as the logic behind 233 that algorithm and alternative computational approaches are given in 234 detail in [I-D.enyedi-rtgwg-mrt-frr-algorithm]. Those interested are 235 highly recommended to read that document. This document describes 236 how the MRTs can be used and not how to compute them. 238 The most important thing to understand about MRTs is that for each 239 pair of destination-routed MRTs, there is a path from every node X to 240 the destination D on the Blue MRT that is as disjoint as possible 241 from the path on the Red MRT. The two paths along the two MRTs to a 242 given destination-root of a 2-connected graph are node-disjoint and 243 link-disjoint, while in any non-2-connected graph, only the cut- 244 vertices and cut-edges can be contained by both of the paths. 246 For example, in Figure 1, there is a network graph that is 247 2-connected in (a) and associated MRTs in (b) and (c). One can 248 consider the paths from B to R; on the Blue MRT, the paths are 249 B->F->D->E->R or B->C->D->E->R. On the Red MRT, the path is B->A->R. 250 These are clearly link and node-disjoint. These MRTs are redundant 251 trees because the paths are disjoint. 253 [E]---[D]---| [E]<--[D]<--| [E]-->[D]---| 254 | | | | ^ | | | 255 | | | V | | V V 256 [R] [F] [C] [R] [F] [C] [R] [F] [C] 257 | | | ^ ^ ^ | | 258 | | | | | | V | 259 [A]---[B]---| [A]-->[B]---| [A]---[B]<--| 261 (a) (b) (c) 262 a 2-connected graph Blue MRT towards R Red MRT towards R 264 Figure 1: A 2-connected Network 266 By contrast, in Figure 2, the network in (a) is not 2-connected. If 267 F, G or the link F<->G failed, then the network would be partitioned. 268 It is clearly impossible to have two link-disjoint or node-disjoint 269 paths from G, I or J to R. The MRTs given in (b) and (c) offer paths 270 that are as disjoint as possible. For instance, the paths from B to 271 R are the same as in Figure 1 and the path from G to R on the Blue 272 MRT is G->F->D->E->R and on the Red MRT is G->F->B->A->R. 274 [E]---[D]---| 275 | | | |----[I] 276 | | | | | 277 [R]---[C] [F]---[G] | 278 | | | | | 279 | | | |----[J] 280 [A]---[B]---| 282 (a) 283 a non-2-connected graph 285 [E]<--[D]<--| [E]-->[D]---| 286 | ^ | [I] | | [I] 287 V | | ^ V V | 288 [R]<--[C] [F]<--[G] | [R]---[C] [F]<--[G] | 289 ^ ^ | | ^ | | ^ V 290 | | |--->[J] | V | |----[J] 291 [A]-->[B]---| [A]<--[B]<--| 293 (b) (c) 294 Blue MRT towards R Red MRT towards R 296 Figure 2: A non-2-connected network 298 4. Maximally Redundant Trees (MRT) and Fast-Reroute 300 In normal IGP routing, each router has its shortest-path-tree to all 301 destinations. From the perspective of a particular destination, D, 302 this looks like a reverse SPT (rSPT). To use maximally redundant 303 trees, in addition, each destination D has two MRTs associated with 304 it; by convention these will be called the blue and red MRTs. 306 Any IP/LDP fast-reroute technique beyond LFA requires an additional 307 dataplane procedure, such as an additional forwarding mechanism. The 308 well-known options are tunneling (e.g. 309 [I-D.ietf-rtgwg-ipfrr-notvia-addresses]), per-interface forwarding 310 (e.g. Loop-Free Failure Insensitive Routing in [EnyediThesis]), and 311 multi-topology forwarding. MRT is realized by using multi-topology 312 forwarding. There is a Blue MRT forwarding topology and a Red MRT 313 forwarding topology. 315 MRTs are practical to maintain redundancy even after a single link or 316 node failure. If a pair of MRTs is computed rooted at each 317 destination, all the destinations remain reachable along one of the 318 MRTs in the case of a single link or node failure. 320 When there is a link or node failure affecting the rSPT, each node 321 will still have at least one path via one of the MRTs to reach the 322 destination D. For example, in Figure 2, C would normally forward 323 traffic to R across the C<->R link. If that C<->R link fails, then C 324 could use either the Blue MRT path C->D->E->R or the Red MRT path 325 C->B->A->R. 327 As is always the case with fast-reroute technologies, forwarding does 328 not change until a local failure is detected. Packets are forwarded 329 along the shortest path. The appropriate alternate to use is pre- 330 computed. [I-D.enyedi-rtgwg-mrt-frr-algorithm] describes exactly how 331 to determine whether the Blue MRT next-hops or the Red MRT next-hops 332 should be the MRT alternate next-hops for a particular primary next- 333 hop N to a particular destination D. 335 MRT alternates are always available to use, unless the network has 336 been partitioned. It is a local decision whether to use an MRT 337 alternate, a Loop-Free Alternate or some other type of alternate. 338 When a network needs to use a micro-loop prevention mechanism 339 [RFC5715] such as Ordered FIB[I-D.ietf-rtgwg-ordered-fib] or Farside 340 Tunneling[RFC5715], then the whole IGP area needs to have alternates 341 available so that the micro-loop prevention mechanism, which requires 342 slower network convergence, can take the necessary time without 343 impacting traffic badly. 345 As described in [RFC5286], when a worse failure than is anticipated 346 happens, using LFAs that are not downstream neighbors can cause 347 micro-looping. An example is given of link-protecting alternates 348 causing a loop on node failure. Even if a worse failure than 349 anticipated happened, the use of MRT alternates will not cause 350 looping. Therefore, while node-protecting LFAs may be prefered, an 351 advantage to using MRT alternates when such a node-protecting LFA is 352 not a downstream path is the certainty that no alternate-induced 353 looping will occur. 355 5. Unicast Forwarding with MRT Fast-Reroute 357 With LFA, there is no need to tunnel unicast traffic, whether IP or 358 LDP. The traffic is simply sent to an alternate. As mentioned 359 earlier in Section 4, MRT needs multi-topology forwarding. 360 Unfortunately, neither IP nor LDP provide extra bits for a packet to 361 indicate its topology. 363 Once the MRTs are computed, the two sets of MRTs are seen by the 364 forwarding plane as essentially two additional topologies. The same 365 considerations apply for forwarding along the MRTs as for handling 366 multiple topologies. 368 5.1. LDP Unicast Forwarding - Avoid Tunneling 370 For LDP, it is very desirable to avoid tunneling because, for at 371 least node protection, tunneling requires knowledge of remote LDP 372 label mappings and thus requires targeted LDP sessions and the 373 associated management complexity. There are two different mechanisms 374 that can be used. 376 1. Option A - Encode MT-ID in Labels: In addition to sending a 377 single label for a FEC, a router would provide two additional 378 labels with the MT-IDs associated with the Blue MRT or Red MRT 379 forwarding topologies. This is very simple for hardware support. 380 It does reduce the label space for other uses. It also increases 381 the memory to store the labels and the communication required by 382 LDP. 384 2. Option B - Create Topology-Identification Labels: Use the label- 385 stacking ability of MPLS and specify only two additional labels - 386 one for each associated MRT color - by a new FEC type. When 387 sending a packet onto an MRT, first swap the LDP label and then 388 push the topology-identification label for that MRT color. When 389 receiving a packet with a topology-identification label, pop it 390 and use it to guide the next-hop selection in combination with 391 the next label in the stack; then swap the remaining label, if 392 appropriate, and push the topology-identification label for the 393 next-hop. This has minimal usage of additional labels, memory 394 and LDP communication. It does increase the size of packets and 395 the complexity of the required label operations and look-ups. 396 This can use the same mechanisms as are needed for context-aware 397 label spaces. 399 Note that with LDP unicast forwarding, regardless of whether 400 topology-identification label or encoding topology in label is used, 401 no additional loopbacks per router are required. This is because LDP 402 labels are used on a hop-by-hop basis to identify MRT-blue and MRT- 403 red forwading topologies. 405 For greatest hardware compatibility, routers implementing MRT LDP 406 fast-reroute MUST support Option A of encoding the MT-ID in the 407 labels. The extensions to indicate an MT-ID for a FEC are described 408 in Section 3.2.1 of [I-D.ietf-mpls-ldp-multi-topology] 410 5.2. IP Unicast Traffic 412 For IP, there is no currently practical alternative except tunneling. 413 The tunnel egress could be the original destination in the area, the 414 next-next-hop, etc.. If the tunnel egress is the original 415 destination router, then the traffic remains on the redundant tree 416 with sub-optimal routing. If the tunnel egress is the next-next-hop, 417 then protection of multi-homed prefixes and node-failure for ABRs is 418 not available. Selection of the tunnel egress is a router-local 419 decision. 421 There are three options available for marking IP packets with which 422 MRT it should be forwarded in. 424 1. Tunnel IP packets via an LDP LSP. This has the advantage that 425 more installed routers can do line-rate encapsulation and 426 decapsulation. Also, no additional IP addresses would need to be 427 allocated or signaled. 429 A. Option A - LDP Destination-Topology Label: Use a label that 430 indicates both destination and MRT. This method allows easy 431 tunneling to the next-next-hop as well as to the IGP-area 432 destination. For multi-homed prefixes, this requires that 433 additional labels be advertised for each proxy-node. 435 B. Option B - LDP Topology Label: Use a Topology-Identifier 436 label on top of the IP packet. This is very simple and 437 doesn't require additional labels for proxy-nodes. If 438 tunneling to a next-next-hop is desired, then a two-deep 439 label stack can be used with [ Topology-ID label, Next-Next- 440 Hop Label ]. 442 2. Tunnel IP packets in IP. Each router supporting this option 443 would announce two additional loopback addresses and their 444 associated MRT color. Those addresses are used as destination 445 addresses for MRT-blue and MRT-red IP tunnels respectively. They 446 allow the transit nodes to identify the traffic as being 447 forwarded along either MRT-blue or MRT-red tree topology to reach 448 the tunnel destination. Announcements of these two additional 449 loopback addresses per router with their MRT color requires IGP 450 extensions. 452 For greatest hardware compatibility and ease in removing the MRT- 453 topology marking at area/level boundaries, routers that support MPLS 454 and implement IP MRT fast-reroute SHOULD support Option A - using an 455 LDP label that indicates the destination and MT-ID. 457 For proxy-nodes associated with one or more multi-homed prefixes, 458 there is no router associated with the proxy-node, so its loopbacks 459 can't be known or used. Instead, the loopback addresses of the two 460 routers that are attached to the proxy-node can be used. One of 461 those routers will be on the Red MRT and the other on the Blue MRT. 462 The MRT-red loopback of the first router would be used to reach the 463 router on the Red MRT and similarly the MRT-blue loopback of the 464 second router would be used. The routers connected to the proxy-node 465 are the end of the area/level and can decapsulate the traffic and 466 properly forward it into the next area. 468 6. Protocol Extensions and Considerations: OSPF and ISIS 470 This captures an initial understanding of what may need to be 471 specified. In cases of partial deployment, it is necessary for a 472 router to determine a consistent set of routers to include in the 473 island of MRT support. To facilitate this, each router can announce 474 both what its capabilities are and what it requires from other 475 routers to add them to the MRT island. Generally, there will be a 476 set of information advertised about the MRT support. This 477 information has only area/level-wide scope. 479 MRT Island Creation ID: This identifies the process that the router 480 uses to form an MRT Island. By advertising an ID for the process, 481 it is possible to have different processes in the future. It may 482 be desirable to advertise a list ordered by preference to allow 483 transitions. 485 MRT Algorithm ID: This identifies the particular MRT algorithm used 486 by the router. By having an Algorithm ID, it is possible to 487 change the algorithm used or use different ones in different 488 networks. It may be desirable to advertise a list ordered by 489 preference to allow transitions. 491 Red MRT MT-ID: This specifies the MT-ID to be associated with the 492 Red MRT forwarding topology. It is needed for use in signaling. 493 All routers in the MRT Island MUST agree on a value. 495 Blue MRT MT-ID: This specifies the MT-ID to be associated with the 496 Blue MRT forwarding topology. It is needed for use in signaling. 497 All routers in the MRT Island MUST agree on a value. 499 GADAG Root Election Priority: This specifies the priority of the 500 router for being used as the GADAG root of its island. A GADAG 501 root is elected from the set of routers with the highest priority; 502 ties are broken based upon highest Router ID. The sensitivity of 503 the MRT Algorithms to GADAG root selection is still being 504 evaluated. This provides the network operator with a knob to 505 force particular GADAG root selection. 507 Forwarding Mechanism for IP: This specifies which forwarding 508 mechanisms the router supports for IP traffic. An MRT island must 509 support a common set of forwarding mechanisms, which may be less 510 than the full set advertised. Multiple forwarding mechanisms may 511 be specified, such as IP-in-IPv4, IP-in-IPv6 or IP-in-LDP- 512 Destination-Topology Label. None is also an option. 514 Forwarding Mechanism for LDP: This specifies which forwarding 515 mechanisms the router supports for LDP traffic. An MRT island 516 must support a common set of forwarding mechanisms, which may be 517 less than the full set advertised. The expected mechanisms are 518 "Encode MT-ID in Labels" or None. 520 Red MRT Loopback Address: This provides the router's loopback 521 address to reach the router via the Red MRT forwarding topology. 522 It can, of course, be specified for both IPv4 and IPv6. 524 Blue MRT Loopback Address: This provides the router's loopback 525 address to reach the router via the Blue MRT forwarding topology. 526 It can, of course, be specified for both IPv4 and IPv6. 528 MRT Capabilities Available: This is the set of capabilities that 529 the router is configured to support. 531 MRT Capabilities Required: This is the set of capabilities that 532 other routers must have available to be added into the MRT island. 534 MRT Capability: Computes MRTs: The router can compute MRTs. 536 MRT Capability: IP Fast-Reroute: The router can use the computed 537 MRTs for IP fast-reroute. 539 MRT Capability: LDP Fast-Reroute: The router can use the computed 540 MRTs for LDP fast-reroute. 542 MRT Capability: PIM Fast-Reroute: The router can use the computed 543 MRTs for PIM fast-reroute. 545 MRT Capability: mLDP Fast-Reroute: The router can use the computed 546 MRTs for mLDP fast-reroute. 548 MRT Capability: PIM Global Protection: The router can use the 549 computed MRTs for PIM Global Protection 1+1. 551 MRT Capability: mLDP Global Protection: The router can use the 552 computed MRTs for mLDP Global Protection 1+1. 554 The assumption is that a router will form 1 MRT island, compute MRTs 555 within that island, and then use those MRTs for the different 556 purposes. Including a router that, for instance, doesn't support 557 mLDP Global Protection would mean that the whole MRT island could not 558 support it. In a fully deployed case, of course, the whole area/ 559 level would support MRT and the complexities of MRT island formation 560 would be minimal. 562 If a router wanted to form multiple MRT islands for different 563 application purposes, that could be done by specifying different Red 564 MRT MT-ID and Blue MRT MT-IDs. 566 As with LFA, it is expected that OSPF Virtual Links will not be 567 supported. 569 7. Multi-homed Prefixes 571 One advantage of LFAs that is necessary to preserve is the ability to 572 protect multi-homed prefixes against ABR failure. For instance, if a 573 prefix from the backbone is available via both ABR A and ABR B, if A 574 fails, then the traffic should be redirected to B. This can also be 575 done for backups via MRT. 577 This generalizes to any multi-homed prefix. A multi-homed prefix 578 could be: 580 o An out-of-area prefix announced by more than one ABR, 582 o An AS-External route announced by 2 or more ASBRs, 584 o A prefix with iBGP multipath to different ASBRs, 586 o etc. 588 For each prefix, the two lowest total cost ABRs are selected and a 589 proxy-node is created connected to those two ABRs. If there exist 590 multiple multi-homed prefixes that share the same two best 591 connectivity, then a single proxy-node can be used to represent the 592 set. An example of this is shown in Figure 3. 594 2 2 2 2 595 A----B----C A----B----C 596 2 | | 2 2 | | 2 597 | | | | 598 [ABR1] [ABR2] [ABR1] [ABR2] 599 | | | | 600 p,10 p,15 10 |---[P]---| 15 602 (a) Initial topology (b)with proxy-node 604 A<---B<---C A--->B--->C 605 | ^ ^ | 606 V | | V 607 [ABR1] [ABR2] [ABR1] [ABR2] 608 | | 609 |-->[P] [P]<--| 611 (c) Blue MRT (d) Red MRT 613 Figure 3: Prefixes Advertised by Multiple ABRs 615 The proxy-nodes and associated links are added to the network 616 topology after all real links have been assigned to a direction and 617 before the actual MRTs are computed. Proxy-nodes cannot be transited 618 when computing the MRTs. In addition to computing the pair of MRTs 619 associated with each router destination D in the area, a pair of MRTs 620 can be computed for each such proxy-node to fully protect against ABR 621 failure. 623 Each ABR or attaching router must remove the MRT marking[see 624 Section 5] and then forward the traffic outside of the area (or 625 island of MRT-fast-reroute-supporting routers). 627 If ASBR protection is desired, this has additonal complexities if the 628 ASBRs are in different areas. Similarly, protecting labeled BGP 629 traffic in the event of an ASBR failure has additional complexities 630 due to the per-ASBR label spaces involved. 632 8. Inter-Area and ABR Forwarding Behavior 634 In regular forwarding, packets destined outside the area arrive at 635 the ABR and the ABR forwards them into the other area because the 636 next-hops from the area with the best route (according to tie- 637 breaking rules) are used by the ABR. The question is then what to do 638 with packets marked with an MRT that are received by the ABR. 640 For unicast fast-reroute, the need to stay on an MRT forwarding 641 topology terminates at the ABR/LBR whose best route is via a 642 different area/level. It is highly desirable to go back to the 643 default forwarding topology when leaving an area/level. There are 644 three basic reasons for this. First, the default topology uses 645 shortest paths; the packet will thus take the shortest possible route 646 to the destination. Second, this allows failures that might appear 647 in multiple areas (e.g. ABR/LBR failures) to be separately 648 identified and repaired around. Third, the packet can be fast- 649 rerouted again, if necessary, due to a failure in a different area. 651 An ABR/LBR that receives a packet marked with an MRT towards a 652 destination in another area should forward the MRT marked packet in 653 the area with the best route along its associated MRT. If the packet 654 came from that area, this correctly avoids the failure. 656 How does an ABR/LBR ensure that MRT-marked packets do not arrive at 657 the ABR/LBR? There are two different mechanisms depending upon the 658 forwarding mechanism being used. 660 If the LDP label encodes the MT-ID as well as the destination, then 661 the ABR/LBR is responsible for advertising a particular label to each 662 neighbor. Additionally, an LDP label is associated with an MT-ID due 663 to the MT FEC that was used and not due to any intrisic particular 664 value for the label. Assume that an ABR/LBR has allocated three 665 labels for a particular destination; those labels are L_primary, 666 L_blue, and L_red. When the ABR/LBR advertises label bindings to 667 routers in the area with the best route to the destination, the ABR/ 668 LBR provides L_primary for the default topology, L_blue for the Blue 669 MRT MT-ID and L_red for the Red MRT MT-ID, exactly as expected. 670 However, when the ABR/LBR advertises label bindings to routers in 671 other areas, the ABR/LBR advertises L_primary for the default 672 topology, for the Blue MRT MT-ID, and for the Red MRT MT-ID. The 673 ABR/LBR installs next-hops from the best area for L_primary based on 674 the default topology, for L_blue based on the Blue MRT forwarding 675 topology, and for L_red based on the Red MRT forwarding topology. 676 Therefore, packets from the non-best area will arrive at the ABR/LBR 677 with a label L_primary and will be forwarded into the best area along 678 the default topology. By controlling what labels are advertised, the 679 ABR/LBR can thus enforce that packets exiting the area do so on the 680 shortest-path default topology. 682 If IP-in-IP forwarding is used, then the ABR/LBR behavior is 683 dependent upon the outermost IP address. If the outermost IP address 684 is an MRT loopback address of the ABR/LBR, then the packet is 685 decapsulated and forwarded based upon the inner IP address, which 686 should go on the default SPT topology. If the outermost IP address 687 is not an MRT loopback address of the ABR/LBR, then the packet is 688 simply forwarded along the associated forwarding topology. A PLR 689 sending traffic to a destination outside its local area/level will 690 pick the MRT and use the associated MRT loopback address of the ABR/ 691 LBR immediately before the proxy-node on that MRT. 693 Thus, regardless of which of these two forwarding mechanisms are 694 used, there is no need for additional computation or per-area 695 forwarding state. 697 +----[C]---- --[D]--[E] --[D]--[E] 698 | \ / \ / \ 699 p--[A] Area 10 [ABR1] Area 0 [H]--p +-[ABR1] Area 0 [H]-+ 700 | / \ / | \ / | 701 +----[B]---- --[F]--[G] | --[F]--[G] | 702 | | 703 | other | 704 +----------[p]-------+ 705 area 707 (a) Example topology (b) Proxy node view in Area 0 nodes 709 +----[C]<--- [D]->[E] 710 V \ \ 711 +-[A] Area 10 [ABR1] Area 0 [H]-+ 712 | ^ / / | 713 | +----[B]<--- [F]->[G] V 714 | | 715 +------------->[p]<--------------+ 717 (c) rSPT towards destination p 719 ->[D]->[E] -<[D]<-[E] 720 / \ / \ 721 [ABR1] Area 0 [H]-+ +-[ABR1] [H] 722 / | | \ 723 [F]->[G] V V -<[F]<-[G] 724 | | 725 | | 726 [p]<------+ +--------->[p] 728 (d) Blue MRT in Area 0 (e) Red MRT in Area 0 730 Figure 4: ABR Forwarding Behavior and MRTs 732 The other potential forwarding mechanisms require additional 733 computation by the penultimate router along the in-local-area MRT 734 immediately before the ABR/LBR is reached. The penultimate router 735 can determine that the ABR/LBR will forward the packet out of area/ 736 level and, in that case, the penultimate router can remove the MRT 737 marking but still forward the packet along the MRT next-hop to reach 738 the ABR. For instance, in Figure 4, if node H fails, node E has to 739 put traffic towards prefix p onto the red MRT. But since node D 740 knows that ABR1 will use a best from another area, it is safe for D 741 to remove the MRT marking and just send the packet to ABR1 still on 742 the red MRT but unmarked. ABR1 will use the shortest path in Area 743 10. 745 In all cases for ISIS and most cases for OSPF, the penultimate router 746 can determine what decision the adjacent ABR will make. The one case 747 where it can't be determined is when two ASBRs are in different non- 748 backbone areas attached to the same ABR, then the ASBR's Area ID may 749 be needed for tie-breaking (prefer the route with the largest OPSF 750 area ID) and the Area ID isn't announced as part of the ASBR link- 751 state advertisement (LSA). In this one case, suboptimal forwarding 752 along the MRT in the other area would happen. If this is a realistic 753 deployment scenario, OSPF extensions could be considered. 755 9. Issues with Area Abstraction 757 MRT fast-reroute provides complete coverage in a area that is 758 2-connected. Where a failure would partition the network, of course, 759 no alternate can protect against that failure. Similarly, there are 760 ways of connecting multi-homed prefixes that make it impractical to 761 protect them without excessive complexity. 763 50 764 |----[ASBR Y]---[B]---[ABR 2]---[C] Backbone Area 0: 765 | | ABR 1, ABR 2, C, D 766 | | 767 | | Area 20: A, ASBR X 768 | | 769 p ---[ASBR X]---[A]---[ABR 1]---[D] Area 10: B, ASBR Y 770 5 p is a Type 1 AS-external 772 Figure 5: AS external prefixes in different areas 774 Consider the network in Figure 5 and assume there is a richer 775 connective topology that isn't shown, where the same prefix is 776 announced by ASBR X and ASBR Y which are in different non-backbone 777 areas. If the link from A to ASBR X fails, then an MRT alternate 778 could forward the packet to ABR 1 and ABR 1 could forward it to D, 779 but then D would find the shortest route is back via ABR 1 to Area 780 20. The only real way to get it from A to ASBR Y is to explicitly 781 tunnel it to ASBR Y. 783 Tunnelling to the backup ASBR is for future consideration. The 784 previously proposed PHP approach needs to have an exception if BGP 785 policies (e.g. BGP local preference) determines which ASBR to use. 786 Consider the case in Figure 6. If the link between A and ASBR X (the 787 preferred border router) fails, A can put the packets to p onto an 788 MRT alternate, even tunnel it towards ASBR Y. Node B, however, must 789 not remove the MRT marking in this case, as nodes in Area 0, 790 including ASBR Y itself would not know that their preferred ASBR is 791 down. 793 Area 20 BB Area 0 794 p ---[ASBR X]-X-[A]---[B]---[ABR 1]---[D]---[ASBR Y]--- p 796 BGP prefers ASBR X for prefix p 798 Figure 6: Failure of path towards ASBR preferred by BGP 800 The fine details of how to solve multi-area external prefix cases, or 801 identifying certain cases as too unlikely and too complex to protect 802 is for further consideration. 804 10. Partial Deployment and Islands of Compatible MRT FRR routers 806 A natural concern with new functionality is how to have it be useful 807 when it is not deployed across an entire IGP area. In the case of 808 MRT FRR, where it provides alternates when appropriate LFAs aren't 809 available, there are also deployment scenarios where it may make 810 sense to only enable some routers in an area with MRT FRR. A simple 811 example of such a scenario would be a ring of 6 or more routers that 812 is connected via two routers to the rest of the area. 814 First, a computing router S must determine its local island of 815 compatible MRT fast-reroute routers. A router that has common 816 forwarding mechanisms and common algorithm and is connected to either 817 to S or to another router already determined to be in S's local 818 island can be added to S's local island. 820 Destinations inside the local island can obviously use MRT 821 alternates. Destinations outside the local island can be treated 822 like a multi-homed prefix with caveats to avoid looping. For LDP 823 labels including both destination and topology, the routers at the 824 borders of the local island need to originate labels for the original 825 FEC and the associated MRT-specific labels. Packets sent to an LDP 826 label marked as blue or red MRT to a destination outside the local 827 island will have the last router in the local island swap the label 828 to one for the destination and forward the packet along the outgoing 829 interface on the MRT towards a router outside the local island that 830 was represented by the proxy-node. 832 For IP in IP encapsulations, remote destinations' loopback addresses 833 for the MRTs cannot be used, even if they were available. Instead, 834 the MRT loopback address of the router attached to a proxy-node, 835 which represents destinations outside the local island, can be used. 836 Packets sent to the router's MRT loopback address will have their 837 outer IP header removed and will need to be explicitly forwarded 838 along the outgoing interface on the MRT towards a router outside the 839 local island that was represented by the proxy-node. This behavior 840 requires essentially remembering the MT-ID indicated by the outer IP 841 address. An alternate option would be to advertise different 842 loopback addresses to be associated with the proxy-node; the outer IP 843 address would still be removed but it would indicate the outgoing 844 interface to use and no lookup would be necessary on the internal IP 845 address while maintaining MT-ID context. 847 A key question is which routers outside the MRT island can packets be 848 forwarded to so that they are not forwarded back into the MRT island. 849 An example of the necessary network graph transformations are given 850 in Figure 7. There are two parts to the computation. First, the MRT 851 island is collapsed into a single node; this assumes that the cost of 852 transiting the MRT island is nothing and is pessimistic but allows 853 for simpler computation. Then, for each destination (other than the 854 MRT island), the routers adjacent to the MRT island are checked to 855 see if they are loop-free with respect to the MRT island and the 856 destination. The two loop-free neighbors of the MRT island that are 857 closest to the destination are selected. Then, a graph of just the 858 MRT island is augmented with proxy-nodes that are attached via the 859 outgoing interfaces to the selected loop-free neighbors. Finally, 860 the MRTs rooted at each proxy-node are computed on that augmented MRT 861 island graph. Essentially, the MRT island must have a loop-free 862 neighbor to be able to have an alternate. 864 [G]---[E]---(B)---(C)---(D) 865 | \ | | | 866 | \ | | | 867 | \ | | | 868 [H]---[F]---(A)---(S)----| 870 (1) Network Graph with Partial Deployment 872 [E],[F],[G],[H] : No support for MRT-FRR 873 (A),(B),(C),(D),(S): MRT Island - supports MRT-FRR 875 [G]---[E]----| |---(B)---(C)---(D) 876 | \ | | | | | 877 | \ | ( MRT Island ) [ proxy ] | | 878 | \ | | | | | 879 [H]---[F]----| |---(A)---(S)----| 881 (2) Graph for determining (3) Graph for MRT computation 882 loop-free neighbors 884 Figure 7: Computing alternates to destinations outside the MRT Island 886 Naturally, there are more complicated options to improve coverage, 887 such as connecting multiple MRT islands across tunnels, but it is not 888 clear that the additional complexity is necessary. 890 11. Network Convergence and Preparing for the Next Failure 892 After a failure, MRT detours ensure that packets reach their intended 893 destination while the IGP has not reconverged onto the new topology. 894 As link-state updates reach the routers, the IGP process calculates 895 the new shortest paths. Two things need attention: micro-loop 896 prevention and MRT re-calculation. 898 11.1. Micro-forwarding loop prevention and MRTs 900 As is well known[RFC5715], micro-loops can occur during IGP 901 convergence; such loops can be local to the failure or remote from 902 the failure. Managing micro-loops is an orthogonal issue to having 903 alternates for local repair, such as MRT fast-reroute provides. 905 There are two possible micro-loop prevention mechanism discussed in 906 [RFC5715]. The first is Ordered FIB [I-D.ietf-rtgwg-ordered-fib]. 907 The second is Farside Tunneling which requires tunnels or an 908 alternate topology to reach routers on the farside of the failure. 910 Since MRTs provide an alternate topology through which traffic can be 911 sent and which can be manipulated separately from the SPT, it is 912 possible that MRTs could be used to support Farside Tunneling. 913 Details of how to do so are outside of this document. 915 11.2. MRT Recalculation 917 When a failure event happens, traffic is put by the PLRs onto the MRT 918 topologies. After that, each router recomputes its shortest path 919 tree (SPT) and moves traffic over to that. Only after all the PLRs 920 have switched to using their SPTs and traffic has drained from the 921 MRT topologies should each router install the recomputed MRTs into 922 the FIBs. 924 At each router, therefore, the sequence is as follows: 926 1. Receive failure notification 928 2. Recompute SPT 930 3. Install new SPT 932 4. Recompute MRTs 934 5. Wait configured period for all routers to be using their SPTs and 935 traffic to drain from the MRTs. 937 6. Install new MRTs. 939 While the recomputed MRTs are not installed in the FIB, protection 940 coverage is lowered. Therefore, it is important to recalculate the 941 MRTs and install them as quickly as possible. 943 The installation of the MRTs can be staged such that the affected or 944 broken MRTs are updated first and then the unbroken. 946 12. Acknowledgements 948 The authors would like to thank Hannes Gredler, Jeff Tantsura, Ted 949 Qian, Kishore Tiruveedhula, Santosh Esale, Nitin Bahadur, Harish 950 Sitaraman and Raveendra Torvi for their suggestions and review. 952 13. IANA Considerations 954 This doument includes no request to IANA. 956 14. Security Considerations 958 This architecture is not currently believed to introduce new security 959 concerns. 961 15. References 963 15.1. Normative References 965 [I-D.enyedi-rtgwg-mrt-frr-algorithm] 966 Enyedi, G., Atlas, A. and A. Csaszar, "Algorithms for 967 computing Maximally Redundant Trees for IP/LDP Fast- 968 Reroute", draft-enyedi-rtgwg-mrt-frr-algorithm-01 (work 969 in progress), March 2012. 971 [RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast 972 Reroute: Loop-Free Alternates", RFC 5286, September 2008. 974 [RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", 975 RFC 5714, January 2010. 977 15.2. Informative References 979 [EnyediThesis] 980 Enyedi, G., "Novel Algorithms for IP Fast Reroute", 981 Department of Telecommunications and Media Informatics, 982 Budapest University of Technology and Economics Ph.D. 983 Thesis, February 2011, 984 . 986 [I-D.atlas-rtgwg-mrt-mc-arch] 987 Atlas, A., Kebler, R., Wijnands, I., Csaszar, A., and G. 988 Envedi, "An Architecture for Multicast Protection Using 989 Maximally Redundant Trees", 990 draft-atlas-rtgwg-mrt-mc-arch-00 (work in progress), 991 March 2012. 993 [I-D.ietf-mpls-ldp-multi-topology] 994 Zhao, Q., Fang, L., Zhou, C., Li, L., and N. So, "LDP 995 Extensions for Multi Topology Routing", 996 draft-ietf-mpls-ldp-multi-topology-03 (work in progress), 997 March 2012. 999 [I-D.ietf-rtgwg-ipfrr-notvia-addresses] 1000 Bryant, S., Previdi, S., and M. Shand, "IP Fast Reroute 1001 Using Not-via Addresses", 1002 draft-ietf-rtgwg-ipfrr-notvia-addresses-08 (work in 1003 progress), December 2011. 1005 [I-D.ietf-rtgwg-lfa-applicability] 1006 Filsfils, C. and P. Francois, "LFA applicability in SP 1007 networks", draft-ietf-rtgwg-lfa-applicability-06 (work in 1008 progress), January 2012. 1010 [I-D.ietf-rtgwg-ordered-fib] 1011 Shand, M., Bryant, S., Previdi, S., and C. Filsfils, 1012 "Loop-free convergence using oFIB", 1013 draft-ietf-rtgwg-ordered-fib-05 (work in progress), 1014 April 2011. 1016 [LFARevisited] 1017 Retvari, G., Tapolcai, J., Enyedi, G., and A. Csaszar, "IP 1018 Fast ReRoute: Loop Free Alternates Revisited", Proceedings 1019 of IEEE INFOCOM , 2011, . 1022 [LightweightNotVia] 1023 Enyedi, G., Retvari, G., Szilagyi, P., and A. Csaszar, "IP 1024 Fast ReRoute: Lightweight Not-Via without Additional 1025 Addresses", Proceedings of IEEE INFOCOM , 2009, 1026 . 1028 [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free 1029 Convergence", RFC 5715, January 2010. 1031 Authors' Addresses 1033 Alia Atlas (editor) 1034 Juniper Networks 1035 10 Technology Park Drive 1036 Westford, MA 01886 1037 USA 1039 Email: akatlas@juniper.net 1041 Robert Kebler 1042 Juniper Networks 1043 10 Technology Park Drive 1044 Westford, MA 01886 1045 USA 1047 Email: rkebler@juniper.net 1048 Gabor Sandor Enyedi 1049 Ericsson 1050 Konyves Kalman krt 11. 1051 Budapest 1097 1052 Hungary 1054 Email: Gabor.Sandor.Enyedi@ericsson.com 1056 Andras Csaszar 1057 Ericsson 1058 Konyves Kalman krt 11 1059 Budapest 1097 1060 Hungary 1062 Email: Andras.Csaszar@ericsson.com 1064 Maciek Konstantynowicz 1065 Cisco Systems 1067 Email: maciek@bgp.nu 1069 Russ White 1070 Cisco Systems 1072 Email: russwh@cisco.com 1074 Mike Shand 1076 Email: mike@mshand.org.uk