Routing Area Working Group                                  S. Litkowski
Internet-Draft                                               B. Decraene
Intended status: Standards Track                                  Orange
Expires: April 18, 2013                                      C. Filsfils
                                                                 K. Raza
                                                           Cisco Systems
                                                        October 15, 2012

             Operational management of Loop Free Alternates
               draft-litkowski-rtgwg-lfa-manageability-00

Abstract

   Loop Free Alternates (LFA), as defined in RFC 5286, is an IP Fast
   ReRoute (IP FRR) mechanism enabling traffic protection for IP
   traffic.  Following first deployment experience, this document
   provides operational feedback on LFA, highlights some limitations,
   and proposes a set of refinements to address those limitations.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 18, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Operational issues with default LFA tie breakers
     2.1.  Case 1: Edge router protecting core failures
     2.2.  Case 2: Edge router chosen to protect core failures
           while core LFA exists
     2.3.  Case 3: suboptimal core alternate choice
   3.  Configuration aspects
     3.1.  LFA activation
     3.2.  Policy based LFA selection
       3.2.1.  Mandatory criteria
       3.2.2.  Enhanced criteria
   4.  Operational aspects
     4.1.  Controlling LFA computation
     4.2.  Manual triggering of FRR
     4.3.  Required local information
     4.4.  Coverage followup
   5.  Security Considerations
   6.  Acknowledgements
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   This document is being discussed on the rtgwg@ietf.org mailing list.

2.  Operational issues with default LFA tie breakers

   [RFC5286] introduces the notion of tie breakers when selecting the
   LFA among multiple candidate alternate next-hops.  Most
   implementations use the following algorithm:

   o  Prefer a node protection alternate over a link protection
      alternate.

   o  If the protection type is equal, choose the alternate providing
      the shortest path.

   First deployments have revealed that the above algorithm may be
   insufficient to reflect Service Provider preferences and could lead
   to negative side effects.

   The following sections detail use cases highlighting the
   limitations.  Per-prefix LFA is assumed.

2.1.  Case 1: Edge router protecting core failures

      R1 --------- R2 ---------- R3 --------- R4
      |      1          100          1        |
      |                                       |
      | 100                                   | 100
      |                                       |
      |      1          100          1        |
      R5 --------- R6 ---------- R7 --------- R8 -- R9 - PE1
      |            |             |            |
      | 5k         | 5k          | 5k         | 5k
      |            |             |            |
      +--- n*PEx ---+            +---- PE2 ----+
                                        |
                                        |
                                       PEy

                                Figure 1

   Rx routers are core routers using n*10G links.  PEs are connected
   using links with lower bandwidth.

   In figure 1, let's consider the traffic from PE1 to PEx.  The nominal
   path is R9-R8-R7-R6-PEx.  Let's consider the failure of link R7-R8.
   For R8, R4 is not an LFA and the only available LFA is PE2.

   When the core link R8-R7 fails, R8 switches all traffic destined to
   all the PEx towards the edge node PE2.  Hence an edge node and edge
   links are used to protect the failure of a core link.
   Typically, edge links have less capacity than core links, hence
   congestion will occur on PE2 links.  Note that although PE2 was not
   directly affected by the failure, its links become congested and its
   traffic will suffer from the congestion.

   In summary, in case of failure, the impact on customer traffic is:

   o  From PE2 point of view:

      *  without LFA: no impact

      *  with LFA: traffic is partially dropped (but possibly
         prioritized by QoS mechanism).

   o  From R8 point of view:

      *  without LFA: traffic is totally dropped until convergence

      *  with LFA: traffic is partially dropped (but possibly
         prioritized by QoS mechanism).

2.2.  Case 2: Edge router chosen to protect core failures while core
      LFA exists

      R1 --------- R2 ------------ R3 --------- R4
      |      1            100      |     1      |
      |                            |            |
      | 100                        | 30         | 30
      |                            |            |
      |     1         50       50  |     10     |
      R5 -------- R6 ---- R10 ---- R7 -------- R8 --- R9 - PE1
      |           |                  \          |
      | 5000      | 5000              \ 5000    | 5000
      |           |                    \        |
      +--- n*PEx --+                    +----- PE2 ----+
                                                |
                                                |
                                               PEy

                                Figure 2

   Rx routers are core routers meshed with n*10G links.  PEs are meshed
   using links with lower bandwidth.

   In figure 2, let's consider the traffic coming from PE1 to PEx.  The
   nominal path is R9-R8-R7-R6-PEx.  Let's consider the failure of the
   link R7-R8.  For R8, R4 is a link-protecting LFA and PE2 is a
   node-protecting LFA.  PE2 is chosen as best LFA due to its better
   protection type.  Just like in case 1, this will probably lead to
   congestion on PE2 links.

2.3.  Case 3: suboptimal core alternate choice

                  +--- PE3 --+
                 /            \
           1000 /              \ 1000
               /                \
      +----- R1 ---------------- R2 ----+
      |      |        500        |      |
      | 10   |                   |      | 10
      |      |                   |      |
      R5     | 10                | 10   R7
      |      |                   |      |
      | 10   |                   |      | 10
      |      |        500        |      |
      +---- R3 ---------------- R4 -----+
             |                   |
             | 1000              | 1000
             |                   |
            PE1 -------------- PE2
                     10k

                                Figure 3

   Rx routers are core routers.  R1-R2 and R3-R4 links are 1G links.
   All other inter-Rx links are 10G links.

   In the figure above, let's consider the failure of link R1-R3.  For
   destination PE3, R3 has two possible alternates:

   o  R4 is node-protecting

   o  R5 is link-protecting

   R4 is chosen as best LFA due to its better protection type.  However,
   it may not be desirable to use R4 as the preferred alternate for
   bandwidth capacity reasons.  A service provider may prefer to use a
   high bandwidth link as the preferred LFA.  In this example,
   preferring shortest path over protection type may achieve the
   expected behavior, but in cases where metrics do not reflect
   bandwidth, it would not work and some other criteria would need to be
   involved when selecting the best LFA.

3.  Configuration aspects

   Controlling the best alternate and the LFA activation granularity is
   a requirement for Service Providers.  This section defines
   configuration requirements for LFA.

3.1.  LFA activation

   Granularity of LFA activation is important to control scaling of
   boxes (programmed alternate nexthops consume memory in the forwarding
   plane) and to control what is protected and not protected.

   An implementation of LFA SHOULD allow activation:

   o  Per address-family: IPv4 unicast, IPv6 unicast, LDP IPv4 unicast,
      LDP IPv6 unicast ...

   o  Per routing context: VRF, virtual/logical router, global routing
      table, ...

   o  Per interface, to control protected interfaces

   o  Per protocol instance, topology, area

   o  Per prefix: prefix protection SHOULD have a better priority
      compared to interface protection.  This means that if a specific
      prefix must be protected due to a configuration request, LFA must
      be computed and installed for this prefix even if the primary
      outgoing interface is not configured for protection.

3.2.  Policy based LFA selection

   When multiple alternates exist, the LFA selection algorithm is based
   on tie breakers.
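
   As a point of reference, the default tie-breaking behavior described
   in Section 2 (prefer node protection, then shortest path) could be
   sketched as follows.  This is only an illustrative sketch: the
   Alternate type, its field names, and the metric values are
   assumptions made for the example and are not taken from [RFC5286] or
   from any implementation.  The candidate list mirrors Case 2, where R8
   sees R4 as link-protecting and PE2 as node-protecting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternate:
    """Hypothetical description of one candidate alternate next-hop."""
    neighbor: str
    node_protecting: bool  # False means link protection only
    metric: int            # IGP metric to the destination via this neighbor


def default_tie_break(candidates):
    """Prefer node protection, then the lowest metric (Section 2)."""
    # Tuples sort element by element; False sorts before True, so
    # node-protecting candidates always win over link-protecting ones.
    return min(candidates, key=lambda a: (not a.node_protecting, a.metric))


# Case 2 as seen from R8; the metric values are made up for the example.
candidates = [
    Alternate("R4", node_protecting=False, metric=200),
    Alternate("PE2", node_protecting=True, metric=10000),
]
best = default_tie_break(candidates)
print(best.neighbor)
```

   With these inputs the edge router PE2 is selected even though a core
   alternate exists, which is the behavior Case 2 describes.
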
   Current tie breakers do not provide sufficient control on how the
   best alternate is chosen.  This document proposes an enhanced tie
   breaker allowing service providers to manage all specific cases:

   1.  An implementation of LFA SHOULD support policy based decision for
       determining the best LFA.

   2.  Policy based decision SHOULD be based on multiple criteria, each
       criterion having a level of preference.

   3.  If the defined policy does not allow a unique best LFA to be
       determined, the implementation MUST pick only one based on its
       own decision.

   4.  Policy SHOULD be applied to a protected interface or to a
       specific set of destinations.  In case of application on the
       protected interface, all destinations primarily routed on this
       interface SHOULD use the interface policy.

   5.  An implementation MAY support a behavior providing a
       non-disruptive change compared to the behavior described in
       [RFC5286].

3.2.1.  Mandatory criteria

   An implementation of LFA MUST support the following mandatory
   criteria:

   o  Non candidate link.  A link marked as "non candidate" will never
      be used as LFA.

   o  A primary nexthop being protected by another primary nexthop of
      the same prefix (ECMP case).

   o  Type of protection provided by the alternate: link protection,
      node protection, downstream.

   o  Shortest path: lowest IGP metric used to reach the destination.

   o  Local SRLG.

3.2.2.  Enhanced criteria

   An implementation of LFA SHOULD support the following enhanced
   criteria:

   o  Linecard disjointness for protected and protecting nexthop: this
      means that primary and alternate cannot be connected on the same
      linecard.

   o  Link coloring.

   o  Existing TE based information:

      *  Link affinity

      *  Link speed

      *  Link bandwidth (available/residual)

      *  Link loss

      *  Link delay

   o  Router type: core, edge, core/edge...

   o  Alternate type: link or tunnel alternate.
      This means that the user may change the preference between link
      alternate and tunnel alternate (tunnel preferred over link, link
      preferred over tunnel, or considered as equal).

3.2.2.1.  Linecard disjointness

   The linecard disjointness criterion provides another level of SRLG on
   the node (automatic SRLG), the SRLG being the node's linecard.

   The notion of linecard may differ depending on the hardware design.
   If applicable, multiple levels of linecard disjointness may be
   proposed.

3.2.2.2.  Link coloring

   Link coloring is a powerful system to control alternates.  The idea
   is very similar to TE Link affinity but with a local significance
   only.  Protecting interfaces are tagged with colors.  Protected
   interfaces are configured to include some colors with a preference
   level and to exclude others.

   Example: P1 router is connected to three P routers and two PEs.

              PE2
               |   +---- P4
               |  /
      PE1 ---- P1 --------- P2
               |    10Gb
          1Gb  |
               |
               P3

   P1 is configured to protect the P1-P4 link.  We assume that given the
   topology, all neighbors are LFA.  We would like to enforce a policy
   in the network where only a core router may protect against the
   failure of a core link, and where high bandwidth links are preferred.

   In this example, we can use our link coloring system by:

   o  Marking PE links with color RED

   o  Marking the 10Gb CORE link with color BLUE

   o  Marking the 1Gb CORE link with color YELLOW

   o  Configuring the protected interface P1->P4 with:

      *  Include BLUE, preference 200

      *  Include YELLOW, preference 100

      *  Exclude RED

   Using this, PE links will never be used to protect against P1-P4 link
   failure and the 10Gb link will be preferred.

   The main advantage of this solution is that it can be reproduced
   easily on other interfaces and other nodes without specificities.
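
   The P1->P4 coloring policy of this section can be sketched as
   follows.  This is a minimal illustration, not a definitive
   implementation: the ColorPolicy type and the select_alternate
   function are hypothetical names, and the sketch assumes that a
   protecting interface carrying no included color is not eligible, a
   point this document does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ColorPolicy:
    """Hypothetical per-protected-interface coloring policy."""
    include: dict = field(default_factory=dict)  # color -> preference
    exclude: set = field(default_factory=set)


def select_alternate(policy, candidates):
    """Return the neighbor whose colors give the highest preference.

    candidates maps a neighbor name to the set of colors configured on
    the protecting interface towards it.  Returns None if no candidate
    is acceptable.
    """
    best, best_pref = None, None
    for neighbor, colors in candidates.items():
        if colors & policy.exclude:
            continue  # any excluded color rules the interface out
        prefs = [policy.include[c] for c in colors if c in policy.include]
        if not prefs:
            continue  # assumption: no included color means not eligible
        if best_pref is None or max(prefs) > best_pref:
            best, best_pref = neighbor, max(prefs)
    return best


# Policy configured on P1 for the protected interface P1->P4.
policy = ColorPolicy(include={"BLUE": 200, "YELLOW": 100}, exclude={"RED"})
candidates = {
    "PE1": {"RED"},
    "PE2": {"RED"},
    "P2": {"BLUE"},    # 10Gb core link
    "P3": {"YELLOW"},  # 1Gb core link
}
print(select_alternate(policy, candidates))
```

   Removing P2 from the candidates makes the sketch fall back to P3, and
   with only PE neighbors left it returns no alternate at all.
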
   A service provider only has to define a color system (associating
   each color with a significance), as is done for TE affinity or BGP
   communities.

   Implementation of link coloring:

   o  SHOULD support multiple include and exclude colors on a single
      protected interface.

   o  SHOULD provide a level of preference between included colors.

   o  SHOULD support multiple colors configuration on a single
      protecting interface.

3.2.2.3.  TE based information

   It would be useful to be able to reuse already existing information
   provided by traffic engineering extensions ([RFC3630]/[RFC5305] and
   [I-D.previdi-isis-te-metric-extensions]) as tie-breakers for LFA.
   This would allow automatic and optimized decisions when choosing the
   best LFA while limiting the configuration overhead.  Existing IGP
   TE-extensions are "dedicated" to the Traffic Engineering database,
   and any change for LFA choice introduced in the TE-LSDB may impact
   already existing tunnels.  It would be interesting to make
   traffic-engineering extensions available to components other than
   MPLS-TE.  The mechanisms to achieve this are beyond the scope of this
   document.

   However, since LFA is a purely local mechanism, only local
   information is directly relevant to the alternate choice, and there
   is no need to propagate it.

3.2.2.4.  Router type

   Rather than tagging interfaces on each node (using link color) to
   identify the neighbor node type, it would be helpful if routers
   announced their role/function in the IGP.  Currently no IGP extension
   provides this information.  The mechanics for flooding this
   information are beyond the scope of this document.

   Consider the following network:

              PE3
               |
               |
              PE2
               |   +---- P4
               |  /
      PE1 ---- P1 -------- P2
               |    10Gb
          1Gb  |
               |
               P3

   In the example above, each node is configured with its role, and the
   role is flooded through the IGP.

   o  PE1,PE3: edge.

   o  PE2: aggregation (edge/core).

   o  P1,P2,P3: core.

   A simple policy could be configured on P1 to choose the best
   alternate for P1->P4 based on router function/role as follows:

   o  criterion 1 -> router type: exclude aggregation and edge.

   o  criterion 2 -> bandwidth.

3.2.2.5.  Link vs tunnel alternate

   In addition to LFA, tunnels (IP, LDP or RSVP-TE) to distant routers
   may be used to complement LFA coverage (tunnel tail used as virtual
   neighbor).  When a router has multiple alternate candidates for a
   specific destination, it may have direct alternates as well as tunnel
   alternates.  Direct alternates may not always provide an optimal
   routing path and it may be preferable to select a tunnel alternate
   over a direct alternate.

   In figure 1, there is no core alternate for R8 to reach PEs located
   behind R6, so R8 is using PE2 as alternate, which may generate
   congestion when FRR is activated.  Instead, we could have a tunnel
   core alternate for R8 to protect PE destinations.  For example, a
   tunnel from R8 to R3 would allow R3 to be preferred over PE2 as best
   alternate if policy permits.  Such a policy could, for example:

   o  prefer tunnel alternates over link alternates.

   o  consider tunnel and link alternates as the same level and use
      another criterion to prefer R3 over PE2.

4.  Operational aspects

4.1.  Controlling LFA computation

   LFA computation may be CPU intensive depending on the scope of
   application.  On a network suffering from instability, it may be
   useful to control LFA SPF computation as is done in most
   implementations for main SPF.

   An implementation MAY allow throttling of LFA computation, and LFA
   computation SHOULD have its own throttling values.  The procedures
   for throttling are beyond the scope of this document.

   Consider a stable converged network with main SPF and LFA SPF
   configured with throttling:

   o  T0 : LSP is received advertising a topology change.

   o  T0 : Main SPF is scheduled at T0+x (x depending on throttling
      value of main SPF).

   o  T0+x : main SPF is computed.

   o  T1 : main SPF finishes and LFA computation is scheduled at T1+y (y
      depending on throttling value of LFA computation).

   o  T1+y : LFA SPF computation starts.

   o  T2 : LFA SPF finishes and alternates are computed and installed.

   If a new LSP is received in a short period of time after T2, values
   of x and y may increase based on the implementation's throttling
   algorithm.

   It may happen that a network topology change is received while LFA
   SPF is computing or during LFA SPF throttling time.  Priority SHOULD
   be given to main SPF computation compared to LFA computation.  An
   implementation SHOULD abort any running or scheduled LFA computation
   if a topology change is received.

   Moreover, when a topology change is received, currently programmed
   alternates may not be loop-free anymore, so an implementation MAY
   drop programmed alternates when a topology change is received, and
   recompute and reinstall the alternates later, after completion of LFA
   SPF.

4.2.  Manual triggering of FRR

   Service providers often use manual link shutdown (using router CLI)
   to perform some network changes.  An implementation MUST support
   triggering/activating LFA Fast Reroute for a given link when a manual
   shutdown is done.

4.3.  Required local information

   LFA introduction requires some enhancement in standard routing
   information provided by implementations.  Moreover, since coverage is
   not 100%, coverage information is also required.

   Hence an implementation:

   o  MUST be able to display, for every destination, the primary
      nexthop as well as the alternate nexthop information

   o  MUST provide coverage information per activation domain of LFA
      (area, level, topology, instance, virtual router ...)

   o  MUST provide total percentage of coverage

   o  SHOULD provide percentage of coverage per link

   o  MAY provide percentage of coverage per priority if the
      implementation supports prefix-priority insertion in RIB/FIB

   o  SHOULD provide a reason for choosing an alternate (policy and
      criteria)

   o  MAY provide a means to clear programmed backup and trigger backup
      recomputation

   o  MAY provide the list of non protected destinations and the reason
      why they are not protected (no protection required or no alternate
      available)

4.4.  Coverage followup

   It is fairly easy to evaluate the coverage of a network in a nominal
   situation, but topology changes may change the coverage.  In some
   situations, the network may not be able to provide the required level
   of protection.  Hence, it becomes very important for service
   providers to be alerted about changes of coverage.

   An implementation SHOULD:

   o  provide an alert system if total coverage is below a defined
      threshold or comes back to a normal situation

   o  provide an alert system if coverage of a specific link is below a
      defined threshold or comes back to a normal situation

   An implementation MAY:

   o  provide an alert system if a specific destination is not protected
      anymore or when protection comes back up for this destination

   Although the procedures for providing alerts are beyond the scope of
   this document, we recommend that implementations consider standard
   and well used mechanisms like syslog or SNMP traps.

5.  Security Considerations

   This document does not introduce any change in security
   considerations compared to [RFC5286].

6.  Acknowledgements

7.  IANA Considerations

   This document has no action for IANA.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

8.2.  Informative References

   [I-D.previdi-isis-te-metric-extensions]
              Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas,
              A., and C. Filsfils, "IS-IS Traffic Engineering (TE)
              Metric Extensions",
              draft-previdi-isis-te-metric-extensions-02 (work in
              progress), October 2012.

   [I-D.shand-remote-lfa]
              Bryant, S., Filsfils, C., Shand, M., and N. So, "Remote
              LFA FRR", draft-shand-remote-lfa-01 (work in progress),
              June 2012.

   [RFC3630]  Katz, D., Kompella, K., and D. Yeung, "Traffic Engineering
              (TE) Extensions to OSPF Version 2", RFC 3630,
              September 2003.

   [RFC3906]  Shen, N. and H. Smit, "Calculating Interior Gateway
              Protocol (IGP) Routes Over Traffic Engineering Tunnels",
              RFC 3906, October 2004.

   [RFC4090]  Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
              Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              May 2005.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, January 2010.

   [RFC5715]  Shand, M. and S. Bryant, "A Framework for Loop-Free
              Convergence", RFC 5715, January 2010.

   [RFC6571]  Filsfils, C., Francois, P., Shand, M., Decraene, B.,
              Uttaro, J., Leymann, N., and M. Horneffer, "Loop-Free
              Alternate (LFA) Applicability in Service Provider (SP)
              Networks", RFC 6571, June 2012.

Authors' Addresses

   Stephane Litkowski
   Orange

   Email: stephane.litkowski@orange.com


   Bruno Decraene
   Orange

   Email: bruno.decraene@orange.com


   Clarence Filsfils
   Cisco Systems

   Email: cfilsfil@cisco.com


   Kamran Raza
   Cisco Systems

   Email: skraza@cisco.com