idnits 2.17.00 (12 Aug 2021) /tmp/idnits54666/draft-ietf-idr-add-paths-guidelines-08.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (Apr 25, 2016) is 2210 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'Basu-ibgp-osc' is mentioned on line 544, but not defined == Outdated reference: draft-ietf-idr-add-paths has been published as RFC 7911 == Outdated reference: draft-ietf-idr-bgp-optimal-route-reflection has been published as RFC 9107 Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IDR Working Group J. Uttaro 2 Internet-Draft AT&T 3 Intended status: Informational 4 Expires: Oct 25, 2016 P. Francois 5 IMDEA Networks 7 K. Patel 8 Cisco Systems 10 J. Haas 11 Juniper Networks 13 A. Simpson 14 R. 
Fragassi 15 Nokia 17 Apr 25, 2016 19 Best Practices for Advertisement of Multiple Paths in IBGP 20 draft-ietf-idr-add-paths-guidelines-08.txt 22 Status of this Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF), its areas, and its working groups. Note that 29 other groups may also distribute working documents as Internet- 30 Drafts. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 The list of current Internet-Drafts can be accessed at 38 http://www.ietf.org/ietf/1id-abstracts.txt 40 The list of Internet-Draft Shadow Directories can be accessed at 41 http://www.ietf.org/shadow.html 43 This Internet-Draft will expire on October 25, 2016. 45 Copyright Notice 47 Copyright (c) 2012 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 Abstract 62 Add-Paths is a BGP enhancement that allows a BGP router to advertise 63 multiple distinct paths for the same prefix/NLRI. This provides a 64 number of potential benefits, including reduced routing churn, faster 65 convergence and better loadsharing. 
67 This document provides recommendations to implementers of Add-Paths 68 so that network operators have the tools needed to address their 69 specific applications and to manage the scalability impact of Add- 70 Paths. A router implementing Add-Paths may learn many paths for a 71 prefix and must decide which of these to advertise to peers. This 72 document analyses different algorithms for making this selection and 73 provides recommendations based on the target application. 75 Table of Contents 77 1. Introduction 78 2. Terminology 79 3. Add-Paths Applications 80 3.1. Fast Connectivity Restoration 81 3.2. Load Balancing 82 3.3. Churn Reduction 83 3.4. Suppression of MED-Related Persistent Route Oscillation 84 4. Implementation Guidelines 85 4.1. Capability Advertisement 86 4.2. Receiving Multiple Paths 87 4.3. Advertising Multiple Paths 88 4.3.1. Path Selection Modes 89 4.3.1.1. Advertise N Paths 90 4.3.1.2. Advertise All Paths 91 4.3.1.3. Advertise All AS-Wide Best Paths 92 4.3.1.4. Advertise ALL AS-Wide Best and Next-Best Paths 93 (Double AS Wide) 94 4.3.1.5. Advertise Used Multipaths 95 4.3.2. Derived Modes from Bounding the Number of Advertised 96 Paths 97 4.3.3. Derived Modes from Adding N More Paths 98 5. Deployment Considerations 99 5.1. 
Introducing Add-Paths into an Existing Network 100 5.2. Scalability Considerations 101 5.3. Routing Consistency Considerations 102 5.4. Consistency between Advertised Paths and Forwarding Paths 103 5.5. Interactions with Route Filtering 104 5.6. Routing Churn 105 6. Security Considerations 106 7. Acknowledgments 107 8. Contributors 108 9. IANA Considerations 109 10. References 110 10.1. Normative References 111 10.2. Informative References 112 Appendix A. Other Path Selection Modes 113 A.1. Advertise Neighbor-AS Group Best Path 114 A.2. Best LocPref/Second LocPref 115 A.3. Advertise Paths at decisive step -1 117 1. Introduction 119 The BGP Add-Paths capability enhances current BGP implementations by 120 allowing a BGP router to exchange with its BGP peers more than one 121 path for the same destination/NLRI. The base BGP standard [RFC4271] 122 does not provide for such a capability. If a BGP router learns 123 multiple paths for the same NLRI (from multiple peers), it selects 124 only one as its best path and advertises the best path to its peers. 125 The primary goal of Add-Paths is to increase the visibility of paths 126 within an IBGP system. This has the effect of improving robustness 127 in case of failure, reducing the number of BGP messages exchanged 128 during such an event, and offering the potential for faster re- 129 convergence. 
Through careful selection of the paths to be advertised, 130 Add-Paths can also prevent routing oscillations. 132 The purpose of this document is to provide the necessary 133 recommendations to the implementers of Add-Paths so that network 134 operators have the tools needed to address their specific 135 applications and to manage the scalability impact of Add-Paths while 136 maintaining routing consistency. A router implementing Add-Paths may 137 learn many paths for a prefix and must decide which of these to 138 advertise to peers. This document analyses different algorithms for 139 making this selection and provides recommendations based on the 140 target application. 142 2. Terminology 144 In this document the following terms are used: 146 Add-Paths peer: A peer with which the local system has agreed 147 to receive and/or send NLRI with path identifiers. 149 Primary path: A path toward a prefix that is considered a best path 150 by the BGP decision process [RFC4271] and actively used for 151 forwarding traffic to that prefix. A router may have multiple primary 152 paths for a prefix if it implements multipath. 154 Diverse path: A BGP path associated with a different BGP next-hop and 155 BGP router than some other set of paths. The BGP router associated 156 with a path is inferred from the ORIGINATOR_ID attribute or, if there 157 is none, the BGP Identifier of the peer that advertised the path. 159 Backup path: A diverse path with respect to the primary paths toward 160 a prefix. The backup path can be used to forward traffic to the 161 destination if the primary paths fail. 163 Optimal backup path: The backup path that will be selected as the new 164 best path for a prefix when all primary paths are removed/withdrawn. 166 AS-Wide preferred paths: All paths that are considered as best when 167 applying rules of the BGP decision process up to the IGP tie-break. 
169 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 170 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 171 document are to be interpreted as described in [RFC2119]. 173 3. Add-Paths Applications 175 [draft-pmohapat] presents the applications that would benefit from 176 multiple paths advertisement in IBGP. They are summarized in the 177 following subsections. 179 3.1. Fast Connectivity Restoration 181 With the dissemination of backup paths, fast connectivity restoration 182 and convergence can be achieved. If a router has a backup path, it 183 can directly select that path as best upon failure of the primary 184 path. This minimizes packet loss in the dataplane. Sending multiple 185 paths in IBGP allows routers to receive backup paths when path 186 visibility is not sufficient with classical BGP. This is especially 187 useful when Route Reflection is used. 189 Consider a network such as the one depicted in Figure 1 and suppose 190 that none of the routers support Add-Paths. AS1 receives from AS3 two 191 paths (A and B) to a particular destination in AS3. Suppose path A is 192 preferred over path B due to path A having a lower MED (multi-exit 193 discriminator). 195 AS1 uses a route reflector RR1 to reduce the scale of its IBGP mesh. 196 If the routers in AS1 are not configured to advertise the best 197 external path [BEST-EXT] then RR1 knows about only path A during 198 steady state because router B suppresses/withdraws its advertisement 199 of path (B) to RR1. If the routers in AS1 do support best-external 200 [BEST-EXT] then RR1 may have both paths in its Adj-RIB-IN, but 201 regardless, RR1 can only advertise its best path A to its peers, 202 including router D. 
204 ======== ===================== 205 = +---+ +---+ +---+ 206 = |RTR|________|RTR| |RTR| 207 = | E | | A | | C | 208 = +---+Path A->+---+ AS1 +---+ 209 = = = \ / = 210 = = = \ / = 211 = = = \ / = 212 = = = \ / = 213 = AS3 = = +---+ = 214 = = = |RR | = 215 = = = | 1 | = 216 = = = +---+ = 217 = = = / \ = 218 = = = / \ = 219 = = = / \ = 220 = = = / \ = 221 = +---+Path B->+---+ +---+ 222 = |RTR| ______|RTR| |RTR| 223 = | F | | B | | D | 224 = +---+ +---+ +---+ 225 ======== ===================== 227 Figure 1: Example Topology 229 Under these circumstances consider the steps required to restore 230 traffic from router D to the destination in AS3 when the link between 231 Router A and Router E fails. (Assume that router A set next-hop to 232 self when advertising path A and that router B is not configured for 233 best-external [BEST-EXT]). 235 1. Router A sends a BGP UPDATE message to RR1 withdrawing its 236 advertisement of path (A). 238 2. RR1 receives the withdrawal, and propagates it to its other client 239 peers, routers B, C and D. 241 3. When router B receives the withdrawal of path (A) it reruns its 242 decision process and selects path (B) as its new best path. Router 243 B advertises path (B) to RR1. 245 4. RR1 reruns its decision process and selects path (B) as its new 246 best path. RR1 advertises path (B) to client peers A, C and D. 248 5. Router D reruns its decision process, determines path (B) to be 249 the best path, and updates its forwarding table. After this step 250 traffic from router D to the destination in AS3 is restored (the 251 traffic path has changed to go through router B and then router 252 F). 254 With the use of Add-Paths, the convergence time for the above path 255 failure example can be reduced considerably. The main reason for the 256 improvement is that Add-Paths allows router D to be aware of more 257 than one path to the destination in AS3 prior to the failure of the 258 best path (A). 
In steady-state (with no failures) router B decides, 259 as before, that path (A) is its best path but because of its Add- 260 Paths (or best-external [BEST-EXT]) configuration it also advertises 261 path (B) to RR1. Using Add-Paths RR1 can advertise both learned paths 262 to its IBGP peers, including router D. Now consider again the 263 scenario where the link between Router A and Router E fails. In this 264 case, with Add-Paths, fewer steps are required to achieve re- 265 convergence: 267 1. Router A sends a BGP UPDATE message to RR1 withdrawing its 268 advertisement of path (A). 270 2. RR1 receives the withdrawal, and propagates it to its other client 271 peers, routers B, C and D. 273 3. Router D receives the withdrawal, reruns the decision process and 274 updates the forwarding entry for the destination in AS3. 276 4. Traffic from router D to the destination in AS3 follows the 277 alternate path through router B-router F provided that the other 278 routers on this forwarding path have the same synchronized IP FIB 279 view or else encapsulation is used to avoid IP FIB lookups until 280 traffic reaches router F; refer to section 5.4 for more discussion 281 of this topic. 283 3.2. Load Balancing 285 Increased path diversity allows routers to install several paths in 286 their forwarding tables in order to load balance traffic across those 287 paths. The forwarding consistency considerations mentioned in section 288 5.4 also apply to this use case. 290 3.3. Churn Reduction 292 When Add-Paths is used in an AS, the availability of additional 293 backup paths means failures can be recovered locally with much less 294 path exploration in IBGP and fewer intermediate updates sent to EBGP 295 peers. When the preferred backup path is the post-convergence path, 296 churn is minimized. 298 3.4. 
Suppression of MED-Related Persistent Route Oscillation 300 As described in [oscillation], Add-Paths is a valuable tool in 301 helping to stop persistent route oscillations caused by comparison of 302 paths based on MED in topologies where route reflectors or the 303 confederation structure hide some paths. With the appropriate path 304 selection algorithm Add-Paths stops these route oscillations because 305 the same set of paths are consistently advertised by the route 306 reflector or the confederation border router and the routers 307 receiving this set of paths make stable routing decisions about the 308 best path. 310 4. Implementation Guidelines 312 This section discusses recommendations for the implementation of Add- 313 Paths. The following topics are addressed: 315 . Considerations related to Add-Paths capability negotiation 317 . Receiving BGP routes from Add-Paths peers 319 . Advertising BGP routes to Add-Paths peers. This section 320 discusses various path selection algorithms, which are the 321 procedures available to an Add-Paths speaker for deciding which 322 set of paths to advertise to an Add-Paths peer for particular 323 prefixes. 325 4.1. Capability Advertisement 327 +---+ +---+ 328 |RTR|___________|RTR| 329 | A | <-BGP-> | B | 330 +---+ +---+ 332 Figure 2: BGP Peering Example 334 In Figure 2, in order for a router A to receive multiple paths per 335 NLRI from peer B, for a particular address family (AFI=x, SAFI=y), 336 the BGP capabilities advertisements during session setup MUST 337 indicate that peer B wants to send multiple paths for AFI=x, SAFI=y 338 and that router A is willing to receive multiple paths for AFI=x, 339 SAFI=y. Similarly, in order for router A to send multiple paths per 340 NLRI to peer B, for a particular address family (AFI=x, SAFI=y), the 341 BGP capabilities advertisements MUST indicate that router A wants to 342 send multiple paths for AFI=x, SAFI=y and peer B is willing to 343 receive multiple paths for AFI=x, SAFI=y. 
Refer to [Add-Paths] for 344 details of the Add-Paths capabilities advertisement. 346 The capabilities of the local router MUST be configurable per peer 347 and per address family, and SHOULD support the ability to configure 348 send-only operation or receive-only operation. The default mode of 349 operation is to both send and receive. 351 4.2. Receiving Multiple Paths 353 Currently, per standard BGP behavior, if a BGP router receives an 354 advertisement of an NLRI and path from a specific peer and that peer 355 subsequently advertises the same NLRI with different path information 356 (e.g. a different NEXT_HOP and/or different path attributes) the new 357 path effectively overwrites the existing path. 359 When Add-Paths has been negotiated with the peer, the newly 360 advertised path should be stored in the RIB-IN along with all of the 361 paths previously advertised (and not withdrawn) by the peer. 363 When an Add-Paths speaker has negotiated to receive multiple paths 364 for (AFIx, SAFIy) from a peer all advertisements and withdrawals of 365 NLRI within that address family from that peer MUST include a path 366 identifier, as described in [Add-Paths]. The path identifiers have no 367 significance to the receiving peer. If the combination of NLRI and 368 path identifier in an advertisement from a peer is unique (does not 369 match an existing route in the RIB-IN from that peer) then the route 370 is added to the RIB-IN. If the combination of NLRI and path 371 identifier in a received advertisement is the same as an existing 372 route in the RIB-IN from the peer then the new route replaces the 373 existing one. If the combination of NLRI and path identifier in a 374 received withdrawal matches an existing route in the RIB-IN from the 375 peer then that route shall be removed from the RIB-IN. 377 A BGP UPDATE message from an Add-Paths peer could advertise and 378 withdraw more than one NLRI belonging to one or more address 379 families. 
The receiving BGP router MUST NOT expect or require its 380 peer to send path identifiers with all routes belonging to all 381 address families. It also MUST NOT expect that all of the received 382 path identifiers in the UPDATE message are the same. 384 4.3. Advertising Multiple Paths 386 [Add-Paths] specifies how to encode the advertisement of multiple 387 paths towards the same NLRI over an IBGP session, but provides no 388 details about which set of multiple paths should be advertised. In 389 this section, four path selection algorithms are described and 390 compared with each other. These 4 algorithms are considered to be the 391 most useful across the widest range of deployment scenarios. The list 392 of possible path selection algorithms is much larger and for the 393 interested reader Appendix A provides information about other path 394 selection modes that were considered in historical versions of this 395 document. 397 In comparing any two path selection algorithms the following factors 398 should be taken into account: 400 Control Plane Load: When a router receives multiple paths for a 401 prefix from an IBGP client it has to store more paths in its Adj-Rib- 402 Ins. 404 Control Plane Stress: Coping with multiple IBGP paths has two 405 implications on the computation that a router has to handle. First, 406 it has to compute the paths to send to its peers, i.e. more than the 407 best path. Second, it also has to handle the potential churn related 408 to the exchange of those multiple paths. 410 MED/IGP oscillations: BGP sometimes suffers from routing oscillations 411 when the physical topology differs from the logical topology, or when 412 the MED attribute is used. This is due to the limited path 413 visibility when a single path is advertised and Route Reflection is 414 used. Increasing the path visibility by advertising multiple paths 415 can help solve this issue. 
417 Path optimality: When a single path is advertised, border routers do 418 not always receive the optimal path. As an example, Route Reflectors 419 typically send a single path chosen based on their own IGP tie- 420 breaking procedure (although modifications to this are proposed in 421 [BGP-ORR]). Increasing path visibility would also help routers to 422 learn the path that is best suited for them w.r.t. the IGP tie- 423 breaking. 425 Backup path optimality: Multiple paths advertisement gives routers 426 the opportunity to have a backup path. However, some backup paths 427 are better than others. Indeed, when a link failure occurs, if a 428 router already knows its post-convergence path, the BGP re- 429 convergence is straightforward and traffic is less impacted by the 430 transient use of non-best forwarding paths. 432 Convergence time: Advertising multiple paths in IBGP has an impact on 433 the convergence time of the BGP system. More paths need to be 434 exchanged, but on the other hand, the routing information is 435 propagated faster. With an increased path visibility, there is less 436 path exploration during the convergence. Also, with the availability 437 of backup paths, convergence time in case of failure is also reduced. 439 Target application: Depending on the application type, the number of 440 paths to advertise for a prefix will vary. For example, for fast 441 connectivity restoration, it may be sufficient to advertise only 2 442 paths to a peer so that it will have the best path and the optimal 443 backup path. For load balancing purposes, it may be desirable to 444 advertise more paths, but inclusion of the optimal backup path in the 445 set may be less critical. For route oscillation elimination, it is 446 required to advertise all group-best paths for a prefix. 448 4.3.1. Path Selection Modes 450 The following subsections describe the 4 main path selection modes 451 considered in this draft. 
Each mode is considered either MANDATORY or 452 OPTIONAL. A MANDATORY mode MUST be supported by any implementation 453 that claims compliance with this document. An OPTIONAL mode may be 454 supported by some but not all implementations. 456 The path selection mode and any parameters applicable to the mode 457 MUST be configurable per AFI/SAFI and per peer and SHOULD be 458 configurable per prefix. To illustrate the value of this flexibility, 459 consider a prefix P that belongs to an address family F requiring 460 path IDs to be included with every NLRI (e.g. due to the Add-Paths 461 capability negotiation with the peer). If P is one of a number of 462 prefixes that would not benefit from the advertisement of multiple 463 paths then it is perfectly valid to send only the best path. 465 4.3.1.1. Advertise N Paths 467 With the 'Advertise N Paths' mode (Add-N for short) a router 468 advertises up to N paths per prefix towards an Add-Paths peer. The 469 computational cost of this mode is the selection of the N paths. 470 There must be a ranking of the paths in order to ensure consistency 471 in the set of paths advertised to different Add-Paths peers. The 472 recommended way for a router to consistently select N paths is to run 473 its decision process N times and consider at each iteration only the 474 paths that meet all of the following criteria: 476 (a) not selected during a previous iteration 477 (b) diverse with respect to previously selected paths (see section 478 2 for the definition of a diverse path) 480 (c) not rejected by route filters or split horizon advertisement 481 rules 483 The memory cost of this path selection mode is bounded: a router 484 receives a maximum of N paths for each prefix from each peer. With N 485 equal to 2, all routers know at least two paths and can provide local 486 recovery in case of failure. If multipath routing is to be deployed 487 in the AS, N can be increased to provide more alternate paths to the 488 routers. 
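The iterative Add-N selection described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the Path record, the rank_key() ordering and the permitted() callback are hypothetical names introduced here, and rank_key() is a deliberately simplified stand-in for the full BGP decision process (in particular it assumes MEDs are always comparable). The addresses are documentation addresses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Path:
    """Hypothetical path record; field names are illustrative only."""
    next_hop: str
    router_id: str        # ORIGINATOR_ID, else BGP Identifier of the advertising peer
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0
    igp_cost: int = 0


def rank_key(p: Path):
    # Simplified stand-in for the BGP decision process: lower tuple wins.
    # Assumption: MED is compared across all paths, which real
    # implementations only do between paths from the same neighbor AS.
    return (-p.local_pref, p.as_path_len, p.med, p.igp_cost, p.router_id)


def is_diverse(p: Path, selected: List[Path]) -> bool:
    # Criterion (b): different next-hop AND different BGP router than
    # every previously selected path.
    return all(p.next_hop != s.next_hop and p.router_id != s.router_id
               for s in selected)


def select_add_n(paths: List[Path], n: int = 2,
                 permitted: Callable[[Path], bool] = lambda p: True) -> List[Path]:
    """Run the (simplified) decision process up to n times."""
    selected: List[Path] = []
    for _ in range(n):
        candidates = [p for p in paths
                      if p not in selected          # criterion (a)
                      and is_diverse(p, selected)   # criterion (b)
                      and permitted(p)]             # criterion (c)
        if not candidates:
            break                                   # fewer than n eligible paths
        selected.append(min(candidates, key=rank_key))
    return selected


a = Path("192.0.2.1", "192.0.2.1", igp_cost=10)
a2 = Path("192.0.2.1", "192.0.2.1", med=5)      # same next-hop as a: not diverse
b = Path("192.0.2.2", "192.0.2.2", igp_cost=20)
print(select_add_n([a, a2, b], n=2))            # a first, then b
```

Because the ranking is deterministic, the same N paths are chosen for every Add-Paths peer unless the per-peer permitted() check removes some of them.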
490 Path optimality and backup path optimality are not guaranteed, i.e. 491 it is possible that the optimal path of a router (w.r.t. IGP tie- 492 breaking) is not contained in the set of paths advertised by its 493 Route Reflector. However, as the number of paths that it receives is 494 higher than without Add-Paths, it is possible that the chosen nexthop 495 is closer to the router in terms of IGP cost than the nexthop that 496 would have been chosen without Add-Paths. 498 This solution helps to reduce routing oscillations, but not in all 499 cases. Indeed, path visibility is still constrained by the maximum 500 number of paths, and configurations with routing oscillations still 501 exist. 503 This path selection mode is MANDATORY. The default value of N MUST be 504 2. The value of N MUST be configurable and MAY be upper bounded by 505 an implementation. 507 The default value of 2 ensures the availability of a backup path (if 508 2 or more paths have been received) while maintaining minimum impact 509 to memory and churn. If Add-N with N equal to 2 is insufficient to 510 meet another objective (e.g. loadsharing or MED/IGP oscillation) 511 there is always a large enough value of N that can be selected, if N is 512 configurable, to meet that objective. 514 4.3.1.2. Advertise All Paths 516 A simple rule for advertising multiple paths in IBGP is to advertise 517 to IBGP peers all received paths minus those blocked by export 518 filters or applicable split horizon rules. This solution is easy to 519 implement, but the counterpart is that all those paths need to be 520 stored by all routers that receive them, which can be quite 521 expensive. If a path to a prefix P is advertised to N border 522 routers, with a Full Mesh of IBGP sessions, all routers have N paths 523 in their Adj-RIB-Ins. If Route Reflection is used and each client is 524 connected to 2 Route Reflectors, it may learn up to 2*N paths. 
526 This solution gives a perfect path visibility to all routers, thus 527 limiting churn and losses of connectivity in case of failure. Indeed, 528 this allows routers to select their optimal primary path, and to 529 switch on their optimal backup path in case of failure. 531 However, as more paths are exchanged, the number of BGP messages 532 disseminated during the initial IBGP convergence can be high, and 533 convergence may be slower. 535 Routing oscillations are prevented with this rule, because a router 536 won't need to withdraw a previously advertised path when its best 537 path changes. 539 This path selection mode is OPTIONAL. 541 4.3.1.3. Advertise All AS-Wide Best Paths 543 Another choice is to consider the set of paths with the same AS-wide 544 preference [Basu-ibgp-osc], i.e. the paths that all routers would 545 select based on the rules of the decision process that are not 546 router-dependent (i.e. Local-preference, ASPath length and MED 547 rules). Thus, for a given router, those paths only differ by the IGP 548 cost to the nexthop or by the tie-breaking rules. The paths actually 549 advertised to a peer are the set of AS-wide best paths minus those 550 blocked by export filters or applicable split horizon rules. 552 The computational cost is reduced, as a router only has to send the 553 paths remaining before applying the IGP tie-breaking rule. However, 554 it is difficult to predict how many paths will be stored, as it 555 depends on the number of EBGP sessions on which this prefix is 556 advertised with the best AS-wide preference. 558 With this rule, the routing system is optimal: all routers can choose 559 their best path (or best paths if multipath is used) based on their 560 router-specific preferences, i.e. the IGP cost to the nexthop. Hot 561 potato routing is respected. Also, MED oscillations are prevented, 562 because the path visibility among the AS-wide preferred paths is 563 total. 
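As a rough illustration of how the AS-wide best set might be computed, the sketch below applies only the router-independent steps of the decision process. The Path record and as_wide_best() are hypothetical names introduced here. The ORIGIN comparison, which RFC 4271 places between the AS path length and MED steps, is included although the text above does not list it, and MED is compared only among paths from the same neighbor AS. Export filters and split horizon rules are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    """Hypothetical path record; field names are illustrative only."""
    name: str
    local_pref: int
    as_path_len: int
    origin: int        # 0 = IGP, 1 = EGP, 2 = INCOMPLETE (lower preferred)
    neighbor_as: int
    med: int


def as_wide_best(paths: List[Path]) -> List[Path]:
    """Keep the paths that survive every router-independent rule."""
    if not paths:
        return []
    # Highest Local-preference.
    best_lp = max(p.local_pref for p in paths)
    survivors = [p for p in paths if p.local_pref == best_lp]
    # Shortest AS path.
    shortest = min(p.as_path_len for p in survivors)
    survivors = [p for p in survivors if p.as_path_len == shortest]
    # Lowest ORIGIN code.
    lowest_origin = min(p.origin for p in survivors)
    survivors = [p for p in survivors if p.origin == lowest_origin]
    # MED: drop a path only if another path from the SAME neighbor AS
    # has a strictly lower MED.
    return [p for p in survivors
            if not any(q.neighbor_as == p.neighbor_as and q.med < p.med
                       for q in survivors)]


p1 = Path("p1", 100, 2, 0, 65001, 10)
p2 = Path("p2", 100, 2, 0, 65001, 20)   # loses to p1 on MED (same neighbor AS)
p3 = Path("p3", 100, 2, 0, 65002, 50)   # different neighbor AS: MED not compared
p4 = Path("p4", 90, 1, 0, 65003, 0)     # lower Local-preference
print([p.name for p in as_wide_best([p1, p2, p3, p4])])
```

The surviving paths differ only in IGP cost to the next-hop and later tie-breakers, which is why each receiving router can still make its own hot-potato choice among them.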
565 The existence of a backup path is not guaranteed. If only one path 566 with the AS-wide best attributes exists, there is no backup path 567 disseminated. However, if such a path exists, it is optimal as it 568 has the same AS-wide preference as the primary. 569 This path selection mode is OPTIONAL. 571 4.3.1.4. Advertise ALL AS-Wide Best and Next-Best Paths (Double 572 AS Wide) 574 This variant of "Advertise All AS Wide Best Paths" trades off the 575 number of paths being propagated within the iBGP system for post- 576 convergence alternate paths availability and routing stability. A BGP 577 speaker running this mode will select, as candidates for 578 advertisement, its AS Wide Best paths, plus all the AS Wide Best 579 paths obtained when removing the first ones from consideration. The 580 paths actually advertised to a peer are the double-AS_wide candidate 581 paths minus those blocked by export filters or applicable split 582 horizon rules. 584 Under this mode, a BGP speaker knows multiple AS-Wide best paths or 585 the AS-Wide best path and all the second AS-Wide best paths, so that 586 routing optimality and backup path availability are ensured. Note 587 that the post-convergence paths will be known by each BGP node in an 588 AS supporting this mode. 590 The computation complexity of this mode is relatively low as it 591 requires the router to run the usual BGP Decision Process up to and 592 including the MED rule. The set of paths remaining after that step 593 form the AS-Wide best paths. Next, a best path selection algorithm 594 is run up to and including the MED rule, based on the paths that are 595 not in the set of AS-Wide best paths. 597 The number of paths for a prefix p, known by a given router of the 598 AS, is the number of AS-Wide best and second AS-Wide best paths found 599 at the Borders of the AS. 601 MED Oscillations are avoided by this mode, both for the primary and 602 alternate paths being picked under this mode. 
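The two-pass computation described above can be sketched generically. In this illustration double_as_wide() is a hypothetical helper that takes the AS-wide best selection as a parameter, so any implementation of that step can be plugged in. The toy selector used in the example compares Local-preference only and is an assumption made purely to keep the example short. Export filters and split horizon rules are not modelled.

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")


def double_as_wide(paths: List[P],
                   as_wide_best: Callable[[List[P]], List[P]]) -> List[P]:
    """AS-wide best paths plus the AS-wide best of what remains."""
    first = as_wide_best(paths)
    # Second pass: rerun the same selection with the first set removed.
    remaining = [p for p in paths if p not in first]
    second = as_wide_best(remaining) if remaining else []
    return first + second


# Toy selector (assumption): a path is a (name, local_pref) pair and only
# Local-preference is compared.
def toy_best(paths: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    top = max(lp for _, lp in paths)
    return [p for p in paths if p[1] == top]


candidates = [("a", 200), ("b", 200), ("c", 150), ("d", 100)]
print(double_as_wide(candidates, toy_best))   # a and b, then c
```

When several paths tie for AS-wide best, the second pass still adds the next-best group, so the candidate set can exceed two paths.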
604 This path selection mode is OPTIONAL. 606 4.3.1.5. Advertise Used Multipaths 608 Many BGP implementations support BGP Multipath, allowing a BGP router 609 to use multiple BGP next-hops for forwarding towards a prefix/NLRI 610 when the corresponding paths are considered equally preferred. In 611 cases where the deployment of Add-Paths is mostly aimed at providing 612 multiple paths for load balancing with BGP Multipath, a natural 613 approach for a BGP speaker supporting Add-paths is to advertise the 614 paths that are selected by its BGP multipath selection algorithm. 616 BGP Multipath selection algorithms can vary depending on the 617 implementation and configuration options. An Add-Paths mode based on 618 BGP multipath is considered practical because it lets the BGP path 619 propagation be aligned with the load balancing objectives expressed 620 by the operator configuring BGP multipath. 622 In some deployment scenarios, it is likely that such a mode leads to 623 the selection and advertisement of a large number of paths for some 624 NLRI, and hence should be controlled as per the mechanism described 625 in section 4.3.2. In case the number of multipaths exceeds the upper 626 bound on the number of advertised paths the ones that should be 627 advertised are those with the highest degree of preference by the BGP 628 decision process. This can be achieved if the advertising router has 629 strictly ordered all of its paths. 631 This path selection mode is OPTIONAL. 633 4.3.2. Derived Modes from Bounding the Number of Advertised Paths 635 For some of the modes discussed in section 4.3.1 the number of paths 636 selected by the algorithm (M) is not predictable in advance, and 637 depends on factors such as network topology. For such modes, 638 implementations MAY support the ability to limit the number of 639 advertised paths to some value N that is less than M. 
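A bound of this kind could be applied as a final step after any of the selection modes. The sketch below is one possible reading, assuming (as section 4.3.1.5 suggests for multipath) that the paths kept are the N most preferred under a strict ordering. bound_paths() and its rank_key parameter are hypothetical names introduced here.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")


def bound_paths(selected: List[P], n: int,
                rank_key: Callable[[P], object]) -> List[P]:
    """Keep at most n paths, most preferred first (lowest rank_key wins)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # sorted() is stable, so paths with equal keys keep their input order.
    return sorted(selected, key=rank_key)[:n]


# Toy example (assumption): a path is a (name, local_pref) pair and higher
# Local-preference is more preferred.
m_paths = [("a", 100), ("b", 300), ("c", 200)]
print(bound_paths(m_paths, 2, rank_key=lambda p: -p[1]))
```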
641 It must be noted that the resulting derivative mode may no longer 642 meet the properties stated in section 4.3.1 (which assumes N=M). This 643 is particularly true for the MED oscillation avoidance property. The 644 use of such bounds thus needs to be considered carefully in 645 deployments where MED oscillation avoidance is a key goal of 646 deploying Add-Paths. If fast recovery is the main objective, then it is 647 reasonable and sufficient to set N to 2. If the main goal is 648 improved load-balancing, then limiting N to the number of ECMP paths 649 supported by the forwarding planes of the receiving routers is also a 650 reasonable practice. 652 4.3.3. Derived Modes from Adding N More Paths 654 Some modes discussed in section 4.3.1 may result in only one or a few 655 selected paths, depending on network topology and/or router 656 configuration, and this small number of paths may not meet minimum 657 requirements for backup path or load balancing purposes. When using 658 such modes, implementations MAY support the ability to add N more 659 paths to the set returned by the basic selection algorithm as 660 described in section 4.3.1. The N additional paths should be the N next- 661 best paths, as determined by the BGP decision process. 663 It must be noted that the resulting derivative mode may no longer 664 meet the properties stated in section 4.3.1 (which assumes N=0). 666 5. Deployment Considerations 668 This section proposes a potential strategy for introducing Add-Paths 669 into an existing network and discusses considerations related to 670 scalability, routing consistency and routing churn. 672 5.1. Introducing Add-Paths into an Existing Network 674 There are many possible ways that Add-Paths can be introduced into an 675 existing deployed network. It is not a practical goal for this 676 document to list all of these options and discuss the pros and cons 677 for each one.
It is however valuable to consider an example migration 678 strategy that may be relatively common among layer 3 service 679 providers that currently use route reflectors for scaling. This 680 example migration strategy is attractive for several reasons: 682 1. It involves incremental steps that allow the impact of Add- 683 Paths to be carefully evaluated before proceeding to the next 684 step. 686 2. It recognizes the fact that many routers will require at least 687 a software upgrade to support Add-Paths, and it will not be 688 practical to upgrade all of these routers all at once. 690 3. It reduces convergence time (in stages) with a relatively 691 moderate increase in router memory and CPU demands. 693 The example migration strategy assumes a starting point of a deployed 694 network with one or more RR clusters. None of the routers in the 695 network support Add-Paths without an upgrade, but some do support 696 best-external. Two of the clusters in this network are shown in 697 Figure 3. In cluster 2, PE1, PE2, RRy and RRz are configured for 698 best-external. This makes RRy and RRz aware of all external paths 699 received by PEs in cluster 2 and ensures that RRy and RRz can 700 advertise a path to the RRs in cluster 1 if it happens that the best 701 overall route is learned from cluster 1. It doesn't however allow 702 other clusters to be aware of more than one path per prefix learned 703 by cluster 2. 
705   ==========               ==================
706   =        =               =                =
707   =    +---+               +---+      +---+ =
708   =    |RR |---------------|RR |  <-BE|   | =
709   =    |a  |               |y  |------|PE1| =
710   =    |   |               |   |      |   | =
711   =    +---+               +---+      +---+ =
712   =      | =               = |  \    /      =
713   =      | =               = |   \  /       =
714   =      | =               = |    \/        =
715   =      | =               = |    /\        =
716   =      | =               = |   /  \       =
717   =      | =               = |  /    \      =
718   =    +---+               +---+      +---+ =
719   =    |RR |---------------|RR |------|   | =
720   =    |b  |               |z  |  <-BE|PE2| =
721   =    |   |               |   |      |   | =
722   =    +---+               +---+      +---+ =
723   =        =               =                =
724   ==========               ==================
725   RR Cluster 1                RR Cluster 2
727   Figure 3: RR Cluster Before Add-Paths
729 The following sequence of steps occurs in the example migration 730 strategy: 732 1. The route reflectors are upgraded in each cluster, one by one, to 733 support Add-Paths. This allows the intra- and (eventually) inter- 734 cluster RR-to-RR sessions to start using Add-Paths. All RRs are 735 configured to use the Add-N, N=2 path selection algorithm. The 736 effect of this step is to slightly reduce convergence time when 737 the best and second-best paths for a prefix are learned by a 738 single cluster (such as cluster 2 in Figure 3). 740 2. The clients are upgraded in each cluster, one by one, to support 741 Add-Paths. On the RRs Add-Paths is configured to use the Add-N, 742 N=2 path selection algorithm towards upgraded client peers. At 743 this step clients are configured in the receive-only Add-Paths 744 mode. This means that best-external continues to operate as 745 before in the client-to-RR direction. The effect of this step is 746 to ensure that all clients have two paths per prefix for ECMP or 747 fast failover, assuming at least 2 paths are available. 749 3. The clients are re-configured to use Add-Paths in the transmit 750 direction towards their RR peers. This causes Add-Paths to replace 751 the best-external behavior. The effect of this step is to free up 752 CPU and memory resources related to the storage of paths that are 753 third best or worse.
If a cluster such as the one in Figure 3 had 754 50 clients, and 10 of these learned an external route for the same 755 prefix, then with best-external the RRs in that cluster would need to store up to 12 756 paths for that prefix (10 from clients plus 2 from non-client peers). This would be true even if the 2 best 757 overall paths came from another cluster. Contrast this with the 758 use of Add-Paths in the client-to-RR direction. For the same case 759 the route reflectors need only store the 2 paths learned from non- 760 client peers. 762 5.2. Scalability Considerations 764 In terms of scalability, we note that advertising multiple paths per 765 prefix requires more memory and state than the current behavior of 766 advertising the best path only. A BGP speaker that does not implement 767 Add-Paths maintains send state information in its prefix data 768 structure per neighbor as a way to determine that the prefix has been 769 advertised to the neighbor. With Add-Paths, this information has to 770 be replicated for each path that needs to be advertised. 771 Mathematically, if K is the number of neighbors, s is the "send 772 state" size per prefix in bytes, and N is the number of advertised 773 paths per prefix, then the current memory requirement for BGP "send 774 state" is K * s bytes per prefix; with Add-Paths, it becomes K * s * N bytes per prefix. In 775 practice, this value may be reduced with implementation optimizations 776 similar to attribute sharing. Receiving multiple paths per prefix 777 also requires more memory and state since each path is a separate 778 entry in the Adj-RIB-Ins. 780 5.3. Routing Consistency Considerations 782 As discussed in previous sections, Add-Paths can help routers select 783 more optimal paths and it can help deal with certain route 784 oscillation conditions arising from incomplete knowledge of the 785 available paths. But depending on the path selection algorithm and 786 how it is used, Add-Paths is not immune to its own cases of routing 787 inconsistencies.
If the BGP routers within an AS do not make 788 consistent routing decisions about how to reach a particular 789 destination, route oscillations may occur and these route 790 oscillations may result in traffic loss. 792 Optimizing an Add-Paths deployment for scalability may run counter to 793 routing consistency goals, and in these circumstances operators have 794 to decide the correct tradeoff for their particular deployment. For 795 example the Advertise All Paths mode, if applied to many prefixes, is 796 far from ideal from a scalability perspective but it does guarantee 797 routing consistency and correctness. A path selection mode that 798 allows better control over scalability is the Advertise N paths mode, 799 but this is susceptible to routing inconsistency. First, if the N 800 paths do not include the best path from each neighbor AS group then 801 route oscillation cannot be precluded. Second, if the advertising 802 router (e.g. an RR) advertises N paths to peer_n and M paths to 803 peer_m, and N < M, care must be exercised to ensure that all paths 804 advertised to peer_n are included in the paths advertised to peer_m. 805 This can be assured as long as the advertising router has strictly 806 ordered all of its paths. 808 5.4. Consistency between Advertised Paths and Forwarding Paths 810 When using Add-Paths, routers may advertise paths that they have not 811 selected as best, and that they are thus not using for traffic 812 forwarding. This is generally not an issue if encapsulation is used 813 in the AS as described in [RFC4364] and all forwarding decisions, 814 including by the tunnel egress router, are based on label information 815 - i.e. if only the ingress router performs an IP FIB lookup. In this 816 situation the dataplane path followed by the packets is the one 817 intended by the ingress router, and corresponds to the control plane 818 path it selected. 
820 On the other hand, if Add-Paths is used in a network without 821 encapsulation, some scenarios can result in forwarding deflection or 822 loops. Such forwarding anomalies already occur without Add-Paths, 823 when the routers on the forwarding path do not have a synchronized 824 view of the best path. They will deflect the traffic to their own 825 local view of the best path, and, when multiple deflections occur, 826 forwarding loops can occur. With Add-Paths, the issue can be 827 exacerbated due to routers advertising non-best paths. As discussed 828 above, encapsulation can help with this issue, but only to the extent 829 that it allows downstream routers to forward without an IP FIB 830 lookup. 832 A first example of such an issue is when the Local-Pref of non-primary 833 paths received over IBGP sessions is modified. The ingress router 834 may thus select as best a path that the egress does not prefer, and the 835 egress router will then deflect the traffic. 837 Another example is when the best path is selected based on a tie- 838 breaking rule. When the ingress and the egress base their path 839 selection on the router-id of the neighbor that advertised the path 840 to them, the result may be different for each of them. This specific 841 issue is described and solved in [draft-pmohapat]. 843 In general, if the network forwards on a hop-by-hop basis and does 844 not make use of encapsulation, it is necessary to advertise the best 845 path. The second path that is advertised should be the second best 846 path using one of the path selection modes described previously. 847 Additional paths are discretionary with the presumption that they can 848 be forwarded on a hop-by-hop basis. 850 Similarly, if the network uses encapsulation, the best path should be 851 advertised for consistency, and the second best path should be advertised 852 for fast routing convergence.
All further paths and their choice for 853 selection are completely discretionary; the destination is presumed 854 to be reachable via encapsulation. 856 5.5. Interactions with Route Filtering 858 As noted in the previous section, modification of advertised paths 859 may lead to inconsistent route selection. This is true even when the 860 Add-Paths feature is not in use. Similarly, the use of route 861 filtering, when used carelessly for IBGP, may result in inconsistent 862 route selection in an AS with the possibility of introducing 863 forwarding loops. 865 The Add-Paths feature has additional considerations for route 866 filtering since the receiver of multiple paths is unable to determine 867 by inspection of the received NLRI which path corresponds to the 868 sender's active path for the prefix. The sender SHOULD send the best 869 path when sending multiple paths for a destination. The receiver must 870 take care when rejecting destinations to not discard the best path 871 but permit alternate paths. A failure on either the part of the 872 sender or receiver to distribute/receive the best path may result in 873 inconsistent route selection. 875 An implementation MAY support the ability to suppress advertisement 876 of all alternate paths when the export policy would otherwise 877 suppress the best path. 879 5.6. Routing Churn 881 As noted in section 3.3 using Add-Paths between IBGP peers can help 882 to reduce routing churn with EBGP peers. This benefit does however 883 come at the cost of potentially increased churn between the IBGP Add- 884 Paths peers. In a non Add-Paths deployment a change in the preference 885 order of non-best paths requires no updates to be sent to peers. But 886 when a router has Add-Paths peers changes in non-best path preference 887 may no longer be invisible and increased route churn may be 888 observable. 
Choosing the right path selection mode and parameters, 889 for example not setting N unnecessarily large in the Add-N mode, is 890 important to minimizing this additional churn. 892 6. Security Considerations 894 This document introduces no new security concerns in the base 895 operation of BGP [RFC4271]. 897 7. Acknowledgments 899 This document was prepared using 2-Word-v2.0.template.dot. 901 8. Contributors 903 The following individuals are acknowledged for their contributions to 904 earlier versions of this draft: Pradosh Mohapatra, Virginie Van den 905 Schrieck and Rohit Gupta. 907 9. IANA Considerations 909 IANA has assigned capability number 69 for the ADD-PATH Capability 910 described in this document. This registration is in the BGP 911 Capability Codes registry. 913 10. References 915 10.1. Normative References 917 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 918 Requirement Levels", BCP 14, RFC 2119, DOI 919 10.17487/RFC2119, March 1997. 922 [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A 923 Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 924 10.17487/RFC4271, January 2006. 927 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual 928 Private Networks (VPNs)", RFC 4364, DOI 929 10.17487/RFC4364, February 2006. 932 10.2. Informative References 934 [Add-Paths] Walton, D., Retana, A., Chen, E., Scudder, J., 935 "Advertisement of Multiple Paths in BGP", draft- 936 ietf-idr-add-paths-13, Dec 11, 2015. 938 [draft-pmohapat] Mohapatra, P., Fernando, R., Filsfils, C., and R. 939 Raszuk, "Fast Connectivity Restoration Using BGP 940 Add-path", draft-pmohapat-idr-fast-conn-restore- 941 02.txt, Oct 3, 2011. 943 [BEST-EXT] Marques, P., Fernando, R., Chen, E., Mohapatra, P., 944 Gredler, H., "Advertisement of the best external 945 route in BGP", draft-ietf-idr-best-external-05.txt, 946 Jan 3, 2012.
948 [oscillation] Walton, D., Retana, A., Chen, E., Scudder, J., "BGP 949 Persistent Route Oscillation Solutions", draft- 950 walton-bgp-route-oscillation-stop-06.txt, June 14, 951 2012. 953 [BGP-ORR] Raszuk, R., Cassar, C., Aman, E., Decraene, B., 954 Litkowski, S., Wang, K., "BGP Optimal Route 955 Reflection (BGP-ORR)", draft-ietf-idr-bgp-optimal- 956 route-reflection-11, Jan 8, 2016. 958 Appendix A. Other Path Selection Modes 960 A.1. Advertise Neighbor-AS Group Best Path 962 [walton-osc] proposes that a router group its paths based on the 963 neighbor AS from which they were learned, and advertise the best path 964 in each of those groups. 966 The control plane stress induced by this solution is the computation 967 of the per-neighbor-AS path groups, and the application of the decision 968 process to each of them. The control plane load is bounded by the 969 number of neighboring ASes advertising a prefix, which cannot be 970 known a priori. 972 Path optimality and backup path optimality are not guaranteed, as the 973 paths advertised are not all the AS-wide preferred paths. Backup path 974 availability is not guaranteed. Indeed, if only one AS advertises 975 this prefix, even on multiple EBGP sessions, only one of the paths 976 may be selected and advertised. 978 A.2. Best LocPref/Second LocPref 980 This selection method consists of grouping the paths by Local 981 Preference. A router sends to its peers all paths with the highest 982 Local Preference. If there is only a single path with the highest 983 Local Preference, it also sends all paths with the second best Local 984 Preference. 986 This method ensures that all routers know all paths with the best 987 local preference. As local preference values are often related to the type 988 of peering of the peer the path comes from, this ensures that in case 989 of failure, routers have a backup path of equivalent quality.
This 990 prevents, for example, a router from temporarily switching to a peer 991 path while an alternate path from a customer is available but hidden 992 at the border of the AS. Such a situation could result in a 993 temporary withdrawal of the prefix on some EBGP sessions when the 994 router selects the path via the peer. 996 The advertisement of the paths with the second best Local Preference occurs when there is 997 no alternate path with the same quality as the best path. This way, 998 fast convergence is still ensured. The backup path is optimal, as it has 999 the second AS-Wide preference, which becomes the AS-wide best 1000 preference upon failure of the primary one. 1002 Sending all the paths with a given Local Preference also has a 1003 positive impact on routing optimality. Indeed, this allows border 1004 routers to have increased path visibility and to choose their best 1005 path based on their own criteria. 1007 The computational cost of this solution is reduced when there are 1008 several paths with the best local preference. In this case, it is 1009 sufficient to stop the decision process after the first rule to have 1010 the set of paths to be advertised. When it is necessary to advertise 1011 the paths with the second best local preference, the additional cost is that of 1012 applying the first rule of the decision process a second time, which is 1013 still reasonable. The memory cost depends on the number of paths 1014 with the best local preference. 1016 A.3. Advertise Paths at decisive step -1 1018 When the goal is to provide fast recovery by advertising candidate 1019 post-reconvergence paths, one can choose to stop the decision process 1020 just before the step where only one path remains. If the decision 1021 process comes to the IGP tie-break, all remaining paths are advertised. 1022 This way, routers advertise as many paths as possible with a quality 1023 as similar as possible. 1025 This path selection is an intermediate solution between the two 1026 preceding ones.
Here, instead of stopping the decision process at 1027 the local preference step or the IGP step, we stop it before the rule 1028 that removes the best potential backup paths. This way, we minimize 1029 the number of paths to advertise while guaranteeing the presence of a 1030 backup path. Primary and backup path optimality is ensured, as all 1031 paths with the same AS-wide preference as the best paths are included 1032 in the set of paths advertised. 1034 Authors' Addresses 1036 Jim Uttaro 1037 AT&T 1038 200 S. Laurel Avenue 1039 Middletown, NJ 07748 USA 1040 Email: uttaro@att.com 1042 Pierre Francois 1043 Institute IMDEA Networks 1044 Avda. del Mar Mediterraneo, 22 1045 Leganese 28918 1046 ES 1047 Email: pierre.francois@imdea.org 1049 Pradosh Mohapatra 1050 Cumulus Networks 1051 pmohapat@cumulusnetworks.com 1053 Roberto Fragassi 1054 Alcatel-Lucent 1055 600 Mountain Avenue 1056 Murray Hill, New Jersey 1057 Email: roberto.fragassi@alcatel-lucent.com 1059 Adam Simpson 1060 Alcatel-Lucent 1061 600 March Road 1062 Ottawa, Ontario K2K 2E6 1063 Canada 1064 Email: adam.simpson@alcatel-lucent.com 1066 Keyur Patel 1067 Cisco Systems 1068 170 W. Tasman Drive 1069 San Jose, CA 95134 USA 1070 Email: keyupate@cisco.com 1072 Jeffrey Haas 1073 Juniper Networks 1074 1194 N. Mathilda Ave. 1075 Sunnyvale, CA 94089 1076 USA 1077 Email: jhaas@juniper.net