idnits 2.17.00 (12 Aug 2021) /tmp/idnits9096/draft-raszuk-diverse-bgp-path-dist-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See https://trustee.ietf.org/license-info/) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 6, 2010) is 4452 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Unused Reference: 'RFC2119' is defined on line 775, but no explicit reference was found in the text == Unused Reference: 'RFC5226' is defined on line 785, but no explicit reference was found in the text ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126) == Outdated reference: draft-ietf-idr-add-paths has been published as RFC 7911 == Outdated reference: A later version (-05) exists of draft-ietf-idr-best-external-01 == Outdated reference: draft-ietf-idr-route-oscillation has been published as RFC 3345 == Outdated reference: A later version (-03) exists of draft-pmohapat-idr-fast-conn-restore-00 Summary: 2 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 GROW Working Group R. Raszuk, Ed. 3 Internet-Draft K. Patel 4 Intended status: Informational R. Fernando 5 Expires: September 7, 2010 I. Kouvelas 6 Cisco Systems 7 D. McPherson 8 Arbor Networks 9 March 6, 2010 11 Distribution of diverse BGP paths. 12 draft-raszuk-diverse-bgp-path-dist-01 14 Abstract 16 The BGP4 protocol specifies the selection and propagation of a single 17 best path for each prefix. As defined today BGP has no mechanism to 18 distribute paths other than the best path between its speakers. This 19 behavior results in a number of disadvantages for new applications and 20 services. 22 This document presents an alternative mechanism for solving the 23 problem based on the concept of parallel route reflector planes. It 24 also compares existing solutions and proposed ideas that enable 25 distribution of more paths than just the best path. 27 This proposal does not specify any changes to the BGP protocol 28 definition. It does not require upgrades to provider edge or core 29 routers nor does it need network wide upgrades. The authors believe 30 that the GROW WG would be the best place for this work. 32 Status of this Memo 34 This Internet-Draft is submitted to IETF in full conformance with the 35 provisions of BCP 78 and BCP 79. 37 Internet-Drafts are working documents of the Internet Engineering 38 Task Force (IETF), its areas, and its working groups. Note that 39 other groups may also distribute working documents as 40 Internet-Drafts. 42 Internet-Drafts are draft documents valid for a maximum of six months 43 and may be updated, replaced, or obsoleted by other documents at any 44 time. It is inappropriate to use Internet-Drafts as reference 45 material or to cite them other than as "work in progress." 47 The list of current Internet-Drafts can be accessed at 48 http://www.ietf.org/ietf/1id-abstracts.txt. 
50 The list of Internet-Draft Shadow Directories can be accessed at 51 http://www.ietf.org/shadow.html. 53 This Internet-Draft will expire on September 7, 2010. 55 Copyright Notice 57 Copyright (c) 2010 IETF Trust and the persons identified as the 58 document authors. All rights reserved. 60 This document is subject to BCP 78 and the IETF Trust's Legal 61 Provisions Relating to IETF Documents 62 (http://trustee.ietf.org/license-info) in effect on the date of 63 publication of this document. Please review these documents 64 carefully, as they describe your rights and restrictions with respect 65 to this document. Code Components extracted from this document must 66 include Simplified BSD License text as described in Section 4.e of 67 the Trust Legal Provisions and are provided without warranty as 68 described in the BSD License. 70 Table of Contents 72 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 73 2. History . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 74 2.1. BGP Add-Paths Proposal . . . . . . . . . . . . . . . . . . 4 75 3. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 76 4. Multi-plane route reflection . . . . . . . . . . . . . . . . . 6 77 4.1. Co-located best and backup path RRs . . . . . . . . . . . 9 78 4.2. Randomly located best and backup path RRs . . . . . . . . 10 79 4.3. Multi plane route servers for Internet Exchanges . . . . . 12 80 5. Discussion on current models of IBGP route distribution . . . 13 81 5.1. Full Mesh . . . . . . . . . . . . . . . . . . . . . . . . 13 82 5.2. Confederations . . . . . . . . . . . . . . . . . . . . . . 14 83 5.3. Route reflectors . . . . . . . . . . . . . . . . . . . . . 15 84 6. Deployment considerations . . . . . . . . . . . . . . . . . . 15 85 7. Summary of benefits . . . . . . . . . . . . . . . . . . . . . 17 86 8. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 17 87 9. Security considerations . . . . . . . . . . . . . . . . . . . 18 88 10. 
IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 89 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 18 90 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 91 12.1. Normative References . . . . . . . . . . . . . . . . . . . 18 92 12.2. Informative References . . . . . . . . . . . . . . . . . . 19 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 95 1. Introduction 97 The current BGP4 [RFC4271] protocol specification allows for the 98 selection and propagation of only one best path for each prefix. The 99 BGP protocol as defined today has no mechanism to distribute paths other 100 than the best path between its speakers. This behavior results in a 101 number of problems in the deployment of new applications and 102 services. 104 This document presents an alternative mechanism for solving the 105 problem based on the concept of parallel route reflector planes. It 106 also compares existing solutions and proposed ideas that enable 107 distribution of more paths than just the best path. The parallel 108 route reflector planes solution brings very significant benefits at a 109 negligible capex and opex deployment price as compared to the 110 alternative techniques and is being considered by a number of network 111 operators for deployment in their networks. 113 This proposal does not specify any changes to the BGP protocol 114 definition. It does not require upgrades to provider edge or core 115 routers nor does it need network wide upgrades. The authors believe 116 that the GROW WG would be the best place for this work. 118 2. History 120 The need to disseminate more than the best path was observed primarily 121 for two reasons. One of them was the problem of BGP oscillations 122 [I-D.ietf-idr-route-oscillation] and the other was the desire to 123 reduce the time of reachability restoration in the event of a network 124 or network element failure. That led to the proposal of BGP add-paths. 
126 The need to disseminate more paths than just the best path is 127 primarily driven by two requirements. One of them is the problem of 128 BGP oscillations [I-D.ietf-idr-route-oscillation]. The second is the 129 desire to reduce the time of reachability restoration in the event 130 of a network or network element failure. These two reasons have led 131 to the proposal of BGP add-paths [I-D.ietf-idr-add-paths]. 133 2.1. BGP Add-Paths Proposal 135 As it has been proven that distribution of only the best path of a 136 route is not sufficient to meet the needs of a continuously growing 137 number of services carried over BGP, the add-paths proposal was 138 submitted in 2002 to enable BGP to distribute more than one path. 139 This is achieved by including as a part of the NLRI an additional 140 four octet value called the Path Identifier. 142 The implication of this change on a BGP implementation is that it 143 must now maintain per path, instead of per prefix, peer advertisement 144 state to track which of the peers each path was advertised to. This 145 new requirement has its own memory and processing cost. Suffice it to 146 say that by the middle of 2009 none of the commercial BGP 147 implementations could claim to support the new add-path behavior in 148 production code, in part because of this resource overhead. 150 An important observation is that distribution of more than one best 151 path by Autonomous System Border Routers (ASBRs) with multiple EBGP 152 peers attached to them where no "next hop self" is set may result in 153 best path selection inconsistency within the autonomous system. 154 Therefore it is also required to attach in the form of a new 155 attribute the possible tie breakers and propagate those within the 156 domain. 
An example of such an attribute for the purpose of fast 157 connectivity restoration to address that very case of an ASBR injecting 158 multiple external paths into the IBGP mesh has been presented and 159 discussed in the Fast Connectivity Restoration Using BGP Add-path 160 [I-D.pmohapat-idr-fast-conn-restore] document. Based on the additionally 161 propagated information, it is also recommended that best path selection be 162 modified to make sure that best and backup path selection within the 163 domain stays consistent. More discussion on this particular point 164 can be found in the deployment considerations section below. In 165 the proposed solution in this document we observe that in order to 166 address most of the applications only the use of best external 167 advertisement is required. For ASBRs which are peering to multiple 168 upstream ASes setting "next hop self" is recommended. 170 The add-paths protocol extensions have to be implemented by all the 171 routers within an AS in order for the system to work correctly. The 172 required code modifications include enhancements such as the Fast 173 Connectivity Restoration Using BGP Add-path 174 [I-D.pmohapat-idr-fast-conn-restore]. The deployment of such 175 technology in an entire service provider network requires software 176 and perhaps sometimes, in the cases of End-of-Engineering or 177 End-of-Life equipment, even hardware upgrades. Such an operation may or may 178 not be economically feasible. Even if add-path functionality were 179 available today on all commercial routing equipment and across all 180 vendors, experience indicates that achieving 100% deployment 181 coverage within any medium or large global network may easily take 182 years. 
184 While it needs to be clearly acknowledged that the add-path mechanism 185 provides the most general way to address the problem of distributing 186 more than one path between BGP speakers, this document provides a 187 much easier to deploy solution that requires no modification to the 188 BGP protocol. The alternative method presented is capable of 189 addressing critical service provider requirements for disseminating 190 more than a single path across an AS with a significantly lower 191 deployment cost. 193 3. Goals 195 The proposal described in this document is not intended to compete 196 with add-paths. Instead, if deployed, it is to be used as a very easy 197 method to accommodate the majority of applications which may require 198 the presence of alternative BGP exit points. 200 It is presented to network operators as a possible choice and 201 provides those operators who need additional paths today an 202 alternative to transitioning to a full mesh. 204 It is intended as a way to buy more time, allowing for a smoother and 205 more gradual migration where router upgrades will be required for perhaps 206 different reasons. It will also buy the time required until 207 standard RP/RE memory sizes can easily accommodate the overhead 208 associated with other techniques without any compromises. 210 4. Multi-plane route reflection 212 The idea contained in the proposal assumes the use of route 213 reflection within the network. Other techniques as described in the 214 following sections already provide means for distribution of 215 alternate paths today. 
217 Let's observe today's picture of simple route reflected domain: 219 ASBR3 220 *** 221 * * 222 +------------* *-----------+ 223 | AS1 * * | 224 | *** | 225 | | 226 | | 227 | | 228 | RR1 *** RR2 | 229 | *** * * *** | 230 |* * * P * * *| 231 |* * * * * *| 232 | *** *** *** | 233 | | 234 | IBGP | 235 | | 236 | | 237 | *** *** | 238 | * * * * | 239 +-----* *---------* *----+ 240 * * * * 241 *** *** 242 ASBR1 ASBR2 243 EBGP 245 Figure1: Simple route reflection 247 Figure 1 shows an AS that is connected via EBGP peering at ASBR1 and 248 ASBR2 to an upstream AS or set of ASes. For a given destination "D" 249 ASBR1 and ASBR2 will each have an external path P1 and P2 250 respectively. The AS network uses two route reflectors RR1 and RR2 251 for redundancy reasons. The route reflectors propagate the single 252 BGP best path for each route to all clients. All ASBRs are clients 253 of RR1 and RR2. 255 Below are the possible cases of the path information that ASBR3 may 256 receive from route reflectors RR1 and RR2: 258 1. When best path tie breaker is the IGP distance: When paths P1 and 259 P2 are considered to be equally good best path candidates the 260 selection will depend on the distance of the path next-hops from 261 the route reflector making the decision. Depending on the 262 positioning of the route reflectors in the IGP topology they may 263 choose the same best path or a different one. In such a case 264 ASBR3 may receive either the same path or different paths from 265 each of the route reflectors. 267 2. When best path tie breaker is Multi-Exit-Discriminator or Local 268 Preference: In this case only one path from preferred exit point 269 ASBR will be available to RRs since the other peering ASBR will 270 consider the IBGP path as best and will not announce (or if 271 already announced will withdraw) its own external path. 
The 272 exception here is the use of the BGP Best-External proposal, which 273 will allow that ASBR to still propagate to the RRs its own 274 external path. Unfortunately RRs will not be able to distribute 275 it any further to other clients as only the overall best path 276 will be reflected. 278 The proposed solution is based on the use of additional route 279 reflectors or new functionality enabled on the existing route 280 reflectors that instead of distributing the best path for each route 281 will distribute an alternative path other than the best. The best path 282 (main) reflector plane distributes the best path for each route as it 283 does today. The second plane distributes the second best path for 284 each route and so on. Distribution of N paths for each route can be 285 achieved by using N reflector planes. 287 Each plane of route reflectors is a logical entity and may or may not 288 be co-located with the existing best path route reflectors. Adding a 289 route reflector plane to a network may be as easy as enabling a 290 logical router partition, a new BGP process or just a new configuration 291 knob on an existing route reflector and configuring an additional 292 IBGP session from the current clients. There are no changes required 293 on the route reflector clients for this mechanism to work. It is 294 easy to observe that the installation of one or more additional route 295 reflector control planes is much cheaper and easier than 296 upgrading 100s of routers in the entire network. 298 The only required code change is on the route reflectors serving the 299 non best path planes (provided those will be separate from the 300 existing route reflectors). These reflectors need the new ability to 301 propagate the Nth best path instead of the best path. To deploy a 302 new plane the reflectors belonging to the plane must be configured 303 with the number of the plane they are serving. 
For example if a 304 reflector is provisioned to disseminate the 2nd best path while 305 enforcing an alternate exit point (next hop), a single BGP command 306 instructing it to do so is sufficient. 308 While this is an implementation detail, the code to calculate Nth 309 best path is also required by other BGP solutions. For example in 310 the application of fast connectivity restoration BGP must calculate a 311 backup path for installation into the RIB and FIB ahead of the actual 312 failure. 314 To address the problem of external paths not being available to route 315 reflectors due to local preference or MED factors it is recommended 316 that ASBRs enable the best-external function in order to always 317 inject their external paths to the route reflectors. 319 4.1. Co-located best and backup path RRs 321 To simplify the description let's assume that we only use two route 322 reflector planes (N=2). When co-located the additional 2nd best path 323 reflectors are connected to the network at the same points from the 324 perspective of the IGP as the existing best path RRs. Let's also 325 assume that best-external is enabled on all ASBRs. 327 ASBR3 328 *** 329 * * 330 +------------* *-----------+ 331 | AS1 * * | 332 | *** | 333 | | 334 | RR1 RR2 | 335 | *** *** | 336 |* * *** * *| 337 |* * * * * *| 338 | *** * P * *** | 339 |* * * * * *| 340 |* * *** * *| 341 | *** *** | 342 | RR1' IBGP RR2'| 343 | | 344 | | 345 | *** *** | 346 | * * * * | 347 +-----* *---------* *----+ 348 * * * * 349 *** *** 350 ASBR1 ASBR2 352 EBGP 354 Figure2: Co-located 2nd best RR plane 356 The following is a list of configuration changes required to enable 357 the 2nd best path route reflector plane: 359 1. Adding RR1' and RR2' either as logical or physical new control 360 plane RRs in the same IGP points as RR1 and RR2 respectively 362 2. Enabling RR1' and RR2' for 2nd plane route reflection 364 3. Enabling best-external on ASBRs 366 4. 
Configuring ASBR-RR's IBGP sessions 368 The expected behavior is that under any BGP condition the ASBR3 and P 369 routers will receive both paths P1 and P2 for destination D. The 370 availability of both paths will allow them to implement a number of 371 new services as listed in the applications section below. 373 As an alternative to fully meshing all RRs and RRs' an operator who 374 has a large number of reflectors deployed today may choose to peer 375 newly introduced RRs' to a hierarchical RR' which would be an IBGP 376 interconnect point within the 2nd plane as well as between planes. 378 One of the deployment models for this scenario can be achieved by a 379 simple upgrade of the existing route reflectors without the need to 380 deploy any new logical or physical platforms. Such an upgrade would 381 allow route reflectors to serve both peers upgraded to add-paths as 382 well as those peers which cannot be immediately upgraded, while at 383 the same time allowing distribution of more than a single best path. 385 The way to accomplish this would be to create a separate IBGP session 386 for each N-th BGP path. Such a session should preferably be terminated 387 at a different loopback address of the route reflector. At the BGP 388 OPEN stage of each such session a different bgp_router_id should be 389 used. Correspondingly, the route reflector should also allow its clients 390 to use the same bgp_router_id on each such session. 392 4.2. Randomly located best and backup path RRs 394 Now let's consider a deployment case where an operator wishes to 395 enable a 2nd RR' plane using only a single additional router in a 396 different network location from the current route reflectors. 398 Note that this model of operation assumes that the present best path 399 route reflectors are only control plane devices. 
If the route 400 reflector is in the data forwarding path then the implementation must 401 be able to clearly separate the Nth best-path selection from the 402 selection of the paths to be used for data forwarding. The basic 403 premise of this mode of deployment assumes that all reflector planes 404 have the same information to choose from which includes the same set 405 of BGP paths. It also requires the ability to skip the comparison of 406 the IGP metric to reach the bgp next hop during best-path 407 calculation. 409 ASBR3 410 *** 411 * * 412 +------------* *-----------+ 413 | AS1 * * | 414 | IBGP *** | 415 | | 416 | *** | 417 | * * | 418 | RR1 * P * RR2 | 419 | *** * * *** | 420 |* * *** * *| 421 |* * * *| 422 | *** RR' *** | 423 | *** | 424 | * * | 425 | * * | 426 | *** | 427 | *** *** | 428 | * * * * | 429 +-----* *---------* *----+ 430 * * * * 431 *** *** 432 ASBR1 ASBR2 434 EBGP 436 Figure3: Experimental deployment of 2nd best RR 438 The following is a list of configuration changes required to enable 439 the 2nd best path route reflector RR' as a single platform: 441 1. Adding RR' logical or physical as new route reflector anywhere in 442 the network 444 2. Enabling RR' for 2nd plane route reflection 446 3. Enabling best-external on ASBRs 448 4. Fully meshing newly added RRs' with the all other reflectors in 449 both planes. That condition does not apply if the newly added 450 RR'(s) already have peering to all ASBRs/PEs. 452 5. Configuring ASBRs-RR' IBGP sessions 454 6. Disabling IGP metric check in BGP best path on all route 455 reflectors. 457 In this scenario the operator has the flexibility to introduce the 458 new additional route reflector on any existing or new hardware in the 459 network. Any of the existing routers that are not already members of 460 the best path route reflector plane can be easily configured to serve 461 the 2nd plane either via using a logical / virtual router partition 462 or by local implementation hooks. 
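The selection performed by an Nth plane reflector in this deployment model can be illustrated with a short sketch. This is only an assumed, heavily simplified model and not the behavior of any particular implementation: the Path record, rank_key and nth_best_path are hypothetical names, the tie-breaking order is reduced to local preference, AS path length, MED and router ID, MED is compared across all paths for brevity, and the addresses are documentation examples. The IGP metric to the next hop is deliberately left out of the comparison, paths with unreachable next hops are treated as invalid, and the Nth plane is forced onto an exit point (next hop) different from those chosen by the better planes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical, heavily simplified BGP path record (illustration only)."""
    next_hop: str
    local_pref: int = 100
    as_path_len: int = 0
    med: int = 0
    router_id: str = "0.0.0.0"
    next_hop_reachable: bool = True


def rank_key(p: Path) -> Tuple:
    # Simplified ordering: higher local preference first, then shorter
    # AS path, lower MED, lower router ID. The IGP metric to the next hop
    # is intentionally NOT part of the key.
    rid = tuple(int(octet) for octet in p.router_id.split("."))
    return (-p.local_pref, p.as_path_len, p.med, rid)


def nth_best_path(paths: List[Path], plane: int) -> Optional[Path]:
    """Return the path a reflector serving `plane` (1 = best) would reflect."""
    if plane < 1:
        raise ValueError("plane numbers start at 1")
    # Paths with unreachable next hops are invalid and never selected.
    valid = [p for p in paths if p.next_hop_reachable]
    chosen: List[Path] = []
    seen_next_hops = set()
    for p in sorted(valid, key=rank_key):
        # Enforce an alternate exit point: one path per distinct next hop.
        if p.next_hop in seen_next_hops:
            continue
        seen_next_hops.add(p.next_hop)
        chosen.append(p)
    return chosen[plane - 1] if plane <= len(chosen) else None


# Example: P1 via ASBR1 is preferred, P2 via ASBR2 is the backup.
p1 = Path(next_hop="192.0.2.1", local_pref=200, router_id="10.0.0.1")
p2 = Path(next_hop="192.0.2.2", local_pref=100, router_id="10.0.0.2")
dead = Path(next_hop="192.0.2.3", local_pref=300, next_hop_reachable=False)
candidates = [p2, dead, p1]

best = nth_best_path(candidates, 1)     # reflected by the main plane
backup = nth_best_path(candidates, 2)   # reflected by the 2nd plane (RR')
```

Because every plane ranks the same set of valid paths with the same IGP-independent key, the result does not depend on where in the topology the reflector is placed, which is the property this deployment model relies on.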
464 Even if the IGP metric is not taken into consideration when comparing 465 paths during the bestpath calculation, an implementation still has to 466 consider paths with unreachable nexthops as invalid. It is worth 467 pointing out that some implementations today already allow for 468 configuration which results in no IGP metric comparison during the 469 best path calculation. 471 The additional planes of route reflectors do not need to be fully 472 redundant as the primary one is. If we are preparing for a single 473 network failure event, a failure of a non backed up N-th best-path 474 route reflector would not result in a connectivity outage of the 475 actual data plane. The reason is that this would at most affect the 476 presence of a backup path (not an active one) on some parts of the 477 network. If the operator chooses to build the N-th best path plane 478 redundantly by installing not one, but two or more route reflectors 479 serving each additional plane, additional robustness will be 480 achieved. 482 As a result of this solution ASBR3 and other ASBRs peering to RR' 483 will be receiving the 2nd best path. 485 Similarly to section 4.1 as an alternative to fully meshing all RRs & 486 RRs' an operator who may have a large number of reflectors already 487 deployed today may choose to peer newly introduced RRs' to a 488 hierarchical RR' which would be an IBGP interconnect point within the 489 2nd plane as well as between planes. 491 4.3. Multi plane route servers for Internet Exchanges 493 Another group of devices where the proposed multi-plane architecture 494 may be of particular applicability are EBGP route servers used at the 495 majority of internet exchange points. 497 In such cases 100s of ISPs are interconnected on a common LAN. 498 Instead of having 100s of direct EBGP sessions on each exchange 499 client, a single peering is created to the transparent route server. 501 The route server can only propagate a single best path. 
Mandating 502 the upgrade for 100s of different service providers in order to 503 implement add-path may be much more difficult than asking 504 them to provision one new EBGP session to an Nth best-path route 505 server plane. 507 The solution proposed in this document fits very well with the 508 requirement of having broader EBGP path diversity among the members 509 of any Internet Exchange Point. 511 5. Discussion on current models of IBGP route distribution 513 In today's networks BGP4 operates as specified in [RFC4271]. 515 There are a number of technology choices for intra-AS BGP route 516 distribution: 518 1. Full mesh 520 2. Confederations 522 3. Route reflectors 524 5.1. Full Mesh 526 A full mesh, the most basic IBGP architecture, exists when all the 527 BGP speaking routers within the AS peer directly with all other BGP 528 speaking routers within the AS, irrespective of where a given router 529 resides within the AS (e.g., P router, PE router, etc.). 531 While this is the simplest intra-domain path distribution method, 532 historically there have been a number of challenges in realizing such 533 an IBGP full mesh in a large scale network. While some of these 534 challenges are no longer applicable today some may still apply, 535 including the following: 537 1. Number of TCP sessions: The number of IBGP sessions on a single 538 router in a full mesh topology of a large scale service provider 539 can easily reach 100s. While on hardware and software used in 540 the late 70s, 80s and 90s such numbers could be of concern, today 541 customer requirements for the number of BGP sessions per box are 542 reaching 1000s. This is already an order of magnitude more than 543 the potential number of IBGP sessions. Advancements in hardware 544 and software used in production routers mean that running a full 545 mesh of IBGP sessions should not be dismissed due to the 546 resulting number of TCP sessions alone. 548 2. 
Provisioning: When operating and troubleshooting large networks 549 one of the top-most requirements is to keep the design as simple 550 as possible. When the autonomous system's network is composed of 551 hundreds of nodes it becomes very difficult to manually provision 552 a full mesh of IBGP sessions. Adding or removing a router 553 requires reconfiguration of all the other routers in the AS. 554 While this is a real concern today there is already work in 555 progress in the IETF to define IBGP peering automation through an 556 IBGP Auto Discovery [I-D.raszuk-idr-ibgp-auto-mesh] mechanism. 558 3. Number of paths: Another concern when deploying a full IBGP mesh 559 is the number of BGP paths for each route that have to be stored 560 at every node. This number is very tightly related to the number 561 of external peerings of an AS, the use of local preference or 562 multi-exit-discriminator techniques and the presence of 563 best-external [I-D.ietf-idr-best-external] advertisement 564 configuration. If we make a rough assumption that the BGP4 path 565 data structure consumes about 80-100 bytes the resulting control 566 plane memory requirement for 500,000 IPv4 routes with one 567 additional external path is 38-48 MB while for 1 million IPv4 568 routes it grows linearly to 76-95 MB. It is not possible to 569 reach a general conclusion as to whether this condition is negligible or 570 a show stopper for a full mesh deployment without direct 571 reference to a given network. 573 To summarize, a full mesh IBGP peering can offer natural 574 dissemination of multiple external paths among BGP speakers. When 575 realized with the help of IBGP Auto Discovery peering automation this 576 seems like a viable deployment especially in medium and small scale 577 networks. 579 5.2. Confederations 581 For the purpose of this document let's observe that confederations 582 [RFC5065] can be viewed as a hierarchical full mesh model. 
584 Within each sub-AS BGP speakers are fully meshed and as discussed in 585 section 5.1 all full mesh characteristics (number of TCP sessions, 586 provisioning and potential concern over number of paths) still apply 587 at the sub-AS scale. 589 In addition to the direct peering of all BGP speakers within each 590 sub-AS, all sub-AS border routers must also be fully meshed with each 591 other. Sub-AS border routers configured with best-external 592 functionality can inject additional exit paths within a sub-AS. 594 To summarize, it is technically sound to use confederations in 595 combination with best-external to achieve distribution of more than a 596 single best path per route in a large autonomous system. 598 In topologies where route reflectors are deployed within the 599 confederation sub-ASes the technique described here does apply. 601 5.3. Route reflectors 603 The main motivation behind the use of route reflectors [RFC4456] is 604 the avoidance of the full mesh session management problem described 605 above. Route reflectors, for better or for worse, are the most common 606 solution today for interconnecting BGP speakers within an internal 607 routing domain. 609 Route reflector peerings follow the advertisement rules defined by 610 the BGP4 protocol. As a result only a single best path per prefix is 611 sent to client BGP peers. That is the main reason why many current 612 networks are exposed to a phenomenon called BGP path starvation which 613 essentially results in an inability to deliver a number of applications 614 discussed later. 616 The route reflection equivalent when interconnecting BGP speakers 617 between domains is popularly called the Route Server and is globally 618 deployed today in many internet exchange points. 620 6. Deployment considerations 622 The diverse BGP path dissemination proposal allows the distribution 623 of more paths than just the best-path to route reflector or route 624 server clients of today's BGP4 implementations. 
626 From the client's point of view receiving additional paths via 627 separate IBGP sessions terminated at the new route reflector plane 628 is functionally equivalent to constructing a full mesh peering 629 without the problems that such a full mesh would come with (discussed 630 in section 5.1). 632 By precisely defining the number of reflector planes, network 633 operators have full control over the number of redundant paths in the 634 network. This number can be defined to address the needs of the 635 service(s) being deployed. 637 The Nth plane route reflectors should be acting as control plane 638 devices. While they can be provisioned on the current production 639 routers, selected backup BGP paths should not be used directly in the 640 data plane. Use of the calculated Nth path by the RRs can lead to 641 inconsistent best-path selection in the domain. For the purposes of 642 local RIB / FIB installation, any router (including the RRs) which is 643 in the data path must use the overall global best and Nth best paths. 645 The proposed architecture deployed along with the BGP best-external 646 functionality covers all three cases where the classic BGP route 647 reflection paradigm would fail to distribute alternate exit point 648 paths. 650 1. ASBRs advertising their single best external paths with no 651 local-preference or multi-exit-discriminator present. 653 2. ASBRs advertising their single best external paths with 654 local-preference or multi-exit-discriminator present and with BGP 655 best-external functionality enabled. 657 3. ASBRs with multiple external paths. 659 Let's discuss the last (3rd) case in more detail. This describes the 660 scenario of a single ASBR connected to multiple EBGP peers. In 661 practice this peering scenario is quite common. It is mostly due to 662 the geographic location of EBGP peers and the diversity of those 663 peers (for example peering to multiple tier 1 ISPs etc.). 
It is not designed for failure recovery scenarios, as a single failure of the ASBR would simultaneously result in loss of connectivity to all of the peers. In most medium and large geographically distributed networks there is always another ASBR, or multiple ASBRs, providing peering backups, typically in other geographically diverse locations in the network.

When an operator uses ASBRs with multiple peerings, setting next hop self effectively allows the ASBR to locally repair the atomic failure of any external peer without any compromise to the data plane. The most common reason for not setting next hop self is traditionally the associated drawback of losing the ability to signal external failures of peering ASBRs, or of links to those ASBRs, by fast IGP flooding. This potential drawback can be easily avoided by using a peering address different from the address used for next hop mapping, and by removing that next hop from the IGP upon the last possible BGP path failure.

Here one may correctly observe that, when next hop self is set on an ASBR, the attributes of the other external paths that the ASBR learns from its peers may differ from the attributes of its best external path. Therefore, since those external paths are not all injected with their corresponding attributes, they cannot be compared with equivalent paths for the same prefix coming from different ASBRs.

While this observation is correct in principle, one should put things in the perspective of the overall goal, which is to provide data plane connectivity upon a single failure with minimal interruption and packet loss. During such transient conditions, using even potentially suboptimal exit points is reasonable, so long as forwarding information loops are not introduced.
In the meantime, the BGP control plane will on its own re-advertise the newly elected best external path, and the route reflector planes will calculate their Nth best paths and propagate them to their clients. The result is that within seconds, even if some sub-optimality were encountered, it will be quickly and naturally healed.

7.  Summary of benefits

The diverse BGP path dissemination proposal provides the following benefits when compared to the alternatives:

1.  No modifications to the BGP4 protocol.

2.  No requirement for upgrades to edge and core routers. Backward compatible with existing BGP deployments.

3.  Can be easily enabled by introducing a new route reflector / route server plane dedicated to the selection and distribution of the Nth best path.

4.  Does not require major modifications to BGP implementations across the entire network, which would result in an unnecessary increase in memory and CPU consumption due to the shift from today's per-prefix to per-path advertisement state tracking.

5.  Can be safely deployed gradually through the addition of a single logical or physical route reflector with the new functionality described in this document.

6.  The proposed solution is equally applicable to any BGP address family as described in Multiprotocol Extensions for BGP-4 [RFC4760]. In particular, it can be used "as is", without any modifications, with both the IPv4 and IPv6 address families.

8.  Applications

This section lists the most common applications which require the presence of redundant BGP paths:

1.  Fast connectivity restoration, where backup paths with alternate exit points would be pre-installed as well as pre-resolved in the FIB of routers. That would allow for a local action upon reception of a critical event notification of a network / node failure.
    This failure recovery mechanism, based on the presence of backup paths, is also suitable for gracefully addressing scheduled maintenance requirements as described in [I-D.decraene-bgp-graceful-shutdown-requirements].

2.  Multi-path load balancing for both IBGP and EBGP.

3.  BGP control plane churn reduction, both intra-domain and inter-domain.

An important point to observe is that all of the above intra-domain applications, which are based on the use of reflector planes, are also applicable in the inter-domain Internet exchange case. As discussed in section 4.3, an Internet exchange can deploy shadow route server slices, each responsible for distribution of an Nth best path to its EBGP peers.

9.  Security considerations

The new mechanism for diverse BGP path dissemination proposed in this document does not introduce any new security concerns as compared to the base BGP4 specification [RFC4271].

10.  IANA Considerations

The new mechanism for diverse BGP path dissemination does not require any new allocations from IANA.

11.  Acknowledgments

The authors would like to thank Bruno Decraene, Bart Peirens and Eric Rosen for their valuable input.

12.  References

12.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, January 2007.

[RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

12.2.  Informative References

[I-D.decraene-bgp-graceful-shutdown-requirements]
Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., and A.
Armengol, "Requirements for the graceful shutdown of BGP sessions", draft-decraene-bgp-graceful-shutdown-requirements-01 (work in progress), March 2009.

[I-D.ietf-idr-add-paths]
Walton, D., Retana, A., Chen, E., and J. Scudder, "Advertisement of Multiple Paths in BGP", draft-ietf-idr-add-paths-03 (work in progress), February 2010.

[I-D.ietf-idr-best-external]
Marques, P., Fernando, R., Chen, E., and P. Mohapatra, "Advertisement of the best external route in BGP", draft-ietf-idr-best-external-01 (work in progress), February 2010.

[I-D.ietf-idr-route-oscillation]
McPherson, D., "BGP Persistent Route Oscillation Condition", draft-ietf-idr-route-oscillation-01 (work in progress), February 2002.

[I-D.pmohapat-idr-fast-conn-restore]
Mohapatra, P., Fernando, R., Filsfils, C., and R. Raszuk, "Fast Connectivity Restoration Using BGP Add-path", draft-pmohapat-idr-fast-conn-restore-00 (work in progress), September 2008.

[I-D.raszuk-idr-ibgp-auto-mesh]
Raszuk, R., "IBGP Auto Mesh", draft-raszuk-idr-ibgp-auto-mesh-00 (work in progress), June 2003.

[RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, April 2006.

[RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

Robert Raszuk (editor)
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134
US

Email: raszuk@cisco.com

Keyur Patel
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134
US

Email: keyupate@cisco.com

Rex Fernando
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134
US

Email: rex@cisco.com

Isidor Kouvelas
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134
US

Email: kouvelas@cisco.com

Danny McPherson
Arbor Networks

Email: danny@arbor.net