idnits 2.17.00 (12 Aug 2021)

/tmp/idnits55251/draft-ietf-idr-route-oscillation-00.txt:
  ** The Abstract section seems to be numbered

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning.  Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 54 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  ** The abstract seems to contain references ([2], [3], [4], [1]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 7 instances of lines with private range IPv4 addresses in the
     document.
     If these are generic example addresses, they should be changed to use
     any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1771 (ref. '1') (Obsoleted by RFC 4271)

  ** Obsolete normative reference: RFC 2796 (ref. '2') (Obsoleted by RFC 4456)

  ** Obsolete normative reference: RFC 1965 (ref. '3') (Obsoleted by RFC 3065)

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  == Outdated reference: draft-ietf-idr-bgp4 has been published as RFC 4271

     Summary: 12 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                   Danny McPherson
INTERNET DRAFT                                     Amber Networks, Inc.
                                                             Vijay Gill
                                         Metromedia Fiber Network, Inc.
                                                          Daniel Walton
                                                          Alvaro Retana
March 2001                                          Cisco Systems, Inc.

              BGP Persistent Route Oscillation Condition

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2. Abstract

   The Border Gateway Protocol (BGP) [1] is an inter-Autonomous System
   routing protocol.  The primary function of a BGP speaking system is
   to exchange network reachability information with other BGP systems.

   It has recently been discovered that in particular configurations,
   the BGP scaling mechanisms defined in "BGP Route Reflection - An
   Alternative to Full Mesh IBGP" [2] and "Autonomous System
   Confederations for BGP" [3] will introduce persistent BGP route
   oscillation [4].  This document discusses the two types of persistent
   route oscillation that have been identified, describes when these
   conditions will occur, and provides some network design guidelines to
   avoid introducing such occurrences.

3. Introduction

   It has been known for some time that in particular configurations,
   the BGP scaling mechanisms defined in "BGP Route Reflection - An
   Alternative to Full Mesh IBGP" [2] and "Autonomous System
   Confederations for BGP" [3] will introduce persistent BGP route
   oscillation.

   The problem is inherent in the way BGP works: locally defined routing
   policies may conflict globally, and certain types of conflicts can
   cause persistent oscillation of the protocol.  Given current
   practices, we happen to see the problem manifest itself in the
   context of MED + route reflectors or confederations.

   The current specification of BGP-4 [5] states that the
   MULTI_EXIT_DISC is only comparable between routes learned from the
   same neighboring AS.  This limitation is consistent with the
   description of the attribute: "The MULTI_EXIT_DISC attribute may be
   used on external (inter-AS) links to discriminate among multiple exit
   or entry points to the same neighboring AS." [1,5]

   In a full mesh iBGP network, all the internal routers have complete
   visibility of the available exit points into a neighboring AS.  The
   comparison of the MULTI_EXIT_DISC for only some paths is not a
   problem.

   Because of the scalability implications of a full mesh iBGP network,
   two alternatives have been standardized: route reflectors [2] and AS
   confederations [3].  Both alternatives describe methods by which
   route distribution may be achieved without a full iBGP mesh in an AS.

   The route reflector alternative defines the ability to re-advertise
   (reflect) iBGP-learned routes to other iBGP peers once the best path
   is selected [2].  AS Confederations specify the operation of a
   collection of autonomous systems under a common administration as a
   single entity (i.e. from the outside, the internal topology and the
   existence of separate autonomous systems are not visible).  In both
   cases, the reduction of the iBGP full mesh results in the fact that
   not all the BGP speakers in the AS have complete visibility of the
   available exit points into a neighboring AS.  In fact, the visibility
   may be partial and inconsistent depending on the location (and
   function) of the router in the AS.
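
   As a rough illustration of the MED rule described above, the
   following Python sketch compares paths pairwise using only MED and
   the IGP cost to the NEXT_HOP.  It is not taken from any
   specification or implementation: the Path record, the prefer()
   helper and the sample values are assumptions made for illustration,
   and the real decision process has more steps.  Because MED is only
   compared between paths from the same neighboring AS, the pairwise
   preferences among three paths can form a cycle, so the winner
   depends on which paths a router happens to see.

```python
from collections import namedtuple

# Hypothetical, simplified path record: only the attributes discussed here.
Path = namedtuple("Path", ["neighbor_as", "med", "igp_cost"])


def prefer(a, b):
    """Return the preferred of two paths under a simplified decision process.

    MED is compared only when both paths come from the same neighboring AS
    (lower MED wins); otherwise, or on a MED tie, the lower IGP cost to the
    NEXT_HOP wins.
    """
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return min((a, b), key=lambda p: p.med)
    return min((a, b), key=lambda p: p.igp_cost)


# Illustrative values only.
p1 = Path(neighbor_as=6, med=0, igp_cost=13)
p2 = Path(neighbor_as=6, med=1, igp_cost=4)
p3 = Path(neighbor_as=10, med=10, igp_cost=5)

print(prefer(p1, p2) is p1)  # same AS: lower MED wins
print(prefer(p2, p3) is p2)  # different AS: lower IGP cost wins
print(prefer(p3, p1) is p3)  # different AS: lower IGP cost wins
```

   Each of the three comparisons prints True, i.e. p1 beats p2, p2
   beats p3 and p3 beats p1.  A router that sees all three paths and a
   router that sees only two of them can therefore disagree on the
   best path.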

   In certain topologies involving either route reflectors or
   confederations (detailed description later in this document), the
   partial visibility of the available exit points into a neighboring AS
   may result in an inconsistent best path selection decision as the
   routers don't have all the relevant information.  If the
   inconsistencies span more than one peering router, they may result in
   a persistent route oscillation.  The best path selection rules
   applied in this document are consistent with the current
   specification [5].

   The persistent route oscillation behavior is deterministic and can be
   avoided by employing some rudimentary BGP network design principles
   until protocol enhancements resolve the problem.

   In the following sections a taxonomy of the types of oscillations is
   presented and a description of the set of conditions that will
   trigger route oscillations is given.  We continue by providing
   several network design alternatives that remove the potential for
   this to occur.

   It is the intent of the authors that this document serve to increase
   operator awareness of the problem, as well as to trigger discussion
   and subsequent proposals for potential protocol enhancements that
   remove the possibility for this to occur.

   The oscillations are classified into Type I and Type II depending
   upon criteria documented below.

4. Type I Discussion

   In the following two subsections we provide configurations under
   which Type I Churn will occur.  We begin with a discussion of the
   problem when using Route Reflection, and then discuss the problem as
   it relates to AS Confederations.

   In general, Type I Churn occurs only when BOTH of the following
   conditions are met:

      1) a single-level Route Reflection or AS Confederations
         design is used in the network AND

      2) the network accepts the BGP MULTI_EXIT_DISC (MED)
         attribute from two or more ASs for a single prefix
         and the MED values are unique.

   It is also possible for the non-deterministic ordering of paths to
   cause the route oscillation problem.  [1] does not specify that paths
   should be ordered based on MEDs but it has been proven that
   non-deterministic ordering can lead to loops and inconsistent routing
   decisions.  Most vendors have either implemented deterministic
   ordering as default behavior, or provide a knob that permits the
   operator to configure the router to order paths in a deterministic
   manner based on MEDs.

4.1. Route Reflection and Type I Churn

   We now discuss Type I oscillation as it relates to Route Reflection.
   To begin, consider the topology depicted in Figure 1:

      ---------------------------------------------------------------
    /     --------------------               --------------------     \
   |    /                      \           /                      \    |
   |   |       Cluster 1        |         |       Cluster 2        |   |
   |   |                        |         |                        |   |
   |   |                        |   *1    |                        |   |
   |   |         Ra(RR) . . . . . . . . . . . . . . Rd(RR)         |   |
   |   |          .  .          |         |           .            |   |
   |   |        .*5    .*4      |         |           .*12         |   |
   |   |      .          .      |         |           .            |   |
   |   |   Rb(C)        Rc(C)   |         |         Re(C)          |   |
   |   |     .            .     |         |           .            |   |
   |    \    .            .    /           \          .           /    |
   |      ---.------------.---               ---------.----------      |
    \        .(10)        .(1)        AS1             .(0)            /
      -------.------------.---------------------------.--------------
             .             .                          .
           ------            .      ------------     .
         /        \            .  /              \  .
        |   AS10   |             |      AS6       |
         \        /               \              /
           ------                   ------------
                 .                       .
                   .                     .
                     .         --------------
                       .     /                \
                            |      AS100       |- 10.0.0.0/8
                             \                /
                               --------------

              Figure 1: Example Route Reflection Topology

   In Figure 1 AS1 contains two Route Reflector Clusters, Clusters 1 and
   2.  Each Cluster contains one Route Reflector (RR) (i.e., Ra and Rd,
   respectively).  An associated 'RR' in parentheses represents each RR.
   Cluster 1 contains two RR Clients (Rb and Rc), and Cluster 2 contains
   one RR Client (Re).  An associated 'C' in parentheses indicates RR
   Client status.  The dotted lines are used to represent BGP peering
   sessions.

   The number contained in parentheses on the AS1 EBGP peering sessions
   represents the MED value advertised by the peer to be associated with
   the 10.0.0.0/8 network reachability advertisement.

   The number following each '*' on the IBGP peering sessions
   represents the additive IGP metrics that are to be associated with
   the BGP NEXT_HOP attribute for the concerned route.  For example, the
   Ra IGP metric value associated with a NEXT_HOP learned via Rb would
   be 5, while the metric value associated with the NEXT_HOP learned via
   Re would be 13.

   Table 1 depicts the 10.0.0.0/8 route attributes as seen by routers
   Rb, Rc and Re, respectively.  Note that the IGP metrics in Figure 1
   are only of concern when advertising the route to an IBGP peer.

            Router   MED   AS_PATH
            --------------------
            Rb       10    10 100
            Rc        1    6 100
            Re        0    6 100

               Table 1: Route Attribute Table

   For the following steps 1 through 5 the best path will be marked with
   a '*'.

   1) Ra has the following installed in its BGP table with
      the path learned via AS10 marked best:

                                NEXT_HOP
            AS_PATH     MED     IGP Cost
            -----------------------
            6 100         1        4
          * 10 100       10        5

      The '10 100' route should not be marked as best, though
      this is not the cause of the persistent route oscillation.
      Ra realizes it has the wrong route marked as best since the
      '6 100' path has a lower IGP metric.  As such, Ra makes this
      change and advertises an UPDATE message to its neighbors to
      let them know that it now considers the '6 100, 1, 4' route
      as best.

   2) Rd receives the UPDATE from Ra, which leaves Rd with the
      following installed in its BGP table:

                                NEXT_HOP
            AS_PATH     MED     IGP Cost
            -----------------------
          * 6 100         0       12
            6 100         1        5

      Rd then marks the '6 100, 0, 12' route as best because it has
      a lower MED.  Rd sends an UPDATE message to its neighbors to
      let them know that this is the best route.

   3) Ra receives the UPDATE message from Rd and now has the
      following in its BGP table:

                                NEXT_HOP
            AS_PATH     MED     IGP Cost
            -----------------------
            6 100         0       13
            6 100         1        4
          * 10 100       10        5

      The first route (6 100, 0, 13) beats the second route (6 100,
      1, 4) because of lower MED, then the third route (10 100, 10,
      5) beats the first route because of lower IGP metric to
      NEXT_HOP.  Ra sends an UPDATE message to its peers to let them
      know its new best route.

   4) Rd receives the UPDATE message from Ra, which leaves Rd with the
      following BGP table:

                                NEXT_HOP
            AS_PATH     MED     IGP Cost
            -----------------------
            6 100         0       12
          * 10 100       10        6

      Rd selects the '10 100, 10, 6' path as best because of the IGP
      metric.  Rd sends an UPDATE/withdraw to its peers to let them
      know this is its best route.

   5) Ra receives the UPDATE message from Rd, which leaves Ra with the
      following BGP table:

                                NEXT_HOP
            AS_PATH     MED     IGP Cost
            -----------------------
            6 100         1        4
          * 10 100       10        5

      Ra received a withdraw for '6 100, 0, 13', which changes what is
      considered the best route for Ra.  This is why Ra has the
      '10 100, 10, 5' route selected as best in Step 1, even though
      '6 100, 1, 4' is actually better.

   At this point, we've made a full loop and are back at Step 1.  The
   router realizes it is using the incorrect best path, and the cycle
   repeats.  This is an example of Type I Churn when using Route
   Reflection.

4.2. AS Confederations and Type I Churn

   We'll now provide an example of Type I Churn occurring with AS
   Confederations.  To begin, consider the topology depicted in
   Figure 2:

      ---------------------------------------------------------------
    /     --------------------               --------------------     \
   |    /                      \           /                      \    |
   |   |      Sub-AS 65000      |         |      Sub-AS 65001      |   |
   |   |                        |         |                        |   |
   |   |                        |   *1    |                        |   |
   |   |         Ra . . . . . . . . . . . . . . . . . Rd           |   |
   |   |        .   .           |         |           .            |   |
   |   |       .*3    .*2       |         |           .*6          |   |
   |   |      .         .       |         |           .            |   |
   |   |    Rb . . . . . Rc     |         |           Re           |   |
   |   |     .    *5      .     |         |           .            |   |
   |    \    .            .    /           \          .           /    |
   |      ---.------------.---               ---------.----------      |
    \        .(10)        .(1)        AS1             .(0)            /
      -------.------------.---------------------------.--------------
             .             .                          .
           ------            .      ------------     .
         /        \            .  /              \  .
        |   AS10   |             |      AS6       |
         \        /               \              /
           ------                   ------------
                 .                       .
                   .                     .
                     .         --------------
                       .     /                \
                            |      AS100       |- 10.0.0.0/8
                             \                /
                               --------------

             Figure 2: Example AS Confederations Topology

   The number following each '*' on the BGP peering sessions represents
   the additive IGP metrics that are to be associated with the BGP
   NEXT_HOP attribute for the concerned route.  The number contained in
   parentheses on each AS1 EBGP peering session represents the MED value
   advertised by the peer to be associated with the 10.0.0.0/8 network
   reachability advertisement.

   For example, the Ra IGP metric value associated with a NEXT_HOP
   learned via Rb would be 3, while the metric value associated with the
   NEXT_HOP learned via Re would be 7.
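
   The additive IGP metrics can be checked with a short Python sketch.
   The link metrics below are read off Figure 2; the LINK_METRIC table
   and the path_cost() helper are hypothetical names introduced only
   for this illustration, and the helper simply follows the hops it is
   given rather than computing a shortest path.

```python
# IGP metrics read off Figure 2 (the number after each '*').
LINK_METRIC = {
    ("Ra", "Rb"): 3,
    ("Ra", "Rc"): 2,
    ("Rb", "Rc"): 5,
    ("Ra", "Rd"): 1,
    ("Rd", "Re"): 6,
}


def path_cost(hops):
    """Sum the IGP metrics along an explicit list of routers.

    Hypothetical helper: it follows the hops given and does not search
    for a shortest path.
    """
    total = 0
    for a, b in zip(hops, hops[1:]):
        # Links are treated as symmetric, so try both orderings.
        key = (a, b) if (a, b) in LINK_METRIC else (b, a)
        total += LINK_METRIC[key]
    return total


ra_via_rb = path_cost(["Ra", "Rb"])        # NEXT_HOP learned via Rb
ra_via_rc = path_cost(["Ra", "Rc"])        # NEXT_HOP learned via Rc
ra_via_re = path_cost(["Ra", "Rd", "Re"])  # NEXT_HOP learned via Re
print(ra_via_rb, ra_via_rc, ra_via_re)     # prints: 3 2 7
```

   The same helper gives Rd's cost toward a NEXT_HOP behind Rb as
   path_cost(["Rd", "Ra", "Rb"]), which is 4.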

   Table 2 depicts the 10.0.0.0/8 route attributes as seen by routers
   Rb, Rc and Re, respectively.  Note that the IGP metrics in Figure 2
   are only of concern when advertising the route to an IBGP peer.

            Router   MED   AS_PATH
            --------------------
            Rb       10    10 100
            Rc        1    6 100
            Re        0    6 100

               Table 2: Route Attribute Table

   For the following steps 1 through 6 the best route will be marked
   with a '*'.

   1) Ra has the following BGP table:

                                      NEXT_HOP
            AS_PATH           MED     IGP Cost
            -------------------------------
          * 10 100             10        3
            (65001) 6 100       0        7
            6 100               1        2

      The '10 100' route is selected as best and advertised to
      Rd, though this is not the cause of the persistent route
      oscillation.

   2) Rd has the following in its BGP table:

                                      NEXT_HOP
            AS_PATH           MED     IGP Cost
            -------------------------------
            6 100               0        6
          * (65000) 10 100     10        4

      The '(65000) 10 100' route is selected as best because it has
      the lowest IGP metric.  As a result, Rd sends an UPDATE/withdraw
      to Ra for the '6 100' route that it had previously advertised.

   3) Ra receives the withdraw from Rd.  Ra now has the following in
      its BGP table:

                                      NEXT_HOP
            AS_PATH           MED     IGP Cost
            -------------------------------
          * 10 100             10        3
            6 100               1        2

      Ra received a withdrawal for '(65001) 6 100', which changes what
      is considered the best route for Ra.  Ra does not compute the
      best path for a prefix unless its best route was withdrawn.
      This is why Ra has the '10 100, 10, 3' route selected as best,
      even though the '6 100, 1, 2' route is better.

   4) Ra realizes that the '6 100' route is better because of the
      lower IGP metric.  Ra sends an UPDATE/withdraw to Rd for the
      '10 100' route since Ra is now using the '6 100' path as its
      best route.

      Ra's BGP table looks like this:

                                      NEXT_HOP
            AS_PATH           MED     IGP Cost
            -------------------------------
            10 100             10        3
          * 6 100               1        2

   5) Rd receives the UPDATE from Ra and now has the following in
      its BGP table:

                                      NEXT_HOP
            AS_PATH           MED     IGP Cost
            -------------------------------
            (65000) 6 100       1        3
          * 6 100               0        6

      Rd selects the '6 100, 0, 6' route as best because of the lower
      MED value.  Rd sends an UPDATE message to Ra, reporting that
      '6 100, 0, 6' is now its best route.

   6) Ra receives the UPDATE from Rd.  Ra now has the following in its
      BGP table:

                                      NEXT_HOP
            AS_PATH           MED     IGP Cost
            -------------------------------
          * 10 100             10        3
            (65001) 6 100       0        7
            6 100               1        2

   At this point we have made a full cycle and are back to Step 1.  This
   is an example of Type I Churn with AS Confederations.

4.3. Potential Workarounds for Type I Churn

   There are a number of alternatives that can be employed to provide
   workarounds to this problem:

   1) When using Route Reflection make sure that the inter-Cluster
      links have a higher IGP metric than the intra-Cluster links.
      This is the preferred choice when using Route Reflection.  Had
      the inter-Cluster IGP metrics been much larger than the
      intra-Cluster IGP metrics, the above would not have occurred.

   2) When using AS Confederations ensure that the inter-Sub-AS
      links have a higher IGP metric than the intra-Sub-AS links.
      This is the preferred option when using AS Confederations.
      Had the inter-Sub-AS IGP metrics been much larger than the
      intra-Sub-AS IGP metrics, the above would not have occurred.

   3) Do not accept MEDs from peers (this may not be a feasible
      alternative).

   4) Utilize other BGP attributes higher in the decision process
      so that the BGP decision algorithm never reaches the MED
      step.  As using this completely overrides MEDs, Option 3 may
      make more sense.

   5) Always compare BGP MEDs, regardless of whether or not they were
      obtained from a single AS.  This is probably a bad idea since
      MEDs may be derived in a number of ways, and are typically done
      so as a matter of operator-specific policy.  As such, comparing
      MED values for a single prefix learned from multiple ASs is
      ill-advised.  Of course, this mostly defeats the purpose of MEDs,
      and as such, Option 3 may be a more viable alternative.

   6) Use a full IBGP mesh.  This is not a feasible solution for
      ASs with a large number of BGP speakers.

5. Type II Discussion

   In the following subsection we provide configurations under which
   Type II Churn will occur when using AS Confederations.  For the sake
   of brevity, we avoid similar discussion of the occurrence when using
   Route Reflection.

   In general, Type II churn occurs only when BOTH of the following
   conditions are met:

      1) More than one tier of Route Reflection or Sub-ASs
         is used in the network AND

      2) the network accepts the BGP MULTI_EXIT_DISC (MED)
         attribute from two or more ASs for a single prefix
         and the MED values are unique.

5.1. AS Confederations and Type II Churn

   Let's now examine the occurrence of Type II Churn as it relates to AS
   Confederations.  Figure 3 provides our sample topology:

      ---------------------------------------------------------------
    /                     --------------------                        \
   |   AS N             /     Sub-AS 65500     \                       |
   |                   |                        |                      |
   |                   |    Rc . . . . Rd       |                      |
   |                   |    .    *2      .      |                      |
   |                    \  .              .    /                       |
   |                      -.---------------.--                         |
   |                      .*40              .*40                       |
   |       --------------.-----              .-------------------      |
   |     /              .       \          /  .                   \    |
   |    |   Sub-AS     .         |        |    .       Sub-AS      |   |
   |    |   65501     .          |        |     .      65502       |   |
   |    |            Rb          |        |     Re                 |   |
   |    |             .          |        |    .  .                |   |
   |    |             .*10       |        | *3.    .*2             |   |
   |    |             .          |        |   .      .             |   |
   |    |            Ra .        |        |. Rf . . . Rg           |   |
   |     \                .     /         .           .           /    |
   |       -----------------.---        .  -----------.---------       |
    \                      (0) .       .()            .(1)            /
      ---------------------------.----.---------------.--------------
                                 .    .
                      ------     .    .      ------------
                      |AS X|                 |   AS Y   |
                      ------                 ------------

             Figure 3: Example AS Confederations Topology

   In Figure 3 AS N contains three Sub-ASs, 65500, 65501 and 65502.  No
   RR is used within any Sub-AS, and as such, all routers within each
   Sub-AS are fully meshed.  Ra and Rb are members of Sub-AS 65501.  Rc
   and Rd are members of Sub-AS 65500.  Re, Rf and Rg are members of
   Sub-AS 65502.  Ra and Rg have EBGP peerings with AS Y; router Rf has
   an EBGP peering with AS X.  The dotted lines are used to represent
   BGP peering sessions.

   The number following each '*' on the BGP peering sessions
   represents the additive IGP metrics that are to be associated with
   the BGP NEXT_HOP.  The number contained in parentheses on each AS N
   EBGP peering session represents the MED value advertised by the peer
   to be associated with the network reachability advertisement(s).

   Rc, Rd and Re are the primary routers involved in the churn, and as
   such, theirs will be the only BGP tables that we will monitor step
   by step.

   For the following steps 1 through 8 each router's best route will be
   marked with a '*'.

   1) Re receives the 'X' and 'Y1' paths.  Re selects 'Y1' because of
      IGP metric.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Re             X                  3
                  *     Y       1          2

      Re will advertise its new best path to Rd.

   2) The 'Y0' path was passed from Ra to Rb, and then from Rb
      to Rc.  Rd learns the 'Y1' path from Re.  Rc selects 'Y0',
      Rd selects 'Y1'.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         -------------------------------
         Rc       *     Y       0         50
         Rd       *     Y       1         42
         Re             X                  3
                  *     Y       1          2

   3) Rc and Rd advertise their best paths to each other;
      Rd selects 'Y0' because of MED.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Rc       *     Y       0         50
                        Y       1         44
         Rd       *     Y       0         52
                        Y       1         42
         Re             X                  3
                  *     Y       1          2

      Rd has a new best path so it will send an advertisement
      to Re and send a withdraw for 'Y1' to Rc.

   4) Re selects 'X': 'Y0' beats 'Y1' because of the MED, and
      'X' beats 'Y0' because of IGP metric.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Rc       *     Y       0         50
         Rd       *     Y       0         52
                        Y       1         42
         Re       *     X                  3
                        Y       0         92

   5) Rd selects 'X' because of IGP metric.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Rc       *     Y       0         50
         Rd             Y       0         52
                  *     X                 43
         Re       *     X                  3
                        Y       0         92
                        Y       1          2

      Rd has a new best path so it will send an UPDATE to Rc
      and an UPDATE/withdraw to Re for 'Y0'.

   6) Rc selects 'X' because of IGP metric.  Re selects 'Y1'
      because of IGP metric.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Rc             Y       0         50
                  *     X                 45
         Rd             Y       0         52
                  *     X                 43
         Re             X                  3
                  *     Y       1          2

   7) Rd selects 'Y1'.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Rc             Y       0         50
                  *     X                 45
         Rd       *     Y       1         42
         Re             X                  3
                  *     Y       1          2

   8) Rc selects 'Y0'.

                                       NEXT_HOP
         Router      AS_PATH   MED     IGP Cost
         ------------------------------
         Rc       *     Y       0         50
                        Y       1         44
         Rd       *     Y       1         42
         Re             X                  3
                  *     Y       1          2

   At this point we are back to Step 2 and are in a loop.

5.2. Potential Workarounds for Type II Churn

   1) Do not accept MEDs from peers (this may not be a feasible
      alternative).

   2) Utilize other BGP attributes higher in the decision process so
      that the BGP decision algorithm selects a single AS before it
      reaches the MED step.  For example, if local-pref were set based
      on the advertising AS, then all routes except those from a
      single AS would be eliminated first.  In the example, router Re
      would pick either X or Y based on local-pref and never change
      that selection.

      This leaves two simple workarounds for the two types of problems.

      Type I:  Make inter-cluster or inter-sub-AS link metrics higher
               than intra-cluster or intra-sub-AS metrics.

      Type II: Make route selections based on local-pref assigned to
               the advertising AS first, and then use IGP cost and MED
               to make the selection among routes from the same AS.

      Note that this requires per-prefix policies, as well as near
      intimate knowledge of other networks by the network operator.
      The authors are not aware of ANY [large] provider today that
      performs per-prefix policies on routes learned from peers.
      Implicitly removing this dynamic portion of route selection
      does not appear to be a viable option in today's networks.
      The main point is that a workaround is available: assigning
      local-pref such that no two ASs advertise a given prefix at the
      same local-pref solves Type II churn.

   3) Always compare BGP MEDs, regardless of whether or not they were
      obtained from a single AS.  This is probably a bad idea since
      MEDs may be derived in a number of ways, and are typically done
      so as a matter of operator-specific policy and largely a function
      of available metric space provided by the employed IGP.  As such,
      comparing MED values for a single prefix learned from multiple
      ASs is ill-advised.  This mostly defeats the purpose of MEDs;
      Option 1 may be a more viable alternative.

   4) Do not use more than one tier of Route Reflection or Sub-ASs
      in the network.  The risk of route oscillation should be
      considered when designing networks that might use a multi-tiered
      routing isolation architecture.

   5) In a RR topology, mesh the clients.  For confederations, mesh
      the border routers at each level in the hierarchy.  In
      Figure 3, for example, if Rb and Re are peers, then there's
      no churn.

   Future drafts will propose other solutions for Type II Churn.

6. Future Works

   It should be stated that protocol enhancements regarding this problem
   must be pursued.  Imposing network design requirements such as those
   outlined above is clearly an unreasonable long-term solution.
   Problems such as this should not occur under 'default'
   configurations.

7. Security Considerations

   This discussion introduces no new security concerns to BGP or other
   specifications referenced in this document.

8. Acknowledgments

   The authors would like to thank: Curtis Villamizar, Tim Griffin, John
   Scudder and Ron Da Silva.

9. References

   [1] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
       RFC 1771, March 1995.

   [2] Bates, T., Chandra, R., and E. Chen, "BGP Route Reflection - An
       Alternative to Full Mesh IBGP", RFC 2796, April 2000.

   [3] Traina, P., McPherson, D., and J. Scudder, "Autonomous System
       Confederations for BGP", RFC 1965bis, "Work In Progress",
       October 2000.

   [4] Cisco Systems, Inc., "Endless BGP Convergence Problem in Cisco
       IOS Software Releases", FN, October 10, 2000.

   [5] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
       Work in Progress (draft-ietf-idr-bgp4-12.txt), March 2001.

10. Authors' Addresses

   Danny McPherson
   Amber Networks, Inc.
   48664 Milmont Drive
   Fremont, CA 94538
   Email: danny@ambernetworks.com

   Vijay Gill
   Metromedia Fiber Network, Inc.
   8075 Leesburg Pike, STE 3
   Vienna, VA, 22182
   Email: vijay@umbc.edu

   Daniel Walton
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: dwalton@cisco.com

   Alvaro Retana
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: aretana@cisco.com