Internet Engineering Task Force                               Rohit Dube
Internet Draft                            Bell Labs, Lucent Technologies
Expiration Date: May 1999                                John G. Scudder
                                         Internet Engineering Group, LLC

                  Route Reflection Considered Harmful

               draft-dube-route-reflection-harmful-00.txt

1. Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as ``work in progress.''

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

2. Abstract

   Route reflection as defined by [2] is a popular way of reducing the
   full-mesh IBGP peering required by routers running the Border Gateway
   Protocol [1].  There are cases where a topology built using route
   reflectors produces persistent loops, or produces results different
   from those one would expect with a full IBGP mesh.  This document
   describes these problems.

3. Introduction

   By design, route reflectors are selective about which routes they
   forward (i.e., reflect) to their peers.  Specifically, if several
   routes to the same NLRI are available, a route reflector reflects
   only the route it has selected for its own use.  Typically this
   reduces the number of routes each peer in the AS must store in its
   RIB, as well as the volume of BGP update traffic.  By the very nature
   of route reflection, however, not every peer in the network has a
   full view of all the routes to a prefix to choose from.  This,
   coupled with the specifics of the BGP decision process, causes the
   problems described below.

4. Persistent Loops

   Consider the topology in Figure 1.

               +----------------------+
               | +------------+       |
               | |            |       |
        E1=====RR1=====R3=====R4=====RR2=====E2
          <--->         |             | <--->
                        +-------------+

                      Figure 1
                      --------

   RR1, RR2, R3 and R4 are BGP routers in the same AS.  E1 and E2 are
   BGP routers in some other AS, peering with RR1 and RR2 respectively
   via EBGP.
   RR1 is configured as a route reflector with R4 as a client, and RR2
   is configured as a reflector in a different cluster with R3 as a
   client.  The IBGP sessions are denoted in the diagram above by +---+
   and the EBGP sessions by <--->.  For simplicity, assume that all the
   physical links (denoted by ===) have the same IGP cost.

   Now, if both E1 and E2 advertise the same prefix to RR1 and RR2
   respectively, all other things being equal, RR1 picks the route
   through E1 for this prefix on account of lower IGP cost.  RR1 then
   reflects this route to R4, which now routes to the prefix in question
   through R3 and RR1.  Similarly, RR2 picks the route through E2 and
   reflects it to R3, which now routes to the prefix in question through
   R4 and RR2.  Clearly, a data packet for this prefix will loop between
   R3 and R4.

   Note that the problem would disappear if the topology were reverted
   to a full IBGP mesh: R3 would pick the route through RR1 and R4 would
   pick the route through RR2, both on account of lower IGP cost.

5. Incorrect Routing Decision

   Consider the topology in Figure 2.

            [RR1]------------------[RR2]
             /\                      |
            /  \                     |
           /    \                    |
         [R1]   [R2]                [R3]
          |      |                   |
          |      |                   |
          |      |                   |
         [E1]   [F1]                [E2]

                      Figure 2
                      --------

   RR1, RR2, R1, R2 and R3 are BGP routers in the same AS R.  RR1 is a
   route reflector with clients R1 and R2, and RR2 is a route reflector
   in a different cluster with client R3.  E1 and E2 are BGP routers in
   AS E and EBGP peer with R1 and R3 respectively.  F1 is a BGP router
   in AS F which EBGP peers with R2.  Assume that E1, E2 and F1 advertise
   the same prefix to R1, R3 and R2 respectively, in accordance with the
   following table:

        Router   AS   Router-id   MED
        -----------------------------
        E1       E    3.3.3.3      50
        F1       F    2.2.2.2       -
        E2       E    1.1.1.1     100

   All other attributes of the prefix in question are the same.
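   The choices made by RR1 and RR2 in this example can be reproduced
   with a small simulation.  The sketch below is a hypothetical model,
   not a BGP implementation: the names Route, best and reflect are
   illustrative, it assumes (as the next paragraph stipulates) that IGP
   costs are equal so that only MED and router-id matter, and it applies
   MED in one common way, by picking the lowest-MED route within each
   neighboring AS before comparing the per-AS winners on router-id.
   Treating a missing MED as 0 is also an assumption.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    peer: str                          # EBGP peer the route was learned from
    neighbor_as: str                   # AS of that peer
    router_id: ipaddress.IPv4Address   # router-id used as the final tie-breaker
    med: Optional[int]                 # None: no MED attribute


def route(peer: str, asn: str, rid: str, med: Optional[int]) -> Route:
    return Route(peer, asn, ipaddress.IPv4Address(rid), med)


# The three routes from the table above.
E1 = route("E1", "E", "3.3.3.3", 50)
F1 = route("F1", "F", "2.2.2.2", None)
E2 = route("E2", "E", "1.1.1.1", 100)


def best(routes: List[Route]) -> Route:
    """Best route when only MED and router-id differ (equal IGP costs assumed)."""
    # MED is comparable only between routes from the same neighboring AS, so
    # keep the lowest-MED route per AS first (assumption: missing MED = 0).
    groups = {}
    for r in routes:
        groups.setdefault(r.neighbor_as, []).append(r)
    finalists = [min(g, key=lambda r: (r.med or 0, r.router_id))
                 for g in groups.values()]
    # The per-AS winners are then compared on router-id alone.
    return min(finalists, key=lambda r: r.router_id)


def reflect(rr1_clients: List[Route], rr2_clients: List[Route],
            max_rounds: int = 10) -> Tuple[Route, Route]:
    """Exchange best routes between RR1 and RR2 until both choices settle.

    Each reflector sees its clients' routes plus the other reflector's best.
    """
    rr1, rr2 = best(rr1_clients), best(rr2_clients)
    for _ in range(max_rounds):
        new1 = best(rr1_clients + [rr2])
        new2 = best(rr2_clients + [new1])
        if (new1, new2) == (rr1, rr2):
            return rr1, rr2
        rr1, rr2 = new1, new2
    raise RuntimeError("reflector choices did not settle")


rr1_choice, rr2_choice = reflect([E1, F1], [E2])
print(rr1_choice.peer, rr2_choice.peer)  # F1 E2  (with route reflection)
print(best([E1, F1, E2]).peer)           # F1     (full mesh: all three visible)
```

   Note that hiding E1, a route that does not win even when all three
   routes are visible, changes RR2's answer from F1 to E2.  Under a total
   order on routes, removing a non-best route could never change the
   winner.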
   Further assume that RR1's IGP cost to R1 (and E1) is the same as its
   IGP cost to R2 (and F1), and that RR2's IGP cost to R3 (and E2) is
   the same as its IGP cost to R1 (and E1) and to R2 (and F1).  (The ---
   lines in Figure 2 denote both physical and BGP connectivity.)

   Now, RR1 chooses the route through F1 on account of its lower
   router-id compared to the route through E1 (which wins over the route
   through E2 on account of MEDs).  RR2, on the other hand, chooses the
   route through E2 on account of its lower router-id compared to the
   route through F1.  Note that RR1 sends only the route through F1 to
   RR2, and not the route through E1.

   If instead we had a full mesh, RR2 would see all three routes and
   pick the one through F1: the route through E1 wins over the route
   through E2 on MEDs, and the route through F1 wins over the route
   through E1 on account of lower router-id.

   A network operator moving from a topology without reflectors to the
   one above with reflectors would therefore have a problem: packets
   destined for the prefix in question would flow from RR2 through E2
   instead of through F1 as before.

6. Characterization

   Problem 1 (Section 4) has two ingredients: a) the selective nature of
   route reflectors, which prevents some routes from reaching some
   clients, and b) the fact that some steps of the BGP decision process,
   specifically the "prefer lowest IGP cost" rule, depend on the
   router's location in the network.  Thus the route reflector's
   decision can never perfectly mirror the decision its client would
   have made.  Note that b) implies that the reflector topology can be
   out of sync with the physical topology, but problems arise only when
   the two are so far out of sync that clients would make decisions (in
   this case based on IGP cost) different from those of their reflectors
   if reflection were replaced by a full mesh.
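   How ingredients a) and b) combine in the Figure 1 example can be
   shown with a small hop-by-hop forwarding simulation.  The sketch
   below is hypothetical: it models only the four routers inside the AS,
   assumes every link has IGP cost 1 (as Section 4 does), and lets each
   router choose, by lowest IGP cost, among the exit points whose routes
   it has learned.  RR1 and RR2 stand in for the EBGP routes via E1 and
   E2; the function names are illustrative.

```python
from typing import Dict, List, Tuple

# Internal routers of Figure 1 in physical order; every link costs 1.
CHAIN = ["RR1", "R3", "R4", "RR2"]
POS = {name: i for i, name in enumerate(CHAIN)}


def igp_cost(a: str, b: str) -> int:
    """IGP distance along the chain (equal link costs assumed)."""
    return abs(POS[a] - POS[b])


def next_hop(node: str, target: str) -> str:
    """One IGP hop from node toward target along the chain."""
    step = 1 if POS[target] > POS[node] else -1
    return CHAIN[POS[node] + step]


def choose_exits(learned: Dict[str, List[str]]) -> Dict[str, str]:
    """Each router picks the border router (RR1 -> E1, RR2 -> E2) with the
    lowest IGP cost among the routes it has learned."""
    return {r: min(exits, key=lambda b: igp_cost(r, b))
            for r, exits in learned.items()}


def trace(start: str, exit_of: Dict[str, str]) -> Tuple[List[str], bool]:
    """Forward a packet hop by hop; return (path, looped)."""
    path, seen, node = [start], {start}, start
    while exit_of[node] != node:          # a border router hands off via EBGP
        node = next_hop(node, exit_of[node])
        path.append(node)
        if node in seen:                  # forwarding depends only on the node,
            return path, True             # so a revisit means a permanent loop
        seen.add(node)
    return path, False


# With reflection: R4 only hears RR1's best route, R3 only hears RR2's.
with_rr = choose_exits({"RR1": ["RR1", "RR2"], "RR2": ["RR2", "RR1"],
                        "R3": ["RR2"], "R4": ["RR1"]})
# With a full IBGP mesh every router learns both routes.
full_mesh = choose_exits({r: ["RR1", "RR2"] for r in CHAIN})

print(trace("R3", with_rr))     # (['R3', 'R4', 'R3'], True)
print(trace("R3", full_mesh))   # (['R3', 'RR1'], False)
```

   Only IGP cost is modeled because the text assumes all other
   attributes are equal.  A real BGP speaker would also prefer
   EBGP-learned over IBGP-learned routes, which gives RR1 and RR2 the
   same choices here.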
   Problem 2 (Section 5) also has two components: a) the selective
   nature of route reflectors, as above, and b) the partial order that
   MEDs impose upon competing routes (MEDs can be compared only between
   routes from the same AS).  If all decision criteria used by BGP
   imposed a total order on the routes (i.e., all BGP routes for a
   prefix could be arranged in strict order of precedence), then b)
   would not be an issue and, despite a), this problem would not occur.

   For both examples discussed, several other topologies can be
   constructed that suffer from the same problems.

7. Avoidance Guidelines

   Since no protocol mechanisms are currently available to detect the
   problems described above, we provide guidelines for avoiding
   situations in which these problems could surface.

   As noted in Section 6, problem 1 happens because the IBGP reflector
   topology does not follow the physical topology.  A simple way of
   avoiding this problem is to ensure that reflector clusters are
   constrained to follow the physical connectivity between the routers.
   It is always safe (at least with respect to this problem) to deploy
   route reflection such that no IBGP session between a pair of route
   reflectors ever physically transits a reflector client.  One common
   mode of deployment is to fully mesh all the routers in a "backbone"
   region, and to do route reflection to/from/between the routers in a
   POP, using one or more of the backbone routers as the reflector(s).

   Problem 2 can be avoided by ensuring that reflectors are never forced
   to decide on the best BGP route based on MEDs.  This can be achieved
   either by setting the local preference of a route at the border
   router to reflect its MED value, or by configuring community-based
   policies that the reflector can use to decide on the best route.

8. Acknowledgments

   The first author would like to thank Harry Mantakos, James Da Silva
   and Arvind Srivaths (all at Torrent Networking Technologies Corp.),
   Rob Coltun (Fore Systems) and Tony Przygienda (Bell Labs, Lucent
   Technologies) for discussions on this topic.  The second author would
   like to thank Ravi Chandra and Tony Bates (both at Cisco Systems) for
   similar discussions.

9. References

   [1] Rekhter, Y., and Li, T., "A Border Gateway Protocol 4 (BGP-4)",
       RFC 1771, March 1995.

   [2] Bates, T., and Chandra, R., "BGP Route Reflection An alternative
       to full mesh IBGP", RFC 1966, June 1996.

10. Author Information

   Rohit Dube
   Bell Labs, Lucent Technologies Inc.
   4C-508, 101 Crawfords Corner Road
   Holmdel, NJ 07724
   e-mail: rohitd@dnrc.bell-labs.com

   John G. Scudder
   Internet Engineering Group, LLC
   122 S. Main, Suite 280
   Ann Arbor, MI 48104
   e-mail: jgs@ieng.com