idnits 2.17.00 (12 Aug 2021) /tmp/idnits26257/draft-thaler-multipath-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Abstract section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (7 February 2000) is 8132 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 2178 (ref. '1') (Obsoleted by RFC 2328) -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '3' -- Possible downref: Non-RFC (?) normative reference: ref. '4' ** Obsolete normative reference: RFC 2362 (ref. '5') (Obsoleted by RFC 4601, RFC 5059) ** Obsolete normative reference: RFC 2581 (ref. '6') (Obsoleted by RFC 5681) Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force D. Thaler 2 INTERNET-DRAFT Microsoft 3 Expires July 2000 C. Hopps 4 Merit Network 5 7 February 2000 7 Multipath Issues in Unicast and Multicast Next-Hop Selection 8 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with all 13 provisions of Section 10 of RFC2026. 15 This document is an Internet Draft. Internet Drafts are working 16 documents of the Internet Engineering Task Force (IETF), its Areas, and 17 its Working Groups. Note that other groups may also distribute working 18 documents as Internet Drafts. 20 Internet Drafts are valid for a maximum of six months and may be 21 updated, replaced, or obsoleted by other documents at any time. It is 22 inappropriate to use Internet Drafts as reference material or to cite 23 them other than as a "work in progress". 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 Copyright Notice 33 Copyright (C) The Internet Society (2000). All Rights Reserved. 35 1. Introduction 37 Various routing protocols, including OSPF [1] and ISIS, explicitly allow 38 "Equal-Cost Multipath" routing. Some router implementations also allow 39 equal-cost multipath usage with RIP and other routing protocols. Using 40 equal-cost multipath means that if multiple equal-cost routes to the 41 Draft Multipath Issues February 2000 43 same destination exist, they can be discovered and used to provide load 44 balancing among redundant paths. 46 The effect of multipath routing on a forwarder is that the forwarder 47 potentially has several next-hops for any given destination and must use 48 some method to choose which next-hop should be used for a given data 49 packet. This memo summarizes current practices, problems, and 50 solutions. 52 2. Concerns 54 Several router implementations allow multipath forwarding. This is 55 sometimes done naively via round-robin, where each packet matching a 56 given destination route is forwarded using the subsequent next-hop, in a 57 round-robin fashion. This does provide a form of load balancing, but 58 there are several problems with approaches such as round-robin or 59 random: 61 Variable Path MTU 62 Since each of the redundant paths may have a different MTU, this 63 means that the overall path MTU can change on a packet-by-packet 64 basis, negating the usefulness of path MTU discovery. 66 Variable Latencies 67 Since each of the redundant paths may have a different latency 68 involved, having packets take separate paths can cause packets to 69 always arrive out of order, increasing delivery latency and 70 buffering requirements. 72 Packet reordering causes TCP to believe that loss has taken place 73 when packets with higher sequence numbers arrive before an earlier 74 one. When three or more packets are received before a "late" 75 packet, TCP enters a mode called "fast-retransmit" [6] which 76 consumes extra bandwidth (which could potentially cause more loss, 77 decreasing throughput) as it attempts to unnecessarily retransmit 78 the delayed packet(s). Hence, reordering can be detrimental to 79 network performance. 81 Debugging 82 Common debugging utilities such as ping and traceroute are much 83 less reliable in the presence of multiple paths and may even 85 Draft Multipath Issues February 2000 87 present completely wrong results. 89 In multicast routing, the problem with multiple paths is that multicast 90 routing protocols prevent loops and duplicates by constructing a single 91 tree to all receivers of the same group address. Multicast routing 92 protocols deployed today (DVMRP, PIM-DM, PIM-SM) [2] construct shortest- 93 path trees rooted at either the source, or another router known as a 94 Core or Rendezvous Point. Hence, the way they ensure that duplicates 95 will not arise is that a given tree must use only a single next-hop 96 towards the root of the tree. 98 3. Requirements 100 In the remainder of this document, we will use the term "flow" to 101 represent the granularity at which the router keeps state (if at all) 102 for classes of traffic. The exact definition of a flow may depend on 103 the actual implementation. For example, a flow might be identified 104 solely by destination address, or it might be identified by (source 105 address, destination address, protocol id) triplet. Hence "flow" is not 106 necessarily synonymous with the term "microflow" as used in RFC 2474 107 [7], which also includes port numbers. Indeed, including transport- 108 layer information in the next-hop selection process can actually be 109 problematic. For example, if packets are fragmented, the transport- 110 layer information may not be available in every packet. Furthermore, 111 having the choice of path depend on transport-layer fields may negate 112 the benefit of caching information such as MTU for use in subsequent 113 connections between the same endpoints. 115 All of the problems outlined in the previous section arise when packets 116 in the same unicast or multicast "flow" are split among multiple paths. 117 The natural solution is therefore to ensure that packets for the same 118 flow always use the same path. 120 Two additional features are desirable: 122 Minimal disruption 123 When multipath is used, meaning that multiple routes contribute 124 valid next-hops, the chances are higher of routes being added and 125 deleted from consideration than when only the "best" route is used 126 (in which case metric changes in alternate routes have no effect on 127 traffic paths). Since a higher number of routes may actually be 128 used for forwarding when multipath is in use, the potential for 129 packet reordering and packet loss due to route flaps can be much 131 Draft Multipath Issues February 2000 133 greater than when not using multipath. Hence, it is desirable to 134 minimize the number of active flows affected by the addition or 135 deletion of another next-hop. 137 Fast implementation 138 The amount of additional computation required to forward a packet 139 should be small. For example, when doing round-robin, this 140 computation might consist of incrementing (modulo the number of 141 next-hops) a next-hop index. 143 4. Solutions 145 We now provide three possible methods for improving the performance of 146 multipath and then discuss their applicability to unicast and multicast 147 forwarding. 149 Modulo-N Hash 150 To select a next-hop from the list of N next-hops, the router 151 performs a modulo-N hash over the packet header fields that 152 identify a flow. This has the advantage of being fast, at the 153 expense of (N-1)/N of all flows changing paths whenever a next-hop 154 is added or removed. 156 Hash-Threshold 157 The router first selects a key by performing a hash over the packet 158 header fields that identify the flow. The N next-hops have been 159 assigned unique regions in the hash function's output space. By 160 comparing the hash value against region boundaries the router can 161 determine which region the hash value belongs to and thus which 162 next-hop to use. This method has the advantage of only affecting 163 flows near the region boundaries (or thresholds) when next-hops are 164 added or removed. For ECMP hash-threshold's lookup can be done 165 with a simple division (hash_value / fixed_region_size). When a 166 next-hop is added or removed, between 1/4 and 1/2 of all flows 167 change paths. An analysis of this method can be found in [3]. 169 Highest Random Weight (HRW) 170 The router computes a key for EACH next-hop by performing a hash 171 over the packet header fields that identify the flow, as well as 172 over the address of the next-hop. The router then chooses the 174 Draft Multipath Issues February 2000 176 next-hop with the highest resulting key value [4]. This has the 177 advantage of minimizing the number of flows affected by a next-hop 178 addition or deletion (only 1/N of them), but is approximately N 179 times as expensive as a modulo-N hash. 181 The applicability of these three alternatives depends on (at least) two 182 factors: whether the forwarder maintains per-flow state, and how 183 precious CPU is to a multipath forwarder. 185 Some routers may maintain per-flow state for reasons other than for 186 supporting multipath. For example, routers typically keep per-flow 187 state for multicast flows so that they can maintain the list of 188 interfaces to which packets in the flow should be copied. 190 If per-flow state is maintained in a multipath forwarder, then 191 computation of the next-hop can be done by the router at state creation 192 time. This entails no additional computations at packet forwarding time 193 compared with normal forwarding to a single next-hop, since the next-hop 194 is precomputed. In this case, any method can be used, including round- 195 robin, random, modulo-N, hash-threshold or HRW. Hash functions such as 196 modulo-N, hash-threshold and HRW are better if the forwarder state may 197 be deleted for any reason during the lifetime of a flow since subsequent 198 next-hop computations by the router will always select the same path. 199 This also improves the usefulness of debugging utilities such as 200 traceroute. Finally, to maximize the stability of paths (and hence the 201 usefulness of traceroute, etc.), the use of HRW is recommended over the 202 other methods mentioned herein. 204 If per-flow state is not maintained by the forwarder, then using 205 multiple next-hops requires that the next-hop be calculated at packet 206 arrival time. When CPU is more precious than stability of flow paths, 207 hash-threshold is recommended over the other methods mentioned herein. 209 4.1. Unicast Forwarding 211 Depending on the implementation, unicast forwarding may or may not keep 212 per-flow state. We recommend that where forwarder implementations keep 213 flow state, routers should use HRW at state creation time (and next-hop 214 deletion time) to select the next-hop, and that forwarders without per- 215 flow state use hash-threshold. 217 Draft Multipath Issues February 2000 219 4.2. Multicast Forwarding 221 Today's multicast forwarding engines use a cache of forwarding entries 222 indexed by group (or group prefix) and source (or source prefix). This 223 means that today's multicast forwarder's always keep per-flow state, 224 although for some multicast routing protocols, the "flow" may be fairly 225 coarse (e.g., traffic from all sources to the same destination). Since 226 per-flow state is kept by the forwarder, it is recommended that the 227 router always use HRW to select the next-hop. 229 Routers using explicit-joining protocols such as PIM-SM [5] should thus 230 use the multipath information when determining to which neighbor a join 231 message should be sent. For example, when multiple next-hops exist for 232 a given Rendezvous Point (RP) toward which a (*,G) Join should be sent, 233 it is recommended that HRW be used to select the next-hop to use for 234 each group. 236 5. Applicability 238 The algorithms discussed above (except round-robin) all rely on some 239 form of hash function. Equal flow distribution is achieved when the 240 hash function is uniformly distributed. Since the commonly used hash 241 functions only become uniformly distributed when the number of inputs is 242 relatively large, these algorithms are more applicable to routers used 243 to route many flows, than in, for example, a small business setting. 245 6. Redundant Parallel Links 247 A related problem occurs when multiple parallel links are used between 248 the same pair of routers. A common solution is to bundle the two links 249 together into a "super"-link when is then used for routing. For 250 multicast forwarding, this results in the two links being reduced to a 251 single next-hop (over the combined link) which can be used to prevent 252 duplicates. When a unicast or multicast packet is queued to the 253 combined link, some method, such as those discussed earlier, is still 254 required to determine the physical link on which to transmit the packet. 255 If the parallel links are identical, then most of the concerns discussed 256 in this document are avoided with the combined link. The exception is 257 packet reordering, which can still occur with round-robin, adversely 258 affecting TCP. 260 Draft Multipath Issues February 2000 262 7. Security Considerations 264 This document discusses issues with various methods of choosing a next- 265 hop from among multiple valid next-hops. As such, it does not directly 266 impact the security of the Internet infrastructure or its applications. 268 One issue that is worth mentioning, however, is that when next-hop 269 selection is predictable, an attacker can synthesize traffic that will 270 all hash the same, making it possible to launch a denial-of-service 271 attack that overloads a particular path. Since a special case of this 272 is when the same (single) next-hop is always selected, such an attack is 273 easiest when multipath is not being used. Introducing multipath routing 274 can make such an attack more difficult; the more unpredictable the hash 275 is, the harder it becomes to conduct a denial-of-service attack against 276 any single link. 278 8. References 280 [1] Moy, J., "OSPF Version 2", RFC 2178, July 1997. 282 [2] Maufer, T., "Deploying IP Multicast in the Enterprise", Prentice- 283 Hall, 1998. 285 [3] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",, Work 286 in progress, January 2000. 288 [4] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to 289 Increase Hit Rates", IEEE/ACM Transactions on Networking, February 290 1998. 292 [5] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., 293 Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei, 294 "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol 295 Specification", RFC 2362, June 1998. 297 [6] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion Control", 298 RFC 2581, April 1999. 300 [7] Nichols, K., Blake, S., Baker, F., and D. Black., "Definition of 301 the Differentiated Services Field (DS Field) in the IPv4 and IPv6 302 Headers", RFC 2474, December 1998. 304 Draft Multipath Issues February 2000 306 9. Authors' Addresses 308 Dave Thaler 309 Microsoft 310 One Microsoft Way 311 Redmond, WA 98052 312 Phone: +1 425 703 8835 313 EMail: dthaler@dthaler.microsoft.com 315 Christian E. Hopps 316 Merit Network 317 4251 Plymouth Road, Suite C. 318 Ann Arbor, MI 48105 319 Phone: +1 734 936 0291 320 EMail: chopps@merit.edu 322 10. Full Copyright Statement 324 Copyright (C) The Internet Society (2000). All Rights Reserved. 326 This document and translations of it may be copied and furnished to 327 others, and derivative works that comment on or otherwise explain it or 328 assist in its implementation may be prepared, copied, published and 329 distributed, in whole or in part, without restriction of any kind, 330 provided that the above copyright notice and this paragraph are included 331 on all such copies and derivative works. However, this document itself 332 may not be modified in any way, such as by removing the copyright notice 333 or references to the Internet Society or other Internet organizations, 334 except as needed for the purpose of developing Internet standards in 335 which case the procedures for copyrights defined in the Internet 336 languages other than English. 338 The limited permissions granted above are perpetual and will not be 339 revoked by the Internet Society or its successors or assigns. 341 This document and the information contained herein is provided on an "AS 342 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK 343 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT 344 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT 345 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR 346 Draft Multipath Issues February 2000 348 FITNESS FOR A PARTICULAR PURPOSE.