idnits 2.17.00 (12 Aug 2021) /tmp/idnits32593/draft-hopps-ecmp-algo-analysis-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 363 has weird spacing: '...for the purpo...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (7 February 2000) is 8132 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' ** Obsolete normative reference: RFC 2362 (ref. '3') (Obsoleted by RFC 4601, RFC 5059) Summary: 6 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force Christian E. Hopps 2 INTERNET-DRAFT Merit Network 3 Expires July 2000 7 February 2000 5 Analysis of an Equal-Cost Multi-Path Algorithm 6 8 Status of this Memo 10 This document is an Internet-Draft and is in full conformance with 11 all provisions of Section 10 of RFC2026. 13 Internet-Drafts are working documents of the Internet Engineering 14 Task Force (IETF), its areas, and its working groups. Note that 15 other groups may also distribute working documents as Internet- 16 Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six months 19 and may be updated, replaced, or obsoleted by other documents at any 20 time. It is inappropriate to use Internet- Drafts as reference 21 material or to cite them other than as "work in progress." 23 The list of current Internet-Drafts can be accessed at 24 http://www.ietf.org/ietf/1id-abstracts.txt 26 The list of Internet-Draft Shadow Directories can be accessed at 27 http://www.ietf.org/shadow.html. 29 Copyright Notice 31 Copyright (C) The Internet Society (2000). All Rights Reserved. 33 Draft Analysis of an ECMP Algorithm February 2000 35 Abstract 37 Equal-cost multi-path (ECMP) is a routing technique for routing 38 packets along multiple paths of equal cost. The forwarding engine 39 identifies paths by next-hop. When forwarding a packet the router 40 must decide which next-hop (path) to use. This document gives an 41 analysis of one method for making that decision. The analysis 42 includes the performance of the algorithm and the disruption caused 43 by changes to the set of next-hops. 45 1. Hash-Threshold 47 One method for determining which next-hop to use when routing with 48 ECMP can be called hash-threshold. The router first selects a key by 49 performing a hash (e.g., CRC16) over the packet header fields that 50 identify a flow. The N next-hops have been assigned unique regions in 51 the key space. The router uses the key to determine which region and 52 thus which next-hop to use. 54 As an example of hash-threshold, upon receiving a packet the router 55 performs a CRC16 on the packet's header fields that define the flow 56 (e.g., the source and destination fields of the packet), this is the 57 key. Say for this destination there are 4 next-hops to choose from. 58 Each next-hop is assigned a region in 16 bit space (the key space). 59 For equal usage the router may have chosen to divide it up evenly so 60 each region is 65536/4 or 16k large. The next-hop is chosen by 61 determining which region contains the key (i.e., the CRC result). 63 2. Analysis 65 There are a few concerns when choosing an algorithm for deciding 66 which next-hop to use. One is performance, the computational 67 requirements to run the algorithm. Another is disruption (i.e., the 68 changing of which path a flow uses). Balancing is a third concern; 69 however, since the algorithm's balancing characteristics are directly 70 related to the chosen hash function this analysis does not treat this 71 concern in depth. 73 For this analysis we will assume regions of equal size. If the 74 output of the hash function is uniformly distributed the distribution 75 of flows amongst paths will also be uniform, and so the algorithm 76 will properly implement ECMP. One can implement non-equal-cost 77 multi-path routing by using regions of unequal size; however, non- 79 Draft Analysis of an ECMP Algorithm February 2000 81 equal-cost multi-path routing is outside the scope of this document. 83 2.1. Performance 85 The performance of the hash-threshold algorithm can be broken down 86 into three parts: selection of regions for the next-hops, obtaining 87 the key and comparing the key to the regions to decide which next-hop 88 to use. 90 The algorithm doesn't specify the hash function used to obtain the 91 key. Its performance in this area will be exactly the performance of 92 the hash function. It is presumed that if this calculation proves to 93 be a concern it can be done in hardware parallel to other operations 94 that need to complete before deciding which next-hop to use. 96 Since regions are restricted to be of equal size the calculation of 97 region boundaries is trivial. Each boundary is exactly regionsize 98 away from the previous boundary starting from 0 for the first region. 99 As we will show, for equal sized regions, we don't need to store the 100 boundary values. 102 To choose the next-hop we must determine which region contains the 103 key. Because the regions are of equal size determining which region 104 contains the key is a simple division operation. 106 regionsize = keyspace.size / #{nexthops} 107 region = key / regionsize; 109 Thus the time required to find the next-hop is dependent on the way 110 the next-hops are organized in memory. The obvious use of an array 111 indexed by region yields O(1). 113 2.2. Disruption 115 Protocols such as TCP perform better if the path they flow along does 116 not change while the stream is connected. Disruption is the 117 measurement of how many flows have their paths changed due to some 118 change in the router. We measure disruption as the fraction of total 119 flows whose path changes in response to some change in the router. 120 This can become important if one or more of the paths is flapping. 121 For a description of disruption and how it affects protocols such as 123 Draft Analysis of an ECMP Algorithm February 2000 125 TCP see [1]. 127 Some algorithms such as round-robin (i.e., upon receiving a packet 128 the least recently used next-hop is chosen) are disruptive regardless 129 of any change in the router. Clearly this is not the case with hash- 130 threshold. As long as the region boundaries remain unchanged the 131 same next-hop will be chosen for a given flow. 133 Because we have required regions to be equal in size the only reason 134 for a change in region boundaries is the addition or removal of a 135 next-hop. In this case the regions must all grow or shrink to fill 136 the key space. The analysis begins with some examples of this. 138 0123456701234567012345670123456701234567 139 +-------+-------+-------+-------+-------+ 140 | 1 | 2 | 3 | 4 | 5 | 141 +-------+-+-----+---+---+-----+-+-------+ 142 | 1 | 2 | 4 | 5 | 143 +---------+---------+---------+---------+ 144 0123456789012345678901234567890123456789 146 Figure 1. Before and after deletion of region 3 148 In figure 1. region 3 has been deleted. The remaining regions grow 149 equally and shift to compensate. In this case 1/4 of region 2 is now 150 in region 1, 1/2 (2/4) of region 3 is in region 2, 1/2 of region 3 is 151 in region 4 and 1/4 of region 4 is in region 5. Since each of the 152 original regions represent 1/5 of the flows, the total disruption is 153 1/5*(1/4 + 1/2 + 1/2 + 1/4) or 3/10. 155 Note that the disruption to flows when adding a region is equivalent 156 to that of removing a region. That is, we are considering the 157 fraction of total flows that changes regions when moving from N to 158 N-1 regions, and that same fraction of flows will change when moving 159 from N-1 to N regions. 161 Draft Analysis of an ECMP Algorithm February 2000 163 0123456701234567012345670123456701234567 164 +-------+-------+-------+-------+-------+ 165 | 1 | 2 | 3 | 4 | 5 | 166 +-------+-+-----+---+---+-----+-+-------+ 167 | 1 | 2 | 3 | 5 | 168 +---------+---------+---------+---------+ 169 0123456789012345678901234567890123456789 171 Figure 2. Before and after deletion of region 4 173 In figure 2. region 4 has been deleted. Again the remaining regions 174 grow equally and shift to compensate. 1/4 of region 2 is now in 175 region 1, 1/2 of region 3 is in region 2, 3/4 of region 4 is in 176 region 3 and 1/4 of region 4 is in region 5. Since each of the 177 original regions represent 1/5 of the flows the, total disruption is 178 7/20. 180 To generalize, upon removing a region K the remaining N-1 regions 181 grow to fill the 1/N space. This growth is evenly divided between 182 the N-1 regions and so the change in size for each region is 183 1/N/(N-1) or 1/(N(N-1)). This change in size causes non-end regions 184 to move. The first region grows and so the second region is shifted 185 towards K by the change in size of the first region. 1/(N(N-1)) of 186 the flows from region 2 are subsumed by the change in region 1's 187 size. 2/(N(N-1)) of the flows in region 3 are subsumed by region 2. 188 This is because region 2 has shifted by 1/(N(N-1)) and grown by 189 1/(N(N-1)). This continues from both ends until you reach the 190 regions that bordered K. The calculation for the number of flows 191 subsumed from the Kth region into the bordering regions accounts for 192 the removal of the Kth region. Thus we have the following equation. 194 K-1 N 195 --- i --- (i-K) 196 disruption = \ --- + \ --- 197 / (N)(N-1) / (N)(N-1) 198 --- --- 199 i=1 i=K+1 201 We can factor 1/((N)(N-1)) out as it is constant. 203 Draft Analysis of an ECMP Algorithm February 2000 205 / K-1 N \ 206 1 | --- --- | 207 = --- | \ i + \ (i-K) | 208 (N)(N-1) | / / | 209 \ --- --- / 210 1 i=K+1 212 We now use the the concrete formulas for the sum of integers. The 213 first summation is (K)(K-1)/2. For the second summation notice that 214 we are summing the integers from 1 to N-K, thus it is (N-K)(N-K+1)/2. 216 (K-1)(K) + (N-K)(N-K+1) 217 = ----------------------- 218 2(N)(N-1) 220 Considering the summations, one can see that the least disruption is 221 when K is as close to half way between 1 and N as possible. This can 222 be proven by finding the minimum of the concrete formula for K 223 holding N constant. First break apart the quantities and collect. 225 2K*K - 2K - 2NK + N*N + N 226 = ------------------------- 227 2(N)(N-1) 229 K*K - K - NK N + 1 230 = -------------- + ------- 231 (N)(N-1) 2(N-1) 233 Since we are minimizing for K the right side (N+1)/2(N-1) is constant 234 as is the denominator (N)(N-1) so we can drop them. To minimize we 235 take the derivative. 237 Draft Analysis of an ECMP Algorithm February 2000 239 d 240 -- (K*K - (N+1)K) 241 dk 243 = 2K - (N+1) 245 Which is zero when K is (N+1)/2. 247 The last thing to consider is that K must be an integer. When N is 248 odd (N+1)/2 will yield an integer, however when N is even (N+1)/2 249 yields an integer + 1/2. In the case, because of symmetry, we get 250 the least disruption when K is N/2 or N/2 + 1. 252 Now since the formula is quadratic with a global minimum half way 253 between 1 and N the maximum possible disruption must occur when edge 254 regions (1 and N) are removed. If K is 1 or N the formula reduces to 255 1/2. 257 The minimum possible disruption is obtained by letting K=(N+1)/2. In 258 this case the formula reduces to 1/4 + 1/(4*N). So the range of 259 possible disruption is (1/4, 1/2]. 261 To minimize disruption we recommend adding new regions to the center 262 rather than the ends. 264 3. Comparison to other algorithms 266 Other algorithms exist to decide which next-hop to use. These 267 algorithms all have different performance and disruptive 268 characteristics. Of these algorithms we will only consider ones that 269 are not disruptive by design (i.e., if no change to the set of next- 270 hops occurs the path a flow takes remains the same). This will 271 exclude round-robin and random choice. We will look at modulo-N and 272 highest random weight. 274 Modulo-N is a "simpler" form of hash-threshold. Given N next-hops 275 the packet header fields which describe the flow are run through a 276 hash function. A final modulo-N is applied to the output of the hash. 277 This result then directly maps to one of the next-hops. Modulo-N is 278 the most disruptive of the algorithms; if a next-hop is added or 279 removed the disruption is (N-1)/N. The performance of Modulo-N is 281 Draft Analysis of an ECMP Algorithm February 2000 283 equivalent to hash-threshold. 285 Highest random weight (HRW) is a comparative method similar in some 286 ways to hash-threshold with non-fixed sized regions. For each next- 287 hop, the router seeds a pseudo-random number generator with the 288 packet header fields which describe the flow and the next-hop to 289 obtain a weight. The next-hop which receives the highest weight is 290 selected. The advantage with using HRW is that it has minimal 291 disruption (i.e., disruption due to adding or removing a next-hop is 292 always 1/N.) The disadvantage with HRW is that the next-hop 293 selection is more expensive than hash-threshold. A description of 294 HRW along with comparisons to other methods can be found in [2]. 295 Although not used for next-hop calculation an example usage of HRW 296 can be found in [3]. 298 Since each of modulo-N, hash-threshold and HRW require a hash on the 299 packet header fields which define a flow, we can factor the 300 performance of the hash out of the comparison. If the hash can not 301 be done inexpensively (e.g., in hardware) it too must be considered 302 when using any of the above methods. 304 The lookup performance for hash-threshold, like modulo-N is an 305 optimal O(1). HRW's lookup performance is O(N). 307 Disruptive behavior is the opposite of performance. HRW is best with 308 1/N. Hash-threshold is between 1/4 and 1/2. Finally Modulo-N is 309 (N-1)/N. 311 If the complexity of HRW's next-hop selection process is acceptable 312 we think it should be considered as an alternative to hash-threshold. 313 This could be the case when, for example, per-flow state is kept and 314 thus the next-hop choice is made infrequently. 316 However, when HRW's next-hop selection is seen as too expensive the 317 obvious choice is hash-threshold as it performs as well as modulo-N 318 and is less disruptive. 320 4. Security Considerations 322 This document is an analysis of an algorithm used to implement an 323 ECMP routing decision. This analysis does not directly affect the 324 security of the Internet Infrastructure. 326 Draft Analysis of an ECMP Algorithm February 2000 328 5. References 330 [1] Thaler, D., and Hopps, C., "Multipath Issues in Unicast and 331 Multicast", Work in progress, June 1999. 333 [2] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to 334 Increase Hit Rates", IEEE/ACM Transactions on Networking, February 335 1998. 337 [3] Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., 338 Handley, M., Jacobson, V., Liu, C., Sharma, P., and L. Wei, 339 "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol 340 Specification", RFC 2362, June 1998. 342 6. Author's Address 344 Christian E. Hopps 345 Merit Network 346 4251 Plymouth Road, Suite C. 347 Ann Arbor, MI 48105 348 Phone: +1 734 936 0291 349 EMail: chopps@merit.edu 351 7. Full Copyright Statement 353 Copyright (C) The Internet Society 2000. All Rights Reserved. 355 This document and translations of it may be copied and furnished to 356 others, and derivative works that comment on or otherwise explain it 357 or assist in its implementation may be prepared, copied, published 358 and distributed, in whole or in part, without restriction of any 359 kind, provided that the above copyright notice and this paragraph are 360 included on all such copies and derivative works. However, this 361 document itself may not be modified in any way, such as by removing 362 the copyright notice or references to the Internet Society or other 363 Internet organizations, except as needed for the purpose of 364 developing Internet standards in which case the procedures for 365 copyrights defined in the Internet Standards process must be 366 followed, or as required to translate it into languages other than 367 English. 369 The limited permissions granted above are perpetual and will not be 370 revoked by the Internet Society or its successors or assigns. 372 Draft Analysis of an ECMP Algorithm February 2000 374 This document and the information contained herein is provided on an 375 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 376 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 377 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 378 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 379 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.