BESS Working Group                                            Yisong Liu
Internet Draft                                              China Mobile
Intended status: Standards Track                              M. McBride
Expires: August 21, 2021                                       Futurewei
                                                                Z. Zhang
                                                                     ZTE
                                                                  J. Xie
                                                                  Huawei
                                                            Feb 21, 2021

     Multicast DF Election for EVPN based on bandwidth or quantity
          draft-liu-bess-evpn-mcast-bw-quantity-df-election-03

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 21, 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Abstract

   Ethernet Virtual Private Network (EVPN, RFC 7432) is becoming
   prevalent in Data Centers, Data Center Interconnect (DCI) and
   Service Provider VPN applications. When a CE is multi-homed to
   multiple PEs in all-active redundancy mode, with the links in an
   EVPN instance on a given Ethernet Segment, RFC 7432 describes a
   basic mechanism to elect a Designated Forwarder (DF), and RFC 8584
   improves on the basic DF election with the Highest Random Weight
   (HRW) algorithm. The per-multicast-flow DF election draft
   (draft-ietf-bess-evpn-per-mcast-flow-df-election) enhances the HRW
   algorithm so that DF election for multicast flows is performed at
   the granularity of (ESI, VLAN, Mcast flow). This document specifies
   a new algorithm, based on multicast bandwidth utilization and
   multicast state quantity, for electing the DF of multicast flows.

Table of Contents

   1. Introduction
      1.1. Requirements Language
      1.2. Terminology
   2. Solution
      2.1. DF Election Based on Bandwidth
      2.2. DF Election Based on State Quantity
      2.3. Inconsistent Timing between Multi-homed PEs
      2.4. Increase or Decrease of Multi-homed PEs
         2.4.1. Decrease of Multi-homed PEs
         2.4.2. Increase of Multi-homed PEs
   3. BGP Encoding
      3.1. DF Election Extended Community
      3.2. Multicast DF Extended Community
   4. Security Considerations
   5. IANA Considerations
   6. References
      6.1. Normative References
      6.2. Informative References
   7. Acknowledgments
   Authors' Addresses

1. Introduction

   Ethernet Virtual Private Network (EVPN) [RFC7432] solutions are
   becoming prevalent in Data Centers, Data Center Interconnect (DCI)
   and Service Provider VPN applications. When a CE is multi-homed to
   multiple PEs in all-active redundancy mode, with the links in an
   EVPN instance on a given Ethernet Segment (ES), [RFC7432] defines
   the role of Designated Forwarder (DF) as the node that is
   responsible for forwarding multicast flows.

   [RFC7432] specifies the basic method of DF election. The PEs
   attached to the same ES are sorted in ascending order of their EVPN
   peer IP addresses to form an ordered PE list. For a given VLAN, the
   VLAN value modulo the number of PEs is computed, and the PE whose
   position in the list equals the result is elected as the primary DF
   of that VLAN. The other PEs are elected as standby.

   [RFC8584] defines the DF Election Extended Community, which allows
   different DF election algorithms to be used and lets the PEs in a
   redundancy group reach a consensus on which DF election procedure
   to use. A PE notifies the other participating PEs in a redundancy
   group of its DF election algorithm by signaling a DF Election
   Extended Community along with the ES route. [RFC8584] also improves
   on the basic DF election with the HRW algorithm.
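   As an illustration only, the default election of [RFC7432]
   summarized above (sort the PEs by IP address, then index the list
   with the VLAN value modulo the number of PEs) can be sketched in
   Python. The function name default_df and the sample addresses are
   hypothetical, list positions start at 0, and all addresses are
   assumed to belong to one address family.

```python
import ipaddress


def default_df(pe_addresses, vlan):
    """Return the DF for `vlan` under the modulo rule sketched above.

    pe_addresses: EVPN peer IP addresses of the PEs on one ES
                  (hypothetical input; one address family assumed).
    vlan:         VLAN value used for the modulo operation.
    """
    # Sort numerically so that "192.0.2.10" sorts after "192.0.2.9".
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # The PE whose position equals (vlan mod N) is the DF.
    return ordered[vlan % len(ordered)]


pes = ["192.0.2.3", "192.0.2.1", "192.0.2.2"]
df_for_vlan_10 = default_df(pes, 10)
```

   With three PEs, VLAN 10 maps to position 10 mod 3 = 1, so the PE
   with the second-lowest address is elected. Link capacity and flow
   bandwidth play no part in this rule.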

   [I-D.ietf-bess-evpn-per-mcast-flow-df-election] proposes a method
   for DF election that enhances the HRW algorithm by adding the
   source and group addresses of the multicast flow as hash factors,
   and defines types 4 and 5 of the DF Election Extended Community for
   (S,G) and (*,G) multicast flows. With the source and group
   addresses as additional inputs to the HRW algorithm, the PE with
   the largest weight is selected as the DF of the multicast flow.

   However, none of the current DF election algorithms considers the
   relationship between the bandwidth of the multicast flows and the
   capacity of the links from the different PEs to the same CE device.
   Because multicast flows differ in bandwidth usage, this may result
   in severely unbalanced bandwidth utilization across those links.
   This document specifies a new algorithm for multicast flow DF
   election based on multicast bandwidth or multicast state quantity,
   and extends the DF Election Extended Community defined in
   [RFC8584].

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2. Terminology

   CE: Customer Edge equipment

   PE: Provider Edge device

   EVPN: Ethernet Virtual Private Network

   Ethernet Segment (ES): When a customer site (device or network) is
   connected to one or more PEs via a set of Ethernet links, then that
   set of links is referred to as an 'Ethernet segment'.

   IGMP: Internet Group Management Protocol

   MLD: Multicast Listener Discovery

   PIM: Protocol Independent Multicast

2. Solution

   In the DF election calculation, a bandwidth weight is associated
   with each multi-homed link of the PEs, and the load that the
   multicast flows place on each link is calculated. Two scenarios are
   distinguished:

   *  The bandwidth values of the multicast flows are known. For each
      multi-homed link, the ratio of the current multicast flow
      bandwidth to the link bandwidth weight is calculated, and the
      link with the smallest ratio is elected as the DF of the new
      multicast flow.

   *  The bandwidth values of the multicast flows are not known. For
      each multi-homed link, the ratio of the current multicast flow
      state quantity to the link bandwidth weight is calculated, and
      the link with the smallest ratio is elected as the DF of the new
      multicast flow.

   If multiple PEs have the same calculated ratio, the DF is elected
   according to the maximum link bandwidth weight or the maximum EVPN
   peer IP address.

   [I-D.ietf-idr-link-bandwidth] defines the Link Bandwidth Extended
   Community. It can be reused to convey the link bandwidth value of
   the local ES to the other multi-homed PEs, so that each PE can
   calculate the bandwidth weight ratio of each link of the ES in
   advance.

2.1. DF Election Based on Bandwidth

   Each PE obtains the link bandwidth values of the other multi-homed
   PEs in the same EVPN instance on a given ES from the Link Bandwidth
   Extended Community, and calculates the link bandwidth weight ratio,
   for example W1:W2:...:Wn for n multi-homed PEs.

   When the CE sends an IGMP or PIM join to one of the PEs, for
   example PE1, PE1 notifies PE2, PE3, ..., PEn using the EVPN
   IGMP/PIM Join Synch route defined in
   [I-D.ietf-bess-evpn-igmp-mld-proxy] and
   [I-D.skr-bess-evpn-pim-proxy]. If PE2, PE3, ..., or PEn receives
   the IGMP or PIM join instead, the procedure is the same.

   Each PE calculates the ratio of the current multicast flow
   bandwidth to the link bandwidth weight. Among PE1, PE2, ..., PEn,
   the PE with the smallest ratio is elected as the DF of the new
   multicast flow. When more than one PE has the smallest ratio, the
   PE with the maximum link bandwidth weight or the maximum EVPN peer
   IP address is elected as the DF.

2.2. DF Election Based on State Quantity

   The procedure is almost the same as that described in Section 2.1.
   The only difference is that, because the bandwidth values of the
   multicast flows are not available, each PE calculates the ratio of
   the current number of multicast states, instead of the bandwidth,
   to the link bandwidth weight.

2.3. Inconsistent Timing between Multi-homed PEs

   For a given multicast join, only one of the multi-homed PEs
   receives the multicast join message and advertises the EVPN Join
   Synch route (Type 7). The other PEs need to install the new
   multicast join state according to the received Synch route.

   If the PEs process the joins of multicast groups at different
   times, they may elect different DFs. For example:

   *  Join packets for multicast groups G1, G2, and G3 are sent from
      the CE to PE1, PE2, and PE3, respectively.

   *  PE1 calculates the DF of G1, PE2 calculates the DF of G2, and
      PE3 calculates the DF of G3. At this moment, none of the PEs has
      received an EVPN Join Synch route.

   *  PE1, PE2, and PE3 each select the link on the same ES to the CE
      using the algorithm described in Section 2.1 or 2.2, and the
      same DF may be elected for G1, G2, and G3.

   *  After receiving the EVPN Join Synch route sent by PE2, PE1 may
      calculate the DF of G2 as PE3, which is inconsistent with the
      calculation result of PE2.
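
   The election rule of Sections 2.1 and 2.2 and the race in the
   example above can be sketched in Python as follows. This is a
   minimal illustration under stated assumptions, not part of the
   specification: the names PE, elect_df, assign, and fresh_view, the
   weights, and the addresses are hypothetical, and ties are broken by
   larger bandwidth weight first and larger IP address second, which
   is only one possible reading of the tie-break text.

```python
import ipaddress
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class PE:
    ip: str        # EVPN peer IP address
    weight: int    # link bandwidth weight on the ES (non-zero)
    load: int = 0  # assigned bandwidth (2.1) or state count (2.2)


def elect_df(pes):
    """Return the PE with the smallest load/weight ratio."""
    def key(pe):
        ratio = Fraction(pe.load, pe.weight)
        # Assumed tie-break: larger weight, then larger IP address.
        return (ratio, -pe.weight, -int(ipaddress.ip_address(pe.ip)))
    return min(pes, key=key)


def assign(pes, flow_load):
    """Elect a DF for a new flow and record its load on that PE."""
    df = elect_df(pes)
    df.load += flow_load
    return df.ip


def fresh_view():
    # Hypothetical ES with weights 1:1:2 and no flows assigned yet.
    return [PE("192.0.2.1", 1), PE("192.0.2.2", 1), PE("192.0.2.3", 2)]


# One consistent view: G1, G2, G3 are processed one after another
# (state-quantity mode, each flow counts as 1).
view = fresh_view()
serial = [assign(view, 1) for _ in ("G1", "G2", "G3")]

# Race: each PE elects for its own group before any Join Synch route
# arrives, so every election starts from the same empty view.
racing = [assign(fresh_view(), 1) for _ in ("G1", "G2", "G3")]
```

   With a shared view, the three groups are spread over the three PEs.
   In the racing case every election sees an empty ES and picks the
   same PE, which matches the third step of the example.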

   When the DF calculation results of the PEs are inconsistent, the
   same multicast flow may be forwarded more than once or may be
   interrupted. Therefore, EVPN Join Synch routes need to carry the
   elected DF in the route advertisement, in an extended community
   called the Multicast DF Extended Community, which keeps the DF
   information for a given multicast flow state consistent between
   PEs. In effect, the PE that receives the multicast join packet
   completes the DF election calculation and notifies the other PEs on
   the same ES.

2.4. Increase or Decrease of Multi-homed PEs

2.4.1. Decrease of Multi-homed PEs

   When one of the multi-homed PEs on an ES fails or is shut down for
   maintenance, the other PEs have already received the Synch routes
   of all the multicast flows. The DFs of the multicast flows that
   were assigned to the failed PE need to be reassigned in a specific
   order (for example, in ascending order of group and source
   address). The DF election calculation, based on multicast flow
   bandwidth or on the number of multicast states, is completed by one
   designated PE among the remaining multi-homed PEs. The designated
   PE can be selected according to the link bandwidth weight value or
   the IP address of the EVPN peer. The designated PE needs to
   advertise the DF election result of each multicast flow that
   belonged to the failed PE to the other multi-homed PEs on the same
   ES, using the EVPN Join Synch route carrying the Multicast DF
   Extended Community.

   If a new multicast join is received during the above calculation
   process, the DF election calculation of the new multicast flow is
   still completed by the PE receiving the multicast join packet.
   Similarly, that PE needs to advertise the DF information to the
   other multi-homed PEs belonging to the same ES, using the EVPN Join
   Synch route carrying the Multicast DF Extended Community.

2.4.2. Increase of Multi-homed PEs

   When a multi-homed PE is added to an ES, the existing DF
   assignments need not be actively adjusted. The DFs of subsequent
   new multicast flows are elected according to the algorithm of this
   document. New multicast flows are preferentially assigned to the
   new PE, and eventually the multicast flows on the PEs of the same
   ES are approximately balanced.

   If active adjustment is required, the ratios can be calculated
   using the algorithm described in Sections 2.1 and 2.2. At each
   step, multicast entries are migrated to the new PE from the
   existing multi-homed PE whose ratio is the largest. The multicast
   entries are migrated in descending order of multicast flow
   bandwidth, or in ascending order of group and source address, until
   the ratio of the new PE is greater than the smallest ratio among
   the other multi-homed PEs.

   The calculation of the active adjustment is still performed by one
   designated PE among the multi-homed PEs. The designated PE can be
   selected according to the link bandwidth weight value or the IP
   address of the EVPN peer.

   After the new PE starts, it may receive existing multicast join
   packets while the multicast entries of the other multi-homed PEs
   are still being synchronized to it. To prevent an existing
   multicast join from being treated as a new multicast join, which
   would trigger a DF recalculation and a notification to the other
   PEs on the same ES, a timer needs to be started to suppress the
   synchronization process from the new PE to the existing PEs. The
   timer range should also be configurable.

3. BGP Encoding

3.1. DF Election Extended Community

   [RFC8584] defines an extended community that multi-homed PEs use to
   reach a consensus on which DF election procedure to use. A PE
   notifies the other participating PEs of its DF election capability
   by signaling a DF Election Extended Community along with the
   Ethernet Segment route (Type 4). This document extends the extended
   community defined in [RFC8584] with two new DF types.

   o  DF type (1 octet) - Encodes the DF Election algorithm values
      (between 0 and 255) that the advertising PE desires to use for
      the ES.

      *  Type TBD1: DF election based on the bandwidth of multicast
         flows (detailed in this document)

      *  Type TBD2: DF election based on the quantity of multicast
         flow states (detailed in this document)

3.2. Multicast DF Extended Community

   This document defines a new extended community, carried in the EVPN
   Type 7 route, to notify the other multi-homed PEs of the elected DF
   of a given multicast flow. The new extended community is called the
   Multicast DF Extended Community and is a transitive extended
   community. Its sub-type is to be assigned. It carries the DF
   information of a given (S,G) or (*,G) multicast flow. The role of
   this extended community is described in Sections 2.3 and 2.4.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type=0x06     | Sub-Type=TBD3 |   Reserved    |   DF Length   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   DF IP Address (Variable)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Type is 0x06 as registered with IANA for EVPN Extended
      Communities

   o  Sub-Type: TBD3

   o  DF Length: the length of the DF IP Address field, 4 octets for
      an IPv4 address, 16 octets for an IPv6 address

   o  DF IP Address: the elected DF IP address of the given (S,G) or
      (*,G) route in the EVPN Type 7 route

4. Security Considerations

   For general EVPN Security Considerations, see [RFC7432].

   TBD

5. IANA Considerations

   TBD

6. References

6.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
             Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
             Ethernet VPN", RFC 7432, February 2015.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, May 2017.

   [RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake,
             J., Nagaraj, K., and S. Sathappan, "Framework for
             Ethernet VPN Designated Forwarder Election
             Extensibility", RFC 8584, April 2019.

   [I-D.ietf-bess-evpn-per-mcast-flow-df-election]
             Sajassi, A., Mishra, M., Thoria, S., Rabadan, J., and J.
             Drake, "Per multicast flow Designated Forwarder Election
             for EVPN", August 2020, work-in-progress,
             draft-ietf-bess-evpn-per-mcast-flow-df-election-04.

   [I-D.ietf-idr-link-bandwidth]
             Mohapatra, P. and R. Fernando, "BGP Link Bandwidth
             Extended Community", March 2018, expired,
             draft-ietf-idr-link-bandwidth-07.

   [I-D.ietf-bess-evpn-igmp-mld-proxy]
             Sajassi, A., Thoria, S., Patel, K., Drake, J., and W.
             Lin, "IGMP and MLD Proxy for EVPN", January 2021,
             work-in-progress, draft-ietf-bess-evpn-igmp-mld-proxy-06.

   [I-D.skr-bess-evpn-pim-proxy]
             Rabadan, J., Ed., Kotalwar, J., Sathappan, S., Zhang, Z.,
             and A. Sajassi, "PIM Proxy in EVPN Networks", October
             2017, expired, draft-skr-bess-evpn-pim-proxy-01.

6.2. Informative References

   TBD

7. Acknowledgments

   The authors would like to thank the following for their valuable
   contributions to this document:

   TBD

Authors' Addresses

   Yisong Liu
   China Mobile

   Email: liuyisong@chinamobile.com

   Mike McBride
   Futurewei Inc.

   Email: michael.mcbride@futurewei.com

   Zheng(Sandy) Zhang
   ZTE Corporation

   Email: zhang.zheng@zte.com.cn

   Jingrong Xie
   Huawei Technologies

   Email: xiejingrong@huawei.com