idnits 2.17.00 (12 Aug 2021)

/tmp/idnits54105/draft-ietf-pim-drlb-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  == There are 1 instance of lines with non-RFC3849-compliant IPv6 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (January 3, 2020) is 862 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                             Y. Cai
Internet-Draft                                                     H. Ou
Intended status: Standards Track                           Alibaba Group
Expires: July 6, 2020                                      S. Vallepalli
                                                               M. Mishra
                                                               S. Venaas
                                                     Cisco Systems, Inc.
                                                                A. Green
                                                         British Telecom
                                                         January 3, 2020


                  PIM Designated Router Load Balancing
                         draft-ietf-pim-drlb-15

Abstract

   On a multi-access network, one of the PIM-SM (PIM Sparse Mode)
   routers is elected as a Designated Router. One of the
   responsibilities of the Designated Router is to track local multicast
   listeners and forward data to these listeners if the group is
   operating in PIM-SM. This document specifies a modification to the
   PIM-SM protocol that allows more than one of the PIM-SM routers to
   take on this responsibility so that the forwarding load can be
   distributed among multiple routers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 6, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Functional Overview
     4.1.  GDR Candidates
   5.  Protocol Specification
     5.1.  Hash Mask and Hash Algorithm
     5.2.  Modulo Hash Algorithm
       5.2.1.  Modulo Hash Algorithm Examples
       5.2.2.  Limitations
     5.3.  PIM Hello Options
       5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option
       5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option
     5.4.  PIM DR Operation
     5.5.  PIM GDR Candidate Operation
     5.6.  DRLB-List Hello Option Processing
     5.7.  PIM Assert Modification
     5.8.  Backward Compatibility
   6.  Operational Considerations
   7.  IANA Considerations
     7.1.  Initial registry
     7.2.  Assignment of new Hash Algorithms
   8.  Security Considerations
   9.  Acknowledgement
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   On a multi-access LAN, such as an Ethernet, with one or more PIM-SM
   (PIM Sparse Mode) [RFC7761] routers, one of the PIM-SM routers is
   elected as a Designated Router (DR). The PIM DR has two
   responsibilities in the PIM-SM protocol. For any active sources on a
   LAN, the PIM DR is responsible for registering with the Rendezvous
   Point (RP) if the group is operating in PIM-SM. Also, the PIM DR is
   responsible for tracking local multicast listeners and forwarding to
   these listeners if the group is operating in PIM-SM.

   Consider the following LAN in Figure 1:

                            (core networks)
                              |     |     |
                              |     |     |
                             R1    R2    R3
                              |     |     |
                              ----(LAN)----
                                    |
                                    |
                            (many receivers)

                      Figure 1: LAN with receivers

   Assume R1 is elected as the DR. According to the PIM-SM protocol, R1
   will be responsible for forwarding traffic to that LAN on behalf of
   all local members. In addition to keeping track of membership
   reports, R1 is also responsible for initiating the creation of source
   and/or shared trees towards the senders or the RPs. The membership
   reports would be IGMP or MLD messages. This applies to any version
   of the IGMP and MLD protocols. The most recent versions are IGMPv3
   [RFC3376] and MLDv2 [RFC3810].

   Having a single router acting as DR and being responsible for data
   plane forwarding leads to several issues. One of the issues is that
   the aggregated bandwidth will be limited to what R1 can handle with
   regard to the capacity of incoming links, the interface on the LAN,
   and total forwarding capacity. It is very common that a LAN consists
   of switches that run IGMP/MLD or PIM snooping [RFC4541].
   This allows the forwarding of multicast packets to be restricted only
   to segments leading to receivers that have indicated their interest
   in multicast groups using either IGMP or MLD. The emergence of
   switched Ethernet allows the aggregated bandwidth to exceed,
   sometimes by a large margin, that of a single link. For example, let
   us modify Figure 1 and introduce an Ethernet switch in Figure 2.

                           (core networks)
                             |     |     |
                             |     |     |
                            R1    R2    R3
                             |     |     |
                          +=gi1===gi2===gi3=+
                          +                 +
                          +     switch      +
                          +                 +
                          +=gi4===gi5===gi6=+
                             |     |     |
                            H1    H2    H3

                   Figure 2: LAN with Ethernet Switch

   Let us assume that each individual link is a Gigabit Ethernet. Each
   router, R1, R2 and R3, and the switch have enough forwarding capacity
   to handle hundreds of Gigabits of data.

   Let us further assume that each of the hosts requests 500 Mbps of
   unique multicast data. This totals 1.5 Gbps of data, which is
   less than what each switch or the combined uplink bandwidth across
   the routers can handle, even under failure of a single router.

   On the other hand, the link between R1 and the switch, via port gi1,
   can only handle a throughput of 1 Gbps. If R1 is the only DR (the
   PIM DR elected using the procedure defined by [RFC7761]), at least
   500 Mbps worth of data will be lost because the only link that can be
   used to draw the traffic from the routers to the switch is via gi1.
   In other words, the entire network's throughput is limited by the
   single connection between the PIM DR and the switch (or LAN as in
   Figure 1).

   Another important issue is related to failover. If R1 is the only
   forwarder on a shared LAN, when R1 goes out of service, multicast
   forwarding for the entire LAN has to be rebuilt by the newly elected
   PIM DR.
   However, if there were a way that allowed multiple routers
   to forward to the LAN for different groups, failure of one of the
   routers would only lead to disruption to a subset of the flows,
   therefore improving the overall resilience of the network.

   This document specifies a modification to the PIM-SM protocol that
   allows more than one of these routers, called Group Designated
   Routers (GDRs), to be selected so that the forwarding load can be
   distributed among a number of routers.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   With respect to PIM-SM, this document follows the terminology that
   has been defined in [RFC7761].

   This document also introduces the following new acronyms:

   o  GDR: Group Designated Router. For each multicast flow, either a
      (*,G) for Any-Source Multicast (ASM), or an (S,G) for
      Source-Specific Multicast (SSM) [RFC4607], a Hash Algorithm
      (described below) is used to select one of the routers as a GDR.
      The GDR is responsible for initiating the forwarding tree building
      process for the corresponding multicast flow.

   o  GDR Candidate: a router that has the potential to become a GDR.
      There might be multiple GDR Candidates on a LAN, but only one can
      become the GDR for a specific multicast flow.

3.  Applicability

   The extension specified in this document applies to PIM-SM routers
   acting as last hop routers (there are directly connected receivers).
   It does not alter the behavior of a PIM DR, or any other routers, on
   the first hop network (directly connected sources).
   This is because
   the source tree is built using the IP address of the sender, not the
   IP address of the PIM DR that sends PIM registers towards the RP.
   The load balancing between first hop routers can be achieved
   naturally if an IGP provides equal cost multiple paths (which it
   usually does in practice). Also, distributing the load to do source
   registration does not justify the additional complexity required to
   support it.

4.  Functional Overview

   In the PIM DR election as defined in [RFC7761], when multiple routers
   are connected to a multi-access LAN (for example, an Ethernet), one
   of them is elected to act as PIM DR. The PIM DR is responsible for
   sending local Join/Prune messages towards the RP or source. In order
   to elect the PIM DR, each PIM router on the LAN examines the received
   PIM Hello messages and compares its own DR priority and IP address
   with those of its neighbors. The router with the highest DR priority
   is the PIM DR. If there are multiple such routers, their IP
   addresses are used as the tie-breaker, as described in [RFC7761].

   In order to share forwarding load among last hop routers, besides the
   normal PIM DR election, one or more GDRs are elected on the
   multi-access LAN. There is only one PIM DR on the multi-access LAN,
   but there might be multiple GDR Candidates.

   For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a
   Hash Algorithm [Section 5.1] is used to select one of the routers to
   be the GDR. The new DR Load Balancing Capability (DRLB-Cap) PIM
   Hello Option is used to announce the Capability as well as the Hash
   Algorithm type. Routers with the new DRLB-Cap Option advertised in
   their PIM Hello, using the same GDR election Hash Algorithm and the
   same DR priority as the PIM DR, are considered as GDR Candidates.

   Hash Masks are defined for Source, Group and RP separately, in order
   to handle PIM ASM/SSM.
   The masks, as well as a sorted list of GDR
   Candidate Addresses, are announced by the DR in a new DR Load
   Balancing List (DRLB-List) PIM Hello Option.

   A Hash Algorithm based on the announced Source, Group, or RP masks
   allows one GDR to be assigned to a corresponding multicast state.
   That GDR is responsible for initiating the creation of the multicast
   forwarding tree for multicast traffic.

4.1.  GDR Candidates

   GDR is the new concept introduced by this specification. GDR
   Candidates are routers eligible for GDR election on the LAN. To
   become a GDR Candidate, a router must have the same DR priority and
   run the same GDR election Hash Algorithm as the DR on the LAN.

   For example, assume there are 4 routers on the LAN: R1, R2, R3 and
   R4, each announcing a DRLB-Cap option. R1, R2 and R3 have the same
   DR priority while R4's DR priority is less preferred. In this
   example, R4 will not be eligible for GDR election, because R4 will
   not become a PIM DR unless all of R1, R2 and R3 go out of service.

   Furthermore, assume router R1 wins the PIM DR election, R1 and R2
   advertise the same Hash Algorithm for GDR election, while R3
   advertises a different one. In this case, only R1 and R2 will be
   eligible for GDR election, while R3 will not.

   As a DR, R1 will include its own Load Balancing Hash Masks and the
   identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello
   Option.

5.  Protocol Specification

5.1.  Hash Mask and Hash Algorithm

   A Hash Mask is used to extract a number of bits from the
   corresponding IP address field (32 for IPv4, 128 for IPv6) and
   calculate a hash value. A hash value is used to select a GDR from
   GDR Candidates advertised by the PIM DR. Hash masks allow for
   certain flows to always be forwarded by the same GDR, by ignoring
   certain bits in the hash value calculation, so that the hash values
   are the same.
   For example, 0.0.255.0 defines a Hash Mask for an IPv4
   address that masks the first, the second, and the fourth octets,
   which means that only the third octet will influence the hash value
   computed. Note that the masks need not be a contiguous set of bits.
   For example, for IPv4, 15.15.15.15 would be a valid mask.

   In the text below, a hash mask is in some places said to be zero. A
   hash mask is zero if no bits are set. That is, 0.0.0.0 for IPv4 and
   :: for IPv6. Also, a hash mask is said to be an all-bits-set mask if
   it is 255.255.255.255 for IPv4 or
   ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff for IPv6.

   There are three Hash Masks defined:

   o  RP Hash Mask

   o  Source Hash Mask

   o  Group Hash Mask

   The hash masks need to be configured on the PIM routers that can
   potentially become a PIM DR, unless the implementation provides
   default hash mask values. An implementation SHOULD have default hash
   mask values as follows. The default RP Hash Mask SHOULD be zero (no
   bits set). The default Source and Group Hash Masks SHOULD both be
   all-bits-set masks. These default values are likely acceptable for
   most deployments, and simplify configuration. There is only a need
   to use other masks if one needs to ensure that certain flows are
   forwarded by the same GDR.

   The DRLB-List Hello Option contains a list of GDR Candidates. The
   first one listed has ordinal number 0, the second listed ordinal
   number 1, and the last one has ordinal number N - 1 if there are N
   candidates listed. The hash value computed will be the ordinal
   number of the GDR Candidate that is acting as GDR for the flow in
   question.

   The input to be hashed is determined as follows:

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is not zero (at least one bit is set), calculate the value
      of hashvalue_RP [Section 5.2] to determine the GDR.

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is zero (no bits are set), obtain the value of
      hashvalue_Group [Section 5.2] to determine the GDR.

   o  If the group is in SSM mode, use hashvalue_SG [Section 5.2] to
      determine the GDR.

   A simple Modulo Hash Algorithm is defined in this document. However,
   to allow other Hash Algorithms to be used, a 1-octet "Hash
   Algorithm" field is included in the DRLB-Cap Hello Option to specify
   the Hash Algorithm used by the router.

   If different Hash Algorithms are advertised among the routers on a
   LAN, only the routers advertising the same Hash Algorithm as the DR
   (as well as having the same DR priority as the DR) are eligible for
   GDR election.

5.2.  Modulo Hash Algorithm

   As part of computing the hash, the notation LSZC(hash_mask) is used
   to denote the number of zeroes counted from the least significant bit
   of a Hash Mask hash_mask. As an example, LSZC(255.255.128.0) is 15
   and LSZC(ffff:8000::) is 111. If all bits are set, LSZC will be 0.
   If the mask is zero, then LSZC will be 32 for IPv4, and 128 for IPv6.

   The number of GDR Candidates is denoted as GDRC.

   In simple terms, the Modulo Hash Algorithm applies the corresponding
   mask to a value and then shifts the result right by LSZC(mask) bits,
   so that the least significant bits that were masked out are not
   considered. This result is then masked by 0xffffffff, keeping only
   the last 32 bits of the result (this only makes a difference for
   IPv6). Finally, the hash value is this result modulo the number of
   GDR Candidates (GDRC).

   The Modulo Hash Algorithm for computing the values hashvalue_RP,
   hashvalue_Group and hashvalue_SG is defined as follows.
   hashvalue_RP is calculated as:

      (((RP_address & RP_mask) >> LSZC(RP_mask)) & 0xffffffff) % GDRC

      RP_address is the address of the RP defined for the group and
      RP_mask is the RP Hash Mask.

   hashvalue_Group is calculated as:

      (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xffffffff)
      % GDRC

      Group_address is the group address and Group_mask is the Group
      Hash Mask.

   hashvalue_SG is calculated as:

      ((((Source_address & Source_mask) >> LSZC(Source_mask)) &
      0xffffffff) ^ (((Group_address & Group_mask) >> LSZC(Group_mask))
      & 0xffffffff)) % GDRC

      Source_address is the source address and Source_mask is the
      Source Hash Mask. Group_address is the group address and
      Group_mask is the Group Hash Mask.

5.2.1.  Modulo Hash Algorithm Examples

   To help illustrate the algorithm, consider this example. Router X
   with IPv4 address 203.0.113.1 receives a DRLB-List Hello Option from
   the DR, which announces RP Hash Mask 0.0.255.0 and a list of GDR
   Candidates, sorted by IP addresses from high to low: 203.0.113.3,
   203.0.113.2 and 203.0.113.1. The ordinal number assigned to those
   addresses would be:

   0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router X).

   Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 198.51.100.2
   for Group2. Following the Modulo Hash Algorithm:

   LSZC(0.0.255.0) is 8 and GDRC is 3. The hashvalue_RP for Group1 with
   RP RP1 is:

      (((192.0.2.1 & 0.0.255.0) >> 8) & 0xffffffff) % 3 = 2 % 3 = 2

   which matches the ordinal number assigned to Router X. Router X will
   be the GDR for Group1.

   The hashvalue_RP for Group2 with RP RP2 is:

      (((198.51.100.2 & 0.0.255.0) >> 8) & 0xffffffff) % 3 = 100 % 3 = 1

   which is different from the ordinal number of Router X (2). Hence,
   Router X will not be GDR for Group2.

   For IPv6 consider this example, similar to the above.
   Router X with
   IPv6 address fe80::1 receives a DRLB-List Hello Option from the DR,
   which announces RP Hash Mask ::ffff:ffff:ffff:0 and a list of GDR
   Candidates, sorted by IP addresses from high to low: fe80::3, fe80::2
   and fe80::1. The ordinal number assigned to those addresses would
   be:

   0 for fe80::3; 1 for fe80::2; 2 for fe80::1 (Router X).

   Assume there are 2 RPs: RP1 2001:db8::1:0:5678:1 for Group1 and RP2
   2001:db8::1:0:1234:2 for Group2. Following the Modulo Hash
   Algorithm:

   LSZC(::ffff:ffff:ffff:0) is 16 and GDRC is 3. The hashvalue_RP for
   Group1 with RP RP1 is:

      (((2001:db8::1:0:5678:1 & ::ffff:ffff:ffff:0) >> 16) & 0xffffffff)
      % 3 = ((::1:0:5678:0 >> 16) & 0xffffffff) % 3 = (::1:0:5678 &
      0xffffffff) % 3 = ::5678 % 3 = 2

   which matches the ordinal number assigned to Router X. Router X will
   be the GDR for Group1.

   The hashvalue_RP for Group2 with RP RP2 is:

      (((2001:db8::1:0:1234:2 & ::ffff:ffff:ffff:0) >> 16) & 0xffffffff)
      % 3 = ((::1:0:1234:0 >> 16) & 0xffffffff) % 3 = (::1:0:1234 &
      0xffffffff) % 3 = ::1234 % 3 = 1

   which is different from the ordinal number of Router X (2). Hence,
   Router X will not be GDR for Group2.

5.2.2.  Limitations

   The Modulo Hash Algorithm has poor failover characteristics when a
   shared LAN has more than two GDRs. In the case of more than two GDRs
   on a LAN, when one GDR fails, all of the groups may be reassigned to
   a different GDR, even if they were not assigned to the failed GDR.
   However, many deployments use only two routers on a shared LAN for
   redundancy purposes. Future work may define new Hash Algorithms
   where only groups assigned to the failed GDR get reassigned.

   The Modulo Hash Algorithm will use at most 32 consecutive bits of the
   input addresses for its computation. Exactly which bits of the
   source, group or RP addresses are used depends on the respective
   masks.
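
   As a non-normative illustration, the Modulo Hash Algorithm of
   Section 5.2 might be sketched in Python as follows. The function
   names (lszc, masked_value, hashvalue_rp, hashvalue_group,
   hashvalue_sg) are hypothetical and not part of this specification,
   and the sketch assumes that addresses and masks are supplied as text
   strings of the same address family. Run against the IPv6 example of
   Section 5.2.1, it also shows the 32-bit truncation described above:
   the "1" in the fifth 16-bit group of the RP address is discarded.

```python
import ipaddress


def lszc(mask: int, width: int) -> int:
    """Count zero bits from the least significant bit of a Hash Mask.

    Returns `width` (32 for IPv4, 128 for IPv6) when the mask is zero.
    """
    if mask == 0:
        return width
    count = 0
    while mask & 1 == 0:
        mask >>= 1
        count += 1
    return count


def masked_value(address: str, mask: str) -> int:
    """Apply the mask, shift out the trailing zero bits, keep 32 bits."""
    addr = ipaddress.ip_address(address)
    msk = ipaddress.ip_address(mask)
    if addr.version != msk.version:
        raise ValueError("address and mask must be in the same family")
    shift = lszc(int(msk), addr.max_prefixlen)
    return ((int(addr) & int(msk)) >> shift) & 0xFFFFFFFF


def hashvalue_rp(rp: str, rp_mask: str, gdrc: int) -> int:
    return masked_value(rp, rp_mask) % gdrc


def hashvalue_group(group: str, group_mask: str, gdrc: int) -> int:
    return masked_value(group, group_mask) % gdrc


def hashvalue_sg(source: str, source_mask: str,
                 group: str, group_mask: str, gdrc: int) -> int:
    # XOR of the two 32-bit intermediate values, then modulo GDRC.
    return (masked_value(source, source_mask)
            ^ masked_value(group, group_mask)) % gdrc


if __name__ == "__main__":
    # Values taken from the examples in Section 5.2.1 (GDRC is 3).
    print(hashvalue_rp("192.0.2.1", "0.0.255.0", 3))
    print(hashvalue_rp("198.51.100.2", "0.0.255.0", 3))
    print(hashvalue_rp("2001:db8::1:0:5678:1", "::ffff:ffff:ffff:0", 3))
    print(hashvalue_rp("2001:db8::1:0:1234:2", "::ffff:ffff:ffff:0", 3))
```

   The returned value is the ordinal number of the GDR Candidate in the
   DRLB-List Hello Option; a router is the GDR for a flow when the value
   equals its own ordinal number.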

   This limitation may be an issue for IPv6 deployments, since not all
   bits of the IPv6 addresses are considered. If this causes
   operational issues, a new hash algorithm would need to be defined.

5.3.  PIM Hello Options

   PIM routers include a new option, called "DR Load Balancing
   Capability (DRLB-Cap)", in their PIM Hello messages.

   Besides this DRLB-Cap Hello Option, the elected PIM DR also includes
   a new "DR Load Balancing List (DRLB-List) Hello Option". The
   DRLB-List Hello Option consists of three Hash Masks as defined above
   and also a list of GDR Candidate addresses on the LAN. It is
   recommended that the GDR Candidate addresses are sorted in descending
   order. This ensures that, when using algorithms such as the Modulo
   Hash Algorithm in this document, it is predictable which GDR is
   responsible for which groups, regardless of the order in which the DR
   learned about the candidates.

5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 34           |           Length = 4          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    |Hash Algorithm |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 3: PIM DR Load Balancing Capability Hello Option

   Type: 34

   Length: 4

   Reserved: Transmitted as zero, ignored on receipt.

   Hash Algorithm: Hash Algorithm type. A value listed in the IANA
      Designated Router Load Balancing Hash Algorithms registry. 0 is
      used for the Modulo algorithm defined in this document.

   This DRLB-Cap Hello Option MUST be advertised by routers on all
   interfaces where DR Load Balancing is enabled. Note that the option
   is included at most once.

5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 35           |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Group Mask                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            RP Mask                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   GDR Candidate Address(es)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 4: PIM DR Load Balancing List Hello Option

   Type: 35

   Length: (3 + n) x (4 or 16) bytes, where n is the number of GDR
      candidates.

   Group Mask (32/128 bits): Mask applied to group addresses as part
      of hash computation.

   Source Mask (32/128 bits): Mask applied to source addresses as
      part of hash computation.

   RP Mask (32/128 bits): Mask applied to RP addresses as part of
      hash computation.

      All masks MUST have the same number of bits as the IP source
      address in the PIM Hello IP header.

   GDR Candidate Address(es) (32/128 bits): List of GDR Candidate(s)

      All addresses MUST be in the same address family as the PIM
      Hello IP header. It is recommended that the addresses are
      sorted in descending order.

      If the "Interface ID" option, as specified in [RFC6395], is
      present in a GDR Candidate's PIM Hello message, and the "Router
      Identifier" portion is non-zero:

      +  For IPv4, the "GDR Candidate Address" will be set directly
         to the "Router Identifier".

      +  For IPv6, the "GDR Candidate Address" will be 96 bits of
         zeroes followed by the 32 bit Router Identifier.
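
      As a non-normative illustration, the derivation of a "GDR
      Candidate Address" might be sketched in Python as follows. The
      function name gdr_candidate_address and its parameters are
      hypothetical; router_id stands for the 32-bit "Router Identifier"
      of the "Interface ID" option, passed as None when the option is
      absent. The fallback to the source address of the PIM Hello
      message is included so that the function is complete.

```python
import ipaddress
from typing import Optional, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def gdr_candidate_address(hello_source: str,
                          router_id: Optional[int] = None) -> Address:
    """Return the address that identifies a GDR Candidate.

    hello_source: source address of the candidate's PIM Hello message.
    router_id:    32-bit Router Identifier, or None if the Interface ID
                  option was not present (hypothetical representation).
    """
    source = ipaddress.ip_address(hello_source)
    if router_id:  # option present and Router Identifier non-zero
        if not 0 < router_id <= 0xFFFFFFFF:
            raise ValueError("Router Identifier must fit in 32 bits")
        if source.version == 4:
            # IPv4: the Router Identifier is used directly.
            return ipaddress.IPv4Address(router_id)
        # IPv6: 96 zero bits followed by the 32-bit Router Identifier.
        return ipaddress.IPv6Address(router_id)
    # No usable Router Identifier: use the Hello source address.
    return source
```

      For instance, under these assumptions a candidate sending Hellos
      from fe80::1 with Router Identifier 0x0a000001 would be listed
      with the 128-bit value whose low 32 bits are 0x0a000001.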

      If the "Interface ID" option is not present in a GDR Candidate's
      PIM Hello message, or if the "Interface ID" option is present
      but the "Router Identifier" field is zero, the "GDR Candidate
      Address" will be the IPv4 or IPv6 source address of the PIM
      Hello message.

      This DRLB-List Hello Option MUST only be advertised by the
      elected PIM DR. It MUST be ignored if received from a non-DR.
      The option MUST also be ignored if the hash masks are not the
      correct number of bits, or GDR Candidate addresses are in the
      wrong address family.

5.4.  PIM DR Operation

   The DR election process is still the same as defined in [RFC7761].
   The DR advertises the new DRLB-List Hello Option, which contains mask
   values from user configuration (or default values), followed by a
   list of GDR Candidate Addresses. Note that if a router included the
   "Interface ID" option in the hello message, and the Router ID is
   non-zero, the Router ID will be used to form the GDR Candidate
   address of the router, as discussed in the previous section. It is
   recommended that the list be sorted, from the highest value to the
   lowest value. The reason for sorting the list is to make the
   behavior deterministic, regardless of the order in which the DR
   learns of new candidates. Note that, as for non-DR routers, the DR
   also advertises the DRLB-Cap Hello Option to indicate its ability to
   support the new functionality and the type of GDR election Hash
   Algorithm it uses.

   If a PIM DR receives a neighbor DRLB-Cap Hello Option, which contains
   the same Hash Algorithm as the DR, and the neighbor has the same DR
   priority as the DR, the PIM DR SHOULD consider the neighbor as a GDR
   Candidate and insert the GDR Candidate's Address into the list of the
   DRLB-List Option. However, the DR may have policies limiting which
   GDR Candidates, or the number of GDR Candidates to include.
   Likewise, the DR SHOULD include itself in the list of GDR Candidates,
   but it is permissible not to do so, if for instance there is some
   policy restricting the candidate set.

   If a PIM neighbor included in the list expires, stops announcing the
   DRLB-Cap Hello Option, changes DR priority, changes Hash Algorithm or
   otherwise becomes ineligible as a candidate, the DR SHOULD
   immediately send a triggered hello with a new list in the DRLB-List
   option, excluding the neighbor.

   If a new router becomes eligible as a candidate, there is no urgency
   in sending out an updated list. An updated list SHOULD be included
   in the next hello.

5.5.  PIM GDR Candidate Operation

   When an IGMP/MLD report is received, a Hash Algorithm is used by the
   GDR Candidates to determine which router is going to be responsible
   for building forwarding trees on behalf of the host.

   The router MUST include the DRLB-Cap Hello Option in all PIM Hello
   messages sent on the interface. Note that the presence of the
   DRLB-Cap Option in the PIM Hello does not guarantee that the router
   will be considered as a GDR candidate. Once the DR election is done,
   the DRLB-List Hello Option is received from the current PIM DR
   containing a list of the selected GDR Candidates.

   A router only acts as a GDR Candidate if it is included in the GDR
   Candidate list of the DRLB-List Hello Option. See next section for
   details.

5.6.  DRLB-List Hello Option Processing

   This section discusses processing of the DRLB-List Hello Option,
   including the case where it was received in the previous hello, but
   not in the current hello. All routers MUST ignore the DRLB-List
   Hello Option if it is received from a PIM router which is not the DR.
   The option MUST only be processed by routers that are announcing the
   DRLB-Cap Option, and only if the Hash Algorithm announced by the DR
   is the same as the local announcement.
   All GDR Candidates MUST use
   the Hash Masks advertised in the Option, even if they differ from
   those the candidate was configured with. The DR MUST also process
   its own DRLB-List Hello Option.

   A router stores the latest option contents that were announced, if
   any, and deletes the previous contents. The router MUST also compare
   the new contents with any previous contents, and if there are any
   changes, continue processing as below. Note that if the option does
   not pass the above checks, the below processing MUST be done as if
   the option was not announced.

   If the contents of the DRLB-List Option (the masks or the candidate
   list) differ from the previously saved copy, if the option is
   received for the first time, or if it is no longer being received or
   accepted, the option MUST be processed as below.

   1.  If the local router is included in the GDR Candidate Address(es)
       field (it will look for its own address, or its Router ID if it
       announces a non-zero Router ID), for each of the groups, or
       source and group pairs if the group is in SSM mode, with local
       receiver interest, the router MUST run the Hash Algorithm to
       determine which of them it is the GDR for.

          If there is no change in the GDR status, then no further
          action is required.

          If the router becomes the new GDR, then a multicast forwarding
          tree MUST be built [RFC7761].

          If the router is no longer the GDR, then it uses an Assert as
          explained in [Section 5.7].

   2.  If the local router is not included in the GDR Candidate
       Address(es) field, or if the DRLB-List Hello Option is no longer
       included in the DR's Hello, or if the DR's Neighbor Liveness
       Timer expires [RFC7761], for each of the groups, or source and
       group pairs if the group is in SSM mode, with local receiver
       interest, for which the router is the GDR, it uses an Assert as
       explained in [Section 5.7].

5.7.  PIM Assert Modification

   GDR changes may occur due to configuration change, due to GDR
   candidates going down, and also due to new routers coming up and
   becoming GDR candidates. This may occur while flows are being
   forwarded. If the GDR for an active flow changes, there is likely to
   be some disruption, such as packet loss or duplicates. By using
   asserts, packet loss is minimized, while allowing a small amount of
   duplicates.

   When a router stops acting as the GDR for a group, or source and
   group pair if SSM, it MUST set the Assert metric preference to
   maximum (0x7fffffff) and the Assert metric to one less than maximum
   (0xfffffffe). That is, whenever it sends or receives an Assert for
   the group, it must use these values as the metric preference and
   metric rather than the values provided by the unicast routing
   protocol.

   The rest of this section is just for illustration purposes and not
   part of the protocol definition.

   To illustrate the behavior when there is a GDR change, consider the
   following scenario where there are two flows G1 and G2. R1 is the
   GDR for G1, and R2 is the GDR for G2. When R3 comes up, it is
   possible that R3 becomes GDR for both G1 and G2, hence R3 starts to
   build the forwarding tree for G1 and G2. If R1 and R2 stop
   forwarding before R3 completes the process, packet loss might occur.
   On the other hand, if R1 and R2 continue forwarding while R3 is
   building the forwarding trees, duplicates might occur.

   When the role of GDR changes as above, instead of immediately
   stopping forwarding, R1 and R2 continue forwarding to G1 and G2
   respectively, while, at the same time, R3 builds forwarding trees for
   G1 and G2. This will lead to PIM Asserts.

   For G1, using the functionality described in this document, R1 and R3
   determine the new GDR, which is R3.
   With the modified Assert behavior, R1 sets its Assert metric to the
   near maximum value discussed above.  That will make R3, which has a
   normal metric in its Assert, the Assert winner.

5.8.  Backward Compatibility

   In the case of a hybrid Ethernet shared LAN (where some PIM routers
   support the functionality defined in this document, and some do not):

   o  If the DR does not support the new functionality, then there will
      be no load-balancing.

   o  If non-DR routers do not support the new functionality, they will
      not be considered as Candidate GDRs and they will not take part in
      load-balancing.  Load-balancing may still happen on the link.

6.  Operational Considerations

   An administrator needs to consider what the total bandwidth
   requirements are and find a set of routers that together have enough
   available capacity, while making sure that each of the routers can
   handle its part, assuming that the traffic is distributed roughly
   equally among the routers.  Ideally, one should also have enough
   bandwidth to handle the case where at least one router fails.  All
   routers should have reachability to the sources, and RPs if
   applicable, that is not via the LAN.

   Care must be taken when choosing what hash masks to configure.  One
   would typically configure the same masks on all the routers, so that
   they are the same, regardless of which router is elected as DR.  The
   default masks are likely suitable for most deployments.  The RP Hash
   Mask must be configured (the default is no bits set) if one wishes to
   hash based on the RP address rather than the group address for ASM.
   The default masks will use the entire group addresses, and source
   addresses if SSM, as part of the hash.  An administrator may set
   other masks that mask out part of the addresses to ensure that
   certain flows always get hashed to the same router.
   How this is achieved depends on how the group addresses are
   allocated.

   Only the routers announcing the same Hash Algorithm as the DR would
   be considered as GDR candidates.  Network administrators need to make
   sure that the desired set of routers announce the same algorithm.
   Migration between different algorithms is not considered in this
   document.

7.  IANA Considerations

   IANA has temporarily assigned type 34 for the PIM DR Load Balancing
   Capability (DRLB-Cap) Hello Option, and type 35 for the PIM DR Load
   Balancing List (DRLB-List) Hello Option in the PIM-Hello Options
   registry.  IANA is requested to make these assignments permanent when
   this document is published as an RFC.  Note that the option names
   have changed slightly since the temporary assignments were made.
   Also, the length of option 34 is always 4; the registry currently
   says it is variable.

   This document requests IANA to create a registry called "Designated
   Router Load Balancing Hash Algorithms" in the "Protocol Independent
   Multicast (PIM)" branch of the registry tree.  The registry lists
   Hash Algorithms for use by PIM Designated Router Load Balancing.

7.1.  Initial registry

   The initial content of the registry should be as follows.

   Type   Name                                     Reference
   ------ ---------------------------------------- --------------------
     0    Modulo                                   This document
   1-255  Unassigned

7.2.  Assignment of new Hash Algorithms

   Assignment of new Hash Algorithms is done according to the "IETF
   Review" model, see [RFC8126].

8.  Security Considerations

   Security of the new DR Load Balancing PIM Hello Options is only
   guaranteed by the security of PIM Hello messages, so the security
   considerations for PIM Hello messages as described in PIM-SM
   [RFC7761] apply here.

   If the DR is subverted it could omit or add certain GDRs or announce
   an unsupported algorithm.
   If another router is subverted, it could be made DR and cause similar
   issues.  While these issues are specific to this specification, they
   are not that different from existing attacks such as subverting a DR
   and lowering the DR priority, causing a different router to become
   the DR.

   If for any reason, the DR includes a GDR in the announced list which
   announces a different algorithm from what the DR announces, the GDR
   is required to ignore the announcement, and there will be no router
   acting as the DR for the flows that hash to that GDR.

   If a GDR is subverted, it could potentially be made to stop
   forwarding all the traffic it is expected to forward.  This is
   similar to what can happen today if a DR is subverted.

   An administrator may be able to achieve the desired load-balancing of
   known flows, but an attacker may send a single high rate flow which
   is served by a single GDR, or send multiple flows that are expected
   to be hashed to the same GDR.

9.  Acknowledgement

   The authors would like to thank Steve Simlo and Taki Millonis for
   helping with the original idea; Alia Atlas, Bill Atwood, Joe Clarke,
   Alissa Cooper, Jake Holland, Bharat Joshi, Anish Kachinthaya, Anvitha
   Kachinthaya, Benjamin Kaduk, Mirja Kuhlewind, Barry Leiba, Ben Niven-
   Jenkins, Alvaro Retana, Adam Roach, Michael Scharf, Eric Vyncke and
   Carl Wallace for reviews and comments; and Toerless Eckert and
   Rishabh Parekh for helpful conversation on the document.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6395]  Gulrajani, S. and S. Venaas, "An Interface Identifier (ID)
              Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395,
              October 2011.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
              Multicast - Sparse Mode (PIM-SM): Protocol Specification
              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
              2016.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
              Thyagarajan, "Internet Group Management Protocol, Version
              3", RFC 3376, DOI 10.17487/RFC3376, October 2002.

   [RFC3810]  Vida, R., Ed. and L. Costa, Ed., "Multicast Listener
              Discovery Version 2 (MLDv2) for IPv6", RFC 3810,
              DOI 10.17487/RFC3810, June 2004.

   [RFC4541]  Christensen, M., Kimball, K., and F. Solensky,
              "Considerations for Internet Group Management Protocol
              (IGMP) and Multicast Listener Discovery (MLD) Snooping
              Switches", RFC 4541, DOI 10.17487/RFC4541, May 2006.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, DOI 10.17487/RFC4607, August 2006.

Authors' Addresses

   Yiqun Cai
   Alibaba Group

   Email: yiqun.cai@alibaba-inc.com


   Heidi Ou
   Alibaba Group

   Email: heidi.ou@alibaba-inc.com


   Sri Vallepalli
   Cisco Systems, Inc.
   3625 Cisco Way
   San Jose CA 95134
   USA

   Email: svallepa@cisco.com


   Mankamana Mishra
   Cisco Systems, Inc.
   821 Alder Drive,
   Milpitas CA 95035
   USA

   Email: mankamis@cisco.com


   Stig Venaas
   Cisco Systems, Inc.
   Tasman Drive
   San Jose CA 95134
   USA

   Email: stig@cisco.com


   Andy Green
   British Telecom
   Adastral Park
   Ipswich IP5 2RE
   United Kingdom

   Email: andy.da.green@bt.com