idnits 2.17.00 (12 Aug 2021) /tmp/idnits34450/draft-chen-lsvr-flood-reduction-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (13 November 2021) is 182 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC4721' is defined on line 800, but no explicit reference was found in the text == Unused Reference: 'RFC4760' is defined on line 805, but no explicit reference was found in the text == Unused Reference: 'RFC7938' is defined on line 810, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-lsr-dynamic-flooding' is defined on line 817, but no explicit reference was found in the text == Unused Reference: 'I-D.ietf-lsr-flooding-topo-min-degree' is defined on line 825, but no explicit reference was found in the text == Unused Reference: 'RFC8670' is defined on line 833, but no explicit reference was found in the text == Outdated reference: A later version (-16) exists of draft-ietf-lsvr-bgp-spf-15 ** Downref: Normative reference to an Informational RFC: RFC 7938 == Outdated reference: A later version (-10) exists of draft-ietf-lsr-dynamic-flooding-09 == Outdated reference: A later version (-04) exists of draft-ietf-lsr-flooding-topo-min-degree-02 Summary: 1 error (**), 0 flaws (~~), 10 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group H. Chen 3 Internet-Draft Futurewei 4 Intended status: Standards Track G. Mishra 5 Expires: 17 May 2022 Verizon Inc. 6 A. Wang 7 China Telecom 8 Y. Liu 9 China Mobile 10 H. Wang 11 Huawei 12 Y. Fan 13 Casa Systems 14 13 November 2021 16 BGP-SPF Flooding Reduction 17 draft-chen-lsvr-flood-reduction-01 19 Abstract 21 This document describes extensions to Border Gateway Protocol (BGP) 22 for flooding the link states on a topology that is a subgraph of the 23 complete topology of a BGP-SPF domain, so that the amount of flooding 24 traffic in the domain is greatly reduced. This would reduce 25 convergence time with a more stable and optimized routing 26 environment. 28 Requirements Language 30 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 31 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 32 document are to be interpreted as described in RFC 2119 [RFC2119]. 34 Status of This Memo 36 This Internet-Draft is submitted in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at https://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on 17 May 2022. 50 Copyright Notice 52 Copyright (c) 2021 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 57 license-info) in effect on the date of publication of this document. 58 Please review these documents carefully, as they describe your rights 59 and restrictions with respect to this document. Code Components 60 extracted from this document must include Simplified BSD License text 61 as described in Section 4.e of the Trust Legal Provisions and are 62 provided without warranty as described in the Simplified BSD License. 64 Table of Contents 66 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 67 2. Terminologies . . . . . . . . . . . . . . . . . . . . . . . . 3 68 3. Overview of BGP-SPF Link State Flooding . . . . . . . . . . . 3 69 3.1. Flooding in RR Model . . . . . . . . . . . . . . . . . . 4 70 3.2. Flooding in Node Connections Model . . . . . . . . . . . 5 71 3.3. Flooding in Directly-Connected Nodes Model . . . . . . . 6 72 4. Revised Flooding Procedures . . . . . . . . . . . . . . . . . 6 73 4.1. Revised Flooding Procedure for RR Model . . . . . . . . . 6 74 4.2. Revised Flooding Procedure for Node Connections Model . . 7 75 5. BGP Extensions for Flooding Reduction . . . . . . . . . . . . 9 76 5.1. Extensions for RR Model . . . . . . . . . . . . . . . . . 9 77 5.2. Extensions for Node Connections Model . . . . . . . . . . 11 78 5.2.1. New TLVs . . . . . . . . . . . . . . . . . . . . . . 11 79 5.2.2. Flooding Topology Distribution in Centralized Mode . 15 80 5.2.3. An Algorithm for Distributed Mode . . . . . . . . . . 16 81 6. Security Considerations . . . . . . . . . . . . . . . . . . . 17 82 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18 83 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 84 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 18 85 9.1. Normative References . . . . . . . . . . . . . . . . . . 18 86 9.2. Informative References . . . . . . . . . . . . . . . . . 18 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 19 89 1. Introduction 91 For some networks such as dense Data Center (DC) networks with BGP- 92 SPF, the existing Link State (LS) flooding mechanism defined in 93 [I-D.ietf-lsvr-bgp-spf] for a BGP-SPF domain may not be efficient and 94 may have some issues. The extra LS flooding consumes network 95 bandwidth. Processing the extra LS flooding, including receiving, 96 buffering and decoding the extra LSs, wastes memory space and 97 processor time. This may cause scalability issues and affect the 98 network convergence negatively. 100 This document describes extensions to Border Gateway Protocol (BGP) 101 for flooding the link states on a topology that is a subgraph of the 102 complete topology of a BGP-SPF domain, so that the amount of flooding 103 traffic in the domain is greatly reduced. 105 2. Terminologies 107 The following terminologies are used in this document. 109 BGP: Border Gateway Protocol 111 LS: Link State 113 SPF: Shortest Path First 115 RR: Route Reflector 117 3. Overview of BGP-SPF Link State Flooding 119 [I-D.ietf-lsvr-bgp-spf] defines three BGP peering models: 121 * BGP Peering in Route-Reflector or Controller Topology (RR or 122 Sparse model for short). 124 * BGP Single-Hop Peering on Network Node Connections (Node 125 Connections model for short), and 127 * BGP Peering Between Directly-Connected Nodes (Directly-Connected 128 Nodes model for short). 130 This section briefs the BGP-SPF Link State Flooding in each of these 131 models. 133 3.1. Flooding in RR Model 135 In RR model, BGP-SPF speakers/nodes peer solely with one or more 136 Route Reflectors (RRs) or controllers over eBGP sessions. A BGP-SPF 137 speaker sends/advertises its BGP-LS-SPF Link NLRI in a BGP update 138 message to the RRs or controllers that the speaker peers with when it 139 discovers that its corresponding link is up. After receiving the 140 Link NLRI, each of the RRs or controllers sends the NLRI in a BGP 141 update message to the other BGP-SPF speakers that peer with the RRs 142 or controllers. 144 For example, Figure 1 shows a BGP-SPF domain, which contains two RRs 145 RR1 and RR2, and three network nodes A, B and C. RR1 peers with all 146 three nodes A, B and C in the network. RR2 also peers with all three 147 nodes A, B and C in the network. There is a link between A and B, a 148 link between A and C, and a link between B and C. 150 +-------+ +-------+ 151 | RR1 |------| RR2 | 152 +-------+ +-------+ 153 / \ \ ____/ / \ 154 / \___\/ / \ 155 / /\ \___ / \ 156 / ______/ \ \/ \ 157 / /--> \ /\__________ \ 158 / / ( B ) \ \ 159 / / ___/ \___ \ \ 160 / / ____/ \____ \ \ 161 ^ / / ____/ \____ \ \ 162 | / / / \ \ \ 163 / / / / \ \ \ 164 ( A )-------------------------------------( C ) 166 Figure 1: BGP-SPF Domain with two RRs 168 Each of the nodes A, B and C in the network sends/advertises its link 169 NLRIs in BGP update messages to both RR1 and RR2. After receiving a 170 link NLRI in a BGP update message from a node (e.g., node A), each of 171 RR1 and RR2 sends the NLRI in a BGP update message to the other nodes 172 (e.g., nodes B and C). Each of the other nodes receives two copies 173 of the same NLRI, one from RR1 and the other from RR2. One copy is 174 enough, the other redundant copy should be reduced. 176 3.2. Flooding in Node Connections Model 178 In Node Connections model, EBGP single-hop sessions are established 179 over direct point-to-point links interconnecting the nodes in the 180 BGP-SPF routing domain. Once the session has been established and 181 the BGP-LS-SPF AFI/SAFI capability has been exchanged for the 182 corresponding session, then the link is considered up from a BGP-SPF 183 perspective and the corresponding BGP-LS-SPF Link NLRI is advertised 184 to all the nodes in the domain through all the BGP sessions over the 185 links. If the session goes down, the corresponding Link NLRI will be 186 withdrawn. The withdrawal is done through advertising a BGP update 187 containing the NLRI in MP_UNREACH_NLRI to all the nodes in the domain 188 using all BGP sessions over the links. 190 For example, Figure 2 shows a BGP-SPF domain, which contains four 191 nodes A, B, C and D. These four nodes are connected by six links. 192 There are two parallel links between A and B, a link between A and C, 193 a link between A and D, a link between B and C and a link between C 194 and D. 196 --> 197 _____________________ 198 ( A )-------------------( B ) 199 | |\ --> | | 200 v | \_____ | v 201 | --> \_______ | 202 | \_____ | 203 | \ | ^ 204 | \| | 205 ( D )-------------------( C ) 206 --> <-- 208 Figure 2: BGP-SPF Domain with parallel links 210 Suppose that the BGP sessions over all the links except for the 211 session over the link between A and D have been established and the 212 BGP-LS-SPF AFI/SAFI capability has been exchanged for the 213 corresponding sessions. When the BGP session over the link between A 214 and D is established and the BGP-LS-SPF AFI/SAFI capability is 215 exchanged for the corresponding session, node A considers that the 216 link from A to D is up and sends the BGP-LS-SPF Link NLRI for the 217 link through its four BGP sessions (i.e., the session between A and B 218 over the first parallel link between A and B, the session between A 219 and B over the second parallel link between A and B, the session 220 between A and C over the link between A and C, and the session 221 between A and D over the link between A and D) to nodes B, C and D. 222 After receiving the NLRI from node A, each of the nodes B, C and D 223 sends the NLRI to the other nodes that have BGP sessions with the 224 node. Node B sends the NLRI to node C. Node C sends the NLRI to 225 nodes B and D. Node D sends the NLRI to node C. 227 Similarly, when the BGP session over the link between A and D is 228 established and the BGP-LS-SPF AFI/SAFI capability is exchanged for 229 the corresponding session, node D considers that the link from D to A 230 is up and sends the BGP-LS-SPF Link NLRI for the link through its two 231 BGP sessions (i.e., the session between D and C over the link between 232 D and C, and the session between D and A over the link between D and 233 A) to nodes C and A. After receiving the NLRI from node D, each of 234 the nodes A and C sends the NLRI to the other nodes that have BGP 235 sessions with the node. Node C sends the NLRI to nodes A and B. 236 Node A sends the NLRI to nodes B and C through two parallel BGP 237 sessions to B and the BGP session to C. 239 3.3. Flooding in Directly-Connected Nodes Model 241 In Directly-Connected Nodes model, BGP-SPF speakers peer with all 242 directly-connected nodes but the sessions may be between loopback 243 addresses. Consequently, there will be a single BGP session even if 244 there are multiple direct connections between BGP-SPF speakers. BGP- 245 LS-SPF Link NLRI is advertised as long as a BGP session has been 246 established, the BGP-LS-SPF AFI/SAFI capability has been exchanged. 247 Since there are BGP sessions between every directly-connected nodes 248 in the BGP-SPF routing domain, there is only a reduction in BGP 249 sessions when there are parallel links between nodes comparing to 250 node connections model. 252 4. Revised Flooding Procedures 254 This section describes the revised flooding procedures to support 255 flooding reduction for different models, including RR Model and Node 256 Connections Model. These procedures are backward compatible. In a 257 network with some nodes (including RRs) not supporting flooding 258 reduction, a link NLRI originated from any node will be distributed 259 to every node in the network. 261 4.1. Revised Flooding Procedure for RR Model 263 In RR model, the revised flooding procedure is as follows: 265 * Every BGP-SPF speaker/node sends its BGP-LS-SPF Link NLRI to the 266 same one or more of the RRs or controllers that the speaker peers 267 with when it discovers that its corresponding link is up. 269 * After receiving the Link NLRI, the RR or controller sends the NLRI 270 to the other BGP-SPF speakers that peer with the RR or controller. 272 For example, for the BGP-SPF domain in Figure 1, using the revised 273 flooding procedure, speaker/Node A sends its Link NLRI for link A to 274 B to RR1 when A discovers that link A to B is up. Node A does not 275 send the NLRI to RR2. After receiving the Link NLRI for link A to B 276 from speaker/node A, RR1 sends the NLRI to the other nodes B and C. 277 Each of the other nodes receives only one copy of the same NLRI, 278 which is from RR1. There is no redundant copy of the same NLRI. 279 Comparing to the normal flooding in RR model as illustrated in 280 Figure 1, the revised flooding procedure reduces the amount of link 281 states flooding by half. 283 4.2. Revised Flooding Procedure for Node Connections Model 285 In Node Connections model, the revised flooding procedure is as 286 follows: 288 * A BGP-SPF speaker/node has a flooding topology of the BGP-SPF 289 domain. In an option, the flooding topology is computed in a 290 distributed mode, where every BGP-SPF speaker computes a flooding 291 topology for the domain using a same algorithm. In another 292 option, the flooding topology is computed in a centralized mode, 293 where one BGP-SPF speaker elected as a leader computes a flooding 294 topology for the domain and advertises the flooding topology to 295 every BGP-SPF speaker in the domain. 297 * A BGP-SPF speaker/node sends its link NLRI in a BGP update message 298 for its link up or down to its peers that are directly connected 299 on the flooding topology, and sends its link NLRI in a BGP update 300 message for its link down to all its peers. When receiving the 301 NLRI in a new BGP update message for a link up or down from a 302 peer, the speaker sends the NLRI in a BGP update message to its 303 other peers that are directly connected on the flooding topology. 305 * When a BGP-SPF session is down, the BGP-SPF speaker/node that was 306 connected to the session will not withdraw the link NLRIs received 307 from the session right away. It keeps the NLRIs for some time. 309 Given a real network topology (RT), a flooding topology (FT) of the 310 RT is a sub network topology of the RT and connects all the nodes in 311 the RT. 313 For example, Figure 3 shows a flooding topology of the real topology 314 in Figure 2. 316 ( A )-------------------( B ) 317 | | 318 | | 319 | | 320 | | 321 | | 322 | | 323 ( D )-------------------( C ) 325 Figure 3: A Flooding Topology 327 The flooding topology in Figure 3 is a sub network topology of the RT 328 in Figure 2 and connects all the nodes (i.e., nodes A, B, C and D) in 329 the RT in Figure 2. 331 Figure 4 shows a reduced flooding flow of a link NLRI in a BGP update 332 message for a link up or down in the BGP-SPF domain, which is the 333 same as the one in Figure 2. 335 --> 336 _____________________ 337 ( A )-------------------( B ) 338 | |\ | | 339 v | \_____ | v 340 | \_______ | 341 | \_____ | 342 | \ | 343 | \| 344 ( D )-------------------( C ) 345 --> 347 Figure 4: A Reduced Link State Flooding Flow 349 Speaker/Node A sends the NLRI in a BGP update message for its link to 350 its peers B and D. Nodes B and D are peers of node A and are 351 directly connected to A on the flooding topology (FT). Node A does 352 not send the NLRI to its peer C since C is not directly connected to 353 A on the FT. 355 After receiving the NLRI in the message from A, node B sends the NLRI 356 in a BGP update message to B's other peer C (which is directly 357 connected to B on the FT). After receiving the NLRI in a BGP update 358 message from A, node D sends the NLRI in a BGP update message to D's 359 other peer C (which is directly connected to D on the FT). 361 The number of NLRIs in messages flooded in Figure 4 is much less than 362 that in Figure 2. The performance of network is improved using the 363 revised flooding procedure. 365 5. BGP Extensions for Flooding Reduction 367 This section specifies BGP extensions for flooding reduction in two 368 models: RR model and Node Connections model. The extensions for 369 Directly-Connected Node model are included in the extensions for Node 370 Connections model. 372 5.1. Extensions for RR Model 374 A single RR for a BGP-SPF domain is elected as a leader RR of the 375 domain. The leader RR is the RR with the highest priority to become 376 a leader in the domain. If there are more than one RRs having the 377 same highest priority, the RR with the highest Node ID and the 378 highest priority is the leader RR in the domain. In a deployment, 379 only every RR advertises its priority for becoming a leader using a 380 Leader Priority TLV defined below. 382 Two new TLVs are defined for flooding reduction in RR model. 384 * Leader Priority TLV: A node uses it to advertise its priority for 385 becoming a leader. 387 * Node Flood TLV: A RR or controller uses it to tell every node the 388 flooding behavior the node needs to follow. 390 The format of Leader Priority TLV is illustrated in Figure 5. 392 0 1 2 3 393 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 394 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 395 | Type = TBD1 | Length = 4 | 396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 397 | Reserved | Priority | 398 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 400 Figure 5: Leader Priority TLV 402 Type: It is to be assigned by IANA. 404 Length: 4. 406 Reserved: MUST be set to zero in transmission and should be ignored 407 on reception. 409 Priority: A unsigned integer from 0 to 255 in one octet indicating 410 priority to become a leader. 412 The format of Node Flood TLV is illustrated in Figure 6. 414 0 1 2 3 415 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 416 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 417 | Type = TBD2 | Length = 4 | 418 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 419 | Reserved | Flood-behavior| 420 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 422 Figure 6: Node Flood TLV 424 Type: It is to be assigned by IANA. 426 Length: 4. 428 Reserved: MUST be set to zero in transmission and should be ignored 429 on reception. 431 Flood-behavior: The following flooding behavior are defined. 433 0 - Reserved. 434 1 - send link states to the RR with the minimum ID 435 2 - send link states to the RR with the maximum ID 436 3 - send link states to 2 RRs with smaller IDs 437 4 - send link states to 2 RRs with larger IDs 438 6-127 - Standardized flooding behaviors for RR Model 439 128-254 - Private flooding behaviors for RR Model. 441 In a deployment, the flooding behavior for every node is configured 442 on a RR or controller such as the leader RR and the RR advertises the 443 behavior to the other RRs and every node in the network though using 444 a Node Flood TLV. 446 For example, if we want every node in the network to send its link 447 states to only one RR, we configure this behavior on a RR and the RR 448 advertises the behavior to every node using a Node Flood TLV with 449 Flood-behavior set to one, which tells every node to send its link 450 states to the RR with the minimum ID. If we want every node in the 451 network to send its link states to two RRs for redundancy, we 452 configure this behavior on a RR and the RR advertises the behavior to 453 every node using a Node Flood TLV with Flood-behavior set to 3, which 454 tells every node to send its link states to the two RRs with smaller 455 IDs (i.e., the RR with the minimum ID and the RR with the second 456 minimum ID). 458 5.2. Extensions for Node Connections Model 460 There are two modes for the flooding topology computation: 461 centralized mode and distributed mode. In a centralized mode, one 462 BGP-SPF node is elected as a leader. The leader computes a flooding 463 topology for the BGP-SPF domain and advertises the flooding topology 464 to every BGP-SPF node in the domain. In a distributed mode, every 465 BGP-SPF node computes a flooding topology for the BGP-SPF domain 466 using a same algorithm. There is not any flooding topology 467 distribution. 469 This section defines the new TLVs for the two modes, describes the 470 flooding topology distribution in centralized mode and an algorithm 471 that can be used by every node to compute its flooding topology in 472 distributed mode. 474 5.2.1. New TLVs 476 Five new TLVs are defined for flooding reduction in Node Connections 477 model. 479 * Node Algorithm TLV: A leader uses this TLV to tell every node the 480 algorithm to be used to compute a flooding topology. 482 * Algorithms Support TLV: A node uses this TLV to indicate the 483 algorithms that it supports for distributed mode. 485 * Node IDs TLV: A leader uses this TLV to indicate the mapping from 486 nodes to their indices for centralized mode. 488 * Paths TLV: A leader uses this TLV to advertise a part of flooding 489 topology for centralized mode. 491 * Connection Used for Flooding TLV: A node uses this TLV to indicate 492 that a connection/link is a part of the flooding topology and used 493 for flooding. 495 5.2.1.1. Node Algorithm TLV 497 The format of Node Algorithm TLV is illustrated in Figure 7. 499 0 1 2 3 500 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 501 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 502 | Type = TBD3 | Length = 4 | 503 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 504 | Reserved | Algorithm | 505 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 507 Figure 7: Node Algorithm TLV 509 Type: It is to be assigned by IANA. 511 Length: 4. 513 Reserved: MUST be set to zero in transmission and should be ignored 514 on reception. 516 Algorithm: 518 0 - The leader computes a flooding topology using its own 519 algorithm and advertises the flooding topology to every 520 node. 521 1-127 - Every node computes its flooding topology using this 522 standardized distributed algorithm. 523 128-254 - Private distributed algorithms. 525 A node such as the leader node can use this TLV to tell every node in 526 the domain to use the flooding topology from the leader for flooding 527 the link states through advertising the TLV with the Algorithm field 528 set to zero, or to tell every node to compute its own flooding 529 topology using the algorithm given by the Algorithm field in the TLV 530 containing an identifier of an algorithm when the Algorithm field is 531 not zero. 533 5.2.1.2. Algorithms Support TLV 535 The format of Algorithms Support TLV is illustrated in Figure 8. 537 0 1 2 3 538 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 539 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 540 | Type = TBD4 | Length (variable) | 541 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 542 | Algorithm | Algorithm | . . . 543 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 545 Figure 8: Algorithms Support TLV 547 Type: It is to be assigned by IANA. 549 Length: The number of Algorithms in the TLV. 551 Algorithm: A numeric identifier in the range 0-255 indicating the 552 algorithm that can be used to compute the flooding topology. 554 5.2.1.3. Node IDs TLV 556 The format of Node IDs TLV is illustrated in Figure 9. 558 0 1 2 3 559 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 560 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 561 | Type = TBD5 | Length (variable) | 562 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 563 | Reserved |L| Starting Index | 564 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 565 | Node ID | 566 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 567 ~ . . . . . . ~ 568 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 569 | Node ID | 570 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 572 Figure 9: Node IDs TLV 574 Type: It is to be assigned by IANA. 576 Length: 4 * (number of Node IDs + 1). 578 Reserved: MUST be set to zero in transmission and should be ignored 579 on reception. 581 L: This bit is set to one if the index of the last node ID in this 582 TLV is equal to the last index in the full list of node IDs for 583 the BFP-SPF domain. 585 Starting Index: The index of the first node ID in this TLV is 586 Starting Index; the index of the second node ID in this TLV is 587 Starting Index + 1; the index of the third node ID in this TLV 588 is Starting Index + 2; and so on. 590 Node ID: The BGP identifier of a node in the BGP-SPF domain. 592 5.2.1.4. Paths TLV 594 The format of Paths TLV is illustrated in Figure 10. A leader uses 595 this TLV to advertise a part of flooding topology for centralized 596 mode. A path may be described as a sequence of indices: (Index 1, 597 Index 2, Index 3, ...), denoting a connection between the node with 598 index 1 and the node with index 2, a connection between the node with 599 index 2 and the node with index 3, and so on. A single link/ 600 connection is a simple case of a path that only connects two nodes. 601 A single link path may be encoded in a paths TLV of 8 bytes with two 602 indices. 604 0 1 2 3 605 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 606 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 607 | Type = TBD6 | Length (variable) | 608 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 609 | Index 1 | Index 2 | 610 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 611 ~ . . . . . . ~ 612 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 614 Figure 10: Paths TLV 616 Type: It is to be assigned by IANA. 618 Length: 2 * (number of indices in the path) when the TLV contains 619 the indices for one path. 621 Index 1: The index of the first node in the path. 623 Index 2: The index of the second (next) node in the path. 625 Multiple such as N paths may be encoded in one paths TLV. Each of 626 the multiple paths is represented as a sequence of indices of the 627 nodes on the path, and two paths (i.e., two sequences of indices for 628 the two paths) are separated by a special index value such as 0xFFFF. 629 In this case, there are (N - 1) special indices as separators to 630 separate N paths, and the Length field has a value of 2 * (number of 631 indices in N paths + N - 1). 633 When there are a number such as N of single link paths, using one 634 paths TLV to represent them is more efficient than using N paths TLVs 635 to represent them (i.e., each paths TLV represents a single link 636 path). Using one TLV consumes 4 + 2 * (2*N + N - 1) = 6*N + 2 bytes. 637 Using N TLVs occupies N * (4 + 4) = 8*N bytes. The space used by the 638 former is about three quarters of the space used by the latter for a 639 big N such as 30. 641 5.2.1.5. Connection Used for Flooding TLV 643 The format of Connection Used for Flooding TLV is illustrated in 644 Figure 11. 646 0 1 2 3 647 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 648 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 649 | Type = TBD7 | Length = 8 | 650 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 651 | Local Node ID | 652 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 653 | Remote Node ID | 654 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 656 Figure 11: Connection Used for Flooding TLV 658 Type: It is to be assigned by IANA. 660 Length: 8. 662 Local Node ID: The BGP ID of the local node of the session over the 663 connection on the flooding topology which is used for flooding 664 link states. 666 Remote Node ID: The BGP ID of the remote node of the session over 667 the connection on the flooding topology which is used for 668 flooding link states. 670 5.2.2. Flooding Topology Distribution in Centralized Mode 672 In centralized mode, the leader computes a flooding topology for the 673 domain whenever there is a change in the real network topology of the 674 domain and advertises the flooding topology to every node in the 675 domain. 677 After the current leader has failed, a new leader is elected. The 678 new leader computes a flooding topology for the domain and advertises 679 the flooding topology to every node in the domain. 681 For a brand new flooding topology of the domain computed, the leader 682 advertises the whole flooding topology to every node in the domain. 683 The leader advertises the mappings between all the node IDs and their 684 indices to every node in the domain using a number of node IDs TLVs 685 first. These node IDs TLVs contain the IDs of all the nodes in the 686 domain and indicates the index corresponding to each of the node IDs 687 and are advertised under MP_REACH_NLRI in BGP update messages. And 688 then the leader advertises the connections/links on the flooding 689 topology to every node in the domain using a number of paths TLVs. 690 These paths TLVs contain all the connections/links on the flooding 691 topology and are advertised under MP_REACH_NLRI in BGP update 692 messages. 694 After advertising a flooding topology to every node in the domain, 695 which is called the current flooding topology, for a new flooding 696 topology computed for the updated real network topology of the 697 domain, the leader advertises only the changes in the new flooding 698 topology comparing to the current flooding topology to every node in 699 the domain. The leader advertises the changes in the mappings 700 between all the node IDs and their indices to every node in the 701 domain using node IDs TLVs first, and then advertises the changes in 702 the flooding topology to every node in the domain using paths TLVs. 704 For the new nodes added into the domain, the leader advertises the 705 mappings between the IDs of the new nodes and their indices using a 706 node IDs TLV under MP_REACH_NLRI in a BGP update message to add the 707 mappings. For the dead nodes removed from the domain, the leader 708 advertises the mappings between the IDs of the dead nodes and their 709 indices using a node IDs TLV under MP_UNREACH_NLRI in a BGP update 710 message to withdraw the mappings. 712 For the new connections/links added into the current flooding 713 topology, the leader advertises the new connections/links using a 714 paths TLV under MP_REACH_NLRI in a BGP update message to add the new 715 connections/inks to the current flooding topology. For the old 716 connections/links removed from the current flooding topology, the 717 leader advertises the old connections/links using a paths TLV under 718 MP_UNREACH_NLRI in a BGP update message to withdraw the old 719 connections/links from the current flooding topology. 721 5.2.3. An Algorithm for Distributed Mode 723 This section specifies an algorithm that can be used by every node to 724 compute its flooding topology. 726 The algorithm for computing a flooding topology of a BGP-SPF domain 727 (real topology) is described as follows. 729 * Select a node R0 with the smallest node ID and without the status 730 indicating that the node does not support transit; 732 * Build a tree using R0 as root of the tree (details below); 734 * And then connect a leaf to the tree to have a flooding topology 735 (details follow). 737 The algorithm starts from 739 * a variable MaxD with an initial value 3, 741 * an initial flooding topology FT = {(R0, D=0, PHs={})} with node R0 742 as root, where R0's Degree D = 0, Previous Hops PHs = { }; 744 * an initial candidate queue Cq = {(R1,D=0, PHs={R0}), (R2,D=0, 745 PHs={R0}), ..., (Rm,D=0, PHs={R0})}, where each of nodes R1 to Rm 746 is connected to R0, its Degree D = 0 and Previous Hops PHs ={R0}, 747 R1 to Rm are in increasing order by their IDs. 749 1. Find and remove the first element with node A from Cq that is not 750 on FT and one PH's D in PHs < MaxD, and add the element with A 751 into FT; Set A's D to one, increase A's PH's D by one. If no 752 element in Cq satisfies the conditions, algorithm is restarted 753 with ++MaxD, the initial FT and Cq. 755 2. If all the nodes are on the FT, then goto step 4; 757 3. Suppose that node Xi (i = 1, 2,..., n) is connected to node A and 758 not on FT, and X1, X2,..., Xn are in increasing order by their 759 IDs (i.e., X1's ID < X2's ID < ... < Xn's ID). If they are not 760 ordered, then make them in the order. If Xi is not in Cq, then 761 add it into the end of Cq with D = 0 and PHs = {A}; otherwise 762 (i.e., Xi is in Cq), add A into the end of Xi's PHs; Goto step 1. 764 4. For each node B on FT whose D is one (from minimum to maximum 765 node ID), find a link L attached to B such that L's remote node R 766 can transit traffic and has minimum D and ID (if there is no node 767 R which can transit traffic, then find a link L to node R whose D 768 and ID are minimum), add link L between B and R into FT and 769 increase B's D and R's D by one. Return FT. 771 6. Security Considerations 773 TBD 775 7. Acknowledgements 777 The authors of this document would like to thank Donald E. Eastlake, 778 Acee Lindem and Keyur Patel for the comments. 780 8. IANA Considerations 782 TBD 784 9. References 786 9.1. Normative References 788 [I-D.ietf-lsvr-bgp-spf] 789 Patel, K., Lindem, A., Zandi, S., and W. Henderickx, "BGP 790 Link-State Shortest Path First (SPF) Routing", Work in 791 Progress, Internet-Draft, draft-ietf-lsvr-bgp-spf-15, 1 792 July 2021, . 795 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 796 Requirement Levels", BCP 14, RFC 2119, 797 DOI 10.17487/RFC2119, March 1997, 798 . 800 [RFC4721] Perkins, C., Calhoun, P., and J. Bharatia, "Mobile IPv4 801 Challenge/Response Extensions (Revised)", RFC 4721, 802 DOI 10.17487/RFC4721, January 2007, 803 . 805 [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, 806 "Multiprotocol Extensions for BGP-4", RFC 4760, 807 DOI 10.17487/RFC4760, January 2007, 808 . 810 [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of 811 BGP for Routing in Large-Scale Data Centers", RFC 7938, 812 DOI 10.17487/RFC7938, August 2016, 813 . 815 9.2. Informative References 817 [I-D.ietf-lsr-dynamic-flooding] 818 Li, T., Psenak, P., Ginsberg, L., Chen, H., Przygienda, 819 T., Cooper, D., Jalil, L., Dontula, S., and G. S. Mishra, 820 "Dynamic Flooding on Dense Graphs", Work in Progress, 821 Internet-Draft, draft-ietf-lsr-dynamic-flooding-09, 9 June 822 2021, . 825 [I-D.ietf-lsr-flooding-topo-min-degree] 826 Chen, H., Toy, M., Yang, Y., Wang, A., Liu, X., Fan, Y., 827 and L. Liu, "Flooding Topology Minimum Degree Algorithm", 828 Work in Progress, Internet-Draft, draft-ietf-lsr-flooding- 829 topo-min-degree-02, 1 June 2021, 830 . 833 [RFC8670] Filsfils, C., Ed., Previdi, S., Dawra, G., Aries, E., and 834 P. Lapukhov, "BGP Prefix Segment in Large-Scale Data 835 Centers", RFC 8670, DOI 10.17487/RFC8670, December 2019, 836 . 838 Authors' Addresses 840 Huaimo Chen 841 Futurewei 842 Boston, MA, 843 United States of America 845 Email: huaimo.chen@futurewei.com 847 Gyan S. Mishra 848 Verizon Inc. 849 13101 Columbia Pike 850 Silver Spring, MD 20904 851 United States of America 853 Phone: 301 502-1347 854 Email: gyan.s.mishra@verizon.com 856 Aijun Wang 857 China Telecom 858 Beiqijia Town, Changping District 859 Beijing 860 102209 861 China 862 Email: wangaj3@chinatelecom.cn 864 Yisong Liu 865 China Mobile 867 Email: liuyisong@chinamobile.com 869 Haibo Wang 870 Huawei 871 Huawei Bld., No.156 Beiqing Rd. 872 Beijing 873 100095 874 China 876 Email: rainsword.wang@huawei.com 878 Yanhe Fan 879 Casa Systems 880 United States of America 882 Email: yfan@casa-systems.com