idnits 2.17.00 (12 Aug 2021) /tmp/idnits47959/draft-ietf-lsvr-bgp-spf-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to contain a disclaimer for pre-RFC5378 work, but was first submitted on or after 10 November 2008. The disclaimer is usually necessary only for documents that revise or obsolete older RFCs, and that take significant amounts of text from those RFCs. If you can contact all authors of the source material and they are willing to grant the BCP78 rights to the IETF Trust, you can and should remove the disclaimer. Otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 31, 2018) is 1451 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2328' is mentioned on line 625, but not defined == Missing Reference: 'RFC5286' is mentioned on line 659, but not defined == Missing Reference: 'RFC4456' is mentioned on line 629, but not defined == Missing Reference: 'RFC4915' is mentioned on line 654, but not defined == Missing Reference: 'RFC5549' is mentioned on line 664, but not defined ** Obsolete undefined reference: RFC 5549 (Obsoleted by RFC 8950) == Missing Reference: 'RFC4790' is mentioned on line 649, but not defined == Missing Reference: 'RFC5880' is mentioned on line 669, but not defined == Missing Reference: 'RFC4760' is mentioned on line 644, but not defined == Missing Reference: 'RFC4750' is mentioned on line 639, but not defined == Missing Reference: 'RFC4724' is mentioned on line 634, but not defined == Outdated reference: draft-ietf-idr-bgpls-segment-routing-epe has been published as RFC 9086 == Outdated reference: draft-ietf-ospf-segment-routing-extensions has been published as RFC 8665 ** Downref: Normative reference to an Informational RFC: RFC 7938 Summary: 2 errors (**), 0 flaws (~~), 14 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group K. Patel 3 Internet-Draft Arrcus, Inc. 4 Intended status: Standards Track A. Lindem 5 Expires: December 2, 2018 Cisco Systems 6 S. Zandi 7 Linkedin 8 W. Henderickx 9 Nokia 10 May 31, 2018 12 Shortest Path Routing Extensions for BGP Protocol 13 draft-ietf-lsvr-bgp-spf-01.txt 15 Abstract 17 Many Massively Scaled Data Centers (MSDCs) have converged on 18 simplified layer 3 routing. 
Furthermore, requirements for 19 operational simplicity have led many of these MSDCs to converge on 20 BGP as their single routing protocol for both their fabric routing 21 and their Data Center Interconnect (DCI) routing. This document 22 describes a solution which leverages BGP Link-State distribution and 23 the Shortest Path First (SPF) algorithm similar to Interior Gateway 24 Protocols (IGPs) such as OSPF. 26 Status of This Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on December 2, 2018. 43 Copyright Notice 45 Copyright (c) 2018 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 This document may contain material from IETF Documents or IETF 59 Contributions published or made publicly available before November 60 10, 2008. 
The person(s) controlling the copyright in some of this 61 material may not have granted the IETF Trust the right to allow 62 modifications of such material outside the IETF Standards Process. 63 Without obtaining an adequate license from the person(s) controlling 64 the copyright in such materials, this document may not be modified 65 outside the IETF Standards Process, and derivative works of it may 66 not be created outside the IETF Standards Process, except to format 67 it for publication as an RFC or to translate it into languages other 68 than English. 70 Table of Contents 72 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 73 1.1. BGP Shortest Path First (SPF) Motivation . . . . . . . . 4 74 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 5 75 2. BGP Peering Models . . . . . . . . . . . . . . . . . . . . . 5 76 2.1. BGP Single-Hop Peering on Network Node Connections . . . 5 77 2.2. BGP Peering Between Directly Connected Network Nodes . . 5 78 2.3. BGP Peering in Route-Reflector or Controller Topology . . 6 79 3. BGP-LS Shortest Path Routing (SPF) SAFI . . . . . . . . . . . 6 80 4. Extensions to BGP-LS . . . . . . . . . . . . . . . . . . . . 6 81 4.1. Node NLRI Usage and Modifications . . . . . . . . . . . . 7 82 4.2. Link NLRI Usage . . . . . . . . . . . . . . . . . . . . . 7 83 4.3. Prefix NLRI Usage . . . . . . . . . . . . . . . . . . . . 8 84 4.4. BGP-LS Attribute Sequence-Number TLV . . . . . . . . . . 8 85 5. Decision Process with SPF Algorithm . . . . . . . . . . . . . 9 86 5.1. Phase-1 BGP NLRI Selection . . . . . . . . . . . . . . . 10 87 5.2. Dual Stack Support . . . . . . . . . . . . . . . . . . . 10 88 5.3. NEXT_HOP Manipulation . . . . . . . . . . . . . . . . . . 11 89 5.4. IPv4/IPv6 Unicast Address Family Interaction . . . . . . 11 90 5.5. NLRI Advertisement and Convergence . . . . . . . . . . . 11 91 5.6. Error Handling . . . . . . . . . . . . . . . . . . . . . 12 92 6. IANA Considerations . . . . . . . . . . . . 
. . . . . . . . . 12 93 7. Security Considerations . . . . . . . . . . . . . . . . . . . 12 94 7.1. Acknowledgements . . . . . . . . . . . . . . . . . . . . 12 95 7.2. Contributors . . . . . . . . . . . . . . . . . . . . . . 12 97 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 13 98 8.1. Normative References . . . . . . . . . . . . . . . . . . 13 99 8.2. Information References . . . . . . . . . . . . . . . . . 14 100 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15 102 1. Introduction 104 Many Massively Scaled Data Centers (MSDCs) have converged on 105 simplified layer 3 routing. Furthermore, requirements for 106 operational simplicity have led many of these MSDCs to converge on 107 BGP [RFC4271] as their single routing protocol for both their fabric 108 routing and their Data Center Interconnect (DCI) routing. 109 Requirements and procedures for using BGP are described in [RFC7938]. 110 This document describes an alternative solution which leverages BGP- 111 LS [RFC7752] and the Shortest Path First algorithm similar to 112 Interior Gateway Protocols (IGPs) such as OSPF [RFC2328]. 114 [RFC4271] defines the Decision Process that is used to select routes 115 for subsequent advertisement by applying the policies in the local 116 Policy Information Base (PIB) to the routes stored in its Adj-RIBs- 117 In. The output of the Decision Process is the set of routes that are 118 announced by a BGP speaker to its peers. These selected routes are 119 stored by a BGP speaker in the speaker's Adj-RIBs-Out according to 120 policy. 122 [RFC7752] describes a mechanism by which link-state and TE 123 information can be collected from networks and shared with external 124 components using BGP. This is achieved by defining NLRI advertised 125 within the BGP-LS/BGP-LS-SPF AFI/SAFI. The BGP-LS extensions defined 126 in [RFC7752] make use of the Decision Process defined in [RFC4271]. 
128 This document augments [RFC7752] by replacing its use of the existing 129 Decision Process. Rather than reusing the BGP-LS SAFI, the BGP-LS- 130 SPF SAFI is introduced to ensure backward compatibility. The Phase 1 131 and 2 decision functions of the Decision Process are replaced with 132 the Shortest Path First (SPF) algorithm, also known as the Dijkstra 133 algorithm. The Phase 3 decision function is also simplified since it 134 is no longer dependent on the previous phases. This solution offers 135 the benefits of both BGP and SPF-based IGPs. These include TCP-based 136 flow-control, no periodic link-state refresh, and completely 137 incremental NLRI advertisement. These advantages can reduce the 138 overhead in MSDCs where there is a high degree of Equal Cost Multi- 139 Path (ECMP) and the topology is very stable. Additionally, using an 140 SPF-based computation can support fast convergence and the 141 computation of Loop-Free Alternates (LFAs) [RFC5286] in the event 142 of link failures. Furthermore, a BGP-based solution lends itself to 143 multiple peering models including those incorporating route- 144 reflectors [RFC4456] or controllers. 146 Support for Multiple Topology Routing (MTR) as described in [RFC4915] 147 is an area for further study dependent on deployment requirements. 149 1.1. BGP Shortest Path First (SPF) Motivation 151 Given that [RFC7938] already describes how BGP could be used as the 152 sole routing protocol in an MSDC, one might question the motivation 153 for defining an alternate BGP deployment model when a mature solution 154 exists. For both alternatives, BGP offers the operational benefits 155 of a single routing protocol. However, BGP SPF offers some unique 156 advantages above and beyond standard BGP distance-vector routing. 158 A primary advantage is that all BGP speakers in the BGP SPF routing 159 domain will have a complete view of the topology. 
This will allow 160 support for ECMP, IP fast-reroute (e.g., Loop-Free Alternates), 161 Shared Risk Link Groups (SRLGs), and other routing enhancements 162 without advertisement of additional BGP paths or other extensions. In 163 short, the advantages of an IGP such as OSPF [RFC2328] become available in 164 BGP. 166 With the simplified BGP decision process as defined in Section 5.1, 167 NLRI changes can be disseminated throughout the BGP routing domain 168 much more rapidly (equivalent to IGPs with the proper 169 implementation). 171 Another primary advantage is a potential reduction in NLRI 172 advertisement. With standard BGP distance-vector routing, a single 173 link failure may impact 100s or 1000s of prefixes and result in the 174 withdrawal or re-advertisement of the attendant NLRI. With BGP SPF, 175 only the BGP speakers corresponding to the link NLRI need withdraw 176 the corresponding BGP-LS Link NLRI. This advantage will contribute 177 to both faster convergence and better scaling. 179 With controller and route-reflector peering models, BGP SPF 180 advertisement and distributed computation require a minimal number of 181 sessions and copies of the NLRI since only the latest version of the 182 NLRI from the originator is required. Given that verification of the 183 adjacencies is done outside of BGP (see Section 2), each BGP speaker 184 will only need as many sessions and copies of the NLRI as required 185 for redundancy (e.g., one for the SPF computation and another for 186 backup). Functions such as Optimized Route Reflection (ORR) are 187 supported without extension by virtue of the primary advantages. 188 Additionally, a controller could inject topology that is learned 189 outside the BGP routing domain. 191 Given that controllers are already consuming BGP-LS NLRI [RFC7752], 192 reusing it for BGP-LS SPF leverages the existing controller 193 implementations. 
195 Another potential advantage of BGP SPF is that both IPv6 and IPv4 can 196 be supported in the same address family using the same topology. 197 Although not described in this version of the document, multi- 198 topology extensions can be used to support separate IPv4, IPv6, 199 unicast, and multicast topologies while sharing the same NLRI. 201 Finally, the BGP SPF topology can be used as an underlay for other 202 BGP address families (using the existing model) and realize all the 203 above advantages. A simplified peering model using IPv6 link-local 204 addresses as next-hops can be deployed similar to [RFC5549]. 206 1.2. Requirements Language 208 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 209 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 210 "OPTIONAL" in this document are to be interpreted as described in BCP 211 14 [RFC2119] [RFC8174] when, and only when, they appear in all 212 capitals, as shown here. 214 2. BGP Peering Models 216 Depending on the requirements, scaling, and capabilities of the BGP 217 speakers, various peering models are supported. The only requirement 218 is that all BGP speakers in the BGP SPF routing domain receive link- 219 state NLRI on a timely basis, run an SPF calculation, and update 220 their data plane appropriately. The content of the Link NLRI is 221 described in Section 4.2. 223 2.1. BGP Single-Hop Peering on Network Node Connections 225 The simplest peering model is the one described in section 5.2.1 of 226 [RFC7938]. In this model, EBGP single-hop sessions are established 227 over direct point-to-point links interconnecting the SPF domain 228 nodes. For the purposes of BGP SPF, Link NLRI is only advertised if 229 a single-hop BGP session has been established and the Link-State/SPF 230 address family capability has been exchanged [RFC4790] on the 231 corresponding session. If the session goes down, the corresponding 232 Link NLRI will be withdrawn. 234 2.2. 
BGP Peering Between Directly Connected Network Nodes 236 In this model, BGP speakers peer with all directly connected network 237 nodes but the sessions may be multi-hop and the direct connection 238 discovery and liveness detection for those connections are 239 independent of the BGP protocol. How this is accomplished is outside 240 the scope of this document. Consequently, there will be a single 241 session even if there are multiple direct connections between BGP 242 speakers. For the purposes of BGP SPF, Link NLRI is advertised as 243 long as a BGP session has been established, the Link-State/SPF 244 address family capability has been exchanged [RFC4790], and the 245 corresponding link is up and considered operational. 247 2.3. BGP Peering in Route-Reflector or Controller Topology 249 In this model, BGP speakers peer solely with one or more Route 250 Reflectors [RFC4456] or controllers. As in the previous model, 251 direct connection discovery and liveness detection for those 252 connections are done outside the BGP protocol. More specifically, 253 liveness detection is done using the BFD protocol described in 254 [RFC5880]. For the purposes of BGP SPF, Link NLRI is advertised as 255 long as the corresponding link is up and considered operational. 257 3. BGP-LS Shortest Path Routing (SPF) SAFI 259 In order to replace the Phase 1 and 2 decision functions of the 260 existing Decision Process with an SPF-based Decision Process and 261 streamline the Phase 3 decision functions in a backward compatible 262 manner, this draft introduces the BGP-LS-SPF SAFI for BGP-LS SPF 263 operation. The BGP-LS-SPF (AFI 16388 / SAFI TBD1) [RFC4790] is 264 allocated by IANA as specified in Section 6. A BGP speaker using 265 the BGP-LS SPF extensions described herein MUST exchange the AFI/SAFI 266 using Multiprotocol Extensions Capability Code [RFC4760] with other 267 BGP speakers in the SPF routing domain. 269 4. 
Extensions to BGP-LS 271 [RFC7752] describes a mechanism by which link-state and TE 272 information can be collected from networks and shared with external 273 components using BGP protocol. It describes both the definition of 274 BGP-LS NLRI that describes links, nodes, and prefixes comprising IGP 275 link-state information and the definition of a BGP path attribute 276 (BGP-LS attribute) that carries link, node, and prefix properties and 277 attributes, such as the link and prefix metric or auxiliary Router- 278 IDs of nodes, etc. 280 The BGP protocol will be used in the Protocol-ID field specified in 281 table 1 of [I-D.ietf-idr-bgpls-segment-routing-epe]. The local and 282 remote node descriptors for all NLRI will be the BGP Router-ID (TLV 283 516) and either the AS Number (TLV 512) [RFC7752] or the BGP 284 Confederation Member (TLV 517) 285 [I-D.ietf-idr-bgpls-segment-routing-epe]. However, if the BGP 286 Router-ID is known to be unique within the BGP Routing domain, it can 287 be used as the sole descriptor. 289 4.1. Node NLRI Usage and Modifications 291 The SPF capability is a new Node Attribute TLV that will be added to 292 those defined in table 7 of [RFC7752]. The new attribute TLV will 293 only be applicable when BGP is specified in the Node NLRI Protocol ID 294 field. The TBD TLV type will be defined by IANA. The new Node 295 Attribute TLV will contain a single-octet SPF algorithm as defined in 296 [I-D.ietf-ospf-segment-routing-extensions]. 298 0 1 2 3 299 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 300 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 301 | Type | Length | 302 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 303 | SPF Algorithm | 304 +-+-+-+-+-+-+-+-+ 306 The SPF Algorithm may take the following values: 308 0 - Normal Shortest Path First (SPF) algorithm based on link 309 metric. This is the standard shortest path algorithm as 310 computed by the IGP protocol. 
Consistent with the deployed 311 practice for link-state protocols, Algorithm 0 permits any 312 node to overwrite the SPF path with a different path based on 313 its local policy. 314 1 - Strict Shortest Path First (SPF) algorithm based on link 315 metric. The algorithm is identical to Algorithm 0 but Algorithm 316 1 requires that all nodes along the path will honor the SPF 317 routing decision. Local policy at the node claiming support for 318 Algorithm 1 MUST NOT alter the SPF paths computed by Algorithm 1. 320 When computing the SPF for a given BGP routing domain, only BGP nodes 321 advertising the SPF capability attribute will be included in the 322 Shortest Path Tree (SPT). 324 4.2. Link NLRI Usage 326 The criteria for advertisement of Link NLRI are discussed in 327 Section 2. 329 Link NLRI is advertised with local and remote node descriptors as 330 described above and unique link identifiers dependent on the 331 addressing. For IPv4 links, the link's local IPv4 (TLV 259) and 332 remote IPv4 (TLV 260) addresses will be used. For IPv6 links, the 333 local IPv6 (TLV 261) and remote IPv6 (TLV 262) addresses will be 334 used. For unnumbered links, the link local/remote identifiers (TLV 335 258) will be used. For links having both IPv4 and IPv6 336 addresses, both sets of descriptors may be included in the same Link 337 NLRI. The link identifiers are described in table 5 of [RFC7752]. 339 The link IGP metric attribute TLV (TLV 1095) as well as any others 340 required for non-SPF purposes SHOULD be advertised. Algorithms such 341 as setting the metric inversely to the link speed as done in the OSPF 342 MIB [RFC4750] MAY be supported. However, this is beyond the scope of 343 this document. 345 4.3. Prefix NLRI Usage 347 Prefix NLRI is advertised with a local node descriptor as described 348 above and the prefix and length used as the descriptors (TLV 265) as 349 described in [RFC7752]. 
The prefix metric attribute TLV (TLV 1155) 350 as well as any others required for non-SPF purposes SHOULD be 351 advertised. For loopback prefixes, the metric should be 0. For non- 352 loopback prefixes, the setting of the metric is a local matter and 353 beyond the scope of this document. 355 4.4. BGP-LS Attribute Sequence-Number TLV 357 A new BGP-LS Attribute TLV to BGP-LS NLRI types is defined to assure 358 the most recent version of a given NLRI is used in the SPF 359 computation. The TBD TLV type will be defined by IANA. The new BGP- 360 LS Attribute TLV will contain an 8-octet sequence number. The usage 361 of the Sequence Number TLV is described in Section 5.1. 363 0 1 2 3 364 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 365 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 366 | Type | Length | 367 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 368 | Sequence Number (High-Order 32 Bits) | 369 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 370 | Sequence Number (Low-Order 32 Bits) | 371 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 373 Sequence Number 375 The 64-bit strictly increasing sequence number is incremented for 376 every version of BGP-LS NLRI originated. BGP speakers implementing 377 this specification MUST use available mechanisms to preserve the 378 sequence number's strictly increasing property for the deployed life 379 of the BGP speaker (including cold restarts). One mechanism for 380 accomplishing this would be to use the high-order 32 bits of the 381 sequence number as a wrap/boot count that is incremented anytime the 382 BGP router loses its sequence number state or the low-order 32 bits 383 wrap. 385 When incrementing the sequence number for each self-originated NLRI, 386 the sequence number should be treated as an unsigned 64-bit value. 
387 If the lower-order 32-bit value wraps, the higher-order 32-bit value 388 should be incremented and saved in non-volatile storage. If by some 389 chance the BGP Speaker is deployed long enough that there is a 390 possibility that the 64-bit sequence number may wrap or a BGP Speaker 391 completely loses its sequence number state (e.g., the BGP speaker 392 hardware is replaced or experiences a cold-start), the phase 1 393 decision function (see Section 5.1) rules will ensure convergence, 394 albeit not immediately. 396 5. Decision Process with SPF Algorithm 398 The Decision Process described in [RFC4271] takes place in three 399 distinct phases. The Phase 1 decision function of the Decision 400 Process is responsible for calculating the degree of preference for 401 each route received from a BGP speaker's peer. The Phase 2 decision 402 function is invoked on completion of the Phase 1 decision function 403 and is responsible for choosing the best route out of all those 404 available for each distinct destination, and for installing each 405 chosen route into the Loc-RIB. The combination of the Phase 1 and 2 406 decision functions is characterized as a Path Vector algorithm. 408 The SPF-based Decision Process replaces the BGP best-path Decision 409 Process described in [RFC4271]. This process starts with selecting 410 only those Node NLRI whose SPF capability TLV matches the local 411 BGP speaker's SPF capability TLV value. Since Link-State NLRI always 412 contains the local descriptor [RFC7752], it will only be originated 413 by a single BGP speaker in the BGP routing domain. These selected 414 Node NLRI and their Link/Prefix NLRI are used to build a directed 415 graph during the SPF computation. The best paths for BGP prefixes 416 are installed as a result of the SPF process. 418 When BGP-LS-SPF NLRI is received, all that is required is to 419 determine whether it is the best-path by examining the Node-ID and 420 sequence number as described in Section 5.1. 
If the received best- 421 path NLRI has changed, it will be advertised to other BGP-LS-SPF 422 peers. If the attributes have changed (other than the sequence 423 number), a BGP SPF calculation will be scheduled. However, a changed 424 NLRI MAY be advertised to other peers almost immediately and 425 propagation of changes can approach IGP convergence times. To 426 accomplish this, the MinASOriginationIntervalTimer and 427 MinRouteAdvertisementIntervalTimer [RFC4271] are not applicable to 428 the BGP-LS-SPF SAFI. 430 The Phase 3 decision function of the Decision Process [RFC4271] is 431 also simplified since under normal SPF operation, a BGP speaker would 432 advertise the NLRI selected for the SPF to all BGP peers with the 433 BGP-LS/BGP-LS-SPF AFI/SAFI. Application of policy would not be 434 prevented; however, its effect on the best-path process would be limited, as 435 the SPF relies solely on link metrics. 437 5.1. Phase-1 BGP NLRI Selection 439 The rules for NLRI selection are greatly simplified from [RFC4271]. 441 1. If the NLRI is received from the BGP speaker originating the NLRI 442 (as determined by comparing the BGP Router ID in the NLRI Node 443 identifiers with the BGP speaker Router ID), then it is preferred 444 over the same NLRI from non-originators. This rule will assure 445 that stale NLRI is updated even if a BGP-LS router loses its 446 sequence number state due to a cold-start. 448 2. If the Sequence-Number TLV is present in the BGP-LS Attribute, 449 then the NLRI with the most recent, i.e., highest sequence number 450 is selected. BGP-LS NLRI with a Sequence-Number TLV will be 451 considered more recent than NLRI without a BGP-LS Attribute or a 452 BGP-LS Attribute that doesn't include the Sequence-Number TLV. 454 3. The final tie-breaker is the NLRI from the BGP Speaker with the 455 numerically largest BGP Router ID. 
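The three selection rules above can be sketched as a single ordered comparison. This is an illustrative sketch only, not part of the specification: the names ReceivedNlri, selection_key, and select_best are hypothetical, Router IDs are modeled as plain 32-bit integers, and a missing Sequence-Number TLV is modeled as None.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReceivedNlri:
    """One received copy of a given BGP-LS-SPF NLRI (hypothetical model)."""
    originator_router_id: int       # BGP Router ID from the NLRI node descriptors
    peer_router_id: int             # Router ID of the speaker this copy came from
    sequence_number: Optional[int]  # None when no Sequence-Number TLV is present


def selection_key(nlri: ReceivedNlri) -> tuple:
    """Build a sort key so that the preferred copy compares greatest."""
    # Rule 1: a copy received directly from the originator is preferred.
    from_originator = nlri.peer_router_id == nlri.originator_router_id
    # Rule 2: a copy with a Sequence-Number TLV beats one without;
    # among copies with the TLV, the highest sequence number wins.
    has_sequence = nlri.sequence_number is not None
    sequence = nlri.sequence_number if has_sequence else 0
    # Rule 3: final tie-breaker is the numerically largest peer Router ID.
    return (from_originator, has_sequence, sequence, nlri.peer_router_id)


def select_best(copies: Iterable[ReceivedNlri]) -> ReceivedNlri:
    """Return the preferred copy among several copies of the same NLRI."""
    return max(copies, key=selection_key)
```

Because the rules are applied strictly in order, a tuple comparison is sufficient here; an implementation that stores only the current best copy per NLRI could compare the key of each newly received copy against the stored one.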
457 The modified SPF Decision Process performs an SPF calculation rooted 458 at the BGP speaker using the metrics from Link and Prefix NLRI 459 Attribute TLVs [RFC7752]. As a result, any attributes that would 460 influence the Decision Process defined in [RFC4271], such as the ORIGIN, 461 MULTI_EXIT_DISC, and LOCAL_PREF attributes, are ignored by the SPF 462 algorithm. Furthermore, the NEXT_HOP attribute value is preserved 463 but otherwise ignored during the SPF or best-path computation. 465 5.2. Dual Stack Support 467 The SPF-based decision process operates on Node, Link, and Prefix 468 NLRIs that support both IPv4 and IPv6 addresses. Whether to run a 469 single SPF instance or multiple SPF instances for separate AFs is a 470 local implementation matter. Normally, IPv4 next-hops are 471 calculated for IPv4 prefixes and IPv6 next-hops are calculated for 472 IPv6 prefixes. However, an interesting use-case is deployment of 473 [RFC5549] where IPv6 next-hops are calculated for both IPv4 and IPv6 474 prefixes. As stated in Section 1, support for Multiple Topology 475 Routing (MTR) is an area for future study. 477 5.3. NEXT_HOP Manipulation 479 A BGP speaker that supports SPF extensions MAY interact with peers 480 that don't support SPF extensions. If the BGP-LS address family is 481 advertised to a peer not supporting the SPF extensions described 482 herein, then the BGP speaker MUST conform to the NEXT_HOP rules 483 specified in [RFC4271] when announcing the Link-State address family 484 routes to those peers. 486 All BGP peers that support SPF extensions would locally compute the 487 Loc-RIB next-hops as a result of the SPF process. Consequently, the 488 NEXT_HOP attribute is always ignored on receipt. However, BGP 489 speakers SHOULD set the NEXT_HOP address according to the NEXT_HOP 490 attribute rules specified in [RFC4271]. 492 5.4. 
IPv4/IPv6 Unicast Address Family Interaction 494 While the BGP-LS SPF address family and the IPv4/IPv6 unicast address 495 families install routes into the same device routing tables, they 496 will operate independently much the same as OSPF and IS-IS would 497 operate today (i.e., "Ships-in-the-Night" mode). There will be no 498 implicit route redistribution between the BGP address families. 499 However, implementation-specific redistribution mechanisms SHOULD be 500 made available with the restriction that redistribution of BGP-LS SPF 501 routes into the IPv4 address family applies only to IPv4 routes and 502 redistribution of BGP-LS SPF routes into the IPv6 address family 503 applies only to IPv6 routes. 505 Given the fact that SPF algorithms are based on the assumption that 506 all routers in the routing domain calculate precisely the same 507 SPF tree and install the same set of routes, it is RECOMMENDED that 508 BGP-LS SPF IPv4/IPv6 routes be given priority by default when 509 installed into their respective RIBs. In common implementations the 510 prioritization is governed by route preference or administrative 511 distance with lower being more preferred. 513 5.5. NLRI Advertisement and Convergence 515 A local failure will prevent a link from being used in the SPF 516 calculation due to the IGP bi-directional connectivity requirement. 517 Consequently, local link failures should always be given priority 518 over updates (e.g., withdrawing all routes learned on a session) in 519 order to ensure the highest priority propagation and optimal 520 convergence. 522 Delaying the withdrawal of non-local routes is an area for further 523 study as more IGP-like mechanisms would be required to prevent usage 524 of stale NLRI. 526 5.6. 
Error Handling 528 When a BGP speaker receives a BGP Update containing a malformed SPF 529 Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST 530 ignore the received TLV and the Node NLRI and not pass it to other 531 BGP peers as specified in [RFC7606]. When discarding a Node NLRI 532 with a malformed TLV, a BGP speaker SHOULD log an error for further 533 analysis. 535 6. IANA Considerations 537 This document defines an AFI/SAFI for BGP-LS SPF operation and 538 requests IANA to assign the BGP-LS/BGP-LS-SPF (AFI 16388 / SAFI TBD1) 539 as described in [RFC4750]. 541 This document also defines two attribute TLVs for BGP-LS NLRI. We 542 request IANA to assign TLVs for the SPF capability and the Sequence 543 Number from the "BGP-LS Node Descriptor, Link Descriptor, Prefix 544 Descriptor, and Attribute TLVs" Registry. 546 7. Security Considerations 548 This extension to BGP does not change the underlying security issues 549 inherent in the existing [RFC4724] and [RFC4271]. 551 7.1. Acknowledgements 553 The authors would like to thank Sue Hares, Jorge Rabadan, and Boris 554 Hassanov for their review and comments. 556 7.2. Contributors 558 In addition to the authors listed on the front page, the following 559 co-authors have contributed to the document. 561 Derek Yeung 562 Arrcus, Inc. 563 derek@arrcus.com 565 Gunter Van De Velde 566 Nokia 567 gunter.van_de_velde@nokia.com 569 Abhay Roy 570 Cisco Systems 571 akr@cisco.com 573 Venu Venugopal 574 Cisco Systems 575 venuv@cisco.com 577 8. References 579 8.1. Normative References 581 [I-D.ietf-idr-bgpls-segment-routing-epe] 582 Previdi, S., Filsfils, C., Patel, K., Ray, S., and J. 583 Dong, "BGP-LS extensions for Segment Routing BGP Egress 584 Peer Engineering", draft-ietf-idr-bgpls-segment-routing- 585 epe-14 (work in progress), December 2017. 587 [I-D.ietf-ospf-segment-routing-extensions] 588 Psenak, P., Previdi, S., Filsfils, C., Gredler, H., 589 Shakir, R., Henderickx, W., and J. 
Tantsura, "OSPF 590 Extensions for Segment Routing", draft-ietf-ospf-segment- 591 routing-extensions-25 (work in progress), April 2018. 593 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 594 Requirement Levels", BCP 14, RFC 2119, 595 DOI 10.17487/RFC2119, March 1997. 598 [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A 599 Border Gateway Protocol 4 (BGP-4)", RFC 4271, 600 DOI 10.17487/RFC4271, January 2006. 603 [RFC7606] Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K. 604 Patel, "Revised Error Handling for BGP UPDATE Messages", 605 RFC 7606, DOI 10.17487/RFC7606, August 2015. 608 [RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and 609 S. Ray, "North-Bound Distribution of Link-State and 610 Traffic Engineering (TE) Information Using BGP", RFC 7752, 611 DOI 10.17487/RFC7752, March 2016. 614 [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of 615 BGP for Routing in Large-Scale Data Centers", RFC 7938, 616 DOI 10.17487/RFC7938, August 2016. 619 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 620 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 621 May 2017. 623 8.2. Information References 625 [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, 626 DOI 10.17487/RFC2328, April 1998. 629 [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route 630 Reflection: An Alternative to Full Mesh Internal BGP 631 (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006. 634 [RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. 635 Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, 636 DOI 10.17487/RFC4724, January 2007. 639 [RFC4750] Joyal, D., Ed., Galecki, P., Ed., Giacalone, S., Ed., 640 Coltun, R., and F. Baker, "OSPF Version 2 Management 641 Information Base", RFC 4750, DOI 10.17487/RFC4750, 642 December 2006. 644 [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. 
Rekhter, 645 "Multiprotocol Extensions for BGP-4", RFC 4760, 646 DOI 10.17487/RFC4760, January 2007. 649 [RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet 650 Application Protocol Collation Registry", RFC 4790, 651 DOI 10.17487/RFC4790, March 2007. 654 [RFC4915] Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P. 655 Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF", 656 RFC 4915, DOI 10.17487/RFC4915, June 2007. 659 [RFC5286] Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for 660 IP Fast Reroute: Loop-Free Alternates", RFC 5286, 661 DOI 10.17487/RFC5286, September 2008. 664 [RFC5549] Le Faucheur, F. and E. Rosen, "Advertising IPv4 Network 665 Layer Reachability Information with an IPv6 Next Hop", 666 RFC 5549, DOI 10.17487/RFC5549, May 2009. 669 [RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection 670 (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010. 673 Authors' Addresses 675 Keyur Patel 676 Arrcus, Inc. 678 Email: keyur@arrcus.com 680 Acee Lindem 681 Cisco Systems 682 301 Midenhall Way 683 Cary, NC 27513 684 USA 686 Email: acee@cisco.com 688 Shawn Zandi 689 Linkedin 690 222 2nd Street 691 San Francisco, CA 94105 692 USA 694 Email: szandi@linkedin.com 695 Wim Henderickx 696 Nokia 697 Antwerp 698 Belgium 700 Email: wim.henderickx@nokia.com