idnits 2.17.00 (12 Aug 2021)

/tmp/idnits8485/draft-ietf-spring-segment-protection-sr-te-paths-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 18 instances of too long lines in the document, the longest
     one being 7 characters in excess of 72.

  == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document date (7 March 2022) is 68 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: '1000-2000' is mentioned on line 246, but not defined

  == Missing Reference: '3000-4000' is mentioned on line 246, but not defined

  -- Looks like a reference, but probably isn't: '1100' on line 201

  -- Looks like a reference, but probably isn't: '1005' on line 201

  == Missing Reference: '400000-405000' is mentioned on line 680, but not
     defined

  == Outdated reference: draft-ietf-idr-bgpls-segment-routing-epe has been
     published as RFC 9086

  == Outdated reference: A later version (-06) exists of
     draft-li-rtgwg-enhanced-ti-lfa-05

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

2    Routing area                                                   S. Hegde
3    Internet-Draft                                                C. Bowers
4    Intended status: Informational                    Juniper Networks Inc.
5    Expires: 8 September 2022                                  S. Litkowski
6                                                              Cisco Systems
7                                                                      X. Xu
8                                                               Alibaba Inc.
9                                                                      F. Xu
10                                                                   Tencent
11                                                              7 March 2022

13                    Segment Protection for SR-TE Paths
14           draft-ietf-spring-segment-protection-sr-te-paths-03

16   Abstract

18      Segment routing supports the creation of explicit paths using Adj-
19      Segment-ID (SID), Node-SIDs, and BSIDs. It is important to provide
20      fast reroute (FRR) mechanisms to respond to failures of links and
21      nodes in the Segment-Routed Traffic-Engineered (SR-TE) path. A point
22      of local repair (PLR) can provide FRR protection against the failure
23      of a link in an SR-TE path by examining only the first (top) label in
24      the SR label stack. In order to protect against the failure of a
25      node, a PLR may need to examine the second label in the stack as
26      well, in order to determine the SR-TE path beyond the failed node. This
27      document specifies how a PLR can use the first and second label in
28      the SR-MPLS label stack describing an SR-TE path to provide
29      protection against node failures.

31   Status of This Memo

33      This Internet-Draft is submitted in full conformance with the
34      provisions of BCP 78 and BCP 79.

36      Internet-Drafts are working documents of the Internet Engineering
37      Task Force (IETF). Note that other groups may also distribute
38      working documents as Internet-Drafts. The list of current Internet-
39      Drafts is at https://datatracker.ietf.org/drafts/current/.

41      Internet-Drafts are draft documents valid for a maximum of six months
42      and may be updated, replaced, or obsoleted by other documents at any
43      time. It is inappropriate to use Internet-Drafts as reference
44      material or to cite them other than as "work in progress."

46      This Internet-Draft will expire on 8 September 2022.

48   Copyright Notice

50      Copyright (c) 2022 IETF Trust and the persons identified as the
51      document authors. All rights reserved.

53      This document is subject to BCP 78 and the IETF Trust's Legal
54      Provisions Relating to IETF Documents (https://trustee.ietf.org/
55      license-info) in effect on the date of publication of this document.
56      Please review these documents carefully, as they describe your rights
57      and restrictions with respect to this document. Code Components
58      extracted from this document must include Revised BSD License text as
59      described in Section 4.e of the Trust Legal Provisions and are
60      provided without warranty as described in the Revised BSD License.

62   Table of Contents

64      1. Introduction
65      2. Node Failures Along SR-TE Paths
66        2.1. Segment protection for explicit paths with Node-SIDs
67        2.2. Segment Protection for Anycast-SIDs
68        2.3. Segment protection for explicit paths with Adj-SIDs
69      3. Detailed Solution using Context Tables
70        3.1. Building Context Tables
71        3.2. Segment protection for Node-SIDs
72        3.3. Segment protection for Adj-SIDs
73        3.4. Segment protection for edge nodes
74          3.4.1. Detailed Example for Segment protection for edge
75                 nodes
76      4. Determining node can be bypassed
77      5. Hold timers for Node-SID/Prefix-SIDs and Adj-SIDs
78        5.1. Interaction with micro-loop avoidance
79      6. Optimization Considerations
80        6.1. Segment Protection Example with Common SRGB
81      7. Alternate path protection mechanisms
82      8. Operational Considerations
83      9. Security Considerations
84      10. IANA Considerations
85      11. Acknowledgments
86      12. References
87        12.1. Normative References
88        12.2. Informative References
89      Authors' Addresses

91   1. Introduction

93      It is possible for a routing device to completely go out of service
94      abruptly due to power failure, hardware failure or software crashes.
95      Node protection is an important property of the Fast Reroute
96      mechanism. It provides protection against a node failure by
97      rerouting traffic around the failed node. For example, the
98      mechanisms described in Loop Free Alternates ([RFC5286]), Remote Loop
99      Free Alternates ([RFC8102]), and
100     [I-D.ietf-rtgwg-segment-routing-ti-lfa] can be used to provide node
101     protection to ensure minimal traffic loss after a node failure.

103     Section 2 describes problems with SR-TE paths and the need for a
104     specialized mechanism to provide node protection for SR-TE paths.
105     Section 3 describes the solution applied to paths built using Adj-
106     SIDs and Node-SIDs. In order to distinguish the node failures of the
107     segment endpoints (mid points) in an SR-TE path from the usual node
108     protection mechanisms described in various LFA mechanisms, this
109     document uses the term Segment Protection.

111  2. Node Failures Along SR-TE Paths

113     Figure 1 shows an example network
114     topology with Segment Routing enabled on each node.

116          Node          Node          Node          Node          Node
117          SID:1         SID:2         SID:3         SID:4         SID:5
118         +----+   10   +----+   10   +----+   10   +----+   10   +----+
119         | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
120         +----+        +----+        +----+        +----+        +----+
121              \                           \         /
122               \ 10                        \ 100   / 60
123                \                           \     /
124                 \  +----+                  +----+
125                  +--| R7 |------------------| R8 |
126                     +----+        30        +----+
127                    /  Node                   Node      Label stack:
128                   /   SID:7                  SID:8     +------------+
129             +----+                           SRGB:     |  1008 (top)|
130             | R6 |                         3000-4000   +------------+
131             +----+                                     | 3005       |
132              Node                                      +------------+
133              SID:6

135        * Numbers on the links represent the symmetric link cost

137        Figure 1: Example topology. The segment index for each node is
138        shown in the diagram. All nodes have SRGB = [1000-2000], except
139        for R8 which has SRGB = [3000-4000]. A label stack that
140        represents the path R1->R7->R8->R4->R5 is shown as well.

142  2.1. Segment protection for explicit paths with Node-SIDs

144     Consider an explicit path in the topology in Figure 1 from R1->R5 via
145     R1->R7->R8->R4->R5. This path can be built using the shortest paths
146     from R1-to-R8 and R8-to-R5. The label stack to instantiate this path
147     contains two Node-SIDs 1008 and 3005. The 1008 label will take the
148     packet from R1 to R8 via R7 and get popped. The next label in the
149     stack 3005 will take the packet from R8 to the destination R5 via R4.
150     If the node R8 goes down, it is not possible for R7 to perform FRR
151     without examining the second label in the incoming label stack
152     (3005).

154     Note that in the absence of a failure, R7 does not need to understand
155     the meaning of the second label (3005) in order to perform normal
156     forwarding. However, in order to support segment protection, R7 will
157     need to understand the meaning of label 3005 in order to determine
158     where the packet is headed after R8.

160     The mechanisms used to detect whether a node or a link has failed
161     are outside the scope of this document.
        The options for node
162     failure detection capabilities of a device and the resultant forwarding
163     state, described in Section 5.2 of [RFC8679], are applicable to this
164     document as well.

166  2.2. Segment Protection for Anycast-SIDs

168     A prefix segment advertised as a Node-SID may only be advertised by
169     one node in the network. In contrast, an anycast prefix segment may be
170     advertised by more than one node. In some situations, one can use
171     Anycast-SIDs to construct SR-TE paths that are protected against node
172     failure, without the need for the mechanism described in this
173     document.

175     +----+   10   +----+   10   +----+   10   +----+   10   +----+
176     | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
177     +----+        +----+        +----+        +----+        +----+
178          \                           \         /               |
179           \ 10                        \100  60/                |
180            \                           \     /                 |
181             \  +----+        30        +----+                  |
182              +--| R7 |------------------| R8 |                 |
183                 +----+                  +----+                 |
184                /      \                 Anycast                +
185               /        \                SID:100               /
186         +----+          \                                    /
187         | R6 |           \   40           +----+            /60
188         +----+            +---------------| R9 |+          Label stack:
189                                           +----+           +------------+
190                                           Anycast          |  1100 (top)|
191                                           SID:100          +------------+
192                                                            | 1005       |
193                                                            +------------+
194        * Numbers on the links represent the symmetric link cost

196        Figure 2: Topology illustrating use of Anycast-SIDs to protect
197        against node failures. All nodes have SRGB = [1000-2000].

199     An example of this is shown in Figure 2. In this example, R8 and R9
200     advertise an Anycast-SID of 100. The label stack in this example is
201     [1100, 1005]. The top label (1100) corresponds to the Anycast-SID
202     advertised by both R8 and R9. In the absence of a failure, the
203     packet sent by R1 with this label stack will follow the path from
204     R1->R5 along R1->R7->R8->R4->R5.

206     If R7 is performing a per-prefix LFA calculation [RFC5286], then R7
207     will install a backup next-hop to R9 for this Anycast-SID, protecting
208     against the failure of the primary next-hop to R8.
        This backup path
209     does not pass through R8, so it would not be affected by a
210     complete failure of node R8. As illustrated by this example, for
211     some topologies segment-protecting SR-TE paths can be constructed
212     through the use of Anycast-SIDs, as opposed to the mechanism
213     described in this document.

215  2.3. Segment protection for explicit paths with Adj-SIDs
216                                      Adj-SID:
217                                      R3-R8:9044

219          Node          Node          Node          Node          Node
220          SID:1         SID:2         SID:3         SID:4         SID:5
221         +----+   10   +----+   10   +----+   10   +----+   10   +----+
222         | R1 |--------| R2 |--------| R3 |--------| R4 |--------| R5 |
223         +----+        +----+        +----+        +----+        +----+
224              \                           \         /               |
225               \ 10                        \ 100   / 60             | 10
226                \                           \     /                 |
227                 \  +----+                  +----+               +----+
228                  +--| R7 |------------------| R8 |---------------| R9 |
229                     +----+        30        +----+      10       +----+
230                    /  Node                   Node                 Node
231                   /   SID:7                  SID:8                SID:9
232             +----+                           SRGB:
233             | R6 |                         3000-4000         Label stack:
234             +----+                                           +------------+
235              Node                          Adj-SIDs:         |  1003 (top)|
236              SID:6                         R8-R4:9054        +------------+
237                                                              | 9044       |
238                                                              +------------+
239                                                              | 9054       |
240                                                              +------------+
241                                                              | 1005       |
242                                                              +------------+
243        * Numbers on the links represent the symmetric link cost

245        Figure 3: Explicit path using an Adj-SID. All nodes have SRGB =
246        [1000-2000], except for R8 which has SRGB = [3000-4000].

248     Consider an explicit path from R1->R5 via R1->R2->R3->R8->R4->R5.
249     This path can be built using a combination of Node-SIDs and Adj-SIDs,
250     as shown in Figure 3. The diagram shows the label stack needed to
251     instantiate this path, as well as several Adj-SIDs advertised by
252     nodes involved in this path. When a packet leaving R1 with this
253     label stack reaches R3, the top label is 9044, which will take the
254     packet to R8. The next-next-hop in the path is R4. To provide
255     protection for the failure of node R8, R3 would need to send the
256     packet to R4 without going through R8.
        However, the only way R3 can
257     learn that the packet needs to go to R4 is to examine the next
258     label in the stack, label 9054. Since R3 knows that R8 has
259     advertised label 9054 as the adjacency segment for the link from R8
260     to R4, R3 knows that a backup path can merge back into the original
261     explicit path at R4.

263  3. Detailed Solution using Context Tables

265     This section provides a detailed description of how to construct
266     node-protecting backup paths for SR-TE paths using context tables.
267     The end result of this description is externally visible forwarding
268     behavior that can be specified as a packet arriving at a PLR with a
269     particular incoming label stack and leaving the PLR on a particular
270     outgoing interface with a particular outgoing label stack. There may
271     be other methods of arriving at the same externally visible
272     forwarding behavior, as described in Section 6.2 of
273     [I-D.ietf-rtgwg-segment-routing-ti-lfa]. It is not the
274     intent of this document to exclude other methods, as long as the
275     externally visible forwarding behavior is the same as produced by
276     this method.

278  3.1. Building Context Tables

280     [RFC5331] introduced the concept of Context Specific Label Spaces and
281     there are various applications making use of this concept. A context
282     label table on a router represents the Label Forwarding Information
283     Base (LFIB) from the point of view of a particular neighbor. Context
284     tables are built by constructing incoming label mappings advertised
285     by the neighbor and the actions corresponding to those labels. The
286     labels advertised by each node are local to the node and may not be
287     unique across the segment routing domain. The context tables are
288     separate tables built on a per-neighbor basis on every node to ensure
289     they represent LFIBs of a particular neighbor.
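
The per-neighbor label interpretation described above can be sketched as
follows. This is only an illustration under stated assumptions, not part of
the mechanism's specification: the Srgb class and the context_in_labels
helper are hypothetical names, an SRGB of [1000-2000] is assumed to contain
1001 labels, and the SRGB ranges and SID indexes are the ones shown in
Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Srgb:
    """An SRGB as a first label plus a label count (hypothetical helper)."""
    base: int
    size: int

    def label_for(self, sid_index: int) -> int:
        """Return the label the SRGB owner expects for a given SID index."""
        if not 0 <= sid_index < self.size:
            raise ValueError(f"SID index {sid_index} is outside the SRGB")
        return self.base + sid_index


# Figure 1: every node uses [1000-2000] except R8, which uses [3000-4000].
DEFAULT_SRGB = Srgb(base=1000, size=1001)
R8_SRGB = Srgb(base=3000, size=1001)


def context_in_labels(neighbor_srgb: Srgb,
                      sid_indexes: Iterable[int]) -> Dict[int, int]:
    """Map each incoming label, as the neighbor sees it, to its SID index."""
    return {neighbor_srgb.label_for(index): index for index in sid_indexes}


# Incoming labels that R8 expects for the Node-SIDs of R4, R5 and R8 itself.
context_r8_labels = context_in_labels(R8_SRGB, [4, 5, 8])
print(context_r8_labels)          # {3004: 4, 3005: 5, 3008: 8}
print(DEFAULT_SRGB.label_for(5))  # 1005, R5's Node-SID label in R1's SRGB
```

Keeping one such mapping per protected neighbor is what lets a PLR interpret
a label exactly as that neighbor would, even when SRGBs differ across the
domain.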

291     When a PLR needs to protect an SR-TE path against the failure of a
292     neighbor N, it creates a context table associated with N. This
293     context table is populated with the following segment routing
294     forwarding entries:

296     - All the Prefix-SIDs of the network. The programmed incoming
297       label map uses the SRGB of N to compute the input label value.
298       The NHLFE (Next Hop Label Forwarding Entry) is then constructed by
299       looking into all the nexthops for the Prefix-SID and choosing a
300       loop-free path as explained in Section 3.2.

302     - All the Adj-SIDs advertised by N. The NHLFE is constructed as
303       explained in Section 3.3.

305     The following sections illustrate how the context table is
306     constructed to allow the PLR to provide node-protecting paths for the
307     next-next hops in the topologies shown in Figure 1 and Figure 3.

309  3.2. Segment protection for Node-SIDs

311     Figure 4 shows the routing table entries on R7 corresponding to the
312     Node-SIDs to reach R1 and R8 for the topology in Figure 1. In the
313     absence of a failure, a packet with a label stack whose top label is
314     1008 will have its top label popped by R7 (assuming PHP behavior),
315     and R7 will forward the packet to R8. When the interface to R8 is
316     down, the backup next-hop entry is used. R7 will pop the top label
317     of 1008, and use the context table that R7 computed for R8 to
318     evaluate the next label on the stack.
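
The primary and backup behavior just described can be sketched as a small
forwarding simulation. This is a hedged illustration, not an implementation
of any router's data plane: the dictionary layout, the action tuples and the
forward function are assumptions made for the example, while the label
values and actions are copied from R7's entries in Figure 4.

```python
from typing import Dict, List, Optional, Tuple

# R7's main table entry for label 1008 (values from Figure 4).
MAIN_R7 = {
    1008: {"primary": ("pop", "R8"), "backup": ("pop_lookup", "context.r8")},
}

# R7's context table for R8 (values from Figure 4).
CONTEXT_R7 = {
    "context.r8": {
        3004: ("swap", 1004, "R1"),
        3005: ("swap", 1005, "R1"),
        3008: ("drop",),
    },
}


def forward(stack: List[int],
            link_up: Dict[str, bool]) -> Optional[Tuple[str, List[int]]]:
    """Return (next hop, outgoing label stack), or None if the packet is dropped."""
    top, rest = stack[0], list(stack[1:])
    entry = MAIN_R7[top]
    _, primary_next_hop = entry["primary"]
    if link_up.get(primary_next_hop, False):
        # Primary: pop the top label and forward to the primary next hop.
        return primary_next_hop, rest
    # Backup: pop the top label and look up the new top label in the
    # context table of the neighbor that is presumed to have failed.
    _, table_name = entry["backup"]
    if not rest:
        return None
    action = CONTEXT_R7[table_name].get(rest[0])
    if action is None or action[0] == "drop":
        return None
    _, out_label, next_hop = action  # only "swap" actions remain in this table
    return next_hop, [out_label] + rest[1:]


print(forward([1008, 3005], {"R8": True}))   # ('R8', [3005])
print(forward([1008, 3005], {"R8": False}))  # ('R1', [1005])
print(forward([1008, 3008], {"R8": False}))  # None
```

The last call shows why the entry for 3008 is a drop: a packet whose next
segment ends at R8 itself cannot be delivered once R8 is presumed down.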

320     R7's Routing Table (partial)
321     Transit routes for Node-SIDs for R1 and R8
322     +=============+=============================================+
323     | In label    | Outgoing label action                       |
324     +=============+=============================================+
325     | 1001        | Primary: pop, fwd to R1                     |
326     |             | Backup: pop, lookup context.r1              |
327     +-------------+---------------------------------------------+
328     | 1008        | Primary: pop, fwd to R8                     |
329     |             | Backup: pop, lookup context.r8              |
330     +-------------+---------------------------------------------+

332     R7's Context Table for R8 (context.r8, partial)
333     +=============+=============================================+
334     | In label    | Outgoing label action                       |
335     +=============+=============================================+
336     | 3004        | swap 1004, fwd to R1                        |
337     +-------------+---------------------------------------------+
338     | 3005        | swap 1005, fwd to R1                        |
339     +-------------+---------------------------------------------+
340     | 3008        | drop                                        |
341     +-------------+---------------------------------------------+

343        Figure 4: Building node-protecting backup paths for SR-TE paths
344        involving Node-SIDs

346     R7 builds the context table for R8 using the following process. R7
347     computes the mapping of incoming label to Node-SID that R8 expects to
348     see based on the SRGB advertised by R8. In the example in Figure 1,
349     R7 can determine that R8 interprets an incoming label of 3005 as
350     mapping to the Node-SID for R5.

352     R7 then computes a loop-free backup path to reach R5 which is node-
353     protecting with respect to the failure of R8. In this example, the
354     backup path computed by R7 to reach R5 without passing through R8 can
355     be achieved by forwarding the packet to R1 with a top label of 1005,
356     corresponding to the Node-SID for R5 in the context of R1's SRGB.
357     The loop-free path computation may be based on a mechanism such as
358     LFA, R-LFA, TI-LFA, or a constraint-based SPF that avoids the failure.
        To
359     populate the context table for R8, R7 maps the out label actions
360     corresponding to the backup path to R5 to the incoming label 3005.
361     This results in the entry for label 3005 shown in context.r8 in
362     Figure 4.

364     Therefore, when a packet arrives at R7 with label stack = [1008,
365     3005], and the link from R7 to R8 has recently failed, R7 will use
366     the backup next-hop entry for label 1008 in its main routing table.
367     Based on this entry, R7 will pop label 1008, and use context.r8 to
368     look up the new top label = 3005. R7 will swap label 3005 for 1005
369     and forward the packet to R1. This will get the packet to R5 on a
370     node-protecting backup path.

372     Note that R7 activates the node-protecting backup path when it
373     detects that the link to R8 has failed. R7 does not know that node
374     R8 has actually failed. However, the node-protecting backup path is
375     computed assuming that the failure of the link to R8 implies that R8
376     has failed.

378  3.3. Segment protection for Adj-SIDs

380     This section gives an example of how to construct node-protecting
381     backup paths when the SR-TE path uses Adj-SIDs. Figure 5 shows some
382     of the routing table entries for R3 corresponding to the sample
383     network shown in Figure 3. When the top label of the label stack is
384     an Adj-SID, the PLR needs to recognize that in order to provide a
385     node-protecting backup path, it needs to pop the top label and
386     examine the next label in the context of the next-hop router
387     identified by the top label Adj-SID. In this example, when R3 is
388     constructing its routing table, it recognizes that label 9044
389     corresponds to a next-hop of R8, so it installs a backup entry,
390     corresponding to the failure of the link to R8, which pops label 9044,
391     and then examines the new top label in the context of R8.
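
The installation step described above, where the PLR pairs each local Adj-SID
route with a backup that defers to the next hop's context table, can be
sketched as below. This is a minimal sketch under assumptions: the function
name, the dictionary layout and the naming of context tables as "context."
plus the lower-cased neighbor name are illustrative choices, and the primary
action is assumed to be a pop, as in the example for R3's label 9044.

```python
from typing import Dict, Tuple

Route = Dict[str, Tuple[str, str]]


def add_node_protecting_backups(adj_sid_next_hops: Dict[int, str]) -> Dict[int, Route]:
    """Build main-table entries for local Adj-SIDs with a context-lookup backup.

    adj_sid_next_hops maps a local Adj-SID label to the neighbor it points to.
    """
    table: Dict[int, Route] = {}
    for label, next_hop in adj_sid_next_hops.items():
        table[label] = {
            # Normal case: pop the Adj-SID and send the packet over the link.
            "primary": ("pop", f"fwd to {next_hop}"),
            # Link to next_hop is down: pop the Adj-SID, then interpret the
            # new top label in the context table built for that neighbor.
            "backup": ("pop", f"lookup context.{next_hop.lower()}"),
        }
    return table


# R3's Adj-SID for the link R3-R8 in Figure 3.
r3_table = add_node_protecting_backups({9044: "R8"})
print(r3_table[9044]["primary"])  # ('pop', 'fwd to R8')
print(r3_table[9044]["backup"])   # ('pop', 'lookup context.r8')
```

The backup entry carries no outgoing label of its own. The outgoing action is
decided only after the second label has been looked up in the neighbor's
context table.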

393     R3's Routing Table (partial)
394     Transit route for Adj-SID
395     +=============+=============================================+
396     | In label    | Outgoing label action                       |
397     +=============+=============================================+
398     | 9044        | Primary: pop, fwd to R8                     |
399     |             | Backup: pop, lookup context.r8              |
400     +-------------+---------------------------------------------+

402     R3's Context Table for R8 (context.r8, partial)
403     +=============+=============================================+
404     | In label    | Outgoing label action                       |
405     +=============+=============================================+
406     | 3005        | swap 1005, fwd to R4                        |
407     +-------------+---------------------------------------------+
408     | 9054        | pop, fwd to R4                              |
409     +-------------+---------------------------------------------+

411        Figure 5: Building node-protecting backup paths for SR-TE paths
412        involving Adj-SIDs

414     R3 constructs its context table for R8 by determining which labels R8
415     expects to receive to accomplish different forwarding actions. The
416     entry for incoming label 3005 in context.r8 in Figure 5 corresponds
417     to a Node-SID. This entry is computed using the methods described in
418     Section 3.2.

420     The entry for incoming label 9054 in context.r8 corresponds to an
421     Adj-SID. R3 recognizes that R8 has advertised this Adj-SID for the
422     link from R8 to R4 in Figure 3. So R3 determines the outgoing label
423     action needed to reach R4 without passing through R8. This can be
424     accomplished by popping the label 9054, and forwarding the packet
425     directly on the link from R3 to R4.

427  3.4. Segment protection for edge nodes

429     The segment protection mechanism described in the previous sections
430     depends on the assumption that the label immediately below the top
431     label in the label stack is understood in the IGP domain. When the
432     provider edge routers exchange service labels via BGP or some other
433     non-IGP mechanism, the bottom label is not understood in the IGP
434     domain.

436     The EPE-SIDs as described in [I-D.ietf-idr-bgpls-segment-routing-epe]
437     are used to choose an egress interface among a set of egress paths.
438     An EPE-SID can be the bottom-most label in an SR-TE path. EPE-SIDs are
439     not understood in the IGP domain. In order to support the procedures
440     described in this document, EPE-SIDs should always be added after the
441     Anycast-SID for the nodes that advertised the EPE-SIDs. The same EPE-SID
442     should be configured on all these Anycast nodes so that in case of
443     node failure, the traffic is correctly forwarded by the other
444     protector nodes. If a Node-SID is used instead of an Anycast-SID
445     above the EPE-SID in the label stack while the procedures in this
446     document are in use, packets may be dropped.

448     The egress node protection mechanisms described in
449     [RFC8679] are applicable to this use case, and no additional changes
450     are required for SR-based networks.

452  3.4.1. Detailed Example for Segment protection for edge nodes

454      sid:1     sid:2     sid:3        sid:4     sid:5
455     1000-2000 1000-2000 1000-2000   1000-2000  1000-2000
456      R2:1024   R3:1034   R8:1044      R5:1064
457                          R4:2014   =========================
458     +----+ 10 +----+ 10 +----+  10   +----+ 10 +----+ Primary
459     | PE1|----| R2 |----| R3 |-------| R4 |-- | PE2| context 1.1.1.1: sid 10
460     +----+    +----+    +----+       +----+    +----+\
461          \                   \      /                 \+-----+
462           \ 10                \ 100/ 60               /| CE1 |
463            \                   \  /                  / +-----+
464             \  +----+         +----+   R4:1054  +-----+
465              +--| R7 |---------| R8 | --------| PE3 |context 1.1.1.1 sid 10
466                 +----+   30    +----+         +-----+ Protector mirror SID 100
467                /  sid:7         sid:8           sid:9
468               /  1000-2000     3000-4000       1000-2000
469              / 10
470         +----+
471         | R6 |
472         +----+
473          sid:6
474         1000-2000

476     R4's Context Table for PE2 (context.PE2, partial)
477     +=============+=============================================+
478     | In label    | Outgoing label action                       |
479     +=============+=============================================+
480     | 1010        | swap 1100(mirror sid), push 1010 fwd to R8  |
481     +-------------+---------------------------------------------+

483        * Numbers on the links represent the symmetric link cost

485        Figure 6: Node protection for edge nodes Adj-SIDs

487     The segment protection mechanisms that are described in previous
488     sections depend on the assumption that the label below the top label
489     in the label stack is understood in the IGP domain. If the edge
490     node goes down, the label below the top label representing the edge
491     node could be a BGP service label or labels representing other
492     applications. The service mirroring use case is described in
493     Section 5.1 of [RFC8402]. Customer edges are multi-homed to provider
494     edges, and one of the PEs acts in the primary role and the other in the
495     protector role. The two PEs advertise a context IP address for each
496     customer site and attach an Anycast-SID to the context.
        The protector PE
497     advertises a Binding SID with the M bit set (Mirror-SID), which implies
498     mirroring capability for the context. The protector PE builds the
499     context table for the BGP service labels advertised by the primary PE
500     for the same context. The BGP service resolves on a transport that
501     has a stack of labels with the context SID at the bottom of the label
502     stack. Any penultimate node of PE2 builds a context table for PE2 as
503     explained in Section 3.1. This context table contains
504     the SID for the context-id, and the output action is to pop the top
505     label and replace it with the Mirror-SID that the protector PE
506     advertised for the context 1.1.1.1. As shown in the example in
507     Section 3.4.1, the SID 10 attached to context-id 1.1.1.1 has been
508     programmed in context.PE2 on the penultimate router R4. The action is
509     to swap 1010 with Mirror-SID 1100 and push 1010, which is PE2's context
510     SID. When the packet reaches PE2, it has a top label of 1100, which is a
511     Mirror-SID (context label) on PE2 and directs the protector PE to look
512     up the context table of the primary PE for the BGP service labels.

514  4. Determining node can be bypassed

516     In certain scenarios, the node in the label stack may represent an
517     important function, such as a firewall filter, which must be performed.
518     Bypassing such a function may cause major security issues. When
519     segment protection mechanisms described in this document are applied,
520     it's possible that if the firewall goes down, traffic is re-routed
521     via the next label in the stack. There are multiple ways this
522     problem could be solved.

524     The procedures described in this document should be optional and
525     should be enabled when devices are configured to apply the procedures
526     and examine the next label in the stack. The feature should be
527     controllable at per-neighbor granularity.
        When certain devices
528     offer a critical function, their neighbors may disable
529     segment protection for the particular neighbor providing
530     the critical function.

532     IGP protocol extensions are proposed in
533     [I-D.li-rtgwg-enhanced-ti-lfa] which define a "no bypass" flag for
534     the SIDs. The nodes that indicate critical functions may advertise
535     SIDs with the "NB" bit set. Segment protection procedures described in
536     this document should not be applied on these SIDs. In case of
537     failure, either link-protecting backup paths can be programmed or
538     packets can be dropped with no protection.

540  5. Hold timers for Node-SID/Prefix-SIDs and Adj-SIDs

542     SR-TE paths may be computed by a controller or by the head-end
543     router. When there is a node failure in the network, the controller
544     or head-end router has to learn about the failure, recompute the
545     label stacks of any affected SR-TE paths, and get the new label
546     stacks programmed into the forwarding plane of the head-end router.
547     This process may be slow compared to the speed with which routers in
548     the network react to the event. After learning about a node failure,
549     the non-PLR routers in the network will no longer be able to compute
550     a path to reach the failed node. If no special precautions are
551     taken, these non-PLR routers will remove the forwarding entries
552     corresponding to the Node-SID and Prefix-SIDs advertised by the failed
553     node. If the head-end router is still sending traffic with that
554     Node-SID/Prefix-SID in the stack, traffic can be blackholed at a non-
555     PLR router. In this case, the node-protection FRR mechanisms do not
556     bring full benefit.

558     In order to solve the above problem, hold timers are recommended.
559     The hold-timer corresponds to the maximum time that a combination of
560     controller and head-end router or a head-end router alone takes to
561     compute and install the label stacks corresponding to new SR-TE paths
562     in the event of a node failure. The hold timers should be applied to
563     forwarding entries for Node-SIDs and Prefix-SIDs that are advertised
564     by a single node in the network. If the Node-SID or Prefix-SID
565     becomes unreachable, the event and resulting forwarding changes
566     should not be communicated to the forwarding planes on all configured
567     routers (including PLRs for the failed node) until the hold-timer
568     expires. The traffic will continue to follow the previous path and
569     get FRR protection on the PLR.

571     A route corresponding to a global Adj-SID advertised by a node that
572     becomes unreachable should also be left in the forwarding table for
573     the duration of the hold-timer.

575     The node-protecting backup forwarding entry on the PLR corresponding
576     to the local Adj-SID from the PLR to the failed node should also be
577     left in the forwarding table for the duration of the hold-timer.

579     The Node-SID/Prefix-SID becoming unreachable is not a single event in
580     the IGP. This unreachability is recognized by combining multiple link-
581     down events from the neighbouring nodes. If these link-down events
582     arrive at different times, the remote nodes converge to a state
583     corresponding to the link-down events. When the Node-SID
584     unreachability is finally recognized in a remote node, the previous
585     state may be that of a link-down event. This could lead to hold-down
586     states that are undesirable.

588     When a network event such as a link-up/link-down or metric change event
589     is received, the IGP schedules an SPF computation. A small configurable
590     delay called spf-delay can be enabled, which will schedule the SPF
591     after the spf-delay time on receiving the first event.
        In case of a
592     node going down, the spf-delay time coupled with fast-flooding can
593     help to accumulate link-down events reported by all neighbors in one
594     single SPF. This mechanism is best effort and does not
595     guarantee that all link-down events are accumulated before SPF is
596     triggered. If there are flooding delays, the SPF might get triggered
597     before receiving all events related to the node going down.

599     The protection mechanisms are expected to work well when there is a
600     single network event. If there are simultaneous network events, the
601     protection mechanisms do not guarantee that the traffic will not be
602     impacted. When a node is running a hold-down timer and is holding
603     Node-SID and other routes in the forwarding plane, and another
604     link-down/link-up or metric change event is received, the hold-
605     down should be aborted and the global convergence procedures should
606     be executed.

608  5.1. Interaction with micro-loop avoidance

610     During network convergence, the micro-loop avoidance mechanisms as
611     described in [I-D.bashandy-rtgwg-segment-routing-uloop] may be
612     applied. For the failed node, all the nodes in the network should
613     consistently detect the failure and maintain the pre-failure shortest
614     path in the forwarding plane so that the traffic can follow the pre-
615     failure shortest path and take the node-protecting backup path at the
616     PLR of the failed node.

618  6. Optimization Considerations

620     The solution described in this document requires that a PLR build a
621     context table for each neighbor for which node-protection is desired.
622     The context table for each protected neighbor needs to contain route
623     entries for all of the Prefix-SIDs in the network, as well as the
624     route entries corresponding to the Adj-SIDs advertised by the
625     protected neighbor. Although the scale of an IGP domain is limited,
626     this may result in considerable additional memory consumption on the
627     routers.
   It is possible to take advantage of an optimization that allows the
   PLR to avoid creating context tables when all of the nodes in the
   network advertise the same Segment Routing Global Block (SRGB) and
   all Adj-SIDs in the network are advertised as global Adj-SIDs.

   In this case, all labels in the stack representing an SR-TE path are
   globally unique.  Protection for node failure cases in such a
   deployment can be achieved by doing a lookup of the first label and
   potentially a second lookup of the second label using a common route
   table with primary and backup entries for all Prefix-SIDs as well as
   for all of the global Adj-SIDs.

   The primary route entries for global Adj-SIDs not advertised by the
   PLR will be the shortest path to the node advertising the global
   Adj-SID.  The backup route entries for these global Adj-SIDs will
   generally correspond to the node-protecting backup path to the node
   advertising the global Adj-SID.  However, for a global Adj-SID
   advertised by a direct neighbor of the PLR, the node-protecting
   backup route entry will correspond to the backup path to the node on
   the far end of the Adj-SID.

   With the common route table constructed in this manner, when the PLR
   receives a packet whose first label is a global Adj-SID advertised by
   the failed neighbor of the PLR, the lookup of the first label will
   produce the correct backup path directly.  When the PLR receives a
   packet whose first label is the Node-SID of the failed neighbor, or
   an Adj-SID advertised by the PLR corresponding to the failed
   neighbor, the route entry will instruct the PLR to look up the second
   label using the common route table.
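   The lookup behavior described above can be sketched as follows.  This
   is a minimal illustration under stated assumptions, not a definitive
   implementation: Hop, Route, forward, and R2_TABLE are hypothetical
   names, the table loosely mirrors R2's partial routing table in
   Section 6.1, and labels are written as the common SRGB base (400000)
   plus the SID value.  A backup of None models "pop the label and look
   up the next label in the same table".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Hop:
    next_hop: str                                        # neighbor to forward to
    out_labels: List[int] = field(default_factory=list)  # replaces the top label


@dataclass
class Route:
    primary: Hop
    backup: Optional[Hop]  # None models "pop, then look up the next label"


def forward(table: Dict[int, Route], stack: List[int],
            failed: Set[str]) -> Tuple[str, List[int]]:
    """Return (next hop, outgoing label stack) for a packet at the PLR."""
    route, rest = table[stack[0]], stack[1:]
    if route.primary.next_hop not in failed:
        return route.primary.next_hop, route.primary.out_labels + rest
    if route.backup is not None:
        # The first lookup already yields the backup path.
        return route.backup.next_hop, route.backup.out_labels + rest
    if not rest:
        # The popped label was bottom of stack: IP lookup (not modeled here).
        return "ip-lookup", []
    # Second lookup, on the label beneath the popped one, in the same table.
    second = table[rest[0]]
    if second.primary.next_hop not in failed:
        hop = second.primary
    else:
        hop = second.backup
    if hop is None:
        raise LookupError("no usable entry for the second label")
    return hop.next_hop, hop.out_labels + rest[1:]


# Hypothetical common route table on R2, with neighbor R3 node-protected.
R2_TABLE = {
    401003: Route(Hop("R3"), None),                # Node-SID of R3
    402203: Route(Hop("R3"), None),                # Adj-SID from R2 to R3
    401007: Route(Hop("R6", [401007]),             # Node-SID of R7
                  Hop("R1", [401005, 401007])),
}

print(forward(R2_TABLE, [401003, 401007], failed=set()))   # ('R3', [401007])
print(forward(R2_TABLE, [401003, 401007], failed={"R3"}))  # ('R6', [401007])
```

   Using None for the "pop and look up" backup keeps the sketch short; a
   real forwarding plane would encode this as an explicit action in the
   route entry.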
   Finally, when the PLR receives a packet whose first label is a global
   Adj-SID or a Node-SID advertised by a node that is neither the PLR
   nor the failed neighbor, the usual link-protecting backup path will
   be produced based on a lookup of the first label only.

6.1.  Segment Protection Example with Common SRGB

    Node             Node             Node             Node             Node
  sid:1000         sid:1001         sid:1002         sid:1003         sid:1004
   +----+2001 1 2100+----+2102 1 2201+----+2203 1 2302+----+2304 1 2403+----+
   | R0 |-----------| R1 |-----------| R2 |-----------| R3 |-----------| R4 |
   +----+           +----+           +----+           +----+           +----+
     \ 2005                              \ 2206       / 2306         2407 |
      \                                   \          /                    |
       \ 1                                 \ 10     / 6                 1 |
        \                                   \      /                      |
         \                              2602 \    / 2603             2704 |
          \ 2500+----+ 2506              2605+----+2607            2706+----+
           +----| R5 |-----------------------| R6 |--------------------| R7 |
                +----+           3           +----+         1          +----+
                 Node                         Node                      Node
               sid:1005                     sid:1006                  sid:1007

   * Numbers on the links represent the symmetric link cost
   * All nodes have SRGB = [400000-405000] size 5000

   R2's Routing Table (partial)

   +=============+=============================================+
   | In label    | Outgoing label action                       |
   +=============+=============================================+
   | 401003      | Primary: pop, fwd to R3                     |
   |             | Backup: pop, lookup ILM table or IP table   |
   |             |         based on BOS bit                    |
   +-------------+---------------------------------------------+
   | 401007      | Primary: swap 401007, fwd to R6             |
   |             | Backup: swap 401007, push 401005 (top),     |
   |             |         fwd to R1                           |
   +-------------+---------------------------------------------+
   | 402203      | Primary: pop, fwd to R3                     |
   |             | Backup: pop, lookup ILM table or IP table   |
   |             |         based on BOS bit                    |
   +-------------+---------------------------------------------+

   Label Stack 1:
   +-------------+
   |401003 (top) |
   +-------------+
   |   401007    |
   +-------------+

   Label Stack 2:
   +-------------+
   |401003 (top) |
   +-------------+
   |   401007    |
   +-------------+

                       Figure 7: Common SRGB

   Figure 7 shows an example where the optimized segment protection
   mechanism is deployed.  All the nodes have a common SRGB of 400000 to
   405000.  The Node-SIDs are 1000, 1001, 1002, and so on, and the
   global Adj-SIDs are 2001, 2005, and so on.  Each label is the SRGB
   base plus the SID value, so the labels fit in the 20-bit MPLS label
   space.  R2's partial ILM table, consisting of primary and backup next
   hops, is also shown in the figure.  The Node-SID of R3, which is
   represented by label 401003, has a primary next hop pointing to R3
   and a backup next hop that pops the label and looks up the next label
   in the packet in the ILM table.  For example, consider a path from R0
   to R7 with a label stack consisting of 401003 and 401007.  When node
   R3 fails, R2, which is the PLR, pops label 401003 and looks up the
   next label in the same table.  The next label in this example is
   401007.  Based on the primary next hop for 401007, traffic is
   forwarded to R6.  Another example label stack contains the global
   Adj-SID 402203 (the Adj-SID from R2 to R3).  As shown in the partial
   ILM table on R2, 402203 also has a backup next hop that pops the
   label and looks up the next label in the packet.  On R3's failure,
   traffic is forwarded via R6.

7.  Alternate path protection mechanisms

   This document describes protection mechanisms for the case where
   nodes that are mid-points in an SR-TE path fail.  The solution
   described here focuses on triggering protection locally on the point
   of local repair.  There are other path protection mechanisms that
   provide end-to-end path protection.  In an end-to-end path protection
   mechanism, path liveness is monitored using liveness detection
   protocols such as S-BFD [RFC7880].  A backup path is pre-programmed
   on the head end of the SR-TE path.  When the S-BFD session running on
   a particular SR-TE path detects a path failure, the head end of the
   SR-TE path switches the traffic from the primary path to the backup
   path.
   The granularity of the failure detection timers configured on the
   head end depends on the scale of SR-TE tunnels on the device and also
   on the capability of the device to support fast switchover.

8.  Operational Considerations

   The procedures described in this document should be configurable and
   applied only when enabled explicitly.  In order to satisfy the
   scenarios described in Section 4, the feature should be controllable
   on a per-neighbor basis.  The optimization procedures described in
   Section 6 should be applied only when the entire network has a common
   SRGB and all nodes advertise global Adj-SIDs.  This optimization
   should be applied based on explicit configuration.

9.  Security Considerations

   The procedures described in this document will in most common cases
   be deployed inside a single-ownership IGP domain.  No new security
   risks are exposed by the procedures described in this document.  The
   security considerations for SR-MPLS with label stacking, which are
   described in detail in [RFC8402], are applicable to this document as
   well.  This document introduces the context table lookup for the
   labels in the label stack.  As described in [RFC8402], MPLS packet
   filtering at the boundaries ensures that operations on the MPLS
   labels inside the domain, including the context table lookup
   operation, are safe.  The security procedures applicable to IGP
   protocols are also applicable to the segment routing extensions
   described in [RFC8667] and [RFC8665], and ensure the required
   protection for the segment protection procedures described in this
   document.

10.  IANA Considerations

11.  Acknowledgments

   The authors would like to thank Peter Psenak, Bruno Decraene,
   Alexander Vainshtein, Huzibo, Dhruv Dhody, and Ketan Talaulikar for
   their review and suggestions.  Thanks to Bharath R for suggesting
   Node-SID hold-down mechanisms.

12.  References

12.1.  Normative References

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

   [RFC5331]  Aggarwal, R., Rekhter, Y., and E. Rosen, "MPLS Upstream
              Label Assignment and Context-Specific Label Space",
              RFC 5331, DOI 10.17487/RFC5331, August 2008.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018.

12.2.  Informative References

   [I-D.bashandy-rtgwg-segment-routing-uloop]
              Bashandy, A., Filsfils, C., Litkowski, S., Decraene, B.,
              Francois, P., and P. Psenak, "Loop avoidance using Segment
              Routing", Work in Progress, Internet-Draft,
              draft-bashandy-rtgwg-segment-routing-uloop-12,
              22 December 2021.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "Border Gateway Protocol - Link State
              (BGP-LS) Extensions for Segment Routing BGP Egress Peer
              Engineering", Work in Progress, Internet-Draft,
              draft-ietf-idr-bgpls-segment-routing-epe-19, 16 May 2019.

   [I-D.ietf-rtgwg-segment-routing-ti-lfa]
              Litkowski, S., Bashandy, A., Filsfils, C., Francois, P.,
              Decraene, B., and D. Voyer, "Topology Independent Fast
              Reroute using Segment Routing", Work in Progress,
              Internet-Draft,
              draft-ietf-rtgwg-segment-routing-ti-lfa-08,
              21 January 2022.

   [I-D.li-rtgwg-enhanced-ti-lfa]
              Li, C., Hu, Z., Zhu, Y., and S. Hegde, "Enhanced Topology
              Independent Loop-free Alternate Fast Re-route", Work in
              Progress, Internet-Draft,
              draft-li-rtgwg-enhanced-ti-lfa-05, 21 October 2021.

   [RFC7880]  Pignataro, C., Ward, D., Akiya, N., Bhatia, M., and S.
              Pallagatti, "Seamless Bidirectional Forwarding Detection
              (S-BFD)", RFC 7880, DOI 10.17487/RFC7880, July 2016.

   [RFC8102]  Sarkar, P., Ed., Hegde, S., Bowers, C., Gredler, H., and
              S. Litkowski, "Remote-LFA Node Protection and
              Manageability", RFC 8102, DOI 10.17487/RFC8102, March
              2017.

   [RFC8665]  Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler,
              H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing", RFC 8665,
              DOI 10.17487/RFC8665, December 2019.

   [RFC8667]  Previdi, S., Ed., Ginsberg, L., Ed., Filsfils, C.,
              Bashandy, A., Gredler, H., and B. Decraene, "IS-IS
              Extensions for Segment Routing", RFC 8667,
              DOI 10.17487/RFC8667, December 2019.

   [RFC8679]  Shen, Y., Jeganathan, M., Decraene, B., Gredler, H.,
              Michel, C., and H. Chen, "MPLS Egress Protection
              Framework", RFC 8679, DOI 10.17487/RFC8679, December 2019.

Authors' Addresses

   Shraddha Hegde
   Juniper Networks Inc.
   Exora Business Park
   Bangalore 560103
   KA
   India
   Email: shraddha@juniper.net

   Chris Bowers
   Juniper Networks Inc.
   Email: cbowers@juniper.net

   Stephane Litkowski
   Cisco Systems
   Email: slitkows.ietf@gmail.com

   Xiaohu Xu
   Alibaba Inc.
   Beijing
   China
   Email: xiaohu.xxh@alibaba-inc.com

   Feng Xu
   Tencent
   China
   Email: oliverxu@tencent.com