Network Working Group                                   C. Filsfils, Ed.
Internet-Draft                                           S. Previdi, Ed.
Intended status: Informational                       Cisco Systems, Inc.
Expires: June 22, 2018                                       B. Decraene
                                                                  Orange
                                                               R. Shakir
                                                                  Google
                                                       December 19, 2017

                Resiliency use cases in SPRING networks
               draft-ietf-spring-resiliency-use-cases-12

Abstract

   This document identifies and describes the requirements for a set of
   use cases related to network resiliency on Segment Routing (SPRING)
   networks.

Requirements Language

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT" and "MAY" in
   this document are used to define requirements for protocol and
   architecture design.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 22, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Path Protection
   3.  Management-free Local Protection
     3.1.  Management-free Bypass Protection
     3.2.  Management-free Shortest Path Based Protection
   4.  Managed Local Protection
     4.1.  Managed Bypass Protection
     4.2.  Managed Shortest Path Protection
   5.  Loop Avoidance
   6.  Co-existence of multiple resilience techniques in the same
       infrastructure
   7.  Security Considerations
   8.  IANA Considerations
   9.  Manageability Considerations
   10. Contributors
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document reviews various use cases for the protection of
   services in a SPRING network.  The terminology used hereafter is in
   line with [RFC5286] and [RFC5714].

   The resiliency use cases described in this document can be applied
   not only to traffic that is forwarded according to the SPRING
   architecture but also to traffic that originally is forwarded using
   other paradigms such as LDP signalling or pure IP traffic (IP routed
   traffic).

   Three key alternatives are described: path protection, local
   protection without operator management and local protection with
   operator management.

   Path protection lets the ingress node be in charge of the failure
   recovery, as discussed in Section 2.

   The rest of the document focuses on approaches where protection is
   performed by the node adjacent to the failed component, commonly
   referred to as local protection techniques or Fast Reroute techniques
   ([RFC5286], [RFC5714]).

   In Section 3 we discuss two different approaches providing unmanaged
   local protection, namely link/node bypass protection and shortest
   path based protection.

   Section 4 illustrates a case allowing the operator to manage the
   local protection behavior in order to accommodate specific policies.

   In Section 5 we discuss the opportunity for the SPRING architecture
   to provide loop-avoidance mechanisms, such that transient forwarding
   state inconsistencies during routing convergence do not lead to
   traffic loss.

   The purpose of this document is to illustrate the different use cases
   and explain how an operator could combine them in the same network
   (see Section 6).  Solutions are not defined in this document.

                      B------C------D------E
                     /|      | \  / | \  / |\
                    / |      |  \/  |  \/  | \
                   A  |      |  /\  |  /\  |  Z
                    \ |      | /  \ | /  \ | /
                     \|      |/    \|/    \|/
                      F------G------H------I

                       Figure 1: Reference topology

   We use Figure 1 as a reference topology throughout the document.
   The following link metrics are applied:

      Link metrics are bidirectional.  In other words, the same metric
      value is configured at both sides of each link.

      Links from/to A and Z are configured with a metric of 100.

      CH, GD, DI and HE links are configured with a metric of 6.

      All other links are configured with a metric of 5.

2.  Path Protection

   As a reminder, one of the major network operator requirements is path
   disjointness capability.  Network operators have deployed
   infrastructures with topologies that allow paths to be computed in a
   completely disjoint fashion, where two paths do not share any
   component (link or router), hence allowing an optimal protection
   strategy.

   A first protection strategy excludes any local repair and instead
   uses end-to-end path protection, where each SPRING path is protected
   by a second, disjoint SPRING path.  In this case, local protection is
   not used along the path.

   For example, a Pseudo Wire (PW) from A to Z can be "path protected"
   in the direction A to Z in the following manner: the operator
   configures two SPRING paths T1 (primary) and T2 (backup) from A to Z.

   The two paths may be used:

   o  concurrently, where the ingress router sends the same traffic over
      the primary and secondary path.  This is usually known as 1+1
      protection.

   o  concurrently, where the ingress router splits the traffic over the
      primary and secondary path.  This is usually known as equal cost
      multi path (ECMP) or unequal cost multi path (UCMP).

   o  as a primary and backup path, where the secondary path is used
      only when the primary has failed.  This is usually known as 1:1
      protection.

   T1 is established over path {AB, BC, CD, DE, EZ} as the primary path
   and T2 is established over path {AF, FG, GH, HI, IZ} as the backup
   path.  The two paths MUST be disjoint in their links, nodes and
   shared risk link groups (SRLGs) to satisfy the requirement of
   disjointness.

   In the case of primary/backup paths, when the primary path T1 is up,
   the packets of the PW are sent on T1.  When T1 fails, the packets of
   the PW are sent on backup path T2.  When T1 comes back up, the
   operator either allows for an automated reversion of the traffic onto
   T1 or selects an operator-driven reversion.  Typically, the
   switchover from path T1 to path T2 is done in a fast reroute fashion
   (e.g., in the sub-50 millisecond range), but depending on the service
   that needs to be delivered, other restoration times may be used.

   It is essential that any path, primary or backup, benefit from
   end-to-end liveness monitoring/verification.  The method and
   mechanisms that provide such liveness checks are outside the scope of
   this document.  An example is given by [RFC5880].

   There are multiple options for liveness checks, e.g., path liveness,
   where the path is monitored at the network level (either by the
   head-end node or by a network controller/monitoring system).  Another
   possible approach consists of a service-based path monitored by the
   service instance (verifying reachability of the endpoint).  All these
   options are given here as examples.  While this document does express
   the requirement for a liveness mechanism, it does not mandate, nor
   define, any specific one.

   From a SPRING viewpoint, we would like to highlight the following
   requirements:

   o  The SPRING architecture MUST provide a way to compute paths that
      are not protected by local repair techniques (as illustrated in
      the example of paths T1 and T2).

   o  The SPRING architecture MUST provide a way to instantiate pairs of
      disjoint paths on a topology based on a protection strategy (link,
      node or SRLG protection) and allow the validation or
      re-computation of these paths upon network events.

   o  The SPRING architecture MUST provide end-to-end liveness check of
      SPRING based paths.

3.  Management-free Local Protection

   This section describes two alternatives providing local protection
   without requiring operator management, namely bypass protection and
   shortest-path based protection.
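
   As a rough illustration of these two alternatives, the following
   Python sketch models the reference topology of Figure 1 and computes,
   for a failure of link CD seen at node C, a bypass repair (shortest
   path to the next-hop D that avoids CD) and a shortest path based
   repair (shortest path to destination Z that avoids CD).  The LINKS
   table and the shortest_path() helper are hypothetical names
   introduced only for this sketch; they are not part of any SPRING
   specification, and the link list is an assumption transcribed from
   Figure 1 and its metric description.

```python
import heapq

# Assumed transcription of Figure 1: links to/from A and Z cost 100,
# the four diagonal links (CH, GD, DI, HE) cost 6, all others cost 5.
# Metrics are bidirectional, so each link is listed once.
LINKS = {
    ("A", "B"): 100, ("A", "F"): 100, ("E", "Z"): 100, ("I", "Z"): 100,
    ("C", "H"): 6, ("G", "D"): 6, ("D", "I"): 6, ("H", "E"): 6,
    ("B", "C"): 5, ("C", "D"): 5, ("D", "E"): 5,
    ("F", "G"): 5, ("G", "H"): 5, ("H", "I"): 5,
    ("B", "F"): 5, ("C", "G"): 5, ("D", "H"): 5, ("E", "I"): 5,
}


def shortest_path(src, dst, failed_link=None):
    """Dijkstra over LINKS, optionally ignoring one failed link.

    Returns (cost, path) or None if dst is unreachable.
    """
    failed = frozenset(failed_link) if failed_link else None
    adjacency = {}
    for (u, v), metric in LINKS.items():
        if failed == frozenset((u, v)):
            continue  # the failed link is removed in both directions
        adjacency.setdefault(u, []).append((v, metric))
        adjacency.setdefault(v, []).append((u, metric))

    queue = [(0, [src])]  # entries are (cost so far, path so far)
    visited = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, path + [neighbor]))
    return None


# Primary path from C to Z while everything is up.
print(shortest_path("C", "Z"))
# Bypass repair: reach the next-hop D without using link CD.
print(shortest_path("C", "D", failed_link=("C", "D")))
# Shortest path based repair: reach Z directly without using link CD.
print(shortest_path("C", "Z", failed_link=("C", "D")))
```

   With the metrics as transcribed above, the bypass candidates
   {CH, HD} and {CG, GD} both cost 11, so which of them the sketch
   returns depends only on its tie-breaking, whereas the shortest path
   based repair toward Z is {CH, HI, IZ} with a cost of 111.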

   For example, traffic from A to Z, transported over the shortest
   paths provided by the SPRING architecture, benefits from
   management-free local protection by having each node along the path
   automatically pre-compute and pre-install a backup path for the
   destination Z.  Upon local detection of the failure, the traffic is
   repaired over the backup path in sub-50 milliseconds.  When the
   primary path comes back up, the operator either allows for an
   automated reversion of the traffic onto it or selects an
   operator-driven reversion.

   The backup path computation SHOULD support the following
   requirements:

   o  100% link, node, and SRLG protection in any topology.

   o  Automated computation by the IGP.

   o  Selection of the backup path so as to minimize the chance of
      transient congestion and/or delay during the protection period, as
      reflected by the IGP metric configuration in the network.

3.1.  Management-free Bypass Protection

   One way to provide local repair is to enforce a fail-over along the
   shortest path around the failed component.

   In case of link protection, the point of local repair will create a
   repair path avoiding the protected link and merging back to the
   primary path at the next-hop.

   In case of node protection, the repair path will avoid the protected
   node and merge back to the primary path at the next-next-hop.

   In case of SRLG protection, the repair path will avoid members of the
   same group and merge back to the primary path just after.

   In our example, C protects destination Z against a failure of the CD
   link by enforcing the traffic over the bypass {CH, HD}.  The
   resulting end-to-end path between A and Z, upon recovery against the
   failure of CD, is depicted in Figure 2.

                      B * * *C------D * * *E
                     *|      | *  / * \  / |*
                    * |      |  */  *  \/  | *
                   A  |      |  /*  *  /\  |  Z
                    \ |      | /  * * /  \ | /
                     \|      |/    **/    \|/
                      F------G------H------I

                Figure 2: Bypass protection around link CD

   When the primary path comes back up, the operator either allows for
   an automated reversion of the traffic onto the primary path or
   selects an operator-driven reversion.

3.2.  Management-free Shortest Path Based Protection

   An alternative protection strategy consists of management-free local
   protection, aiming at providing a repair for the destination based on
   the shortest path to the destination.

   In our example, C protects Z, which it initially reaches via CD, by
   enforcing the traffic over its shortest path to Z, considering the
   failure of the protected component.  The resulting end-to-end path
   between A and Z, upon recovery against the failure of CD, is depicted
   in Figure 3.

                      B * * *C------D------E
                     *|      | *  / | \  / |\
                    * |      |  */  |  \/  | \
                   A  |      |  /*  |  /\  |  Z
                    \ |      | /  * | /  \ | *
                     \|      |/    *|/    \|*
                      F------G------H * * *I

             Figure 3: Shortest path protection around link CD

   When the primary path comes back up, the operator either allows for
   an automated reversion of the traffic onto the primary path or
   selects an operator-driven reversion.

4.  Managed Local Protection

   There may be cases where a management-free repair does not fit the
   policy of the operator.  For example, in our illustration, the
   operator may not want to have CD and CH used to protect each other
   because the bandwidth available on each link might not suffice to
   absorb the other link's traffic.

   In this context, the protection mechanism MUST support the explicit
   configuration of the backup path either in the form of high-level
   constraints (end at the next-hop, end at the next-next-hop, minimize
   this metric, avoid this SRLG...) or in the form of an explicit path.
   Upon local detection of the failure, the traffic is repaired over the
   backup path in sub-50 milliseconds.  When the primary path comes back
   up, the operator either allows for an automated reversion of the
   traffic onto it or selects an operator-driven reversion.

   We discuss such aspects for both bypass and shortest path based
   protection schemes.

4.1.  Managed Bypass Protection

   Let us illustrate the case using our reference example.  For the
   demand from A to Z, the operator does not want to use the shortest
   failover path to the next-hop, {CH, HD}, but rather the path {CG, GH,
   HD}, as illustrated in Figure 4.

                      B * * *C------D * * *E
                     *|      * \  / * \  / |*
                    * |      *  \/  *  \/  | *
                   A  |      *  /\  *  /\  |  Z
                    \ |      * /  \ * /  \ | /
                     \|      */    \*/    \|/
                      F------G * * *H------I

                    Figure 4: Managed Bypass Protection

   It SHOULD be possible to compute the repair path in an automated
   fashion as well as to express it statically at the point of local
   repair.

4.2.  Managed Shortest Path Protection

   In the case of shortest path protection, the operator does not want
   to use the shortest failover via link CH, but rather reach H via {CG,
   GH}, for example, due to delay, bandwidth, SRLG or other constraints.

   The resulting end-to-end path upon activation of the protection is
   illustrated in Figure 5.

                      B * * *C------D------E
                     *|      * \  / | \  / |\
                    * |      *  \/  |  \/  | \
                   A  |      *  /\  |  /\  |  Z
                    \ |      * /  \ | /  \ | *
                     \|      */    \|/    \|*
                      F------G * * *H * * *I

                Figure 5: Managed Shortest Path Protection

   It SHOULD be possible to compute the repair path in an automated
   fashion as well as to express it statically at the point of local
   repair.

   The computation of the repair path based on a specific constraint
   SHOULD be possible on a per-destination prefix basis.

5.  Loop Avoidance

   "Transient routing inconsistencies" are an inherent part of routing
   protocol behavior.  They arise because routing convergence happens in
   each node at a different time and takes a different amount of time.

   These inconsistencies may cause routing loops that last as long as it
   takes for the nodes impacted by a network event to converge.  These
   loops are called "microloops".

   Usually, in normal routing protocol operations, microloops do not
   last long and in general they are only noticed during the time it
   takes the network to converge.  However, with the emergence of
   fast-convergence and fast-reroute technologies, microloops can be an
   issue in networks where sub-50 millisecond convergence/reroute is
   required.  Therefore, the microloop problem needs to be addressed.

   Networks may be affected by microloops during convergence depending
   on their topologies.  Detecting microloops can be done during
   topology computation (e.g., SPF computation) and therefore
   microloop-avoidance techniques may be applied.  An example of such a
   technique is to compute a microloop-free path that would be used
   during network convergence.

   The SPRING architecture SHOULD provide solutions to prevent the
   occurrence of microloops during convergence following a change in the
   network state.  Traditionally, the lack of packet steering capability
   made it difficult to apply efficient solutions to microloops.  A
   SPRING enabled router could take advantage of the increased packet
   steering capabilities offered by SPRING in order to steer packets in
   a way that packets do not enter such loops.

6.  Co-existence of multiple resilience techniques in the same
    infrastructure

   The operator may want to support several very different services on
   the same packet-switching infrastructure.  As a result, the SPRING
   architecture SHOULD allow for the co-existence of the different use
   cases listed in this document, in the same network.

   Let us illustrate this with the following example:

   o  Flow F1 is supported over path {C, CD, E}

   o  Flow F2 is supported over path {C, CD, I}

   o  Flow F3 is supported over path {C, CD, Z}

   o  Flow F4 is supported over path {C, CD, Z}

   It should be possible for the operator to configure the network to
   achieve path protection for F1, management-free shortest path local
   protection for F2, managed protection over path {CG, GH, Z} for F3,
   and management-free bypass protection for F4.

7.  Security Considerations

   This document describes requirements for the SPRING architecture to
   provide resiliency in SPRING networks.  As such, it does not
   introduce any new security considerations beyond those discussed in
   [RFC7855].

8.  IANA Considerations

   This document does not request any IANA allocations.

9.  Manageability Considerations

   This document provides use cases.  Solutions aimed at supporting
   these use cases should provide the necessary mechanisms in order to
   allow for manageability as described in [RFC7855].

   Manageability concerns the computation, installation and
   troubleshooting of the repair path.  Also, the necessary mechanisms
   SHOULD be provided in order for the operator to control when a repair
   path is computed, how it has been computed and whether it is
   installed and used.

10.  Contributors

   Pierre Francois contributed to the writing of the first version of
   this document.

11.  Acknowledgements

   The authors would like to thank Stephane Litkowski and Alexander
   Vainshtein for their comments and review of this document.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7855]  Previdi, S., Ed., Filsfils, C., Ed., Decraene, B.,
              Litkowski, S., Horneffer, M., and R. Shakir, "Source
              Packet Routing in Networking (SPRING) Problem Statement
              and Requirements", RFC 7855, DOI 10.17487/RFC7855, May
              2016.

12.2.  Informative References

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",
              RFC 5714, DOI 10.17487/RFC5714, January 2010.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

Authors' Addresses

   Clarence Filsfils (editor)
   Cisco Systems, Inc.
   Brussels
   BE

   Email: cfilsfil@cisco.com


   Stefano Previdi (editor)
   Cisco Systems, Inc.
   Via Del Serafico, 200
   Rome  00142
   Italy

   Email: stefano@previdi.net


   Bruno Decraene
   Orange
   FR

   Email: bruno.decraene@orange.com


   Rob Shakir
   Google, Inc.
   1600 Amphitheatre Parkway
   Mountain View, CA  94043

   Email: robjs@google.com