Network Working Group                                  C. Filsfils, Ed.
Internet-Draft                                      Cisco Systems, Inc.
Intended status: Standards Track                       P. Francois, Ed.
Expires: April 24, 2015                                  IMDEA Networks
                                                              S. Previdi
                                                     Cisco Systems, Inc.
                                                             B. Decraene
                                                            S. Litkowski
                                                                  Orange
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                            I. Milojevic
                                                          Telekom Srbija
                                                               R. Shakir
                                                         British Telecom
                                                                 S. Ytti
                                                                  TDC Oy
                                                           W. Henderickx
                                                          Alcatel-Lucent
                                                             J. Tantsura
                                                                 S. Kini
                                                                Ericsson
                                                               E. Crabbe
                                                              Individual
                                                        October 21, 2014


                        Segment Routing Use Cases
            draft-filsfils-spring-segment-routing-use-cases-01

Abstract

   Segment Routing (SR) leverages the source routing and tunneling
   paradigms.  A node steers a packet through a controlled set of
   instructions, called segments, by prepending the packet with an SR
   header.  A segment can represent any instruction, topological or
   service-based.  SR allows a flow to be enforced through any
   topological path and service chain while maintaining per-flow state
   only at the ingress node of the SR domain.

   The Segment Routing architecture can be directly applied to the MPLS
   dataplane with no change on the forwarding plane.  It requires minor
   extensions to the existing link-state routing protocols.  Segment
   Routing can also be applied to IPv6 with a new type of routing
   extension header.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 24, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Companion Documents
     1.2.  Editorial simplification
   2.  IGP-based MPLS Tunneling
   3.  Fast Reroute
     3.1.  Protecting node and adjacency segments
     3.2.  Protecting a node segment upon the failure of its
           advertising node
       3.2.1.  Advertisement of the Mirroring Capability
       3.2.2.  Mirroring Table
       3.2.3.  LFA FRR at the Point of Local Repair
       3.2.4.  Modified IGP Convergence upon Node deletion
       3.2.5.  Conclusions
   4.  Traffic Engineering
     4.1.  Traffic Engineering without Bandwidth Admission Control
       4.1.1.  Anycast Node Segment
       4.1.2.  Distributed CSPF-based Traffic Engineering
       4.1.3.  Egress Peering Traffic Engineering
       4.1.4.  Deterministic non-ECMP Path
       4.1.5.  Load-balancing among non-parallel links
     4.2.  Traffic Engineering with Bandwidth Admission Control
       4.2.1.  Capacity Planning Process
       4.2.2.  SDN/SR use-case
       4.2.3.  Residual Bandwidth
   5.  Service chaining
   6.  OAM
     6.1.  Monitoring a remote bundle
     6.2.  Monitoring a remote peering link
   7.  IANA Considerations
   8.  Manageability Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The objective of this document is to illustrate the properties and
   benefits of the SR architecture through the documentation of various
   SR use-cases.

   Section 2 illustrates the ability to tunnel traffic towards remote
   service points without any other protocol than the IGP.
   Section 3 reports various FRR use-cases leveraging the SR
   functionality.

   Section 4 documents traffic-engineering use-cases, with and without
   support of bandwidth admission control.

   Section 5 documents the use of SR to perform service chaining.

   Section 6 illustrates OAM use-cases.

1.1.  Companion Documents

   The main reference for this document is the SR architecture defined
   in [I-D.filsfils-spring-segment-routing].

   The SR instantiation in the MPLS dataplane is described in
   [I-D.filsfils-spring-segment-routing-mpls].

   [I-D.filsfils-spring-segment-routing-ldp-interop] documents the co-
   existence and interworking with MPLS signaling protocols.

   IS-IS protocol extensions for Segment Routing are described in
   [I-D.ietf-isis-segment-routing-extensions].

   OSPF protocol extensions for Segment Routing are defined in
   [I-D.ietf-ospf-segment-routing-extensions].

   Fast Reroute for Segment Routing is described in
   [I-D.francois-spring-segment-routing-ti-lfa].

   The PCEP protocol extensions for Segment Routing are defined in
   [I-D.sivabalan-pce-segment-routing].

   The SR instantiation in the IPv6 dataplane will be described in a
   future draft.

1.2.  Editorial simplification

   A unique index is allocated to each IGP Prefix Segment.  The
   absolute segment associated with an IGP Prefix SID is determined by
   summing the index and the base of the SRGB.  In the SR architecture,
   each node can be configured with a different SRGB and hence the
   absolute SID associated with an IGP Prefix Segment can change from
   node to node.

   We have described the first use-case of this document in the most
   generic way, i.e. with a different SRGB at each node in the SR IGP
   domain.  We have detailed the packet path, highlighting that the SID
   of a Prefix Segment may change hop by hop.
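   As a concrete illustration of the index-plus-SRGB-base arithmetic
   described above, the following minimal sketch computes the absolute
   SID that each node associates with a Prefix Segment.  The SRGB bases
   and the index are the illustrative values used in the example of
   Section 2 of this document:

```python
# Illustrative, operator-allocated SRGB base per node (advertised by
# the IGP); the values match the example of Section 2 of this document.
SRGB_BASE = {"PE1": 100, "P1": 200, "P2": 300,
             "P3": 400, "P4": 500, "PE2": 600}

def absolute_sid(node, index):
    """The absolute SID a node associates with an IGP Prefix Segment is
    the node's SRGB base plus the globally unique index of the prefix."""
    return SRGB_BASE[node] + index

# Index 2 is allocated to the egress PE's loopback prefix:
for node in ("PE1", "P1", "P2", "P3", "P4", "PE2"):
    print(node, "->", absolute_sid(node, 2))
```

   Each node hence programs a different absolute label for the same
   Prefix Segment; with the single consistent SRGB assumed in the
   remainder of this document, the dictionary would hold one common
   base for all nodes.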
   For editorial simplification purposes, we will assume for all the
   other use-cases that the operator ensures a single consistent SRGB
   across all the nodes in the SR IGP domain.  In that case, all the
   nodes associate the same absolute SID with the same index and hence
   one can use the absolute SID value instead of the index to refer to
   a Prefix SID.

   Several operators have indicated that they would deploy the SR
   technology in this way: with a single consistent SRGB across all
   the nodes.  They motivated their choice based on operational
   simplicity (e.g. troubleshooting across different nodes).

   While this document notes this operator feedback and we use this
   deployment model to simplify the text, we highlight that the SR
   architecture is not limited to this specific deployment use-case
   (different nodes may have different SRGBs thanks to the indexation
   of Prefix SIDs).

2.  IGP-based MPLS Tunneling

   SR, applied to the MPLS dataplane, offers the ability to tunnel
   services (VPN, VPLS, VPWS) from an ingress PE to an egress PE,
   without any other protocol than IS-IS or OSPF.  The LDP and RSVP-TE
   signaling protocols are not required.

   The operator only needs to allocate one node segment per PE and the
   SR IGP control-plane automatically builds the required MPLS
   forwarding constructs from any PE to any PE.

                              P1---P2
                             /       \
                  A---CE1---PE1      PE2---CE2---Z
                             \       /
                              P4---P3

                   Figure 1: IGP-based MPLS Tunneling

   In Figure 1 above, the four nodes A, CE1, CE2 and Z are part of the
   same VPN.  CE2 advertises to PE2 a route to Z.  PE2 binds a local
   label LZ to that route and propagates the route and its label via
   MP-BGP to PE1 with nhop 192.0.2.2.  PE1 installs the VPN prefix Z in
   the appropriate VRF and resolves the next-hop onto the node segment
   associated with PE2.  Upon receiving a packet from A destined to Z,
   PE1 pushes two labels onto the packet: the top label is the Prefix
   SID attached to 192.0.2.2/32, the bottom label is the VPN label LZ
   attached to the VPN route Z.

   The Prefix-SID attached to prefix 192.0.2.2 is a shared segment
   within the IGP domain and, as such, it is indexed.

   Let us assume that:

   -  the operator allocated the index 2 to the prefix 192.0.2.2/32

   -  the operator allocated SRGB [100, 199] at PE1

   -  the operator allocated SRGB [200, 299] at P1

   -  the operator allocated SRGB [300, 399] at P2

   -  the operator allocated SRGB [400, 499] at P3

   -  the operator allocated SRGB [500, 599] at P4

   -  the operator allocated SRGB [600, 699] at PE2

   Thanks to this context, any SR-capable IGP node in the domain can
   determine the segment associated with the Prefix-SID attached to
   prefix 192.0.2.2/32:

   -  PE1's SID is 100+2=102

   -  P1's SID is 200+2=202

   -  P2's SID is 300+2=302

   -  P3's SID is 400+2=402

   -  P4's SID is 500+2=502

   -  PE2's SID is 600+2=602

   In our example, this means that PE1 load-balances the traffic to
   VPN route Z between P1 and P4.  The packets sent to P1 have a top
   label 202 while the packets sent to P4 have a top label 502.  P1
   swaps 202 for 302 and forwards to P2.  P2 pops 302 and forwards to
   PE2.  P4 swaps 502 for 402 and forwards the packets to P3.  P3 pops
   the top label and forwards the packets to PE2.  Eventually, all the
   packets reach PE2 with one single label: LZ, the VPN label attached
   to VPN route Z.

   This scenario illustrates how supporting MPLS services (VPN, VPLS,
   VPWS) with SR has the following benefits:

   -  Simple operation: one single intra-domain protocol to operate:
      the IGP.  No need to support IGP synchronization extensions as
      described in [RFC5443] and [RFC6138].

   -  Excellent scaling: one Node-SID per PE.

3.  Fast Reroute

   Segment Routing aims at supporting services with tight SLA
   guarantees [I-D.filsfils-spring-segment-routing].  To meet this
   goal, local protection mechanisms can be useful to provide fast
   connectivity restoration after the sudden failure of network
   components.  Protection mechanisms for segments aim at letting a
   point of local repair (PLR) pre-compute and install state that
   allows it to locally recover packet delivery when the primary
   outgoing interface corresponding to the protected active segment is
   down.

   This section describes use-cases leading to the definition of
   different protection mechanisms for node, adjacency, and service
   segments to be supported by the SR architecture.

3.1.  Protecting node and adjacency segments

   Node and adjacency segments are used to determine the path that a
   packet should follow from an ingress node to an egress node of the
   SR domain or a service node.

   Fast recovery of the packet delivery service may come with
   different requirements depending on the application using the
   segment.  For this reason, the SR architecture should be able to
   accommodate multiple protection mechanisms and provide the operator
   with means to configure the protection scheme applied for the
   segments that are advertised in the SR domain.

   The operator may want to achieve fast recovery in case of failures
   with as little management effort as possible, using a protection
   mechanism provided by the Segment Routing architecture itself.  In
   this case, a Segment Routing node is in charge of discovering "by
   default" protection paths for each of its adjacent network
   components, with minimal operational impact.  Approaches for such
   applications, typically in line with classical IP-FRR solutions,
   are discussed in [I-D.francois-spring-segment-routing-ti-lfa].
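   For reference, the "by default" protection mentioned above builds on
   the classical loop-free alternate (LFA) condition of RFC 5286.  The
   following minimal sketch checks that condition; the topology
   distances are illustrative and not taken from this document:

```python
# Basic loop-free alternate (LFA) condition of RFC 5286: a neighbor n
# of the protecting node s is a valid alternate towards destination d
# iff dist(n, d) < dist(n, s) + dist(s, d), i.e. n's shortest path to
# d does not loop back through s.
def is_lfa(dist, s, n, d):
    return dist[(n, d)] < dist[(n, s)] + dist[(s, d)]

# Illustrative IGP distances: s-n cost 1, s-d cost 10, n-d cost 5.
dist = {("n", "d"): 5, ("n", "s"): 1, ("s", "d"): 10}
print(is_lfa(dist, "s", "n", "d"))  # prints True: n protects s towards d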
320 The operator of a Segment Routing network may also have strict 321 policies on how a given network component should be protected against 322 failures. A typical case is the knowledge by an external controller 323 (or through any other tool used by the operator) of shared risk among 324 different components, which should not be used to protect each other. 325 An operator could notably use [I-D.sivabalan-pce-segment-routing] for 326 this purpose. 328 Third, some SR applications have strict requirements in terms of 329 guaranteed performance, disjointness in the infrastructure components 330 used for different services, or for redundant provisioning of such 331 services. An approach for providing resiliency in these contexts 332 consists of letting the ingress node in the SR domain be in charge of 333 the recovery of the Segment Routing paths that it uses to support 334 these services. 336 The protection behavior applied to a given SID must be advertised in 337 the routing information that is propagated in the SR domain for that 338 SID, e.g., in [I-D.ietf-isis-segment-routing-extensions]. Nodes 339 injecting traffic in the SR domain can hence select segments based on 340 the protection mechanism that is required for their application. 342 3.2. Protecting a node segment upon the failure of its advertising node 344 Service segments can also benefit from a fast restoration mechanism 345 provided by the SR architecture. 347 Referring to the below figure, let us assume: 349 A is identified by IP address 192.0.2.1/32 to which Node-SID 101 350 is attached. 352 B is identified by IP address 192.0.2.2/32 to which Node-SID 102 353 is attached 355 A and B host the same set of services. 357 Each service is identified by a local segment at each node: i.e. 358 node A allocates a local service segment 9001 to identify a 359 specific service S while the same service is identified by a local 360 service segment 9002 at B. 
Specifically, for the sake of this 361 illustration, let us assume that service S is a BGP-VPN service 362 where A announces a VPN route V with BGP nhop 192.0.2.1/32 and 363 local VPN label 9001 and B announces the same VPN route V with BGP 364 nhop 192.0.2.2/32 and local VPN label 9002. 366 A generic mesh interconnects the three nodes M, Q and B. 368 N prefers to use the service S offered by A and hence sends its 369 S-destined traffic with segment list {101, 9001}. 371 Q is a node connected to A. 373 Q has a method to detect the loss of node A within a few 10's of 374 msec. 376 __ 377 { }---Q---A(service S) 378 N--M--{ } 379 {__}---B(service S) 381 Figure 2: Service Mirroring 383 In that context, we would like to protect the traffic destined to 384 service S upon the failure of node A. 386 The solution is built upon several components: 388 1. B advertises its mirroring capability for mirrored Node-SID 101 389 2. B pre-installs a mirroring table in order to process the 390 packets originally destined to 101. 391 3. Q and any neighbor of A pre-install the Mirror_FRR LFA 392 extension 393 4. All nodes implements a modified SRDB convergence upon Node-SID 394 101 deletion 396 3.2.1. Advertisement of the Mirroring Capability 398 B advertises a MIRROR sub-TLV in its IGP Link-State Router Capability 399 TLV with the values (TTT=000, MIRRORED_OBJECT=101, 400 CONTEXT_SEGMENT=10002),[I-D.filsfils-spring-segment-routing], 401 [I-D.ietf-isis-segment-routing-extensions] and 402 [I-D.ietf-ospf-segment-routing-extensions] for more details in the 403 encodings. 405 Doing so, B advertises within the routing domain that it is willing 406 to backup any traffic originally sent to Node-SID 101 provided that 407 this rerouted traffic gets to B with the context segment 10002 408 directly preceding any local service segment advertised by A. 10002 409 is a local context segment allocated by B to identify traffic that 410 was originally meant for A. 
This allows B to match the subsequent 411 service segment (e.g. 9001) correctly. 413 3.2.2. Mirroring Table 415 We assume that B is able to discover all the local service segments 416 allocated by A (e.g. BGP route reflection and add-path). B maps all 417 the services advertised by A to its similar service representations. 418 For example, service 9001 advertised by A is mapped to service 9002 419 advertised by B as both relate to the same service S (the same VPN 420 route V). For example, B applies the same service treatment to a 421 packet received with top segments {102, 10002, 9001} or with top 422 segments {102, 9002}. Basically, B treats {10002, 9001} as a synonym 423 of {9002}. 425 3.2.3. LFA FRR at the Point of Local Repair 427 In advance of any failure of A, Q (and any other node connected to A) 428 learns the identity of the IGP Mirroring node for each Node-SID 429 advertised by A (MIRROR_TLV advertised by B) and pre-installs the 430 following new MIRROR_FRR entry: 432 - Trigger condition: the loss of nhop A 433 - Incoming active segment: 101 (a Node-SID advertised by A) 434 - Primary Segment processing: pop 101 435 - Backup Segment processing: pop 101, push {102, 10002} 436 - Primary nhop: A 437 - Backup nhop: primary path to node B 439 Upon detecting the loss of node A, Q intercepts any traffic destined 440 to Node-SID 101, pops the segment to A (101) and push a repair tunnel 441 {102, 10002}. Node-SID 102 steers the repaired traffic to B while 442 context segment 10002 allows B to process the following service 443 segment {9001} in the right context table. 445 3.2.4. Modified IGP Convergence upon Node deletion 447 Upon the failure of A, all the neighbors of A will flood the loss of 448 their adjacency to A and eventually every node within the IGP domain 449 will delete 192.0.2.1/32 from their RIB. 451 The RIB deletion of 192.0.2.1/32 at N is beneficial as it triggers 452 the BGP FRR Protection onto the precomputed backup next-hop. 
454 The RIB deletion at node M, if it occurs before the RIB deletion at 455 N, would be disastrous as it would lead to the loss of the traffic 456 from N to A before Q is able to apply the Mirroring protection. 458 The solution consists in delaying the deletion of the SRDB entry for 459 101 by 2 seconds while still deleting the IP RIB 192.0.2.1/32 entry 460 immediately. 462 The RIB deletion triggers the BGP FRR and BGP Convergence. This is 463 beneficial and must occur without delay. 465 The deletion of the SRDB entry to Node-SID101 is delayed to ensure 466 that the traffic still in transit towards Node-SID 101 is not 467 dropped. 469 The delay timer should be long enough to ensure that either the BGP 470 FRR or the BGP Convergence has taken place at N. 472 3.2.5. Conclusions 474 In our reference figure, N sends its packets towards A with the 475 segment list {101, 9001}. The shortest-path from S to A transits via 476 M and Q. 478 Within a few msec of the loss of A, Q activates its pre-installed 479 Mirror_FRR entry and reroutes the traffic to B with the following 480 segment list {102, 10002, 9001}. 482 Within a few 100's of msec, any IGP node deletes its RIB entry to A 483 but keeps its SRDB entry to Node-SID 101 for an extra 2 seconds. 485 Upon deleting its RIB entry to 192.0.2.1/32, N activates its BGP FRR 486 entry and reroutes its S destined traffic towards B with segment list 487 {102, 9002}. 489 By the time any IGP node deletes the SRDB entry to Node-SID 101, N no 490 longer sends any traffic with Node-SID 101. 492 The deletion of the SRDB entry to Node-SID101 is delayed to ensure 493 that the traffic still in transit towards Node-SID 101 is not 494 dropped. 496 In conclusion, the traffic loss only depends on the ability of Q to 497 detect the node failure of its adjacent node A. 499 4. 
Traffic Engineering 501 In this section, we describe Traffic Engineering use-cases for SR, 502 distinguishing use-cases for traffic engineering with bandwidth 503 admission control from those without. 505 4.1. Traffic Engineering without Bandwidth Admission Control 507 This section describes traffic-engineering use-cases which do not 508 require bandwidth admission control. 510 The first sub-section illustrates the use of anycast segments to 511 express macro policies. Two examples are provided: one involving a 512 disjointness enforcement within a so-called dual-plane network, and 513 the other involving CoS-based policies. 515 The second sub-section illustrate how a head-end router can combine a 516 distributed CSPF computation with SR. Various examples are provided 517 where the CSPF constraint or objective is either a TE affinity, an 518 SRLG or a latency metric. 520 The third sub-section illustrates how SR can help traffic-engineer 521 outbound traffic among different external peers, overriding the best 522 installed IP path at the egress border routers. 524 The fourth sub-section describes how SR can be used to express 525 deterministic non-ECMP paths. Several techniques to compress the 526 related segment lists are also introduced. 528 The fifth sub-section describes a use-case where a node attaches an 529 Adj-SID to a set of its interfaces however not sharing the same 530 neighbor. The illustrated benefit relates to loadbalancing. 532 4.1.1. Anycast Node Segment 534 The SR architecture defines an anycast segment as a segment attached 535 to an anycast IP prefix ([RFC4786]). 537 The anycast node segment is an interesting tool for traffic 538 engineering: 540 Macro-policy support: anycast segments allow to express policies 541 such as "go via plane1 of a dual-plane network" (Section 4.1.1.1) 542 or "go via Region3" (Section 4.1.3). 
544 Implicit node resiliency: the traffic-engineering policy is not 545 anchored to a specific node whose failure could impact the 546 service. It is anchored to an anycast address/Anycast-SID and 547 hence the flow automatically reroutes on any ECMP-aware shortest- 548 path to any other router part of the anycast set. 550 The two following sub-sections illustrate to traffic-engineering use- 551 cases leveraging Anycast-SID. 553 4.1.1.1. Disjointness in dual-plane networks 555 Many networks are built according to the dual-plane design: 557 Each access region k is connected to the core by two C routers 558 (C(1,k) and C(2,k)). 560 C(1,k) is part of plane 1 and aggregation region K 562 C(2,k) is part of plane 2 and aggregation region K 564 C(1,k) has a link to C(2, j) iff k = j. 566 The core nodes of a given region are directly connected. 567 Inter-region links only connect core nodes of the same plane. 569 {C(1,k) has a link to C(1, j)} iff {C(2,k) has a link to C(2, j)}. 571 The distribution of these links depends on the topological 572 properties of the core of the AS. The design rule presented 573 above specifies that these links appear in both core planes. 575 We assume a common design rule found in such deployments: the inter- 576 plane link costs (Cik-Cjk where i<>j) are set such that the route to 577 an edge destination from a given plane stays within the plane unless 578 the plane is partitioned. 580 Edge Router A 581 / \ 582 / \ 583 / \ Agg Region A 584 / \ 585 / \ 586 C1A----------C2A 587 | \ | \ 588 | \ | \ 589 | C1B----------C2B 590 Plane1 | | | | Plane2 591 | | | | 592 C1C--|-----C2C | 593 \ | \ | 594 \ | \ | 595 C1Z----------C2Z 596 \ / 597 \ / Agg Region Z 598 \ / 599 \ / 600 Edge Router Z 602 Figure 3: Dual-Plane Network and Disjointness 604 In the above network diagram, let us that the operator configures: 606 The four routers (C1A, C1B, C1C, C1Z) with an anycast loopback 607 address 192.0.2.1/32 and an Anycast-SID 101. 
609 The four routers (C2A, C2B, C2C, C2Z) with an anycast loopback 610 address 192.0.2.2/32 and an Anycast-SID 102. 612 Edge router Z with Node-SID 109. 614 A can then use the three following segment lists to control its 615 Z-destined traffic: 617 {109}: the traffic is load-balanced across any ECMP path through 618 the network. 620 {101, 109}: the traffic is load-balanced across any ECMP path 621 within the Plane1 of the network. 623 {102, 109}: the traffic is load-balanced across any ECMP path 624 within the Plane2 of the network. 626 Most of the data traffic to Z would use the first segment list, such 627 as to exploit the capacity efficiently. The operator would use the 628 two other segment lists for specific premium traffic that has 629 requested disjoint transport. 631 For example, let us assume a bank or a government customer has 632 requested that the two flows F1 and F2 injected at A and destined to 633 Z should be transported across disjoint paths. The operator could 634 classify F1 (F2) at A and impose and SR header with the second 635 (third) segment list. Focusing on F1 for the sake of illustration, A 636 would route the packets based on the active segment, Anycast-SID 101, 637 which steers the traffic along the ECMP-aware shortest-path to the 638 closest router part of the Anycast-SID 101, C1A is this example. 639 Once the packets have reached C1A, the second segment becomes active, 640 Node-SID 109, which steers the traffic on the ECMP-aware shortest- 641 path to Z. C1A load-balances the traffic between C1B-C1Z and C1C-C1Z 642 and then C1Z forwards to Z. 644 This SR use-case has the following benefits: 646 Zero per-service state and signaling on midpoint and tail-end 647 routers. 649 Only two additional node segments (one Anycast-SID per plane). 651 ECMP-awareness. 653 Node resiliency property: the traffic-engineering policy is not 654 anchored to a specific core node whose failure could impact the 655 service. 657 4.1.1.2. 
CoS-based Traffic Engineering 659 Frequently, different classes of service need different path 660 characteristics. 662 In the example below, a single-area international network with 663 presence in four different regions of the world has lots of cheap 664 network capacity from Region4 to Region1 via Region2 and some scarce 665 expensive capacity via Region3. 667 +-------[Region2]-------+ 668 | | 669 A----[Region4] [Region1]----Z 670 | | 671 +-------[Region3]-------+ 673 Figure 4: International Topology Example 675 In such case, the IGP metrics would be tuned to have a shortest-path 676 from A to Z via Region2. 678 This would provide efficient capacity planning usage while fulfilling 679 the requirements of most of the traffic demands. However, it may not 680 suite the latency requirements of the voice traffic between the two 681 cities. 683 Let us illustrate how this can be solved with Segment Routing. 685 The operator would configure: 687 - All the core routers in Region3 with an anycast loopback 688 192.0.2.3/32 to which Anycast-SID 333 is attached. 689 - A loopback 192.0.2.9/32 on Z and would attach Node-SID 109 690 to it. 691 - The IGP metrics such that the shortest-path from Region4 to 692 Region1 is via Region2, from Region4 to Region3 is directly 693 to Region3, the shortest-path from Region3 to Region1 is not 694 back via Region4 and Region2 but straight to Region1. 696 With this in mind, the operator would instruct A to apply the 697 following policy for its Z-destined traffic: 699 - Voice traffic: impose segment-list {333, 109} 700 - Anycast-SID 333 steers the Voice traffic along the 701 ECMP-aware shortest-path to the closest core router in 702 Region3, then Node-SID 109 steers the Voice traffic along 703 the ECMP-aware shortest-path to Z. Hence the Voice traffic 704 reaches Z from A via the low-latency path through Region3. 
   -  Any other traffic: impose segment-list {109}.  Node-SID 109
      steers the traffic along the ECMP-aware shortest path to Z.
      Hence the bulk traffic reaches Z from A via the cheapest path
      for the operator.

   This SR use-case has the following benefits:

      Zero per-service state and signaling at midpoint and tail-end
      nodes.

      One additional anycast segment per region.

      ECMP-awareness.

      Node resiliency property: the traffic-engineering policy is not
      anchored to a specific core node whose failure could impact the
      service.

4.1.2.  Distributed CSPF-based Traffic Engineering

   In this section, we illustrate how a head-end router can map the
   result of its distributed CSPF computation into an SR segment list.

           +---E---+
           |       |
     A-----B-------C-----Z
           |       |
           +---D---+

            Figure 5: SRLG-based CSPF

   Let us assume that in the above network diagram:

      The operator configures a policy on A such that its Z-destined
      traffic must avoid SRLG1.

      The operator configures SRLG1 on the link BC (or it is learned
      dynamically from the IP/Optical interaction with the DWDM
      network).

      The SRLGs are flooded in the link-state IGP.

      The operator respectively configures the Node-SIDs 101, 102,
      103, 104, 105 and 109 at nodes A, B, C, D, E and Z.

   In that context, A can apply the following CSPF behavior:

   -  It prunes all the links affected by SRLG1, computes an SPF on
      the remaining topology and picks one of the SPF paths.
      -  In our example, A finds two possible paths, ABECZ and ABDCZ;
         let us assume it takes the ABDCZ path.

   -  It translates the path into a list of segments.
      -  In our example, ABDCZ can be expressed as {104, 109}: a
         shortest path to node D, followed by a shortest path to
         node Z.

   -  It monitors the status of the LSDB and, upon any change
      impacting the policy, it either recomputes a path meeting the
      policy or updates its translation as a list of segments.
      -  For example, upon the loss of the link DC, the shortest path
         to Z from D (Node-SID 109) goes via the undesired link BC.
         After a transient time immediately following such a failure,
         node A would figure out that the chosen path is no longer
         valid and would instead select ABECZ, which is translated as
         {103, 109}.

   -  This behavior is a local matter at node A and hence the details
      are outside the scope of this document.

   The same use-case can be derived from any other CSPF objective or
   constraint (TE affinity, TE latency, SRLG, etc.) as defined in
   [RFC5305] and [I-D.ietf-isis-te-metric-extensions].  Note that the
   bandwidth case is specific and hence is treated in Section 4.2.

4.1.3.  Egress Peering Traffic Engineering

                        +------+
                        |      |
                    +---D      F
   +---------+     /    | AS 2 |\   +------+
   |         |/         +------+ \  | Z    |
   A         C                      |      |
   |         |\         +------+ /  | AS 4 |
   B   AS1   | \        |      |/   +------+
   |         |  +---E   G
   +---------+          | AS 3 |
                        +------+\

          Figure 6: Egress peering traffic engineering

   Let us assume that:

      C in AS1 learns about destination Z of AS4 via two BGP paths
      (AS2, AS4) and (AS3, AS4).

      C sets next-hop-self before propagating the paths within AS1.

      C propagates all the paths to Z within AS1 (add-path).

      C only installs the path via AS2 in its RIB.

   In that context, the operator of AS1 cannot apply the following
   traffic-engineering policy:

      Steer 60% of the Z-destined traffic received at A via AS2 and
      40% via AS3.

      Steer 80% of the Z-destined traffic received at B via AS2 and
      20% via AS3.

   This traffic-engineering policy can be supported thanks to the
   following SR configuration.

   The operator configures:

      C with a loopback 192.0.2.1/32 and attaches the Node-SID 101
      to it.

      C to bind an external adjacency segment
      ([I-D.filsfils-spring-segment-routing]) to each of its peering
      interfaces.
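The desired 60/40 and 80/20 splits can be sketched as a weighted, hash-based policy at each head-end. The following Python sketch is illustrative and not part of the draft: the external Adj-SID values 9001 (toward AS2) and 9002 (toward AS3), the `POLICY` table and the CRC-based flow hashing are all assumptions made for this example; 101 is C's Node-SID from the configuration above.

```python
import zlib

# Hypothetical head-end policies from the example: at A, 60% of the
# Z-destined traffic via (C, AS2) and 40% via (C, AS3); at B, 80/20.
# 101 is C's Node-SID; 9001/9002 are assumed external Adj-SIDs.
POLICY = {
    "A": [(60, [101, 9001]), (40, [101, 9002])],
    "B": [(80, [101, 9001]), (20, [101, 9002])],
}

def segment_list_for(head_end, flow_id):
    """Map a flow onto one of the weighted buckets of the policy.

    Hashing the flow identifier keeps all packets of a flow on the
    same segment list while approximating the configured percentages
    over many flows.
    """
    buckets = POLICY[head_end]
    total = sum(weight for weight, _ in buckets)
    point = zlib.crc32(flow_id.encode()) % total
    for weight, sids in buckets:
        if point < weight:
            return sids
        point -= weight
    return buckets[-1][1]
```

A real head-end would of course classify on packet fields; the point of the sketch is only that the entire policy lives at A and B, with no state at C.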
   For the sake of this illustration, let us assume that the external
   adjacency segments bound by C for its peering interfaces to (D,
   AS2) and (E, AS3) are respectively 9001 and 9002.

   These external adjacencies (and their attached segments) are
   flooded within the IGP domain of AS1 [RFC5316].

   As a result, the following information is available within AS1:

   ISIS Link State Database:

   -  Node-SID 101 is attached to IP address 192.0.2.1/32 advertised
      by C.
   -  C is connected to a peer D with external adjacency segment 9001.
   -  C is connected to a peer E with external adjacency segment 9002.

   BGP Database:

   -  Z is reachable via 192.0.2.1 with AS Path {AS2, AS4}.
   -  Z is reachable via 192.0.2.1 with AS Path {AS3, AS4}.

   The operator of AS1 can thus meet its traffic-engineering objective
   by enforcing the following policies:

      A should apply the segment list {101, 9001} to 60% of the
      Z-destined traffic and the segment list {101, 9002} to the rest.

      B should apply the segment list {101, 9001} to 80% of the
      Z-destined traffic and the segment list {101, 9002} to the rest.

      Node segment 101 steers the traffic to C.

      External adjacency segment 9001 forces the traffic from C to
      (D, AS2), without any IP lookup at C.

      External adjacency segment 9002 forces the traffic from C to
      (E, AS3), without any IP lookup at C.

   A and B can also use the described segments to assess the liveness
   of the remote peering links; see the OAM section.

4.1.4.  Deterministic non-ECMP Path

   The previous sections have illustrated the ability to steer traffic
   along ECMP-aware shortest paths.  SR is also able to express a
   deterministic non-ECMP path, i.e. as a list of adjacency segments.
   We illustrate such a use-case in this section.
     A-B-C-D-E-F-G-H-Z
       |           |
       +-I-J-K-L-M-+

      Figure 7: Non-ECMP deterministic path

   In the above figure, it is assumed that all nodes are SR-capable
   and only the following SIDs are advertised:

   -  A advertises Adj-SID 9001 for its adjacency to B
   -  B advertises Adj-SID 9002 for its adjacency to C
   -  C advertises Adj-SID 9003 for its adjacency to D
   -  D advertises Adj-SID 9004 for its adjacency to E
   -  E advertises Adj-SID 9001 for its adjacency to F
   -  F advertises Adj-SID 9002 for its adjacency to G
   -  G advertises Adj-SID 9003 for its adjacency to H
   -  H advertises Adj-SID 9004 for its adjacency to Z
   -  E advertises Node-SID 101
   -  Z advertises Node-SID 109

   The operator can steer the traffic from A to Z via the specific
   non-ECMP path ABCDEFGHZ by imposing the segment list {9001, 9002,
   9003, 9004, 9001, 9002, 9003, 9004}.

   The following sub-sections illustrate how the segment list can be
   compressed.

4.1.4.1.  Node Segment

   Clearly, the exact same path can be expressed with a two-entry
   segment list {101, 109}.

   This example illustrates that a Node Segment can also be used to
   express a deterministic non-ECMP path.

4.1.4.2.  Forwarding Adjacency

   The operator can configure node B to create a forwarding adjacency
   to node H along an explicit path BCDEFGH.  The following behaviors
   can then be automated by B:

      B attaches an Adj-SID (e.g. 9007) to that forwarding adjacency,
      together with an ERO sub-sub-TLV which describes the explicit
      path BCDEFGH.

      B installs in its Segment Routing Database the following entry:

         Active segment: 9007.

         Operation: NEXT and PUSH {9002, 9003, 9004, 9001, 9002, 9003}

   As a result, the operator can configure node A with the following
   compressed segment list: {9001, 9007, 9004}.

4.1.5.
Load-balancing among non-parallel links

   A given node may assign the same Adj-SID to multiple of its
   adjacencies, even if they lead to different neighbors.  This may be
   useful to support traffic-engineering policies.

               +---C---D---+
               |           |
     PE1---A---B-----F-----E---PE2

      Figure 8: Adj-SID for Multiple (non-parallel) Adjacencies

   In the above example, let us assume that the operator:

      Requires PE1 to load-balance its PE2-destined traffic between
      the ABCDE and ABFE paths.

      Configures B with Node-SID 102 and E with Node-SID 202.

      Configures B to advertise an individual Adj-SID per adjacency
      (e.g. 9001 for BC and 9002 for BF) and, in addition, an Adj-SID
      for the adjacency set (BC, BF) (e.g. 9003).

   With this context in mind, the operator achieves its objective by
   configuring the following traffic-engineering policy at PE1 for the
   PE2-destined traffic: {102, 9003, 202}:

      Node-SID 102 steers the traffic to B.

      Adj-SID 9003 load-balances the traffic to C or F.

      From either C or F, Node-SID 202 steers the traffic to PE2.

   In conclusion, the traffic is load-balanced between the ABCDE and
   ABFE paths, as desired.

4.2.  Traffic Engineering with Bandwidth Admission Control

   The implementation of bandwidth admission control within a network
   (and its possible routing consequence, which consists in routing
   along explicit paths where the bandwidth is available) requires a
   capacity planning process.

   The spreading of load among ECMP paths is a key attribute of the
   capacity planning processes applied to packet-based networks.

   The first sub-section details the capacity planning process and the
   role of ECMP load-balancing.  We highlight the relevance of SR in
   that context.

   The next two sub-sections document two use-cases of SR-based
   traffic engineering with bandwidth admission control.
   The second sub-section documents a concrete SR applicability
   involving centralized admission control.  This is often referred to
   as the "SDN/SR use-case".

   The third sub-section introduces a future research topic involving
   the notion of residual bandwidth introduced in
   [I-D.ietf-mpls-te-express-path].

4.2.1.  Capacity Planning Process

   Capacity planning anticipates the routing of the traffic matrix
   onto the network topology, for a set of expected traffic and
   topology variations.  The heart of the process consists in
   simulating the placement of the traffic along ECMP-aware shortest
   paths and accounting for the resulting bandwidth usage.

   The bandwidth accounting of a demand along its shortest path is a
   basic capability of any planning tool or PCE server.

   For example, in the network topology described below, and assuming
   a default IGP metric of 1 and an IGP metric of 2 for link GF, a
   1600 Mbps A-to-Z flow is accounted as consuming 1600 Mbps on links
   AB and FZ, 800 Mbps on links BC, BG and GF, and 400 Mbps on links
   CD, DF, CE and EF.

            C-----D
           / \     \
      A---B   +--E--F--Z
           \        /
            G------+

      Figure 9: Capacity Planning an ECMP-based demand

   ECMP is extremely frequent in SP, Enterprise and DC architectures,
   and it is not rare to see as many as 128 different ECMP paths
   between a source and a destination within a single network domain.
   It is a key efficiency objective to spread the traffic among as
   many ECMP paths as possible.

   This is illustrated in the network diagram below, which consists of
   a subset of a network where already 5 ECMP paths are observed from
   A to M.

            C
           / \
          B-D-L--
         / \ /   \
        A   E     \
         \         M
          \   G   /
           \ / \ /
            F   K
             \ /
              I

         Figure 10: ECMP Topology Example

   Segment Routing offers a simple support for such ECMP-based
   shortest-path placement: a node segment.
A single node segment
   enumerates all the ECMP paths along the shortest path.

   When the capacity planning process detects that a traffic growth
   scenario and topology variation would lead to congestion, a
   capacity increase is triggered; if it cannot be deployed in due
   time, a traffic-engineering solution is activated within the
   network.

   A basic traffic-engineering objective consists of finding the
   smallest set of demands that need to be routed off their shortest
   path to eliminate the congestion, then computing an explicit path
   for each of them and instantiating these traffic-engineered
   policies in the network.

   Segment Routing offers a simple support for explicit path policies.
   Let us provide two examples based on Figure 10.

   First example: let us assume that the process has selected the flow
   AM for traffic engineering away from its ECMP-enabled shortest
   path, and that flow AM must avoid consuming resources on the LM and
   FG links.

   The solution is straightforward: A sends its M-destined traffic
   towards the next-hop F with a two-label stack where the top label
   is the adjacency segment FI and the next label is the node segment
   to M.  Alternatively, a three-label stack with adjacency segments
   FI, IK and KM could have been used.

   Second example: let us assume that AM is still the selected flow
   but the constraint is relaxed to only avoid using resources from
   the LM link.

   The solution is straightforward: A sends its M-destined traffic
   towards the next-hop F with a one-label stack where the label is
   the node segment to M.  Note that while the AM flow has been
   traffic-engineered away from its natural shortest path (ECMP across
   three paths), the traffic-engineered path is still ECMP-aware and
   leverages two of the three initial paths.  This is accomplished
   with a single-label stack and without the enumeration of one tunnel
   per path.
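The ECMP-aware bandwidth accounting described for Figure 9 (the 1600 Mbps demand split into 800 and 400 Mbps shares) can be sketched in a few lines. This Python sketch is illustrative and not part of the draft: it computes shortest-path distances toward the destination with Dijkstra, then splits each node's incoming load evenly among its equal-cost next-hops.

```python
import heapq
from collections import defaultdict

# Figure 9 topology (undirected); default IGP metric 1, link G-F metric 2.
LINKS = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("C", "E", 1),
         ("D", "F", 1), ("E", "F", 1), ("B", "G", 1), ("G", "F", 2),
         ("F", "Z", 1)]

def dijkstra(adj, src):
    """Shortest-path distances from src over an undirected graph."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def ecmp_link_loads(links, src, dst, demand):
    """Account a demand's bandwidth on every link of its ECMP DAG."""
    adj = defaultdict(list)
    for a, b, w in links:
        adj[a].append((b, w))
        adj[b].append((a, w))
    to_dst = dijkstra(adj, dst)          # undirected: distance to dst
    loads = defaultdict(float)
    node_in = defaultdict(float)
    node_in[src] = demand
    # Walk nodes from farthest to nearest so upstream load is complete
    # before it is split among equal-cost next-hops.
    for u in sorted(to_dst, key=to_dst.get, reverse=True):
        if u == dst or node_in[u] == 0:
            continue
        nhops = [v for v, w in adj[u] if to_dst[v] + w == to_dst[u]]
        share = node_in[u] / len(nhops)
        for v in nhops:
            loads[frozenset((u, v))] += share
            node_in[v] += share
    return loads
```

Running `ecmp_link_loads(LINKS, "A", "Z", 1600)` reproduces the split stated in the text: 1600 on AB and FZ, 800 on BC, BG and GF, 400 on CD, CE, DF and EF.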
   In light of these examples, Segment Routing offers an interesting
   solution for capacity planning because:

      One node segment represents the set of ECMP-aware shortest
      paths.

      Adjacency segments make it possible to express any explicit
      path.

      The combination of node and adjacency segments makes it possible
      to express any path without having to enumerate all the ECMP
      options.

      The capacity planning process ensures that the majority of the
      traffic rides on node segments (ECMP-based shortest paths),
      while a minority of the traffic is routed off its shortest path.

      The explicitly-engineered traffic (which is a minority) still
      benefits from the ECMP-awareness of the node segments within its
      segment list.

      Only the head-end of a traffic-engineering policy maintains
      state.  The midpoints and tail-ends do not maintain any state.

4.2.2.  SDN/SR use-case

   The heart of the application of SR to the SDN use-case lies in the
   SDN controller, also called a Stateful PCE
   ([I-D.ietf-pce-stateful-pce]).

   The SDN controller is responsible for controlling the evolution of
   the traffic matrix and topology.  It accepts or denies the addition
   of new traffic into the network.  It decides how to route the
   accepted traffic.  It monitors the topology and, upon failure,
   determines the minimum traffic that should be rerouted on an
   alternate path to alleviate a bandwidth congestion issue.

   The algorithms supporting this behavior are a local matter of the
   SDN controller and are outside the scope of this document.

   The means of collecting traffic and topology information are the
   same as what would be used with other SDN-based traffic-engineering
   solutions (e.g. [RFC7011] and [I-D.ietf-idr-ls-distribution]).
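The controller's admission decision described above can be sketched as a per-link capacity check. This Python sketch is an illustrative assumption, not the draft's algorithm (the draft explicitly leaves the algorithms to the controller): the `link_load`/`capacity` data model and function name are invented for this example.

```python
def admit(demand_loads, link_load, capacity):
    """Admit a new demand only if adding its per-link bandwidth keeps
    every link it touches within capacity; on success, commit the load.

    demand_loads: {link_name: bandwidth} footprint of the new demand
    link_load:    {link_name: bandwidth} currently committed load
    capacity:     {link_name: bandwidth} link capacities
    """
    for link, bw in demand_loads.items():
        if link_load.get(link, 0.0) + bw > capacity[link]:
            return False                    # deny: would congest a link
    for link, bw in demand_loads.items():   # commit the accepted demand
        link_load[link] = link_load.get(link, 0.0) + bw
    return True
```

On denial, a controller would typically not simply drop the request but compute an explicit path over links that still have headroom and express it as a segment list at the head-end.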
   The means of instantiating policy information at a traffic-
   engineering head-end are the same as what would be used with other
   SDN-based traffic-engineering solutions (e.g.
   [I-D.ietf-i2rs-architecture], [I-D.ietf-pce-pce-initiated-lsp] and
   [I-D.sivabalan-pce-segment-routing]).

4.2.2.1.  Illustration

                        _______________
                       {               }
         +--C--+   V   { SDN Controller }
         |/    \| /    {_______________}
     A===B--G--D==F--Y
         |\    /| \
         +--E--+   Z

            SDN/SR use-case

   Let us assume that in the above network diagram:

      An SDN Controller (SC) is connected to the network and is able
      to retrieve the topology and traffic information, as well as set
      traffic-engineering policies on the network nodes.

      The operator (likely via the SDN Controller) has provisioned the
      Node-SIDs 101, 102, 103, 104, 105, 106, 107, 201, 202 and 203
      respectively at nodes A, B, C, D, E, F, G, V, Y and Z.

      All the links have the same BW (e.g. 10G) and IGP cost (e.g. 10)
      except the links BG and GD, which have IGP cost 50.

      Each described node connectivity is formed as a bundle of two
      links, except (B, G) and (G, D), which are formed by a single
      link each.

      Flow FV is traveling from A to destinations behind V.

      Flow FY is traveling from A to destinations behind Y.

      Flow FZ is traveling from A to destinations behind Z.

      The SDN Controller has admitted all these flows and has let A
      apply the default SR policy: "map a flow onto its ECMP-aware
      shortest-path".

         In this example, this means that A respectively maps the
         flows FV onto segment list {201}, FY onto segment list {202}
         and FZ onto segment list {203}.

         In this example, the reader should note that the SDN
         Controller knows what A would do and hence knows and controls
         that none of these flows are mapped through G.

   Let us describe what happens upon the failure of one of the two
   links E-D.
   The SDN Controller monitors the link-state database and detects a
   congestion risk due to the reduced capacity between E and D.
   Specifically, SC updates its simulation of the traffic according to
   the policies it instructed the network to use and discovers that
   too much traffic is mapped on the remaining link E-D.

   The SDN Controller then computes the minimum number of flows that
   should be deviated from their existing path.  For example, let us
   assume that the flow FZ is selected.

   The SDN controller then computes an explicit path for this flow.
   For example, let us assume that the chosen path is ABGDFZ.

   The SDN controller then maps the chosen path into an SR-based
   policy.  In our example, the path ABGDFZ is translated into the
   segment list {107, 203}.  Node-SID 107 steers the traffic along ABG
   and then Node-SID 203 steers the traffic along GDFZ.

   The SDN controller then applies the following traffic-engineering
   policy at A: "map any packet of the classified flow FZ onto
   segment-list {107, 203}".  The SDN Controller uses PCEP extensions
   to instantiate that policy at A
   ([I-D.sivabalan-pce-segment-routing]).

   As soon as A receives the PCEP message, it enforces the policy and
   the traffic classified as FZ is immediately mapped onto segment
   list {107, 203}.

   This immediately eliminates the congestion risk.  Flows FV and FY
   were untouched and keep using the ECMP-aware shortest path.  The
   minimum amount of traffic was rerouted (FZ).  No hop-by-hop
   signaling through the network from A to Z is required.  No
   hop-by-hop admission control is required.  No state needs to be
   maintained by B, G, D, F or Z.  The only maintained state is within
   the SDN controller and the head-end node (A).

4.2.2.2.
Benefits

   In the context of centralized optimization and the SDN use-case,
   here are the benefits provided by the SR architecture:

      Explicit routing capability with or without ECMP-awareness.

      No hop-by-hop signaling through the network.

      State is only maintained at the policy head-end.  No state is
      maintained at midpoints and tail-ends.

      Automated guaranteed FRR for any topology (Section 3).

      Optimum virtualization: the policy state is in the packet header
      and not in the intermediate nodes along the policy.  The policy
      is completely virtualized away from midpoints and tail-ends.

      Highly responsive to change: the SDN Controller only needs to
      apply a policy change at the head-end.  No time is lost
      programming the midpoints and tail-end along the policy.

4.2.2.3.  Dataset analysis

   A future version of this document will report some analysis of the
   application of the SDN/SR use-case to real operator data sets.

   A first, incomplete report is available below.

4.2.2.3.1.  Example 1

   The first data set consists of a full mesh of 12000 explicitly-
   routed tunnels observed on a real network.  These tunnels resulted
   from distributed head-end-based CSPF computation.

   We measured that only 65% of the traffic is riding on its shortest
   path.

   Three well-known defects are illustrated in this data set:

      The lack of ECMP support in explicitly-routed tunnels: ATM-like
      traffic-steering mechanisms steer the traffic along a non-ECMP
      path.

      The increase of the number of explicitly-routed non-ECMP tunnels
      to enumerate all the ECMP options.

      The inefficiency of distributed optimization: too much traffic
      is riding off its shortest path.

   We applied the SDN/SR use-case to this dataset.
This means that:

      The distributed CSPF computation is replaced by centralized
      optimization and BW admission control, supported by the SDN
      Controller.

         As part of the optimization, we also optimized the IGP
         metrics so as to get a maximum of traffic load-spread among
         ECMP paths by default.

      The traffic-engineering policies are supported by SR segment
      lists.

   As a result, we measured that 98% of the traffic would be kept on
   its normal policy (ride the shortest path) and only 2% of the
   traffic requires a path away from the shortest path.

   Let us highlight a few benefits:

      98% of the traffic-engineering head-end policies are eliminated.

         Indeed, by default, an SR-capable ingress edge node maps the
         traffic on a single Node-SID to the egress edge node.  No
         configuration or policy needs to be maintained at the ingress
         edge node to realize this.

      100% of the state at mid/tail nodes is eliminated.

4.2.3.  Residual Bandwidth

   The notion of Residual Bandwidth (RBW) is introduced by
   [I-D.ietf-mpls-te-express-path].

   A future version of this document will describe the SR/RBW research
   opportunity.

5.  Service chaining

   Segment routing can be used to steer packets through services
   offered by middleboxes in order to perform specific actions such as
   DPI, accounting, etc.

     I---A---B---C---E
      \  |  /  \   /
       \ | /    \ /
        \|/      F
         D

         Figure 11

   For example, as illustrated in Figure 11, an ingress node I selects
   an egress node E for a packet P.  An application, however, requires
   that P undergo a specific treatment (DPI, firewalling, ...) offered
   by a node D, reachable in the SR domain.  In the SR architecture,
   this application can be supported through the use of a service
   segment with a local scope to D, say SS, following the nodal
   segment which corresponds to D.
   The ingress node keeps control of the egress node through which the
   packet needs to exit the network by placing a nodal segment
   identifying the egress node after the service segment.

   This would be achieved by letting I forward the packet P with the
   following sequence of segments: {D, SS, E}.  D is a nodal segment,
   SS is the service segment corresponding to the service to apply to
   the packet P, and E is the nodal segment corresponding to the
   egress node selected by I for that packet.

6.  OAM

   This section documents a few representative SR/OAM use-cases.

6.1.  Monitoring a remote bundle

     +--+   _   +--+                    +-------+
     |  |  { }  |  |---991---L1---662---|       |
     |MS|--{ }--|R1|---992---L2---663---|R2 (72)|
     |  |  {_}  |  |---993---L3---664---|       |
     +--+       +--+                    +-------+

        Figure 12: Probing all the links of a remote bundle

   In the above figure, a monitoring system (MS) needs to assess the
   dataplane availability of all the links within a remote bundle
   connected to routers R1 and R2.

   The monitoring system retrieves the segment information from the
   IGP LSDB and appends the following segment list: {72, 662, 992,
   664} on its IP probe (whose source and destination addresses are
   the address of AA).

   MS sends the probe to its connected router.  If the connected
   router is not SR-compliant, a tunneling technique can be used to
   tunnel the SR-based probe to the first SR router.  The SR domain
   forwards the probe to R2 (72 is the node segment of R2).  R2
   forwards the probe to R1 over link L1 (adjacency segment 662).  R1
   forwards the probe to R2 over link L2 (adjacency segment 992).  R2
   forwards the probe to R1 over link L3 (adjacency segment 664).  R1
   then forwards the IP probe to AA as per classic IP forwarding.

6.2.
Monitoring a remote peering link

   In Figure 6, node A can monitor the dataplane liveness of the
   unidirectional peering link from C to D of AS2 by sending an IP
   probe with destination address A and segment list {101, 9001}.
   Node-SID 101 steers the probe to C and external Adj-SID 9001 steers
   the probe from C over the desired peering link to D of AS2.  The SR
   header is removed by C and D receives a plain IP packet with
   destination address A.  D returns the probe to A through classic IP
   forwarding.  BFD Echo mode ([RFC5880]) would support such a
   unidirectional link liveness probing application.

7.  IANA Considerations

   TBD

8.  Manageability Considerations

   TBD

9.  Security Considerations

   TBD

10.  Acknowledgements

   We would like to thank Dave Ward, Dan Frost, Stewart Bryant, Thomas
   Telkamp, Ruediger Geib and Les Ginsberg for their contributions to
   the content of this document.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4786]  Abley, J. and K. Lindqvist, "Operation of Anycast
              Services", BCP 126, RFC 4786, December 2006.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RFC5316]  Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in
              Support of Inter-Autonomous System (AS) MPLS and GMPLS
              Traffic Engineering", RFC 5316, December 2008.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding
              Detection (BFD)", RFC 5880, June 2010.

   [RFC7011]  Claise, B., Trammell, B., and P. Aitken, "Specification
              of the IP Flow Information Export (IPFIX) Protocol for
              the Exchange of Flow Information", STD 77, RFC 7011,
              September 2013.

11.2.
Informative References

   [I-D.filsfils-spring-segment-routing]
              Filsfils, C., Previdi, S., Bashandy, A., Decraene, B.,
              Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R.,
              Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe,
              "Segment Routing Architecture", draft-filsfils-spring-
              segment-routing-04 (work in progress), July 2014.

   [I-D.filsfils-spring-segment-routing-ldp-interop]
              Filsfils, C., Previdi, S., Bashandy, A., Decraene, B.,
              Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R.,
              Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe,
              "Segment Routing interoperability with LDP", draft-
              filsfils-spring-segment-routing-ldp-interop-02 (work in
              progress), September 2014.

   [I-D.filsfils-spring-segment-routing-mpls]
              Filsfils, C., Previdi, S., Bashandy, A., Decraene, B.,
              Litkowski, S., Horneffer, M., Milojevic, I., Shakir, R.,
              Ytti, S., Henderickx, W., Tantsura, J., and E. Crabbe,
              "Segment Routing with MPLS data plane", draft-filsfils-
              spring-segment-routing-mpls-03 (work in progress),
              August 2014.

   [I-D.francois-spring-segment-routing-ti-lfa]
              Francois, P., Filsfils, C., Bashandy, A., and B.
              Decraene, "Topology Independent Fast Reroute using
              Segment Routing", draft-francois-spring-segment-routing-
              ti-lfa-00 (work in progress), May 2014.

   [I-D.ietf-i2rs-architecture]
              Atlas, A., Halpern, J., Hares, S., Ward, D., and T.
              Nadeau, "An Architecture for the Interface to the
              Routing System", draft-ietf-i2rs-architecture-05 (work
              in progress), July 2014.

   [I-D.ietf-idr-ls-distribution]
              Gredler, H., Medved, J., Previdi, S., Farrel, A., and S.
              Ray, "North-Bound Distribution of Link-State and TE
              Information using BGP", draft-ietf-idr-ls-
              distribution-06 (work in progress), September 2014.
   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Filsfils, C., Bashandy, A., Gredler, H.,
              Litkowski, S., Decraene, B., and J. Tantsura, "IS-IS
              Extensions for Segment Routing", draft-ietf-isis-
              segment-routing-extensions-02 (work in progress), June
              2014.

   [I-D.ietf-isis-te-metric-extensions]
              Previdi, S., Giacalone, S., Ward, D., Drake, J., Atlas,
              A., Filsfils, C., and W. Wu, "IS-IS Traffic Engineering
              (TE) Metric Extensions", draft-ietf-isis-te-metric-
              extensions-03 (work in progress), April 2014.

   [I-D.ietf-mpls-te-express-path]
              Atlas, A., Drake, J., Giacalone, S., Ward, D., Previdi,
              S., and C. Filsfils, "Performance-based Path Selection
              for Explicitly Routed LSPs using TE Metric Extensions",
              draft-ietf-mpls-te-express-path-00 (work in progress),
              October 2013.

   [I-D.ietf-ospf-segment-routing-extensions]
              Psenak, P., Previdi, S., Filsfils, C., Gredler, H.,
              Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing", draft-ietf-ospf-
              segment-routing-extensions-02 (work in progress), August
              2014.

   [I-D.ietf-pce-pce-initiated-lsp]
              Crabbe, E., Minei, I., Sivabalan, S., and R. Varga,
              "PCEP Extensions for PCE-initiated LSP Setup in a
              Stateful PCE Model", draft-ietf-pce-pce-initiated-lsp-01
              (work in progress), June 2014.

   [I-D.ietf-pce-stateful-pce]
              Crabbe, E., Minei, I., Medved, J., and R. Varga, "PCEP
              Extensions for Stateful PCE", draft-ietf-pce-stateful-
              pce-09 (work in progress), June 2014.

   [I-D.sivabalan-pce-segment-routing]
              Sivabalan, S., Medved, J., Filsfils, C., Crabbe, E.,
              Raszuk, R., Lopez, V., and J. Tantsura, "PCEP Extensions
              for Segment Routing", draft-sivabalan-pce-segment-
              routing-03 (work in progress), July 2014.

   [RFC5443]  Jork, M., Atlas, A., and L. Fang, "LDP IGP
              Synchronization", RFC 5443, March 2009.

   [RFC6138]  Kini, S. and W. Lu, "LDP IGP Synchronization for
              Broadcast Networks", RFC 6138, February 2011.

Authors' Addresses

   Clarence Filsfils (editor)
   Cisco Systems, Inc.
   Brussels
   BE

   Email: cfilsfil@cisco.com

   Pierre Francois (editor)
   IMDEA Networks
   Leganes
   ES

   Email: pierre.francois@imdea.org

   Stefano Previdi
   Cisco Systems, Inc.
   Via Del Serafico, 200
   Rome 00142
   Italy

   Email: sprevidi@cisco.com

   Bruno Decraene
   Orange
   FR

   Email: bruno.decraene@orange.com

   Stephane Litkowski
   Orange
   FR

   Email: stephane.litkowski@orange.com

   Martin Horneffer
   Deutsche Telekom
   Hammer Str. 216-226
   Muenster 48153
   DE

   Email: Martin.Horneffer@telekom.de

   Igor Milojevic
   Telekom Srbija
   Takovska 2
   Belgrade
   RS

   Email: igormilojevic@telekom.rs

   Rob Shakir
   British Telecom
   London
   UK

   Email: rob.shakir@bt.com

   Saku Ytti
   TDC Oy
   Mechelininkatu 1a
   TDC 00094
   FI

   Email: saku@ytti.fi

   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   Antwerp 2018
   BE

   Email: wim.henderickx@alcatel-lucent.com

   Jeff Tantsura
   Ericsson
   300 Holger Way
   San Jose, CA 95134
   US

   Email: Jeff.Tantsura@ericsson.com

   Sriganesh Kini
   Ericsson
   300 Holger Way
   San Jose, CA 95134
   US

   Email: sriganesh.kini@ericsson.com

   Edward Crabbe
   Individual

   Email: edward.crabbe@gmail.com