idnits 2.17.00 (12 Aug 2021)

/tmp/idnits18980/draft-previdi-filsfils-isis-segment-routing-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 20, 2013) is 3342 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4971 (Obsoleted by RFC 7981)

  == Outdated reference: draft-ietf-rtgwg-remote-lfa has been published as
     RFC 7490

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

IS-IS for IP Internets                                   S. Previdi, Ed.
Internet-Draft                                          C. Filsfils, Ed.
Intended status: Standards Track                             A. Bashandy
Expires: September 21, 2013                          Cisco Systems, Inc.
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                             B. Decraene
                                                            S. Litkowski
                                                                  Orange
                                                            I. Milojevic
                                                          Telekom Srbija
                                                               R. Shakir
                                                         British Telecom
                                                                 S. Ytti
                                                                  TDC Oy
                                                           W. Henderickx
                                                          Alcatel-Lucent
                                                             J. Tantsura
                                                                Ericsson
                                                          March 20, 2013

              Segment Routing with IS-IS Routing Protocol
             draft-previdi-filsfils-isis-segment-routing-02

Abstract

   Segment Routing (SR) enables any node to select any path (explicit or
   derived from IGPs SPT computations) for each of its traffic classes.
   The path does not depend on a hop-by-hop signaling technique (neither
   LDP nor RSVP).  It only depends on a set of "segments" that are
   advertised by the IS-IS routing protocol.  These segments act as
   topological sub-paths that can be combined together to form the
   desired path.

   There are two forms of segments: node and adjacency.  A node segment
   represents a path to a node.  An adjacency segment represents a
   specific adjacency to a node.  A node segment is typically a
   multi-hop path while an adjacency segment is a one-hop path.  SR's
   control-plane can be applied to IPv6 and MPLS dataplanes.

   Segment Routing control-plane can be applied to the MPLS dataplane: a
   node segment to node N is instantiated in the MPLS dataplane as an
   LSP along the shortest-path (SPT) to the node.  An adjacency segment
   is instantiated in the MPLS dataplane as a cross-connect entry
   pointing to a specific egress datalink.

   This document describes the Segment Routing functions, a set of use
   cases it addresses and the necessary changes that are required in the
   IS-IS protocol.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).
   Note that other groups may also distribute working documents as
   Internet-Drafts.  The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 22, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Applicability
     2.1.  Simplicity
     2.2.  Capacity Planning and Traffic Engineering (TE)
       2.2.1.  Disjointness in dual-plane networks
       2.2.2.  QoS-based Routing Policies
       2.2.3.  Deterministic non-ECMP Path
     2.3.  Fast Reroute
     2.4.  Segment Routing in Software Defined Networks (SR-SDN)
   3.  Segment Routing Identifiers
     3.1.  Node Segment Identifier (Node-SID)
       3.1.1.  Node-SID SubTLV
     3.2.  Adjacency Segment Identifier (Adj-SID)
       3.2.1.  Adj-SID and Interface Address
       3.2.2.  Adjacency Segment Identifier (Adj-SID) SubTLV
       3.2.3.  Adjacency Segment Identifiers in LANs
   4.  Segment Routing Capabilities
   5.  Elements of Procedure
     5.1.  Unicity
     5.2.  IS-IS Multi-Level
     5.3.  Data-Plane Encodings
       5.3.1.  Segment Routing RIB (SR-RIB)
       5.3.2.  Multiprotocol Label Switching (MPLS)
       5.3.3.  IP Version 6
   6.  IANA Considerations
   7.  Manageability Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Segment Routing (SR) enables any node to select any path (explicit or
   derived from IGPs SPT computations) for each of its traffic classes.
   The path does not depend on a hop-by-hop signaling technique (neither
   LDP nor RSVP).  It only depends on a set of "segments" that are
   advertised by the IS-IS routing protocol.  These segments act as
   topological sub-paths that can be combined together to form the
   desired path.

   There are two forms of segments: node and adjacency.
   A Node Segment represents the shortest path to a node.  A Node
   Segment is typically a multi-hop shortest path.  An Adjacency Segment
   represents a specific adjacency to a node.

   SR's control-plane can be applied to IPv6 and MPLS dataplanes.

   In the MPLS dataplane, a node segment to node N is instantiated as an
   LSP along the shortest-path (SPT) to the node.  An adjacency segment
   is instantiated as a cross-connect entry pointing to a specific
   egress datalink.

   At the heart of the SR technology, we find node segments.  Node
   segments must be globally unique within the network domain.

                            A----B----C----D

                                Figure 1

   In Figure 1, all the nodes must be configured with the same Segment
   Routing Identifiers Block (called SRB Node Registry), e.g. 64-5000,
   and every node segment must be uniquely allocated from that SRB Node
   Registry (e.g. A, B, C and D are configured respectively with node
   segments 64, 65, 66 and 67).

   In the MPLS dataplane instantiation, this means that all the nodes
   need to be able to reserve and allocate to the SR control-plane the
   same MPLS label range (e.g. 64-5000).

2.  Applicability

   Segment Routing is applicable to the following use-cases: simplicity,
   TE, FRR and SDN.

2.1.  Simplicity

   The vast majority of IP traffic travels on the shortest path to its
   destination.  SR delivers a very efficient control-plane technique to
   instantiate shortest-path-based node segments into the forwarding
   dataplane.
   In the example described in Figure 1, considering the MPLS forwarding
   plane, when node D advertises its node segment 67 for its loopback
   D/32, nodes A and B introduce the following MPLS dataplane entries:

      A: IP2MPLS:   FEC D/32 => push 67, nhop B
      A: MPLS2MPLS: 67       => swap 67, nhop B
      B: IP2MPLS:   FEC D/32 => push 67, nhop C
      B: MPLS2MPLS: 67       => swap 67, nhop C

   If D advertises node segment 67 with the P flag reset:

      C: IP2MPLS:   FEC D/32 => push explicit-null, nhop D
      C: MPLS2MPLS: 67       => pop, nhop D

   If D advertises node segment 67 with the P flag set:

      C: IP2MPLS:   FEC D/32 => push 67, nhop D
      C: MPLS2MPLS: 67       => swap 67, nhop D

   LDP is no longer required to instantiate shortest-path LSPs to a
   remote node.  The reduction in the number of protocols to operate
   helps reduce the overall operational complexity of the network.  For
   example, the complex IGP/LDP synchronization described in [RFC5443]
   and [RFC6138] no longer needs to be considered, which improves the
   scaling and reliability of the network.

   For example, when a core node C has 40 TE tunnels to 40 remote core
   routers and 260 adjacent aggregation routers, and LDP LSPs need to be
   signaled to 5000 FECs, then node C maintains an LDP label database of
   (260+40)*5000 = 1,500,000 label bindings.  In fact, several networks
   today are exposed to much more difficult LDP scaling constraints.

   In comparison, in the same use case, the SR control-plane only
   maintains 5000 node segments.  This is 300 times fewer entries.

2.2.  Capacity Planning and Traffic Engineering (TE)

   Capacity Planning deals with anticipating the placement of the
   traffic matrix onto the network topology, for a set of expected
   traffic and topology variations.  The heart of the process consists
   of simulating the placement of the traffic along ECMP-aware shortest
   paths and accounting for the resulting bandwidth usage.
   The bandwidth accounting of a demand along its shortest path is a
   basic capability of any planning tool or PCE server.

   For example, in the network topology described in Figure 2 and
   assuming a default IGP metric of 2 and IGP metrics
   BC=BG=CD=CE=DF=EF=1, a 1600Mbps A-to-Z flow is accounted as consuming
   1600Mbps on links AB and FZ, 800Mbps on links BC, BG and GF, and
   400Mbps on links CD, DF, CE and EF.

                            C-----D
                          /  \     \
                     A---B    +--E--F--Z
                          \        /
                           G------+

                   Figure 2: Example Topology 1

   ECMP is extremely frequent in SP, Enterprise and DC architectures and
   it is not rare to see as many as 128 different ECMP paths between a
   source and a destination within a single network domain.

   Figure 3 illustrates this with a subset of a network in which 6 ECMP
   paths already exist from A to M.

                                C
                               / \
                              B-D-L--
                             / \ /   \
                            A   E     \
                             \         M
                              \   G   /
                               \ / \ /
                                F-H-K
                                 \ /
                                  I

                   Figure 3: ECMP Topology Example

   Segment Routing offers simple support for such ECMP-based
   shortest-path placement: a node segment.  A single node segment
   covers all the ECMP paths along the shortest path.

   This is much simpler than the RSVP-TE model, where one TE tunnel is
   required for each enumerated ECMP path.

   When the capacity planning process detects that a traffic or topology
   variation would lead to congestion, traffic engineering or a capacity
   increase is triggered.

   The most basic traffic engineering option consists of finding the
   smallest set of demands that need to be routed off their shortest
   path to eliminate the congestion, then computing an explicit path for
   each of them and instantiating these traffic-engineered policies in
   the network.

   Segment Routing offers simple support for explicit path policies.
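
   The ECMP-aware bandwidth accounting described for Figure 2 can be
   illustrated with a short Python sketch.  It is not part of the
   protocol specification.  It assumes symmetric link metrics and an
   even split of traffic over equal-cost next hops at every node, and
   the helper names (build_graph, shortest_distances, ecmp_link_loads)
   are hypothetical names chosen for this illustration only.

```python
import heapq
from collections import defaultdict


def build_graph(links):
    """Build an undirected graph with symmetric metrics from (a, b, metric) tuples."""
    graph = defaultdict(dict)
    for a, b, metric in links:
        graph[a][b] = metric
        graph[b][a] = metric
    return graph


def shortest_distances(graph, root):
    """Dijkstra: distance between `root` and every reachable node (symmetric metrics)."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            candidate = d + metric
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def ecmp_link_loads(graph, source, dest, demand):
    """Account a demand on each link, splitting evenly over equal-cost next hops."""
    dist = shortest_distances(graph, dest)
    inflow = defaultdict(float)
    inflow[source] = demand
    loads = {}
    # Visit nodes from farthest to nearest so a node's inflow is final when visited.
    for node in sorted(dist, key=dist.get, reverse=True):
        if node == dest or inflow[node] == 0:
            continue
        # A neighbor is a valid next hop if it lies on a shortest path to dest.
        next_hops = [n for n, metric in graph[node].items()
                     if dist[n] + metric == dist[node]]
        share = inflow[node] / len(next_hops)
        for n in next_hops:
            loads[node + n] = loads.get(node + n, 0.0) + share
            inflow[n] += share
    return loads


# Figure 2 topology: default metric 2, BC=BG=CD=CE=DF=EF=1.
FIGURE_2 = build_graph([
    ("A", "B", 2), ("B", "C", 1), ("B", "G", 1), ("C", "D", 1), ("C", "E", 1),
    ("D", "F", 1), ("E", "F", 1), ("G", "F", 2), ("F", "Z", 2),
])

if __name__ == "__main__":
    for link, mbps in sorted(ecmp_link_loads(FIGURE_2, "A", "Z", 1600).items()):
        print(f"{link}: {mbps:.0f} Mbps")
```

   With the Figure 2 metrics, the three paths B-C-D-F, B-C-E-F and B-G-F
   all have cost 3, which is why the sketch reproduces the per-link
   figures quoted in the text.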

   In the diagram described in Figure 3, assume the requirement is that
   the AM flow should not consume any resource on the LM and the FG
   links.

   The first option would consist of using the following encapsulation
   at A: A sends any traffic to M towards the nhop F with a two-label
   stack where the top label is the adjacency segment FI and the next
   label is the node segment to M.  Alternatively, a three-label stack
   with adjacency segments FI, IK and KM could have been used.

   The first option seems preferable, as IP capacity planners
   classically optimize traffic along ECMP-aware shortest paths.  The
   more node segments can be used, the better.  However, both options
   are available and one can favor adjacency segments.

   In the same way, if the requirement in the diagram described in
   Figure 3 is that the AM flow should not consume any resource along
   the LM link but may use any resource in the bottom part of the
   topology, then A could send the AM flow to its nhop F with a single
   label: the label representing the node segment to M.

   We believe that Segment Routing offers an excellent solution for
   Capacity Planning because:

      A node segment represents an ECMP-aware shortest path.

      Adjacency segments allow any explicit path to be expressed.

      The combination of node and adjacency segments allows any path to
      be expressed without having to enumerate all the ECMP path
      options.

      The capacity planning process ensures that the majority of the
      traffic rides on node segments (ECMP-based shortest path) while a
      minority of the traffic is routed off its shortest path.

      The network does not hold any signaling state for the traffic
      engineered flows.

   In comparison, a classic RSVP-TE Full-Mesh traffic engineering
   solution involves a full mesh of tunnels from any edge to any other
   edge of the network.
   For any specific edge-to-edge pair, tens of RSVP-TE tunnels may need
   to be enumerated to load-share the traffic along all the possible
   shortest paths.

   Analytically, assuming a single tunnel from an edge to an edge
   (optimistic assumption), an RSVP-TE Full-Mesh traffic engineering
   solution scales as E^2 where E is the number of edge nodes.  The
   number of LSPs signaled and maintained by the network (in
   control-plane and in dataplane) scales quadratically with the number
   of edge nodes.

   In contrast, the Segment Routing solution maintains E node segments.
   The number of control-plane and dataplane states scales linearly with
   the number of edge nodes.

   Networks with 1000 edge nodes are common nowadays.  In such a case,
   the capacity planning solution based on Segment Routing scales 1000
   times better than the RSVP-TE Full-Mesh solution.

   We have applied this comparative study to a use-case using a real
   topology and a real demand matrix.  The data set consisted of a full
   mesh of 12000 tunnels where originally only 65% of the traffic was
   riding on its shortest path.  Two well-known defects are illustrated
   in this data set: the lack of ECMP support in RSVP-TE, and hence the
   increase in the number of tunnels needed to enumerate all the ECMP
   options, and the inefficiency of distributed optimization, as too
   much traffic is riding off its shortest path.  Using centralized
   optimization, we could optimize the IGP metrics so as to place 98% of
   the traffic on ECMP-aware shortest paths (one single node segment)
   while only 2% of the traffic required explicit traffic engineering
   tunnels away from the shortest path.  Only 250 demands required
   explicit paths.

   In this example, we increased the efficiency of the network by
   roughly 50%.  Indeed, 98% of the traffic is riding on its shortest
   path instead of 65%.
   Furthermore, we reduced the operational complexity of the network by
   a factor of 60 (200 explicit routing policies instead of 12000).

   The next two sections provide other examples illustrating the
   simplicity and efficiency benefits of the SR-based traffic
   engineering solution.

2.2.1.  Disjointness in dual-plane networks

   Many networks are built according to the dual-plane design:

      Each access region k is connected to the core by two C routers
      (C(1,k) and C(2,k)).

      C(1,k) is part of plane-1 of the dual-plane core.

      C(2,k) is part of plane-2 of the dual-plane core.

      C(1,k) has a link to C(2,l) iff k = l.

      {C(1,k) has a link to C(1,l)} iff {C(2,k) has a link to C(2,l)}.

   Many networks need to deliver disjoint-based services (bank,
   government...): an access node A connected to core nodes C(1,A) and
   C(2,A) needs to provide two disjoint services towards an access node
   Z connected to core nodes C(1,Z) and C(2,Z).

   Classic IP routing cannot fulfill this requirement, as A would
   load-balance between the dual planes across ECMP paths.

   RSVP-TE traffic engineering would allow signaling two disjoint paths,
   one across the first plane and one across the second plane, with the
   following two drawbacks:

      Many ECMP paths exist within each plane (from C(i,A) to C(i,Z))
      and hence many RSVP-TE tunnels might be required to efficiently
      distribute the load within each plane.

      Many such services might need to be supported.

   Assuming 10000 such services across the network and assuming an
   average of 4 ECMP paths within each plane, a straight application of
   RSVP-TE would require 10000 * 2 * 4 = 80000 tunnels.  Even if
   load-sharing of traffic along ECMP paths in each plane is dropped,
   the solution would still need 20000 tunnels.

   Segment Routing (SR) offers a simpler solution.

   Any node of the first plane can be configured with an anycast
   loopback, say 1.1.1.1/32, to which node segment 111 is attached.  Any
   node of the second plane can be configured with an anycast loopback,
   say 2.2.2.2/32, to which node segment 222 is attached.  Let us also
   assume that access node Z is advertising node segment 500.

   A flow from A to Z via the first plane is simply represented by the
   segment list {111, 500}.  In the MPLS dataplane case, A pushes a
   two-label stack for Z-destined packets: the top label is 111 and the
   second label is 500.

   Node segment 111 gets the traffic on the ECMP-aware shortest path to
   the first plane, and then node segment 500 gets the traffic on the
   ECMP-aware shortest path to Z.

   Similarly, a flow from A to Z via the second plane is simply
   represented by the segment list {222, 500}.

   This simple solution would only add two node segments to the network
   instead of the 80000 LSPs signaled by the RSVP-TE solution.  This is
   40000 times better.

2.2.2.  QoS-based Routing Policies

   Frequently, different classes of service need different path
   characteristics.

   For example, an international network with presence in Tokyo and
   Brussels may have lots of cheap network capacity from Tokyo to Europe
   via USA and some scarce expensive capacity via Russia.

              ...USA...Brussels...Russia...Tokyo...USA...

               Figure 4: International Topology Example

   In such a case, the IGP metrics would be tuned to have a shortest
   path from Tokyo to Brussels via USA.

   This would provide efficient capacity planning usage while fulfilling
   the requirements of most of the data traffic.  However, it may not
   suit the latency requirements of the voice traffic between the two
   cities.

   Segment Routing (SR) offers a simple solution to the problem.

   The core routers in Russia are configured with an extra anycast
   loopback 3.3.3.3/32 to which node segment 333 is attached.

   If we assume that Brussels is configured with node segment 600, Tokyo
   can send all its data traffic to Brussels with one single segment:
   600.  Segment 600 gets the traffic from Tokyo to Brussels via USA and
   exploits any ECMP path along this shortest path.

   Tokyo can send all its voice traffic to Brussels with a list of two
   segments: {333, 600}.  Segment 333 gets the traffic to Russia and
   exploits any ECMP path along the shortest path.  Segment 600 gets the
   traffic from Russia to Brussels via the ECMP-aware shortest path.

   One single metric per link is sufficient, as it is clearly possible
   to set the IGP metrics such that the shortest path from Tokyo to
   Russia is direct and not via USA, and the shortest path from Russia
   to Brussels is not back via Tokyo and USA but straight to Brussels.

2.2.3.  Deterministic non-ECMP Path

   The previous sections have illustrated the ease of capacity planning
   traffic with ECMP-awareness and shortest paths.  The key benefits in
   terms of drastic reduction of the number of routing policies signaled
   by the network control plane and maintained by the data plane have
   been explained, and several orders of magnitude of scaling
   simplification have been illustrated.

   In this section, we highlight SR's ability to support a completely
   different model: the deterministic expression of a path avoiding any
   ECMP behavior.  This is realized by expressing the end-to-end path as
   a list of adjacency segments.

   For example, in Figure 3, one can force the AM traffic on the
   explicit path AFGKM by using the segment list {AF, FG, GK, KM}.

   Once again, SR offers simplicity and scaling benefits compared to the
   alternative RSVP-TE solution: no state is signaled through the
   network.

   In Figure 3, with SR, nodes F, G, K and M do not maintain any SR
   state for the A-to-M policy.  With RSVP-TE, each node along the
   RSVP-TE tunnel must maintain one LSP state per tunnel.
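
   The "no per-flow state in the network" property can be sketched in a
   few lines of Python.  This is an illustration under simplifying
   assumptions, not the specified forwarding behavior: adjacency
   segments are named after the link they represent (e.g. "AF"), as in
   the segment lists above, whereas real Adj-SIDs are 32-bit values
   local to each router.  The link list is read from the Figure 3
   diagram, and the names build_adjacency_tables and forward are
   hypothetical.

```python
from collections import deque

# Links of Figure 3 (undirected), as read from the diagram.
FIGURE_3_LINKS = [
    ("A", "B"), ("A", "F"),
    ("B", "C"), ("B", "D"), ("B", "E"),
    ("C", "L"), ("D", "L"), ("E", "L"), ("L", "M"),
    ("F", "G"), ("F", "H"), ("F", "I"),
    ("G", "K"), ("H", "K"), ("I", "K"), ("K", "M"),
]


def build_adjacency_tables(links):
    """Per-node table: adjacency segment name -> neighbor reached over it.

    The tables depend only on the topology, never on the flows carried.
    """
    tables = {}
    for a, b in links:
        tables.setdefault(a, {})[a + b] = b
        tables.setdefault(b, {})[b + a] = a
    return tables


def forward(tables, ingress, segment_list):
    """Walk a packet hop by hop, consuming one adjacency segment per node."""
    remaining = deque(segment_list)  # the only per-flow state: carried in the packet
    node = ingress
    path = [node]
    while remaining:
        segment = remaining.popleft()
        if segment not in tables[node]:
            raise ValueError(f"node {node} has no adjacency segment {segment}")
        node = tables[node][segment]
        path.append(node)
    return path


if __name__ == "__main__":
    tables = build_adjacency_tables(FIGURE_3_LINKS)
    print("-".join(forward(tables, "A", ["AF", "FG", "GK", "KM"])))
```

   Each node only consults its own adjacency table, which is built once
   from the topology.  Adding a second A-to-M policy over a different
   segment list changes nothing in those tables.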

   Here is a technique to decrease the number of adjacency segments
   needed to describe non-ECMP paths.

   In the topology example illustrated in Figure 5, node C can be
   configured with an SR explicit policy to node G via the path CDEFG.

                            A-B-C-D-E-F-G-H

                      Figure 5: Topology Example 3

   Node C can advertise a (forwarding) adjacency to node G and attach an
   SR SubTLV to identify the related adjacency segment (e.g., 72).  The
   ERO SubTLV is further attached to indicate that this adjacency does
   not describe a real datalink between C and G but instead an SR
   non-ECMP sub-path along the route {CD, DE, EF, FG}.

   Node A can then express its desired non-ECMP path as {AB, BC, 72, GH}
   instead of {AB, BC, CD, DE, EF, FG, GH}.

   Future versions of the document will describe other techniques to
   decrease the number of adjacency segments in non-ECMP source-routed
   paths.

2.3.  Fast Reroute

   This section assumes familiarity with Remote-LFA concepts described
   in [I-D.ietf-rtgwg-remote-lfa].

   Lemma 1: In networks with symmetric IGP metrics (the metric of a link
   AB is the same as the metric of the reverse link BA), we can prove
   that either the P and the Q sets intersect or there is at least one P
   node that is adjacent to a Q node.

   Consider an arbitrary protected link S-E.  In LFA FRR, if a path to
   the destination from a neighbor N of S does not cause a packet to
   loop back over the link S-E (i.e., N is a loop-free alternate), then
   S can send the packet to N and the packet will be delivered to the
   destination using the pre-failure forwarding information.

   If there is no such LFA neighbor, then S may be able to create a
   virtual LFA by using a tunnel to carry the packet to a point in the
   network which is not a direct neighbor of S and from which the packet
   will be delivered to the destination without looping back to S.
   Remote LFA (RLFA, [I-D.ietf-rtgwg-remote-lfa]) calls such a tunnel a
   repair tunnel.  The tail-end of this tunnel is called a "remote LFA"
   or a "PQ node".  We refer to RLFA for the definitions of the P and Q
   sets.

   If there is no such RLFA PQ node, we propose to use a Directed LFA
   (DLFA) repair tunnel to a Q node that is adjacent to the P space.
   The DLFA repair tunnel only requires two segments: a node segment to
   a P node which is adjacent to the Q node, and an adjacency segment
   from the P node to its adjacent Q node.

   It follows from Lemma 1 that, thanks to the DLFA extension, we have a
   guaranteed LFA-based FRR technique for any network with symmetric IGP
   metrics.

   The solution is simple:

      It does not require any extra computation on top of the one
      required for RLFA.

      The repair tunnel can be encoded efficiently with only two
      segments.

   The solution preserves the capacity planning properties of LFA FRR.

2.4.  Segment Routing in Software Defined Networks (SR-SDN)

   Some of the SDN requirements are:

      Guarantees of tight SLAs (FRR and bandwidth admission control).

      Efficient use of the network resources.

      Very high scaling to support application-based transactions.

   Segment Routing (SR) is a compelling architecture to support SDN for
   the following reasons:

      SR supports a simple but efficient capacity planning process based
      on centralized optimization.

      SR optimizes network resources by providing very simple support
      for ECMP-based shortest-path flows.

      SR provides much better scaling than alternative solutions:
      several orders of magnitude of scaling gains have been illustrated
      in the Simplicity and Capacity Planning sections.

      SR provides guaranteed FRR for any topology.

      SR provides ultimate virtualization, as the network does not
      contain any application state.  The state is in the packet.  It is
      encoded as a list of segments.

      SR supports very frequent transaction-based applications, as the
      network does not hold any state for the SR-encoded flows.

3.  Segment Routing Identifiers

   Segment Routing defines two types of Segment Identifiers: Node-SID
   and Adj-SID.

3.1.  Node Segment Identifier (Node-SID)

   A Node-SID is associated with a prefix advertised by a node (e.g., in
   a TLV-135).  The Node-SID SubTLV MAY be present in one of the
   following TLVs:

      TLV-135 (IPv4) defined in [RFC5305].

      TLV-235 (MT-IPv4) [RFC5120].

      TLV-236 (IPv6) [RFC5308].

      TLV-237 (MT-IPv6) [RFC5120].

   Multiple Node-SID SubTLVs MAY be attached to a prefix.  A node
   receiving more than one Node-SID for a prefix MAY consider only one
   encoded Node-SID, in which case the first encoded Node-SID MUST be
   considered and any additional Node-SID ignored.

   The value of the Node-SID is a 32-bit number.

3.1.1.  Node-SID SubTLV

   The Node-SID SubTLV has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Segment Identifier (SID)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: 6 octets

      Flags: 2 octets field of following flags:

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |P|E|L|                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

      P-Flag: PHP flag.  If set, then the penultimate hop MUST NOT pop
      the Node-SID before delivering the packet to the node that
      advertised the Node-SID.

      E-Flag: External flag.  If set, then the prefix is not local to
      the IS-IS protocol.  It is redistributed from another protocol.

      L-Flag: Level flag.
      If set, then the prefix has been propagated to the router in this
      level from another level (i.e., from level-1 into level-2 or from
      level-2 into level-1).

      Other bits: MUST be zero when sent and ignored when received.

      Segment Identifier (SID): 32 bits of Segment Identifier

3.2.  Adjacency Segment Identifier (Adj-SID)

   An Adjacency Segment Identifier (Adj-SID) represents a router
   adjacency.  The value of the Adj-SID is local to the router and it is
   encoded as a 32-bit value using a new SubTLV.  According to IS-IS,
   each adjacency is advertised using one of the IS-IS Neighbor TLVs
   below:

      TLV-22 [RFC5305]

      TLV-222 [RFC5120]

      TLV-23 [RFC5311]

      TLV-223 [RFC5311]

      TLV-141 [RFC5316]

   Currently, [RFC5316] defines TLV-141 for the purpose of inter-AS
   connectivity.  In the Segment Routing context, we relax this
   constraint and allow TLV-141 to be used for advertising any link that
   is external to the IS-IS domain, whether or not it connects to
   another AS.

   The newly defined Adj-SID SubTLV carries the Adj-SID value for each
   of the advertised adjacencies and MAY be present in any of the
   Neighbor TLVs described above.

   Multiple Adj-SID SubTLVs MAY be attached to the Neighbor TLVs (e.g.,
   TLV-22).  An example where more than one is useful is the case of
   parallel adjacencies between two neighbors, as in the figure below:

                           _____
                          /     \
               ----A------B------C----
                          \_____/

                   Figure 6: Parallel Adjacencies

   Routers B and C have 3 parallel adjacencies.  Router B advertises
   three distinct Neighbor TLVs (e.g., TLV-22), one for each parallel
   adjacency.  Each of these advertisements will have its own Adj-SID
   SubTLV with a unique value (inside the Adj-SID space of the router).

   When router A inspects its IS-IS Link State Database (LSDB), it can
   figure out which link to use on a source-routed path going through
   the B-C links.
   It has knowledge of each individual parallel adjacency and can handle
   load sharing across them on its own (i.e., decide in advance which
   packet should use which link).

   However, router A may prefer not to select a specific parallel
   interface and leave the load sharing decision to router B so that
   load sharing is handled locally (i.e., where the parallel interfaces
   reside).

   In order to achieve that, router B inserts an additional Adj-SID
   value on each of the parallel adjacencies it advertises.  The value
   of this second Adj-SID is common to all parallel adjacencies.

   Again, when router A inspects its IS-IS LSDB, it finds that the
   parallel adjacencies advertised by router B have a second Adj-SID
   with a value that is common across all parallel adjacencies.  Using
   that value will bring packets into router B, and the load sharing
   decision is owned by router B itself.

   When the same Adj-SID value is used on parallel adjacencies, we call
   the Adj-SID a "Bundle-Adj-SID".

3.2.1.  Adj-SID and Interface Address

   When advertising one or more Adj-SID SubTLVs, the router MUST also
   advertise Interface Address and Neighbor Address SubTLVs (IPv4 or
   IPv6).  Both MUST be present.  The encoding is defined in [RFC5305]
   for IPv4 and in [RFC6119] for IPv6.

3.2.2.  Adjacency Segment Identifier (Adj-SID) SubTLV

   The following format is defined for the Adj-SID SubTLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Adj-SID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: variable.
751 Flags: 2 octets field of following flags: 753 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 754 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 755 |B|F| | 756 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 758 where: 760 B-Flag: Bundle flag. If set, then Adj-SID refers to a bundle 761 (i.e.: a set of parallel adjacencies). 763 F-Flag: FA flag. If set, then Adj-SID refers to a Forwarding 764 Adjacency. 766 Other bits: MUST be zero when sent and ignored when received. 768 Adj-SID: 32 bits of Adjacency Segment Identifier 770 Forwarding Adjacencies are defined in [RFC4206]. 772 If the F-flag is set, then the explicit path taken by the Forwarding 773 Adjacency MUST be encoded using the following subTLV in the Adj-SID 774 SubTLV: 776 0 1 2 3 777 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 778 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 779 | Type | Length | Flags | 780 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 781 | Segment Identifier (SID) #1 | 782 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 783 | Segment Identifier (SID) #... | 784 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 786 where: 788 Type: TBA. 790 Flags: none are currently used. 792 Length: variable, 2 + multiple of 4 octets. 794 Segment Identifier (SID): The SID value of each hop in the 795 explicit path of the Forwarding Adjacency. 797 3.2.3. Adjacency Segment Identifiers in LANs 798 In LAN subnetworks, the Designated Intermediate System (DIS) is 799 elected and originates the Pseudonode-LSP (PN-LSP) including all 800 neighbors of the DIS. 802 Still, when Segment Routing is used, each router in the LAN MUST 803 advertise the Adj-SID of each of its neighbors. 
Since, on LANs, 804 there are no neighbor advertisements in non-PN-LSPs (other than the 805 adjacency to the DIS), each router advertises the set of Adj-SIDs 806 (for each of its neighbors) inside the Intermediate System to Intermediate System 807 Hello (IIH) packets as soon as the adjacency to that neighbor reaches 808 the UP state. 810 We define a new IIH TLV, the IIH-Adj-SID TLV, with the following format: 812 0 1 2 3 813 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 814 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 815 | Type | Length | Flags | System-ID | 816 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 817 | Adj-SID | 818 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 820 Where: 821 Type: TBA 822 Length: 6 octets 824 Flags: 10 bits of flags. None are used at this stage. 825 MUST be zero when sent and ignored when received. 826 System-ID: 6 octets of system ID and pseudonode number of 827 the neighbor. 828 Adj-SID: 32 bits of IIH Adjacency Segment Identifier 830 Therefore, each router in the LAN advertises in its IIH packet the 831 list of UP adjacencies in the form of tuples: . 833 The DIS, like any other router in the LAN, receives IIHs from all 834 routers on the LAN and stores the set of tuples . 836 The DIS includes the Adj-SID information received in the IIHs when 837 advertising IS-Neighbors in its PN-LSPs. 839 The result is that the PN-LSP contains the neighbors of the DIS and, 840 for each of them, the list of their Adj-SIDs to their respective 841 neighbors in the LAN. 843 This could require multiple IS-Neighbor TLVs for the same neighbor if 844 there are more than 25 ISs on a LAN. 846 Each router within the level-1 area or level-2 subdomain, when 847 receiving the PN-LSP, will extract each neighbor and its 848 corresponding Adj-SID table in order to determine which Adj-SID has 849 to be used between any two neighbors in the LAN. 851 4.
Segment Routing Capabilities 853 Segment Routing requires each router to advertise its capabilities to 854 the rest of the routing domain. TLV-242 (defined in [RFC4971]) 855 describes router capabilities. For the purposes of Segment Routing 856 we define an additional SubTLV: the SR-Cap SubTLV. 858 The SR-Cap SubTLV MUST be present in the Router Capability TLV 859 (TLV-242), MUST appear only once and has the following format: 861 0 1 2 3 862 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 863 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 864 | Type | Length | SR Capabilities Flags | 865 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 867 where: 869 Type: TBA. 871 Length: 2 octets. 873 SR Capabilities Flags: 2-octet field with the following flags: 875 0 1 876 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 877 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 878 |M|F|S| | 879 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 881 where: 883 M-Flag: MPLS flag. If set, then the advertising router is 884 capable of MPLS label based forwarding. 886 F-Flag: IPv4 flag. If set, then the advertising router is 887 capable of IPv4 based forwarding. 889 S-Flag: IPv6 flag. If set, then the advertising router is 890 capable of IPv6 based forwarding. 892 Other bits: MUST be zero when sent and ignored when 893 received. 895 The Router Capability TLV defined in [RFC4971] specifies the S and D 896 bits. The SR-Cap SubTLV MUST be propagated throughout the 897 entire routing domain and therefore the S bit in the Router 898 Capability TLV MUST be set. 900 The D bit of the Router Capability TLV must be set accordingly (i.e.: it 901 MUST be set when the Router Capability TLV is leaked from level-2 to 902 level-1). 904 5. Elements of Procedure 906 This section describes aspects of Segment Routing procedures. 908 5.1. Unicity 910 The benefits of the Segment Routing solution build on a small set 911 of rules.
The first 64 values of the 32-bit segment space are 912 reserved and cannot be used by the SR Control-Plane for either node 913 or adjacency segments. 915 All the nodes in the IS-IS domain must be configured with the node SRB 916 range. The range is a local policy and is not advertised by IS-IS. A 917 node segment must be allocated from the node SRB range. 919 A given Node-SID must be allocated to a unique IP prefix. If the IP 920 prefix is of anycast type and is advertised by two nodes N and M, 921 then N and M attach the same (anycast) Node-SID to the same anycast 922 IP address. 924 If a node N learns a remote Adj-SID S advertised with a value 925 that falls within its locally configured node SRB range, N SHOULD issue 926 an error log warning for misconfiguration. 928 If a node N learns a remote Node-SID S whose value falls 929 outside its locally configured node SRB range, N SHOULD NOT insert 930 any RIB entry for segment S. Node N SHOULD issue an error log 931 warning for misconfiguration. 933 If a node N learns about two different IP addresses advertised with 934 the same Node-SID, N MUST insert a RIB entry only for the node 935 segment related to the highest IP address. N SHOULD issue an error 936 log warning for misconfiguration. 938 5.2. IS-IS Multi-Level 939 In the IS-IS protocol, adjacency advertisements (e.g.: TLV-22) are not 940 propagated across level/area boundaries; hence the adjacency segment 941 (Adj-SID) is not propagated across levels either. 943 If a prefix is propagated across levels, then its Node-SID SubTLVs 944 are also propagated. The Node-SID S flag is set accordingly, 945 independently of the settings of the U/D bit defined in [RFC5305]. 947 5.3. Data-Plane Encodings 949 The SR control-plane supports different forwarding planes. The first 950 section describes the SR source routing concept and its RIB 951 representation. The next sections map the SR-RIB entries into the 952 MPLS and IPv6 forwarding planes. 954 5.3.1.
Segment Routing RIB (SR-RIB) 956 SR leverages source routing and introduces the following terminology: 958 A packet is prepended with an SR header which contains a list of 959 segments. 961 A list of segments is ordered and has a pointer identifying the 962 active segment. 964 The active segment is the segment identified by the pointer. 966 Forwarding is based on the active segment. 968 The following forwarding operations are defined for SR: 970 CONTINUE: the active segment remains active after the forwarding 971 operation and the pointer is left unchanged. 973 NEXT: the active segment is completed after the forwarding 974 operation and the pointer is advanced to the next segment in the 975 ordered list. 977 INSERT: a list of segments is inserted in the segment list. The 978 INSERT operation can be coupled with the CONTINUE or NEXT 979 operation. 981 Other operations will be introduced in future versions of the 982 document. 984 Two types of SR-RIB entries are defined: 986 TRANSIT: the ingress packet comes with an active segment. A 987 Transit SR-RIB entry is represented as: 989 Ingress active segment. 991 Operation on the active segment. 993 Egress Interface. 995 INGRESS: the ingress packet comes without active segment (plain 996 IP). 998 5.3.1.1. 
SR-RIB entry for local segments 1000 A node MUST install a transit SR-RIB entry for any local adjacency 1001 segment (Adj-SID) of value V attached to datalink L with: 1003 Ingress active segment : V 1005 Ingress operation: NEXT 1007 Egress interface: L 1009 A node MUST install a transit SR-RIB entry for any local adjacency 1010 segment (Adj-SID) of value W attached to ISIS link bundle B with: 1012 Ingress active segment: W 1014 Ingress operation: NEXT 1016 Egress interface: hash between any datalink within bundle B 1018 A node MUST install a transit SR-RIB entry for any local node segment 1019 (Node-SID) of value N with: 1021 Ingress active segment: N 1023 Ingress operation: NEXT (if not the last segment, then process the 1024 next segment else lookup in IP table) 1026 5.3.1.2. Transit SR-RIB entry for remote segments 1028 A node MUST install a transit SR-RIB entry for any remote node 1029 segment (Node-SID) of value R attached to IP prefix P with: 1031 Ingress active segment: R 1032 Ingress operation: CONTINUE (However, if the P flag is reset and P 1033 is advertised by the next-hop, then the operation is NEXT instead 1034 of CONTINUE). 1036 Egress interface: interface to next-hop along the shortest-path to 1037 P. 1039 A transit SR-RIB entry is never installed for a remote adjacency 1040 segment. 1042 5.3.1.3. Ingress SR-RIB entry for remote segments 1044 Ingress SR-RIB entries enable traffic injection in the SR forwarding 1045 plane. An ingress SR-RIB entry is generally represented as: 1047 Classification: what traffic 1049 Encapsulation: what list of segments to insert 1051 In this section, we define its simplest instantiation: the automated 1052 ingress SR-RIB entry insertion towards remote node segments (Node- 1053 SID). 1055 A node SHOULD install an ingress SR-RIB entry for any remote node 1056 segment (Node-SID) of value V attached to IP prefix P with: 1058 FEC: prefix P 1060 Ingress operation: insert nodal segment V. 
1062 Egress interface: interface to next-hop along the shortest-path to 1063 P. 1065 5.3.1.4. Policy-based Ingress SR-RIB entry 1067 The text will be added in a future revision. 1069 5.3.2. Multiprotocol Label Switching (MPLS) 1071 The mapping of SR-RIB entries into the MPLS forwarding plane is 1072 straightforward. The following elements MUST be considered: 1074 A list of segments is represented as a stack of labels. 1076 The active segment is the top label. 1078 The CONTINUE operation is implemented as a swap where the outgoing 1079 label value is set to the incoming label value. 1081 The NEXT operation is implemented as an MPLS pop operation. 1083 The INSERT operation is implemented as an MPLS push of a label 1084 stack. 1086 The rightmost 20 bits of the Node-SID or Adj-SID value MUST be used 1087 as the label value. This implies that SID values are allocated 1088 according to the 20-bit space of MPLS labels. 1090 5.3.3. IP Version 6 1092 The text will be added in a future revision. 1094 6. IANA Considerations 1096 TBD 1098 7. Manageability Considerations 1100 TBD 1102 8. Security Considerations 1104 TBD 1106 9. Acknowledgements 1108 We would like to thank Dave Ward, Dan Frost, Stewart Bryant, Pierre 1109 Francois, Thomas Telkamp and Les Ginsberg for their contributions to 1110 the content of this document. 1112 10. References 1114 10.1. Normative References 1116 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1117 Requirement Levels", BCP 14, RFC 2119, March 1997. 1119 [RFC4206] Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP) 1120 Hierarchy with Generalized Multi-Protocol Label Switching 1121 (GMPLS) Traffic Engineering (TE)", RFC 4206, October 2005. 1123 [RFC4971] Vasseur, JP., Shen, N., and R. Aggarwal, "Intermediate 1124 System to Intermediate System (IS-IS) Extensions for 1125 Advertising Router Information", RFC 4971, July 2007. 1127 [RFC5120] Przygienda, T., Shen, N., and N.
Sheth, "M-ISIS: Multi 1128 Topology (MT) Routing in Intermediate System to 1129 Intermediate Systems (IS-ISs)", RFC 5120, February 2008. 1131 [RFC5305] Li, T. and H. Smit, "IS-IS Extensions for Traffic 1132 Engineering", RFC 5305, October 2008. 1134 [RFC5308] Hopps, C., "Routing IPv6 with IS-IS", RFC 5308, October 1135 2008. 1137 [RFC5311] McPherson, D., Ginsberg, L., Previdi, S., and M. Shand, 1138 "Simplified Extension of Link State PDU (LSP) Space for 1139 IS-IS", RFC 5311, February 2009. 1141 [RFC5316] Chen, M., Zhang, R., and X. Duan, "ISIS Extensions in 1142 Support of Inter-Autonomous System (AS) MPLS and GMPLS 1143 Traffic Engineering", RFC 5316, December 2008. 1145 [RFC6119] Harrison, J., Berger, J., and M. Bartlett, "IPv6 Traffic 1146 Engineering in IS-IS", RFC 6119, February 2011. 1148 10.2. Informative References 1150 [I-D.ietf-rtgwg-remote-lfa] 1151 Bryant, S., Filsfils, C., Previdi, S., Shand, M., and S. 1152 Ning, "Remote LFA FRR", draft-ietf-rtgwg-remote-lfa-01 1153 (work in progress), December 2012. 1155 [RFC5443] Jork, M., Atlas, A., and L. Fang, "LDP IGP 1156 Synchronization", RFC 5443, March 2009. 1158 [RFC6138] Kini, S. and W. Lu, "LDP IGP Synchronization for Broadcast 1159 Networks", RFC 6138, February 2011. 1161 Authors' Addresses 1163 Stefano Previdi (editor) 1164 Cisco Systems, Inc. 1165 Via Del Serafico, 200 1166 Rome 00142 1167 Italy 1169 Email: sprevidi@cisco.com 1170 Clarence Filsfils (editor) 1171 Cisco Systems, Inc. 1172 Brussels 1173 BE 1175 Email: cfilsfil@cisco.com 1177 Ahmed Bashandy 1178 Cisco Systems, Inc. 1179 170, West Tasman Drive 1180 San Jose, CA 95134 1181 US 1183 Email: bashandy@cisco.com 1185 Martin Horneffer 1186 Deutsche Telekom 1187 Hammer Str. 
216-226 1188 Muenster 48153 1189 DE 1191 Email: Martin.Horneffer@telekom.de 1193 Bruno Decraene 1194 Orange 1195 FR 1197 Email: bruno.decraene@orange.com 1199 Stephane Litkowski 1200 Orange 1201 FR 1203 Email: stephane.litkowski@orange.com 1205 Igor Milojevic 1206 Telekom Srbija 1207 Takovska 2 1208 Belgrade 1209 RS 1211 Email: igormilojevic@telekom.rs 1212 Rob Shakir 1213 British Telecom 1214 London 1215 UK 1217 Email: rob.shakir@bt.com 1219 Saku Ytti 1220 TDC Oy 1221 Mechelininkatu 1a 1222 TDC 00094 1223 FI 1225 Email: saku@ytti.fi 1227 Wim Henderickx 1228 Alcatel-Lucent 1229 Copernicuslaan 50 1230 Antwerp 2018 1231 BE 1233 Email: wim.henderickx@alcatel-lucent.com 1235 Jeff Tantsura 1236 Ericsson 1237 300 Holger Way 1238 San Jose, CA 95134 1239 US 1241 Email: Jeff.Tantsura@ericsson.com
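
Illustrative note (not part of the draft text): the MPLS mapping
listed in Section 5.3.2 can be sketched in a few lines of Python.
This is a minimal sketch under stated assumptions, not a reference
implementation. The function names (sid_to_label, sr_continue,
sr_next, sr_insert) and the representation of the label stack as a
Python list with the top label at index 0 are hypothetical choices
made for the example.

```python
from typing import List

# Assumption: an MPLS label is 20 bits wide, so the mask keeps the
# rightmost 20 bits of a 32-bit SID, as Section 5.3.2 requires.
MPLS_LABEL_MASK = 0xFFFFF


def sid_to_label(sid: int) -> int:
    """Map a 32-bit SID to an MPLS label (its rightmost 20 bits)."""
    if not 0 <= sid <= 0xFFFFFFFF:
        raise ValueError("SID must be a 32-bit unsigned value")
    return sid & MPLS_LABEL_MASK


def sr_continue(stack: List[int]) -> List[int]:
    """CONTINUE: swap where the outgoing label equals the incoming one."""
    if not stack:
        raise ValueError("no active segment")
    return [stack[0]] + stack[1:]


def sr_next(stack: List[int]) -> List[int]:
    """NEXT: pop the top label; the next segment becomes active."""
    if not stack:
        raise ValueError("no active segment")
    return stack[1:]


def sr_insert(stack: List[int], sids: List[int]) -> List[int]:
    """INSERT: push the labels derived from `sids` on top of the stack."""
    return [sid_to_label(s) for s in sids] + stack


# Hypothetical SID values, chosen only for illustration.
stack = sr_insert([], [0x00010064, 0x00010065])  # ingress pushes two labels
stack = sr_continue(stack)                       # transit node: stack unchanged
stack = sr_next(stack)                           # segment done: top label popped
```

Returning a new list from each operation, rather than mutating the
input, keeps the three operations easy to compare against the
per-entry descriptions in Section 5.3.1.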