idnits 2.17.00 (12 Aug 2021)

/tmp/idnits48138/draft-previdi-filsfils-isis-segment-routing-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 12, 2013) is 3357 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4971 (Obsoleted by RFC 7981)

  == Outdated reference: draft-ietf-rtgwg-remote-lfa has been published as
     RFC 7490

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

IS-IS for IP Internets                                   S. Previdi, Ed.
Internet-Draft                                          C. Filsfils, Ed.
Intended status: Standards Track                             A. Bashandy
Expires: September 13, 2013                          Cisco Systems, Inc.
                                                            M. Horneffer
                                                        Deutsche Telekom
                                                             B. Decraene
                                                            S. Litkowski
                                                                  Orange
                                                            I. Milojevic
                                                          Telekom Srbija
                                                               R. Shakir
                                                         British Telecom
                                                                 S. Ytti
                                                                  TDC Oy
                                                          March 12, 2013

              Segment Routing with IS-IS Routing Protocol
             draft-previdi-filsfils-isis-segment-routing-00

Abstract

   Segment Routing (SR) enables any node to select any path (explicit or
   derived from IGPs SPT computations) for each of its traffic classes.
   The path does not depend on a hop-by-hop signaling technique (neither
   LDP nor RSVP). It only depends on a set of "segments" that are
   advertised by the IS-IS routing protocol. These segments act as
   topological sub-paths that can be combined together to form the
   desired path.

   There are two forms of segments: node and adjacency. A node segment
   represents a path to a node. An adjacency segment represents a
   specific adjacency to a node. A node segment is typically a multi-
   hop path while an adjacency segment is a one-hop path. SR's control-
   plane can be applied to IPv6 and MPLS dataplanes.

   Segment Routing control-plane can be applied to the MPLS dataplane: a
   node segment to node N is instantiated in the MPLS dataplane as an
   LSP along the shortest-path (SPT) to the node. An adjacency segment
   is instantiated in the MPLS dataplane as a cross-connect entry
   pointing to a specific egress datalink.

   This document describes the Segment Routing functions, a set of use
   cases it addresses and the necessary changes that are required in the
   IS-IS protocol.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Applicability
     2.1.  Simplicity
     2.2.  Capacity Planning and Traffic Engineering (TE)
       2.2.1.  Disjointness in dual-plane networks
       2.2.2.  QoS-based Routing Policies
       2.2.3.  Deterministic non-ECMP Path
     2.3.  Fast Reroute
     2.4.  Segment Routing in Software Defined Networks (SR-SDN)
   3.  Segment Routing Identifiers
     3.1.  Node Segment Identifier (Node-SID)
       3.1.1.  Node-SID SubTLV
     3.2.  Adjacency Segment Identifier (Adj-SID)
       3.2.1.  Adj-SID and Interface Address
       3.2.2.  Adjacency Segment Identifier (Adj-SID) SubTLV
       3.2.3.  Adjacency Segment Identifiers in LANs
   4.  Segment Routing Capabilities
   5.  Elements of Procedure
     5.1.  Unicity
     5.2.  IS-IS Multi-Level
     5.3.  Data-Plane Encodings
       5.3.1.  Segment Routing RIB (SR-RIB)
       5.3.2.  Multiprotocol Label Switching (MPLS)
       5.3.3.  IP Version 6
   6.  IANA Considerations
   7.  Manageability Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1. Introduction

   Segment Routing (SR) enables any node to select any path (explicit or
   derived from IGPs SPT computations) for each of its traffic classes.
   The path does not depend on a hop-by-hop signaling technique (neither
   LDP nor RSVP). It only depends on a set of "segments" that are
   advertised by the IS-IS routing protocol. These segments act as
   topological sub-paths that can be combined together to form the
   desired path.

   There are two forms of segments: node and adjacency. A node segment
   represents a path to a node.
   A node segment is typically a multi-hop
   path. An adjacency segment represents a specific adjacency to a
   node.

   SR's control-plane can be applied to IPv6 and MPLS dataplanes.

   In the MPLS dataplane, a node segment to node N is instantiated as an
   LSP along the shortest-path (SPT) to the node. An adjacency segment
   is instantiated as a cross-connect entry pointing to a specific
   egress datalink.

   At the heart of the SR technology, we find node segments. Node
   segments must be globally unique within the network domain.

                           A----B----C----D

                               Figure 1

   In Figure 1, all the nodes must be configured with the same Segment
   Routing Identifiers Block (called SRB Node Registry), e.g. 64-5000,
   and each node segment must be uniquely allocated from that SRB Node
   Registry (e.g. A, B, C and D are configured respectively with node
   segments 64, 65, 66 and 67).

   In the MPLS dataplane instantiation, this means that all the nodes
   need to be able to reserve and allocate to the SR control-plane the
   same MPLS label range (e.g. 64-5000).

2. Applicability

   Segment Routing is applicable to the following use-cases: simplicity,
   TE, FRR and SDN.

2.1. Simplicity

   The vast majority of IP traffic travels on the shortest path to its
   destination. SR delivers a very efficient control-plane technique to
   instantiate shortest-path-based node segments into the forwarding
   dataplane.
   In the example described in Figure 1, considering the
   MPLS forwarding plane, when node D advertises node segment 67 for its
   loopback D/32, nodes A and B introduce the following MPLS dataplane
   entries:

      A: IP2MPLS: FEC D/32 => push 67, nhop B
      A: MPLS2MPLS: 67 => swap 67, nhop B
      B: IP2MPLS: FEC D/32 => push 67, nhop C
      B: MPLS2MPLS: 67 => swap 67, nhop C

   If D advertises node segment 67 with the P flag reset:

      C: IP2MPLS: FEC D/32 => push explicit-null, nhop D
      C: MPLS2MPLS: 67 => pop, nhop D

   If D advertises node segment 67 with the P flag set:

      C: IP2MPLS: FEC D/32 => push 67, nhop D
      C: MPLS2MPLS: 67 => swap 67, nhop D

   LDP is no longer required to instantiate shortest-path LSPs to a
   remote node. The reduction in the number of protocols to operate
   helps reduce the overall operational complexity of the network. For
   example, the complex IGP/LDP synchronization described in [RFC5443]
   and [RFC6138] no longer needs to be considered, which improves the
   scaling and reliability of the network.

   For example, when a core node C has 40 TE tunnels to 40 remote core
   routers and 260 adjacent aggregation routers, and LDP LSPs need to be
   signaled to 5000 FECs, then node C maintains an LDP label database
   of (260+40)*5000 = 1,500,000 label bindings. In fact, several
   networks today are exposed to much more difficult LDP scaling
   constraints.

   In comparison, in the same use case, the SR control-plane only
   maintains 5000 node segments, which is 300 times fewer entries.

2.2. Capacity Planning and Traffic Engineering (TE)

   Capacity Planning deals with anticipating the placement of the
   traffic matrix onto the network topology, for a set of expected
   traffic and topology variations. The heart of the process consists
   in simulating the placement of the traffic along ECMP-aware shortest
   paths and accounting for the resulting bandwidth usage.
   The bandwidth
   accounting of a demand along its shortest-path is a basic capability
   of any planning tool or PCE server.

   For example, in the network topology described in Figure 2 and
   assuming a default IGP metric of 2 and IGP metrics
   BC=BG=CD=CE=DF=EF=1, a 1600Mbps A-to-Z flow is accounted as consuming
   1600Mbps on links AB and FZ, 800Mbps on links BC, BG and GF, and
   400Mbps on links CD, DF, CE and EF.

             C-----D
            / \     \
       A---B   +--E--F--Z
            \        /
             G------+

                     Figure 2: Example Topology 1

   ECMP is extremely frequent in SP, Enterprise and DC architectures and
   it is not rare to see as many as 128 different ECMP paths between a
   source and a destination within a single network domain.

   This is illustrated in Figure 3, which consists of a subset of a
   network where 6 ECMP paths are already observed from A to M.

           C
          / \
         B-D-L--
        / \ /   \
       A   E     \
        \         M
         \   G   /
          \ / \ /
           F-H-K
            \ /
             I

                   Figure 3: ECMP Topology Example

   Segment Routing offers simple support for such ECMP-based shortest-
   path placement: a node segment. A single node segment covers all
   the ECMP paths along the shortest-path.

   This is much simpler than the RSVP-TE model where one TE tunnel is
   required for each enumerated ECMP path.

   When the capacity planning process detects that a traffic or topology
   variation would lead to congestion, traffic engineering or a capacity
   increase is triggered.

   The most basic traffic engineering option consists of finding the
   smallest set of demands that need to be routed off their shortest
   path to eliminate the congestion, then computing an explicit path
   for each of them and instantiating these traffic-engineered policies
   in the network.

   Segment Routing offers simple support for explicit path policy.
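
   The bandwidth accounting described for Figure 2 can be reproduced
   with a short script. The following is a minimal sketch, not part of
   the protocol specification: the function name ecmp_link_loads and the
   link table are illustrative assumptions, the metrics are those stated
   for Figure 2, and the G-F link (with the default metric of 2) is
   inferred from the figure and from the 800Mbps accounted on GF.

```python
import heapq
from collections import defaultdict


def ecmp_link_loads(links, src, dst, demand):
    """Split `demand` evenly over equal-cost next hops at every node."""
    graph = defaultdict(dict)
    for (u, v), metric in links.items():
        graph[u][v] = metric
        graph[v][u] = metric  # symmetric IGP metrics are assumed

    # Dijkstra from the destination gives each node's distance to dst.
    dist = {dst: 0}
    heap = [(0, dst)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, metric in graph[node].items():
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                heapq.heappush(heap, (d + metric, nbr))

    # Visit nodes from farthest to nearest so that a node's inflow is
    # complete before it is split over its shortest-path next hops.
    inflow = defaultdict(float)
    inflow[src] = demand
    loads = {}
    for node in sorted(dist, key=dist.get, reverse=True):
        if node == dst or inflow[node] == 0:
            continue
        next_hops = [n for n, m in graph[node].items()
                     if dist[n] + m == dist[node]]
        share = inflow[node] / len(next_hops)
        for n in next_hops:
            loads[(node, n)] = loads.get((node, n), 0) + share
            inflow[n] += share
    return loads


# Figure 2 metrics: default 2, BC=BG=CD=CE=DF=EF=1.
FIGURE2_LINKS = {
    ("A", "B"): 2, ("B", "C"): 1, ("B", "G"): 1, ("C", "D"): 1,
    ("C", "E"): 1, ("D", "F"): 1, ("E", "F"): 1, ("G", "F"): 2,
    ("F", "Z"): 2,
}

loads = ecmp_link_loads(FIGURE2_LINKS, "A", "Z", 1600)
for link, mbps in sorted(loads.items()):
    print(link, mbps)
```

   Under these assumptions the script reports 1600Mbps on AB and FZ,
   800Mbps on BC, BG and GF, and 400Mbps on CD, CE, DF and EF, matching
   the accounting given for Figure 2.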

   In the diagram described in Figure 3, it is assumed that the
   requirement is that the AM flow should not consume any resource on
   the LM and the FG links.

   The first option consists of using the following encapsulation
   at A: A sends any traffic to M towards the nhop F with a two-label
   stack where the top label is the adjacency segment FI and the next
   label is the node segment to M. Alternatively, a three-label stack
   with adjacency segments FI, IK and KM could have been used.

   The first option seems preferable, as IP capacity planners
   classically optimize traffic along ECMP-aware shortest paths. The
   more node segments can be used, the better. However, both options
   are available and one can favor adjacency segments.

   In the same way, if the requirement in the diagram described in
   Figure 3 is that the AM flow should not consume any resource along
   the LM link but may use any resource on the bottom of the
   topology, then A could send the AM flow to its nhop F with a single
   label: the label representing the node segment to M.

   We believe that Segment Routing offers an excellent solution for
   Capacity Planning because:

      A node segment represents an ECMP-aware shortest path.

      Adjacency segments allow any explicit path to be expressed.

      The combination of node and adjacency segments allows any path
      to be expressed without having to enumerate all the ECMP path
      options.

      The capacity planning process ensures that the majority of the
      traffic rides on node segments (ECMP-based shortest path) while a
      minority of the traffic is routed off its shortest-path.

      The network does not hold any signaling state for the traffic
      engineered flows.

   In comparison, a classic RSVP-TE Full-Mesh traffic engineering
   solution involves a full-mesh of tunnels from any edge to any edge of
   the network.
   For any specific edge-to-edge pair, tens of RSVP-TE tunnels
   may need to be enumerated to load-share the traffic along all the
   possible shortest paths.

   Analytically, assuming a single tunnel from an edge to an edge
   (optimistic assumption), an RSVP-TE Full-Mesh traffic engineering
   solution scales as E^2 where E is the number of edge nodes. The
   number of LSPs signaled and maintained by the network (in control-
   plane and in dataplane) scales quadratically with the number of edge
   nodes.

   In contrast, the Segment Routing solution maintains E node segments.
   The number of control-plane and dataplane states scales linearly with
   the number of edge nodes.

   A network of 1000 edge nodes is very frequent nowadays. In such a
   case, the capacity planning solution based on segment routing scales
   1000 times better than the RSVP-TE Full-Mesh solution.

   We have applied this comparative study to a use-case using a real
   topology and a real demand matrix. The data set consisted of a full-
   mesh of 12000 tunnels where originally only 65% of the traffic was
   riding on its shortest path. Two well-known defects are
   illustrated in this data set: the lack of ECMP support in RSVP-TE,
   and hence the increase in the number of tunnels to enumerate all the
   ECMP options, and the inefficiency of distributed optimization, as
   too much traffic is riding off its shortest path. Using centralized
   optimization, we could optimize the IGP metrics so as to place 98%
   of the traffic on ECMP-aware shortest-path (one single node segment)
   while only 2% of the traffic required explicit traffic engineering
   tunnels away from the shortest path. Only 250 demands required
   explicit paths.

   In this example, we increased the efficiency of the network by about
   50%: 98% of the traffic is riding on its shortest path instead of
   65%.
   Furthermore,
   we reduced the operational complexity of the network by a factor of
   48 (250 explicit routing policies instead of 12000).

   The next sections provide other examples illustrating the
   simplicity and efficiency benefits of the SR-based traffic
   engineering solution.

2.2.1. Disjointness in dual-plane networks

   Many networks are built according to the dual-plane design:

      Each access region k is connected to the core by two C routers
      (C(1,k) and C(2,k)).

      C(1,k) is part of plane-1 of the dual-plane core.

      C(2,k) is part of plane-2 of the dual-plane core.

      C(1,k) has a link to C(2, l) iff k = l.

      {C(1,k) has a link to C(1, l)} iff {C(2,k) has a link to C(2, l)}.

   Many networks need to deliver disjoint-based services (bank,
   government...): an access node A connected to core nodes C(1, A) and
   C(2, A) needs to provide two disjoint services towards an access node
   Z connected to core nodes C(1, Z) and C(2, Z).

   Classic IP routing cannot fulfill this requirement as A would load-
   balance between the dual planes across ECMP paths.

   RSVP-TE traffic engineering would allow two disjoint paths to be
   signaled, one across the first plane and one across the second plane,
   with the following two drawbacks:

      Many ECMP paths exist within each plane (from C(i, A) to C(i, Z))
      and hence many RSVP-TE tunnels might be required to efficiently
      distribute the load within each plane.

      Many such services might need to be supported.

   Assuming 10000 such services across the network and assuming an
   average of 4 ECMP paths within each plane, a straight application of
   RSVP-TE would require 10000 * 2 * 4 tunnels, hence 80000 tunnels.
   Even if load-sharing of traffic along ECMP paths in each plane is
   dropped, the solution would still need 20000 tunnels.

   Segment Routing (SR) offers a simpler solution.
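
   Before turning to the SR approach, the RSVP-TE tunnel counts quoted
   above can be checked with a few lines of arithmetic. This is only an
   illustrative sketch; the helper name rsvp_te_tunnels is an
   assumption introduced here, and the inputs are the figures stated in
   the text (10000 services, 2 planes, 4 ECMP paths per plane).

```python
def rsvp_te_tunnels(services, planes, ecmp_paths_per_plane):
    """Tunnels needed when each ECMP path in each plane gets its own tunnel."""
    return services * planes * ecmp_paths_per_plane


SERVICES = 10000
PLANES = 2
ECMP_PATHS_PER_PLANE = 4

# One tunnel per ECMP path in each plane.
with_ecmp = rsvp_te_tunnels(SERVICES, PLANES, ECMP_PATHS_PER_PLANE)

# ECMP load-sharing dropped: a single tunnel per service and per plane.
without_ecmp = rsvp_te_tunnels(SERVICES, PLANES, 1)

print(with_ecmp, without_ecmp)  # 80000 20000
```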

   Any node of the first plane can be configured with an anycast
   loopback, say 1.1.1.1/32, to which node segment 111 is attached. Any
   node of the second plane can be configured with an anycast loopback,
   say 2.2.2.2/32, to which node segment 222 is attached. Let us also
   assume that access node Z is advertising node segment 500.

   A flow from A to Z via the first plane is simply represented by the
   segment list {111, 500}. In the MPLS dataplane case, A pushes a two-
   label stack for Z-destined packets: the top label is 111 and the
   second label is 500.

   Node segment 111 gets the traffic on the ECMP-aware shortest path to
   the first plane and then node segment 500 gets the traffic on the
   ECMP-aware shortest path to Z.

   Similarly, a flow from A to Z via the second plane is simply
   represented by the segment list {222, 500}.

   This simple solution would only add two node segments to the network
   instead of the 80000 LSPs signaled by the RSVP-TE solution, a
   40000-fold reduction.

2.2.2. QoS-based Routing Policies

   Frequently, different classes of service need different path
   characteristics.

   For example, an international network with presence in Tokyo and
   Brussels may have lots of cheap network capacity from Tokyo to Europe
   via USA and some scarce expensive capacity via Russia.

             ...USA...Brussels...Russia...Tokyo...USA...

              Figure 4: International Topology Example

   In such a case, the IGP metrics would be tuned to have a
   shortest-path from Tokyo to Brussels via USA.

   This would provide efficient capacity planning usage while fulfilling
   the requirements of most of the data traffic. However, it may not
   suit the latency requirements of the voice traffic between the two
   cities.

   Segment Routing (SR) offers a simple solution to the problem.

   The core routers in Russia are configured with an extra anycast
   loopback 3.3.3.3/32 to which node segment 333 is attached.

   If we assume that Brussels is configured with node segment 600, Tokyo
   can send all its data traffic to Brussels with one single segment:
   600. 600 gets the traffic from Tokyo to Brussels via USA and exploits
   any ECMP path along this shortest-path.

   Tokyo can send all its voice traffic to Brussels with a list of two
   segments: {333, 600}. 333 gets the traffic to Russia and exploits any
   ECMP path along the shortest path. 600 gets the traffic from Russia
   to Brussels via the ECMP-aware shortest-path.

   One single metric per link is sufficient, as it is clearly possible
   to set the IGP metrics such that the shortest-path from Tokyo to
   Russia is direct and not via USA, and the shortest-path from
   Russia to Brussels is not back via Tokyo and USA but straight to
   Brussels.

2.2.3. Deterministic non-ECMP Path

   The previous sections have illustrated the ease of capacity planning
   traffic with ECMP-awareness and shortest-path. The key benefits in
   terms of drastic reduction of the number of routing policies signaled
   by the network control plane and maintained by the data plane have
   been explained and several orders of magnitude of scaling
   simplification have been illustrated.

   In this section, we highlight SR's ability to support a completely
   different model: the deterministic expression of a path avoiding any
   ECMP behavior. This is realized by expressing the end-to-end path as
   a list of adjacency segments.

   For example, in Figure 3, one can force the AM traffic on the
   explicit path AFGKM by using the segment list {AF, FG, GK, KM}.

   Once again, SR offers simplicity and scaling benefits compared to the
   alternative RSVP-TE solution: no state is signaled through the
   network.

   In Figure 3, with SR, nodes F, G, K and M do not maintain any SR
   state for the A-to-M policy. With RSVP-TE, each node along the
   RSVP-TE tunnel must maintain one LSP state per tunnel.
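
   The point that the transit nodes hold no per-policy state can be
   sketched as follows. This is a simplified illustration, not a
   definitive dataplane model: the Adj-SID values are hypothetical (this
   document does not assign any), and each node is assumed to hold only
   a local table mapping its own adjacency segments to neighbors, a
   table whose size does not depend on the number of policies crossing
   the node.

```python
# Hypothetical, locally significant Adj-SID tables for the lower half
# of Figure 3. Each node only knows its own adjacencies.
ADJ_SID_TABLES = {
    "A": {24001: "F"},
    "F": {24011: "G", 24012: "H", 24013: "I"},
    "G": {24021: "K"},
    "K": {24031: "M"},
}


def forward(ingress, segment_list):
    """Follow a list of adjacency segments, consuming one segment per hop."""
    node = ingress
    remaining = list(segment_list)
    path = [ingress]
    while remaining:
        sid = remaining.pop(0)             # the active (top) segment
        node = ADJ_SID_TABLES[node][sid]   # purely local lookup
        path.append(node)
    return path


# Segment list {AF, FG, GK, KM} expressed with the hypothetical values.
path = forward("A", [24001, 24011, 24021, 24031])
print("".join(path))  # AFGKM
```

   The whole A-to-M policy lives in the segment list carried by the
   packet; changing the policy (for instance to {AF, FI, IK, KM}) only
   changes what the ingress node A pushes.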

   Here is a technique to decrease the number of adjacency segments
   needed to describe non-ECMP paths.

   In the topology example illustrated in Figure 5, node C can be
   configured with an SR explicit policy to node G via the path CDEFG.

                           A-B-C-D-E-F-G-H

                     Figure 5: Topology Example 3

   Node C can advertise a (forwarding) adjacency to node G and attach an
   SR subTLV to identify the related adjacency segment (e.g. 72). The
   ERO SubTLV is further attached to identify that this adjacency is not
   describing a real datalink between C and G but instead an SR non-ECMP
   sub-path along the route {CD, DE, EF, FG}.

   Node A can then express its desired non-ECMP path as {AB, BC, 72,
   GH} instead of {AB, BC, CD, DE, EF, FG, GH}.

   Future versions of the document will document other techniques to
   decrease the number of adjacency segments in non-ECMP source-routed
   paths.

2.3. Fast Reroute

   This section assumes familiarity with Remote-LFA concepts described
   in [I-D.ietf-rtgwg-remote-lfa].

   Lemma 1: In networks with symmetric IGP metrics (the metric of a link
   AB is the same as the metric of the reverse link BA), we can prove
   that either the P and the Q sets intersect or there is at least one P
   node that is adjacent to a Q node.

   Consider an arbitrary protected link S-E. In LFA FRR, if a path to
   the destination from a neighbor N of S does not cause a packet to
   loop back over the link S-E (i.e. N is a loop-free alternate), then
   S can send the packet to N and the packet will be delivered to the
   destination using the pre-failure forwarding information.

   If there is no such LFA neighbor, then S may be able to create a
   virtual LFA by using a tunnel to carry the packet to a point in the
   network which is not a direct neighbor of S and from which the packet
   will be delivered to the destination without looping back to S.
   Remote LFA (RLFA, [I-D.ietf-rtgwg-remote-lfa]) calls such a tunnel a
   repair tunnel. The tail-end of this tunnel is called a "remote LFA"
   or a "PQ node". We refer to RLFA for the definitions of the P and Q
   sets.

   If there is no such RLFA PQ node, we propose to use a Directed LFA
   (DLFA) repair tunnel to a Q node that is adjacent to the P space.
   The DLFA repair tunnel only requires two segments: a node segment to
   a P node which is adjacent to the Q node and an adjacency segment
   from the P node to its adjacent Q node.

   It follows from Lemma 1 that, thanks to the DLFA extension, we have a
   guaranteed LFA-based FRR technique for any network with symmetric IGP
   metrics.

   The solution is simple:

      It does not require any extra computation on top of the one
      required for RLFA.

      The repair tunnel can be encoded efficiently with only two
      segments.

   The solution preserves the capacity planning properties of LFA FRR.

2.4. Segment Routing in Software Defined Networks (SR-SDN)

   Some of the SDN requirements are:

      Guarantees of tight SLAs (FRR and bandwidth admission control).

      Efficient use of the network resources.

      Very high scaling to support application-based transactions.

   Segment Routing (SR) is a compelling architecture to support SDN for
   the following reasons.

   SR supports a simple but efficient capacity planning process based on
   centralized optimization.

   SR optimizes network resources by providing very simple support for
   ECMP-based shortest-path flows.

   SR provides for much better scaling than alternative solutions:
   several orders of magnitude of scaling gains have been illustrated in
   the Simplicity and Capacity Planning sections.

   SR provides for guaranteed FRR for any topology.

   SR provides for ultimate virtualization as the network does not
   contain any application state. The state is in the packet. It is
   encoded as a list of segments.

   SR provides for very frequent transaction-based applications as the
   network does not hold any state for the SR-encoded flows.

3. Segment Routing Identifiers

   Segment Routing defines two types of Segment Identifiers: Node-SID
   and Adj-SID.

3.1. Node Segment Identifier (Node-SID)

   A Node-SID is associated to a prefix advertised by a node (e.g. in a
   TLV-135). The Node-SID SubTLV MAY be present in one of the following
   TLVs:

      TLV-135 (IPv4) defined in [RFC5305].

      TLV-235 (MT-IPv4) [RFC5120].

      TLV-236 (IPv6) [RFC5308].

      TLV-237 (MT-IPv6) [RFC5120].

   Multiple Node-SID SubTLVs MAY be attached to a prefix. A node
   receiving a Node-SID SubTLV containing more than one Node-SID MAY
   consider only one encoded Node-SID, in which case the first encoded
   Node-SID MUST be considered and any additional Node-SID ignored.

   The value of the Node-SID is a 32-bit number.

3.1.1. Node-SID SubTLV

   The Node-SID SubTLV has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Length    |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Segment Identifier (SID)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: 6 octets

      Flags: 2 octets field of following flags:

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |P|E|L|                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

         P-Flag: PHP flag. If set, then the penultimate hop MUST NOT
         pop the Node-SID before delivering the packet to the node that
         advertised the Node-SID.

         E-Flag: External flag. If set, then the prefix is not local to
         the IS-IS protocol. It is redistributed from another protocol.

         L-Flag: Level flag.
         If set, then the prefix has been
         propagated to the router in this level from another level
         (i.e. from level-1 into level-2 or from level-2 into level-1).

         Other bits: MUST be zero when sent and ignored when received.

      Segment Identifier (SID): 32 bits of Segment Identifier

3.2. Adjacency Segment Identifier (Adj-SID)

   An Adjacency Segment Identifier (Adj-SID) represents a router
   adjacency. The value of the Adj-SID is local to the router and it is
   encoded as a 32-bit number using a new SubTLV in the following
   TLVs:

      TLV-22 [RFC5305]

      TLV-222 [RFC5120]

      TLV-23 [RFC5311]

      TLV-223 [RFC5311]

   Multiple Adj-SID SubTLVs MAY be attached to the above-mentioned TLVs.
   An example where more than one is useful is the case of parallel
   adjacencies between two neighbors. Each adjacency will be encoded
   separately (e.g. using TLV-22) and each adjacency will have one Adj-
   SID attached to it. This allows a remote router to explicitly
   determine which of the parallel adjacencies should be used for
   forwarding the packet.

   However, the remote router may prefer not to select a specific
   parallel interface and leave the decision to the local router so that
   load sharing in the local router is determined locally.

   Therefore, the local router (i.e. the router with parallel
   adjacencies) MAY insert a second Adj-SID, to each of its parallel
   adjacencies, with the same value so that when packets are received
   with that Adj-SID the decision onto which link the packet should be
   forwarded is left to the local router.

   When the same Adj-SID value is used on different parallel
   adjacencies, we call such a value a Bundle-Adj-SID.

3.2.1. Adj-SID and Interface Address

   When advertising one or more Adj-SID SubTLVs, the router MUST also
   advertise Interface Address and Neighbor Address SubTLVs (IPv4 or
   IPv6). Both MUST be present.
   The encoding is defined in [RFC5305] for IPv4 and in [RFC6119] for
   IPv6.

3.2.2.  Adjacency Segment Identifier (Adj-SID) SubTLV

   The following format is defined for the Adj-SID.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Adj-SID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: variable.

      Flags: 2-octet field containing the following flags:

          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |B|F|                           |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

         B-Flag: Bundle flag.  If set, then the Adj-SID refers to a
         bundle (i.e., a set of parallel adjacencies).

         F-Flag: FA flag.  If set, then the Adj-SID refers to a
         Forwarding Adjacency.

         Other bits: MUST be zero when sent and ignored when received.

      Adj-SID: 32 bits of Adjacency Segment Identifier

   Forwarding Adjacencies are defined in [RFC4206].

   If the F-flag is set, then the explicit path taken by the Forwarding
   Adjacency MUST be encoded using the following SubTLV in the Adj-SID
   SubTLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Segment Identifier (SID) #1                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Segment Identifier (SID) #...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA.

      Flags: none are currently used.

      Length: variable, 2 + multiple of 4 octets.

      Segment Identifier (SID): The SID value of each hop in the
      explicit path of the Forwarding Adjacency.

3.2.3.  Adjacency Segment Identifiers in LANs

   In LAN subnetworks, the Designated Intermediate System (DIS) is
   elected and originates the Pseudonode-LSP (PN-LSP) including all
   neighbors of the DIS.

   Still, when Segment Routing is used, each router in the LAN MUST
   advertise the Adj-SID of each of its neighbors.  Since, on LANs,
   there are no neighbor advertisements in non-PN-LSPs (other than the
   adjacency to the DIS), each router advertises the set of Adj-SIDs
   (one for each of its neighbors) inside the IS-IS Hello (IIH)
   packets as soon as the adjacency to that neighbor reaches the UP
   state.

   We define a new IIH TLV, the IIH-Adj-SID TLV, with the following
   format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     Flags     |   System-ID   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Adj-SID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA

      Length: 6 octets

      Flags: 10 bits of flags.  None are used at this stage.  MUST be
      zero when sent and ignored when received.

      System-ID: 6 octets of system ID and pseudonode number of the
      neighbor.

      Adj-SID: 32 bits of IIH Adjacency Segment Identifier

   Therefore, each router in the LAN advertises in its IIH packet the
   list of UP adjacencies in the form of tuples: <System-ID, Adj-SID>.

   The DIS, like any other router in the LAN, receives IIHs from all
   routers on the LAN and stores the set of <System-ID, Adj-SID>
   tuples.

   The DIS includes the Adj-SID information received in the IIHs when
   advertising IS-Neighbors in its PN-LSPs.

   The result is that the PN-LSP contains the neighbors of the DIS and,
   for each of them, the list of their Adj-SIDs to their respective
   neighbors in the LAN.

   This could require multiple IS-Neighbor TLVs for the same neighbor if
   there are more than 25 ISs on a LAN.
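
   As a rough illustration of the bookkeeping described above, the
   following Python sketch models how the tuples learned from IIHs
   could be gathered into a per-router Adj-SID table and then
   consulted.  It is not part of the protocol definition: the names
   SystemId, build_lan_adj_sid_table and lookup_adj_sid are
   hypothetical, the tuple shape (neighbor System-ID, Adj-SID) is
   assumed from the IIH-Adj-SID TLV fields, and no TLV parsing is
   attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical representation of a neighbor identifier (assumption:
# an opaque byte string, e.g. the 6-octet system ID).
SystemId = bytes


def build_lan_adj_sid_table(
    iih_tuples: Dict[SystemId, Iterable[Tuple[SystemId, int]]],
) -> Dict[SystemId, Dict[SystemId, int]]:
    """Group the (neighbor, Adj-SID) tuples by the router that advertised them.

    `iih_tuples` maps each advertising router on the LAN to the tuples
    it sent in its IIHs.  The result maps advertiser -> neighbor -> Adj-SID.
    """
    table: Dict[SystemId, Dict[SystemId, int]] = defaultdict(dict)
    for advertiser, tuples in iih_tuples.items():
        for neighbor, adj_sid in tuples:
            # The Adj-SID is carried in a 32-bit field.
            if not 0 <= adj_sid <= 0xFFFFFFFF:
                raise ValueError("Adj-SID must fit in 32 bits")
            table[advertiser][neighbor] = adj_sid
    return dict(table)


def lookup_adj_sid(
    table: Dict[SystemId, Dict[SystemId, int]],
    from_router: SystemId,
    to_router: SystemId,
) -> Optional[int]:
    """Return the Adj-SID that `from_router` assigned to `to_router`, if any."""
    return table.get(from_router, {}).get(to_router)
```

   A router that has extracted this information from the PN-LSP could
   then call lookup_adj_sid(table, a, b) to find the Adj-SID that
   router a uses towards its LAN neighbor b; the lookup returns None
   when no such adjacency was advertised.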

   Each router within the level-1 area or level-2 subdomain, when
   receiving the PN-LSP, will extract each neighbor and its
   corresponding Adj-SID table in order to determine which Adj-SID has
   to be used between any two neighbors in the LAN.

4.  Segment Routing Capabilities

   Segment Routing requires each router to advertise its capabilities to
   the rest of the routing domain.  TLV-242 (defined in [RFC4971])
   describes router capabilities.  For the purposes of Segment Routing
   we define an additional SubTLV: the SR-Cap SubTLV.

   The SR-Cap SubTLV MUST be present in the Router Capability TLV
   (TLV-242), MUST appear only once, and has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     SR Capabilities Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

      Type: TBA.

      Length: 2 octets.

      SR Capabilities Flags: 2-octet field containing the following
      flags:

          0                   1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |M|F|S|                         |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      where:

         M-Flag: MPLS flag.  If set, then the advertising router is
         capable of MPLS label based forwarding.

         F-Flag: IPv4 flag.  If set, then the advertising router is
         capable of IPv4 based forwarding.

         S-Flag: IPv6 flag.  If set, then the advertising router is
         capable of IPv6 based forwarding.

         Other bits: MUST be zero when sent and ignored when received.

   The Router Capability TLV defined in [RFC4971] specifies the S and D
   bits.  The SR-Cap SubTLV MUST be propagated throughout the entire
   routing domain and therefore the S bit in the Router Capability TLV
   MUST be set.

   The D bit of the Router Capability TLV must be set accordingly; that
   is, it MUST be set when the Router Capability TLV is leaked from
   level-2 to level-1.

5.  Elements of Procedure

   This section describes aspects of Segment Routing procedures.

5.1.  Unicity

   The benefits of the Segment Routing solution build on a small set of
   rules.  The first 64 values of the 32-bit segment space are reserved
   and cannot be used by the SR control-plane for either node or
   adjacency segments.

   All the nodes in the IS-IS domain must be configured with the node
   SRB range.  The range is a local policy and is not advertised by
   IS-IS.  A node segment must be allocated from the node SRB range.

   A given Node-SID must be allocated to a unique IP prefix.  If the IP
   prefix is of anycast type and is advertised by two nodes N and M,
   then N and M attach the same (anycast) Node-SID to the same anycast
   IP address.

   If a node N learns a remote Adj-SID S advertised with a value that
   falls within its locally configured node SRB range, N SHOULD issue
   an error log warning for misconfiguration.

   If a node N learns a remote Node-SID S with a value that falls
   outside its locally configured node SRB range, N SHOULD NOT insert
   any RIB entry for segment S.  Node N SHOULD issue an error log
   warning for misconfiguration.

   If a node N learns about two different IP addresses advertised with
   the same Node-SID, N MUST insert a RIB entry only for the node
   segment related to the highest IP address.  N SHOULD issue an error
   log warning for misconfiguration.

5.2.  IS-IS Multi-Level

   In the IS-IS protocol, adjacency advertisements (e.g., TLV-22) are
   not propagated across level/area boundaries; hence the adjacency
   segment (Adj-SID) is not propagated across levels either.

   If a prefix is propagated across levels, then its Node-SID SubTLVs
   are also propagated.  The Node-SID S flag is set accordingly,
   independently of the setting of the U/D bit defined in [RFC5305].

5.3.  Data-Plane Encodings

   The SR control-plane supports different forwarding planes.
   The first section describes the SR source routing concept and its
   RIB representation.  The next sections map the SR-RIB entries into
   the MPLS and IPv6 forwarding planes.

5.3.1.  Segment Routing RIB (SR-RIB)

   SR leverages source routing and introduces the following terminology:

      A packet is prepended with an SR header which contains a list of
      segments.

      A list of segments is ordered and has a pointer identifying the
      active segment.

      The active segment is the segment identified by the pointer.

      Forwarding is based on the active segment.

   The following forwarding operations are defined for SR:

      CONTINUE: the active segment remains active after the forwarding
      operation and the pointer is left unchanged.

      NEXT: the active segment is completed after the forwarding
      operation and the pointer is advanced to the next segment in the
      ordered list.

      INSERT: a list of segments is inserted in the segment list.  The
      INSERT operation can be coupled with the CONTINUE or NEXT
      operation.

   Other operations will be introduced in future versions of the
   document.

   Two types of SR-RIB entries are defined:

      TRANSIT: the ingress packet comes with an active segment.  A
      Transit SR-RIB entry is represented as:

         Ingress active segment.

         Operation on the active segment.

         Egress interface.

      INGRESS: the ingress packet comes without an active segment
      (plain IP).

5.3.1.1.  SR-RIB entry for local segments

   A node MUST install a transit SR-RIB entry for any local adjacency
   segment (Adj-SID) of value V attached to datalink L with:

      Ingress active segment: V

      Ingress operation: NEXT

      Egress interface: L

   A node MUST install a transit SR-RIB entry for any local adjacency
   segment (Adj-SID) of value W attached to IS-IS link bundle B with:

      Ingress active segment: W

      Ingress operation: NEXT

      Egress interface: hash among the datalinks within bundle B

   A node MUST install a transit SR-RIB entry for any local node segment
   (Node-SID) of value N with:

      Ingress active segment: N

      Ingress operation: NEXT (if not the last segment, then process the
      next segment; otherwise, look up in the IP table)

5.3.1.2.  Transit SR-RIB entry for remote segments

   A node MUST install a transit SR-RIB entry for any remote node
   segment (Node-SID) of value R attached to IP prefix P with:

      Ingress active segment: R

      Ingress operation: CONTINUE (However, if the P flag is reset and P
      is advertised by the next-hop, then the operation is NEXT instead
      of CONTINUE).

      Egress interface: interface to next-hop along the shortest-path to
      P.

   A transit SR-RIB entry is never installed for a remote adjacency
   segment.

5.3.1.3.  Ingress SR-RIB entry for remote segments

   Ingress SR-RIB entries enable traffic injection in the SR forwarding
   plane.  An ingress SR-RIB entry is generally represented as:

      Classification: what traffic

      Encapsulation: what list of segments to insert

   In this section, we define its simplest instantiation: the automated
   ingress SR-RIB entry insertion towards remote node segments
   (Node-SID).

   A node MUST install an ingress SR-RIB entry for any remote node
   segment (Node-SID) of value V attached to IP prefix P with:

      FEC: prefix P

      Ingress operation: insert nodal segment V.

      Egress interface: interface to next-hop along the shortest-path to
      P.

5.3.1.4.  Policy-based Ingress SR-RIB entry

   Text will be added in a future revision.

5.3.2.  Multiprotocol Label Switching (MPLS)

   The mapping of SR-RIB entries into the MPLS forwarding plane is
   straightforward.  The following elements MUST be considered:

      A list of segments is represented as a stack of labels.

      The active segment is the top label.

      The CONTINUE operation is implemented as a swap where the outgoing
      label value is set to the incoming label value.

      The NEXT operation is implemented as an MPLS pop operation.

      The INSERT operation is implemented as an MPLS push of a label
      stack.

      The rightmost 20 bits of the Node-SID or Adj-SID value MUST be
      used as the label value.

5.3.3.  IP Version 6

   Text will be added in a future revision.

6.  IANA Considerations

   TBD

7.  Manageability Considerations

   TBD

8.  Security Considerations

   TBD

9.  Acknowledgements

   We would like to thank Dave Ward, Dan Frost, Stewart Bryant, Pierre
   Francois, Thomas Telkamp and Les Ginsberg for their contribution to
   the content of this document.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4206]  Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
              Hierarchy with Generalized Multi-Protocol Label Switching
              (GMPLS) Traffic Engineering (TE)", RFC 4206, October 2005.

   [RFC4971]  Vasseur, JP., Shen, N., and R. Aggarwal, "Intermediate
              System to Intermediate System (IS-IS) Extensions for
              Advertising Router Information", RFC 4971, July 2007.

   [RFC5120]  Przygienda, T., Shen, N., and N. Sheth, "M-ISIS: Multi
              Topology (MT) Routing in Intermediate System to
              Intermediate Systems (IS-ISs)", RFC 5120, February 2008.

   [RFC5305]  Li, T. and H. Smit, "IS-IS Extensions for Traffic
              Engineering", RFC 5305, October 2008.

   [RFC5308]  Hopps, C., "Routing IPv6 with IS-IS", RFC 5308,
              October 2008.

   [RFC5311]  McPherson, D., Ginsberg, L., Previdi, S., and M. Shand,
              "Simplified Extension of Link State PDU (LSP) Space for
              IS-IS", RFC 5311, February 2009.

   [RFC6119]  Harrison, J., Berger, J., and M. Bartlett, "IPv6 Traffic
              Engineering in IS-IS", RFC 6119, February 2011.

10.2.  Informative References

   [I-D.ietf-rtgwg-remote-lfa]
              Bryant, S., Filsfils, C., Previdi, S., Shand, M., and S.
              Ning, "Remote LFA FRR", draft-ietf-rtgwg-remote-lfa-01
              (work in progress), December 2012.

   [RFC5443]  Jork, M., Atlas, A., and L. Fang, "LDP IGP
              Synchronization", RFC 5443, March 2009.

   [RFC6138]  Kini, S. and W. Lu, "LDP IGP Synchronization for Broadcast
              Networks", RFC 6138, February 2011.

Authors' Addresses

   Stefano Previdi (editor)
   Cisco Systems, Inc.
   Via Del Serafico, 200
   Rome  00142
   Italy

   Email: sprevidi@cisco.com

   Clarence Filsfils (editor)
   Cisco Systems, Inc.
   Brussels,
   BE

   Email: cfilsfil@cisco.com

   Ahmed Bashandy
   Cisco Systems, Inc.
   170, West Tasman Drive
   San Jose, CA  95134
   US

   Email: bashandy@cisco.com

   Martin Horneffer
   Deutsche Telekom
   Hammer Str. 216-226
   Muenster  48153
   DE

   Email: Martin.Horneffer@telekom.de

   Bruno Decraene
   Orange
   FR

   Email: bruno.decraene@orange.com

   Stephane Litkowski
   Orange
   FR

   Email: stephane.litkowski@orange.com

   Igor Milojevic
   Telekom Srbija
   Takovska 2
   Belgrade
   RS

   Email: igormilojevic@telekom.rs

   Rob Shakir
   British Telecom
   London
   UK

   Email: rob.shakir@bt.com

   Saku Ytti
   TDC Oy
   Mechelininkatu 1a
   TDC  00094
   FI

   Email: saku@ytti.fi