idnits 2.17.00 (12 Aug 2021) /tmp/idnits62897/draft-allan-spring-mpls-multicast-framework-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 208 has weird spacing: '...f which is as...' -- The document date (February 2016) is 2287 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC6379' is defined on line 533, but no explicit reference was found in the text == Unused Reference: 'RFC6514' is defined on line 536, but no explicit reference was found in the text == Unused Reference: 'RFC7385' is defined on line 539, but no explicit reference was found in the text Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 SPRING Working Group Dave Allan, Jeff Tantsura 2 Internet Draft Ericsson 3 Intended status: Standards Track 4 Expires: August 2016 6 February 2016 8 A Framework for Computed Multicast applied to MPLS based Segment 9 Routing 10 draft-allan-spring-mpls-multicast-framework-00 12 Abstract 14 This document describes a multicast solution for Segment Routing with 15 MPLS data plane. It is consistent with the Segment Routing 16 architecture in that an IGP is augmented to distribute information in 17 addition to the link state. In this solution it is multicast group 18 membership information sufficient to synchronize state in a given 19 network domain. Computation is employed to determine the topology of 20 any loosely specified multicast distribution tree. 22 Status of this Memo 24 This Internet-Draft is submitted to IETF in full conformance 25 with the provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet 28 Engineering Task Force (IETF), its areas, and its working 29 groups. Note that other groups may also distribute working 30 documents as Internet-Drafts. 32 Internet-Drafts are draft documents valid for a maximum of six 33 months and may be updated, replaced, or obsoleted by other 34 documents at any time. It is inappropriate to use Internet- 35 Drafts as reference material or to cite them other than as "work 36 in progress". 38 The list of current Internet-Drafts can be accessed at 39 http://www.ietf.org/ietf/1id-abstracts.txt. 41 The list of Internet-Draft Shadow Directories can be accessed at 42 http://www.ietf.org/shadow.html. 44 This Internet-Draft will expire on August 2016. 46 Copyright and License Notice 47 Copyright (c) 2016 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with 55 respect to this document. Code Components extracted from this 56 document must include Simplified BSD License text as described 57 in Section 4.e of the Trust Legal Provisions and are provided 58 without warranty as described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction...................................................3 63 1.1. Authors......................................................3 64 1.2. Requirements Language........................................3 65 2. Conventions used in this document..............................3 66 2.1. Terminology..................................................3 67 3. Solution Overview..............................................4 68 3.1. Mapping source specific trees onto the segment routing 69 architecture......................................................5 70 3.2. Role of the Routing System...................................5 71 3.3. MDT Construction Requirements................................6 72 3.4. Pruning - theory of operation................................6 73 4. Elements of Procedure..........................................7 74 4.1. Triggers for Computation.....................................7 75 4.2. FIB Determination............................................7 76 4.2.1. Information in the IGP.....................................7 77 4.2.2. Computation of individual segments.........................7 78 4.3. FIB Generation..............................................10 79 4.4. FIB installation............................................10 80 5. Related work..................................................11 81 5.1. IGP Extensions..............................................11 82 5.2. BGP Extensions..............................................11 83 6. Observations..................................................11 84 7. Acknowledgements..............................................12 85 8. Security Considerations.......................................12 86 9. IANA Considerations...........................................12 87 10. References...................................................12 88 10.1. Normative References.......................................12 89 10.2. Informative References.....................................12 90 11. Authors' Addresses...........................................13 92 1. Introduction 94 This memo describes a solution for multicast for Segment Routing with 95 MPLS data plane in which source specific multicast distribution trees 96 (MDTs) are computed from information distributed via an IGP. 97 Computation can use information in the IGP to determine if a given 98 node in the network has a role as a root, leaf or replication point 99 in a given MDT. Unicast tunnels are employed to interconnect the 100 nodes determined to have a role. Therefore state only need be 101 installed in nodes that have one of these three roles to fully 102 instantiate an MDT. 103 Although this approach is computationally intensive, a significant 104 amount of computation can be avoided when the computing agent 105 determines that the node it is computing for has no role in a given 106 MDT. This permits a computed approach to multicast convergence to be 107 computationally tractable. 108 1.1. Authors 110 David Allan, Jeff Tantsura 112 1.2. Requirements Language 114 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 115 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 116 document are to be interpreted as described in RFC2119 [RFC2119]. 118 2. Conventions used in this document 120 2.1. Terminology 122 Candidate replication point - is a node that potentially needs to 123 install state to replicate multicast traffic as determined at an 124 intermediate step in multicast segment computation. It will either 125 resolve to having no role or a role as a replication point once 126 multicast has converged. 128 Candidate role - refers to any potential combination of roles on a 129 given multicast segment as determined at some intermediate step in 130 MDT computation. For example, a node with a candidate role may be a 131 leaf and may be a candidate replication point. 133 Downstream - refers to the direction along the shortest path to one 134 or more leaves for a given multicast distribution tree 135 Multicast convergence - is when all computation and state 136 installation to ensure the FIB reflects the multicast information in 137 the IGP is complete. 139 MDT - multicast distribution tree. Is a tree composed of one or more 140 multicast segments. 142 Multicast segment - is a portion of the multicast tree where only the 143 root and the leaves have been specified, and computation based upon 144 the current state of the IGP database will be employed to determine 145 and install the required state to implement the segment. A multicast 146 segment is identified by a multicast SID. 148 Pinned path - Is a unique shortest path extending from a leaf 149 upstream towards the root for a given multicast segment. Therefore is 150 a component of the multicast segment that it has been determined must 151 be there. It will not necessarily extend from the leaf all the way to 152 the root during intermediate computation steps. A pinned path can 153 result from pruning operations. 155 Role - refers specifically to a node that is either a root, a leaf, a 156 replication node, or a pinned waypoint for a given MDT. 158 Unicast convergence - is when all computation and state installation 159 to ensure the FIB reflects the unicast information in the IGP is 160 complete. 162 Upstream - refers to the direction along the shortest path to the 163 root of a given MDT. 165 3. Solution Overview 167 This memo describes a multicast architecture in which multicast state 168 is only installed in those nodes that have roles as a root, leaves, 169 and replication points for a given multicast segment. The a-priori 170 established segment routing unicast tunnels are used as interconnect 171 between the nodes that have a role in a given multicast SID. 173 A loosely specified MDT is composed of a single multicast segment and 174 the routing of the MDT is delegated entirely to computation driven by 175 information in the IGP database. 177 Explicitly routed MDTs are expressed as a tree of concatenated 178 multicast segments where both the leaves of each segment and the 179 waypoints coupling a given segment to the upstream and/or downstream 180 segment(s) is specified in information flooded in the IGP by the 181 overall root of the MDT. The segments themselves will be computed as 182 per a loosely specified MDT. 184 A PE acting as an overall root for a given tree is expected to be 185 configured by management as to where to source multicast traffic 186 from, be it an attachment circuit, interworking function for client 187 technology or other. Similarly a leaf for a given tree is expected to 188 be configured by management as to the disposition of received 189 multicast traffic. 191 A computed segment is guaranteed to be loop free in a stable system. 192 A concatenation of segments to construct an MDT will similarly be 193 loop free as any collision of segments can be disambiguated in the 194 data plane via the SIDs. 196 This architecture significantly reduces the amount of state that 197 needs to be installed in the data plane to support multicast. This 198 also means that the impact of many failures in the network on 199 multicast traffic distribution will be recovered by unicast local 200 repair or unicast convergence with subsequent multicast convergence 201 acting in the role of network re-optimization (as opposed to 202 restoration). 204 3.1. Mapping source specific trees onto the segment routing architecture 206 A computed source specific tree for a given multicast group 207 corresponds to one or more multicast segments in the SR architecture, 208 each of which is assigned a SID, typically by management 209 configuration of the node that will be the overall root for the 210 source specific tree, which then uses the IGP to advertise this 211 information to the root"s peers. 213 A multicast group is implemented as the set of source specific trees 214 from all nodes that have registered transmit interest to all nodes 215 that have registered receive interest in a multicast group. 217 3.2. Role of the Routing System 219 The role of the IGP is to communicate topology information, multicast 220 registrations, unicast to SID bindings, multicast to SID bindings and 221 waypoints in multi-segment MDTs. No changes to topology or unicast to 222 SID bindings advertisement are proposed by this memo. 224 The multicast registrations/bindings will be in the form of source, 225 group, transmit/receive interest and the SID to use for the source 226 specific multicast tree. Registrations are originated by any node 227 that has send or receive interest in a given multicast group. Nodes 228 will use the combination of topology and multicast registrations to 229 determine the nodes that have a role in each source specific tree and 230 the SID information to then derive the required FIB state. 232 The definition of the required IGP TLVs is out of scope of this memo 233 and will be done in relevant IGP drafts. 235 3.3. MDT Construction Requirements 237 A multicast segment in an MDT is constructed such that between any 238 pair of nodes that have a role in the segment and are connected by a 239 unicast tunnel, there is not another node on the shortest path 240 between the two with a role in that segment. This ensures that copies 241 of a packet forwarded by an multicast segment will traverse a link 242 only once in a stable system. 244 Note that this can be satisfied by a minimum cost shortest path tree, 245 but is not an absolute requirement. The pruning rules specified in 246 this memo will meet this requirement without necessarily producing 247 absolutely minimum cost multicast segment (or incurring the 248 associated computational cost). 250 3.4. Pruning - theory of operation 252 The role of nodes in a given multicast segment is determined by first 253 producing an inclusive shortest path tree with all possible paths 254 between the root and leaves, and then applying a set of pruning rules 255 repeatedly until an acyclic tree is produced or no further prunes are 256 possible. 258 For the majority of multicast segments these rules will 259 authoritatively produce a minimum cost tree. For those segments that 260 have not yet been authoritatively resolved, there is a set of pruning 261 operations applied that are not guaranteed to produce a tree that 262 meets the requirements of 3.3, therefore these trees require auditing 263 and potential correction according to a further set of agreed rules. 264 This avoids the necessity of an exhaustive search of the solution 265 space. 267 A node during computation of a segment may conclude that it will 268 absolutely not have a role at any of numerous points in the 269 computation process and abandon computation of that segment. 271 4. Elements of Procedure 273 4.1. Triggers for Computation 275 MDT computation is triggered by changes to the IGP database. These 276 are in the form of either changes in registered multicast group 277 interest, addition or removal of a multi-segment MDT descriptor, or 278 topology changes. 280 A change in registered interest for a group will require re- 281 computation of all MDTs that implement the multicast group. 283 A topology change will require the computation of some number of 284 multicast segments, the actual number will depend on the 285 implementation of tree computation but at a minimum will be all trees 286 for which there is not an optimal shortest path solution as a result 287 of the topology change. 289 4.2. FIB Determination 291 4.2.1. Information in the IGP 293 Group membership information for a multicast segment is obtained from 294 the IGP. This is true for single segment MDTs as well as multi- 295 segment MDTs. Included in the multi-segment MDT specification is the 296 waypoint nodes in MDT and the upstream and downstream SIDs. The 297 specified node is expected to cross connect the SIDs to join the 298 segments together acting in the role of leaf for the upstream segment 299 and root for the downstream segment. 301 When a waypoint in an MDT descriptor does not exist in the IGP, the 302 assumption is that the node has failed. The response of the other 303 nodes in the system in FIB determination is to add the leaves of the 304 downstream segment to the upstream segment. 306 4.2.2. Computation of individual segments 308 FIB generation for a multicast segment is the result of computation, 309 ultimately as applied to all source specific trees in the network. 310 All computing nodes implement a common algorithm for tree generation, 311 as all MUST agree on the solution. 313 One algorithm is as follows: 315 All possible shortest paths to the set of leaves for the MDT is 316 determined. Then pruning rules are repeatedly applied until no 317 further prunes are possible. 319 The philosophy of the application of these rules could be expressed 320 as "simplify as much as possible, and prune that which cannot be". 321 The rules are: 323 1) Eliminate any links and nodes not on a potential shortest path 324 from the root to the leaves for the MDT under consideration. 326 2) Simplify via the replacement of any nodes that do not have a 327 potential role in the MDT with links. 329 This will be nodes that are not a leaf, a root or a candidate 330 replication point. For example: 332 Root---------A----------B 334 B is a leaf. A is not but is in a potential shortest path from root 335 to B. However A will have no role in the MDT that serves B as it 336 provides simple transit therefore is replaced with a direct 337 connection between the root and B. 339 Root--------------------B 341 Note that such pruning also needs to avoid the creation of 342 duplicate links. For example: 344 /----------A----------\ 346 Root B 348 \----------C----------/ 350 Where A and C have no role, they can be replaced with a single link 351 from Root to B. 353 3) Simplify via the elimination of fewer hop paths 355 When for a given set of leaves, a node has multiple downstream 356 links that converge on a common downstream point, and that set of 357 leaves is only a subset of the leaves reachable on one or more of 358 the links, any link that only serves that subset of leaves can be 359 pruned. 361 For example: 363 --A---------------------------B 365 \ / 366 -----------C----------- 368 \ 370 ----D 372 B and D are leaves of a root upstream of A. From A, link AB can 373 reach leaf B. Path AC can reach leaf B and D. In this case path A-B 374 can be pruned from consideration. The set of leaves reachable via 375 link A-B is a subset of that reachable by A-C, and the paths from A 376 that serves that subset converges at B. 378 4) Prune via the elimination of upstream links where the nearest 379 reachable leaf is further than the closest leaf or pinned path, 380 and that path does not have a candidate replication point closer 381 than the closet leaf or pinned path, as the resulting tree will 382 require the shortest path to transit the closest upstream leaf or 383 pinned path. 385 For each upstream link for each leaf in a segment the nearest leaf 386 or pinned path is determined. Those links for which the nearest 387 leaf is further upstream than the closest leaf are pruned. 389 If, at the end of pruning and simplification, all leaves in a 390 multicast segment have a unique shortest path to the root, the tree 391 is considered resolved, and the computation can progress directly to 392 the FIB generation step. 394 If not all leaves have a unique shortest path, additional pruning 395 steps are applied. These steps are NOT guaranteed to produce a lowest 396 cost tree, and therefore require an additional audit and possible 397 modification to ensure when forwarding a maximum of one copy of a 398 packet will traverse an interface. 400 For segments not authoritatively resolved by the above rules, a prune 401 that will not authoritatively result in a minimum cost tree is 402 applied. For the purpose of interoperability, the following rule is 403 proposed: A computing node will select the closest node to the root 404 with a candidate role that does not have a unique shortest path to 405 the root. Where more than one such node exists, the one with the 406 lowest unicast SID is selected. For that node, the best upstream link 407 is selected and all other upstream links pruned. The best upstream 408 link is defined as the link with the closest node with a candidate 409 role that potentially serves the highest number of leaves. Where 410 there is a tie, once again the node with the lowest SID is selected. 412 Once the links have been pruned, rules 2 through 4 are repeatedly 413 applied until either the tree is fully resolved, or again no further 414 prunes are possible, in which case the next closest remaining 415 unresolved node has the same prune applied. 417 For all segments not resolved by the initial prune rules, they are 418 audited to ensure all nodes that have a role in the tree do not have 419 a node with a role between them and their upstream node on the tree. 420 If they do, the old upstream adjacency is removed, and the superior 421 one added. 423 4.3. FIB Generation 425 The topology components that remain at the end of the pruning 426 operation will reflect all nodes that have a role in a given 427 multicast segment plus the necessary tunnels (as all intervening 428 multi-path scenarios will have been simplified away). From this the 429 FIB can be generated: 431 All nodes that have a role in a given multicast segment and have 432 nodes upstream in the segment will need to accept the SID for the MDT 433 from at minimum, all upstream interfaces. 435 All nodes that have a role in a given segment and have nodes 436 immediately downstream in the segment will need to replicate packets 437 simply labelled with the multicast SID onto those interfaces. 439 All nodes that have a role in a given segment and have nodes 440 reachable via a tunnel downstream set the FIB to push the tunnel 441 unicast SID for the downstream node onto any replicated copies of a 442 received packet, and identify the set of interfaces on the shortest 443 path for the tunnel SID. 445 4.4. FIB installation 447 FIB installation needs to acknowledge two aspects of the hybrid 448 tunnel and role model of multicast tree construction. The first is 449 that because of the sparse state model simple tree adds, moves, and 450 changes may require the installation of state where it did not 451 previously exist, and such changes may impact existing services. The 452 second is that it is possible to retain the knowledge to prioritize 453 computation of those trees impacted the failure of a node with a 454 role. 456 To address this, there are three stages of state installation for 457 multicast convergence: 459 1) Immediate: 461 a. Installation of state for multicast segments impacted by the 462 failure of a node in the network, and installation of state 463 for segments in nodes that have not previously had a role in 464 the given segment. 466 b. Installation of state for waypoints in multi-segment MDTs. 468 2) After T1: Update state for nodes that both had and have a role in 469 a given multicast segment. 471 3) After T2: Removal of state for nodes that transition from having a 472 role to not having a role for a given multicast segment. 474 T1 and T2 will be network wide configurable values. 476 5. Related work 478 5.1. IGP Extensions 480 RFC 6329 provides a useful example of some of the type of IGP changes 481 that will be required. There are two aspects in RFC 6329 that are 482 worth emulating: 484 - The advertisement of multicast registrations 486 - The negotiation of the algorithm to be used for MDT computation 488 The required changes for both IS-IS and OSPF will be documented in 489 separate WG targeted I-Ds. 491 5.2. BGP Extensions 493 This memo will require the specification of a new PMSI Tunnel 494 Attribute (SPRING P2MP tunnel, tentatively 0x09) to order to 495 integrate into the multicast framework documented in RFC 6514 497 6. Observations 499 This technique is not confined to segment routing, and with the 500 provision of a global label space (to be employed as per a multicast 501 SID), an MPLS-LDP network would also provide the requisite mesh of 502 unicast tunnels and be capable of implementing this approach to 503 multicast. 505 This memo focuses on an implementation based upon nodes that are IGP 506 speakers and converge independently so is written in a form that 507 assumes a node, computing node and IGP speaker are one in the same. 508 It should be observed that the relative frugality of data plane state 509 would suggest that separation of computation from nodes in the data 510 plane combined with management or "software defined networking" based 511 population of the multicast FIB entries may also be useful modes of 512 network operation. 514 7. Acknowledgements 516 8. Security Considerations 518 For a future version of this document. 520 9. IANA Considerations 522 For a future version of this document. 524 10. References 526 10.1. Normative References 528 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 529 Requirement Levels", BCP 14, RFC 2119, March 1997. 531 10.2. Informative References 533 [RFC6379] Ashwood-Smith et.al., "IS-IS Extensions Supporting IEEE 534 802.1aq Shortest Path Bridging", IETF RFC 6329, April 2012 536 [RFC6514] Aggarwal et.al., "BGP Encodings and Procedures for Multicast 537 in MPLS/BGP IP VPNs", IETF RFC 6514, February 2012 539 [RFC7385] Andersson & Swallow "IANA Registry for P-Multicast Service 540 Interface (PMSI) Tunnel Type Code Points", IETF RFC 7385, 541 October 2014 543 11. Authors' Addresses 545 Dave Allan (editor) 546 Ericsson 547 300 Holger Way 548 San Jose, CA 95134 549 USA 550 Email: david.i.allan@ericsson.com 552 Jeff Tantsura 553 Ericsson 554 200 Holger Way 555 San Jose, CA 95134 556 Email: jeff.tantsura@ericsson.com