idnits 2.17.00 (12 Aug 2021) /tmp/idnits63179/draft-allan-spring-mpls-multicast-framework-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (June 2016) is 2166 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'MCAST-OPSF' is mentioned on line 498, but not defined == Unused Reference: 'MCAST-OSPF' is defined on line 551, but no explicit reference was found in the text == Unused Reference: 'RFC6514' is defined on line 555, but no explicit reference was found in the text == Unused Reference: 'RFC7385' is defined on line 558, but no explicit reference was found in the text Summary: 0 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 SPRING Working Group Dave Allan 2 Internet Draft Ericsson 3 Intended status: Standards Track Jeff Tantsura 4 Expires: December 2016 6 June 2016 8 A Framework for Computed Multicast applied to MPLS based Segment 9 Routing 10 draft-allan-spring-mpls-multicast-framework-01 12 Abstract 14 This document describes a multicast solution for Segment Routing with 15 MPLS data plane. It is consistent with the Segment Routing 16 architecture in that an IGP is augmented to distribute information in 17 addition to the link state. In this solution it is multicast group 18 membership information sufficient to synchronize state in a given 19 network domain. Computation is employed to determine the topology of 20 any loosely specified multicast distribution tree. 22 Status of this Memo 24 This Internet-Draft is submitted to IETF in full conformance 25 with the provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet 28 Engineering Task Force (IETF), its areas, and its working 29 groups. Note that other groups may also distribute working 30 documents as Internet-Drafts. 32 Internet-Drafts are draft documents valid for a maximum of six 33 months and may be updated, replaced, or obsoleted by other 34 documents at any time. It is inappropriate to use Internet- 35 Drafts as reference material or to cite them other than as "work 36 in progress". 38 The list of current Internet-Drafts can be accessed at 39 http://www.ietf.org/ietf/1id-abstracts.txt. 41 The list of Internet-Draft Shadow Directories can be accessed at 42 http://www.ietf.org/shadow.html. 44 This Internet-Draft will expire on December 2016. 46 Copyright and License Notice 47 Copyright (c) 2016 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 
50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with 55 respect to this document. Code Components extracted from this 56 document must include Simplified BSD License text as described 57 in Section 4.e of the Trust Legal Provisions and are provided 58 without warranty as described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction...................................................3 63 1.1. Authors......................................................3 64 1.2. Requirements Language........................................3 65 2. Conventions used in this document..............................3 66 2.1. Terminology..................................................3 67 3. Solution Overview..............................................4 68 3.1. Mapping source specific trees onto the segment routing 69 architecture......................................................5 70 3.2. Role of the Routing System...................................5 71 3.3. MDT Construction Requirements................................6 72 3.4. Pruning - theory of operation................................6 73 4. Elements of Procedure..........................................7 74 4.1. Triggers for Computation.....................................7 75 4.2. FIB Determination............................................7 76 4.2.1. Information in the IGP.....................................7 77 4.2.2. Computation of individual segments.........................8 78 4.3. FIB Generation..............................................10 79 4.4. FIB installation............................................11 80 5. Related work..................................................11 81 5.1. 
IGP Extensions..............................................11 82 5.2. BGP Extensions..............................................11 83 6. Observations..................................................12 84 7. Acknowledgements..............................................12 85 8. Security Considerations.......................................12 86 9. IANA Considerations...........................................12 87 10. References...................................................12 88 10.1. Normative References.......................................12 89 10.2. Informative References.....................................12 90 11. Authors' Addresses...........................................13 92 1. Introduction 94 This memo describes a solution for multicast for Segment Routing with 95 MPLS data plane in which source specific multicast distribution trees 96 (MDTs) are computed from information distributed via an IGP. 97 Computation can use information in the IGP to determine if a given 98 node in the network has a role as a root, leaf or replication point 99 in a given MDT. Unicast tunnels are employed to interconnect the 100 nodes determined to have a role. Therefore state only need be 101 installed in nodes that have one of these three roles to fully 102 instantiate an MDT. 103 Although this approach is computationally intensive, a significant 104 amount of computation can be avoided when the computing agent 105 determines that the node it is computing for has no role in a given 106 MDT. This permits a computed approach to multicast convergence to be 107 computationally tractable. 108 1.1. Authors 110 Dave Allan, Jeff Tantsura 112 1.2. Requirements Language 114 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 115 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 116 document are to be interpreted as described in RFC2119 [RFC2119]. 118 2. Conventions used in this document 120 2.1. 
Terminology 122 Candidate replication point - is a node that potentially needs to 123 install state to replicate multicast traffic as determined at an 124 intermediate step in multicast segment computation. It will either 125 resolve to having no role or a role as a replication point once 126 multicast has converged. 128 Candidate role - refers to any potential combination of roles on a 129 given multicast segment as determined at some intermediate step in 130 MDT computation. For example, a node with a candidate role may be a 131 leaf and may be a candidate replication point. 133 Downstream - refers to the direction along the shortest path to one 134 or more leaves for a given multicast distribution tree. 135 Multicast convergence - is when all computation and state 136 installation to ensure the FIB reflects the multicast information in 137 the IGP is complete. 139 MDT - multicast distribution tree. Is a tree composed of one or more 140 multicast segments. 142 Multicast segment - is a portion of the multicast tree where only the 143 root and the leaves have been specified, and computation based upon 144 the current state of the IGP database is employed to determine and 145 install the required state to implement the segment. For MPLS a 146 multicast segment is implemented as a p2mp LSP. A multicast segment 147 is identified by a multicast SID. 149 Multicast SID - Is the data plane identifier that is used to 150 implement a multicast segment. As per a unicast MPLS segment, the 151 rightmost 20 bits of a multicast SID are encoded as a label. It is 152 drawn from an SRGB that is global to the SR domain. 154 Pinned path - Is a unique shortest path extending from a leaf 155 upstream towards the root for a given multicast segment. It is 156 therefore a component of the multicast segment that has been determined 157 to be required. It will not necessarily extend from the leaf all the way to 158 the root during intermediate computation steps.
A pinned path can 159 result from pruning operations. 161 Role - refers specifically to a node that is either a root, a leaf, a 162 replication node, or a pinned waypoint for a given MDT. 164 Unicast convergence - is when all computation and state installation 165 to ensure the FIB reflects the unicast information in the IGP is 166 complete. 168 Upstream - refers to the direction along the shortest path to the 169 root of a given MDT. 171 3. Solution Overview 173 This memo describes a multicast architecture in which multicast state 174 is only installed in those nodes that have a role as a root, leaf, 175 or replication point for a given multicast segment. The a priori 176 established segment routing unicast tunnels are used to interconnect 177 the nodes that have a role in a given multicast SID. 179 A loosely specified MDT is composed of a single multicast segment and 180 the routing of the MDT is delegated entirely to computation driven by 181 information in the IGP database. 183 Explicitly routed MDTs are expressed as a tree of concatenated 184 multicast segments where both the leaves of each segment and the 185 waypoints coupling a given segment to the upstream and/or downstream 186 segment(s) are specified in information flooded in the IGP by the 187 overall root of the MDT. The segments themselves will be computed as 188 per a loosely specified MDT. 190 A PE acting as an overall root for a given tree is expected to be 191 configured by the operator as to where to source multicast traffic 192 from, be it an attachment circuit, interworking function for client 193 technology or other. Similarly, a leaf for a given tree is expected to 194 be configured by the operator as to the disposition of received 195 multicast traffic. 197 A computed segment is guaranteed to be loop free in a stable system.
198 A concatenation of segments to construct an MDT will similarly be 199 loop free as any collision of segments can be disambiguated in the 200 data plane via the SIDs. 202 This architecture significantly reduces the amount of state that 203 needs to be installed in the data plane to support multicast. This 204 also means that the impact of many failures in the network on 205 multicast traffic distribution will be recovered by unicast local 206 repair or unicast convergence with subsequent multicast convergence 207 acting in the role of network re-optimization (as opposed to 208 restoration). 210 3.1. Mapping source specific trees onto the segment routing architecture 212 A computed source specific tree for a given multicast group 213 corresponds to one or more multicast segments in the SR architecture. 214 Each multicast segment is assigned a SID, typically by management 215 configuration of the node that will be the overall root for the 216 source specific tree. The root node then uses the IGP to advertise 217 this information to all nodes in the IGP area/domain. 219 A multicast group is implemented as the set of source specific trees 220 from all nodes that have registered transmit interest to all nodes 221 that have registered receive interest in a multicast group. 223 3.2. Role of the Routing System 225 The role of the IGP is to communicate topology information, multicast 226 capability and associated algorithm, multicast registrations, unicast 227 to SID bindings, multicast to SID bindings and waypoints in multi- 228 segment MDTs. No changes to topology or unicast to SID binding 229 advertisements are proposed by this memo. 231 The multicast registrations/bindings will be in the form of source, 232 group, transmit/receive interest and the SID to use for the source 233 specific multicast tree. Registrations are originated by any node 234 that has send or receive interest in a given multicast group. 
Nodes 235 will use the combination of topology and multicast registrations to 236 determine the nodes that have a role in each source specific tree and 237 the SID information to then derive the required FIB state. 239 3.3. MDT Construction Requirements 241 A multicast segment in an MDT is constructed such that between any 242 pair of nodes that have a role in the segment and are connected by a 243 unicast tunnel, there is not another node on the shortest path 244 between the two with a role in that segment. This ensures that copies 245 of a packet forwarded by a multicast segment will traverse a link 246 only once in a stable system. 248 Note that this can be satisfied by a minimum cost shortest path tree, 249 but such a tree is not an absolute requirement. The pruning rules specified in 250 this memo will meet this requirement without necessarily producing 251 an absolutely minimum cost multicast segment (or incurring the 252 associated computational cost). 254 3.4. Pruning - theory of operation 256 The role of nodes in a given multicast segment is determined by first 257 producing an inclusive shortest path tree with all possible paths 258 between the root and leaves, and then applying a set of pruning rules 259 repeatedly until an acyclic tree is produced or no further prunes are 260 possible. 262 For the majority of multicast segments these rules will 263 authoritatively produce a minimum cost tree. For those segments that 264 have not yet been authoritatively resolved, there is a set of pruning 265 operations applied that are not guaranteed to produce a tree that 266 meets the requirements of Section 3.3; therefore these trees require auditing 267 and potential correction according to a further set of agreed rules. 268 This avoids the necessity of an exhaustive search of the solution 269 space.
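The first step described above, producing the inclusive shortest path tree, can be illustrated with a minimal Python sketch. It is not taken from this memo: the function names, the dictionary-based topology, and the assumption of an undirected graph with symmetric positive link costs are all hypothetical choices made for illustration, and the pruning rules themselves are not implemented here.

```python
import heapq

def shortest_distances(graph, root):
    """Dijkstra: cost of the shortest path from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def inclusive_spt(graph, root, leaves):
    """Return every directed link (u, v) that lies on some shortest path
    from root to one of the leaves.

    Assumption: graph is undirected with symmetric, positive link costs,
    so graph[v] lists the possible upstream neighbours of v.
    """
    dist = shortest_distances(graph, root)
    links, visited = set(), set()
    stack = [leaf for leaf in leaves if leaf in dist]
    while stack:  # walk upstream from each leaf towards the root
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        for u, cost in graph[v].items():
            if u in dist and dist[u] + cost == dist[v]:
                links.add((u, v))  # u is one hop upstream of v
                stack.append(u)
    return links

# Hypothetical topology: two equal cost paths from Root to leaf B,
# plus a node E that is not on any shortest path to a leaf.
example = {
    "Root": {"A": 1, "C": 1},
    "A": {"Root": 1, "B": 1, "E": 1},
    "C": {"Root": 1, "B": 1},
    "B": {"A": 1, "C": 1},
    "E": {"A": 1},
}
result = inclusive_spt(example, "Root", ["B"])
```

In this example `result` holds the four links of the Root-A-B and Root-C-B paths and omits the link to E. The pruning rules would then be applied to this set until each leaf has a unique path to the root or no further prunes are possible.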
271 During computation of a segment, a node may conclude at any of 272 numerous points in the computation process that it will definitely 273 not have a role, and abandon computation of that segment. 275 4. Elements of Procedure 277 4.1. Triggers for Computation 279 MDT computation is triggered by changes to the IGP database. These 280 are in the form of either changes in registered multicast group 281 interest, addition or removal of a multi-segment MDT descriptor, or 282 topology changes. 284 A change in registered interest for a group will require re- 285 computation of all MDTs that implement the multicast group. 287 A topology change will require the computation of some number of 288 multicast segments; the actual number will depend on the 289 implementation of tree computation but at a minimum will be all trees 290 for which there is not an optimal shortest path solution as a result 291 of the topology change. 293 4.2. FIB Determination 295 4.2.1. Information in the IGP 297 Group membership information for a multicast segment is obtained from 298 the IGP. This is true for single segment MDTs as well as multi- 299 segment MDTs. Included in the multi-segment MDT specification are the 300 waypoint nodes in the MDT and the upstream and downstream SIDs. The 301 specified node is expected to cross connect the SIDs to join the 302 segments together, acting in the role of leaf for the upstream segment 303 and root for the downstream segment. 305 When a waypoint in an MDT descriptor does not exist in the IGP, the 306 assumption is that the node identified by the waypoint SID has 307 failed. The response of the other nodes in the system in FIB 308 determination is to add the leaves of the downstream segment to the 309 upstream segment. 311 As an example, consider a node "x", and another node 312 "y". At some point in time, "x" advertises a tree that identifies "y" 313 as a waypoint that cross connects upstream SID "a" to downstream SID 314 "b".
At some later point node "y" fails. The other nodes in the 315 network will compute segment "a" as if it included all leaves and 316 waypoints in segment "b". All a priori state installed for segment "b" 317 would be removed as the failure of "y" has required "b" to be 318 subsumed by "a". 320 4.2.2. Computation of individual segments 322 FIB generation for a multicast segment is the result of computation, 323 ultimately as applied to all source specific trees in the network. 324 All computing nodes implement a common algorithm for tree generation, 325 as all MUST agree on the solution. 327 One algorithm is as follows: 329 All possible shortest paths to the set of leaves for the MDT are 330 determined. Then pruning rules are repeatedly applied until no 331 further prunes are possible. 333 The philosophy of the application of these rules could be expressed 334 as "simplify as much as possible, and prune that which cannot be". 335 The rules are: 337 1) Eliminate any links and nodes not on a potential shortest path 338 from the root to the leaves for the MDT under consideration. 340 2) Simplify via the replacement of any nodes that do not have a 341 potential role in the MDT with links. 343 This will be nodes that are not a leaf, a root or a candidate 344 replication point. For example: 346 Root---------A----------B 348 B is a leaf. A is not, but is on a potential shortest path from root 349 to B. However, A will have no role in the MDT that serves B as it 350 provides simple transit, and is therefore replaced with a direct 351 connection between the root and B. 353 Root--------------------B 355 Note that such pruning also needs to avoid the creation of 356 duplicate parallel links. For example: 358 /----------A----------\ 360 Root B 362 \----------C----------/ 364 Where A and C have no role and the cost root-A-B = cost root-C-B, 365 they can be replaced with a single link from Root to B.
367 3) Simplify via the elimination of fewer hop paths. 369 When for a given set of leaves, a node has multiple downstream 370 links that converge on a common downstream point, and that set of 371 leaves is only a subset of the leaves reachable on one or more of 372 the links, any link that only serves that subset of leaves can be 373 pruned. 375 For example: 377 --A---------------------------B 379 \ / 381 -----------C----------- 383 \ 385 ----D 387 Link AB is cost 2; links AC and CB are cost 1 (cost of link CD does 388 not affect the example). 390 B and D are leaves of a root upstream of A. From A, link AB can 391 reach leaf B. Path AC can reach leaves B and D. In this case path A-B 392 can be pruned from consideration. The set of leaves reachable via 393 link A-B is a subset of that reachable by A-C, and the paths from A 394 that serve that subset converge at B. 396 4) Prune via the elimination of upstream links where the nearest 397 reachable leaf is further than the closest leaf or pinned path, 398 and that path does not have a candidate replication point closer 399 than the closest leaf or pinned path, as the resulting tree will 400 require the shortest path to transit the closest upstream leaf or 401 pinned path. 403 For each upstream link for each leaf in a segment the nearest leaf 404 or pinned path is determined. Those links for which the nearest 405 leaf is further upstream than the closest leaf are pruned. 407 If, at the end of pruning and simplification, all leaves in a 408 multicast segment have a unique shortest path to the root, the tree 409 is considered resolved, and the computation can progress directly to 410 the FIB generation step. 412 If not all leaves have a unique shortest path, additional pruning 413 steps are applied.
These steps are NOT guaranteed to produce a lowest 414 cost tree, and therefore require an additional audit and possible 415 modification to ensure that, when forwarding, a maximum of one copy of a 416 packet will traverse an interface. 418 For segments not authoritatively resolved by the above rules, a prune 419 that will not authoritatively result in a minimum cost tree is 420 applied. For the purpose of interoperability, the following rule is 421 proposed: A computing node will select the closest node to the root 422 with a candidate role that does not have a unique shortest path to 423 the root. Where more than one such node exists, the one with the 424 lowest unicast SID is selected. For that node, the best upstream link 425 is selected and all other upstream links pruned. The best upstream 426 link is defined as the link with the closest node with a candidate 427 role that potentially serves the highest number of leaves. Where 428 there is a tie, once again the node with the lowest SID is selected. 430 Once the links have been pruned, rules 2 through 4 are repeatedly 431 applied until either the tree is fully resolved, or again no further 432 prunes are possible, in which case the next closest remaining 433 unresolved node has the same prune applied. 435 All segments not resolved by the initial prune rules are 436 audited to ensure all nodes that have a role in the tree do not have 437 a node with a role between them and their upstream node on the tree. 438 If they do, the old upstream adjacency is removed, and the superior 439 one added. 441 4.3. FIB Generation 443 The topology components that remain at the end of the pruning 444 operation will reflect all nodes that have a role in a given 445 multicast segment plus the necessary tunnels (as all intervening 446 multi-path scenarios will have been simplified away).
From this the 447 FIB can be generated: 449 All nodes that have a role in a given multicast segment and have 450 nodes upstream in the segment will need to accept the SID for the MDT 451 from, at minimum, all upstream interfaces. 453 All nodes that have a role in a given segment and have nodes 454 immediately downstream in the segment will need to replicate packets 455 simply labelled with the multicast SID onto those interfaces. 457 All nodes that have a role in a given segment and have nodes 458 reachable via a tunnel downstream set the FIB to push the tunnel 459 unicast SID for the downstream node onto any replicated copies of a 460 received packet, and identify the set of interfaces on the shortest 461 path for the tunnel SID. 463 4.4. FIB installation 465 FIB installation needs to acknowledge two aspects of the hybrid 466 tunnel and role model of multicast tree construction. The first is 467 that, because of the sparse state model, simple tree adds, moves, and 468 changes may require the installation of state where it did not 469 previously exist, and such changes may impact existing services. The 470 second is that it is possible to retain the knowledge to prioritize 471 computation of those trees impacted by the failure of a node with a 472 role. 474 To address this, there are three stages of state installation for 475 multicast convergence: 477 1) Immediate: 479 a. Installation of state for multicast segments impacted by the 480 failure of a node in the network, and installation of state 481 for segments in nodes that have not previously had a role in 482 the given segment. 484 b. Installation of state for waypoints in multi-segment MDTs. 486 2) After T1: Update state for nodes that both had and have a role in 487 a given multicast segment. 489 3) After T2: Removal of state for nodes that transition from having a 490 role to not having a role for a given multicast segment. 492 T1 and T2 are network wide configurable values. 494 5. Related work 496 5.1.
IGP Extensions 498 The required IGP changes are documented in [MCAST-ISIS] and [MCAST- 499 OSPF]. 501 5.2. BGP Extensions 503 This memo will require the specification of a new PMSI Tunnel 504 Attribute (SPRING P2MP tunnel, tentatively 0x09) in order to 505 integrate into the multicast framework documented in [RFC6514]. 507 6. Observations 509 This technique is not confined to segment routing, and with the 510 provision of a global label space (to be employed as per a multicast 511 SID), an MPLS-LDP network would also provide the requisite mesh of 512 unicast tunnels and be capable of implementing this approach to 513 multicast. 515 This memo focuses on an implementation based upon nodes that are IGP 516 speakers and converge independently, so it is written in a form that 517 assumes a node, computing node and IGP speaker are one and the same. 518 It should be observed that the relative frugality of data plane state 519 would suggest that separation of computation from nodes in the data 520 plane combined with management or "software defined networking" based 521 population of the multicast FIB entries may also be useful modes of 522 network operation. 524 7. Acknowledgements 526 Thanks to Uma Chunduri for his detailed review and suggestions. 528 8. Security Considerations 530 For a future version of this document. 532 9. IANA Considerations 534 This document requires the allocation of a PMSI tunnel type to 535 identify a SPRING P2MP tunnel type from the P-Multicast Service 536 Interface Tunnel (PMSI Tunnel) Tunnel Types registry. 538 10. References 540 10.1. Normative References 542 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 543 Requirement Levels", BCP 14, RFC 2119, March 1997. 545 10.2.
Informative References 547 [MCAST-ISIS] Allan et al., "IS-IS extensions for Computed Multicast 548 applied to MPLS based Segment Routing", IETF work in progress, 549 draft-allan-isis-spring-multicast-00, July 2016. 551 [MCAST-OSPF] Allan et al., "OSPF extensions for Computed Multicast 552 applied to MPLS based Segment Routing", IETF work in progress, 553 draft-allan-ospf-spring-multicast-00, July 2016. 555 [RFC6514] Aggarwal et al., "BGP Encodings and Procedures for Multicast 556 in MPLS/BGP IP VPNs", IETF RFC 6514, February 2012. 558 [RFC7385] Andersson & Swallow, "IANA Registry for P-Multicast Service 559 Interface (PMSI) Tunnel Type Code Points", IETF RFC 7385, 560 October 2014. 562 11. Authors' Addresses 564 Dave Allan (editor) 565 Ericsson 566 300 Holger Way 567 San Jose, CA 95134 568 USA 569 Email: david.i.allan@ericsson.com 571 Jeff Tantsura 572 Email: jefftant.ietf@gmail.com