RTGWG                                                           S. Ning
Internet-Draft                                      Tata Communications
Intended status: Informational                                 A. Malis
Expires: February 13, 2013                                   D. McDysan
                                                                Verizon
                                                                L. Yong
                                                             Huawei USA
                                                          C. Villamizar
                                                 Outer Cape Cod Network
                                                             Consulting
                                                        August 12, 2012

          Composite Link Use Cases and Design Considerations
                   draft-ietf-rtgwg-cl-use-cases-00

Abstract

   This document provides a set of use cases and design considerations
   for composite links.

   A composite link is a formalization of multipath techniques
   currently in use in IP and MPLS networks, together with a set of
   extensions to those multipath techniques.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 13, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Conventions used in this document  . . . . . . . . . . . . . .  3
     2.1.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Composite Link Foundation Use Cases  . . . . . . . . . . . . .  4
   4.  Delay Sensitive Applications . . . . . . . . . . . . . . . . .  7
   5.  Large Volume of IP and LDP Traffic . . . . . . . . . . . . . .  7
   6.  Composite Link and Packet Ordering . . . . . . . . . . . . . .  8
     6.1.  MPLS-TP in network edges only  . . . . . . . . . . . . . . 10
     6.2.  Composite Link at core LSP ingress/egress  . . . . . . . . 11
     6.3.  MPLS-TP as an MPLS client  . . . . . . . . . . . . . . . . 12
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 12
   8.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 13
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 13
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 13
   Appendix A.  More Details on Existing Network Operator
                Practices and Protocol Usage  . . . . . . . . . . . . 15
   Appendix B.  Existing Multipath Standards and Techniques . . . . . 17
     B.1.  Common Multipath Load Splitting Techniques . . . . . . . . 18
     B.2.  Simple and Adaptive Load Balancing Multipath . . . . . . . 19
     B.3.  Traffic Split over Parallel Links  . . . . . . . . . . . . 20
     B.4.  Traffic Split over Multiple Paths  . . . . . . . . . . . . 20
   Appendix C.  Characteristics of Transport in Core Networks  . . . 20
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22

1.  Introduction

   Composite link requirements are specified in
   [I-D.ietf-rtgwg-cl-requirement].  A composite link framework is
   defined in [I-D.so-yong-rtgwg-cl-framework].

   Multipath techniques have been widely used in IP networks for over
   two decades.  The use of MPLS began more than a decade ago.
   Multipath has been widely used in IP/MPLS networks for over a decade
   with very little protocol support dedicated to the effective use of
   multipath.
   The state of the art in multipath prior to composite links is
   documented in Appendix B.

   Both Ethernet Link Aggregation [IEEE-802.1AX] and MPLS link bundling
   [RFC4201] have been widely used in today's MPLS networks.  Composite
   link differs in the following characteristics:

   1.  A composite link allows bundling of non-homogeneous links
       together as a single logical link.

   2.  A composite link provides more information in the TE-LSDB and
       supports more explicit control over placement of LSP.

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.1.  Terminology

   Terminology defined in [I-D.ietf-rtgwg-cl-requirement] is used in
   this document.

   In addition, the following terms are used:

   classic multipath:
      Classic multipath refers to the most common current practice in
      implementation and deployment of multipath (see Appendix A).
      The most common current practice makes use of a hash on the MPLS
      label stack and, if IPv4 or IPv6 is indicated below the label
      stack, also makes use of the IP source and destination addresses
      [RFC4385] [RFC4928].

   classic link bundling:
      Classic link bundling refers to the use of [RFC4201] where the
      "all ones" component is not used.  Where the "all ones" component
      is used, link bundling behaves as classic multipath does.
      Classic link bundling selects a single component link on which to
      place any given LSP.

   Among the important distinctions between classic multipath or
   classic link bundling and Composite Link are:

   1.  Classic multipath has no provision to retain order among flows
       within a subset of LSP.  Classic link bundling retains order
       among all flows but as a result does a poor job of splitting
       load among components and therefore is rarely (if ever)
       deployed.  Composite Link allows per-LSP control of load split
       characteristics.

   2.  Classic multipath and classic link bundling do not provide a
       means to place some LSP on component links with lower delay.
       Composite Link does.

   3.  Classic multipath provides a load balance for IP and LDP
       traffic; classic link bundling does not.  Neither classic
       multipath nor classic link bundling measures IP and LDP traffic
       and reduces the advertised "Available Bandwidth" as a result of
       that measurement.  Composite Link therefore better supports
       RSVP-TE used with significant traffic levels of native IP and
       native LDP.

   4.  Classic link bundling cannot support an LSP that is greater in
       capacity than any single component link.  Classic multipath and
       Composite Link support this capability but will reorder traffic
       on such an LSP.  Composite Link can retain the order of an LSP
       carried within an LSP that is greater in capacity than any
       single component link if the contained LSP has such a
       requirement.

   None of these techniques, classic multipath, classic link bundling,
   or Composite Link, will reorder traffic among IP microflows.  None
   of these techniques will reorder traffic within a PW if a PWE3
   Control Word is used [RFC4385].

3.  Composite Link Foundation Use Cases

   A simple composite link composed entirely of physical links is
   illustrated in Figure 1, where a composite link is configured
   between LSR1 and LSR2.  This composite link has three component
   links.

   Individual component links in a composite link may be supported by
   different transport technologies, such as wavelength or Ethernet
   VLAN.
   Even if the transport technology implementing the component links is
   identical, the characteristics (e.g., bandwidth, latency) of the
   component links may differ.

   The composite link in Figure 1 may carry LSP traffic flows and
   control plane packets.  Control plane packets may appear as IP
   packets or may be carried within a generic associated channel
   (G-ACh) [RFC5586].  An LSP may be established over the link by
   either the RSVP-TE [RFC3209] or LDP [RFC5036] signaling protocol.
   All component links in a composite link are summarized in the same
   forwarding adjacency LSP (FA-LSP) routing advertisement [RFC3945].
   The composite link is summarized as one TE link advertised into the
   IGP by the composite link endpoints.  This information is used in
   path computation when a full MPLS control plane is in use.  The
   individual component links or groups of component links may
   optionally be advertised into the IGP as sub-TLVs of the composite
   link advertisement to indicate capacity available with various
   characteristics, such as a delay range.

                        Management Plane
           Configuration and Measurement <------------+
                    ^                                 |
                    |                                 |
            +-------+-+                     +-+-------+
            |         |                     |         |
CP Packets  |         |  Component Link 1   |         |  CP Packets
----------->|         |=====================|         |----------->
            |         |  Component Link 2   |         |
            |  LSR1   |=====================|  LSR2   |
Aggregated  |         |  Component Link 3   |         |  Aggregated
LSPs ~~~~~~>|         |=====================|         |~~~~~~> LSPs
            |         |                     |         |
            +---------+                     +---------+
            !                                         !
            !<------------ Composite Link ----------->!

     Figure 1: A composite link constructed with multiple physical
                         links between two LSR

   [I-D.ietf-rtgwg-cl-requirement] specifies that component links may
   themselves be composite links.  Figure 2 shows three forms of
   component links which may be deployed in a network.

   +-------+            1. Physical Link            +-------+
   |       |-|------------------------------------|-|       |
   |       | |                                    | |       |
   |       | |  +------+                +------+  | |       |
   |       | |  | MPLS | 2. Logical    | MPLS |  | |        |
   |       |.|..|......|.....Link......|......|..|.|        |
   |       | |--| LSR3 |----------------| LSR4 |--| |       |
   |       | |  +------+                +------+  | |       |
   |       | |                                    | |       |
   |       | |  +------+                +------+  | |       |
   |       | |  |GMPLS | 3. Logical    |GMPLS |  | |        |
   |       |.|..|......|.....Link......|......|..|.|        |
   |       | |--| LSR5 |----------------| LSR6 |--| |       |
   |       |    +------+                +------+    |       |
   | LSR1  |                                        |  LSR2 |
   +-------+                                        +-------+
   |<----------------- Composite Link ------------->|

        Figure 2: Illustration of Various Component Link Types

   The three forms of component link shown in Figure 2 are:

   1.  The first component link is configured over direct physical
       media.

   2.  The second component link is a TE tunnel that traverses LSR3 and
       LSR4, where LSR3 and LSR4 are nodes supporting MPLS but few or
       no GMPLS extensions.

   3.  The third component link is formed by a lower layer network that
       has GMPLS enabled.  In this case, LSR5 and LSR6 are not
       controlled by MPLS but provide the connectivity for the
       component link.

   A composite link forms one logical link between the connected LSR
   and is used to carry aggregated traffic
   [I-D.ietf-rtgwg-cl-requirement].  A composite link relies on its
   component links to carry the traffic over the composite link.  The
   endpoints of the composite link map incoming traffic onto component
   links.

   For example, LSR1 in Figure 1 distributes the set of traffic flows,
   including control plane packets, among the set of component links.
   LSR2 in Figure 1 receives the packets from its component links and
   sends them to the MPLS forwarding engine with no attempt to reorder
   packets arriving on different component links.
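   The flow-to-component-link mapping performed by LSR1 can be
   illustrated with a short sketch of classic multipath hashing.  This
   is a hypothetical illustration only: real implementations perform
   the hash in forwarding hardware, and the hash function and field
   selection shown here are arbitrary choices.

```python
import hashlib

def select_component_link(label_stack, ip_src, ip_dst, component_links):
    # Hash the MPLS label stack; if an IPv4/IPv6 header is found below
    # the label stack, also hash the IP source and destination
    # addresses, as classic multipath does [RFC4928].
    key = b"".join(label.to_bytes(4, "big") for label in label_stack)
    if ip_src is not None and ip_dst is not None:
        key += ip_src.encode() + ip_dst.encode()
    digest = hashlib.sha256(key).digest()
    # Every packet of a microflow yields the same hash, so a microflow
    # always maps to the same component link and is never reordered.
    index = int.from_bytes(digest[:4], "big") % len(component_links)
    return component_links[index]
```

   Because the mapping is per flow rather than per packet, order is
   preserved within each microflow, while flows belonging to one LSP
   may still be split across component links, which is the reordering
   property discussed in Section 6.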
   The traffic in the opposite direction, from LSR2 to LSR1, is
   distributed across the set of component links by LSR2.

   These three forms of component link are only examples.  Many other
   forms are possible.  A component link may itself be a composite
   link.  A segment of an LSP (a single hop for that LSP) may be a
   composite link.

4.  Delay Sensitive Applications

   Most applications benefit from lower delay.  Some types of
   applications are far more sensitive than others.  For example, real
   time bidirectional applications such as voice communication or two
   way video conferencing are far more sensitive to delay than
   unidirectional streaming audio or video.  Non-interactive bulk
   transfer is almost insensitive to delay if a large enough TCP window
   is used.

   Some applications are sensitive to delay but unwilling to pay extra
   to ensure lower delay.  For example, many SIP end users are willing
   to accept the delay offered to best effort services as long as call
   quality is good most of the time.

   Other applications are sensitive to delay and willing to pay extra
   to ensure lower delay.  For example, financial trading applications
   are extremely sensitive to delay and, with a lot at stake, are
   willing to go to great lengths to reduce delay.

   Among the requirements of Composite Link are requirements to
   advertise the capacity available within configured ranges of delay
   within a given composite link, and to support the ability to place
   an LSP only on component links that meet that LSP's delay
   requirements.

   The Composite Link requirements to accommodate delay sensitive
   applications are analogous to the Diffserv requirements to
   accommodate applications requiring higher quality of service on the
   same infrastructure as applications with less demanding
   requirements.
   The ability to share capacity with less demanding applications, with
   best effort applications being the least demanding, can greatly
   reduce the cost of delivering service to the more demanding
   applications.

5.  Large Volume of IP and LDP Traffic

   IP and LDP do not support traffic engineering.  Both make use of a
   shortest (lowest routing metric) path, with an option to use equal
   cost multipath (ECMP).  Note that though ECMP is prohibited in LDP
   specifications, it is widely implemented.  Where implemented for
   LDP, ECMP is generally disabled by default for standards compliance,
   but often enabled in LDP deployments.

   Without a traffic engineering capability, there must be sufficient
   capacity to accommodate the IP and LDP traffic.  If there is not,
   persistent queuing delay and loss will occur.  Unlike RSVP-TE, a
   subset of traffic cannot be routed using constraint based routing to
   avoid a congested portion of the infrastructure.

   In existing networks which accommodate IP and/or LDP together with
   RSVP-TE, either the IP and LDP traffic can be carried over RSVP-TE,
   or, where the traffic contribution of IP and LDP is small, IP and
   LDP can be carried natively and the effect on RSVP-TE can be
   ignored.  Ignoring the traffic contribution of IP is certainly valid
   on high capacity networks where native IP is used primarily for
   control and network management and customer IP is carried within
   RSVP-TE.

   Where it is desirable to carry native IP and/or LDP and the IP
   and/or LDP traffic volumes are not negligible, RSVP-TE needs
   improvement.  The enhancement offered by Composite Link is an
   ability to measure the IP and LDP traffic, filter the measurements,
   and reduce the capacity available to RSVP-TE to avoid congestion.
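   The measure-filter-reduce enhancement described above can be
   sketched as follows.  This is a hypothetical illustration: Composite
   Link does not mandate a particular filter, and the exponentially
   weighted moving average used here is one arbitrary choice of
   low-pass filter over per-hop measurements.

```python
def filtered_load(samples, alpha=0.1):
    # Low-pass filter (exponentially weighted moving average) over
    # periodic measurements of native IP and LDP load on a link.
    estimate = samples[0]
    for s in samples[1:]:
        estimate = alpha * s + (1 - alpha) * estimate
    return estimate

def advertised_available_bw(link_capacity, rsvp_reserved, ip_ldp_samples):
    # Reduce the bandwidth advertised to RSVP-TE by the filtered
    # measured IP/LDP load, never advertising less than zero.
    return max(link_capacity - rsvp_reserved - filtered_load(ip_ldp_samples), 0.0)
```

   Filtering prevents short bursts of IP or LDP traffic from causing
   churn in the advertised "Available Bandwidth", while sustained load
   steadily reduces the capacity offered to new RSVP-TE reservations.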
   The treatment given to the IP or LDP traffic is similar to the
   treatment when using the "auto-bandwidth" feature in some RSVP-TE
   implementations on that same traffic and giving a higher priority
   (numerically lower setup priority and holding priority value) to the
   "auto-bandwidth" LSP.  The difference is that the measurement is
   made at each hop and the reduction in advertised bandwidth is made
   more directly.

6.  Composite Link and Packet Ordering

   A strong motivation for Composite Link is the need to provide LSP
   capacity in IP backbones that exceeds the capacity of single
   wavelengths provided by transport equipment and exceeds the
   practical capacity limits achievable through inverse multiplexing.
   Appendix C describes characteristics and limitations of transport
   systems today.  Section 2 defines the terms "classic multipath" and
   "classic link bundling" used in this section.

   For purposes of discussion, consider two very large cities, city A
   and city Z.  For example, in the US high traffic cities might be New
   York and Los Angeles, and in Europe high traffic cities might be
   London and Amsterdam.  Two other high volume cities, city B and city
   Y, may share common provider core network infrastructure.  Using the
   same examples, cities B and Y may be Washington DC and San Francisco
   or Paris and Stockholm.  In the US, the common infrastructure may
   span Denver, Chicago, Detroit, and Cleveland.  Other major traffic
   contributors include Boston and northern Virginia on the east coast,
   and Seattle and San Diego on the west coast.  The capacity of
   IP/MPLS links within the shared infrastructure, for example the city
   to city links in the Denver, Chicago, Detroit, and Cleveland path in
   the US example, had capacities for most of the 2000s decade that
   greatly exceeded the single circuits available in transport
   networks.
   For a case with four large traffic sources on either side of the
   shared infrastructure, up to sixteen core city to core city traffic
   flows in excess of transport circuit capacity may be accommodated on
   the shared infrastructure.

   Today the most common IP/MPLS core network design makes use of very
   large links which consist of many smaller component links, but uses
   classic multipath techniques rather than classic link bundling or
   Composite Link.  A component link typically corresponds to the
   largest circuit that the transport system is capable of providing
   (or the largest cost effective circuit).  IP source and destination
   address hashing is used to distribute flows across the set of
   component links as described in Appendix B.3.

   Classic multipath can handle large LSP up to the total capacity of
   the multipath (within limits, see Appendix B.2).  A disadvantage of
   classic multipath is the reordering of traffic within a given core
   city to core city LSP.  While there is no reordering within any
   microflow, and therefore no customer visible issue, MPLS-TP cannot
   be used across an infrastructure where classic multipath is in use,
   except within pseudowires.

   These capacity issues force the use of classic multipath today.
   Classic multipath excludes a direct use of MPLS-TP.  The desire for
   the OAM offered by MPLS-TP is therefore in conflict with the use of
   classic multipath.  There are a number of alternatives that satisfy
   both requirements.  Some alternatives are described below.

   MPLS-TP in network edges only

      A simple approach which requires no change to the core is to
      disallow MPLS-TP across the core unless carried within a
      pseudowire (PW).  MPLS-TP may be used within edge domains where
      classic multipath is not used.  PW may be signaled end to end
      using single segment PW (SS-PW), or stitched across domains using
      multisegment PW (MS-PW).
      The PW and anything carried within the PW may use OAM as long as
      fat-PW [RFC6391] load splitting is not used by the PW.

   Composite Link at core LSP ingress/egress

      The interior of the core network may use classic link bundling,
      with the limitation that no LSP can exceed the capacity of a
      single circuit.  Larger non-MPLS-TP LSP can be configured using
      multiple ingress to egress component MPLS-TP LSP.  This can be
      accomplished using existing IP source and destination address
      hashing configured at LSP ingress and egress, or using Composite
      Link configured at ingress and egress.  Each component LSP, if
      constrained to be no larger than the capacity of a single
      circuit, can make use of MPLS-TP and offer OAM for all top level
      LSP across the core.

   MPLS-TP as an MPLS client

      A third approach involves modifying the behavior of LSR in the
      interior of the network core, such that MPLS-TP can be used on a
      subset of LSP, where the capacity of any one LSP within that
      MPLS-TP subset of LSP is not larger than the capacity of a single
      circuit.  This requirement is accommodated through a combination
      of signaling to indicate the LSP for which traffic splitting
      needs to be constrained, the ability to constrain the depth of
      the label stack over which traffic splitting can be applied on a
      per LSP basis, and the ability to constrain the use of IP
      addresses below the label stack for traffic splitting, also on a
      per LSP basis.

   The above alternatives allow packet ordering within an LSP to be
   maintained in some circumstances and allow very large LSP
   capacities.  Each of these alternatives is discussed further in the
   following subsections.

6.1.  MPLS-TP in network edges only

   Classic MPLS link bundling is defined in [RFC4201] and has existed
   since early in the 2000s decade.  Classic MPLS link bundling places
   any given LSP entirely on a single component link.
   Classic MPLS link bundling is not in widespread use as the means to
   accommodate large link capacities in core networks, due to the
   simplicity, better multiplexing gain, and therefore lower network
   cost of classic multipath.

   If MPLS-TP OAM capability is not required on the core LSP of the
   IP/MPLS network, then there is no need to change existing network
   designs, which use classic multipath and both label stack and IP
   source and destination address based hashing as a basis for load
   splitting.

   If MPLS-TP is needed for a subset of LSP, then those LSP can be
   carried within pseudowires.  The pseudowire adds a thin layer of
   encapsulation and therefore a small overhead.  If only a subset of
   LSP need MPLS-TP OAM, then some LSP must make use of the pseudowires
   and other LSP must avoid them.  A straightforward way to accomplish
   this is with administrative attributes [RFC3209].

6.2.  Composite Link at core LSP ingress/egress

   Composite Link can be configured only for large LSP that are made up
   of smaller MPLS-TP component LSP.  This approach is capable of
   supporting MPLS-TP OAM over the entire set of component link LSP and
   therefore the entire set of top level LSP traversing the core.

   There are two primary disadvantages to this approach.  One is that
   the number of top level LSP traversing the core can be dramatically
   increased.  The other is the loss of multiplexing gain that results
   from the use of classic link bundling within the interior of the
   core network.

   If component LSP use MPLS-TP, then no component LSP can exceed the
   capacity of a single circuit.  For a given composite LSP there can
   either be a number of equal capacity component LSP, or some number
   of full capacity component LSP plus one LSP carrying the excess.
   For example, a 350 Gb/s composite LSP over a 100 Gb/s infrastructure
   may use five 70 Gb/s component LSP, or three 100 Gb/s LSP plus one
   50 Gb/s LSP.
   Classic MPLS link bundling is needed to support MPLS-TP and suffers
   from a bin packing problem even if LSP traffic is completely
   predictable, which it never is in practice.

   The common means of setting composite link bandwidth parameters uses
   long term statistical measures.  For example, many providers base
   their LSP bandwidth parameters on the 95th percentile of carried
   traffic as measured over a one week period.  It is common to add
   10-30% to the 95th percentile value measured over the prior week and
   to adjust the bandwidth parameters of LSP weekly.  It is also
   possible to measure traffic flow at the LSR and adjust bandwidth
   parameters somewhat more dynamically.  This is less common in
   deployments and, where deployed, makes use of filtering to track
   very long term trends in traffic levels.  In either case, short term
   variations of traffic levels relative to signaled LSP capacity are
   common.  Allowing a large overallocation of LSP bandwidth parameters
   (i.e., adding 30% or more) avoids overutilization of any given LSP,
   but increases unused network capacity and increases network cost.
   Allowing a small overallocation of LSP bandwidth parameters (i.e.,
   10-20% or less) results in both underutilization and
   overutilization, but statistically results in a total utilization
   within the core that is under capacity most or all of the time.

   The classic multipath solution accommodates the situation in which
   some composite LSP are underutilizing their signaled capacity and
   others are overutilizing theirs, with the need for far less unused
   network capacity to accommodate variations in actual traffic levels.
   If the actual traffic levels of LSP can be described by a
   probability distribution, the variation of the sum of LSP is less
   than the variation of any given LSP for all but a constant traffic
   level (where the variation of the sum and of the components are both
   zero).
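   The weekly sizing practice described above can be sketched as
   follows.  This is a hypothetical illustration: the 95th percentile
   and the 10-30% headroom figure come from the text, while the
   nearest-rank percentile method is one arbitrary choice.

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are less than or equal to it.
    ordered = sorted(samples)
    rank = max(int(math.ceil(p / 100.0 * len(ordered))), 1)
    return ordered[rank - 1]

def next_week_bandwidth(last_week_samples, headroom=0.2):
    # Signal next week's LSP bandwidth parameter as the 95th
    # percentile of the prior week's carried traffic, plus a
    # configured headroom in the 10-30% range.
    return percentile(last_week_samples, 95) * (1.0 + headroom)
```

   A larger headroom value reduces the chance of overutilizing any one
   LSP at the cost of more unused capacity, which is the trade-off
   discussed above.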
   There are two situations which can motivate the use of this
   approach.  This design is favored if the provider values MPLS-TP OAM
   across the core more than efficiency (or is unaware of the
   efficiency issue).  This design can also make sense if transport
   equipment or very low cost core LSR are available which support only
   classic link bundling and, regardless of the loss of multiplexing
   gain, are more cost effective at carrying transit traffic than
   equipment which supports IP source and destination address hashing.

6.3.  MPLS-TP as an MPLS client

   Accommodating MPLS-TP as an MPLS client requires a small change to
   forwarding behavior and is therefore most applicable to major
   network overbuilds or new deployments.  The change to forwarding is
   the ability to limit, on a per LSP basis, the depth of MPLS labels
   used in hashing on the label stack.  Some existing hardware,
   particularly microprogrammed hardware, may be able to accommodate
   this forwarding change.  Providing support in new hardware is not
   difficult; it is a much smaller change than, for example, the
   changes required to disable PHP in an environment where LSP
   hierarchy is used.

   The advantage of this approach is the ability to accommodate MPLS-TP
   as a client LSP while retaining the high multiplexing gain and
   therefore the efficiency and low network cost of a pure MPLS
   deployment.  The disadvantage is the need for a small change in
   forwarding.

7.  Security Considerations

   This document is a use cases document.  Existing protocols such as
   MPLS are referenced.  Existing techniques such as MPLS link bundling
   and multipath techniques are referenced.  These protocols and
   techniques are documented elsewhere and contain security
   considerations which are unchanged by this document.

   This document also describes use cases for Composite Link, which is
   a work in progress.
   Composite Link requirements are defined in
   [I-D.ietf-rtgwg-cl-requirement].  [I-D.so-yong-rtgwg-cl-framework]
   defines a framework for Composite Link.  Composite Link bears many
   similarities to MPLS link bundling and multipath techniques used
   with MPLS.  Additional security considerations, if any, beyond those
   already identified for MPLS, MPLS link bundling, and multipath
   techniques will be documented in the framework document if specific
   to the overall framework of Composite Link, or in protocol
   extensions if specific to a given protocol extension defined later
   to support Composite Link.

8.  Acknowledgments

   The authors would like to thank [ no one so far ] for their reviews
   and suggestions.

   In the interest of full disclosure of affiliation and in the
   interest of acknowledging sponsorship, past affiliations of the
   authors are noted.  Much of the work done by Ning So occurred while
   Ning was at Verizon.  Much of the work done by Curtis Villamizar
   occurred while at Infinera.  Infinera continues to sponsor this work
   on a consulting basis.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [I-D.ietf-rtgwg-cl-requirement]
              Villamizar, C., McDysan, D., Ning, S., Malis, A., and L.
              Yong, "Requirements for MPLS Over a Composite Link",
              draft-ietf-rtgwg-cl-requirement-04 (work in progress),
              March 2011.

   [I-D.so-yong-rtgwg-cl-framework]
              So, N., Malis, A., McDysan, D., Yong, L., Villamizar, C.,
              and T. Li, "Composite Link Framework in Multi Protocol
              Label Switching (MPLS)",
              draft-so-yong-rtgwg-cl-framework-04 (work in progress),
              June 2011.

   [IEEE-802.1AX]
              IEEE Standards Association, "IEEE Std 802.1AX-2008, IEEE
              Standard for Local and Metropolitan Area Networks - Link
              Aggregation", 2008.
   [ITU-T.G.694.2]
              ITU-T, "Spectral grids for WDM applications: CWDM
              wavelength grid", 2003.

   [ITU-T.G.800]
              ITU-T, "Unified functional architecture of transport
              networks", 2007.

   [ITU-T.Y.1540]
              ITU-T, "Internet protocol data communication service - IP
              packet transfer and availability performance parameters",
              2007.

   [ITU-T.Y.1541]
              ITU-T, "Network performance objectives for IP-based
              services", 2006.

   [RFC1717]  Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
              PPP Multilink Protocol (MP)", RFC 1717, November 1994.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2597]  Heinanen, J., Baker, F., Weiss, W., and J. Wroclawski,
              "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [RFC2615]  Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615,
              June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
              Multicast Next-Hop Selection", RFC 2991, November 2000.

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, November 2000.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, December 2001.

   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
              Diffserv", RFC 3260, April 2002.

   [RFC3809]  Nagarajan, A., "Generic Requirements for Provider
              Provisioned Virtual Private Networks (PPVPN)", RFC 3809,
              June 2004.

   [RFC3945]  Mannie, E., "Generalized Multi-Protocol Label Switching
              (GMPLS) Architecture", RFC 3945, October 2004.

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201, October
              2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.
663 [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson, 664 "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for 665 Use over an MPLS PSN", RFC 4385, February 2006. 667 [RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal 668 Cost Multipath Treatment in MPLS Networks", BCP 128, 669 RFC 4928, June 2007. 671 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP 672 Specification", RFC 5036, October 2007. 674 [RFC5586] Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic 675 Associated Channel", RFC 5586, June 2009. 677 [RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, 678 J., and S. Amante, "Flow-Aware Transport of Pseudowires 679 over an MPLS Packet Switched Network", RFC 6391, 680 November 2011. 682 Appendix A. More Details on Existing Network Operator Practices and 683 Protocol Usage 685 Often, network operators have a contractual Service Level Agreement 686 (SLA) with customers for services, comprised of numerical 687 values for performance measures, principally availability, latency, 688 and delay variation. Additionally, network operators may have a 689 Service Level Specification (SLS) for internal use by the operator. 690 See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of [RFC3809] 691 for examples of the form of such SLA and SLS specifications. In this 692 document we use the term Network Performance Objective (NPO) as 693 defined in Section 5 of [ITU-T.Y.1541], since the SLA and SLS 694 measures have network operator and service specific implications. 695 Note that the numerical NPO values of Y.1540 and Y.1541 span multiple 696 networks and may be looser than network operator SLA or SLS 697 objectives. Applications and acceptable user experience have an 698 important relationship to these performance parameters. 700 Consider latency as an example. In some cases, minimizing latency 701 relates directly to the best customer experience (e.g., in TCP closer 702 is faster).
In other cases, user experience is relatively 703 insensitive to latency, up to a specific limit at which point user 704 perception of quality degrades significantly (e.g., interactive human 705 voice and multimedia conferencing). A number of NPOs have a bound 706 on point-to-point latency, and as long as this bound is met, the NPO 707 is met -- decreasing the latency is not necessary. In some NPOs, if 708 the specified latency is not met, the user considers the service 709 unavailable. An unprotected LSP can be manually provisioned on a set 710 of links to meet this type of NPO, but this lowers availability since 711 an alternate route that meets the latency NPO cannot be determined. 713 Historically, when an IP/MPLS network was operated over a lower layer 714 circuit switched network (e.g., SONET rings), a change in latency 715 caused by the lower layer network (e.g., due to a maintenance action 716 or failure) was not known to the MPLS network. This resulted in 717 latency affecting end user experience, sometimes violating NPOs or 718 resulting in user complaints. 720 A response to this problem was to provision IP/MPLS networks over 721 unprotected circuits and set the metric and/or TE-metric proportional 722 to latency. This resulted in traffic being directed over the least 723 latency path, even if this was not needed to meet an NPO or meet user 724 experience objectives. This resulted in reduced flexibility and 725 increased cost for network operators. Using lower layer networks to 726 provide restoration and grooming is expected to be more efficient, 727 but the inability to communicate performance parameters, in 728 particular latency, from the lower layer network to the higher layer 729 network is an important problem to be solved before this can be done. 731 Latency NPOs for point-to-point services are often tied closely to 732 geographic locations, while latency for multipoint services may be 733 based upon a worst case within a region.
735 Section 7 of [ITU-T.Y.1540] defines availability for an IP service in 736 terms of loss exceeding a threshold for a period on the order of 5 737 minutes. However, the timeframes for restoration (i.e., as 738 implemented by pre-determined protection, convergence of routing 739 protocols and/or signaling) for services range from on the order of 740 100 ms or less (e.g., for VPWS to emulate classical SDH/SONET 741 protection switching), to several minutes (e.g., to allow BGP to 742 reconverge for L3VPN), and may differ among the set of customers 743 within a single service. 745 The presence of only three Traffic Class (TC) bits (previously known 746 as EXP bits) in the MPLS shim header is limiting when a network 747 operator needs to support QoS classes for multiple services (e.g., 748 L2VPN VPWS, VPLS, L3VPN and Internet), each of which has a set of QoS 749 classes that need to be supported. In some cases one bit is used to 750 indicate conformance to some ingress traffic classification, leaving 751 only two bits for indicating the service QoS classes. The approach 752 that has been taken is to aggregate these QoS classes into similar 753 sets on LER-LSR and LSR-LSR links. 755 Labeled LSPs and use of link layer encapsulation have been 756 standardized in order to provide a means to meet these needs. 758 The IP DSCP cannot be used for flow identification since RFC 4301, 759 Section 5.5 [RFC4301] requires Diffserv transparency, and in general 760 network operators do not rely on the DSCP of Internet packets. In 761 addition, the use of IP DSCP for flow identification is incompatible 762 with Assured Forwarding services [RFC2597] or any other service which 763 may use more than one DSCP code point to carry traffic for a given 764 microflow. 766 Pushing a label onto Internet packets when they are carried along 767 with L2/L3VPN packets on the same link or lower layer network 768 provides a means to distinguish the QoS classes of these packets.
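As a purely hypothetical illustration of this kind of aggregation (the service names, class names, and codepoint assignments below are invented for this sketch and are not taken from any standard or operator deployment), per-service QoS classes can be folded onto the eight codepoints expressible in the 3-bit TC field, with one bit reserved for conformance marking:

```python
# Hypothetical illustration only: the service names, class names, and
# codepoint assignments are invented for this example.

SERVICE_CLASSES = {
    "l3vpn": ["realtime", "interactive", "assured", "best-effort"],
    "vpws": ["realtime", "assured", "best-effort"],
    "internet": ["best-effort"],
}

# Aggregate similar classes across services onto shared TC codepoints,
# reserving the low-order bit to mark out-of-profile traffic.
TC_AGGREGATE = {
    "realtime": 0b110,
    "interactive": 0b100,
    "assured": 0b010,
    "best-effort": 0b000,
}

def tc_for(service, qos_class, out_of_profile=False):
    """Map a (service, QoS class) pair onto a 3-bit TC codepoint."""
    if qos_class not in SERVICE_CLASSES[service]:
        raise ValueError("unknown QoS class for service")
    return TC_AGGREGATE[qos_class] | (1 if out_of_profile else 0)
```

The point of the sketch is only that many (service, class) pairs collapse onto few codepoints, which is why the same aggregation must be applied consistently on LER-LSR and LSR-LSR links.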
771 Operating an MPLS-TE network involves a different paradigm from 772 operating an IGP metric-based LDP signaled MPLS network. The 773 multipoint-to-point LDP signaled MPLS LSPs occur automatically, and 774 balancing across parallel links occurs if the IGP metrics are set 775 "equally" (with equality a locally definable relation). 777 Traffic is typically comprised of a few large (some very large) flows 778 and many small flows. In some cases, separate LSPs are established 779 for very large flows. This can occur even if the IP header 780 information is inspected by an LSR, for example an IPsec tunnel that 781 carries a large amount of traffic. An important example of large 782 flows is that of a L2/L3 VPN customer who has an access line 783 bandwidth comparable to a client-client composite link bandwidth -- 784 there could be flows that are on the order of the access line 785 bandwidth. 787 Appendix B. Existing Multipath Standards and Techniques 789 Today the need to handle large aggregations of traffic, much 790 larger than a single component link, can be met by a number of 791 techniques which we will collectively call multipath. Multipath 792 applied to parallel links between the same set of nodes includes 793 Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201], or 794 other aggregation techniques, some of which may be vendor specific. 795 Multipath applied to diverse paths rather than parallel links 796 includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or 797 even BGP, and equal cost LSP, as described in Appendix B.4. Various 798 multipath techniques have different strengths and weaknesses. 800 The term Composite Link is more general than terms such as Link 801 Aggregation, which is generally considered to be specific to 802 Ethernet; its use here is consistent with the broad definition in 803 [ITU-T.G.800].
The term multipath excludes inverse multiplexing and 804 refers to techniques which only solve the problem of large 805 aggregations of traffic, without addressing the other requirements 806 outlined in this document, particularly those described in Section 4 807 and Section 5. 809 B.1. Common Multipath Load Splitting Techniques 811 Identical load balancing techniques are used for multipath both over 812 parallel links and over diverse paths. 814 Large aggregates of IP traffic do not provide explicit signaling to 815 indicate the expected traffic loads. Large aggregates of MPLS 816 traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which 817 are signaled using RSVP-TE extensions do provide explicit signaling 818 which includes the expected traffic load for the aggregate. LSP 819 which are signaled using LDP do not provide an expected traffic load. 821 MPLS LSP may contain other MPLS LSP arranged hierarchically. When an 822 MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as 823 payload, there is no signaling associated with these inner LSP. 824 Therefore, even when RSVP-TE signaling is used, there may be 825 insufficient information provided by signaling to adequately 826 distribute load. 828 Generally a set of label stack entries that is unique across the 829 ordered set of label numbers in the label stack can safely be assumed 830 to contain a group of flows. The reordering of traffic can therefore 831 be considered to be acceptable unless reordering occurs within 832 traffic containing a common unique set of label stack entries. 833 Existing load splitting techniques take advantage of this property, 834 in addition to looking beyond the bottom of the label stack and 835 determining if the payload is IPv4 or IPv6, to load balance traffic 836 accordingly. 838 MPLS-TP OAM violates the assumption that it is safe to reorder 839 traffic within an LSP.
If MPLS-TP OAM is to be accommodated, then 840 existing multipath techniques must be modified. Such modifications 841 are outside the scope of this document. 843 For example, a large aggregate of IP traffic may be subdivided into a 844 large number of groups of flows using a hash on the IP source and 845 destination addresses. This is as described in [RFC2475] and 846 clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash 847 can be performed on the set of labels in the label stack. These 848 techniques are both examples of means to subdivide traffic into 849 groups of flows for the purpose of load balancing traffic across 850 aggregated link capacity. The means of identifying a set of flows 851 should not be confused with the definition of a flow. 853 Discussion of whether a hash based approach provides a sufficiently 854 even load balance using any particular hashing algorithm or method of 855 distributing traffic across a set of component links is outside of 856 the scope of this document. 858 The current load balancing techniques are referenced in [RFC4385] and 859 [RFC4928]. Three hash based approaches are described in 860 [RFC2991] and [RFC2992]. A mechanism to identify flows within a PW 861 is described in [RFC6391]. The use of hash based approaches is 862 mentioned as an example of an existing set of techniques to 863 distribute traffic over a set of component links. Other techniques 864 are not precluded. 866 B.2. Simple and Adaptive Load Balancing Multipath 868 Simple multipath generally relies on the mathematical probability 869 that given a very large number of small microflows, these microflows 870 will tend to be distributed evenly across a hash space. Early very 871 simple multipath implementations assumed that all component links 872 were of equal capacity and performed a modulo operation on the hashed 873 value.
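The hash-and-modulo technique just described, together with the capacity-weighted table variant discussed next, might be sketched as follows. This is a minimal illustration, not a prescribed implementation: the hash function, the key layout, and the table size are assumptions (real routers use cheap hardware hash functions, and the table size need only be a power of two):

```python
import hashlib

def flow_key_hash(label_stack, src_ip, dst_ip):
    """Hash the MPLS label stack plus IP source/destination addresses
    to identify a group of flows (not an individual flow)."""
    key = ",".join(str(label) for label in label_stack)
    key += "|" + src_ip + "|" + dst_ip
    # A stable hash for illustration; real implementations use cheap
    # hardware hash functions (e.g. CRC-based), not SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

def pick_link_modulo(flow_hash, n_links):
    """Earliest simple technique: modulo across equal-capacity links."""
    return flow_hash % n_links

def build_weighted_table(capacities, table_size=256):
    """Table variant: distribute table entries (power-of-two size)
    proportionally to each component link's capacity."""
    total = sum(capacities)
    table = []
    for link, capacity in enumerate(capacities):
        table.extend([link] * round(table_size * capacity / total))
    # Fix up rounding error so the table has exactly table_size entries.
    while len(table) < table_size:
        table.append(len(capacities) - 1)
    return table[:table_size]

def pick_link_table(flow_hash, table):
    """Index the table with the hash to select a component link."""
    return table[flow_hash % len(table)]
```

With component capacities of, say, 10, 10, and 40 Gb/s, the table assigns roughly two thirds of the entries to the 40 Gb/s link, so a hash that is uniform across flow groups loads each link in proportion to its capacity.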
An alternate simple multipath technique uses a table, 874 generally with a power of two size, and distributes the table entries 875 proportionally among component links according to the capacity of 876 each component link. 878 Simple load balancing works well if there are a very large number of 879 small microflows (i.e., the microflow rate is much less than component 880 link capacity). However, the case where there are even a few large 881 microflows is not handled well by simple load balancing. 883 An adaptive load balancing multipath technique is one where the 884 traffic bound to each component link is measured and the load split 885 is adjusted accordingly. As long as the adjustment is done within a 886 single network element, then no protocol extensions are required and 887 there are no interoperability issues. 889 Note that if the load balancing algorithm and/or its parameters are 890 adjusted, then packets in some flows may be briefly delivered out of 891 sequence; however, in practice such adjustments can be made very 892 infrequently. 894 B.3. Traffic Split over Parallel Links 896 The load splitting techniques defined in Appendix B.1 and Appendix B.2 897 are both used in splitting traffic over parallel links between the 898 same pair of nodes. The best known technique, though far from being 899 the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same 900 technique had been applied much earlier using OSPF or ISIS Equal Cost 901 MultiPath (ECMP) over parallel links between the same nodes. 902 Multilink PPP [RFC1717] uses a technique that provides inverse 903 multiplexing; a number of vendors had provided proprietary 904 extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet 905 Link Aggregation but are no longer used. 907 Link bundling [RFC4201] provides yet another means of handling 908 parallel LSP. [RFC4201] explicitly allows a special value of all ones 909 to indicate a split across all members of the bundle.
This "all 910 ones" component link is signaled in the MPLS RESV to indicate that 911 the link bundle is making use of classic multipath techniques. 913 B.4. Traffic Split over Multiple Paths 915 OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of 916 traffic split over multiple paths that may traverse intermediate 917 nodes. ECMP is often incorrectly equated to only this case, and 918 multipath over multiple diverse paths is often incorrectly equated to 919 ECMP. 921 Many implementations are able to create more than one LSP between a 922 pair of nodes, where these LSP are routed diversely to better make 923 use of available capacity. The load on these LSP can be distributed 924 proportionally to the reserved bandwidth of the LSP. These multiple 925 LSP may be advertised as a single PSC FA and any LSP making use of 926 the FA may be split over these multiple LSP. 928 Link bundling [RFC4201] component links may themselves be LSP. When 929 this technique is used, any LSP which specifies the link bundle may 930 be split across the multiple paths of the LSP that comprise the 931 bundle. 933 Appendix C. Characteristics of Transport in Core Networks 935 The characteristics of primary interest are the capacity of a single 936 circuit and the use of wave division multiplexing (WDM) to provide a 937 large number of parallel circuits. 939 Wave division multiplexing (WDM) supports multiple independent 940 channels (independent ignoring crosstalk noise) at slightly different 941 wavelengths of light, multiplexed onto a single fiber. Typical in 942 the early 2000s was 40 wavelengths of 10 Gb/s capacity per 943 wavelength. These wavelengths are in the C-band range, which is 944 about 1530-1565 nm, though some work has been done using the L-band 945 1565-1625 nm. 947 The C-band has been carved up using a 100 GHz spacing from 191.7 THz 948 to 196.1 THz by [ITU-T.G.694.2]. This yields 44 channels. 
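The channel-count arithmetic above can be checked directly. A small sketch: the 4400 GHz band width is simply 196.1 THz minus 191.7 THz, and the channel count at a given spacing is the band width divided by that spacing (in practice systems leave some outermost channels unused, as noted in the text):

```python
# C-band width used by the [ITU-T.G.694.2] grid: 196.1 THz - 191.7 THz.
C_BAND_GHZ = 4400

def grid_channels(spacing_ghz):
    """Number of channel slots at a given grid spacing."""
    return C_BAND_GHZ // spacing_ghz

print(grid_channels(100))  # 44 channels at 100 GHz spacing
print(grid_channels(50))   # 88 slots at 50 GHz; ~80 usable in practice
print(grid_channels(25))   # 176 slots at 25 GHz; ~160 usable in practice

# Aggregate fiber capacity with 40 usable 10 Gb/s channels:
print(40 * 10)  # 400 Gb/s
```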
If the 949 outermost channels are not used, due to poorer transmission 950 characteristics, then typically 40 are used. For practical reasons, 951 a 50 GHz or 25 GHz spacing is used by more recent equipment, 952 yielding 80 or 160 channels in practice. 954 The early optical modulation techniques used within a single channel 955 yielded 2.5 Gb/s and 10 Gb/s capacity per channel. As modulation 956 techniques have improved, 40 Gb/s and 100 Gb/s per channel have been 957 achieved. 959 The 40 channels of 10 Gb/s common in the mid 2000s yield a total of 960 400 Gb/s. Tighter spacing and better modulations are yielding up to 961 8 Tb/s or more in more recent systems. 963 Over the optical layer is an electrical encoding. In the 1990s this 964 was typically Synchronous Optical Networking (SONET) or Synchronous 965 Digital Hierarchy (SDH), with a maximum defined circuit capacity of 966 40 Gb/s (OC-768), though the 10 Gb/s OC-192 is more common. More 967 recently the low level electrical encoding has been Optical Transport 968 Network (OTN) defined by the ITU-T. OTN currently defines circuit 969 capacities up to a nominal 100 Gb/s (ODU4). Both SONET/SDH and OTN 970 make use of time division multiplexing (TDM), where a higher 971 capacity circuit such as a 100 Gb/s ODU4 in OTN may be subdivided 972 into lower fixed capacity circuits such as ten 10 Gb/s ODU2 circuits. 974 In the 1990s, all IP and later IP/MPLS networks either used a 975 fraction of maximum circuit capacity, or at most the full circuit 976 capacity toward the end of the decade, when full circuit capacity was 977 2.5 Gb/s or 10 Gb/s. Beyond 2000, the TDM circuit multiplexing 978 capability of SONET/SDH or OTN was rarely used. 980 Early in the 2000s both transport equipment and core LSR offered 40 981 Gb/s SONET OC-768.
However, 10 Gb/s transport equipment was 982 predominantly deployed throughout the decade, partially because LSR 983 10GbE ports were far more cost effective than either OC-192 or OC-768 984 and became practical in the second half of the decade. 986 Entering the 2010 decade, LSR 40GbE and 100GbE are expected to become 987 widely available and cost effective. Slightly preceding this, 988 transport equipment making use of 40 Gb/s and 100 Gb/s modulations 989 is becoming available. This transport equipment is capable of 990 carrying 40 Gb/s ODU3 and 100 Gb/s ODU4 circuits. 992 Early in the 2000s decade IP/MPLS core networks were making use of 993 single 10 Gb/s circuits. Capacity grew quickly in the first half of 994 the decade, but most IP/MPLS core networks had only a small number of 995 IP/MPLS links requiring 4-8 parallel 10 Gb/s circuits. However, the 996 use of multipath was necessary, was deemed the simplest and most cost 997 effective alternative, and became thoroughly entrenched. By the end 998 of the 2000s decade nearly all major IP/MPLS core service provider 999 networks and a few content provider networks had IP/MPLS links which 1000 exceeded 100 Gb/s, long before 40GbE was available and before 40 Gb/s 1001 transport was in widespread use. 1003 It is less clear when IP/MPLS LSP exceeded 10 Gb/s, 40 Gb/s, and 100 1004 Gb/s. By 2010, many service providers had LSP in excess of 100 1005 Gb/s, but few were willing to disclose how many LSP had reached this 1006 capacity. 1008 At the time of writing, 40GbE and 100GbE LSR products are being 1009 evaluated by service providers and content providers and are in use 1010 in network trials. The cost of components required to deliver 100 1011 GbE products remains high, making these products less cost effective. 1012 This is expected to change within a few years. 1014 The important point is that IP/MPLS core network links long ago 1015 exceeded 100 Gb/s and a small number of IP/MPLS LSP exceed 100 Gb/s.
1016 By the time 100 Gb/s circuits are widely deployed, IP/MPLS core 1017 network links are likely to exceed 1 Tb/s and many IP/MPLS LSP 1018 capacities are likely to exceed 100 Gb/s. Therefore multipath 1019 techniques are likely here to stay. 1021 Authors' Addresses 1023 So Ning 1024 Tata Communications 1026 Email: ning.so@tatacommunications.com 1027 Andrew Malis 1028 Verizon 1029 117 West St. 1030 Waltham, MA 02451 1032 Phone: +1 781-466-2362 1033 Email: andrew.g.malis@verizon.com 1035 Dave McDysan 1036 Verizon 1037 22001 Loudoun County PKWY 1038 Ashburn, VA 20147 1040 Email: dave.mcdysan@verizon.com 1042 Lucy Yong 1043 Huawei USA 1044 5340 Legacy Dr. 1045 Plano, TX 75025 1047 Phone: +1 469-277-5837 1048 Email: lucy.yong@huawei.com 1050 Curtis Villamizar 1051 Outer Cape Cod Network Consulting 1053 Email: curtis@occnc.com