idnits 2.17.00 (12 Aug 2021)

/tmp/idnits9948/draft-ietf-rtgwg-cl-use-cases-03.txt:

   Checking boilerplate required by RFC 5378 and the IETF Trust (see
   https://trustee.ietf.org/license-info):
   ----------------------------------------------------------------------

     No issues found here.

   Checking nits according to
   https://www.ietf.org/id-info/1id-guidelines.txt:
   ----------------------------------------------------------------------

     No issues found here.

   Checking nits according to https://www.ietf.org/id-info/checklist :
   ----------------------------------------------------------------------

     No issues found here.

   Miscellaneous warnings:
   ----------------------------------------------------------------------

   == The copyright year in the IETF Trust and authors Copyright Line
      does not match the current year

   -- The document date (June 15, 2013) is 3262 days in the past.  Is
      this intentional?

   Checking references for intended status: Informational
   ----------------------------------------------------------------------

   == Outdated reference: draft-ietf-mpls-multipath-use has been
      published as RFC 7190

   == Outdated reference: A later version (-04) exists of
      draft-ietf-rtgwg-cl-framework-02

   == Outdated reference: draft-ietf-rtgwg-cl-requirement has been
      published as RFC 7226

   -- Obsolete informational reference (is this intentional?):
      RFC 1717 (Obsoleted by RFC 1990)

   Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments
   (--).

   Run idnits with the --verbose option for more detailed information
   about the items above.

--------------------------------------------------------------------------------

RTGWG                                                            S. Ning
Internet-Draft                                       Tata Communications
Intended status: Informational                                  A. Malis
Expires: December 17, 2013                                    D. McDysan
                                                                 Verizon
                                                                 L. Yong
                                                              Huawei USA
                                                           C. Villamizar
                                                  Outer Cape Cod Network
                                                              Consulting
                                                           June 15, 2013


          Composite Link Use Cases and Design Considerations
                   draft-ietf-rtgwg-cl-use-cases-03

Abstract

   This document provides a set of use cases and design considerations
   for composite links.

   Composite link is a formalization of multipath techniques currently
   in use in IP and MPLS networks and a set of extensions to existing
   multipath techniques.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 17, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
     2.1.  Terminology
   3.  Composite Link Foundation Use Cases
   4.  Delay Sensitive Applications
   5.  Large Volume of IP and LDP Traffic
   6.  Composite Link and Packet Ordering
     6.1.  MPLS-TP in network edges only
     6.2.  Composite Link at core LSP ingress/egress
     6.3.  MPLS-TP as an MPLS client
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. Informative References
   Appendix A.  More Details on Existing Network Operator Practices
                and Protocol Usage
   Appendix B.  Existing Multipath Standards and Techniques
     B.1.  Common Multipath Load Splitting Techniques
     B.2.  Static and Dynamic Load Balancing Multipath
     B.3.  Traffic Split over Parallel Links
     B.4.  Traffic Split over Multiple Paths
   Appendix C.  Characteristics of Transport in Core Networks
   Authors' Addresses

1.  Introduction

   Composite link requirements are specified in
   [I-D.ietf-rtgwg-cl-requirement].  A composite link framework is
   defined in [I-D.ietf-rtgwg-cl-framework].

   Multipath techniques have been widely used in IP networks for over
   two decades.  The use of MPLS began more than a decade ago.
   Multipath has been widely used in IP/MPLS networks for over a
   decade, with very little protocol support dedicated to the
   effective use of multipath.
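As an illustration of the kind of multipath load splitting in common use (detailed in Appendix B), the following sketch hashes a flow key onto one of several component links.  The function name, the choice of CRC32 as the hash, and the addresses are illustrative only, not taken from any particular implementation:

```python
# Minimal sketch (hypothetical names) of hash-based multipath load
# splitting: a flow key is hashed and the result selects a component
# link, so all packets of one microflow stay on one link (no
# reordering within a microflow) while distinct flows spread out.
import zlib

def select_component_link(label_stack, src_ip, dst_ip, n_links):
    """Map a flow, identified by its MPLS label stack and IP source
    and destination addresses, onto one of n_links component links."""
    key = ",".join(str(l) for l in label_stack) + "|" + src_ip + "|" + dst_ip
    # CRC32 stands in here for whatever hash a real LSR implements.
    return zlib.crc32(key.encode()) % n_links

# All packets of the same flow map to the same component link ...
flow = ([16001, 299792], "192.0.2.1", "198.51.100.7")
assert select_component_link(*flow, 4) == select_component_link(*flow, 4)

# ... while many distinct flows spread across the set of links.
links_used = {select_component_link([16001, i], "192.0.2.1",
                                    "198.51.100.7", 4)
              for i in range(256)}
print(sorted(links_used))
```

Because the split is per flow rather than per packet, a single very large flow cannot be spread across links, which is one limitation addressed later in this document.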
   The state of the art in multipath prior to composite links is
   documented in Appendix B.

   Both Ethernet Link Aggregation [IEEE-802.1AX] and MPLS link bundling
   [RFC4201] have been widely used in today's MPLS networks.  Composite
   link differs in the following characteristics:

   1.  A composite link allows bundling of non-homogenous links
       together as a single logical link.

   2.  A composite link provides more information in the TE-LSDB and
       supports more explicit control over placement of LSP.

2.  Conventions used in this document

2.1.  Terminology

   Terminology defined in [I-D.ietf-rtgwg-cl-requirement] is used in
   this document.

   In addition, the following terms are used:

   classic multipath:
      Classic multipath refers to the most common current practice in
      implementation and deployment of multipath (see Appendix B).
      The most common current practice makes use of a hash on the MPLS
      label stack and, if IPv4 or IPv6 is indicated below the label
      stack, makes use of the IP source and destination addresses
      [RFC4385] [RFC4928].

   classic link bundling:
      Classic link bundling refers to the use of [RFC4201] where the
      "all ones" component is not used.  Where the "all ones"
      component is used, link bundling behaves as classic multipath
      does.  Classic link bundling selects a single component link to
      carry all of the traffic for a given LSP.

   Among the important distinctions between classic multipath or
   classic link bundling and Composite Link are:

   1.  Classic multipath has no provision to retain packet order
       within any specific LSP.  Classic link bundling retains packet
       order within any given LSP, but as a result does a poor job of
       splitting load among components and therefore is rarely (if
       ever) deployed.  Composite Link allows per-LSP control of load
       split characteristics.

   2.  Classic multipath and classic link bundling do not provide a
       means to put some LSP on component links with lower delay.
       Composite Link does.

   3.  Classic multipath will provide a load balance for IP and LDP
       traffic.  Classic link bundling will not.  Neither classic
       multipath nor classic link bundling will measure IP and LDP
       traffic and reduce the advertised "Available Bandwidth" as a
       result of that measurement.  Composite Link better supports
       RSVP-TE used with significant traffic levels of native IP and
       native LDP.

   4.  Classic link bundling cannot support an LSP that is greater in
       capacity than any single component link.  Classic multipath
       supports this capability but may reorder traffic on such an
       LSP.  Composite Link can retain the order of an LSP carried
       within an LSP that is greater in capacity than any single
       component link, if the contained LSP has such a requirement.

   None of these techniques, classic multipath, classic link bundling,
   or Composite Link, will reorder traffic among IP microflows.  None
   of these techniques will reorder traffic among PW if a PWE3 Control
   Word is used [RFC4385].

3.  Composite Link Foundation Use Cases

   A simple composite link composed entirely of physical links is
   illustrated in Figure 1, where a composite link is configured
   between LSR1 and LSR2.  This composite link has three component
   links.  Individual component links in a composite link may be
   supported by different transport technologies such as SONET, OTN,
   or Ethernet.  Even if the transport technology implementing the
   component links is identical, the characteristics (e.g., bandwidth,
   latency) of the component links may differ.

   The composite link in Figure 1 may carry LSP traffic flows and
   control plane packets.  Control plane packets may appear as IP
   packets or may be carried within a generic associated channel
   (G-ACh) [RFC5586].
   An LSP may be established over the link by either the RSVP-TE
   [RFC3209] or LDP [RFC5036] signaling protocols.  All component
   links in a composite link are summarized in the same forwarding
   adjacency LSP (FA-LSP) routing advertisement [RFC3945].  The
   composite link is summarized as one TE-Link advertised into the IGP
   by the composite link end points.  This information is used in path
   computation when a full MPLS control plane is in use.  The
   individual component links or groups of component links may
   optionally be advertised into the IGP as sub-TLVs of the composite
   link advertisement to indicate capacity available with various
   characteristics, such as a delay range.

                    Management Plane
       Configuration and Measurement <------------+
                     ^                            |
                     |                            |
             +-------+-+                +-+-------+
             |       | |                | |       |
  CP Packets V       | |                V | CP Packets
    | V |    |       | | Component Link 1 | |     |   | ^ |
    |   |   |=|===========================|=|     |   |   |
    |   +----|       | | Component Link 2 | |     |----+
    |       |=|===========================|=|     |
  Aggregated LSPs    | |                  | |     |
   ~|~~~~~~>|        | | Component Link 3 | |     |~~~~>~~|~~
    |       |=|===========================|=|     |
    |        |       | |                  | |     |
    |  LSR1  |                            |  LSR2 |
    +--------+                            +-------+
        !                                     !
        !                                     !
        !<-------- Composite Link ----------->!

     Figure 1: A composite link constructed with multiple physical
                         links between two LSR

   [I-D.ietf-rtgwg-cl-requirement] specifies that component links may
   themselves be composite links.  Figure 2 shows three forms of
   component links which may be deployed in a network.

   +-------+          1. Physical Link             +-------+
   |     |-|---------------------------------------|-|     |
   |       |                                       |       |
   |       |   +------+                 +------+   |       |
   |       |   | MPLS | 2. Logical Link | MPLS |   |       |
   |     |.|...|......|.................|......|...|.|     |
   |     |-----| LSR3 |-----------------| LSR4 |-----|     |
   |       |   +------+                 +------+   |       |
   |       |                                       |       |
   |       |                                       |       |
   |       |   +------+                 +------+   |       |
   |       |   |GMPLS | 3. Logical Link |GMPLS |   |       |
   |     |.|...|......|.................|......|...|.|     |
   |     |-----| LSR5 |-----------------| LSR6 |-----|     |
   |       |   +------+                 +------+   |       |
   | LSR1  |                                       | LSR2  |
   +-------+                                       +-------+
   |<---------------- Composite Link ------------->|

        Figure 2: Illustration of Various Component Link Types

   The three forms of component link shown in Figure 2 are:

   1.  The first component link is configured with direct physical
       media plus a link layer protocol.  This case also includes
       emulated physical links, for example using pseudowire
       emulation.

   2.  The second component link is a TE tunnel that traverses LSR3
       and LSR4, where LSR3 and LSR4 are nodes supporting MPLS but
       supporting few or no GMPLS extensions.

   3.  The third component link is formed by a lower layer network
       that has GMPLS enabled.  In this case, LSR5 and LSR6 are not
       controlled by MPLS but provide the connectivity for the
       component link.

   A composite link forms one logical link between connected LSR (LSR1
   and LSR2 in Figure 1 and Figure 2) and is used to carry aggregated
   traffic [I-D.ietf-rtgwg-cl-requirement].  A composite link relies
   on its component links to carry the traffic.  The endpoints of the
   composite link map incoming traffic onto the set of component
   links.

   For example, LSR1 in Figure 1 distributes the set of traffic flows,
   including control plane packets, among the set of component links.
   LSR2 in Figure 1 receives the packets from its component links and
   sends them to the MPLS forwarding engine with no attempt to reorder
   packets arriving on different component links.
   The traffic in the opposite direction, from LSR2 to LSR1, is
   distributed across the set of component links by LSR2.

   These three forms of component link are a limited set of very
   simple examples.  Many other examples are possible.  A component
   link may itself be a composite link.  A segment of an LSP (a single
   hop for that LSP) may be a composite link.

4.  Delay Sensitive Applications

   Most applications benefit from lower delay.  Some types of
   applications are far more sensitive than others.  For example, real
   time bidirectional applications such as voice communication or two
   way video conferencing are far more sensitive to delay than
   unidirectional streaming audio or video.  Non-interactive bulk
   transfer is almost insensitive to delay if a large enough TCP
   window is used.

   Some applications are sensitive to delay, but users of those
   applications are unwilling to pay extra to ensure lower delay.  For
   example, many SIP end users are willing to accept the delay offered
   to best effort services as long as call quality is good most of the
   time.

   Other applications are sensitive to delay and willing to pay extra
   to ensure lower delay.  For example, financial trading applications
   are extremely sensitive to delay and, with a lot at stake, are
   willing to go to great lengths to reduce delay.

   Among the requirements of Composite Link are requirements to
   advertise the capacity available within configured ranges of delay
   within a given composite link, and to support the ability to place
   an LSP only on component links that meet that LSP's delay
   requirements.

   The Composite Link requirements to accommodate delay sensitive
   applications are analogous to Diffserv requirements to accommodate
   applications requiring higher quality of service on the same
   infrastructure as applications with less demanding requirements.
   The ability to share capacity with less demanding applications,
   with best effort applications being the least demanding, can
   greatly reduce the cost of delivering service to the more demanding
   applications.

5.  Large Volume of IP and LDP Traffic

   IP and LDP do not support traffic engineering.  Both make use of a
   shortest (lowest routing metric) path, with an option to use equal
   cost multipath (ECMP).  Note that though ECMP is prohibited in LDP
   specifications, it is widely implemented.  Where implemented for
   LDP, ECMP is generally disabled by default for standards
   compliance, but often enabled in LDP deployments.

   Without a traffic engineering capability, there must be sufficient
   capacity to accommodate the IP and LDP traffic.  If not, persistent
   queuing delay and loss will occur.  Unlike RSVP-TE, a subset of
   traffic cannot be routed using constraint-based routing to avoid a
   congested portion of an infrastructure.

   In existing networks which accommodate IP and/or LDP alongside
   RSVP-TE, either the IP and LDP traffic can be carried over RSVP-TE,
   or, where the traffic contribution of IP and LDP is small, IP and
   LDP can be carried natively and the effect on RSVP-TE can be
   ignored.  Ignoring the traffic contribution of IP is certainly
   valid on high capacity networks where native IP is used primarily
   for control and network management and customer IP is carried
   within RSVP-TE.

   Where it is desirable to carry native IP and/or LDP and the IP
   and/or LDP traffic volumes are not negligible, RSVP-TE needs
   improvement.  An enhancement offered by Composite Link is the
   ability to measure the IP and LDP traffic, filter the measurements,
   and reduce the capacity available to RSVP-TE to avoid congestion.
   The treatment given to the IP or LDP traffic is similar to the
   treatment when using the "auto-bandwidth" feature in some RSVP-TE
   implementations on that same traffic and giving a higher priority
   (numerically lower setup priority and holding priority value) to
   the "auto-bandwidth" LSP.  The difference is that the measurement
   is made at each hop and the reduction in advertised bandwidth is
   made more directly.

6.  Composite Link and Packet Ordering

   A strong motivation for Composite Link is the need to provide LSP
   capacity in IP backbones that exceeds the capacity of single
   wavelengths provided by transport equipment and exceeds the
   practical capacity limits achievable through inverse multiplexing.
   Appendix C describes characteristics and limitations of transport
   systems today.  Section 2 defines the terms "classic multipath" and
   "classic link bundling" used in this section.

   For purposes of discussion, consider two very large cities, city A
   and city Z.  For example, in the US high traffic cities might be
   New York and Los Angeles, and in Europe high traffic cities might
   be London and Amsterdam.  Two other high volume cities, city B and
   city Y, may share common provider core network infrastructure.
   Using the same examples, cities B and Y may be Washington DC and
   San Francisco, or Paris and Stockholm.  In the US, the common
   infrastructure may span Denver, Chicago, Detroit, and Cleveland.
   Other major traffic contributors include Boston and northern
   Virginia on the east coast, and Seattle and San Diego on the west
   coast.  The capacities of IP/MPLS links within the shared
   infrastructure, for example the city-to-city links along the
   Denver, Chicago, Detroit, and Cleveland path in the US example,
   have for most of the 2000s decade greatly exceeded the capacity of
   single circuits available in transport networks.
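To make the capacity mismatch concrete, the following toy calculation (the demand figures are hypothetical, not taken from this document) shows how many maximum-size transport circuits each city-pair link would require:

```python
import math

def circuits_needed(demand_gbps, circuit_gbps):
    """Number of component circuits needed to carry a city-to-city
    demand when no single circuit can exceed circuit_gbps."""
    return math.ceil(demand_gbps / circuit_gbps)

# Hypothetical city-pair demands (Gb/s) on shared infrastructure,
# carried over 100 Gb/s circuits (the largest circuit assumed to be
# available from the transport network).
demands = {"A-Z": 350, "A-Y": 120, "B-Z": 80, "B-Y": 240}
for pair, gbps in sorted(demands.items()):
    print(pair, circuits_needed(gbps, 100))
```

Each demand exceeding a single circuit must be split across component links, which is exactly the situation the multipath and Composite Link techniques in this section address.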
   For a case with four large traffic sources on either side of the
   shared infrastructure, up to sixteen core city to core city traffic
   flows in excess of transport circuit capacity may be accommodated
   on the shared infrastructure.

   Today the most common IP/MPLS core network design makes use of very
   large links which consist of many smaller component links, but uses
   classic multipath techniques rather than classic link bundling or
   Composite Link.  A component link typically corresponds to the
   largest circuit that the transport system is capable of providing
   (or the largest cost effective circuit).  IP source and destination
   address hashing is used to distribute flows across the set of
   component links, as described in Appendix B.3.

   Classic multipath can handle large LSP up to the total capacity of
   the multipath (within limits, see Appendix B.2).  A disadvantage of
   classic multipath is the reordering of traffic within a given core
   city to core city LSP.  While there is no reordering within any
   microflow, and therefore no customer visible issue, MPLS-TP cannot
   be used across an infrastructure where classic multipath is in use,
   except within pseudowires.

   These capacity issues force the use of classic multipath today.
   Classic multipath excludes a direct use of MPLS-TP.  The desire for
   OAM, offered by MPLS-TP, is in conflict with the use of classic
   multipath.  There are a number of alternatives that satisfy both
   requirements.  Some alternatives are described below.

   MPLS-TP in network edges only

      A simple approach which requires no change to the core is to
      disallow MPLS-TP across the core unless carried within a
      pseudowire (PW).  MPLS-TP may be used within edge domains where
      classic multipath is not used.  PW may be signaled end to end
      using single segment PW (SS-PW), or stitched across domains
      using multisegment PW (MS-PW).
      The PW and anything carried within the PW may use OAM as long as
      fat-PW [RFC6391] load splitting is not used by the PW.

   Composite Link at core LSP ingress/egress

      The interior of the core network may use classic link bundling,
      with the limitation that no LSP can exceed the capacity of a
      single circuit.  Larger non-MPLS-TP LSP can be configured using
      multiple ingress to egress component MPLS-TP LSP.  This can be
      accomplished using existing IP source and destination address
      hashing configured at LSP ingress and egress, or using Composite
      Link configured at ingress and egress.  Each component LSP, if
      constrained to be no larger than the capacity of a single
      circuit, can make use of MPLS-TP and offer OAM for all top level
      LSP across the core.

   MPLS-TP as an MPLS client

      A third approach involves modifying the behavior of LSR in the
      interior of the network core, such that MPLS-TP can be used on a
      subset of LSP, where the capacity of any one LSP within that
      MPLS-TP subset of LSP is not larger than the capacity of a
      single circuit.  This requirement is accommodated through a
      combination of signaling to indicate LSP for which traffic
      splitting needs to be constrained, the ability to constrain the
      depth of the label stack over which traffic splitting can be
      applied on a per-LSP basis, and the ability to constrain the use
      of IP addresses below the label stack for traffic splitting,
      also on a per-LSP basis.

   The above list of alternatives allows packet ordering within an LSP
   to be maintained in some circumstances while allowing very large
   LSP capacities.  Each of these alternatives is discussed further in
   the following subsections.

6.1.  MPLS-TP in network edges only

   Classic MPLS link bundling is defined in [RFC4201] and has existed
   since early in the 2000s decade.  Classic MPLS link bundling places
   any given LSP entirely on a single component link.
   Classic MPLS link bundling is not in widespread use as the means to
   accommodate large link capacities in core networks, due to the
   simplicity, better multiplexing gain, and therefore lower network
   cost of classic multipath.

   If MPLS-TP OAM capability is not required in the IP/MPLS network
   core LSP, then there is no need to change existing network designs
   which use classic multipath and both label stack and IP source and
   destination address based hashing as a basis for load splitting.

   If MPLS-TP is needed for a subset of LSP, then those LSP can be
   carried within pseudowires.  The pseudowire adds a thin layer of
   encapsulation and therefore a small overhead.  If only a subset of
   LSP need MPLS-TP OAM, then some LSP must make use of the
   pseudowires and other LSP must avoid them.  A straightforward way
   to accomplish this is with administrative attributes [RFC3209].

6.2.  Composite Link at core LSP ingress/egress

   Composite Link can be configured for large LSP that are made up of
   smaller MPLS-TP component LSP.  This approach is capable of
   supporting MPLS-TP OAM over the entire set of component link LSP
   and therefore the entire set of top level LSP traversing the core.

   There are two primary disadvantages to this approach.  One is that
   the number of top level LSP traversing the core can be dramatically
   increased.  The other is the loss of multiplexing gain that results
   from the use of classic link bundling within the interior of the
   core network.

   If component LSP use MPLS-TP, then no component LSP can exceed the
   capacity of a single circuit.  For a given composite LSP there can
   either be a number of equal capacity component LSP, or some number
   of full capacity component LSP plus one LSP carrying the excess.
   For example, a 350 Gb/s composite LSP over a 100 Gb/s
   infrastructure may use five 70 Gb/s component LSP, or three
   100 Gb/s LSP plus one 50 Gb/s LSP.
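The two composition options in the example above can be written out as a small sketch (function names are illustrative; note that the equal split shown returns the minimum component count of four, whereas the five-way 70 Gb/s split in the text is another valid choice):

```python
import math

def equal_components(lsp_gbps, circuit_gbps):
    """Split an LSP into the fewest equal-capacity component LSP,
    each no larger than a single circuit."""
    n = math.ceil(lsp_gbps / circuit_gbps)
    return [lsp_gbps / n] * n

def full_plus_remainder(lsp_gbps, circuit_gbps):
    """Use full-circuit component LSP plus one LSP for the excess."""
    full, rest = divmod(lsp_gbps, circuit_gbps)
    return [circuit_gbps] * full + ([rest] if rest else [])

# The 350 Gb/s over 100 Gb/s infrastructure example from the text:
print(equal_components(350, 100))     # [87.5, 87.5, 87.5, 87.5]
print(full_plus_remainder(350, 100))  # [100, 100, 100, 50]
```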
   Classic MPLS link bundling is needed to support MPLS-TP and suffers
   from a bin packing problem even if LSP traffic is completely
   predictable, which it never is in practice.

   The common means of setting composite link bandwidth parameters
   uses long term statistical measures.  For example, many providers
   base their LSP bandwidth parameters on the 95th percentile of
   carried traffic as measured over a one week period.  It is common
   to add 10-30% to the 95th percentile value measured over the prior
   week and adjust bandwidth parameters of LSP weekly.  It is also
   possible to measure traffic flow at the LSR and adjust bandwidth
   parameters somewhat more dynamically.  This is less common in
   deployments and, where deployed, makes use of filtering to track
   very long term trends in traffic levels.  In either case, short
   term variation of traffic levels relative to signaled LSP capacity
   is common.  Allowing a large over allocation of LSP bandwidth
   parameters (i.e., adding 30% or more) avoids over utilization of
   any given LSP, but increases unused network capacity and increases
   network cost.  Allowing a small over allocation of LSP bandwidth
   parameters (i.e., 10-20% or less) results in both underutilization
   and over utilization of individual LSP, but statistically results
   in a total utilization within the core that is under capacity most
   or all of the time.

   The classic multipath solution accommodates the situation in which
   some composite LSP are under utilizing their signaled capacity and
   others are over utilizing their capacity, with the need for far
   less unused network capacity to accommodate variation in actual
   traffic levels.  If the actual traffic levels of LSP can be
   described by a probability distribution, the variation of the sum
   of LSP is less than the variation of any given LSP for all but a
   constant traffic level (where the variation of the sum and of the
   components are both zero).
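The statistical claim above can be illustrated with a toy simulation using synthetic traffic (not measured data): provisioning each LSP at its own 95th percentile plus 20% headroom reserves far more capacity than the 95th percentile of the aggregate ever requires.

```python
import random

random.seed(1)

# Synthetic 5-minute traffic samples for ten LSP over one week
# (7 * 24 * 12 = 2016 samples); Gaussian noise around a 50 Gb/s
# mean is a stand-in for real traffic, purely for illustration.
n_lsp, n_samples = 10, 2016
traffic = [[max(0.0, random.gauss(50, 10)) for _ in range(n_samples)]
           for _ in range(n_lsp)]

def p95(samples):
    """95th percentile of a list of samples."""
    return sorted(samples)[int(0.95 * len(samples))]

# Provisioning each LSP at its own 95th percentile plus 20% ...
per_lsp = sum(p95(s) * 1.2 for s in traffic)
# ... reserves far more than the aggregate's 95th percentile,
# because independent peaks rarely coincide.
aggregate = p95([sum(col) for col in zip(*traffic)])
print(round(per_lsp), round(aggregate))
assert aggregate < per_lsp
```

The gap between the two numbers is the multiplexing gain that classic multipath captures and that classic link bundling gives up.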
   There are two situations which can motivate the use of this
   approach.  This design is favored if the provider values MPLS-TP
   OAM across the core more than efficiency (or is unaware of the
   efficiency issue).  This design can also make sense if transport
   equipment or very low cost core LSR are available which support
   only classic link bundling and, regardless of the loss of
   multiplexing gain, are more cost effective at carrying transit
   traffic than equipment which supports IP source and destination
   address hashing.

6.3.  MPLS-TP as an MPLS client

   Accommodating MPLS-TP as an MPLS client requires a small change to
   forwarding behavior and is therefore most applicable to major
   network overbuilds or new deployments.  This approach is described
   in [I-D.ietf-mpls-multipath-use] and makes use of Entropy Labels
   [RFC6790].

   The advantage of this approach is the ability to accommodate
   MPLS-TP as a client LSP while retaining the high multiplexing gain,
   and therefore efficiency and low network cost, of a pure MPLS
   deployment.  The disadvantage is the need for a small change in
   forwarding.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   This document is a use cases document.  Existing protocols, such as
   MPLS, are referenced.  Existing techniques, such as MPLS link
   bundling and multipath techniques, are referenced.  These protocols
   and techniques are documented elsewhere and contain security
   considerations which are unchanged by this document.

   This document also describes use cases for Composite Link.
   Composite Link requirements are defined in
   [I-D.ietf-rtgwg-cl-requirement].  [I-D.ietf-rtgwg-cl-framework]
   defines a framework for Composite Link.

   Composite Link bears many similarities to MPLS link bundling and
   multipath techniques used with MPLS.
   Additional security considerations, if any, beyond those already
   identified for MPLS, MPLS link bundling, and multipath techniques
   will be documented in the framework document if specific to the
   overall framework of Composite Link, or in protocol extensions if
   specific to a given protocol extension defined later to support
   Composite Link.

9.  Acknowledgments

   In the interest of full disclosure of affiliation and in the
   interest of acknowledging sponsorship, past affiliations of authors
   are noted.  Much of the work done by Ning So occurred while Ning
   was at Verizon.  Much of the work done by Curtis Villamizar
   occurred while at Infinera.  Infinera continues to sponsor this
   work on a consulting basis.

10.  Informative References

   [I-D.ietf-mpls-multipath-use]
              Villamizar, C., "Use of Multipath with MPLS-TP and
              MPLS", draft-ietf-mpls-multipath-use-00 (work in
              progress), February 2013.

   [I-D.ietf-rtgwg-cl-framework]
              Ning, S., McDysan, D., Osborne, E., Yong, L., and C.
              Villamizar, "Composite Link Framework in Multi Protocol
              Label Switching (MPLS)",
              draft-ietf-rtgwg-cl-framework-02 (work in progress),
              October 2012.

   [I-D.ietf-rtgwg-cl-requirement]
              Villamizar, C., McDysan, D., Ning, S., Malis, A., and L.
              Yong, "Requirements for Composite Links in MPLS
              Networks", draft-ietf-rtgwg-cl-requirement-10 (work in
              progress), March 2013.

   [IEEE-802.1AX]
              IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE
              Standard for Local and Metropolitan Area Networks - Link
              Aggregation", 2006.

   [ITU-T.G.694.2]
              ITU-T, "Spectral grids for WDM applications: CWDM
              wavelength grid", 2003.

   [ITU-T.G.800]
              ITU-T, "Unified functional architecture of transport
              networks", 2007.

   [ITU-T.Y.1540]
              ITU-T, "Internet protocol data communication service -
              IP packet transfer and availability performance
              parameters", 2007.
610 [ITU-T.Y.1541] 611 ITU-T, "Network performance objectives for IP-based 612 services", 2006, . 614 [RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The 615 PPP Multilink Protocol (MP)", RFC 1717, November 1994. 617 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 618 and W. Weiss, "An Architecture for Differentiated 619 Services", RFC 2475, December 1998. 621 [RFC2597] Heinanen, J., Baker, F., Weiss, W., and J. Wroclawski, 622 "Assured Forwarding PHB Group", RFC 2597, June 1999. 624 [RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615, 625 June 1999. 627 [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and 628 Multicast Next-Hop Selection", RFC 2991, November 2000. 630 [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path 631 Algorithm", RFC 2992, November 2000. 633 [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., 634 and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP 635 Tunnels", RFC 3209, December 2001. 637 [RFC3260] Grossman, D., "New Terminology and Clarifications for 638 Diffserv", RFC 3260, April 2002. 640 [RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen, 641 P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi- 642 Protocol Label Switching (MPLS) Support of Differentiated 643 Services", RFC 3270, May 2002. 645 [RFC3809] Nagarajan, A., "Generic Requirements for Provider 646 Provisioned Virtual Private Networks (PPVPN)", RFC 3809, 647 June 2004. 649 [RFC3945] Mannie, E., "Generalized Multi-Protocol Label Switching 650 (GMPLS) Architecture", RFC 3945, October 2004. 652 [RFC4124] Le Faucheur, F., "Protocol Extensions for Support of 653 Diffserv-aware MPLS Traffic Engineering", RFC 4124, 654 June 2005. 656 [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling 657 in MPLS Traffic Engineering (TE)", RFC 4201, October 2005. 659 [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. 
McPherson, 660 "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for 661 Use over an MPLS PSN", RFC 4385, February 2006. 663 [RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal 664 Cost Multipath Treatment in MPLS Networks", BCP 128, 665 RFC 4928, June 2007. 667 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP 668 Specification", RFC 5036, October 2007. 670 [RFC5586] Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic 671 Associated Channel", RFC 5586, June 2009. 673 [RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, 674 J., and S. Amante, "Flow-Aware Transport of Pseudowires 675 over an MPLS Packet Switched Network", RFC 6391, 676 November 2011. 678 [RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and 679 L. Yong, "The Use of Entropy Labels in MPLS Forwarding", 680 RFC 6790, November 2012. 682 Appendix A. More Details on Existing Network Operator Practices and 683 Protocol Usage 685 Often, network operators have a contractual Service Level Agreement 686 (SLA) with customers for services, specifying numerical 687 values for performance measures, principally availability, latency, 688 and delay variation. Additionally, network operators may have a Service 689 Level Specification (SLS) for internal use by the operator. 690 See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of [RFC3809] 691 for examples of the form of such SLA and SLS specifications. In this 692 document we use the term Network Performance Objective (NPO) as 693 defined in Section 5 of [ITU-T.Y.1541] since the SLA and SLS measures 694 have network operator and service specific implications. Note that 695 the numerical NPO values of Y.1540 and Y.1541 span multiple networks 696 and may be looser than network operator SLA or SLS objectives. 697 Applications and acceptable user experience have an important 698 relationship to these performance parameters. 700 Consider latency as an example.
In some cases, minimizing latency 701 relates directly to the best customer experience (e.g., in TCP closer 702 is faster). In other cases, user experience is relatively 703 insensitive to latency, up to a specific limit at which point user 704 perception of quality degrades significantly (e.g., interactive human 705 voice and multimedia conferencing). A number of NPOs have a bound 706 on point-to-point latency, and as long as this bound is met, the NPO 707 is met -- decreasing the latency is not necessary. In some NPOs, if 708 the specified latency is not met, the user considers the service 709 unavailable. An unprotected LSP can be manually provisioned on a set 710 of links to meet this type of NPO, but this lowers availability since 711 an alternate route that meets the latency NPO cannot be determined. 713 Historically, when an IP/MPLS network was operated over a lower layer 714 circuit switched network (e.g., SONET rings), a change in latency 715 caused by the lower layer network (e.g., due to a maintenance action 716 or failure) was not known to the MPLS network. This resulted in 717 latency affecting end user experience, sometimes violating NPOs or 718 resulting in user complaints. 720 A response to this problem was to provision IP/MPLS networks over 721 unprotected circuits and set the metric and/or TE-metric proportional 722 to latency. This resulted in traffic being directed over the least 723 latency path, even if this was not needed to meet an NPO or meet user 724 experience objectives. This results in reduced flexibility and 725 increased cost for network operators. Using lower layer networks to 726 provide restoration and grooming is expected to be more efficient, 727 but the inability to communicate performance parameters, in 728 particular latency, from the lower layer network to the higher layer 729 network is an important problem to be solved before this can be done.
731 Latency NPOs for point-to-point services are often tied closely to 732 geographic locations, while latency for multipoint services may be 733 based upon a worst case within a region. 735 Section 7 of [ITU-T.Y.1540] defines availability for an IP service in 736 terms of loss exceeding a threshold for a period on the order of 5 737 minutes. However, the time frames for restoration (i.e., as 738 implemented by predetermined protection, convergence of routing 739 protocols and/or signaling) for services range from on the order of 740 100 ms or less (e.g., for VPWS to emulate classical SDH/SONET 741 protection switching), to several minutes (e.g., to allow BGP to 742 reconverge for L3VPN) and may differ among the set of customers 743 within a single service. 745 The presence of only three Traffic Class (TC) bits (previously known 746 as EXP bits) in the MPLS shim header is limiting when a network 747 operator needs to support QoS classes for multiple services (e.g., 748 L2VPN VPWS, VPLS, L3VPN and Internet), each of which has a set of QoS 749 classes that need to be supported and where the operator prefers to 750 use only E-LSP [RFC3270]. In some cases one bit is used to indicate 751 conformance to some ingress traffic classification, leaving only two 752 bits for indicating the service QoS classes. One approach that has 753 been taken is to aggregate these QoS classes into similar sets on 754 LER-LSR and LSR-LSR links and continue to use only E-LSP. Another 755 approach is to use L-LSP as defined in [RFC3270] or use the Class- 756 Type as defined in [RFC4124] to support up to eight mappings of TC 757 into Per-Hop Behavior (PHB). 759 Labeled LSPs and use of link layer encapsulation have been 760 standardized in order to provide a means to meet these needs. 762 The IP DSCP cannot be used for flow identification. 
The use of IP 763 DSCP for flow identification is incompatible with Assured Forwarding 764 services [RFC2597] or any other service which may use more than one 765 DSCP code point to carry traffic for a given microflow. In general, 766 network operators do not rely on the DSCP of Internet packets in core 767 networks but must preserve DSCP values for use closer to network 768 edges. 770 A label is pushed onto Internet packets when they are carried along 771 with L2/L3VPN packets on the same link or lower layer network; this 772 provides a means to distinguish the QoS classes for these 773 packets. 775 Operating an MPLS-TE network involves a different paradigm from 776 operating an IGP metric-based LDP signaled MPLS network. The 777 multipoint-to-point LDP signaled MPLS LSPs occur automatically, and 778 balancing across parallel links occurs if the IGP metrics are set 779 "equally" (with equality a locally definable relation). 781 Traffic typically consists of a few large (some very large) flows 782 and many small flows. In some cases, separate LSPs are established 783 for very large flows. This can occur even if the IP header 784 information is inspected by an LSR, for example an IPsec tunnel that 785 carries a large amount of traffic. An important example of large 786 flows is that of an L2/L3 VPN customer who has an access line 787 bandwidth comparable to a client-client composite link bandwidth -- 788 there could be flows that are on the order of the access line 789 bandwidth. 791 Appendix B. Existing Multipath Standards and Techniques 793 Today the need to handle large aggregations of traffic, much 794 larger than a single component link, can be met by a number of 795 techniques which we will collectively call multipath. Multipath 796 applied to parallel links between the same set of nodes includes 797 Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201], or 798 other aggregation techniques, some of which may be vendor specific.
799 Multipath applied to diverse paths rather than parallel links 800 includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or 801 even BGP, and equal cost LSP, as described in Appendix B.4. Various 802 multipath techniques have strengths and weaknesses. 804 The term Composite Link is more general than terms such as Link 805 Aggregation, which is generally considered to be specific to Ethernet, 806 and its use here is consistent with the broad definition in 807 [ITU-T.G.800]. The term multipath excludes inverse multiplexing and 808 refers to techniques which only solve the problem of large 809 aggregations of traffic, without addressing the other requirements 810 outlined in this document, particularly those described in Section 4 811 and Section 5. 813 B.1. Common Multipath Load Splitting Techniques 815 Identical load balancing techniques are used for multipath both over 816 parallel links and over diverse paths. 818 Large aggregates of IP traffic do not provide explicit signaling to 819 indicate the expected traffic loads. Large aggregates of MPLS 820 traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which 821 are signaled using RSVP-TE extensions do provide explicit signaling 822 which includes the expected traffic load for the aggregate. LSP 823 which are signaled using LDP do not provide an expected traffic load. 825 MPLS LSP may contain other MPLS LSP arranged hierarchically. When an 826 MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as 827 payload, there is no signaling associated with these inner LSP. 828 Therefore, even when using RSVP-TE signaling, there may be insufficient 829 information provided by signaling to adequately distribute load based 830 solely on signaling. 832 Generally, a set of label stack entries that is unique across the 833 ordered set of label numbers in the label stack can safely be assumed 834 to contain a group of flows.
The reordering of traffic can therefore 835 be considered to be acceptable unless reordering occurs within 836 traffic containing a common unique set of label stack entries. 837 Existing load splitting techniques take advantage of this property in 838 addition to looking beyond the bottom of the label stack and 839 determining if the payload is IPv4 or IPv6 to load balance traffic 840 accordingly. 842 MPLS-TP OAM violates the assumption that it is safe to reorder 843 traffic within an LSP. If MPLS-TP OAM is to be accommodated, then 844 existing multipath techniques must be modified. Such modifications 845 are outside the scope of this document. 847 For example, a large aggregate of IP traffic may be subdivided into a 848 large number of groups of flows using a hash on the IP source and 849 destination addresses. This is as described in [RFC2475] and 850 clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash 851 can be performed on the set of labels in the label stack. These 852 techniques are both examples of means to subdivide traffic into 853 groups of flows for the purpose of load balancing traffic across 854 aggregated link capacity. The means of identifying a set of flows 855 should not be confused with the definition of a flow. 857 Discussion of whether a hash based approach provides a sufficiently 858 even load balance using any particular hashing algorithm or method of 859 distributing traffic across a set of component links is outside of 860 the scope of this document. 862 The current load balancing techniques are referenced in [RFC4385] and 863 [RFC4928]. Three hash based approaches are described in 864 [RFC2991] and [RFC2992]. A mechanism to identify flows within PW is 865 described in [RFC6391]. The use of hash based approaches is 866 mentioned as an example of an existing set of techniques to 867 distribute traffic over a set of component links. Other techniques 868 are not precluded. 870 B.2.
Static and Dynamic Load Balancing Multipath 872 Static multipath generally relies on the mathematical probability 873 that given a very large number of small microflows, these microflows 874 will tend to be distributed evenly across a hash space. Early 875 static multipath implementations assumed that all component links were 876 of equal capacity and performed a modulo operation across the hashed 877 value. An alternate static multipath technique uses a table, 878 generally with a power of two size, and distributes the table entries 879 proportionally among component links according to the capacity of 880 each component link. 882 Static load balancing works well if there are a very large number of 883 small microflows (i.e., microflow rate is much less than component 884 link capacity). However, the case where there are even a few large 885 microflows is not handled well by static load balancing. 887 A dynamic load balancing multipath technique is one where the traffic 888 bound to each component link is measured and the load split is 889 adjusted accordingly. As long as the adjustment is done within a 890 single network element, then no protocol extensions are required and 891 there are no interoperability issues. 893 Note that if the load balancing algorithm and/or its parameters are 894 adjusted, then packets in some flows may be briefly delivered out of 895 sequence; however, in practice such adjustments can be made very 896 infrequently. 898 B.3. Traffic Split over Parallel Links 900 The load splitting techniques defined in Appendix B.1 and 901 Appendix B.2 are both used in splitting traffic over parallel links 902 between the same pair of nodes. The best known technique, though far 903 from being the first, is Ethernet Link Aggregation [IEEE-802.1AX]. 904 This same technique had been applied much earlier using OSPF or ISIS 905 Equal Cost MultiPath (ECMP) over parallel links between the same 906 nodes.
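The static hash-and-table splitting described in Appendix B.1 and Appendix B.2 can be sketched in a few lines. This is a minimal illustration only: the function names are invented for this sketch, and real forwarding implementations use hardware-friendly hashes (e.g., CRC) over more header fields rather than SHA-256 over the address pair.

```python
import hashlib

def build_split_table(capacities, table_size=256):
    """Distribute table entries among component links in proportion to
    each link's capacity.  The table size is a power of two, as is common."""
    total = sum(capacities)
    table = []
    for link, cap in enumerate(capacities):
        table.extend([link] * round(table_size * cap / total))
    # Rounding may leave the table slightly short or long; pad or trim.
    while len(table) < table_size:
        table.append(len(capacities) - 1)
    return table[:table_size]

def select_link(src_ip, dst_ip, table):
    """Hash the IP source/destination pair into the table, so that all
    packets of a given microflow map to the same component link and
    ordering within the microflow is preserved."""
    digest = hashlib.sha256(f"{src_ip}|{dst_ip}".encode()).digest()
    return table[int.from_bytes(digest[:4], "big") % len(table)]

# Three component links: two 10 Gb/s and one 40 Gb/s.
table = build_split_table([10, 10, 40])
link = select_link("192.0.2.1", "198.51.100.7", table)
```

A dynamic variant, in the sense of Appendix B.2, would periodically rebuild the table from measured rather than nominal link loads, moving as few table entries as possible to limit out-of-sequence delivery.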
Multilink PPP [RFC1717] uses a technique that provides 907 inverse multiplexing; however, a number of vendors had provided 908 proprietary extensions to PPP over SONET/SDH [RFC2615] that predated 909 Ethernet Link Aggregation but are no longer used. 911 Link bundling [RFC4201] provides yet another means of handling 912 parallel LSP. [RFC4201] explicitly allows a special value of all ones 913 to indicate a split across all members of the bundle. This "all 914 ones" component link is signaled in the MPLS RESV to indicate that 915 the link bundle is making use of classic multipath techniques. 917 B.4. Traffic Split over Multiple Paths 919 OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of 920 traffic split over multiple paths that may traverse intermediate 921 nodes. ECMP is often incorrectly equated to only this case, and 922 multipath over multiple diverse paths is often incorrectly equated to 923 ECMP. 925 Many implementations are able to create more than one LSP between a 926 pair of nodes, where these LSP are routed diversely to better make 927 use of available capacity. The load on these LSP can be distributed 928 proportionally to the reserved bandwidth of the LSP. These multiple 929 LSP may be advertised as a single Packet Switch Capable (PSC) Forwarding Adjacency (FA) and any LSP making use of 930 the FA may be split over these multiple LSP. 932 Link bundling [RFC4201] component links may themselves be LSP. When 933 this technique is used, any LSP which specifies the link bundle may 934 be split across the multiple paths of the LSP that comprise the 935 bundle. 937 Appendix C. Characteristics of Transport in Core Networks 939 The characteristics of primary interest are the capacity of a single 940 circuit and the use of wave division multiplexing (WDM) to provide a 941 large number of parallel circuits.
943 Wave division multiplexing (WDM) supports multiple independent 944 channels (independent ignoring crosstalk noise) at slightly different 945 wavelengths of light, multiplexed onto a single fiber. Typical in 946 the early 2000s was 40 wavelengths of 10 Gb/s capacity per 947 wavelength. These wavelengths are in the C-band range, which is 948 about 1530-1565 nm, though some work has been done using the L-band 949 1565-1625 nm. 951 The C-band has been carved up using a 100 GHz spacing from 191.7 THz 952 to 196.1 THz by [ITU-T.G.694.2]. This yields 44 channels. If the 953 outermost channels are not used, due to poorer transmission 954 characteristics, then typically 40 are used. For practical reasons, 955 a 50 GHz or 25 GHz spacing is used by more recent equipment, 956 yielding 80 or 160 channels in practice. 958 The early optical modulation techniques used within a single channel 959 yielded 2.5 Gb/s and 10 Gb/s capacity per channel. As modulation 960 techniques have improved, 40 Gb/s and 100 Gb/s per channel have been 961 achieved. 963 The 40 channels of 10 Gb/s common in the mid 2000s yield a total of 964 400 Gb/s. Tighter spacing and better modulations are yielding up to 965 8 Tb/s or more in more recent systems. 967 Over the optical layer is an electrical encoding. In the 1990s this was 968 typically Synchronous Optical Networking (SONET) or Synchronous 969 Digital Hierarchy (SDH), with a maximum defined circuit capacity of 970 40 Gb/s (OC-768), though the 10 Gb/s OC-192 is more common. More 971 recently the low level electrical encoding has been Optical Transport 972 Network (OTN) defined by ITU-T. OTN currently defines circuit 973 capacities up to a nominal 100 Gb/s (ODU4). Both SONET/SDH and OTN 974 make use of time division multiplexing (TDM), where a higher 975 capacity circuit such as a 100 Gb/s ODU4 in OTN may be subdivided 976 into lower fixed capacity circuits such as ten 10 Gb/s ODU2.
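The channel counts quoted above follow from simple arithmetic over the grid span and spacing; a quick check (the helper function is purely illustrative):

```python
def channel_count(low_thz, high_thz, spacing_ghz):
    """Number of WDM channels that fit across a band at a given grid spacing."""
    return int(round((high_thz - low_thz) * 1000 / spacing_ghz))

# C-band grid from [ITU-T.G.694.2]: 191.7 THz to 196.1 THz (4.4 THz span).
print(channel_count(191.7, 196.1, 100))  # 44 channels at 100 GHz spacing
print(channel_count(191.7, 196.1, 50))   # 88 at 50 GHz (about 80 usable in practice)
print(channel_count(191.7, 196.1, 25))   # 176 at 25 GHz (about 160 usable in practice)
print(40 * 10)                           # 40 channels x 10 Gb/s = 400 Gb/s aggregate
```

The "usable" figures are lower than the raw counts because the outermost channels are typically dropped, as noted above.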
978 In the 1990s, all IP and later IP/MPLS networks either used a 979 fraction of maximum circuit capacity, or at most the full circuit 980 capacity toward the end of the decade, when full circuit capacity was 981 2.5 Gb/s or 10 Gb/s. Beyond 2000, the TDM circuit multiplexing 982 capability of SONET/SDH or OTN was rarely used. 984 Early in the 2000s, both transport equipment and core LSR offered 40 985 Gb/s SONET OC-768. However, 10 Gb/s transport equipment was 986 predominantly deployed throughout the decade, partially because LSR 987 10GbE ports were far more cost effective than either OC-192 or OC-768 988 and became practical in the second half of the decade. 990 Entering the 2010 decade, LSR 40GbE and 100GbE are expected to become 991 widely available and cost effective. Slightly preceding this, 992 transport equipment making use of 40 Gb/s and 100 Gb/s modulations 993 is becoming available. This transport equipment is capable of 994 carrying 40 Gb/s ODU3 and 100 Gb/s ODU4 circuits. 996 Early in the 2000s decade, IP/MPLS core networks were making use of 997 single 10 Gb/s circuits. Capacity grew quickly in the first half of 998 the decade, but most IP/MPLS core networks had only a small number of 999 IP/MPLS links requiring 4-8 parallel 10 Gb/s circuits. However, the 1000 use of multipath was necessary, was deemed the simplest and most cost 1001 effective alternative, and became thoroughly entrenched. By the end 1002 of the 2000s decade, nearly all major IP/MPLS core service provider 1003 networks and a few content provider networks had IP/MPLS links which 1004 exceeded 100 Gb/s, long before 40GbE was available and 40 Gb/s 1005 transport was in widespread use. 1007 It is less clear when IP/MPLS LSP exceeded 10 Gb/s, 40 Gb/s, and 100 1008 Gb/s. By 2010, many service providers had LSP in excess of 100 1009 Gb/s, but few are willing to disclose how many LSP have reached this 1010 capacity.
1012 At the time of writing, 40GbE and 100GbE LSR products are being 1013 evaluated by service providers and content providers and are in use 1014 in network trials. The cost of components required to deliver 100GbE 1015 products remains high, making these products less cost effective. 1016 This is expected to change within a few years. 1018 The important point is that IP/MPLS core network links long ago 1019 exceeded 100 Gb/s and a small number of IP/MPLS LSP exceed 100 Gb/s. 1020 By the time 100 Gb/s circuits are widely deployed, IP/MPLS core 1021 network links are likely to exceed 1 Tb/s and many IP/MPLS LSP 1022 capacities are likely to exceed 100 Gb/s. Therefore, multipath 1023 techniques are likely here to stay. 1025 Authors' Addresses 1027 So Ning 1028 Tata Communications 1030 Email: ning.so@tatacommunications.com 1032 Andrew Malis 1033 Verizon 1034 60 Sylvan Road 1035 Waltham, MA 02451 1037 Phone: +1 781-466-2362 1038 Email: andrew.g.malis@verizon.com 1040 Dave McDysan 1041 Verizon 1042 22001 Loudoun County PKWY 1043 Ashburn, VA 20147 1045 Email: dave.mcdysan@verizon.com 1047 Lucy Yong 1048 Huawei USA 1049 5340 Legacy Dr. 1050 Plano, TX 75025 1052 Phone: +1 469-277-5837 1053 Email: lucy.yong@huawei.com 1055 Curtis Villamizar 1056 Outer Cape Cod Network Consulting 1058 Email: curtis@occnc.com