idnits 2.17.00 (12 Aug 2021)

/tmp/idnits34536/draft-ietf-pwe3-fat-pw-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 6, 2011) is 3965 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4379 (Obsoleted by RFC 8029)

  ** Obsolete normative reference: RFC 4447 (Obsoleted by RFC 8077)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

PWE3                                                      S. Bryant, Ed.
Internet-Draft                                               C. Filsfils
Intended status: Standards Track                           Cisco Systems
Expires: January 7, 2012                                        U. Drafz
                                                        Deutsche Telekom
                                                             V. Kompella
                                                                J. Regan
                                                          Alcatel-Lucent
                                                               S. Amante
                                                  Level 3 Communications
                                                            July 6, 2011


Flow Aware Transport of Pseudowires over an MPLS Packet Switched Network
                       draft-ietf-pwe3-fat-pw-07

Abstract

   Where the payload of a pseudowire comprises a number of distinct
   flows, it can be desirable to carry those flows over the equal cost
   multiple paths (ECMPs) that exist in the packet switched network.
   Most forwarding engines are able to generate a hash of the MPLS label
   stack and use this mechanism to balance MPLS flows over ECMPs.

   This document describes a method of identifying the flows, or flow
   groups, within pseudowires such that Label Switching Routers can
   balance flows at a finer granularity than individual pseudowires.
   The mechanism uses an additional label in the MPLS label stack.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 7, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  ECMP in Label Switching Routers
     1.2.  Flow Label
   2.  Native Service Processing Function
   3.  Pseudowire Forwarder
     3.1.  Encapsulation
   4.  Signaling the Presence of the Flow Label
     4.1.  Structure of Flow Label Sub-TLV
   5.  Static Pseudowires
   6.  Multi-Segment Pseudowires
   7.  OAM
   8.  Applicability of PWs using Flow Labels
     8.1.  Equal Cost Multiple Paths
     8.2.  Link Aggregation Groups
     8.3.  Multiple RSVP-TE Paths
     8.4.  The Single Large Flow Case
     8.5.  Applicability to MPLS-TP
     8.6.  Asymmetric Operation
   9.  Applicability to MPLS LSPs
   10. Security Considerations
   11. IANA Considerations
   12. Congestion Considerations
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   A pseudowire (PW) [RFC3985] is normally transported over a single
   network path, even if Equal Cost Multiple Paths (ECMPs) exist
   between the ingress and egress PW provider edge (PE) equipment
   [RFC4385] [RFC4928].  This is required to preserve the
   characteristics of the emulated service (e.g. to avoid misordering
   SAToP PW packets [RFC4553] or subjecting the packets to unusable
   inter-arrival times).  The use of a single path to preserve order
   remains the default mode of operation of a PW.  The new capability
   proposed in this document is an OPTIONAL mode which may be used when
   the use of ECMP is known to be beneficial (and not harmful) to the
   operation of the PW.

   Some PWs are used to transport large volumes of IP traffic between
   routers.  One example of this is the use of an Ethernet PW to create
   a virtual direct link between a pair of routers.  Such PWs may carry
   from hundreds of Mbps to Gbps of traffic.  These PWs only require
   packet ordering to be preserved within the context of each individual
   transported IP flow.  They do not require packet ordering to be
   preserved between all packets of all IP flows within the pseudowire.

   The ability to explicitly configure such a PW to leverage the
   availability of multiple ECMPs allows for better capacity planning as
   the statistical multiplexing of a larger number of smaller flows is
   more efficient than with a smaller set of larger flows.

   Typically, forwarding hardware can deduce that an IP payload is being
   directly carried by an MPLS label stack, and it is capable of looking
   at some fields in packets to construct hash buckets for conversations
   or flows.  However, when the MPLS payload is a PW, an intermediate
   node has no information on the type of PW being carried in the
   packet.  This limits the forwarder at the intermediate node to only
   being able to make an ECMP choice based on a hash of the MPLS label
   stack.  In the case of a PW emulating a high bandwidth trunk, the
   granularity obtained by hashing the label stack is inadequate for
   satisfactory load-balancing.  The ingress node, however, is in the
   special position of being able to look at the un-encapsulated packet
   and spread flows amongst any available ECMPs, or even any Loop-Free
   Alternates [RFC5286].  This document defines a method to introduce
   granularity on the hashing of traffic running over PWs by introducing
   an additional label, chosen by the ingress node, and placed at the
   bottom of the label stack.

   In addition to providing an indication of the flow structure for use
   in ECMP forwarding decisions, the mechanism described in this
   document may also be used to select flows for distribution over an
   IEEE 802.1AX (formerly IEEE 802.3ad) link aggregation group that has
   been used in an MPLS network.

   NOTE: Although Ethernet is frequently referenced as a use case in
   this RFC, the mechanisms described in this document are general
   mechanisms that may be applied to any PW type in which there are
   identifiable flows, and in which there is no requirement to preserve
   the order between those flows.

1.1.  ECMP in Label Switching Routers

   Label switching routers (LSRs) commonly generate a hash of the label
   stack or some elements of the label stack as a method of
   discriminating between flows, and use this to distribute those flows
   over the available ECMPs that exist in the network.  Since the label
   at the bottom of stack is usually the label most closely associated
   with the flow, this normally provides the greatest entropy, and hence
   is usually included in the hash.  This document describes a method of
   adding an additional label stack entry (LSE) at the bottom of stack
   in order to facilitate the load balancing of the flows within a PW
   over the available ECMPs.  A similar design for general MPLS use has
   also been proposed [I-D.kompella-mpls-entropy-label] (see Section 9).

   An alternative method of load balancing by creating a number of PWs
   and distributing the flows amongst them was considered, but was
   rejected because:

   o  It did not introduce as much entropy as can be introduced by
      adding an additional LSE.

   o  It required additional PWs to be set up and maintained.

1.2.  Flow Label

   An additional LSE [RFC3032] is interposed between the PW LSE and the
   control word, or if the control word is not present, between the PW
   LSE and the PW payload.  This additional LSE is called the flow LSE
   and the label carried by the flow LSE is called the flow label.
   Indivisible flows within the PW MUST be mapped to the same flow label
   by the ingress PE.  The flow label stimulates the correct ECMP load
   balancing behaviour in the packet switched network (PSN).  On receipt
   of the PW packet at the egress PE (which knows a flow LSE is present)
   the flow LSE is discarded without processing.

   Note that the flow label MUST NOT be an MPLS reserved label (values
   in the range 0..15) [RFC3032], but is otherwise unconstrained by the
   protocol.
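
   As an illustration only, the following Python sketch shows one way an
   ingress PE might derive a flow label and build the flow LSE.  The
   32-bit layout (20-bit label, 3-bit TC, 1-bit bottom-of-stack flag,
   8-bit TTL) is the label stack entry encoding of [RFC3032].  The
   function names, the use of SHA-256, and the flow key format are
   assumptions made for this example; this document does not specify how
   flow label values are chosen.

```python
import hashlib
import struct

RESERVED_MAX = 15        # labels 0..15 are reserved (RFC 3032)
LABEL_SPACE = 1 << 20    # the label field is 20 bits wide


def flow_label(flow_key: bytes) -> int:
    """Map a flow identifier to a non-reserved 20-bit label.

    Hypothetical scheme: hash the key and fold it into 16..2^20-1.
    """
    digest = hashlib.sha256(flow_key).digest()
    value = int.from_bytes(digest[:4], "big")
    usable = LABEL_SPACE - (RESERVED_MAX + 1)
    return RESERVED_MAX + 1 + value % usable


def encode_lse(label: int, tc: int = 0, bottom: bool = True, ttl: int = 1) -> bytes:
    """Pack one label stack entry into 4 octets, network byte order.

    Defaults model a flow LSE: bottom of stack, TC zero, TTL of 1
    (the TC and TTL choices are those discussed in this section).
    """
    if not 0 <= label < LABEL_SPACE:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc <= 7 or not 0 <= ttl <= 255:
        raise ValueError("TC is 3 bits and TTL is 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


# Assumed flow key: source and destination address pair of the IP flow.
label = flow_label(b"192.0.2.1>198.51.100.7")
flow_lse = encode_lse(label)
```

   Because the label is a pure function of the flow key, all packets of
   an indivisible flow receive the same flow label, and the result never
   falls in the reserved range 0..15.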

   It is useful to give consideration to the choice of TTL value in the
   flow LSE [RFC3032].  The flow LSE is at the bottom of the label
   stack; therefore, even when penultimate hop popping is employed, it
   will always be preceded by the PW label on arrival at the PE.  If,
   due to an error condition, the flow LSE becomes top of stack it might
   be examined as if it were a normal LSE, and the packet might then be
   forwarded.  This can be prevented by setting the flow LSE TTL to 1,
   thereby forcing the packet to be discarded by the forwarder.  Note
   that this may be a departure from considerations that apply to the
   general MPLS case.

   This document does not define a use for the TC bits (formerly known
   as the EXP bits) in the flow label.  Future documents may define a
   use for these bits; therefore, implementations conforming to this
   specification MUST set the TC bits to zero at the ingress and MUST
   ignore them at the egress.

2.  Native Service Processing Function

   The Native Service Processing (NSP) function [RFC3985] is a component
   of a PE that has knowledge of the structure of the emulated service
   and is able to take action on the service outside the scope of the
   PW.  In this case it is required that the NSP in the ingress PE
   identify flows, or groups of flows within the service, and indicate
   the flow (group) identity of each packet as it is passed to the
   pseudowire forwarder.  As an example, where the PW type is Ethernet,
   the NSP might parse the ingress Ethernet traffic and consider all of
   the IP traffic.  This traffic could then be categorised into flows by
   considering all traffic with the same source and destination address
   pair to be a single indivisible flow.  Since this is an NSP function,
   by definition, the method used to identify a flow is outside the
   scope of the PW design.
   Similarly, since the NSP is internal to the PE, the method of flow
   indication to the PW forwarder is outside the scope of this document.

3.  Pseudowire Forwarder

   The PW forwarder must be provided with a method of mapping flows to
   load balanced paths.

   The forwarder must generate a label for the flow or group of flows.
   How the flow label values are determined is outside the scope of this
   document; however, the flow label allocated to a flow MUST NOT be an
   MPLS reserved label and SHOULD remain constant for the life of the
   flow.  It is RECOMMENDED that the method chosen to generate the load
   balancing labels introduces a high degree of entropy in their values,
   to maximise the entropy presented to the ECMP selection mechanism in
   the LSRs in the PSN, and hence distribute the flows as evenly as
   possible over the available PSN ECMP.  The forwarder at the ingress
   PE prepends the PW control word (if applicable), and then pushes the
   flow label, followed by the PW label.

   NOTE: Although this document does not attempt to specify any hash
   algorithms, it is suggested that any such algorithm should be based
   on the assumption that there will be a high degree of entropy in the
   values assigned to the load balancing labels.

   The forwarder at the egress PE uses the pseudowire label to identify
   the pseudowire.  From the context associated with the pseudowire
   label, the egress PE can determine whether a flow LSE is present.  If
   a flow LSE is present, it MUST be checked to determine whether it
   carries a reserved label.  If it is a reserved label the packet is
   processed according to the rules associated with that reserved label,
   otherwise the LSE is discarded.

   All other PW forwarding operations are unmodified by the inclusion of
   the flow LSE.

3.1.  Encapsulation

   The PWE3 Protocol Stack Reference Model modified to include the flow
   LSE is shown in Figure 1 below.

   +-------------+                                +-------------+
   |  Emulated   |                                |  Emulated   |
   |  Ethernet   |                                |  Ethernet   |
   | (including  |         Emulated Service       | (including  |
   |  VLAN)      |<==============================>|  VLAN)      |
   |  Services   |                                |  Services   |
   +-------------+                                +-------------+
   |    Flow     |                                |    Flow     |
   +-------------+           Pseudowire           +-------------+
   |Demultiplexer|<==============================>|Demultiplexer|
   +-------------+                                +-------------+
   |    PSN      |            PSN Tunnel          |    PSN      |
   |   MPLS      |<==============================>|   MPLS      |
   +-------------+                                +-------------+
   |  Physical   |                                |  Physical   |
   +-----+-------+                                +-----+-------+

             Figure 1: PWE3 Protocol Stack Reference Model

   The encapsulation of a PW with a flow LSE is shown in Figure 2 below.

           +---------------------------+
           |                           |
           |         Payload           |
           |                           | n octets
           |                           |
           +---------------------------+
           | Optional Control Word     | 4 octets
           +---------------------------+
           |        Flow LSE           | 4 octets
           +---------------------------+
           |          PW LSE           | 4 octets
           +---------------------------+
           |   MPLS Tunnel LSE (s)     | n*4 octets (four octets per LSE)
           +---------------------------+

   Figure 2: Encapsulation of a pseudowire with a pseudowire flow LSE

4.  Signaling the Presence of the Flow Label

   When using the signalling procedures in [RFC4447], a new Pseudowire
   Interface Parameter Sub-TLV, the Flow Label Sub-TLV (FL Sub-TLV), is
   used to synchronise the flow label states between the ingress and
   egress PEs.

   The absence of a FL Sub-TLV indicates that the PE is unable to
   process flow labels.  A PE that is using PW signalling and that does
   not send a FL Sub-TLV MUST NOT include a flow label in the PW packet.
   A PE that is using PW signalling and which does not receive a FL
   Sub-TLV from its peer MUST NOT include a flow label in the PW packet.
   This preserves backwards compatibility with existing PW
   specifications.

   A PE that wishes to send a flow label in a PW packet MUST include in
   its label mapping message a FL Sub-TLV with T = 1 (see Section 4.1).

   A PE that is willing to receive a flow label MUST include in its
   label mapping message a FL Sub-TLV with R = 1 (see Section 4.1).

   A PE that receives a label mapping message containing a FL Sub-TLV
   with R = 0 MUST NOT include a flow label in the PW packet.

   Thus a PE sending a FL Sub-TLV with T = 1 and receiving a FL Sub-TLV
   with R = 1 MUST include a flow label in the PW packet.  Under all
   other combinations of FL Sub-TLV signalling a PE MUST NOT include a
   flow label in the PW packet.

   The signalling procedures in [RFC4447] state that "Processing of the
   interface parameters should continue when unknown interface
   parameters are encountered, and they MUST be silently ignored."  The
   signalling procedure described here is therefore backwards compatible
   with existing implementations.

   Note that what is signalled is the desire to include the flow LSE in
   the label stack.  The value of the flow label is a local matter for
   the ingress PE, and the label value itself is not signalled.

4.1.  Structure of Flow Label Sub-TLV

   The structure of the Flow Label Sub-TLV is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   FL=0x17     |    Length     |T|R|        Reserved           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 3: Flow Label Sub-TLV

   Where:

   o  FL (value 0x17) is the flow label sub-TLV identifier assigned by
      IANA (see Section 11).

   o  Length is the length of the TLV in octets and is 4.

   o  When T=1 the PE is requesting the ability to send a PW packet that
      includes a flow label.
      When T=0, the PE is indicating that it will not send a PW packet
      containing a flow label.

   o  When R=1 the PE is able to receive a PW packet with a flow label
      present.  When R=0 the PE is unable to receive a PW packet with
      the flow label present.

   o  Reserved bits MUST be zero on transmit and MUST be ignored on
      receive.

5.  Static Pseudowires

   If PWE3 signalling [RFC4447] is not in use for a PW, then whether the
   flow label is used MUST be identically provisioned in both PEs at the
   PW endpoints.  If there is no provisioning support for this option,
   the default behaviour is not to include the flow label.

6.  Multi-Segment Pseudowires

   The flow label mechanism described in this document works on
   multi-segment PWs without requiring modification to the Switching PEs
   (S-PEs).  This is because the flow LSE is transparent to the label
   swap operation, and because interface parameter Sub-TLV signalling is
   transitive.

7.  OAM

   The following OAM considerations apply to this method of load
   balancing.

   Where the OAM is only to be used to perform a basic test that the PWs
   have been configured at the PEs, VCCV [RFC5085] messages may be sent
   using any load-balanced PW path, i.e. using any value for the flow
   label.

   Where it is required to verify that a pseudowire is fully functional
   for all flows, VCCV [RFC5085] connection verification messages MUST
   be sent over each ECMP path to the pseudowire egress PE.  This
   solution may be difficult to achieve and scales poorly.  Under these
   circumstances, it may be sufficient to send VCCV messages using any
   load-balanced pseudowire path because if a failure occurs within the
   PSN the failure will normally be detected and repaired by the PSN.
   That is, the PSN's Interior Gateway Protocol (IGP) link/node failure
   detection mechanism (loss of light, bidirectional forwarding
   detection [RFC5880] or IGP hello detection), and the IGP convergence
   will naturally modify the ECMP set of network paths between the
   ingress and egress PEs.  Hence the PW is only impacted during the
   normal IGP convergence time.  Note that this period may be reduced if
   a fast re-route or fast convergence technology is deployed in the
   network [RFC4090], [RFC5286].

   If the failure is related to the individual corruption of a Label
   Forwarding Information Base (LFIB) entry in a router, then only the
   network path using that specific entry is impacted.  If the PW is
   load balanced over multiple network paths, then this failure can only
   be detected if, by chance, the transported OAM flow is mapped onto
   the impacted network path, or if all paths are tested.  Since testing
   all paths may present problems as noted above, other mechanisms to
   detect this type of error may need to be developed, such as an LSP
   self test technology.

   To troubleshoot the MPLS PSN, including multiple paths, the
   techniques described in [RFC4378] and [RFC4379] can be used.

   Where the PW OAM is carried out of band (VCCV Type 2) [RFC5085] it is
   necessary to insert an "MPLS Router Alert Label" in the label stack.
   The resultant label stack is as follows:

         +-------------------------------+
         |                               |
         |         VCCV Message          |  n octets
         |                               |
         +-------------------------------+
         |      Optional Control Word    |  4 octets
         +-------------------------------+
         |         Flow label            |  4 octets
         +-------------------------------+
         |           PW label            |  4 octets
         +-------------------------------+
         |       Router Alert label      |  4 octets
         +-------------------------------+
         |     MPLS Tunnel label(s)      |  n*4 octets (four octets per label)
         +-------------------------------+

                 Figure 4: Use of Router Alert Label

   Note that, depending on the number of labels hashed by the LSR, the
   inclusion of the Router Alert label may cause the OAM packet to be
   load balanced to a different path from that taken by the data packets
   with identical Flow and PW labels.

8.  Applicability of PWs using Flow Labels

   A node within the PSN is not able to perform deep packet inspection
   (DPI) of the PW as the PW technology is not self-describing: the
   structure of the PW payload is only known to the ingress and egress
   PE devices.  The method proposed in this document provides a
   statistical mitigation of the problem of load balance in those cases
   where a PE is able to discern flows embedded in the traffic received
   on the attachment circuit.

   The methods described in this document are transparent to the PSN and
   as such do not require any new capability from the PSN.

   The requirement to load-balance over multiple PSN paths occurs when
   the ratio between the PW access speed and the PSN's core link
   bandwidth is large (e.g. >= 10%).  ATM and FR are unlikely to meet
   this property.  Ethernet may have this property, and for that reason
   this document focuses on Ethernet.  Applications for other
   high-access-bandwidth PWs (e.g. Fibre Channel) may be defined in the
   future.
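
   As a rough illustration of how a transit LSR's hash over the label
   stack interacts with the flow label, and with a Router Alert label
   placed in the stack as in Figure 4, consider the following Python
   sketch.  The CRC-32 hash, the function name ecmp_path, the depth
   parameter, and the label values are all assumptions made for this
   example; real LSRs use implementation-specific hash functions and
   inputs.

```python
import zlib

ROUTER_ALERT = 1  # MPLS Router Alert Label value (RFC 3032)


def ecmp_path(label_stack, n_paths, depth=None):
    """Select an ECMP member by hashing the top `depth` labels.

    label_stack is ordered top of stack first; depth=None hashes
    every label in the stack.
    """
    labels = label_stack if depth is None else label_stack[:depth]
    data = b"".join(label.to_bytes(3, "big") for label in labels)
    return zlib.crc32(data) % n_paths


tunnel, pw = 1000, 2000  # hypothetical tunnel and PW label values

# Same tunnel and PW labels, many flow labels: when the whole stack is
# hashed, the flows can be spread over several paths.
paths_used = {ecmp_path([tunnel, pw, flow], 4) for flow in range(16, 4096)}

# An LSR that hashes only the top two labels never sees the flow label,
# so every flow of this PW selects the same path.
same_path = {ecmp_path([tunnel, pw, flow], 4, depth=2) for flow in range(16, 4096)}

# With a Router Alert label inserted, the hash input of the OAM packet
# differs from that of a data packet carrying the same PW and flow labels.
data_pkt = [tunnel, pw, 70001]
oam_pkt = [tunnel, ROUTER_ALERT, pw, 70001]
data_choice = ecmp_path(data_pkt, 4, depth=3)
oam_choice = ecmp_path(oam_pkt, 4, depth=3)
```

   The sketch only shows which labels feed the hash; whether the OAM and
   data packets actually diverge depends on the hash function and on the
   number of labels the LSR inspects.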

   This design applies to MPLS PWs where it is meaningful to deconstruct
   the packets presented to the ingress PE into flows.  The mechanism
   described in this document promotes the distribution of flows within
   the PW over different network paths.  This in turn means that whilst
   packets within a flow are delivered in order (subject to normal IP
   delivery perturbations due to topology variation), order is no longer
   maintained for all packets sent over the PW.  It is not proposed to
   associate a different sequence number with each flow.  If sequence
   number support is required the flow label mechanism MUST NOT be used.

   Where it is known that the traffic carried by the Ethernet PW is IP
   the flows can be identified and mapped to an ECMP.  Such methods
   typically include hashing on the source and destination addresses,
   the protocol ID and higher-layer flow-dependent fields such as
   TCP/UDP ports, L2TPv3 Session IDs etc.

   Where it is known that the traffic carried by the Ethernet PW is
   non-IP, techniques used for link bundling between Ethernet switches
   may be reused.  In this case however the latency distribution would
   be larger than is found in the link bundle case.  The acceptability
   of the increased latency is for further study.  Of particular
   importance, Ethernet control frames SHOULD always be mapped to the
   same PSN path to ensure in-order delivery.

8.1.  Equal Cost Multiple Paths

   ECMP in packet switched networks is statistical in nature.  The
   mapping of flows to a particular path does not take into account the
   bandwidth of the flow being mapped or the current bandwidth usage of
   the members of the ECMP set.  This simplification works well when the
   distribution of flows is evenly spread over the ECMP set and there
   are a large number of flows that have low bandwidth relative to the
   paths.
   The random allocation of a flow to a path provides a good
   approximation to an even spread of flows, provided that polarisation
   effects are avoided.  The method defined in this document has the
   same statistical properties as an IP PSN.

   ECMP is a load-sharing mechanism that is based on sharing the load
   over a number of layer 3 paths through the PSN.  Often however
   multiple links exist between a pair of LSRs that are considered by
   the IGP to be a single link.  These are known as link bundles.  The
   mechanism described in this document can also be used to distribute
   the flows within a PW over the members of the link bundle by using
   the flow label value to identify candidate flows.  How that mapping
   takes place is outside the scope of this specification.  Similar
   considerations apply to link aggregation groups.

   There is no mechanism currently defined to indicate the bandwidths in
   use by specific flows using the fields of the MPLS shim header.
   Furthermore, since the semantics of the MPLS shim header are fully
   defined in [RFC3032] and [RFC5462], those fields cannot be assigned
   semantics to carry this information.  This document does not define
   any semantic for use in the TTL or TC fields of the label entry that
   carries the flow label, but requires that the flow label itself be
   selected with a high degree of entropy suggesting that the label
   value should not be overloaded with additional meaning in any
   subsequent specification.

   A different type of load balancing is the desire to carry a PW over a
   set of PSN links in which the bandwidth of members of the link set is
   less than the bandwidth of the PW.  Proposals to address this problem
   have been made in the past [I-D.stein-pwe3-pwbonding].  Such a
   mechanism can be considered complementary to this mechanism.

8.2.  Link Aggregation Groups

   A Link Aggregation Group (LAG) is used to bond together several
   physical circuits between two adjacent nodes so they appear to
   higher-layer protocols as a single, higher bandwidth "virtual" pipe.
   These may co-exist in various parts of a given network.  An advantage
   of LAGs is that they reduce the number of routing and signalling
   protocol adjacencies between devices, reducing control plane
   processing overhead.  As with ECMP, the key problem related to LAGs
   is that due to inefficiencies in LAG load-distribution algorithms, a
   particular component of a LAG may experience congestion.  The
   mechanism proposed here may be able to assist in producing a more
   uniform flow distribution.

   The same considerations requiring a flow to go over a single member
   of an ECMP set apply to a member of a LAG.

8.3.  Multiple RSVP-TE Paths

   In some networks it is desirable for a Label Edge Router (LER) to be
   able to load balance a PW across multiple RSVP-TE tunnels.  The flow
   label mechanism described in this document may be used to provide the
   LER with the required flow information, and necessary entropy to
   provide this type of load balancing.  An example of such a case is
   the use of the flow label mechanism in networks using a link bundle
   with the all ones component [RFC4201].

   Methods by which the LER is configured to apply this type of ECMP are
   outside the scope of this document.

8.4.  The Single Large Flow Case

   Clearly the operator should make sure that the service offered using
   PW technology and the method described in this document does not
   exceed the maximum planned link capacity, unless it can be guaranteed
   that it conforms to the Internet traffic profile of a very large
   number of small flows.

   If the NSP cannot access sufficient information to distinguish flows,
   perhaps because the protocol stack requires parsing further into the
   packet than the NSP is able to, then the functionality described in
   this document does not give any benefits.  The most common case where
   a single flow dominates the traffic on a PW is when it is used to
   transport enterprise traffic.  Enterprise traffic may well consist of
   a single, large TCP flow, or encrypted flows that cannot be handled
   by the methods described in this document.

   An operator has four options under these circumstances:

   1.  The operator can choose to do nothing and the system will work as
       it does without the flow label.

   2.  The operator can make the customer aware that the service
       offering has a restriction on flow bandwidth and police flows to
       that restriction.  This would allow customers offering multiple
       flows to use a larger fraction of their access bandwidth, whilst
       preventing a single flow from consuming a fraction of internal
       link bandwidth that the operator considered excessive.

   3.  The operator could configure the ingress PE to assign a constant
       flow label to all high bandwidth flows so that only one path was
       affected by these flows.

   4.  The operator could configure the ingress PE to assign a random
       flow label to all high bandwidth flows so as to minimise the
       disruption to the network at the cost of out of order traffic to
       the user.

   The issues described above are mitigated by the following two
   factors:

   o  Firstly, the customer of a high-bandwidth PW service has an
      incentive to get the best transport service because an inefficient
      use of the PSN leads to jitter and eventually to loss of the PW's
      payload.

   o  Secondly, the customer is usually able to tailor their
      applications to generate many flows in the PSN.
      A well-known example is massive data transport between servers
      which use many parallel TCP sessions.  This same technique can be
      used by any transport protocol: multiple UDP ports, multiple
      L2TPv3 Session IDs, or multiple GRE keys may be used to decompose
      a large flow into smaller components.  This approach may be
      applied to IPsec [RFC4301] where multiple Security Parameters
      Indexes (SPIs) may be allocated to the same security association.

8.5.  Applicability to MPLS-TP

   The MPLS Transport Profile (MPLS-TP) [RFC5654] requirement 44 states
   that "MPLS-TP MUST support mechanisms that ensure the integrity of
   the transported customer's service traffic as required by its
   associated SLA.  Loss of integrity may be defined as packet
   corruption, reordering, or loss during normal network conditions."
   In addition, MPLS-TP makes extensive use of the fate sharing between
   OAM and data packets, which is defeated by the flow LSE.  The flow
   aware transport of a PW reorders packets and therefore MUST NOT be
   deployed in a network conforming to MPLS-TP unless the integrity
   requirements specified in the SLA can be satisfied.  In a

8.6.  Asymmetric Operation

   The protocol defined in this document supports the asymmetric
   inclusion of the flow LSE.  Asymmetric operation can be expected when
   there is asymmetry in the bandwidth requirements making it
   unprofitable for one PE to perform the flow classification, or when
   that PE is otherwise unable to perform the classification but is able
   to receive flow-labeled packets from its peer.  Asymmetric operation
   of the PW may also be required when one PE has a high transmission
   bandwidth requirement, but has a need to receive the entire PW on a
   single interface in order to perform a processing operation that
   requires the context of the complete PW (for example, policing of the
   egress traffic).

9.  Applicability to MPLS LSPs

   An extension of this technique is to create a basis for hash
   diversity without having to peek below the label stack for IP traffic
   carried over LDP LSPs.  The generalisation of this extension to MPLS
   has been described in [I-D.kompella-mpls-entropy-label].  This
   generalisation can be regarded as a complementary, but distinct,
   approach from the technique described in this document.  While
   similar considerations may apply to the identification of flows and
   the allocation of flow label values, the flow labels are imposed by
   different network components, and the associated signalling
   mechanisms are different.

10.  Security Considerations

   The PW generic security considerations described in [RFC3985] and the
   security considerations applicable to a specific PW type (for
   example, in the case of an Ethernet PW [RFC4448]) apply.  The
   security considerations in [RFC5920] also apply.

   Section 1.2 describes considerations that apply to the TTL value used
   in the flow LSE.  The use of a TTL value of one prevents the
   accidental forwarding of a packet based on the label value in the
   flow LSE.

11.  IANA Considerations

   IANA is requested to amend the PW Interface Parameters Sub-TLV type
   Registry value 0x17 (Flow Label indicator) to refer to this RFC.

      Parameter  Length  Description
      ID

      0x17       4       Flow Label

12.  Congestion Considerations

   The congestion considerations applicable to PWs as described in
   [RFC3985] and any additional congestion considerations developed at
   the time of publication apply to this design.

   The ability to explicitly configure a PW to leverage the availability
   of multiple ECMPs is beneficial to capacity planning as, all other
   parameters being constant, the statistical multiplexing of a larger
   number of smaller flows is more efficient than with a smaller number
   of larger flows.
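
   The multiplexing effect described above can be illustrated with a
   minimal sketch.  It is not part of this specification: the SHA-256
   hash merely stands in for whatever hash a forwarding engine applies
   to the label stack, and the flow identifiers, the rates, and the
   number of paths are hypothetical values chosen for illustration.

```python
import hashlib


def path_for_flow(flow_id: str, num_paths: int) -> int:
    """Pick an ECMP path from a stable hash of a hypothetical flow id.

    SHA-256 is an assumption made for reproducibility; real forwarding
    engines use their own hash over the MPLS label stack.
    """
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


def path_loads(flow_rates: dict[str, float], num_paths: int) -> list[float]:
    """Add the offered rate of each flow to the path its hash selects."""
    loads = [0.0] * num_paths
    for flow_id, rate in flow_rates.items():
        loads[path_for_flow(flow_id, num_paths)] += rate
    return loads


NUM_PATHS = 4        # assumed number of equal cost paths
TOTAL_RATE = 1000.0  # arbitrary units, assumed for illustration

# The same total traffic, offered as 2 large flows or 1000 small flows.
few_large = {f"flow-{i}": TOTAL_RATE / 2 for i in range(2)}
many_small = {f"flow-{i}": TOTAL_RATE / 1000 for i in range(1000)}

few_loads = path_loads(few_large, NUM_PATHS)
many_loads = path_loads(many_small, NUM_PATHS)

print("ideal per-path load:", TOTAL_RATE / NUM_PATHS)
print("2 large flows      :", few_loads)
print("1000 small flows   :", many_loads)
```

   With only two flows, at most two of the four paths carry traffic, so
   the busiest path carries at least half of the total.  With many small
   flows, the per-path loads can be expected to lie much closer to the
   ideal share.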

   Note that if the classification into flows is only performed on IP
   packets, the behaviour of those flows in the face of congestion will
   be as already defined by the IETF for packets of that type and no
   additional congestion processing is required.

   Where flows that are not IP are classified, PW congestion avoidance
   must be applied to each non-IP load balance group.

13.  Acknowledgements

   The authors wish to thank Mary Barns, Eric Grey, Kireeti Kompella,
   Joerg Kuechemann, Wilfried Maas, Luca Martini, Mark Townsley, Rolf
   Winter and Lucy Yong for valuable comments on this document.

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, January 2001.

   [RFC4379]  Kompella, K. and G. Swallow, "Detecting Multi-Protocol
              Label Switched (MPLS) Data Plane Failures", RFC 4379,
              February 2006.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
              Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC4447]  Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
              Heron, "Pseudowire Setup and Maintenance Using the Label
              Distribution Protocol (LDP)", RFC 4447, April 2006.

   [RFC4448]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
              "Encapsulation Methods for Transport of Ethernet over MPLS
              Networks", RFC 4448, April 2006.

   [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
              Division Multiplexing (TDM) over Packet (SAToP)",
              RFC 4553, June 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
              Cost Multipath Treatment in MPLS Networks", BCP 128,
              RFC 4928, June 2007.

   [RFC5085]  Nadeau, T. and C. Pignataro, "Pseudowire Virtual Circuit
              Connectivity Verification (VCCV): A Control Channel for
              Pseudowires", RFC 5085, December 2007.

14.2.  Informative References

   [I-D.kompella-mpls-entropy-label]
              Kompella, K., Drake, J., Amante, S., Henderickx, W., and
              L. Yong, "The Use of Entropy Labels in MPLS Forwarding",
              draft-kompella-mpls-entropy-label-02 (work in progress),
              March 2011.

   [I-D.stein-pwe3-pwbonding]
              Stein, Y., Mendelsohn, I., and R. Insler, "PW Bonding",
              draft-stein-pwe3-pwbonding-01 (work in progress),
              November 2008.

   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
              Edge (PWE3) Architecture", RFC 3985, March 2005.

   [RFC4090]  Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
              Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
              May 2005.

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4378]  Allan, D. and T. Nadeau, "A Framework for Multi-Protocol
              Label Switching (MPLS) Operations and Management (OAM)",
              RFC 4378, February 2006.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5462]  Andersson, L. and R. Asati, "Multiprotocol Label Switching
              (MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic
              Class" Field", RFC 5462, February 2009.

   [RFC5654]  Niven-Jenkins, B., Brungard, D., Betts, M., Sprecher, N.,
              and S. Ueno, "Requirements of an MPLS Transport Profile",
              RFC 5654, September 2009.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, June 2010.

   [RFC5920]  Fang, L., "Security Framework for MPLS and GMPLS
              Networks", RFC 5920, July 2010.

Authors' Addresses

   Stewart Bryant (editor)
   Cisco Systems
   250 Longwater Ave
   Reading  RG2 6GB
   United Kingdom

   Phone: +44-208-824-8828
   Email: stbryant@cisco.com

   Clarence Filsfils
   Cisco Systems
   Brussels
   Belgium

   Email: cfilsfil@cisco.com

   Ulrich Drafz
   Deutsche Telekom
   Muenster
   Germany

   Email: Ulrich.Drafz@t-com.net

   Vach Kompella
   Alcatel-Lucent

   Email: vach.kompella@alcatel-lucent.com

   Joe Regan
   Alcatel-Lucent

   Email: joe.regan@alcatel-lucent.com

   Shane Amante
   Level 3 Communications

   Email: shane@castlepoint.net