idnits 2.17.00 (12 Aug 2021) /tmp/idnits18611/draft-ietf-mpls-entropy-label-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC3031, updated by this document, for RFC5378 checks: 1998-03-17) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 17, 2012) is 3563 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'L' is mentioned on line 724, but not defined == Missing Reference: 'E' is mentioned on line 724, but not defined == Missing Reference: 'TL4' is mentioned on line 819, but not defined -- Looks like a reference, but probably isn't: '1' on line 844 == Missing Reference: 'TL3' is mentioned on line 820, but not defined == Missing Reference: 'TL1' is mentioned on line 822, but not defined == Missing Reference: 'TL0' is mentioned on line 775, but not defined -- Looks like a reference, but probably isn't: '3' on line 846 == Missing Reference: 'AL' is mentioned on line 824, but not defined == Missing Reference: 'L4' is mentioned on line 844, but not defined == Missing Reference: 'L3' is mentioned on line 844, but not defined == Missing Reference: 'Rn' is mentioned on line 845, but not defined -- Looks like a reference, but probably isn't: '0' on line 846 ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277) -- Obsolete informational reference (is this intentional?): RFC 4379 (Obsoleted by RFC 8029) -- Obsolete informational reference (is this intentional?): RFC 4447 (Obsoleted by RFC 8077) Summary: 1 error (**), 0 flaws (~~), 11 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group K. Kompella 3 Internet-Draft J. Drake 4 Updates: 3031, 3107, 3209, 5036 Juniper Networks 5 (if approved) S. Amante 6 Intended status: Standards Track Level 3 Communications, LLC 7 Expires: February 18, 2013 W. Henderickx 8 Alcatel-Lucent 9 L. 
Yong 10 Huawei USA 11 August 17, 2012 13 The Use of Entropy Labels in MPLS Forwarding 14 draft-ietf-mpls-entropy-label-05 16 Abstract 18 Load balancing is a powerful tool for engineering traffic across a 19 network. This memo suggests ways of improving load balancing across 20 MPLS networks using the concept of "entropy labels". It defines the 21 concept, describes why entropy labels are useful, enumerates 22 properties of entropy labels that allow maximal benefit, and shows 23 how they can be signaled and used for various applications. This 24 document updates RFCs 3031, 3107, 3209 and 5036. 26 Status of this Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on February 18, 2013. 43 Copyright Notice 45 Copyright (c) 2012 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. 
Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 61 1.1. Conventions used . . . . . . . . . . . . . . . . . . . . . 4 62 1.2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 6 63 2. Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 7 64 3. Entropy Labels and Their Structure . . . . . . . . . . . . . . 8 65 4. Data Plane Processing of Entropy Labels . . . . . . . . . . . 9 66 4.1. Egress LSR . . . . . . . . . . . . . . . . . . . . . . . . 9 67 4.2. Ingress LSR . . . . . . . . . . . . . . . . . . . . . . . 10 68 4.3. Transit LSR . . . . . . . . . . . . . . . . . . . . . . . 11 69 4.4. Penultimate Hop LSR . . . . . . . . . . . . . . . . . . . 12 70 5. Signaling for Entropy Labels . . . . . . . . . . . . . . . . . 12 71 5.1. LDP Signaling . . . . . . . . . . . . . . . . . . . . . . 12 72 5.1.1. Processing the ELC TLV . . . . . . . . . . . . . . . . 13 73 5.2. BGP Signaling . . . . . . . . . . . . . . . . . . . . . . 13 74 5.3. RSVP-TE Signaling . . . . . . . . . . . . . . . . . . . . 14 75 5.4. Multicast LSPs and Entropy Labels . . . . . . . . . . . . 14 76 6. Operations, Administration, and Maintenance (OAM) and 77 Entropy Labels . . . . . . . . . . . . . . . . . . . . . . . . 15 78 7. MPLS-TP and Entropy Labels . . . . . . . . . . . . . . . . . . 16 79 8. Entropy Labels in Various Scenarios . . . . . . . . . . . . . 16 80 8.1. LDP Tunnel . . . . . . . . . . . . . . . . . . . . . . . . 16 81 8.2. LDP Over RSVP-TE . . . . . . . . . . . . . . . . . . . . . 18 82 8.3. MPLS Applications . . . . . . . . . . . . . . . . . . . . 19 83 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 84 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 85 10.1. 
Reserved Label for ELI . . . . . . . . . . . . . . . . . . 20 86 10.2. LDP Entropy Label Capability TLV . . . . . . . . . . . . . 20 87 10.3. BGP Entropy Label Capability Attribute . . . . . . . . . . 20 88 10.4. RSVP-TE Entropy Label Capability flag . . . . . . . . . . 20 89 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20 90 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 91 12.1. Normative References . . . . . . . . . . . . . . . . . . . 21 92 12.2. Informative References . . . . . . . . . . . . . . . . . . 21 93 Appendix A. Applicability of LDP Entropy Label Capability TLV . . 22 94 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 96 1. Introduction 98 Load balancing, or multi-pathing, is an attempt to balance traffic 99 across a network by allowing the traffic to use multiple paths. Load 100 balancing has several benefits: it eases capacity planning; it can 101 help absorb traffic surges by spreading them across multiple paths; 102 it allows better resilience by offering alternate paths in the event 103 of a link or node failure. 105 As providers scale their networks, they use several techniques to 106 achieve greater bandwidth between nodes. Two widely used techniques 107 are: Link Aggregation Group (LAG) and Equal-Cost Multi-Path (ECMP). 108 LAG is used to bond together several physical circuits between two 109 adjacent nodes so they appear to higher-layer protocols as a single, 110 higher bandwidth 'virtual' pipe. ECMP is used between two nodes 111 separated by one or more hops, to allow load balancing over several 112 shortest paths in the network. This is typically obtained by 113 arranging IGP metrics such that there are several equal cost paths 114 between source-destination pairs. Both of these techniques may, and 115 often do, co-exist in various parts of a given provider's network, 116 depending on various choices made by the provider. 
118 A very important requirement when load balancing is that packets 119 belonging to a given 'flow' must be mapped to the same path, i.e., 120 the same exact sequence of links across the network. This is to 121 avoid jitter, latency and re-ordering issues for the flow. What 122 constitutes a flow varies considerably. A common example of a flow 123 is a TCP session. Other examples are an L2TP session corresponding 124 to a given broadband user, or traffic within an ATM virtual circuit. 126 To meet this requirement, a node uses certain fields, termed 'keys', 127 within a packet's header as input to a load balancing function 128 (typically a hash function) that selects the path for all packets in 129 a given flow. The keys chosen for the load balancing function depend 130 on the packet type; a typical set (for IP packets) is the IP source 131 and destination addresses, the protocol type, and (for TCP and UDP 132 traffic) the source and destination port numbers. An overly 133 conservative choice of fields may lead to many flows mapping to the 134 same hash value (and consequently poorer load balancing); an overly 135 aggressive choice may map a flow to multiple values, potentially 136 violating the above requirement. 138 For MPLS networks, most of the same principles (and benefits) apply. 139 However, finding useful keys in a packet for the purpose of load 140 balancing can be more of a challenge. In many cases, MPLS 141 encapsulation may require fairly deep inspection of packets to find 142 these keys at transit Label Switching Routers (LSRs). 144 One way to eliminate the need for this deep inspection is to have the 145 ingress LSR of an MPLS Label Switched Path extract the appropriate 146 keys from a given packet, input them to its load balancing function, 147 and place the result in an additional label, termed the 'entropy 148 label', as part of the MPLS label stack it pushes onto that packet. 
150 The packet's entire MPLS label stack can then be used by transit LSRs 151 to perform load balancing, as the entropy label introduces the right 152 level of "entropy" into the label stack. 154 There are five key reasons why this is beneficial: 156 1. at the ingress LSR, MPLS encapsulation hasn't yet occurred, so 157 deep inspection is not necessary; 159 2. the ingress LSR has more context and information about incoming 160 packets than transit LSRs; 162 3. ingress LSRs usually operate at lower bandwidths than transit 163 LSRs, allowing them to do more work per packet; 165 4. transit LSRs do not need to perform deep packet inspection and 166 can load balance effectively using only a packet's MPLS label 167 stack; and 169 5. transit LSRs, not having the full context that an ingress LSR 170 does, have the hard choice between potentially misinterpreting 171 fields in a packet as valid keys for load balancing (causing 172 packet ordering problems) or adopting a conservative approach 173 (giving rise to sub-optimal load balancing). Entropy labels 174 relieve them of making this choice. 176 This memo describes why entropy labels are needed and defines the 177 properties of entropy labels, in particular how they are generated 178 and received, and the expected behavior of transit LSRs. Finally, it 179 describes in general how signaling works and what needs to be 180 signaled, as well as specifics for the signaling of entropy labels 181 for LDP ([RFC5036]), BGP ([RFC3107]), and RSVP-TE ([RFC3209]). 183 1.1. Conventions used 185 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 186 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 187 document are to be interpreted as described in [RFC2119].
189 The following acronyms are used: 191 BoS: Bottom of Stack 193 CE: Customer Edge device 195 ECMP: Equal Cost Multi-Path 197 EL: Entropy Label 199 ELC: Entropy Label Capability 201 ELI: Entropy Label Indicator 203 FEC: Forwarding Equivalence Class 205 LAG: Link Aggregation Group 207 LER: Label Edge Router 209 LSP: Label Switched Path 211 LSR: Label Switching Router 213 PE: Provider Edge Router 215 PW: Pseudowire 217 PHP: Penultimate Hop Popping 219 TC: Traffic Class 221 TTL: Time-to-Live 223 UHP: Ultimate Hop Popping 225 VPLS: Virtual Private LAN (Local Area Network) Service 227 VPN: Virtual Private Network 229 The term ingress (or egress) LSR is used interchangeably with ingress 230 (or egress) LER. The term application throughout the text refers to 231 an MPLS application (such as a VPN or VPLS). 233 A label stack (say of three labels) is denoted by <L1, L2, L3>, where 234 L1 is the "outermost" label and L3 the innermost (closest to the 235 payload). Packet flows are depicted left to right, and signaling is 236 shown right to left (unless otherwise indicated). 238 The term 'label' is used both for the entire 32-bit label stack entry 239 and the 20-bit label field within a label stack entry. It should be 240 clear from the context which is meant. 242 1.2. Motivation 244 MPLS is a very successful generic forwarding substrate that 245 transports several dozen types of protocols, most notably: IP, PWs, 246 VPLS and IP VPNs. Within each type of protocol, there typically 247 exist several variants, each with a different set of load balancing 248 keys, e.g., for IP: IPv4, IPv6, IPv6 in IPv4, etc.; for PWs: 249 Ethernet, ATM, Frame-Relay, etc. There are also several different 250 types of Ethernet over PW encapsulation, ATM over PW encapsulation, 251 etc. Finally, given the popularity of MPLS, it is likely 252 that it will continue to be extended to transport new protocols.
254 Currently, each transit LSR along the path of a given LSP has to try 255 to infer the underlying protocol within an MPLS packet in order to 256 extract appropriate keys for load balancing. Unfortunately, if the 257 transit LSR is unable to infer the MPLS packet's protocol (as is 258 often the case), it will typically use the topmost (or all) MPLS 259 labels in the label stack as keys for the load balancing function. 260 The result may be an extremely inequitable distribution of traffic 261 across equal-cost paths exiting that LSR. This is because MPLS 262 labels are generally fairly coarse-grained forwarding labels that 263 typically describe a next-hop, or provide some sort of demultiplexing 264 and/or forwarding function, and do not describe the packet's 265 underlying protocol. 267 On the other hand, an ingress LSR (e.g., a PE router) has detailed 268 knowledge of a packet's contents, typically through a priori 269 configuration of the encapsulation(s) that are expected at a given 270 PE-CE interface (e.g., IPv4, IPv6, VPLS, etc.). Ingress LSRs also have more 271 flexible forwarding hardware. PE routers need this information and 272 these capabilities to: 274 a) apply the required services for the CE; 276 b) discern the packet's CoS forwarding treatment; 278 c) apply filters to forward or block traffic to/from the CE; 280 d) forward routing/control traffic to an onboard management 281 processor; and, 283 e) load-balance the traffic on its uplinks to transit LSRs (e.g., 284 P routers). 286 By knowing the expected encapsulation types, an ingress LSR 287 can apply a more specific set of payload parsing routines to extract 288 the keys appropriate for a given protocol. This allows for 289 significantly improved accuracy in determining the appropriate load 290 balancing behavior for each protocol.
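As an illustration only, the per-encapsulation key selection described above might be sketched as follows. The packet model (a dictionary of already-parsed fields), the extractor names, the use of SHA-256 as the load balancing function, and the mapping into the 20-bit label space are all assumptions of this sketch and are not specified by this memo; the only constraint taken from the memo is that an entropy label must not fall in the reserved label range 0-15 (Section 3).

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical packet model: a dict of header fields the ingress has already parsed.
Packet = Dict[str, object]


def ipv4_keys(pkt: Packet) -> Tuple:
    # Typical IP key set: addresses, protocol, and (for TCP/UDP) ports.
    return (pkt["src"], pkt["dst"], pkt["proto"],
            pkt.get("sport", 0), pkt.get("dport", 0))


def ethernet_keys(pkt: Packet) -> Tuple:
    # Illustrative key set for an Ethernet payload.
    return (pkt["src_mac"], pkt["dst_mac"])


# The ingress knows, by configuration, which encapsulation to expect on an
# interface, so it can select the matching extractor directly.
KEY_EXTRACTORS: Dict[str, Callable[[Packet], Tuple]] = {
    "ipv4": ipv4_keys,
    "ethernet": ethernet_keys,
}


def flow_hash(encap: str, pkt: Packet) -> int:
    """Return a 32-bit value that is identical for all packets of one flow."""
    keys = KEY_EXTRACTORS[encap](pkt)
    digest = hashlib.sha256(repr(keys).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def to_entropy_label(h: int) -> int:
    """Map a hash into the 20-bit label space, skipping reserved labels 0-15."""
    return 16 + h % (2**20 - 16)
```

Fields that are not part of the key set (for example, payload bytes) do not influence the result, so all packets of a flow map to the same value.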
292 If the ingress LSR were to capture the flow information so gathered 293 in a convenient form for downstream transit LSRs, transit LSRs could 294 remain completely oblivious to the contents of each MPLS packet, and 295 use only the captured flow information to perform load balancing. In 296 particular, there will be no reason to duplicate an ingress LSR's 297 complex packet/payload parsing functionality in a transit LSR. This 298 will result in less complex transit LSRs, enabling them to more 299 easily scale to higher forwarding rates, larger port density, lower 300 power consumption, etc. The idea in this memo is to capture this 301 flow information as a label, the so-called entropy label. 303 Ingress LSRs can also adapt more readily to new protocols and extract 304 the appropriate keys to use for load balancing packets of those 305 protocols. This means that deploying new protocols or services in 306 edge devices requires fewer concomitant changes in the core, 307 resulting in higher edge service velocity and at the same time more 308 stable core networks. 310 2. Approaches 312 There are two main approaches to encoding load balancing information 313 in the label stack. The first allocates multiple labels for a 314 particular Forwarding Equivalence Class (FEC). These labels are 315 equivalent in terms of forwarding semantics, but having multiple 316 labels allows flexibility in assigning labels to flows belonging to 317 the same FEC. This approach has the advantage that the label stack 318 has the same depth whether or not one uses label-based load 319 balancing; and so, consequently, there is no change to forwarding 320 operations on transit and egress LSRs. However, it has a major 321 drawback in that there is a significant increase in both signaling 322 and forwarding state. 324 The other approach encodes the load balancing information as an 325 additional label in the label stack, thus increasing the depth of the 326 label stack by one. 
With this approach, there is minimal change to 327 signaling state for a FEC; also, there is no change in forwarding 328 operations in transit LSRs, and no increase of forwarding state in 329 any LSR. The only purpose of the additional label is to increase the 330 entropy in the label stack, so this is called an "entropy label". 331 This memo focuses solely on this approach. 333 This latter approach uses upstream generated entropy labels, which 334 may conflict with downstream allocated application labels. There are 335 a few approaches to deal with this: 1) allocate a pair of labels for 336 each FEC, one that must have an entropy label below it, and one that 337 must not; 2) use a label (the "Entropy Label Indicator") to indicate 338 that the next label is an entropy label; and 3) allow entropy labels 339 only where there is no possible confusion. The first doubles control 340 and data plane state in the network; the last is too restrictive. 341 The approach taken here is the second. In making both the above 342 choices, the trade-off is to increase label stack depth rather than 343 control and data plane state in the network. 345 Finally, one may choose to associate ELs with MPLS tunnels (LSPs), or 346 with MPLS applications (e.g., VPNs). (What this entails is described 347 in later sections.) We take the former approach, for the following 348 reasons: 350 1. There are a small number of tunneling protocols for MPLS, but a 351 large and growing number of applications. Defining ELs on a 352 tunnel basis means simpler standards, lower development, 353 interoperability and testing efforts. 355 2. As a consequence, there will be much less churn in the network as 356 new applications (services) are defined and deployed. 358 3. Processing application labels in the data plane is more complex 359 than processing tunnel labels. Thus, it is preferable to burden 360 the latter rather than the former with EL processing. 362 4. 
Associating ELs with tunnels makes it simpler to deal with 363 hierarchy, be it LDP-over-RSVP-TE or Carrier's Carrier VPNs. 364 Each layer in the hierarchy can choose independently whether or 365 not they want ELs. 367 The cost of this approach is that ELIs will be mandatory; again, the 368 trade-off is the size of the label stack. To summarize, the net 369 increase in the label stack to use entropy labels is two: one 370 reserved label for the ELI, and the entropy label itself. 372 3. Entropy Labels and Their Structure 374 An entropy label (as used here) is a label: 376 1. that is not used for forwarding; 378 2. that is not signaled; and 379 3. whose only purpose in the label stack is to provide 'entropy' to 380 improve load balancing. 382 Entropy labels are generated by an ingress LSR, based entirely on 383 load balancing information. However, they MUST NOT have values in 384 the reserved label space (0-15) [IANA MPLS Label Values]. To ensure 385 that they are not used inadvertently for forwarding, entropy labels 386 SHOULD have a TTL of 0. The TC field of an entropy label can be set 387 to any value deemed appropriate. 389 Since entropy labels are generated by an ingress LSR, an egress LSR 390 MUST be able to distinguish unambiguously between entropy labels and 391 application labels. To accomplish this, it is REQUIRED that the 392 label immediately preceding an entropy label (EL) in the MPLS label 393 stack be an 'entropy label indicator' (ELI), where preceding means 394 closer to the top of the label stack (farther from bottom of stack 395 indication). The ELI is a reserved label with value (TBD by IANA). 396 An ELI MUST have 'Bottom of Stack' (BoS) bit = 0 ([RFC3032]). The 397 TTL SHOULD be set to whatever value the label above it in the stack 398 has. The TC field can be set to any value deemed appropriate; 399 typically, this will be the value in the label above the ELI in the 400 label stack. 402 Entropy labels are useful for pseudowires ([RFC4447]). 
[RFC6391] 403 explains how entropy labels can be used for RFC 4447-style 404 pseudowires, and thus is complementary to this memo, which focuses on 405 how entropy labels can be used for tunnels, and thus for all other 406 MPLS applications. 408 4. Data Plane Processing of Entropy Labels 410 4.1. Egress LSR 412 Suppose egress LSR Y is capable of processing entropy labels for a 413 tunnel. Y indicates this to all ingresses via signaling (see 414 Section 5). Y MUST be prepared to deal both with packets with an 415 imposed EL and those without; the ELI will distinguish these cases. 416 If a particular ingress chooses not to impose an EL, Y's processing 417 of the received label stack (which might be empty) is as if Y chose 418 not to accept ELs. 420 If an ingress X chooses to impose an EL, then Y will receive a tunnel 421 termination packet with label stack <TL, ELI, EL> <remaining packet 422 header>. Y recognizes TL as the label it distributed to its 423 upstreams for the tunnel, and pops it. (Note that TL may be the 424 implicit null label, in which case it doesn't appear in the label 425 stack.) Y then recognizes the ELI and pops two labels: the ELI and 426 the EL. Y then processes the remaining packet header as normal; this 427 may require further processing of tunnel termination, perhaps with 428 further ELI+EL pairs. When processing the final tunnel termination, 429 Y MAY enqueue the packet based on that tunnel TL's or ELI's TC value, 430 and MAY use the tunnel TL's or ELI's TTL to compute the TTL of the 431 remaining packet header. The EL's TTL MUST be ignored. 433 If any ELI processed by Y has its BoS bit set, Y MUST discard the packet, 434 and MAY log an error. The EL's BoS bit will indicate whether or not 435 there are more labels in the stack. 437 4.2. Ingress LSR 439 If an egress LSR Y indicates via signaling that it can process ELs on 440 a particular tunnel, an ingress LSR X can choose whether or not to 441 insert ELs for packets going into that tunnel. Y MUST handle both 442 cases.
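The egress behavior of Section 4.1, which is what allows Y to handle both cases, can be sketched roughly as below. This is a minimal model, not an implementation: the LabelStackEntry class, the DiscardPacket exception and the function name are hypothetical, and ELI_PLACEHOLDER is an arbitrary stand-in because the ELI value is still to be assigned by IANA. Treating an ELI with no label after it as a discard is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Arbitrary stand-in: the real ELI value is TBD by IANA in this memo.
ELI_PLACEHOLDER = 15


@dataclass
class LabelStackEntry:
    label: int          # 20-bit label field
    tc: int = 0         # Traffic Class
    bos: bool = False   # Bottom of Stack bit
    ttl: int = 0


class DiscardPacket(Exception):
    """Raised when the egress must drop the packet."""


def egress_pop(stack: List[LabelStackEntry],
               tunnel_label: Optional[int],
               eli: int = ELI_PLACEHOLDER) -> List[LabelStackEntry]:
    """Pop TL (absent when implicit null was signaled) and any ELI+EL pair."""
    rest = list(stack)
    # Pop the tunnel label if it is present on the wire.
    if tunnel_label is not None and rest and rest[0].label == tunnel_label:
        rest.pop(0)
    # The ELI tells Y whether the ingress imposed an EL.
    if rest and rest[0].label == eli:
        if rest[0].bos:
            raise DiscardPacket("ELI has BoS bit set")
        if len(rest) < 2:
            # Assumption of this sketch: a truncated stack is discarded.
            raise DiscardPacket("ELI without a following EL")
        del rest[:2]  # pop ELI and EL; the EL's TTL is never inspected
    return rest
```

The returned list is the remaining label stack, which Y processes as normal; calling the function again would handle a further tunnel termination with its own ELI+EL pair.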
444 The steps that X performs to insert ELs are as follows: 446 1. On an incoming packet, identify the application to which the 447 packet belongs; based on this, pick appropriate fields as input 448 to the load balancing function; apply the load balancing function 449 to these input fields, and let LB be the output. 451 2. Determine the application label AL (if any). Push <AL> onto the 452 packet. 454 3. Based on the application, the load balancing output LB and other 455 factors, determine the egress LSR Y, the tunnel to Y, the 456 specific interface to the next hop, and thus the tunnel label TL. 457 Use LB to generate the entropy label EL. 459 4. If, for the chosen tunnel, Y has not indicated that it can 460 process ELs, push <TL> onto the packet. If Y has indicated that 461 it can process ELs for the tunnel, push <TL, ELI, EL> onto the 462 packet. X SHOULD put the same TTL and TC fields for the ELI as 463 it does for TL. This protects LSP behavior in cases where PHP is 464 used and the ELI and EL are not stripped at the penultimate hop 465 (see Section 4.4). The TTL for the EL MUST be zero. The TC for 466 the EL may be any value. 468 5. X then determines whether further tunnel hierarchy is needed; if 469 so, X goes back to step 3, possibly with a new egress Y for the 470 new tunnel. Otherwise, X is done, and sends out the packet. 472 Notes: 474 a. X computes load balancing information and generates the EL based 475 on the incoming application packet, even though the signaling of 476 EL capability is associated with tunnels. 478 b. X MAY insert several entropy labels in the stack (each, of 479 course, preceded by an ELI), potentially one for each 480 hierarchical tunnel, provided that the egress for that tunnel has 481 indicated that it can process ELs for that tunnel. 483 c. X MUST NOT include an entropy label for a given tunnel unless the 484 egress LSR Y has indicated that it can process entropy labels for 485 that tunnel. 487 d.
The signaling and use of entropy labels in one direction 488 (signaling from Y to X, and data path from X to Y) is completely 489 independent of the signaling and use of entropy labels in the 490 reverse direction (signaling from X to Y, and data path from Y to 491 X). 493 4.3. Transit LSR 495 Transit LSRs MAY operate with no change in forwarding behavior. The 496 following are suggestions for optimizations that improve load 497 balancing, reduce the amount of packet data processed, and/or enhance 498 backward compatibility. 500 If a transit LSR recognizes the ELI, it MAY choose to load balance 501 solely on the following label (the EL); otherwise, it SHOULD use as 502 much of the whole label stack as feasible as keys for the load 503 balancing function, with the exception that reserved labels MUST NOT 504 be used. 506 Some transit LSRs look beyond the label stack for better load 507 balancing information. This is a simple, backward compatible 508 approach in networks where some ingress LSRs impose ELs and others 509 don't. However, this is of limited incremental value if an EL is 510 indeed present, and requires more packet processing from the LSR. A 511 transit LSR MAY choose to parse the label stack for the presence of 512 the ELI, and look beyond the label stack only if it does not find it, 513 thus retaining the old behavior when needed, yet avoiding unnecessary 514 work if not needed. 516 As stated in Section 4.1 and Section 5, an egress LSR that signals 517 both ELC and implicit null MUST pop the ELI and the next label if it 518 encounters a packet with the ELI as the topmost label. Any other LSR 519 (including PHP LSRs) MUST drop such packets, as per section 3.18 of 520 [RFC3031]. 522 4.4. Penultimate Hop LSR 524 No change is needed at penultimate hop LSRs. 
However, a PHP LSR that 525 recognizes the ELI MAY choose to pop the ELI and following label 526 (which should be an entropy label) in addition to popping the tunnel 527 label, provided that doing so doesn't diminish its ability to load 528 balance on the next hop. 530 5. Signaling for Entropy Labels 532 An egress LSR Y can signal to ingress LSR(s) its ability to process 533 entropy labels (henceforth called "Entropy Label Capability" or ELC) 534 on a given tunnel. In particular, even if Y signals an implicit null 535 label, indicating that PHP is to be performed, Y MUST be prepared to 536 pop the ELI and EL. 538 Note that Entropy Label Capability may be asymmetric: if LSRs X and Y 539 are at opposite ends of a tunnel, X may be able to process entropy 540 labels, whereas Y may not. The signaling extensions below allow for 541 this asymmetry. 543 For an illustration of signaling and forwarding with entropy labels, 544 see Section 8. 546 5.1. LDP Signaling 548 A new LDP TLV ([RFC5036]) is defined to signal an egress's ability to 549 process entropy labels. This is called the ELC TLV, and may appear 550 as an Optional Parameter of the Label Mapping Message TLV. 552 The presence of the ELC TLV in a Label Mapping Message indicates to 553 ingress LSRs that the egress LSR can process entropy labels for the 554 associated LDP tunnel. The ELC TLV has Type (TBD by IANA) and Length 555 0. 557 The structure of the ELC TLV is shown below.
559     0                   1                   2                   3
560     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
561    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
562    |U|F|        Type (TBD)         |           Length (0)          |
563    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
565 Figure 1: Entropy Label Capability TLV 567 where: 569 U: Unknown bit. This bit MUST be set to 1. If the ELC TLV is not 570 understood by the receiver, then it MUST be ignored. 572 F: Forward bit. This bit MUST be set to 1.
Since the ELC 573 TLV is going to be propagated hop-by-hop, it should be forwarded 574 even by nodes that may not understand it. 576 Type: Type field. To be assigned by IANA. 578 Length: Length field. This field specifies the total length in 579 octets of the ELC TLV, and is currently defined to be 0. 581 5.1.1. Processing the ELC TLV 583 An LSR that receives a Label Mapping with the ELC TLV but does not 584 understand it MUST propagate it intact to its neighbors and MUST NOT 585 send a notification to the sender (following the meaning of the U- 586 and F-bits). 588 An LSR X may receive multiple Label Mappings for a given FEC F from 589 its neighbors. In its turn, X may advertise a Label Mapping for F to 590 its neighbors. If X understands the ELC TLV, and if any of the 591 advertisements it received for FEC F does not include the ELC TLV, X 592 MUST NOT include the ELC TLV in its own advertisements of F. If all 593 the advertised Mappings for F include the ELC TLV, then X MUST 594 advertise its Mapping for F with the ELC TLV. If any of X's 595 neighbors resends its Mapping, sends a new Mapping or Withdraws a 596 previously advertised Mapping for F, X MUST re-evaluate the status of 597 ELC for FEC F, and, if there is a change, X MUST re-advertise its 598 Mapping for F with the updated status of ELC. 600 5.2. BGP Signaling 602 When BGP [RFC4271] is used for distributing Network Layer 603 Reachability Information (NLRI) as described in, for example, 604 [RFC3107], the BGP UPDATE message may include the ELC attribute as 605 part of the Path Attributes. This is an optional, transitive BGP 606 attribute of type (to be assigned by IANA). The inclusion of this 607 attribute with an NLRI indicates that the advertising BGP router can 608 process entropy labels as an egress LSR for all routes in that NLRI. 
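Returning to the LDP procedure of Section 5.1.1, the re-evaluation rule for an LSR X that understands the ELC TLV can be sketched as follows. The class and method names are hypothetical, and neighbors are identified by plain strings for brevity. The sketch covers only an LSR that relays Mappings for FEC F; treating the case of no received Mappings as "do not advertise ELC" is an assumption, since an egress originating the FEC decides based on its own capability.

```python
from typing import Dict


class ElcState:
    """Tracks, for one FEC F, whether X should advertise the ELC TLV."""

    def __init__(self) -> None:
        # neighbor id -> True if that neighbor's Mapping for F carried the ELC TLV
        self.mappings: Dict[str, bool] = {}
        self.advertised = False

    def _reevaluate(self) -> bool:
        # ELC is advertised only if every received Mapping for F included it.
        new = bool(self.mappings) and all(self.mappings.values())
        changed = new != self.advertised
        self.advertised = new
        return changed  # True means X must re-advertise its Mapping for F

    def on_mapping(self, neighbor: str, has_elc: bool) -> bool:
        """A neighbor sent or resent a Mapping for F."""
        self.mappings[neighbor] = has_elc
        return self._reevaluate()

    def on_withdraw(self, neighbor: str) -> bool:
        """A neighbor withdrew its Mapping for F."""
        self.mappings.pop(neighbor, None)
        return self._reevaluate()
```

Each method returns whether the ELC status changed, which is the condition under which X re-advertises its own Mapping for F.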
610 A BGP speaker S that originates an UPDATE should include the ELC 611 attribute only if both of the following are true: 613 A1: S sets the BGP NEXT_HOP attribute to itself; AND 614 A2: S can process entropy labels. 616 Suppose a BGP speaker T receives an UPDATE U with the ELC attribute. 617 T has two choices. T can simply re-advertise U with the ELC 618 attribute if either of the following is true: 620 B1: T does not change the NEXT_HOP attribute; OR 622 B2: T simply swaps labels without popping the entire label stack and 623 processing the payload below. 625 An example of the use of B1 is Route Reflectors. 627 However, if T changes the NEXT_HOP attribute for U and in the data 628 plane pops the entire label stack to process the payload, T MAY 629 include an ELC attribute for UPDATE U' if both of the following are 630 true: 632 C1: T sets the NEXT_HOP attribute of U' to itself; AND 634 C2: T can process entropy labels. 636 Otherwise, T MUST remove the ELC attribute. 638 5.3. RSVP-TE Signaling 640 Entropy Label support is signaled in RSVP-TE [RFC3209] using the 641 Entropy Label Capability (ELC) flag in the Attribute Flags TLV of the 642 LSP_ATTRIBUTES object [RFC5420]. The presence of the ELC flag in a 643 Path message indicates that the ingress can process entropy labels in 644 the upstream direction; this only makes sense for a bidirectional LSP 645 and MUST be ignored otherwise. The presence of the ELC flag in a 646 Resv message indicates that the egress can process entropy labels in 647 the downstream direction. 649 The bit number for the ELC flag is to be assigned by IANA. 651 5.4. Multicast LSPs and Entropy Labels 653 Multicast LSPs [RFC4875], [RFC6388] typically do not use ECMP for 654 load balancing, as the combination of replication and multipathing 655 can lead to duplicate traffic delivery. However, these LSPs can 656 traverse bundled links [RFC4201] and LAGs. 
   In both these cases, load balancing is useful, and hence entropy
   labels can be of value for multicast LSPs.

   The methodology defined for entropy labels here will be used for
   multicast LSPs; however, the details of signaling and processing ELs
   for multicast LSPs will be specified in a companion document.

6.  Operations, Administration, and Maintenance (OAM) and Entropy Labels

   Generally, OAM comprises a set of functions operating in the data
   plane to allow a network operator to monitor its network
   infrastructure and to implement mechanisms in order to enhance the
   general behavior and the level of performance of its network, e.g.,
   the efficient and automatic detection, localization, diagnosis and
   handling of defects.

   Currently defined OAM mechanisms for MPLS include LSP Ping/Traceroute
   [RFC4379] and Bidirectional Forwarding Detection (BFD) for MPLS
   [RFC5884].  The latter provides connectivity verification between the
   endpoints of an LSP, and recommends establishing a separate BFD
   session for every path between the endpoints.

   The LSP traceroute procedures of [RFC4379] allow an ingress LSR to
   obtain label ranges that can be used to send packets on every path to
   the egress LSR.  It works by having the ingress LSR sequentially ask
   the transit LSRs along a particular path to a given egress LSR to
   return a label range such that the inclusion of a label in that range
   in a packet will cause the replying transit LSR to send that packet
   out the egress interface for that path.  The ingress provides the
   label range returned by transit LSR N to transit LSR N + 1, which
   returns a label range which is less than or equal in span to the
   range provided to it.  This process iterates until the penultimate
   transit LSR replies to the ingress LSR with a label range that is
   acceptable to it and to all LSRs along the path preceding it for
   forwarding a packet along the path.
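   The hop-by-hop narrowing described above can be sketched in Python
   as follows.  This is a simplified illustration, not the [RFC4379]
   encoding: each transit LSR is modeled by a hypothetical predicate
   that says whether a given label value would keep a packet on the
   traced path, and the label range is modeled as a set of values
   rather than a contiguous range.  The function name and the example
   predicates are assumptions made for the sketch.

```python
from typing import Callable, Iterable, List, Set


def narrow_labels(candidates: Iterable[int],
                  hops: List[Callable[[int], bool]]) -> Set[int]:
    """Return the label values accepted by every transit LSR on the path.

    hops[i](label) is True if transit LSR i would forward a packet
    carrying `label` out the interface for the path being traced.
    """
    usable = set(candidates)
    for stays_on_path in hops:
        # Each LSR is offered only what the previous LSR returned, so the
        # result can never grow from one hop to the next.
        usable = {label for label in usable if stays_on_path(label)}
        if not usable:
            break  # no label value keeps a packet on this path
    return usable


# Two hypothetical transit LSRs with toy load-balancing behavior.
hops = [
    lambda label: label % 2 == 0,         # LSR 1 keeps even labels on the path
    lambda label: (label // 2) % 2 == 1,  # LSR 2 keeps a further subset
]
usable = narrow_labels(range(16, 32), hops)
```

   Any value left in the final set could then be carried in packets
   that are meant to follow that particular path.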
   However, the LSP traceroute procedures do not specify where in the
   label stack the value from the label range is to be placed, whether
   deep packet inspection is allowed and if so, which keys and key
   values are to be used.

   This memo updates LSP traceroute by specifying that the value from
   the label range is to be placed in the entropy label.  Deep packet
   inspection is thus not necessary, although an LSR may use it,
   provided it does so consistently, i.e., if the label range to go to a
   given downstream LSR is computed with deep packet inspection, then
   the data path should use the same approach and the same keys.

   In order to have a BFD session on a given path, a value from the
   label range for that path should be used as the EL value for BFD
   packets sent on that path.

7.  MPLS-TP and Entropy Labels

   Since MPLS-TP does not use ECMP, entropy labels are not applicable to
   an MPLS-TP deployment.

8.  Entropy Labels in Various Scenarios

   This section describes the use of entropy labels in various
   scenarios.

   In the figures below, the following conventions are used to depict
   processing between X and Y.  Note that control plane signaling goes
   right to left, whereas data plane processing goes left to right.

   Protocols
   Y: <--- [L, E]          Y signals L to X
   X ------------- Y
   LS:                     Label stack
   X: +                    X pushes
   Y: -                    Y pops

   This means that Y signals to X label L for an LDP tunnel.  E can be
   one of:

   0:  meaning egress is NOT entropy label capable, or

   1:  meaning egress is entropy label capable.

   The line with LS: shows the label stack on the wire.  Below that is
   the operation that each LSR does in the data plane, where + means
   push the following label stack, - means pop the following label
   stack, L~L' means swap L with L', and * means that the operation is
   not depicted.

8.1.  LDP Tunnel

   The following illustrates several simple intra-AS LDP tunnels.
   The first diagram shows ultimate hop popping (UHP) with ingress
   inserting an EL, the second UHP with no ELs, the third PHP with ELs,
   the fourth PHP with no ELs but with an application label AL (which
   could, for example, be a VPN label), and the fifth PHP with ELs and
   an application label AL.

   Note that, in all the cases below, the MPLS application does not
   matter; it may be that X pushes some more labels (perhaps for a VPN
   or VPLS) below the ones shown, and Y pops them.

   A: <--- [TL4, 1]
   B: <-- [TL3, 1]
   ...
   W: <-- [TL1, 1]
   Y: <-- [TL0, 1]
   X --------------- A --------- B ... W ---------- Y
   LS:
   X: +
   A: TL4~TL3
   B: TL3~TL2
   ...
   W: TL1~TL0
   Y: -

                 LDP with UHP; ingress inserts ELs

   A: <--- [TL4, 1]
   B: <-- [TL3, 1]
   ...
   W: <-- [TL1, 1]
   Y: <-- [TL0, 1]
   X --------------- A --------- B ... W ---------- Y
   LS:
   X: +
   A: TL4~TL3
   B: TL3~TL2
   ...
   W: TL1~TL0
   Y: -

             LDP with UHP; ingress does not insert ELs

   A: <--- [TL4, 1]
   B: <-- [TL3, 1]
   ...
   W: <-- [TL1, 1]
   Y: <-- [3, 1]
   X --------------- A --------- B ... W ---------- Y
   X: +
   A: TL4~TL3
   B: TL3~TL2
   ...
   W: -TL1
   Y: -

                 LDP with PHP; ingress inserts ELs

   A: <--- [TL4, 1]
   B: <-- [TL3, 1]
   ...
   W: <-- [TL1, 1]
   Y: <-- [3, 1]
   VPN: <------------------------------------------ [AL]
   X --------------- A --------- B ... W ---------- Y
   LS:
   X: +
   A: TL4~TL3
   B: TL3~TL2
   ...
   W: -TL1
   Y: -

          LDP with PHP + VPN; ingress does not insert ELs

   A: <--- [TL4, 1]
   B: <-- [TL3, 1]
   ...
   W: <-- [TL1, 1]
   Y: <-- [3, 1]
   VPN: <--------------------------------------------- [AL]
   X --------------- A ------------ B ... W ---------- Y
   LS:
   X: +
   A: TL4~TL3
   B: TL3~TL2
   ...
   W: -TL1
   Y: -

              LDP with PHP + VPN; ingress inserts ELs

8.2.  LDP Over RSVP-TE

   The following illustrates "LDP over RSVP-TE" tunnels.
   X and Y are the ingress and egress (respectively) of the LDP tunnel;
   A and W are the ingress and egress of the RSVP-TE tunnel.  It is
   assumed that both the LDP and RSVP-TE tunnels have PHP.

   LDP with ELs, RSVP-TE without ELs

   LDP: <--- [L4, 1] <------- [L3, 1] <--- [3, 1]
   RSVP-TE: <-- [Rn, 0]
            <-- [3, 0]
   X --------------- A --------- B ... W ---------- Y
   LS: ...
   DP: + L4~ * -L1 -

                 Figure 2: LDP over RSVP-TE Tunnels

8.3.  MPLS Applications

   An ingress LSR X must keep state per unicast tunnel as to whether the
   egress for that tunnel can process entropy labels.  X does not have
   to keep state per application running over that tunnel.  However, an
   ingress PE can choose on a per-application basis whether or not to
   insert ELs.  For example, X may have an application for which it does
   not wish to use ECMP (e.g., circuit emulation), or for which it does
   not know which keys to use for load balancing (e.g., AppleTalk over a
   pseudowire).  In either of those cases, X may choose not to insert
   entropy labels, but may choose to insert entropy labels for an IP VPN
   over the same tunnel.

9.  Security Considerations

   This document describes advertisement of the capability to support
   receipt of entropy labels which an ingress LSR may insert in MPLS
   packets in order to allow transit LSRs to attain better load
   balancing across LAG and/or ECMP paths in the network.

   This document does not introduce new security vulnerabilities to LDP,
   BGP or RSVP-TE.  Please refer to the Security Considerations section
   of these protocols ([RFC5036], [RFC4271] and [RFC3209]) for security
   mechanisms applicable to each.

   Given that there is no end-user control over the values used for
   entropy labels, there is little risk of Entropy Label forgery which
   could cause uneven load-balancing in the network.
   If Entropy Label Capability is not signaled from an egress PE to an
   ingress PE, due to, for example, malicious configuration activity on
   the egress PE, then the ingress PE will fall back to not using
   entropy labels for load-balancing traffic over LAG or ECMP paths,
   which is in general no worse than the behavior observed in current
   production networks.  That said, it is recommended that operators
   monitor changes to PE configurations and, more importantly, the
   fairness of load distribution over LAG or ECMP paths.  If the
   fairness of load distribution over a set of paths changes, that could
   indicate a misconfiguration, bug, or other non-optimal behavior on
   their PEs, and they should take corrective action.

10.  IANA Considerations

10.1.  Reserved Label for ELI

   IANA is requested to allocate a reserved label for the Entropy Label
   Indicator (ELI) from the "Multiprotocol Label Switching Architecture
   (MPLS) Label Values" Registry.

10.2.  LDP Entropy Label Capability TLV

   IANA is requested to allocate the next available value from the IETF
   Consensus range in the LDP TLV Type Name Space Registry as the
   "Entropy Label Capability TLV".

10.3.  BGP Entropy Label Capability Attribute

   IANA is requested to allocate the next available Path Attribute Type
   Code from the "BGP Path Attributes" registry as the "BGP Entropy
   Label Capability Attribute".

10.4.  RSVP-TE Entropy Label Capability flag

   IANA is requested to allocate a new bit from the "Attribute Flags"
   sub-registry of the "RSVP TE Parameters" registry.

   Bit | Name                     | Attribute  | Attribute  | RRO
   No  |                          | Flags Path | Flags Resv |
   ----+--------------------------+------------+------------+-----
   TBD   Entropy Label Capability   Yes          Yes          No

11.  Acknowledgments

   We wish to thank Ulrich Drafz for his contributions, as well as the
   entire 'hash label' team for their valuable comments and discussion.
   Sincere thanks to Nischal Sheth for his many suggestions and
   comments, and his careful reading of the document, especially with
   regard to data plane processing of entropy labels.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031, January 2001.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, January 2001.

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4", RFC 3107, May 2001.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, December 2001.

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC5420]  Farrel, A., Papadimitriou, D., Vasseur, JP., and A.
              Ayyangarps, "Encoding of Attributes for MPLS LSP
              Establishment Using Resource Reservation Protocol Traffic
              Engineering (RSVP-TE)", RFC 5420, February 2009.

12.2.  Informative References

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4379]  Kompella, K. and G. Swallow, "Detecting Multi-Protocol
              Label Switched (MPLS) Data Plane Failures", RFC 4379,
              February 2006.

   [RFC4447]  Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
              Heron, "Pseudowire Setup and Maintenance Using the Label
              Distribution Protocol (LDP)", RFC 4447, April 2006.

   [RFC4875]  Aggarwal, R., Papadimitriou, D., and S.
              Yasukawa, "Extensions to Resource Reservation Protocol -
              Traffic Engineering (RSVP-TE) for Point-to-Multipoint TE
              Label Switched Paths (LSPs)", RFC 4875, May 2007.

   [RFC5884]  Aggarwal, R., Kompella, K., Nadeau, T., and G. Swallow,
              "Bidirectional Forwarding Detection (BFD) for MPLS Label
              Switched Paths (LSPs)", RFC 5884, June 2010.

   [RFC6388]  Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
              "Label Distribution Protocol Extensions for Point-to-
              Multipoint and Multipoint-to-Multipoint Label Switched
              Paths", RFC 6388, November 2011.

   [RFC6391]  Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
              J., and S. Amante, "Flow-Aware Transport of Pseudowires
              over an MPLS Packet Switched Network", RFC 6391,
              November 2011.

Appendix A.  Applicability of LDP Entropy Label Capability TLV

   In the case of unlabeled IPv4 (Internet) traffic, the Best Current
   Practice is for an egress LSR to propagate eBGP-learned routes within
   an SP's Autonomous System after resetting the BGP next-hop attribute
   to one of its Loopback IP addresses.  That Loopback IP address is
   injected into the Service Provider's IGP and, concurrently, a label
   is assigned to it via LDP.  Thus, when an ingress LSR is performing a
   forwarding lookup for a BGP destination, it recursively resolves the
   associated next-hop to a Loopback IP address and associated LDP label
   of the egress LSR.

   Thus, in the context of unlabeled IPv4 traffic, the LDP Entropy Label
   Capability TLV will typically be applied only to the FEC for the
   Loopback IP address of the egress LSR, and the egress LSR need not
   announce an entropy label capability for the eBGP-learned route.

Authors' Addresses

   Kireeti Kompella
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089
   US

   Email: kireeti@juniper.net

   John Drake
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089
   US

   Email: jdrake@juniper.net

   Shane Amante
   Level 3 Communications, LLC
   1025 Eldorado Blvd
   Broomfield, CO 80021
   US

   Email: shane@level3.net

   Wim Henderickx
   Alcatel-Lucent
   Copernicuslaan 50
   2018 Antwerp
   Belgium

   Email: wim.henderickx@alcatel-lucent.com

   Lucy Yong
   Huawei USA
   5340 Legacy Dr.
   Plano, TX 75024
   US

   Email: lucy.yong@huawei.com