NVO3 Workgroup                                               Ali Sajassi
INTERNET-DRAFT                                               Samer Salam
Intended Status: Standards Track                             Keyur Patel
                                                                   Cisco

                                                             Nabil Bitar
                                                                 Verizon

                                                          Wim Henderickx
                                                          Alcatel-Lucent

Expires: April 22, 2013                                 October 22, 2012


         A Network Virtualization Overlay Solution using E-VPN
                   draft-sajassi-nvo3-evpn-overlay-01


Abstract

   This document describes how E-VPN can be used as an NVO solution and
   explores the various tunnel encapsulation options and their impact on
   the E-VPN control-plane and procedures. In particular, the following
   three encapsulation options are analyzed: MPLS over GRE, VXLAN and
   NVGRE.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1 Introduction
     1.1 Terminology
   2 E-VPN Main Features
     2.1 Multi-homed Ethernet Segment Auto-Discovery
     2.2 Fast Convergence and Mass Withdraw
     2.3 Split-Horizon
     2.4 Aliasing
     2.5 DF Election
   3 Encapsulation Options for E-VPN Overlays
     3.1 MPLS over GRE
       3.1.1 Benefits of MPLS over GRE
     3.2 VXLAN/NVGRE Encapsulation
       3.2.1 Impact on E-VPN Routes for VXLAN/NVGRE Encapsulation
       3.2.2 Impact on E-VPN Procedures for VXLAN/NVGRE Encapsulation
         3.2.2.1 NVE with No Redundancy
         3.2.2.2 NVE with Active/Standby Redundancy
         3.2.2.3 NVE with All-Active Redundancy
       3.2.3 Support for Multicast
       3.2.4 Inter-AS Challenges
   4 Comparison between MPLSoGRE and VXLAN/NVGRE Encapsulation
   5 Acknowledgement
   6 Security Considerations
   7 IANA Considerations
   8 References
     8.1 Normative References
     8.2 Informative References
   Authors' Addresses

1 Introduction

   In the context of this document, a Network Virtualization Overlay
   (NVO) is a solution to address the requirements of a multi-tenant
   data center, especially one with virtualized hosts (i.e. Virtual
   Machines or VMs). The key requirements of such a solution as
   described in [Problem-Statement] are:

   - Isolation of network traffic per tenant

   - Support for a large number of tenants (tens or hundreds of
     thousands)

   - Extending L2 connectivity among different VMs belonging to a given
     tenant segment (subnet) across different PODs within a data center
     or between different data centers

   The underlay network for NVO solutions is assumed to provide IP
   connectivity.

   This document describes how E-VPN can be used as an NVO solution and
   explores the various tunnel encapsulation options for E-VPN over IP,
   and their impact on the E-VPN control-plane and procedures. Note that
   the use of E-VPN as an NVO solution does not necessarily mandate that
   the BGP control-plane be running on the NVE. Running BGP on the NVE
   may not be desirable, e.g., when the NVE resides on the hypervisor.
   For such scenarios, it is still possible to leverage the E-VPN
   solution by using XMPP, or alternative mechanisms, to extend the
   control-plane to the NVE as discussed in [L3VPN-ENDSYSTEMS].

   The possible encapsulation options for E-VPN overlays that are
   analyzed in this document are:

   - MPLS over GRE

   - VXLAN and NVGRE

   Before getting into the description of the different encapsulation
   options for E-VPN over IP, it is important to highlight the main
   features of the E-VPN solution, how those features are currently
   supported, and any impact that the encapsulation may have on those
   features.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [KEYWORDS].

2 E-VPN Main Features

   In this section, we recap the main features of E-VPN to highlight
   the encapsulation dependencies. The section only describes the
   features and functions at a high level. For more details, the reader
   is referred to [E-VPN].

2.1 Multi-homed Ethernet Segment Auto-Discovery

   E-VPN NV Edge devices (NVEs) connected to the same Ethernet segment
   (e.g. server) can automatically discover each other with minimal to
   no configuration through the exchange of BGP routes.

2.2 Fast Convergence and Mass Withdraw

   E-VPN defines a mechanism to efficiently and quickly signal, to
   remote NVEs, the need to update their forwarding tables upon the
   occurrence of a failure in connectivity to an Ethernet segment. This
   is done by having each NVE advertise an Ethernet A-D Route per
   Ethernet segment for each locally attached segment. Upon a failure in
   connectivity to the attached segment, the NVE withdraws the
   corresponding Ethernet A-D route. This triggers all NVEs that receive
   the withdrawal to update their next-hop adjacencies for all MAC
   addresses associated with the Ethernet segment in question.
   If no other NVE had advertised an Ethernet A-D route for the same
   segment, then the NVE that received the withdrawal simply invalidates
   the MAC entries for that segment. Otherwise, the NVE updates the
   next-hop adjacencies to point to the backup NVE(s).

2.3 Split-Horizon

   Consider a station that is multi-homed to two or more NVEs on an
   Ethernet segment ES1, with all-active redundancy. If the station
   sends a multicast, broadcast or unknown unicast packet to a
   particular NVE, say NVE1, then NVE1 will forward that packet to all
   or a subset of the other NVEs in the E-VPN instance. In this case the
   NVEs, other than NVE1, that the station is multi-homed to MUST drop
   the packet and not forward it back to the station. This is referred
   to as "split horizon" filtering. In order to achieve this split
   horizon function, every multicast, broadcast or unknown unicast
   packet is encapsulated with an MPLS label that identifies the
   Ethernet segment of origin (i.e. the segment from which the frame
   entered the E-VPN network). This label is referred to as the ESI MPLS
   label, and is distributed using the "Ethernet A-D route per Ethernet
   Segment". This route is imported by the PEs connected to the Ethernet
   Segment and also by the PEs that have at least one E-VPN instance in
   common with the Ethernet Segment in the route. The disposition PEs
   rely on the value of the ESI MPLS label to determine whether or not a
   flooded frame is allowed to egress a specific Ethernet segment.

2.4 Aliasing

   In the case where a station is multi-homed to multiple NVEs, it is
   possible that only a single NVE learns a set of the MAC addresses
   associated with traffic transmitted by the station. This leads to a
   situation where remote NVEs receive MAC advertisement routes, for
   these addresses, from a single NVE even though multiple PEs are
   connected to the multi-homed segment.
   As a result, the remote PEs are not able to effectively load-balance
   traffic among the NVEs connected to the multi-homed Ethernet segment.
   This could be the case, e.g., when the PEs perform data-path learning
   on the access, and the load-balancing function on the station hashes
   traffic from a given source MAC address to a single PE. Another
   scenario where this occurs is when the PEs rely on control plane
   learning on the access (e.g. using ARP), since ARP traffic will be
   hashed to a single link in the LAG.

   To alleviate this issue, E-VPN introduces the concept of 'Aliasing'.
   Aliasing refers to the ability of an NVE to signal that it has
   reachability to a given locally attached Ethernet segment, even when
   it has learnt no MAC addresses from that segment. The Ethernet A-D
   route per EVI is used to that end. Remote PEs which receive MAC
   advertisement routes with non-zero ESI SHOULD consider the advertised
   MAC address as reachable via all PEs which have advertised
   reachability to the relevant Segment using Ethernet A-D routes with
   the same ESI (and Ethernet Tag if applicable) and with the
   Active-Standby flag reset.

2.5 DF Election

   Consider a station that is a host or a VM that is multi-homed
   directly to more than one NVE in an E-VPN on a given Ethernet
   segment. One or more Ethernet Tags may be configured on the Ethernet
   segment. In this scenario only one of the PEs, referred to as the
   Designated Forwarder (DF), is responsible for certain actions:

      - Sending multicast and broadcast traffic, on a given Ethernet
        Tag on a particular Ethernet segment, to the station.

      - Flooding unknown unicast traffic (i.e. traffic for
        which an NVE does not know the destination MAC address),
        on a given Ethernet Tag on a particular Ethernet segment
        to the station, if the environment requires flooding of
        unknown unicast traffic.

   This is required in order to prevent duplicate delivery of
   multi-destination frames to a multi-homed host or VM, in case of
   all-active redundancy.

3 Encapsulation Options for E-VPN Overlays

3.1 MPLS over GRE

   The E-VPN data-plane is modeled as an E-VPN MPLS client layer sitting
   over an MPLS PSN tunnel. The Split-Horizon and Aliasing functions of
   E-VPN are tied to the MPLS client layer. In order to keep the E-VPN
   procedures intact and data-plane operation as is, an ideal
   encapsulation would allow the E-VPN MPLS client layer to be carried
   over an IP PSN tunnel transparently - i.e., without any changes. The
   existing standards-based GRE encapsulation as defined by [RFC2890]
   and [RFC2784] provides such a solution:

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |C| |K|S| Reserved0       | Ver |         Protocol Type         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              Key                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Key field can be used to provide a 32-bit entropy field.

   The C (Checksum Present) and S (Sequence Number Present) bits in the
   GRE header are set to zero. The K bit is set to 1.

   [MPLSoUDP] discusses using a UDP header instead of the GRE header to
   transport the MPLS client layer over an IP PSN tunnel. The main
   advantage of doing so is better load-balancing over existing IP
   networks, where some core routers can perform ECMP based on the UDP
   header but not based on the GRE Key field. However, routers that are
   capable of supporting [NVGRE] encapsulation can also perform
   load-balancing based on the GRE key, which accommodates a 32-bit
   entropy value, whereas UDP encapsulation accommodates a 16-bit
   entropy value.
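
   As an illustration only, the GRE header settings described above
   (C and S cleared, K set, Key carrying a 32-bit entropy value) can be
   sketched as follows. The Protocol Type value 0x8847 (MPLS unicast
   EtherType) and the CRC32-based entropy function are assumptions made
   for this sketch; this document does not specify either.

```python
import struct
import zlib

GRE_FLAG_K = 0x2000               # Key Present bit; C and S stay zero
ETHERTYPE_MPLS_UNICAST = 0x8847   # assumption: payload is unicast MPLS


def build_gre_header(entropy: int) -> bytes:
    """Return an 8-octet GRE header with C=0, K=1, S=0, Ver=0."""
    if not 0 <= entropy <= 0xFFFFFFFF:
        raise ValueError("entropy must fit in the 32-bit Key field")
    # Network byte order: 16-bit flags/version, 16-bit protocol, 32-bit key
    return struct.pack("!HHI", GRE_FLAG_K, ETHERTYPE_MPLS_UNICAST, entropy)


def flow_entropy(src_mac: bytes, dst_mac: bytes) -> int:
    """Hypothetical entropy function: CRC32 over the inner MAC pair."""
    return zlib.crc32(src_mac + dst_mac) & 0xFFFFFFFF


header = build_gre_header(
    flow_entropy(bytes.fromhex("020000000001"), bytes.fromhex("020000000002"))
)
```

   Any hash that spreads flows evenly over 32 bits would serve equally
   well here; CRC32 is used only because it is in the standard library.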

3.1.1 Benefits of MPLS over GRE

   The benefits of using the MPLS over GRE encapsulation are as follows:

   - Uses an existing standard for transporting MPLS over IP.
   - Uses the E-VPN control plane (BGP routes and attributes), as well
     as E-VPN procedures and functions, exactly as is.
   - Consistent with L3VPN over IP (RFC 4797).
   - The MPLS label can be a global value (instead of downstream
     assigned) just like the VXLAN or NVGRE service-instance ID.
   - Provides seamless interoperability with E-VPN PEs. There is no
     need for a gateway device.

3.2 VXLAN/NVGRE Encapsulation

   If either the VXLAN or NVGRE encapsulation were to be used with the
   E-VPN control plane, there would be an impact on the E-VPN client
   layer and the associated procedures and BGP routes. In order to
   assess this impact, the first step is to identify which subset of the
   service interfaces defined in [E-VPN] is needed for the NVO solutions
   defined in [VXLAN] and [NVGRE]. Then we need to examine how the E-VPN
   BGP routes and procedures should be modified to support these service
   interfaces with the new encapsulation.

   [E-VPN] defines the following four service interface types:

   - VLAN Based Service Interface
   - VLAN Bundle Service Interface
   - Port-based Service Interface
   - VLAN Aware Bundle Service Interface

   For a detailed description of these service interface types, refer to
   [EVPN-REQ] and [E-VPN]. As described in [E-VPN], the first three
   service interface types don't require encoding the VLAN Tag in the
   BGP routes, because there is a one-to-one mapping between an EVI and
   a broadcast domain represented by a virtual network or a virtual
   segment.

   [NVGRE] requires only the VLAN-based service interface, and it
   clearly describes that the tenant VLAN Tag (inner VLAN Tag) is not
   part of the encapsulated frames because there is a one-to-one mapping
   between the Virtual Subnet Identifier (VSID) and the inner VLAN ID.

   The [VXLAN] default mode of operation only requires the VLAN-based
   service interface, as it specifies that the VTEP does not include an
   inner VLAN tag upon encapsulation; moreover, decapsulated frames
   with an inner VLAN tag should be discarded. However, [VXLAN]
   provides an option of including an inner VLAN tag in the encapsulated
   packet if it is configured explicitly at the VTEP. If an inner VLAN
   tag is included, then VXLAN requires a VLAN-bundle service interface.
   However, as discussed above, this service interface type does not
   require that the tenant VLAN tag be sent in the BGP routes.

3.2.1 Impact on E-VPN Routes for VXLAN/NVGRE Encapsulation

   As discussed above, neither [NVGRE] nor [VXLAN] requires the tenant
   VLAN tag to be sent in BGP routes. Therefore, the 32-bit Ethernet Tag
   field in the E-VPN BGP routes can be used to represent the NVGRE VSID
   or VXLAN VNI. This is not accidental, but rather by design: The
   Ethernet Tag field in E-VPN was designed not just for C-tagged or
   S-tagged interfaces [802.1Q] but also for I-tagged interfaces
   [802.1ah], where an I-SID is a 24-bit entity representing a virtual
   segment just like a VSID or VNI. Therefore, there is no need to
   re-purpose the MPLS label field in the E-VPN BGP routes, and this
   field can be omitted in the E-VPN BGP routes. The length field of the
   NLRI in E-VPN routes will be three octets shorter for VXLAN and NVGRE
   encapsulations.

   Since the VXLAN VNI or NVGRE VSID is assumed to be a global value,
   one might question the need for the Route Distinguisher (RD) in the
   E-VPN routes.
   In the scenario where all data centers are under a single
   administrative domain, and there is a single global VNI/VSID space,
   the RD can be set to zero in the E-VPN routes. However, in the
   scenarios where different groups of data centers are under different
   administrative domains, and these data centers are connected via one
   or more backbone core providers as described in [NVO3-Framework], the
   RD must be a unique value per EVI or per NVE as described in [E-VPN].
   In other words, whenever there is more than one administrative
   domain for VNI or VSID, a non-zero RD MUST be used.

3.2.2 Impact on E-VPN Procedures for VXLAN/NVGRE Encapsulation

   In order to analyze the impact of the VXLAN/NVGRE encapsulation on
   E-VPN procedures, we must distinguish three NVE redundancy models:

   - No redundancy

   - Active/Standby redundancy

   - All-active redundancy

   The impact of the encapsulation varies depending on the employed
   model.

3.2.2.1 NVE with No Redundancy

   This is the scenario where, e.g., the NVE is implemented on the
   hypervisor. In this case, neither the Split-Horizon nor the Aliasing
   functions are required or applicable. Therefore, the choice of
   VXLAN/NVGRE encapsulation has no impact on E-VPN procedures.

   For all practical purposes, in this scenario, the only difference
   between the choice of GRE or VXLAN/NVGRE encapsulation is in the size
   of the entropy field (32 bits vs. 16 bits).

3.2.2.2 NVE with Active/Standby Redundancy

   This is the scenario where the hosts are multi-homed to a set of
   NVEs; however, only a single NVE is active at a given point in time
   for a given VNI or VSID. In this case as well, the Split-Horizon
   function is not required. However, in order to support fast
   convergence in the case where the primary NVE fails, the Aliasing
   function of E-VPN is needed.
   Note that Aliasing in this scenario is used to quickly identify the
   backup NVE rather than being used for traffic load-balancing. In this
   case, the impact of the use of the VXLAN/NVGRE encapsulation on the
   E-VPN procedures is as discussed in Section 3.2.2.3.2, with the
   difference being that a remote NVE uses the received Ethernet A-D
   routes to build primary and backup paths to the advertising NVEs,
   instead of a load-balancing path-list.

   If fast convergence is not required or not used, then the VXLAN/NVGRE
   encapsulation would have no impact on the E-VPN procedures.

3.2.2.3 NVE with All-Active Redundancy

   Out of the E-VPN features listed in section 2, the use of the VXLAN
   or NVGRE encapsulation impacts the Split-Horizon and Aliasing
   features, since those two rely on the MPLS client layer. Given that
   this MPLS client layer is absent with these types of encapsulations,
   alternative procedures and mechanisms are needed to provide the
   required functions. Those are discussed in detail next.

3.2.2.3.1 Split Horizon

   In E-VPN, an MPLS label is used for split-horizon filtering to
   support active/active multi-homing where an ingress NV Edge device
   (NVE) adds a label corresponding to the site of origin (aka ESI MPLS
   Label) when encapsulating the packet. The egress NVE checks the ESI
   MPLS label when attempting to forward a multi-destination frame out
   an interface, and if the label corresponds to the same site
   identifier (ESI) associated with that interface, the packet gets
   dropped. This prevents the occurrence of forwarding loops.

   Since the VXLAN or NVGRE encapsulation does not include this ESI MPLS
   label, other means of performing the split-horizon filtering function
   MUST be devised.
   One way of supporting this function is to assign an IP address for
   each site of origin (e.g., for each ESI in the E-VPN terminology) and
   advertise this IP address in the BGP Remote-Next-Hop attribute
   associated with the E-VPN Ethernet A-D route (refer to section
   3.2.2.3.3 for details). The "Active-Standby" bit in the flags of
   the ESI MPLS Label Extended Community MUST be set to 0 to indicate
   active/active multi-homing, and the MPLS label field MUST be set to
   zero to indicate that the IP address in the BGP Remote-Next-Hop
   attribute will be used for split-horizon filtering. The ingress NVE
   uses the IP address associated with a given site as the source IP
   address for all traffic originating from said site. The egress NVE
   will program its egress ACL with this IP address for the interfaces
   corresponding to that same site.

   Although the impact on the control plane is minimal and the existing
   E-VPN BGP routes can be used with minimal modifications to their
   corresponding procedures, the same cannot be said in terms of network
   operations, management, and data plane. The use of IP addresses to
   represent the site of origin requires many IP addresses to be
   allocated and configured on a single NVE. For example, a TOR with N
   interfaces may require one IP address per interface in the worst
   case, which may impact management and operational aspects of the Data
   Center Network. Also, the data-plane operation for Split-Horizon
   filtering will be different from that of the MPLS client layer, and
   it cannot be assumed that platforms/ASICs that support Split-Horizon
   filtering based on an MPLS label can also support such a function
   based on IP addresses. However, there are alternative options for
   performing the Split-Horizon filtering function when doing
   VXLAN/NVGRE encapsulation, while retaining a single IP address per
   NVE, and those will be described in a future revision of this
   document.
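
   The IP-address-based split-horizon check described above can be
   sketched as follows. This is a minimal illustration, not a specified
   procedure: the EgressInterface class, the program_acl and may_flood
   helpers, and the ESI-to-address table are hypothetical names
   introduced here, and the addresses are documentation examples.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EgressInterface:
    """Hypothetical model of a local interface attached to one ESI."""
    name: str
    esi: str
    # Outer source IPs whose multi-destination traffic must not egress here
    blocked_sources: Set[str] = field(default_factory=set)


def program_acl(iface: EgressInterface, esi_to_ips: Dict[str, Set[str]]) -> None:
    """Install the per-site IPs learned from Ethernet A-D routes
    (Remote-Next-Hop attribute) for the interface's own ESI."""
    iface.blocked_sources = set(esi_to_ips.get(iface.esi, ()))


def may_flood(iface: EgressInterface, outer_src_ip: str) -> bool:
    """Return False when the frame entered the overlay from the same
    site, i.e. when flooding it would loop it back to the station."""
    return outer_src_ip not in iface.blocked_sources


# Site ES1 is also attached to a peer NVE that advertised 192.0.2.11.
learned = {"ES1": {"192.0.2.11"}}
es1_port = EgressInterface(name="eth1", esi="ES1")
program_acl(es1_port, learned)
```

   With this table, a flooded frame whose outer source address is
   192.0.2.11 is filtered on eth1 but may still egress interfaces
   attached to other segments.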

   It should be noted that such a filtering function is not required
   when doing active/standby multi-homing, where load-balancing from a
   tenant can still be performed on a per-VLAN basis - e.g., different
   VLANs are active on different NVEs connected to a multi-homed site.
   Furthermore, active/active multi-homing is primarily applicable when
   NVEs are on physical devices as opposed to on the hypervisor. For
   example, [VXLAN] describes the use of physical devices as VXLAN
   gateways to connect a legacy network with a VXLAN overlay network. In
   such scenarios, one would expect: a) that the number of such gateways
   is not very large and/or b) that not all of them require
   active/active multi-homing.

3.2.2.3.2 Aliasing

   In E-VPN, the NVEs connected to a multi-homed site optionally
   advertise a VPN label used to load-balance traffic between NVEs, even
   when a given MAC address is learnt by only a single NVE connected to
   the site. In the case where VXLAN or NVGRE encapsulation is used,
   some alternative means that does not rely on MPLS labels is required
   to support aliasing. One solution would be to rely on the IP address
   per site assignment described in the previous section for aliasing as
   well: Effectively every NVE advertises an Ethernet A-D route for a
   given site with the BGP Remote-Next-Hop attribute set to an IP
   address that has a 1:1 mapping to the site. The remote NVEs resolve
   an ESI (site ID) to a list of IP addresses corresponding to that
   site. Furthermore, a given MAC address that is associated with an
   ESI, in turn, gets resolved to this list of IP addresses. When a
   remote NVE wants to forward a packet for a given MAC address, it
   selects one of the IP addresses from the list (using a hash value for
   load balancing) and encapsulates the packet using that IP address as
   the destination IP address in the VXLAN or NVGRE encapsulation.
   The source IP address will be that of the source multi-homed site. In
   the case where the source site is single-homed, the source IP address
   will be the loopback address of the NVE.

3.2.2.3.3 Tunnel Endpoint Identification

   To accommodate the Split Horizon as well as Aliasing functions of
   E-VPN, multiple IP tunnel endpoints (one per site) must be associated
   with the same NVE. As such, the mechanisms of [RFC5512] cannot be
   used to specify the tunnel endpoint and encapsulation, since those
   mechanisms only allow a single tunnel endpoint IP address to be
   associated with the BGP speaker. To alleviate this, the BGP
   Remote-Next-Hop attribute defined in [REMOTE-NH] can be used. Two new
   Tunnel Types would be required for VXLAN and NVGRE.

   This attribute will be carried with the E-VPN Ethernet A-D route. The
   IP address field of this attribute serves two functions:

   - It indicates the tunnel endpoint destination IP address that must
     be used when load-balancing traffic associated with a given site
     (i.e. ESI).

   - It is used to build the egress ACL for filtering multi-destination
     traffic on multi-homed Ethernet Segments. In this context, the IP
     address is the tunnel endpoint source address.

   It is worth noting that for multi-homed Ethernet segments, the NVE
   will always advertise an Ethernet A-D route with the Remote-Next-Hop
   attribute, in addition to the MAC Advertisement routes. In this case,
   the NVEs which receive the routes derive the tunnel endpoint IP
   address for a given MAC address as follows:

   1- The NVE identifies the Ethernet Segment Identifier (ESI)
   associated with the MAC address, as encoded in the MAC Advertisement
   route.

   2- The NVE then sets the tunnel endpoint IP address for that MAC to
   the value encoded in the Remote-Next-Hop attribute of the Ethernet
   A-D route advertised for the ESI identified in step 1.
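
   The two steps above, combined with the hash-based selection used for
   aliasing, can be sketched as follows. The route representation (a
   dictionary with "esi" and "bgp_next_hop" keys), the function name,
   and the CRC32 flow hash are assumptions made for this sketch.
   Falling back to the BGP Next-Hop when no Ethernet A-D route is known
   for the ESI is likewise an assumption; this document only specifies
   that behavior for single-homed segments.

```python
import zlib
from typing import Dict, List, Optional


def resolve_tunnel_endpoint(
    mac_route: Dict[str, Optional[str]],
    ad_routes: Dict[str, List[str]],
    flow_key: bytes = b"",
) -> str:
    """Pick the tunnel endpoint IP for the MAC in `mac_route`.

    `ad_routes` maps an ESI to the Remote-Next-Hop addresses carried in
    the Ethernet A-D routes received for that ESI (one per NVE).
    """
    # Step 1: ESI associated with the MAC, from the MAC Advertisement route
    esi = mac_route.get("esi")
    # Step 2: Remote-Next-Hop address(es) advertised for that ESI
    candidates = ad_routes.get(esi, []) if esi else []
    if not candidates:
        # Single-homed segment (or no A-D route known): use BGP Next-Hop
        return mac_route["bgp_next_hop"]
    # Aliasing: hash the flow onto one of the advertised endpoints.
    # Sorting makes the choice independent of route arrival order.
    ordered = sorted(candidates)
    return ordered[zlib.crc32(flow_key) % len(ordered)]


ad_table = {"ES1": ["192.0.2.21", "192.0.2.11"]}
single_homed = {"esi": None, "bgp_next_hop": "198.51.100.1"}
multi_homed = {"esi": "ES1", "bgp_next_hop": "198.51.100.2"}
chosen = resolve_tunnel_endpoint(multi_homed, ad_table, b"flow-a")
```

   For active/standby redundancy the same table could be used to build
   a primary and a backup path instead of hashing across the list.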

   On the other hand, for single-homed Ethernet segments, the NVE will
   only advertise the MAC Advertisement routes. In this latter case, the
   tunnel endpoint IP address is derived from the BGP Next-Hop attribute
   associated with the MAC Advertisement route.

3.2.3 Support for Multicast

   The E-VPN Inclusive Multicast BGP route can be used to discover the
   multicast endpoints associated with a given VXLAN VNI or NVGRE VSID.
   The Ethernet Tag field of this route is used to encode the VNI or
   VSID. This route is tagged with the PMSI Tunnel attribute, which is
   used to encode the type of multicast tunnel to be used as well as the
   multicast tunnel identifier. The following tunnel types can be used
   for VXLAN/NVGRE:

      - PIM-SSM Tree
      - PIM-SM Tree
      - BIDIR-PIM Tree
      - Ingress Replication

   In the scenario where the multicast tunnel is a tree, both the
   Inclusive as well as the Aggregate Inclusive variants may be used. In
   the former case, a multicast tree is dedicated to a VNI or VSID,
   whereas in the latter, a multicast tree is shared among multiple
   VNIs or VSIDs. This is done by having the NVEs advertise multiple
   Inclusive Multicast routes with different VNI or VSID encoded in the
   Ethernet Tag field, but with the same tunnel identifier encoded in
   the PMSI Tunnel attribute.

3.2.4 Inter-AS Challenges

   For inter-AS operation, two scenarios must be considered:

   - Scenario 1: The tunnel endpoint IP addresses are public
   - Scenario 2: The tunnel endpoint IP addresses are private

   In the first scenario, inter-AS operation is straightforward and
   follows existing BGP inter-AS procedures.

   The second scenario is more challenging, because the absence of the
   MPLS client layer from the VXLAN encapsulation creates a situation
   where the ASBR has no fully qualified indication within the tunnel
   header as to where the tunnel endpoint resides.
   To elaborate on this, recall that with MPLS, the client layer labels
   (i.e. the VPN labels) are downstream assigned. As such, this label
   implicitly has a connotation of the tunnel endpoint, and it is
   sufficient for the ASBR to look up the client layer label in order to
   identify the label translation required as well as the tunnel
   endpoint to which a given packet is being destined. With the VXLAN
   encapsulation, the VNI is globally assigned and hence is shared among
   all endpoints. The destination IP address is the only field which
   identifies the tunnel endpoint in the tunnel header, and this address
   is privately managed by every data center network. Since the tunnel
   address is allocated out of a private address pool, we either need to
   do a lookup based on the VTEP IP address in the context of a VRF
   (e.g., use IP-VPN) or terminate the VXLAN tunnel and do a lookup
   based on the tenant's MAC address to identify the egress tunnel on
   the ASBR. This effectively mandates that the ASBR either run another
   overlay solution such as IP-VPN over an MPLS/IP core network or be
   aware of the MAC addresses of all VMs in its local AS, at the very
   least.

   Even in the first scenario where the tunnel endpoint IP addresses are
   public, there may be security concerns regarding the distribution of
   these addresses among different ASes. This security concern is one of
   the main reasons for having the so-called inter-AS "option-B" in MPLS
   VPN solutions such as E-VPN.

   Using MPLS over GRE encapsulation addresses both of these concerns.

4 Comparison between MPLSoGRE and VXLAN/NVGRE Encapsulation

   The comparison between MPLSoGRE and VXLAN/NVGRE encapsulation depends
   on the required functionality on NVEs.
If the hosts are single-homed 592 to NVEs without any need to support a redundancy group on NVEs, or if 593 the hosts are multi-homed to two or more NVEs with active/standby 594 redundancy but without the need for fast convergence upon a failure, 595 then both MPLSoGRE and VXLAN/NVGRE do equally well with the E-VPN control 596 plane. 598 If we need to support active/standby multi-homing with fast 599 convergence upon a failure or if we need to support active/active 600 multi-homing, then MPLSoGRE encapsulation can provide this additional 601 functionality without any impact to E-VPN routes and procedures. 602 Furthermore, it can provide complete support for inter-AS operation 603 and the complete set of E-VPN functions without impacting IP address 604 assignment and management of the underlying network. However, 605 VXLAN/NVGRE impacts E-VPN routes and procedures as well as the 606 underlying data plane behavior as noted above. Furthermore, there are 607 implications to IP address assignments, security, and inter-AS 608 operations. It should be noted that the additional requirements on 609 the data plane behavior as well as the above implications are the 610 consequence of the functionality that needs to be supported and are 611 independent of the control-plane choice. 613 As noted previously, there are existing core switches that do not 614 support ECMP by hashing the GRE key; however, the vast majority of 615 existing core switches support ECMP by hashing the UDP header; therefore, 616 VXLAN encapsulation can provide better ECMP functions for these 617 existing switches. Thus, the choice for overlay encapsulation 618 depends on needed functionality, inter-AS scenarios, security 619 requirements, and the ECMP capabilities of the core switches. 621 5 Acknowledgement 623 The authors would like to thank John Mullooly and Dave Smith for 624 providing valuable comments and feedback.
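As an illustration of the comparison in Section 4, the following minimal Python sketch restates the selection criteria as a function. It is not part of the E-VPN procedures. The names Homing, Requirements, and encapsulation_notes are hypothetical and introduced here only for illustration, and the note strings paraphrase the text of Section 4.

```python
from dataclasses import dataclass
from enum import Enum


class Homing(Enum):
    SINGLE = "single-homed"
    ACTIVE_STANDBY = "active/standby"
    ACTIVE_ACTIVE = "active/active"


@dataclass(frozen=True)
class Requirements:
    """Hypothetical summary of what a deployment needs from the overlay."""
    homing: Homing
    fast_convergence: bool = False      # fast convergence upon a failure
    inter_as: bool = False              # overlay spans more than one AS
    core_hashes_gre_key: bool = True    # core switches can do ECMP on the GRE key


def encapsulation_notes(req: Requirements) -> dict:
    """Return, per encapsulation, the caveats that Section 4 associates with it."""
    notes = {"MPLSoGRE": [], "VXLAN/NVGRE": []}

    # Active/active, or active/standby with fast convergence, is where the
    # two encapsulations stop being equivalent for the E-VPN control plane.
    needs_extra = req.homing is Homing.ACTIVE_ACTIVE or (
        req.homing is Homing.ACTIVE_STANDBY and req.fast_convergence
    )
    if needs_extra:
        notes["VXLAN/NVGRE"].append(
            "impacts E-VPN routes, procedures, and data plane behavior"
        )

    if req.inter_as:
        notes["VXLAN/NVGRE"].append(
            "implications for IP address assignment, security, and inter-AS operation"
        )

    # Section 4 singles out VXLAN (UDP header hashing) as the better ECMP
    # fit for core switches that cannot hash on the GRE key.
    if not req.core_hashes_gre_key:
        notes["MPLSoGRE"].append(
            "weaker ECMP on core switches that do not hash the GRE key"
        )

    return notes
```

For example, encapsulation_notes(Requirements(Homing.SINGLE)) returns empty lists for both encapsulations, matching the statement that both do equally well when no redundancy group is needed.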
626 6 Security Considerations 628 7 IANA Considerations 630 8 References 632 8.1 Normative References 634 [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate 635 Requirement Levels", BCP 14, RFC 2119, March 1997. 637 [REMOTE-NH] Van de Velde et al., "BGP Remote-Next-Hop", draft- 638 vandevelde-idr-remote-next-hop-01.txt, work in progress, 639 July 2012. 641 8.2 Informative References 643 [NVGRE] Sridharan, M., et al., "NVGRE: Network Virtualization using 644 Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre- 645 01.txt, July 8, 2012. 647 [VXLAN] Dutt, D., et al., "VXLAN: A Framework for Overlaying 648 Virtualized Layer 2 Networks over Layer 3 Networks", draft- 649 mahalingam-dutt-dcops-vxlan-02.txt, August 22, 2012. 651 [E-VPN] Sajassi et al., "BGP MPLS Based Ethernet VPN", draft-ietf- 652 l2vpn-evpn-01.txt, work in progress, February 2012. 654 [Problem-Statement] Narten et al., "Problem Statement: Overlays for 655 Network Virtualization", draft-ietf-nvo3-overlay-problem-statement- 656 00, September 2012. 658 [L3VPN-ENDSYSTEMS] Marques et al., "BGP-signaled end-system IP/VPNs", 659 draft-ietf-l3vpn-end-system, work in progress, October 2012. 661 Authors' Addresses 663 Ali Sajassi 664 Cisco 665 Email: sajassi@cisco.com 667 Samer Salam 668 Cisco 669 595 Burrard Street 670 Vancouver, BC V7X 1J1, Canada 671 Email: ssalam@cisco.com 673 Keyur Patel 674 Cisco 675 170 West Tasman Drive 676 San Jose, CA 95134, US 677 Email: Keyupate@cisco.com 679 Nabil Bitar 680 Verizon Communications 681 Email: nabil.n.bitar@verizon.com 683 Wim Henderickx 684 Alcatel-Lucent 685 Email: wim.henderickx@alcatel-lucent.com