INTERNET-DRAFT                                              Sami Boutros
Intended Status: Informational                               Ali Sajassi
                                                             Samer Salam
                                                              Dennis Cai
                                                             Samir Thoria
                                                            Cisco Systems

                                                            Tapraj Singh
                                                              John Drake
                                                        Juniper Networks

                                                           Jeff Tantsura
                                                                Ericsson

Expires: January 5, 2016                                    July 4, 2015

                          VXLAN DCI Using EVPN
                   draft-boutros-bess-vxlan-evpn-00.txt

Abstract

   This document describes how Ethernet VPN (EVPN) technology can be
   used to interconnect VXLAN or NVGRE networks over an MPLS/IP
   network, providing intra-subnet connectivity at Layer 2 and
   control-plane separation among the interconnected VXLAN or NVGRE
   networks.
   The scope of the learning of host MAC addresses in the VXLAN or
   NVGRE network is limited to data-plane learning in this document.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Terminology
   2. Requirements
     2.1. Control Plane Separation among VXLAN/NVGRE Networks
     2.2. All-Active Multi-homing
     2.3. Layer 2 Extension of VNIs/VSIDs over the MPLS/IP Network
     2.4. Support for Integrated Routing and Bridging (IRB)
   3. Solution Overview
     3.1. Redundancy and All-Active Multi-homing
   4. EVPN Routes
     4.1. BGP MAC Advertisement Route
     4.2. Ethernet Auto-Discovery Route
     4.3. Per VPN Route Targets
     4.4. Inclusive Multicast Route
   5. Forwarding
     5.1. Unicast Forwarding
     5.2. Handling Multicast
       5.2.1. Multicast Stitching with Per-VNI Load Balancing
       5.2.2. PIM SM Operation
   6. NVGRE
   7. Use Cases Overview
     7.1. Homogeneous Network DCI Interconnect Use Cases
       7.1.1. VNI-Based Mode EVPN Service Use Case
       7.1.2. VNI Bundle Service Use Case
       7.1.3. VNI Translation Use Case
     7.2. Heterogeneous Network DCI Use Cases
       7.2.1. VXLAN-VLAN Interworking over EVPN Use Case
   8. Acknowledgements
   9. Security Considerations
   10. IANA Considerations
   11. References
     11.1. Normative References
     11.2. Informative References
   Authors' Addresses

1. Introduction

   [EVPN] introduces a solution for multipoint L2VPN services, with
   advanced multi-homing capabilities, using a BGP control plane over
   the core MPLS/IP network.  [VXLAN] defines a tunneling scheme to
   overlay Layer 2 networks on top of Layer 3 networks.  It allows for
   optimal forwarding of Ethernet frames with support for multipathing
   of unicast and multicast traffic, and uses UDP/IP encapsulation for
   tunneling.

   In this document, we discuss how Ethernet VPN (EVPN) technology can
   be used to interconnect VXLAN or NVGRE networks over an MPLS/IP
   network.  This is achieved by terminating the VXLAN tunnel at the
   hand-off points, performing data-plane MAC learning of customer
   traffic, and providing intra-subnet connectivity for the customers
   at Layer 2 across the MPLS/IP core.  The solution maintains
   control-plane separation among the interconnected VXLAN or NVGRE
   networks.  The scope of the learning of host MAC addresses in the
   VXLAN or NVGRE network is limited to data-plane learning in this
   document.  The distribution of MAC addresses in the control plane
   using BGP in the VXLAN or NVGRE network is outside the scope of this
   document; it is covered in [EVPN-OVERLAY].

1.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   LDP:   Label Distribution Protocol
   MAC:   Media Access Control
   MPLS:  Multiprotocol Label Switching
   OAM:   Operations, Administration, and Maintenance
   PE:    Provider Edge node
   PW:    Pseudowire
   TLV:   Type, Length, and Value
   VPLS:  Virtual Private LAN Service
   VXLAN: Virtual eXtensible Local Area Network
   VTEP:  VXLAN Tunnel End Point
   VNI:   VXLAN Network Identifier (or VXLAN Segment ID)
   ToR:   Top-of-Rack switch

2. Requirements

2.1. Control Plane Separation among VXLAN/NVGRE Networks

   It is required to maintain control-plane separation for the underlay
   networks (e.g., among the various VXLAN/NVGRE networks) being
   interconnected over the MPLS/IP network.  This ensures the following
   characteristics:

   - Scalability of the IGP control plane in large deployments, and
     fault-domain localization, where link or node failures in one
     site do not trigger re-convergence in remote sites.

   - Scalability of multicast trees as the number of interconnected
     networks grows.

2.2. All-Active Multi-homing

   It is important to allow all-active multi-homing of the VXLAN/NVGRE
   network to the MPLS/IP network, where traffic from a VTEP can arrive
   at any of the PEs and can be forwarded accordingly over the MPLS/IP
   network.  Furthermore, traffic destined to a VTEP can be received
   over the MPLS/IP network at any of the PEs connected to the
   VXLAN/NVGRE network and be forwarded accordingly.  The solution MUST
   support all-active multi-homing to a VXLAN/NVGRE network.

2.3. Layer 2 Extension of VNIs/VSIDs over the MPLS/IP Network

   It is required to extend the VXLAN VNIs or NVGRE VSIDs over the
   MPLS/IP network to provide intra-subnet connectivity between the
   hosts (e.g., VMs) at Layer 2.

2.4. Support for Integrated Routing and Bridging (IRB)

   The data center WAN edge node is required to support integrated
   routing and bridging in order to accommodate both inter-subnet
   routing and intra-subnet bridging for a given VNI/VSID.  For
   example, inter-subnet switching is required when a remote host
   connected to an enterprise IP-VPN site wants to access an
   application residing on a VM.

3. Solution Overview

   Every VXLAN/NVGRE network that is connected to the MPLS/IP core runs
   an independent instance of the IGP control plane.  Each PE
   participates in the IGP control-plane instance of its VXLAN/NVGRE
   network.

   Each PE node terminates the VXLAN or NVGRE data-plane encapsulation,
   and each VNI or VSID is mapped to a bridge domain.  The PE performs
   data-plane MAC learning on the traffic received from the VXLAN/NVGRE
   network.

   Each PE node implements EVPN or PBB-EVPN to distribute in BGP either
   the client MAC addresses learnt over the VXLAN tunnel, in the case
   of EVPN, or the PEs' B-MAC addresses, in the case of PBB-EVPN.  In
   the PBB-EVPN case, client MAC addresses continue to be learnt in the
   data plane.

   Each PE node encapsulates the Ethernet frames with MPLS when sending
   packets over the MPLS core, and with the VXLAN or NVGRE tunnel
   header when sending packets over the VXLAN or NVGRE network.

    +-----+  +---------+  +----+  +--------------+  +----+  +---------+  +-----+
    |VTEP1|--|         |--|PE1 |--|              |--|PE3 |--|         |--|VTEP3|
    +-----+  |         |  +----+  |     MPLS     |  +----+  |         |  +-----+
             |  VXLAN  |          |   Backbone   |          |  VXLAN  |
    +-----+  |         |  +----+  |              |  +----+  |         |  +-----+
    |VTEP2|--|         |--|PE2 |--|              |--|PE4 |--|         |--|VTEP4|
    +-----+  +---------+  +----+  +--------------+  +----+  +---------+  +-----+

    |<---- Underlay IGP ---->|<-- Overlay BGP -->|<---- Underlay IGP ---->|  CP

    |<--------- VXLAN --------->|            |<--------- VXLAN --------->|  DP
                             |<----- MPLS ----->|

    Legend:  CP = Control Plane View    DP = Data Plane View

          Figure 1: Interconnecting VXLAN Networks with VXLAN-EVPN

3.1. Redundancy and All-Active Multi-homing

   When a VXLAN network is multi-homed to two or more PEs, and provided
   that these PEs have the same IGP distance to a given NVE, the
   solution MUST support load-balancing of traffic between the NVE and
   the MPLS network among all the multi-homed PEs.
   This maximizes use of the bisectional bandwidth of the VXLAN
   network.  One of the main capabilities of EVPN/PBB-EVPN is support
   for all-active multi-homing, where known unicast traffic to/from a
   multi-homed site can be forwarded by any of the PEs attached to that
   site.  This ensures optimal usage of multiple paths and load
   balancing.  EVPN/PBB-EVPN, through its DF election and split-horizon
   filtering mechanisms, ensures that no packet duplication or
   forwarding loops result in such scenarios.  In this solution, the
   VXLAN network is treated as a multi-homed site for the purpose of
   EVPN operation.

   Since the context of this solution is VXLAN networks with a
   data-plane learning paradigm, it is important for the multi-homing
   mechanism to ensure stability of the MAC forwarding tables at the
   NVEs while supporting all-active forwarding at the PEs.  For
   example, in Figure 1 above, if each PE uses a distinct IP address
   for its VTEP tunnel, then for a given VNI, when an NVE learns a
   host's MAC address against the originating VTEP source address, its
   MAC forwarding table will keep flip-flopping among the VTEP
   addresses of the local PEs.  This is because a flow associated with
   the same host MAC address can arrive at any of the PE devices.  To
   ensure that there is no flip-flopping of MAC-to-VTEP address
   associations, an IP anycast address MUST be used as the VTEP address
   on all PEs multi-homed to a given VXLAN network.  The use of an IP
   anycast address has two advantages:

   a) It prevents any flip-flopping in the forwarding tables for the
      MAC-to-VTEP associations.

   b) It enables load balancing via ECMP for DCI traffic among the
      multi-homed PEs.

   In the baseline [EVPN] draft, all-active multi-homing is described
   for a multi-homed device (MHD) using [LACP], and single-active
   multi-homing is described for a multi-homed network (MHN) using
   [802.1Q].
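   The MAC-table stability argument above can be illustrated with a
   small sketch (the helper function and addresses are hypothetical,
   not part of any implementation):

```python
def learn(mac_table, flows):
    """Data-plane MAC learning: bind each source MAC to the VTEP address
    seen in the outer IP header, counting rebinds (flip-flops)."""
    flips = 0
    for src_mac, vtep_ip in flows:
        if mac_table.get(src_mac) not in (None, vtep_ip):
            flips += 1
        mac_table[src_mac] = vtep_ip
    return flips

# Distinct VTEP addresses per PE: flows from host M1 can hash to either
# PE, so the NVE keeps rebinding M1 between the two VTEP addresses.
distinct = [("M1", "192.0.2.1"), ("M1", "192.0.2.2"), ("M1", "192.0.2.1")]
assert learn({}, distinct) == 2

# Shared anycast VTEP address on both PEs: every flow appears to come
# from the same VTEP, so the binding never changes.
anycast = [("M1", "192.0.2.100"), ("M1", "192.0.2.100"), ("M1", "192.0.2.100")]
assert learn({}, anycast) == 0
```

   The same stable binding is what makes ECMP load balancing towards
   the multi-homed PEs safe.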
   In this draft, all-active multi-homing is described for a VXLAN MHN.
   This requires some changes to the filtering used for BUM traffic,
   which are described in detail in the multicast sections (Sections
   5.2.1 and 5.2.2).

   The filtering used for BUM traffic in all-active multi-homing in
   [EVPN] is asymmetric: BUM traffic from the MPLS/IP network to the
   multi-homed site is dropped by the non-DF PE(s) and only sent to the
   multi-homed site by the DF, while BUM traffic from the multi-homed
   site to the MPLS/IP network may be sent by any PE.  This is because
   [EVPN] assumes all-active multi-homing is used in conjunction with
   an MHD, in which the CE is attached to multiple PEs via a LAG and
   hashes the frames of a given BUM flow to the same PE, ensuring that
   those frames are sent to the MPLS/IP network by only one PE.

   However, this document assumes that all-active multi-homing is used
   in conjunction with an MHN, which means that the frames of a given
   BUM flow are sent to all PEs attached to the multi-homed site.  In
   order to avoid duplication, only the DF can send BUM traffic from
   the multi-homed site to the MPLS/IP network; the non-DF PE(s) MUST
   drop BUM traffic received from the multi-homed site.

   If PIM Bidir is used within the multi-homed site, BUM traffic from
   the MPLS/IP network to the multi-homed site is dropped by the non-DF
   PE(s) and only sent to the multi-homed site by the DF.  If PIM SM is
   used within the multi-homed site, BUM traffic from the MPLS/IP
   network to the multi-homed site is sent to the multi-homed site by
   multiple PEs attached to it.

4. EVPN Routes

   This solution leverages the same BGP routes and attributes defined
   in [EVPN], adapted as follows:

4.1. BGP MAC Advertisement Route

   This route and its associated modes are used to distribute the
   customer MAC addresses learnt in the data plane over the VXLAN
   tunnel, in the case of EVPN, or the provider backbone MAC addresses,
   in the case of PBB-EVPN.

   In the case of EVPN, the Ethernet Tag ID of this route is set to
   zero for VNI-based mode, where there is a one-to-one mapping between
   a VNI and an EVI.  In this case, there is no need to carry the VNI
   in the MAC advertisement route, because the bridge-domain (BD) ID
   can be derived from the Route Target (RT) associated with the route.
   However, for VNI-aware bundle mode, where multiple VNIs can be
   mapped to the same EVI, the Ethernet Tag ID MUST be set to the VNI.
   At the receiving PE, the BD ID is derived from the combination of RT
   and VNI: the RT identifies the associated EVI on that PE, and the
   VNI identifies the corresponding BD ID within that EVI.

   In VNI-aware bundle services, the Ethernet Tag field can be set to a
   normalized value that maps to the VNI; this makes the VNI value of
   local significance across data centers.  The data plane needs to map
   to this normalized VNI value and carry it in the IP/VXLAN packets
   exchanged between the DCI gateways.

4.2. Ethernet Auto-Discovery Route

   When EVPN is used, the application of this route is as specified in
   [EVPN].  However, when PBB-EVPN is used, there is no need for this
   route, per [PBB-EVPN].

4.3. Per VPN Route Targets

   VXLAN-EVPN uses the same set of route targets defined in [EVPN].

4.4. Inclusive Multicast Route

   The EVPN Inclusive Multicast route is used for auto-discovery of the
   PE devices participating in the same tenant virtual network,
   identified by a VNI, over the MPLS network.  It also enables the
   stitching of the IP multicast trees, which are local to each VXLAN
   site, with the Label Switched Multicast (LSM) trees of the MPLS
   network.
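   The Ethernet Tag and bridge-domain derivation rules used by the
   routes above can be summarized in a short sketch (the table contents
   and helper names are illustrative only, not from the draft):

```python
def ethernet_tag(mode, vni):
    """VNI-based mode: one VNI per EVI, so the tag is zero.
    VNI-aware bundle mode: many VNIs per EVI, so the tag carries the VNI."""
    return 0 if mode == "vni-based" else vni

def derive_bd(mode, rt, tag, rt_to_evi, bd_table):
    """At the receiving PE: the RT identifies the EVI; in bundle mode
    the VNI (carried in the Ethernet Tag) selects the BD within it."""
    evi = rt_to_evi[rt]
    key = (evi, tag if mode == "vni-aware-bundle" else None)
    return bd_table[key]

rt_to_evi = {"RT:1": "EVI-1"}
bd_table = {("EVI-1", None): "BD-10",     # VNI-based: single BD per EVI
            ("EVI-1", 5001): "BD-5001"}   # bundle mode: BD per (EVI, VNI)

assert ethernet_tag("vni-based", 5001) == 0
assert derive_bd("vni-based", "RT:1", 0, rt_to_evi, bd_table) == "BD-10"
assert derive_bd("vni-aware-bundle", "RT:1",
                 ethernet_tag("vni-aware-bundle", 5001),
                 rt_to_evi, bd_table) == "BD-5001"
```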
   The Inclusive Multicast route is encoded as follows:

   - The Ethernet Tag ID is set to zero for VNI-based mode and to the
     VNI for VNI-aware bundle mode.

   - The Originating Router's IP Address is set to one of the PE's IP
     addresses.

   All other fields are set as defined in [EVPN].

   Please see Section 5.2, "Handling Multicast".

5. Forwarding

5.1. Unicast Forwarding

   Host MAC addresses are learnt in the data plane from the VXLAN
   network and associated with the corresponding VTEP, identified by
   the source IP address.  Host MAC addresses are learnt in the control
   plane if EVPN is implemented over the MPLS/IP core, or in the data
   plane if PBB-EVPN is implemented over the MPLS core.  When host MAC
   addresses are learnt in the data plane over the MPLS/IP core (in the
   case of PBB-EVPN), they are associated with their corresponding
   B-MAC addresses.

   L2 unicast traffic destined to the VXLAN network is encapsulated
   with the IP/UDP header and the corresponding customer bridge VNI.

   L2 unicast traffic destined to the MPLS/IP network is encapsulated
   with the MPLS label.

5.2. Handling Multicast

   Each VXLAN network independently builds its P2MP or MP2MP shared
   multicast trees.  A P2MP or MP2MP tree is built for one or more VNIs
   local to the VXLAN network.

   In the MPLS/IP network, multiple options are available for the
   delivery of multicast traffic:

   - Ingress replication
   - LSM with Inclusive trees
   - LSM with Aggregate Inclusive trees
   - LSM with Selective trees
   - LSM with Aggregate Selective trees

   When LSM is used, the trees are P2MP.

   The PE nodes are responsible for stitching the IP multicast trees on
   the access side to the ingress replication tunnels or LSM trees in
   the MPLS/IP core.  The stitching must ensure that the following
   characteristics are maintained at all times:

   1. Avoiding packet duplication: In the case where the VXLAN network
      is multi-homed to multiple PE nodes, if all of the PE nodes
      forwarded the same multicast frame, packet duplication would
      arise.  This applies to multicast traffic both from site to core
      and from core to site.

   2. Avoiding forwarding loops: In the case of VXLAN network
      multi-homing, the solution must ensure that a multicast frame
      forwarded by a given PE to the MPLS core is not forwarded back by
      another PE (in the same VXLAN network) to the VXLAN network of
      origin.  The same applies to traffic in the core-to-site
      direction.

   The following approach of per-VNI load balancing guarantees proper
   stitching that meets the above requirements.

5.2.1. Multicast Stitching with Per-VNI Load Balancing

   For setting up multicast trees in the VXLAN network for DC
   applications, PIM Bidir can be of special interest because it
   significantly reduces the amount of multicast state in the network.
   Furthermore, it alleviates any special processing for the RPF check,
   since PIM Bidir doesn't require one.  The RP for PIM Bidir can be
   any of the spine nodes.  Multiple trees can be built (e.g., one tree
   rooted per spine node) for efficient load balancing within the
   network.  All PEs participating in the multi-homing of the VXLAN
   network join all the trees.  Therefore, for a given tree, all PEs
   receive BUM traffic.  DF election procedures of [EVPN] are used to
   ensure that only traffic to/from a single PE is forwarded, thus
   avoiding packet duplication and forwarding loops.  For load
   balancing of BUM traffic, when a PE or an NVE wants to send BUM
   traffic over the VXLAN network, it selects one of the trees based on
   the VNI and forwards all the traffic for that VNI on that tree.
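   The per-VNI DF election invoked above can be sketched as follows,
   assuming the modulo-based service carving of [EVPN] with the VNI
   taking the place of the VLAN (the addresses are illustrative):

```python
def elect_df(pe_addresses, vni):
    """Order the candidate PE IP addresses numerically and pick the
    one at index (VNI mod N) as Designated Forwarder for that VNI."""
    ordered = sorted(pe_addresses)
    return ordered[vni % len(ordered)]

def may_forward_bum(pe, pe_addresses, vni):
    """A PE forwards BUM traffic for a VNI, in either direction,
    only when it is the DF for that VNI."""
    return pe == elect_df(pe_addresses, vni)

pes = ["192.0.2.11", "192.0.2.12"]
# Exactly one PE is DF for any given VNI, so BUM traffic for that VNI
# is forwarded by a single PE, while different VNIs spread across PEs.
assert sum(may_forward_bum(p, pes, 5001) for p in pes) == 1
assert elect_df(pes, 5000) != elect_df(pes, 5001)
```

   Because the election is a pure function of the ordered PE list and
   the VNI, all PEs reach the same answer without extra signaling.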
   Multicast traffic from the VXLAN/NVGRE network is first subjected to
   filtering based on the DF election procedures of [EVPN], using the
   VNI as the Ethernet Tag.  This is similar in principle to the
   filtering in [EVPN]; however, the VNI is used for filtering instead
   of the VLAN ID, and the traffic is a VXLAN-encapsulated packet
   rather than an 802.1Q frame.  On the DF PE, where the multicast
   traffic is allowed to be forwarded, the VNI is used to select a
   bridge domain.  After the packet is decapsulated, an L2 lookup is
   performed based on the host MAC DA.  It should be noted that MAC
   learning is performed in the data plane for traffic received from
   the VXLAN/NVGRE network, and the host MAC SA is learnt against the
   source VTEP address.

   The PE nodes connected to a multi-homed VXLAN network perform BGP DF
   election to decide which PE node is responsible for forwarding
   multicast traffic associated with a given VNI.  A PE forwards
   multicast traffic for a given VNI only when it is the DF for that
   VNI.  This forwarding rule applies in both the site-to-core and
   core-to-site directions.

5.2.2. PIM SM Operation

   In some situations, it may be desirable to use PIM SM in a VXLAN
   network's underlay.  However, because all of the PEs attached to a
   multi-homed site use the same IP anycast address, if only one PE
   (the DF) sent BUM traffic from the MPLS/IP network to the
   multi-homed site, the RPF check in PIM SM would cause any router in
   the VXLAN underlay whose shortest path to that IP anycast address
   was via a different PE to drop this BUM traffic.  Conversely, if BUM
   traffic from the MPLS/IP network to the multi-homed site is sent by
   multiple PEs attached to it, the RPF check in PIM SM causes any
   router in the VXLAN underlay to forward only one copy of a given BUM
   packet.
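   The two behaviors just described (DF-only sending causing RPF drops,
   versus all PEs sending with at most one copy accepted) can be
   sketched with hypothetical router and PE names:

```python
def copies_received(best_pe_by_router, sending_pes):
    """Each underlay router RPF-accepts a BUM packet only if it arrived
    from the PE on its shortest path to the anycast VTEP address."""
    return {router: (1 if pe in sending_pes else 0)
            for router, pe in best_pe_by_router.items()}

best = {"R1": "PE1", "R2": "PE2"}   # hypothetical underlay routers

# Only the DF (PE1) sends: R2's RPF check drops the copy -> traffic loss.
assert copies_received(best, {"PE1"}) == {"R1": 1, "R2": 0}

# All attached PEs send: every router still accepts exactly one copy,
# so there is no duplication despite the multiple senders.
assert copies_received(best, {"PE1", "PE2"}) == {"R1": 1, "R2": 1}
```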
   The following is a description of operations with respect to a given
   VNI in the VXLAN network.

   - All PEs attached to a multi-homed site join towards the RP for the
     multicast group for that VNI.

   - When the first BUM packet is received from the MPLS/IP network,
     all PEs attached to the multi-homed site send PIM Register
     messages to the RP.  The multicast flow is identified as (anycast
     address, group) in the Register message, and the source address of
     the PIM SM Register message is a unique address on the PE,
     typically the sending interface address.

   - Upon receiving a Register message, the RP sends a join for the
     (anycast address, group), routed towards the closest PE, and that
     PE switches to sending BUM traffic natively.  Upon receiving the
     native BUM traffic, the RP sends Register-Stop messages to any PEs
     that continue sending Register messages (because only one PE gets
     the (anycast address, group) join and switches to native
     forwarding).

   - After the VTEPs receive traffic from the RP, they send (anycast
     address, group) joins, routed towards the closest PE (with respect
     to each VTEP).  This may start native forwarding on multiple PEs,
     but each VTEP or router in the VXLAN underlay will only accept BUM
     traffic from one of the PEs attached to the multi-homed site.

   - If BUM traffic stops for a long time, the relevant PIM state will
     time out, and the next BUM packet for that multicast group will
     trigger the above steps again.

   Note that before the RP receives the first natively sent packet from
   one particular PE, the packets encapsulated in the Register messages
   from all PEs will be forwarded by the RP, causing duplication.  This
   should be transient and will stop as soon as the first native packet
   is received by the RP.
   If the transient duplication is a concern, then Null-Register
   messages SHOULD be used at the beginning (instead of encapsulating
   BUM traffic in Register messages), though that will lead to
   transient loss of initial packets.

   To avoid packet loss and duplication, the PEs attached to the
   multi-homed site SHOULD send Null-Register messages periodically, as
   soon as initial provisioning is completed, to pre-build and maintain
   the relevant PIM state.

6. NVGRE

   All of the above specification applies equally to NVGRE, replacing
   the VNI with the Virtual Subnet Identifier (VSID) and the VTEP with
   the NVGRE Endpoint.

7. Use Cases Overview

7.1. Homogeneous Network DCI Interconnect Use Cases

   This covers DCI interconnect of two or more VXLAN-based data centers
   over an MPLS-enabled EVPN core.

7.1.1. VNI-Based Mode EVPN Service Use Case

   This use case handles the EVPN service where there is a one-to-one
   mapping between a VNI and an EVI.  The Ethernet Tag ID of the EVPN
   BGP NLRI should be set to zero.  The BD ID can be derived from the
   RT associated with the EVI/VNI.

   +---+                                                            +---+
   | H1| +---+  +-----+  +---+  +---------+  +---+  +-----+  +---+  | H3|
   | M1|-|   |--|     |--|PE1|--|         |--|PE3|--|     |--|   |--| M3|
   +---+ |   |  |     |  +---+  |MPLS Core|  +---+  |     |  |   |  +---+
   +---+ |NVE|  |VXLAN|         | (EVPN)  |         |VXLAN|  |NVE|  +---+
   | H2| | 1 |  |     |  +---+  |         |  +---+  |     |  | 2 |  | H4|
   | M2|-|   |--|     |--|PE2|--|         |--|PE4|--|     |--|   |--| M4|
   +---+ +---+  +-----+  +---+  +---------+  +---+  +-----+  +---+  +---+

   +--------+------+--------+------+--------+------+--------+--------+
   |Original|VXLAN |Original|MPLS  |Original|VXLAN |Original|Original|
   |Ethernet|Header|Ethernet|Header|Ethernet|Header|Ethernet|Ethernet|
   |Frame   |      |Frame   |      |Frame   |      |Frame   |Frame   |
   +--------+------+--------+------+--------+------+--------+--------+
   |<---Data Center Site1-->|<---EVPN Core--->|<---Data Center Site2-->|

                Figure 2: VNI-Based Service Packet Flow
   VNI-based service: one VNI is mapped to one EVI.

   H1, H2, H3, and H4 are hosts, and their associated MAC addresses are
   M1, M2, M3, and M4.  PE1, PE2, PE3, and PE4 are the VXLAN-EVPN
   gateways.  NVE1 and NVE2 are the originators of the VXLAN-based
   network.

   When host H1 in data center site 1 communicates with H3 in data
   center site 2, H1 forms a Layer 2 packet with source IP address IP1
   and source MAC M1, and destination IP IP3 and destination MAC M3
   (assuming that ARP resolution has already happened).  NVE1 learns
   the source MAC and looks up the destination MAC in the bridge
   domain.  Based on the MAC lookup, the frame needs to be sent to the
   VXLAN network.  VXLAN encapsulation is added to the original
   Ethernet frame, and the frame is sent over the VXLAN tunnel.  The
   frame arrives at PE1.  PE1 (the VXLAN gateway) identifies the frame
   as a VXLAN frame.  The VXLAN header is decapsulated, and a
   destination MAC lookup is done in the bridge-domain table of the
   EVI.  The lookup of the destination MAC yields the EVPN unicast next
   hop, which is used to identify the labels (tunnel label and service
   label) to be added for transport over the EVPN core.  Similar
   processing is done on the other side of the DCI.

7.1.2. VNI Bundle Service Use Case

   In the VNI-aware bundle service mode, multiple VNIs are mapped to
   one EVI.  The Ethernet Tag ID must be set to the VNI ID in the EVPN
   BGP NLRIs.  MPLS label allocation in this use case can be done
   either per EVI or per (EVI, VNI ID).  If MPLS label allocation is
   done per EVI, then in the data path there is a need to push a VLAN
   tag for identifying the bridge domain at the egress PE, so that the
   destination MAC address lookup can be done in that bridge domain.

7.1.3. VNI Translation Use Case

   +---+                                                            +---+
   | H1| +---+  +-----+  +---+  +---------+  +---+  +-----+  +---+  | H3|
   | M1|-|   |--|     |--|PE1|--|         |--|PE3|--|     |--|   |--| M3|
   +---+ |   |  |     |  +---+  |MPLS Core|  +---+  |     |  |   |  +---+
   +---+ |NVE|  |VXLAN|         | (EVPN)  |         |VXLAN|  |NVE|  +---+
   | H2| | 1 |  |     |  +---+  |         |  +---+  |     |  | 2 |  | H4|
   | M2|-|   |--|     |--|PE2|--|         |--|PE4|--|     |--|   |--| M4|
   +---+ +---+  +-----+  +---+  +---------+  +---+  +-----+  +---+  +---+
         |<----VNI_ID_A--->|<--------EVI-A-------->|<----VNI_ID_B--->|

                  Figure 3: VNI Translation Use Case

   There are two or more data center sites.  These sites might use
   different VNI IDs for the same service.  For example, a service uses
   "VNI_ID_A" at data center site 1 and "VNI_ID_B" for the same service
   at data center site 2.  VNI ID A is terminated at the ingress EVPN
   PE, and VNI ID B is encapsulated at the egress EVPN PE.

7.2. Heterogeneous Network DCI Use Cases

   Data center sites are upgraded slowly, so a heterogeneous network
   DCI solution is required as a migration approach from traditional
   data centers to VXLAN-based data centers.  For example, data center
   site 1 is upgraded to VXLAN, but data center sites 2 and 3 are still
   Layer 2/VLAN-based data centers.  For these use cases, it is
   required to provide VXLAN-VLAN interworking over the EVPN core.

7.2.1. VXLAN-VLAN Interworking over EVPN Use Case

   The new data center site is a VXLAN-based site, but the older data
   center sites are still VLAN based.
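   This interworking amounts to an encapsulation translation at the
   gateways, sketched below (the frame model and function names are
   hypothetical, chosen only to mirror the packet walk in this
   section):

```python
def vxlan_site_pe_to_core(frame):
    """Ingress PE: terminate the VXLAN tunnel, then forward the inner
    Ethernet frame over the EVPN core (Ethernet Tag advertised as 0,
    per the VNI-based service model recommended here)."""
    assert frame["encap"] == "vxlan"
    return {"encap": "mpls", "ethernet_tag": 0, "payload": frame["payload"]}

def core_to_vlan_site_pe(frame, local_vlan):
    """Egress PE: pop the MPLS encapsulation and push the VLAN tag
    used by the legacy Layer 2 site."""
    assert frame["encap"] == "mpls"
    return {"encap": "vlan", "vlan_id": local_vlan, "payload": frame["payload"]}

f = {"encap": "vxlan", "vni": 5001, "payload": "original-ethernet-frame"}
out = core_to_vlan_site_pe(vxlan_site_pe_to_core(f), local_vlan=100)
assert out == {"encap": "vlan", "vlan_id": 100,
               "payload": "original-ethernet-frame"}
```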
   +---+                                                              +---+
   | H1| +---+  +-----+  +---+  +---------+  +---+  +-------+  +---+  | H3|
   | M1|-|   |--|     |--|PE1|--|         |--|PE3|--|       |--|   |--| M3|
   +---+ |   |  |     |  +---+  |MPLS Core|  +---+  |  L2   |  |   |  +---+
   +---+ |NVE|  |VXLAN|         | (EVPN)  |         |Network|  |NVE|  +---+
   | H2| | 1 |  |     |  +---+  |         |  +---+  |       |  | 2 |  | H4|
   | M2|-|   |--|     |--|PE2|--|         |--|PE4|--|       |--|   |--| M4|
   +---+ +---+  +-----+  +---+  +---------+  +---+  +-------+  +---+  +---+
         |<--Data Center Site1->|<--EVPN Core-->|<--Data Center Site2-->|

   +-----+  +------+-----+  +------+------+-----+  +------+-----+  +-----+
   |L2   |  |VXLAN |L2   |  |MPLS  |VLAN  |L2   |  |VLAN  |L2   |  |L2   |
   |Frame|  |Header|Frame|  |Header|Header|Frame|  |Header|Frame|  |Frame|
   +-----+  +------+-----+  +------+------+-----+  +------+-----+  +-----+

           Figure 4: VXLAN-VLAN Interworking over EVPN Use Case

   If a service is represented by VXLAN at one data center site and by
   VLAN at other data center sites, then it is recommended to model the
   service as a VNI-based EVPN service.  The BGP NLRIs will always
   advertise a VLAN ID tag of '0' in the BGP routes.  The advantage of
   this approach is that there is no requirement to do VNI
   normalization at the EVPN core.  VNI ID A is terminated at the
   ingress EVPN PE, and VLAN ID B is encapsulated at the egress EVPN
   PE.

8. Acknowledgements

   The authors would like to acknowledge Wen Lin's contributions to
   this document.

9. Security Considerations

   There are no additional security aspects that need to be discussed
   here.

10. IANA Considerations

   TBD

11. References

11.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

11.2. Informative References

   [EVPN]     Sajassi, A., et al., "BGP MPLS Based Ethernet VPN",
              draft-ietf-l2vpn-evpn-00, work in progress, February
              2012.
   [TRILL]    Sajassi, A., et al., "TRILL-EVPN",
              draft-ietf-l2vpn-trill-evpn-00, work in progress, June
              2012.

   [VXLAN]    Mahalingam, M., Dutt, D., et al., "A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", draft-mahalingam-dutt-dcops-vxlan-02, work in
              progress, August 2012.

   [NVGRE]    Sridharan, M., et al., "Network Virtualization using
              Generic Routing Encapsulation",
              draft-sridharan-virtualization-nvgre-01, work in
              progress, July 2012.

   [EVPN-OVERLAY]
              Sajassi, A., et al., "A Network Virtualization Overlay
              Solution using EVPN", draft-ietf-bess-evpn-overlay, work
              in progress.

   [PBB-EVPN] Sajassi, A., et al., "Provider Backbone Bridging
              Combined with Ethernet VPN (PBB-EVPN)",
              draft-ietf-l2vpn-pbb-evpn, work in progress.

   [LACP]     IEEE Std 802.1AX, "Link Aggregation".

   [802.1Q]   IEEE Std 802.1Q, "Media Access Control (MAC) Bridges and
              Virtual Bridged Local Area Networks".

Authors' Addresses

   Sami Boutros
   Cisco Systems
   EMail: sboutros@cisco.com

   Ali Sajassi
   Cisco Systems
   EMail: sajassi@cisco.com

   Samer Salam
   Cisco Systems
   EMail: ssalam@cisco.com

   Dennis Cai
   Cisco Systems
   EMail: dcai@cisco.com

   Tapraj Singh
   Juniper Networks
   EMail: tsingh@juniper.net

   John Drake
   Juniper Networks
   EMail: jdrake@juniper.net

   Samir Thoria
   Cisco Systems
   EMail: sthoria@cisco.com

   Jeff Tantsura
   Ericsson
   EMail: jeff.tantsura@ericsson.com