INTERNET-DRAFT                                             Congjie Chen
Intended Status: Standards Track                                 Dan Li
Expires: February 2016                              Tsinghua University
                                                                 Jun Li
                                                   University of Oregon
                                                            August 2015

 SVDC: Software Defined Data Center Network Virtualization Architecture
                           draft-chen-svdc-00

Abstract

   This document describes SVDC, a highly scalable and low-overhead
   virtualization architecture designed for large layer-2 data center
   networks.  By leveraging the emerging software defined networking
   framework, SVDC decouples the global identifier of a virtual
   network from the identifier carried in the packet header.  Hence,
   SVDC can scale to a very large number of virtual networks with only
   a very short tag in the packet header, which no previous network
   virtualization solution achieves.  SVDC enhances MAC-in-MAC
   encapsulation in such a way that packets with overlapping MAC
   addresses are correctly forwarded even without an in-packet global
   identifier to differentiate the virtual networks they belong to.
   In addition, SVDC supports scalable and efficient layer-2 multicast
   and broadcast within virtual networks.  This document also
   introduces a basic framework to illustrate SVDC deployment.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

   This Internet-Draft will expire in February 2016.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. SVDC Architecture
      2.1. Virtual Switch
      2.2. Edge Switches
      2.3. SVDC Controller
   3. Packet Forwarding
      3.1. Unicast Traffic
      3.2. Multicast/Broadcast Traffic
      3.3. SVDC Frame Format
   4. SVDC Deployment Considerations
      4.1. VM Migration
      4.2. Fault Tolerance
   5. Security Considerations
   6. IANA Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   Authors' Addresses

1. Introduction

   Due to its simplicity and ease of management, a large layer-2
   network is widely accepted as the fabric on which to build a data
   center network.  Scalable layer-2 architectures, for example, TRILL
   [RFC6325] and SPB [802.1aq], have been proposed as industry
   standards.  A large layer-2 network segment can even cross the
   Internet via virtualization services such as VPLS [RFC4762].
   However, this kind of layer-2 network fabric design mainly focuses
   on routing/forwarding rules in the network, and how to run a
   multi-tenant network virtualization scheme on top of large layer-2
   network fabrics is still an open issue.  Existing network
   virtualization solutions, including VLAN [802.1q], VXLAN [RFC7348],
   and [NVGRE], either face severe scalability problems or are not
   specifically designed for layer-2 networks.  In particular,
   designing a virtualization solution for a large layer-2 network
   needs to address the following challenges.

   For a large-scale, geographically distributed layer-2 network
   operated by a cloud provider, the potential number of tenants and
   virtual networks can be huge.
   Network virtualization based on VLAN can support at most 4094
   virtual networks, which is clearly not enough.  Although VXLAN
   [RFC7348] and [NVGRE] can support 16,777,216 virtual networks, they
   do so at the cost of using many more bits in the packet header.
   The fundamental issue is that, in existing network virtualization
   proposals, the number of virtual networks that can be
   differentiated depends on the number of bits used in the packet
   header.

   Given the possibly overlapping MAC addresses of VMs in different
   virtual networks and the limited forwarding table size in data
   center switches, it is inevitable to encapsulate the original MAC
   address of a packet when transmitting it in the core network.  The
   MAC-in-UDP encapsulation used in VXLAN [RFC7348] incurs unnecessary
   packet header overhead in a layer-2 network.  A MAC-in-MAC
   encapsulation framework is more applicable in a multi-tenant large
   layer-2 network where the MAC addresses of VMs largely overlap.

   Multicast service is common in data center networks, but how to
   support scalable multicast service in a multi-tenant virtualized
   large layer-2 network is still an open issue.  A desired capability
   of a layer-2 network virtualization framework is to support
   efficient and scalable layer-2 multicast as well as broadcast.

   This document describes SVDC, which leverages the framework of
   [SDN] to address the challenges above and achieves the goal of a
   highly scalable, low-overhead virtualization architecture for large
   layer-2 networks.  SVDC decouples the global identifier of a
   virtual network from the in-packet tag so as to encompass a large
   number of virtual networks with a minimal tag length in the packet
   header.  The global identifier is maintained in the SVDC
   controller, while the in-packet identifier is only used to
   differentiate virtual networks residing on the same server.  To
   mask the overlap of VM MAC addresses in the core network, SVDC uses
   MAC-in-MAC encapsulation in ingress edge switches and employs two
   techniques to guarantee correct packet forwarding on the first hop
   and the last hop without an in-packet global virtual network
   identifier.  Moreover, using the same framework as for unicast,
   SVDC can efficiently support up to tens of billions of multicast
   and broadcast groups, with possibly overlapping multicast or
   broadcast addresses in different virtual networks.

1.1. Terminology

   This document uses the following terminology.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

   Virtual Network (VN): A VN is a logical abstraction of a physical
   network that provides L2 network services to a set of Tenant
   Systems.

   Virtual Machine (VM): A VM is an instance of an operating system
   running on top of a hypervisor on a physical machine or server.
   Multiple VMs can share the same physical server via the hypervisor,
   yet are completely isolated from each other in terms of compute,
   storage, and other OS resources.

   Virtual Switch (vSwitch): A function within a hypervisor (typically
   implemented in software) that provides forwarding services similar
   to those of a physical Ethernet switch.
   A vSwitch forwards Ethernet frames between VMs running on the same
   server, or between a VM and a physical Network Interface Card (NIC)
   connecting the server to a physical Ethernet switch or router.  A
   vSwitch also enforces network isolation between VMs that by policy
   are not permitted to communicate with each other (e.g., by honoring
   VLANs).

   Global Tenant Network Identifier (GTID): A GTID is a global
   identifier of a virtual network.  It is never carried in the
   packets that VMs send out but is maintained in the SVDC controller.

   Local Tenant Network Identifier (LTID): An LTID is a local
   identifier that is used to differentiate virtual networks on the
   same server.  For the same virtual network, its LTIDs on different
   servers can be either different or the same.  When a new virtual
   network is created, it is assigned an LTID on each server that
   hosts its VMs.

   Global Identifier of a Multicast/Broadcast Group (Group-G): A
   Group-G denotes the address of a multicast/broadcast group as used
   in the physical network in the SVDC architecture.  When a new
   multicast/broadcast group wants to send traffic across the core
   network, an available Group-G is assigned to it.  When all the
   receivers of a multicast group leave, or a broadcast group lacks
   activity for a long duration, the corresponding Group-G is removed.

   Local Identifier of a Multicast/Broadcast Group (Group-L): A
   Group-L denotes the address of a multicast/broadcast group within a
   virtual network.  Group-Ls in different virtual networks can
   overlap.

   Edge Switch Identifier (EID): An EID denotes the identifier of an
   edge switch.  Any identifier of a switch, such as its MAC address,
   can serve as the EID.

   Server Identifier (SID): An SID denotes the identifier of a
   physical server, analogous to the EID.

   Virtual Machine MAC Address (VMAC): This is the MAC address
   assigned to the virtual NIC of each VM.  It is visible to VMs and
   to applications running within VMs.

   Egress Port Identifier (p-ID): The p-ID denotes the outgoing port
   to which the egress edge switch should forward the packet.

2. SVDC Architecture

   The basic architecture of SVDC is depicted in Figure 1.

   +--------------------+        +--------------------+
   |      Server 1      |        |      Server 2      |
   |  +----------+      |        |  +-----------+     |
   |  |   VN 1   |      |        |  |   VN 2    |     |
   |  | +-------+|      |        |  | +-------+ |     |
   |  | | VM 1  ||      |        |  | | VM 2  | |     |
   |  | | VMAC 1||      |        |  | | VMAC 2| |     |
   |  | +-------+|      |        |  | +-------+ |     |
   |  +----------+      |        |  +-----------+     |
   |        |           |        |        |           |
   | +----------------+ |        | +----------------+ |
   | |Virtual Switch 1| |        | |Virtual Switch 2| |
   | +----------------+ |        | +----------------+ |
   |        |           |        |        |           |
   +--------------------+        +--------------------+
            |                             |
     +-------------+            +----------------+
     |Ingress Edge |            |  Ingress Edge  |
     |  Switch 1   |            |    Switch 2    |
     +-------------+            +----------------+
       |        |                  |         |
       |        |   ,----------.   |         |
       |        |  ,'           `. |         |
       |        +-( Core Network )-+         |
       |           `.           ,'           |
       |             `-+-------+'            |
       |                   |                 |
       |     +-------------------------+     |
       +-----|     SVDC Controller     |-----+
             +-------------------------+

                  Figure 1  SVDC Architecture

   In its minimum configuration, the SVDC architecture contains only
   an SVDC controller and the updated edge switches.  The controller
   interacts with the edge switches using an SDN protocol such as
   [OPENFLOW].
   A very lightweight modification to the virtual switch is required
   to fill the server-local identifier of a virtual network into the
   packet.  Core switches and VMs simply run legacy protocols and can
   be unaware of SVDC.

   In the core network, any kind of layer-2 forwarding scheme can be
   used, for example, the Spanning Tree Protocol (STP) [802.1D], the
   TRILL protocol [RFC6325], or the Shortest Path Bridging protocol
   [802.1aq] for unicast, and a global multicast tree formation
   protocol for multicast.  Alternatively, depending on the operator's
   configuration, the SVDC controller can also use [OPENFLOW] to
   configure the unicast/multicast forwarding entries in the core
   network.  SVDC can seamlessly coexist with any forwarding fabric in
   the core network, either SDN or non-SDN.

   Every virtual switch maintains a local FIB table with entries for
   VMs on the local server, while packets sent to all other VMs are
   simply forwarded to the edge switch it connects to.  An edge switch
   maintains both a unicast encapsulation table and a multicast
   encapsulation table, used in the MAC-in-MAC encapsulation of every
   packet.  When the first packet of a flow arrives at an ingress edge
   switch, the encapsulation table lookup fails and the packet is
   directed to the SVDC controller.  The SVDC controller then looks up
   its mapping tables, which maintain the global information of the
   network, and responds to the ingress switch with the information
   needed to update its encapsulation table.  Subsequent packets of
   the flow are directly encapsulated by looking up the encapsulation
   table, without involving the SVDC controller again.  Multicast
   group join requests are also directed to the SVDC controller, which
   then updates the multicast decapsulation tables in the
   corresponding egress switches with the group membership.

   SVDC supports a large number of virtual networks by maintaining a
   global identifier for every virtual network in the SVDC controller,
   but never carrying that identifier in the packet.  Instead, a
   server-local identifier is carried in the packet header to identify
   a virtual network on a certain physical server.  The SVDC
   controller maintains the mapping between the global and local
   identifiers and is responsible for the translation when the first
   packet of a flow is directed to it.  The translation includes both
   mapping a server-local virtual network identifier to the global
   identifier and vice versa.  SVDC reuses the 12-bit VLAN [802.1q]
   field as the in-packet server-local virtual network identifier,
   which is adequate since the number of virtual networks on a
   physical server cannot exceed 4096.

   To minimize the packet header overhead introduced by encapsulating
   the original Ethernet packets from VMs in a layer-2 network, SVDC
   uses MAC-in-MAC encapsulation in ingress switches.  It not only
   masks the MAC address overlap among VMs in different virtual
   networks, but also minimizes the number of forwarding entries in
   core switches.  The key point here is how to guarantee correct
   packet forwarding on the first hop and the last hop, since no
   information carried in the packet globally differentiates the
   virtual networks in a direct way.  SVDC takes two approaches to
   deal with these problems.

   First, for the ingress switch to identify the virtual network an
   incoming packet belongs to, the server-local identifier carried in
   the VLAN field alone is not enough.  However, the VLAN field
   together with the incoming port of the switch is sufficient for the
   identification, since the incoming port of the switch uniquely
   identifies the physical server from which the packet was sent.

   Second, when the egress switch decapsulates the outer MAC header,
   it needs a way to correctly forward the packet to an outgoing port.
   A local table lookup cannot help, because the in-packet virtual
   network identifier is not the global one and thus can overlap.  The
   approach taken is to reuse the VLAN field of the outer MAC header
   to indicate the forwarding port on the egress switch.  This field
   is filled in by the ingress switch for a unicast packet by looking
   up the unicast encapsulation table, and by the egress switch for a
   multicast packet by looking up the multicast decapsulation table.
   The 12-bit VLAN tag is also more than enough to identify the
   different servers connected to the egress switch, unless the egress
   switch has more than 4096 ports, which does not happen in practice.

   SVDC encompasses multicast and broadcast within each virtual
   network with possibly overlapping group addresses.  In order to
   avoid traffic leakage among virtual networks, the SVDC controller
   maps each multicast group or broadcast group in a virtual network
   to a global multicast group, identified by the global multicast
   group address composed of a 23-bit multicast MAC address and the
   12-bit VLAN field.  This 35-bit global multicast group address is
   enough to support a potentially huge number of multicast/broadcast
   groups within virtual networks and can be carried in the outer
   Ethernet header.

   The following sections describe the design details of each
   component of the SVDC architecture.

2.1. Virtual Switch

   Every virtual switch configures its FIB table entries towards the
   VMs on the local server and sets the forwarding port of the default
   entry towards the edge switch connected to the server it resides
   on.  The key of a FIB table entry in the virtual switch is the
   tuple (LTID, VMAC), which uniquely identifies a VM on a physical
   server.  Note that in SVDC, VMs are not aware of the virtualized
   network infrastructure, and thus the Ethernet header sent by a VM
   does not contain any LTID.

   When a virtual switch receives an Ethernet packet, it first
   determines whether the packet comes from a local VM or from the
   outbound port.  If from a local VM, the virtual switch writes the
   LTID into the VLAN field of the Ethernet header based on the
   incoming port and then forwards the packet out.  If from the
   outbound port, the operations depend on whether it is a unicast
   packet or a multicast/broadcast packet.  For a unicast packet, the
   virtual switch directly looks up the FIB table and forwards it to a
   certain VM on the local server; for a broadcast packet, the virtual
   switch forwards it to all VMs within the same virtual network on
   the local server; and for a tenant-defined multicast packet, the
   virtual switch forwards it towards the VMs that are interested in
   it, which can be learned by snooping the multicast group join
   messages sent by VMs.
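
   The following Python sketch is non-normative and only illustrates
   the lookup order described above for the per-server virtual switch.
   The class and field names (VirtualSwitch, EDGE_UPLINK, the
   dict-based frame model) are hypothetical, and tenant-defined
   multicast snooping is omitted for brevity.

     # Hypothetical sketch of the Section 2.1 virtual switch logic.
     EDGE_UPLINK = "uplink"          # default port towards the edge switch

     class VirtualSwitch:
         def __init__(self):
             self.port_to_ltid = {}  # VM port -> LTID of the VM's VN
             self.fib = {}           # (LTID, VMAC) -> local VM port

         def from_local_vm(self, vm_port, frame):
             # A local VM sends plain Ethernet; the vSwitch derives the
             # LTID from the incoming port and writes it into the VLAN
             # field before any forwarding decision.
             frame["vlan"] = self.port_to_ltid[vm_port]
             port = self.fib.get((frame["vlan"], frame["dst_mac"]))
             if port is not None:
                 return [(port, frame)]      # destination VM is local
             return [(EDGE_UPLINK, frame)]   # default entry: edge switch

         def from_uplink(self, frame):
             ltid = frame["vlan"]            # LTID-d, set by the egress side
             if frame["dst_mac"] == "ff:ff:ff:ff:ff:ff":
                 # Broadcast: all local VMs in the same virtual network.
                 return [(p, frame)
                         for p, l in self.port_to_ltid.items() if l == ltid]
             port = self.fib.get((ltid, frame["dst_mac"]))
             return [(port, frame)] if port is not None else []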

2.2. Edge Switches

   Edge switches bear most of the intelligence of the data plane in
   SVDC.  They are responsible for rewriting the VLAN field in the
   inner Ethernet header and for encapsulating/decapsulating the
   original Ethernet packets.

   Every ingress edge switch maintains a unicast encapsulation table
   that maps (in-port, LTID-s, VM-d) to (LTID-d, ES-d, p-ID), where
   in-port is the incoming port of the packet, LTID-s is the LTID of
   the virtual network on the source server, VM-d is the MAC address
   of the destination VM in the original Ethernet header, LTID-d is
   the LTID of the virtual network on the destination server, ES-d is
   the MAC address of the egress edge switch, and p-ID is the outgoing
   port to which the egress edge switch should forward the packet.  If
   the lookup hits, the ingress edge switch performs the following
   operations.  First, it rewrites LTID-s in the VLAN field of the
   original Ethernet header to LTID-d.  Second, it encapsulates the
   packet by adding an outer Ethernet header, with ES-d as the
   destination MAC address, its own MAC address (ES-s) as the source
   MAC address, and p-ID in the VLAN field.  Third, it forwards the
   encapsulated packet by looking up the forwarding table.  If the
   lookup fails, the ingress edge switch directs the packet, together
   with its incoming port, to the SVDC controller, which helps the
   controller obtain the information required to install an entry in
   the unicast encapsulation table.

   A multicast encapsulation table is also maintained, which maps the
   tuple (in-port, LTID-s, Group-L) to the global multicast group
   address Group-G to be filled into the outer Ethernet header.  If
   the lookup hits, the switch encapsulates the multicast/broadcast
   packet with Group-G as the destination MAC address and VLAN ID, and
   ES-s as the source MAC address.  If the lookup misses, it sends the
   packet to the SVDC controller to update the multicast encapsulation
   table.

   Since the VMs of a certain group can have different LTIDs on
   different servers, egress edge switches should rewrite the LTID in
   the inner Ethernet header of each packet duplication destined to a
   different server.  Thus, every egress edge switch maintains a
   multicast decapsulation table, which maps Group-G to multiple
   (Out-PORT, LTID-d) tuples, where Out-PORT is an output port of a
   multicast/broadcast packet duplication and LTID-d is the LTID of
   the virtual network on the destination server connected to that
   Out-PORT.  Entries in this table are inserted by the SVDC
   controller when the multicast group join message sent by a VM is
   directed to it.  When an egress edge switch receives a multicast
   packet, it first duplicates the packet as many times as the number
   of (Out-PORT, LTID-d) tuples.  Then, it decapsulates each packet
   duplication, rewrites the LTID in the inner Ethernet header of each
   duplication as indicated by LTID-d, and sends each duplication
   towards the destination server as indicated by the Out-PORT.
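
   As a non-normative illustration of the ingress-side unicast
   processing above, the following sketch models the unicast
   encapsulation table as a Python dictionary.  All names
   (IngressEdgeSwitch, packet_in) are hypothetical; in a real
   deployment, a table miss would be signaled to the controller
   through an SDN protocol such as [OPENFLOW].

     # Hypothetical sketch of the Section 2.2 unicast encapsulation.
     class IngressEdgeSwitch:
         def __init__(self, es_mac, controller):
             self.es_mac = es_mac          # ES-s, this switch's MAC
             self.controller = controller
             # (in-port, LTID-s, VM-d) -> (LTID-d, ES-d, p-ID)
             self.unicast_encap = {}

         def handle_unicast(self, in_port, inner):
             key = (in_port, inner["vlan"], inner["dst_mac"])
             entry = self.unicast_encap.get(key)
             if entry is None:
                 # Miss: direct the packet and its incoming port to the
                 # SVDC controller, which installs the missing entry.
                 self.controller.packet_in(self, in_port, inner)
                 return None
             ltid_d, es_d, p_id = entry
             inner["vlan"] = ltid_d            # rewrite LTID-s -> LTID-d
             return {"dst_mac": es_d,          # egress edge switch
                     "src_mac": self.es_mac,   # ingress edge switch
                     "vlan": p_id,             # egress outgoing port
                     "payload": inner}         # then forwarded via the FIB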

2.3. SVDC Controller

   The SVDC controller keeps several groups of mapping tables based on
   its global knowledge of the network.

   - LT-GT MAP: (SID, LTID) is mapped to GTID.
     This map identifies the global identifier of a virtual network
     based on a physical server identifier and the local virtual
     network identifier on that server.

   - VM-LT MAP: (GTID, VMAC) is mapped to (SID, LTID).
     Based on the global identifier of a virtual network and a certain
     MAC address, this map uniquely identifies the physical server a
     VM resides on, as well as the local identifier of the virtual
     network on that server.

   - SID-ES MAP: (EID, port) is mapped to SID, and vice versa.
     This mapping table can be obtained directly from the network
     topology; it identifies the server connected to a certain port of
     an edge switch, or vice versa.

   - GL-GG MAP: (GTID, Group-L) is mapped to Group-G.
     This map maps a multicast group or broadcast address within a
     virtual network to its global multicast group address.

   The main function of the SVDC controller is to respond to requests
   from edge switches with the information they need, which helps
   install the encapsulation/decapsulation table entries in the
   ingress/egress edge switches.  When an ingress edge switch receives
   the first packet of a flow, it directs the packet to the controller
   together with the incoming port of the packet and queries the
   controller for the required information.

   If it is a unicast data packet, the controller first uses the
   SID-ES MAP to get the SID of the source server.  From the source
   server's SID and the LTID in the original packet, the controller
   then identifies the GTID of the virtual network via the LT-GT MAP.
   Based on the GTID and the destination MAC address of the original
   packet, the controller uses the VM-LT MAP to further identify the
   destination SID and the LTID of the virtual network on the
   destination server.  Finally, the controller relies on the SID-ES
   MAP again to get the MAC address of the egress edge switch as well
   as the port of the egress edge switch connected to the destination
   server.  Now the SVDC controller can return all the information the
   ingress edge switch needs to construct a unicast encapsulation
   table entry.

   If it is a multicast data packet, the controller uses the SID-ES
   MAP and the LT-GT MAP sequentially to get the GTID of the virtual
   network, as described above.  Then, if the controller can find a
   corresponding entry in the GL-GG MAP, it returns the Group-G to the
   ingress switch to build the multicast encapsulation table entry.
   If not, it finds an available global multicast group address
   Group-G, inserts a new entry into the GL-GG MAP, and returns the
   new Group-G to the ingress edge switch.

   If it is a multicast group join request, the SVDC controller first
   gets the GTID of the virtual network by using the SID-ES MAP and
   the LT-GT MAP sequentially.  Then, it looks up the GL-GG MAP to
   find the corresponding Group-G.  If the SVDC controller finds one,
   it simply responds to the edge switch with this information.  If
   not, the SVDC controller finds an available Group-G and inserts a
   new entry into the GL-GG MAP before responding to the edge switch.
   After the edge switch gets the Group-G from the SVDC controller, it
   inserts a new entry into the multicast decapsulation table, with
   Out-PORT set to the incoming port of the multicast group join
   request and LTID-d set to its LTID.

   If the cloud provider's layer-2 data center networks are
   geographically distributed across the Internet, the SVDC controller
   needs to maintain the information of all the data center networks
   of this cloud provider.  In practice, each data center network has
   its own controller, and the global information is synchronized
   among the controllers periodically.
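
   The following non-normative Python sketch shows how the four
   mapping tables above can be chained to resolve a unicast table
   miss.  The dictionary-based tables and all names are hypothetical;
   they only make the lookup sequence of this section concrete.

     # Hypothetical sketch of the Section 2.3 unicast resolution.
     class SvdcController:
         def __init__(self):
             self.lt_gt = {}        # LT-GT MAP:  (SID, LTID) -> GTID
             self.vm_lt = {}        # VM-LT MAP:  (GTID, VMAC) -> (SID, LTID)
             self.port_to_sid = {}  # SID-ES MAP: (EID, port) -> SID
             self.sid_to_port = {}  # SID-ES MAP: SID -> (EID, port)

         def resolve_unicast(self, eid_s, in_port, ltid_s, vm_d):
             sid_s = self.port_to_sid[(eid_s, in_port)]  # source server
             gtid = self.lt_gt[(sid_s, ltid_s)]          # global VN id
             sid_d, ltid_d = self.vm_lt[(gtid, vm_d)]    # dest server/LTID
             es_d, p_id = self.sid_to_port[sid_d]        # egress switch/port
             # Returned to the ingress edge switch, which installs
             # (in-port, LTID-s, VM-d) -> (LTID-d, ES-d, p-ID).
             return ltid_d, es_d, p_id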

3. Packet Forwarding

3.1. Unicast Traffic

   When a unicast packet is generated by a VM and sent out to the
   local virtual switch, it carries the destination MAC address (VM-d)
   and the source MAC address (VM-s), and leaves the VLAN field empty.

   The virtual switch then writes the local LTID (LTID-s) into the
   VLAN field of the packet and looks up the local FIB table for
   forwarding.

   If the destination VM is within the local server, the packet is
   directly forwarded to it.  Otherwise, the packet is delivered to
   the ingress edge switch ES-s.

   Next, the ingress edge switch ES-s looks up its encapsulation table
   using (in-port, LTID-s, VM-d) as the key.  On a miss, the ingress
   edge switch directs the packet to the controller, and the
   controller installs the encapsulation entry for the flow.  On a
   hit, the ingress edge switch obtains the tuple (LTID-d, ES-d,
   p-ID).  The VLAN field of the original Ethernet header is then
   changed from LTID-s to LTID-d, and an outer Ethernet header is
   added.  The ingress edge switch immediately looks up the FIB table
   to forward the packet.

   After that, the packet is delivered by the core switches towards
   the egress edge switch ES-d.  The egress edge switch reads p-ID
   from the VLAN field of the outer Ethernet header, decapsulates the
   outer Ethernet header, and forwards the packet to port p-ID.

   Finally, the packet arrives at the destination virtual switch.  The
   virtual switch looks up the FIB table based on LTID-d and VM-d, and
   delivers the packet to the destination VM.

3.2. Multicast/Broadcast Traffic

   When a VM generates a multicast packet, the destination address
   field of the Ethernet header is filled with the layer-2 multicast
   group address, denoted as Group-L.  This packet then goes to the
   virtual switch, which inserts LTID-s into the VLAN field and
   forwards it towards the ingress edge switch.

   The ingress edge switch ES-s looks up its multicast encapsulation
   table using (in-port, LTID-s, Group-L) as the key.  On a miss, the
   ingress edge switch directs the packet to the controller.  The
   controller then installs the multicast encapsulation entry into the
   ingress edge switch and the multicast decapsulation entries into
   the egress edge switches.  On a hit, the ingress edge switch gets
   the global multicast group address Group-G to fill into the outer
   Ethernet header.

   This packet is then forwarded towards the egress edge switches
   along the multicast tree.  When an egress edge switch receives the
   packet, it takes the Group-G carried in the outer Ethernet header
   as the key and gets multiple (Out-PORT, LTID-d) tuples.  It then
   duplicates the packet as many times as the number of tuples,
   decapsulates each duplication, rewrites its LTID, and forwards it
   towards the Out-PORT.

   Finally, the packet arrives at the destination virtual switch and
   is forwarded towards the VMs that have joined the multicast group
   in the virtual network.
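
   The egress-side multicast behavior above can be summarized by the
   following non-normative Python sketch, which performs one
   duplication per (Out-PORT, LTID-d) tuple.  The dict-based frame
   model and names are hypothetical.

     # Hypothetical sketch of the Section 3.2 egress multicast handling.
     import copy

     class EgressEdgeSwitch:
         def __init__(self):
             # Group-G -> list of (Out-PORT, LTID-d) tuples
             self.mcast_decap = {}

         def handle_multicast(self, outer):
             # Group-G spans the outer destination MAC (48 bits, of
             # which 23 vary) plus the outer 12-bit VLAN field.
             group_g = (outer["dst_mac"], outer["vlan"])
             out = []
             for out_port, ltid_d in self.mcast_decap.get(group_g, []):
                 inner = copy.deepcopy(outer["payload"])  # decapsulate
                 inner["vlan"] = ltid_d  # rewrite LTID for that server
                 out.append((out_port, inner))
             return out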

3.3. SVDC Frame Format

   To mask the overlapping VM MAC addresses and to mitigate the
   limitation of the forwarding table size in switches, SVDC enhances
   MAC-in-MAC encapsulation to guarantee correct packet forwarding.
   Figure 2 shows the packet format of the MAC-in-MAC encapsulation
   used in SVDC.

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Outer Destination MAC Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype = SVDC Ethertype    |    Outer.VLAN Tag (p-ID)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Inner Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Inner Destination MAC Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype = C-Tag [802.1q]    |    Inner.VLAN Tag (LTID)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                  Original Ethernet Payload                    |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 New FCS (Frame Check Sequence)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 2  SVDC MAC-in-MAC Packet Format

   The outer Ethernet header: The source Ethernet address in the outer
   Ethernet header is set to the MAC address of the ingress edge
   switch.  The destination Ethernet address is either set to the MAC
   address of the egress edge switch (for unicast traffic) or set to
   the first 48 bits of the Group-G assigned to the group (for
   multicast/broadcast traffic).  To distinguish SVDC packets, the
   Ethertype of the outer Ethernet header is set to a specific SVDC
   Ethertype.  The outer VLAN field indicates either the egress port
   of the packet on the egress edge switch (for unicast traffic) or
   the last 12 bits of the Group-G (for multicast/broadcast traffic).

   The inner Ethernet header: The source and destination Ethernet
   addresses in the inner Ethernet header are set to the MAC addresses
   of the source and destination VMs, respectively.  The value of the
   VLAN tag indicates the LTID of the virtual network this packet
   belongs to on the destination server.  The payload of the inner
   Ethernet header consists of the Ethertype of the original payload
   and the original Ethernet payload.
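
   The following non-normative Python sketch packs the headers of
   Figure 2 into bytes.  Since the SVDC Ethertype has not been
   assigned (see Section 6), SVDC_ETHERTYPE below is only a
   placeholder value, and the function name is hypothetical.

     # Hypothetical packing of the Figure 2 frame layout.
     import struct

     SVDC_ETHERTYPE = 0xFFFF   # placeholder; a real value needs assignment
     CTAG_ETHERTYPE = 0x8100   # 802.1Q C-Tag

     def svdc_encapsulate(es_d, es_s, p_id, vm_d, vm_s, ltid,
                          orig_ethertype, payload):
         """MAC arguments are 6-byte values; p_id and ltid fit in the
         12-bit VLAN tags (PCP/DEI bits are left zero here)."""
         outer = struct.pack("!6s6sHH", es_d, es_s,
                             SVDC_ETHERTYPE, p_id & 0x0FFF)
         inner = struct.pack("!6s6sHH", vm_d, vm_s,
                             CTAG_ETHERTYPE, ltid & 0x0FFF)
         # The new FCS is computed and appended by hardware; omitted.
         return outer + inner + struct.pack("!H", orig_ethertype) + payload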

4. SVDC Deployment Considerations

4.1. VM Migration

   To handle VM migration, a central VM manager that can communicate
   with all hosts needs to be deployed in the network.  The SVDC
   controller needs to be co-located with this central VM manager.  In
   this scenario, when a VM is about to migrate, the VM manager
   notifies the SVDC controller of the destination server ID, the IP
   address, and the GTID of this VM.

   The SVDC controller needs to check whether an LTID has been
   assigned to the virtual network of this VM on the destination
   server before the VM migration starts.  If not, an LTID is created
   and the virtual switch on the destination server is configured.

   After the VM migration completes, a gratuitous ARP message is sent
   from the destination server to announce the new location of the VM.
   This ARP message is directed to the SVDC controller for a broadcast
   entry query when it arrives at the edge switch.  In this way, SVDC
   can confirm the completion of the VM migration and update the
   location information of this VM in its mapping tables.

   To maintain the communication state destined for the migrated VM in
   the edge switches, the SVDC controller broadcasts an entry update
   message to all edge switches immediately after it receives the
   gratuitous ARP message.  This message contains the (LTID, ES, p-ID)
   tuple the migrated VM uses after migration.  All edge switches that
   maintain encapsulation table entries towards the migrated VM update
   their encapsulation tables and thereby keep the communication state
   towards the migrated VM.  The gratuitous ARP message is then sent
   to the VMs within the same virtual network to update their ARP
   tables.

4.2. Fault Tolerance

   An important aspect of a large virtualized data center network is
   the increased likelihood of failures.  SVDC tolerates server
   failures as well as edge switch failures, because no "hard state"
   is associated with a specific virtual switch or edge switch.  In a
   large virtualized data center, it is reasonable to assume that
   there are virtual network and physical network management systems
   responsible for detecting failed virtual switches or edge switches.

   However, it is necessary for SVDC to handle failures of controller
   instances or of the control links between controller instances and
   edge switches.  To handle failures of controller instances, more
   than one controller instance can be used to manage each network
   element.  All controller instances synchronize network information
   periodically.  They can work in hot-backup or cold-backup mode.
   When one controller instance fails, another instance can replace it
   in time.  To handle failures of control links, traditional
   fault-tolerant routing protocols, e.g., the Spanning Tree Protocol
   [802.1D], can be applied in an out-of-band management network
   deployment.  For an in-band management network deployment, we
   assume the layer-2 routing scheme in the core network can take
   responsibility for handling link failures.

5. Security Considerations

   Since SVDC enhances the MAC-in-MAC technique to implement network
   virtualization, it faces several security challenges that a
   traditional Ethernet network also faces, such as layer-2 traffic
   snooping, packet flooding causing denial-of-service attacks, and
   MAC address spoofing.  In SVDC, a malicious end-point can choose to
   attack the SVDC controller by forging a great number of
   communication requests with different source and destination pairs,
   or can hijack the MAC address of an edge switch to interfere with
   the normal communication between the SVDC controller and the edge
   switches.

   Traditional layer-2 techniques can be deployed in SVDC to handle
   these problems; for example, the IEEE 802.1 port-based admission
   control mechanism [802.1X] can be used to mitigate the spoofing
   problem.  The security of the communication channel between the
   edge switches and the SVDC controller relies on security mechanisms
   in the transport layer.

6. IANA Considerations

   This document has no actions for IANA, but SVDC needs to be
   assigned a new Ethertype.

7. References

7.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2. Informative References

   [802.1aq]  IEEE, "Standard for Local and Metropolitan Area Networks
              -- Media Access Control (MAC) Bridges and Virtual
              Bridged Local Area Networks -- Amendment 20: Shortest
              Path Bridging", IEEE P802.1aq-2012, 2012.

   [802.1D]   IEEE, "Draft Standard for Local and Metropolitan Area
              Networks / Media Access Control (MAC) Bridges", IEEE
              P802.1D-2004, 2004.

   [802.1q]   IEEE, "Standards for Local and Metropolitan Area
              Networks: Virtual Bridged Local Area Networks", IEEE
              Standard 802.1Q, 2005 Edition, May 2006.

   [802.1X]   IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks -- Port-Based Network Access Control", IEEE Std
              802.1X-2010, February 2010.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN
              Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [RFC6325]  Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, July 2011.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., and P. Agarwal,
              "Virtual eXtensible Local Area Network (VXLAN): A
              Framework for Overlaying Virtualized Layer 2 Networks
              over Layer 3 Networks", RFC 7348, August 2014.

   [NVGRE]    Sridharan, M., Greenberg, A., Venkataramiah, N., Wang,
              Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler,
              P., and C. Tumuluri, "NVGRE: Network Virtualization
              Using Generic Routing Encapsulation", Work in Progress,
              April 2015.

   [SDN]      Open Networking Foundation, "Software-Defined
              Networking: The New Norm for Networks", ONF White Paper,
              April 2012.

   [OPENFLOW] McKeown, N., Anderson, T., Balakrishnan, H., Parulkar,
              G., Peterson, L., Rexford, J., Shenker, S., and J.
              Turner, "OpenFlow: Enabling Innovation in Campus
              Networks", OpenFlow White Paper,
              http://www.openflowswitch.org, 2008.

Authors' Addresses

   Congjie Chen
   4-104, FIT Building,
   Tsinghua University,
   Hai Dian District,
   Beijing, China

   EMail: ccjguangzhou@gmail.com

   Dan Li
   4-104, FIT Building,
   Tsinghua University,
   Hai Dian District,
   Beijing, China

   EMail: tolidan@tsinghua.edu.cn

   Jun Li
   Network and Security Research Laboratory,
   Department of Computer and Information Science,
   University of Oregon,
   1585 E 13th Ave.
   Eugene, OR 97403

   EMail: lijun@cs.uoregon.edu