Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                      IBM
Intended status: Informational                                 D. Black
Expires: March 9, 2013                                              EMC
                                                                D. Dutt
                                                                L. Fang
                                                          Cisco Systems
                                                           E. Gray, Ed.
                                                               Ericsson
                                                             L. Kreeger
                                                                  Cisco
                                                           M. Napierala
                                                                   AT&T
                                                           M. Sridharan
                                                              Microsoft
                                                      September 5, 2012

         Problem Statement: Overlays for Network Virtualization
              draft-ietf-nvo3-overlay-problem-statement-00

Abstract

   This document describes issues associated with providing multi-
   tenancy in large data center networks and the need for an overlay-
   based network virtualization approach to address them.  A key
   multi-tenancy requirement is traffic isolation, so that one
   tenant's traffic is not visible to any other tenant.  This
   isolation can be achieved by assigning one or more virtual networks
   to each tenant such that traffic within a virtual network is
   isolated from traffic in other virtual networks.  The primary
   functionality required is provisioning virtual networks,
   associating a virtual machine's virtual network interface(s) with
   the appropriate virtual network, and maintaining that association
   as the virtual machine is activated, migrated, and/or deactivated.
   Use of an overlay-based approach enables scalable deployment on
   large network infrastructures.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on March 9, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Areas
     2.1.  Need For Dynamic Provisioning
     2.2.  Virtual Machine Mobility Limitations
     2.3.  Inadequate Forwarding Table Sizes in Switches
     2.4.  Need to Decouple Logical and Physical Configuration
     2.5.  Need For Address Separation Between Tenants
     2.6.  Need For Address Separation Between Tenant and
           Infrastructure
     2.7.  IEEE 802.1 VLAN Limitations
   3.  Network Overlays
     3.1.  Benefits of Network Overlays
     3.2.  Communication Between Virtual and Traditional Networks
     3.3.  Communication Between Virtual Networks
     3.4.  Overlay Design Characteristics
     3.5.  Overlay Networking Work Areas
   4.  Related IETF and IEEE Work
     4.1.  L3 BGP/MPLS IP VPNs
     4.2.  L2 BGP/MPLS IP VPNs
     4.3.  IEEE 802.1aq - Shortest Path Bridging
     4.4.  ARMD
     4.5.  TRILL
     4.6.  L2VPNs
     4.7.  Proxy Mobile IP
     4.8.  LISP
   5.  Further Work
   6.  Summary
   7.  Acknowledgments
   8.  IANA Considerations
   9.  Security Considerations
   10. Informative References
   Appendix A.  Change Log
     A.1.  Changes from draft-narten-nvo3-overlay-problem-statement-04.txt
   Authors' Addresses

1.  Introduction

   Data centers are increasingly being consolidated and outsourced in
   an effort both to improve application deployment time and to reduce
   operational costs.  This coincides with an increasing demand for
   compute, storage, and network resources from applications.  In
   order to scale compute, storage, and network resources, physical
   resources are being abstracted from their logical representation,
   in what is referred to as server, storage, and network
   virtualization.  Virtualization can be implemented in various
   layers of computer systems or networks.

   The demand for server virtualization is increasing in data centers.
   With server virtualization, each physical server supports multiple
   virtual machines (VMs), each running its own operating system,
   middleware and applications.
   Virtualization is a key enabler of workload agility, i.e., allowing
   any server to host any application and providing the flexibility to
   add, shrink, or move services within the physical infrastructure.
   Server virtualization provides numerous benefits, including higher
   utilization, increased security, reduced user downtime, and reduced
   power usage.

   Multi-tenant data centers take advantage of the benefits of server
   virtualization to provide a new kind of hosting: a virtual hosted
   data center.  In a multi-tenant data center, individual tenants may
   belong to different companies (in the case of a public provider) or
   to different departments (in the case of an internal company data
   center).  Each tenant has the expectation of a level of security
   and privacy separating its resources from those of other tenants.
   For example, one tenant's traffic must never be exposed to another
   tenant, except through carefully controlled interfaces, such as a
   security gateway.

   To a tenant, virtual data centers are similar to their physical
   counterparts, consisting of end stations attached to a network,
   complete with services such as load balancers and firewalls.  But
   unlike a physical data center, end stations connect to a virtual
   network.  To end stations, a virtual network looks like a normal
   network (e.g., providing an Ethernet or L3 service), except that
   the only end stations connected to it are those belonging to the
   tenant's specific virtual network.

   A tenant is the administrative entity that is responsible for and
   manages a specific virtual network instance and its associated
   services (whether virtual or physical).  In a cloud environment, a
   tenant would correspond to the customer that has defined and is
   using a particular virtual network.
However, a tenant may also find it
   useful to create multiple different virtual network instances.

   Hence, there is a one-to-many mapping between tenants and virtual
   network instances.  A single tenant may operate multiple individual
   virtual network instances, each associated with a different
   service.

   How a virtual network is implemented does not generally matter to
   the tenant; what matters is that the service provided (L2 or L3)
   has the right semantics, performance, etc.  It could be implemented
   via a pure routed network, a pure bridged network, or a combination
   of bridged and routed networks.  A key requirement is that each
   individual virtual network instance be isolated from other virtual
   network instances.

   For data center virtualization, two key issues must be addressed.
   First, address space separation between tenants must be supported.
   Second, it must be possible to place (and migrate) VMs anywhere in
   the data center, without restricting VM addressing to match the
   subnet boundaries of the underlying data center network.

   This document outlines the problems encountered in scaling the
   number of isolated networks in a data center, as well as the
   problems of managing the creation/deletion, membership, and span of
   these networks.  It makes the case that an overlay-based approach,
   in which individual networks are implemented as virtual networks
   dynamically controlled by a standardized control plane, provides a
   number of advantages over current approaches.  The purpose of this
   document is to identify the set of problems that any solution must
   address in building multi-tenant data centers, with the goal of
   enabling standardized, interoperable implementations of such data
   centers.

   Section 2 describes the problem space in detail.
Section 3 describes
   overlay networks in more detail.  Sections 4 and 5 review related
   and further work, while Section 6 closes with a summary.

2.  Problem Areas

   The following subsections describe aspects of multi-tenant data
   center networking that pose problems for network infrastructure.
   Different problem aspects may arise based on the network
   architecture and scale.

2.1.  Need For Dynamic Provisioning

   Cloud computing involves on-demand provisioning of resources for
   multi-tenant environments.  A common example of cloud computing is
   the public cloud, where a cloud service provider offers elastic
   services to multiple customers over the same infrastructure.  In
   current systems, it can be difficult to provision resources for
   individual tenants in such a way that provisioned properties
   migrate automatically when services are dynamically moved around
   within the data center to optimize workloads.

2.2.  Virtual Machine Mobility Limitations

   A key benefit of server virtualization is virtual machine (VM)
   mobility.  A VM can be migrated from one server to another, live,
   i.e., while continuing to run and without needing to shut it down
   and restart it at the new location.  A key requirement for live
   migration is that a VM retain critical network state at its new
   location, including its IP and MAC address(es).  Preservation of
   MAC addresses may be necessary, for example, when software licenses
   are bound to MAC addresses.  More generally, any change in the VM's
   MAC addresses resulting from a move would be visible to the VM and
   thus potentially result in unexpected disruptions.  Retaining IP
   addresses after a move is necessary to prevent existing transport
   connections (e.g., TCP) from breaking and needing to be restarted.
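   As a toy illustration of the constraint just described, the sketch
   below models location-derived addressing; the rack names and
   subnets are hypothetical assumptions, not taken from this document:

```python
import ipaddress

# Hypothetical scheme: each Top of Rack (ToR) switch owns one IP
# subnet, and servers/VMs draw their addresses from their rack's
# subnet.
rack_subnets = {
    "tor-1": ipaddress.ip_network("10.1.1.0/24"),
    "tor-2": ipaddress.ip_network("10.1.2.0/24"),
}

def can_live_migrate(vm_ip: str, dest_rack: str) -> bool:
    """A live-migrated VM must keep its IP address (or existing TCP
    connections break), so the move only works if the destination
    rack's subnet still contains that address."""
    return ipaddress.ip_address(vm_ip) in rack_subnets[dest_rack]

# A VM addressed out of tor-1's subnet can stay within that rack...
print(can_live_migrate("10.1.1.23", "tor-1"))  # True
# ...but cannot move behind tor-2 without renumbering.
print(can_live_migrate("10.1.1.23", "tor-2"))  # False
```

   An overlay removes this restriction by decoupling the VM's
   addresses from the underlay subnets, as discussed in Section 3.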
   In traditional data centers, servers are assigned IP addresses
   based on their physical location, for example based on the Top of
   Rack (ToR) switch for the server rack or the VLAN configured to the
   server.  Servers can only move to other locations within the same
   IP subnet.  This constraint is not problematic for physical
   servers, which move infrequently, but it restricts the placement
   and movement of VMs within the data center.  Any solution for a
   scalable multi-tenant data center must allow a VM to be placed (or
   moved) anywhere within the data center, without being constrained
   by the subnet boundaries of the host servers' network.

2.3.  Inadequate Forwarding Table Sizes in Switches

   Today's virtualized environments place additional demands on the
   forwarding tables of switches in the physical infrastructure.
   Instead of just one link-layer address per server, the switching
   infrastructure has to learn the addresses of the individual VMs
   (which could number in the hundreds per server), since traffic
   between the VMs and the rest of the network traverses the physical
   network infrastructure.  This places a much larger demand on the
   switches' forwarding table capacity than non-virtualized
   environments do, causing more traffic to be flooded or dropped when
   the number of addresses in use exceeds a switch's forwarding table
   capacity.

2.4.  Need to Decouple Logical and Physical Configuration

   Data center operators must be able to achieve high utilization of
   server and network capacity.  For efficient and flexible
   allocation, operators should be able to spread a virtual network
   instance across servers in any rack in the data center.  It should
   also be possible to migrate compute workloads to any server
   anywhere in the network while retaining the workload's addresses.
In networks using VLANs,
   moving servers elsewhere in the network may require expanding the
   scope of the VLAN beyond its original boundaries.  While this can
   be done, it requires potentially complex network configuration
   changes and can conflict with the desire to bound the size of
   broadcast domains, especially in larger data centers.

   However, in order to limit the broadcast domain of each VLAN,
   multi-destination frames within a VLAN should optimally flow only
   to those devices that have that VLAN configured.  When workloads
   migrate, the physical network (e.g., access lists) may need to be
   reconfigured, which is typically time consuming and error prone.

   An important use case is cross-pod expansion.  A pod typically
   consists of one or more racks of servers with associated network
   and storage connectivity.  A tenant's virtual network may start off
   on one pod and, due to expansion, come to require servers/VMs on
   other pods, especially when those other pods are not fully
   utilizing all their resources.  This use case requires that virtual
   networks span multiple pods in order to provide connectivity to all
   of a tenant's servers/VMs.  Such expansion can be difficult to
   achieve when tenant addressing is tied to the addressing used by
   the underlay network or when it requires that the scope of the
   underlying L2 VLAN expand beyond its original pod boundary.

2.5.  Need For Address Separation Between Tenants

   Individual tenants need control over the addresses they use within
   a virtual network.  But it can be problematic when different
   tenants want to use the same addresses, or even when the same
   tenant wants to reuse the same addresses in different virtual
   networks.  Consequently, virtual networks must allow tenants to use
   whatever addresses they want without concern for what addresses are
   being used by other tenants or other virtual networks.

2.6.  Need For Address Separation Between Tenant and Infrastructure

   As in the previous case, a tenant needs to be able to use whatever
   addresses it wants in a virtual network, independent of what
   addresses the underlying data center network is using.  Tenants
   (and the underlay infrastructure provider) should be able to use
   whatever addresses make sense for them, without having to worry
   about collisions between addresses used by tenants and those used
   by the underlay data center network.

2.7.  IEEE 802.1 VLAN Limitations

   VLANs are a well-known construct in the networking industry,
   providing an L2 service via an L2 underlay.  A VLAN is an L2
   bridging construct that provides some of the semantics of virtual
   networks mentioned above: a MAC address is unique within a VLAN,
   but not necessarily across VLANs.  Traffic sourced within a VLAN
   (including broadcast and multicast traffic) remains within the VLAN
   it originates from.  Traffic forwarded from one VLAN to another
   typically involves router (L3) processing.  The forwarding table
   lookup operation is keyed on {VLAN, MAC address} tuples.

   But there are problems and limitations with L2 VLANs.  VLANs are a
   pure L2 bridging construct, and VLAN identifiers are carried along
   with data frames to allow each forwarding point to know what VLAN a
   frame belongs to.  A VLAN today is defined as a 12-bit number,
   limiting the total number of VLANs to 4096 (though typically this
   number is 4094, since 0 and 4095 are reserved).  Due to the large
   number of tenants that a cloud provider might service, the 4094
   VLAN limit is often inadequate.  In addition, there is often a need
   for multiple VLANs per tenant, which exacerbates the issue.  The
   use of a sufficiently large virtual network identifier (VNID),
   present in the overlay control plane and possibly also in the data
   plane, would eliminate the current size limitations associated with
   single 12-bit VLAN tags.

3.  Network Overlays

   Virtual networks are used to isolate a tenant's traffic from that
   of other tenants (or even traffic within the same tenant that
   requires isolation).  There are two main characteristics of virtual
   networks:

   1.  Virtual networks isolate the address space used in one virtual
       network from the address space used by another virtual
       network.  The same network addresses may be used in different
       virtual networks at the same time.  In addition, the address
       space used by a virtual network is independent from that used
       by the underlying physical network.

   2.  Virtual networks limit the scope of packets sent on the
       virtual network.  Packets sent by end systems attached to a
       virtual network are delivered as expected to other end systems
       on that virtual network and may exit a virtual network only
       through controlled exit points, such as a security gateway.
       Likewise, packets sourced from outside of the virtual network
       may enter the virtual network only through controlled entry
       points, such as a security gateway.

3.1.  Benefits of Network Overlays

   To address the problems described in Section 2, a network overlay
   model can be used.

   The idea behind an overlay is quite straightforward.  Each virtual
   network instance is implemented as an overlay.  The original packet
   is encapsulated by the first-hop network device.  The encapsulation
   identifies the device that will perform the decapsulation before
   delivering the original packet to the endpoint.  The rest of the
   network forwards the packet based on the encapsulation header and
   can be oblivious to the payload that is carried inside.

   Overlays are based on what is commonly known as a "map-and-encap"
   architecture.  There are three distinct and logically separable
   steps:

   1.  The first-hop overlay device implements a mapping operation
       that determines where the encapsulated packet should be sent
       to reach its intended destination VM.  Specifically, the
       mapping function maps the destination address (either L2 or
       L3) of a packet received from a VM into the corresponding
       destination address of the egress device.  The destination
       address will be the underlay address of the device doing the
       decapsulation and is an IP address.

   2.  Once the mapping has been determined, the ingress overlay
       device encapsulates the received packet within an overlay
       header.

   3.  The final step is to actually forward the (now encapsulated)
       packet to its destination.  The packet is forwarded by the
       underlay (i.e., the IP network) based entirely on its outer
       address.  Upon receipt at the destination, the egress overlay
       device decapsulates the original packet and delivers it to the
       intended recipient VM.

   Each of the above steps is logically distinct, though an
   implementation might combine them for efficiency or other reasons.
   It should be noted that in L3 BGP/VPN terminology, the above steps
   are commonly known as "forwarding" or "virtual forwarding".

   The first-hop network device can be a traditional switch or router,
   or the virtual switch residing inside a hypervisor.  Furthermore,
   the endpoint can be a VM or a physical server.  Examples of
   architectures based on network overlays include BGP/MPLS VPNs
   [RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path
   Bridging (SPB-M) [SPBM].

   In the data plane, a virtual network identifier (VNID), or a
   locally significant identifier, can be carried as part of the
   overlay header so that every data packet explicitly identifies the
   specific virtual network the packet belongs to.
Since both routed and
   bridged semantics can be supported by a virtual data center, the
   original packet carried within the overlay header can be an
   Ethernet frame complete with MAC addresses or just the IP packet.

   The use of a sufficiently large VNID would address current VLAN
   limitations associated with single 12-bit VLAN tags.  This VNID can
   be carried in the control plane.  In the data plane, an overlay
   header provides a place to carry either the VNID or an identifier
   that is locally significant to the edge device.  In both cases, the
   identifier in the overlay header specifies which virtual network
   the data packet belongs to.

   A key aspect of overlays is the decoupling of the "virtual" MAC
   and/or IP addresses used by VMs from the physical network
   infrastructure and the infrastructure IP addresses used by the data
   center.  If a VM changes location, the overlay edge devices simply
   update their mapping tables to reflect the new location of the VM
   within the data center's infrastructure space.  Because an overlay
   network is used, a VM can now be located anywhere in the data
   center that the overlay reaches, without regard to traditional
   constraints implied by L2 properties such as VLAN numbering or the
   span of an L2 broadcast domain scoped to a single pod or access
   switch.

   Multi-tenancy is supported by isolating the traffic of one virtual
   network instance from traffic of another.  Traffic from one virtual
   network instance cannot be delivered to another instance without
   (conceptually) exiting the instance and entering the other instance
   via an entity that has connectivity to both virtual network
   instances.  Without the existence of this entity, tenant traffic
   remains isolated within each individual virtual network instance.
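   The map-and-encap steps described in Section 3.1 can be sketched as
   follows; the table contents, field names, and VNID value are
   illustrative assumptions, not part of any specific NVO3 protocol:

```python
# Ingress NVE mapping table: (VNID, tenant destination address) ->
# underlay IP address of the egress NVE (illustrative entries).
mapping_table = {
    (5001, "10.0.0.2"): "192.0.2.20",
}

def map_and_encap(vnid, tenant_dst, inner_packet):
    """Steps 1 and 2: map the tenant destination to the egress NVE's
    underlay address, then wrap the original packet in an overlay
    header that carries the VNID."""
    egress_underlay_ip = mapping_table[(vnid, tenant_dst)]
    return {"outer_dst": egress_underlay_ip,  # all the underlay sees
            "vnid": vnid,                     # selects the virtual network
            "payload": inner_packet}          # opaque to the underlay

def decap(overlay_packet):
    """Step 3, at the egress NVE: strip the overlay header and deliver
    the original packet within the virtual network the VNID names."""
    return overlay_packet["vnid"], overlay_packet["payload"]

pkt = map_and_encap(5001, "10.0.0.2", b"original tenant frame")
vnid, inner = decap(pkt)
```

   Note that if the destination VM moves, only the mapping table entry
   changes; the tenant addresses inside the payload are untouched.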
   Overlays are designed to allow a set of VMs to be placed within a
   single virtual network instance, whether that virtual network
   provides a bridged network or a routed network.

3.2.  Communication Between Virtual and Traditional Networks

   Not all communication will be between devices connected to
   virtualized networks.  Devices using overlays will continue to
   access devices and make use of services on traditional, non-
   virtualized networks, whether in the data center, the public
   Internet, or at remote/branch campuses.  Any virtual network
   solution must be capable of interoperating with existing routers,
   VPN services, load balancers, intrusion detection services,
   firewalls, etc. on external networks.

   Communication between devices attached to a virtual network and
   devices connected to non-virtualized networks is handled
   architecturally by having specialized gateway devices that receive
   packets from a virtualized network, decapsulate them, process them
   as regular (i.e., non-virtualized) traffic, and finally forward
   them on to their appropriate destination (and vice versa).
   Additional identification, such as VLAN tags, could be used on the
   non-virtualized side of such a gateway to enable forwarding of
   traffic for multiple virtual networks over a common non-virtualized
   link.

   A wide range of implementation approaches is possible.  Overlay
   gateway functionality could be combined with other network
   functionality into a network device that implements the overlay
   functionality and then forwards traffic between other internal
   components that implement functionality such as full router
   service, load balancing, firewall support, VPN gateway, etc.

3.3.  Communication Between Virtual Networks

   Communication between devices on different virtual networks is
   handled architecturally by adding specialized interconnect
   functionality among the otherwise isolated virtual networks.  For a
   virtual network providing an L2 service, such interconnect
   functionality could be IP forwarding configured as part of the
   "default gateway" for each virtual network.  For a virtual network
   providing an L3 service, the interconnect functionality could be IP
   forwarding configured as part of routing between IP subnets, or it
   could be based on configured inter-virtual-network traffic
   policies.  In both cases, the implementation of the interconnect
   functionality could be distributed across the Network
   Virtualization Edge devices (NVEs) and could be combined with other
   network functionality (e.g., load balancing, firewall support) that
   is applied to traffic forwarded between virtual networks.

3.4.  Overlay Design Characteristics

   Existing layer 2 and layer 3 overlay protocols do not necessarily
   solve all of today's problems in the environment of a highly
   virtualized data center.  Below are some of the characteristics of
   such environments that must be taken into account by an overlay
   technology:

   1.  Highly distributed systems.  The overlay should work in an
       environment where there could be many thousands of access
       devices (e.g., residing within the hypervisors) and many more
       end systems (e.g., VMs) connected to them.  This leads to a
       distributed mapping system that puts a low overhead on the
       overlay tunnel endpoints.

   2.  Many highly distributed virtual networks with sparse
       membership.  Each virtual network could be highly dispersed
       inside the data center.
Also, along with the
       expectation of many virtual networks, the number of end
       systems connected to any one virtual network is expected to be
       relatively low; therefore, the percentage of access devices
       participating in any given virtual network would also be
       expected to be low.  For this reason, efficient delivery of
       multi-destination traffic within a virtual network instance
       should be taken into consideration.

   3.  Highly dynamic end systems.  End systems connected to virtual
       networks can be very dynamic, both in terms of creation/
       deletion/power-on/off and in terms of mobility across the
       access devices.

   4.  Work with existing, widely deployed Ethernet switches and IP
       routers without requiring wholesale replacement.  The first-
       hop device (or end system) that adds and removes the overlay
       header will require new equipment and/or new software.

   5.  Work with existing data center network deployments without
       requiring major changes in operational or other practices.
       For example, some data centers have not enabled multicast
       beyond link-local scope.  Overlays should be capable of
       leveraging underlay multicast support where appropriate, but
       should not require its enablement in order to use an overlay
       solution.

   6.  Network infrastructure administered by a single administrative
       domain.  This is consistent with operation within a data
       center, and not across the Internet.

3.5.  Overlay Networking Work Areas

   There are three specific and separate potential work areas needed
   to realize an overlay solution.  The areas correspond to different
   possible "on-the-wire" protocols, where distinct entities interact
   with each other.

   One area of work concerns the address dissemination protocol an NVE
   uses to build and maintain the mapping tables it uses to deliver
   encapsulated packets to their proper destination.
One approach is
   to build mapping tables entirely via learning (as is done in 802.1
   networks).  But to provide better scaling properties, a more
   sophisticated approach is needed, i.e., the use of a specialized
   control plane protocol.  While there are some advantages to using
   or leveraging an existing protocol for maintaining mapping tables,
   the fact that large numbers of NVEs will likely reside in
   hypervisors places constraints on the resources (CPU and memory)
   that can be dedicated to such functions.

   From an architectural perspective, one can view the address mapping
   dissemination problem as having two distinct and separable
   components.  The first component consists of a back-end "oracle"
   that is responsible for distributing and maintaining the mapping
   information for the entire overlay system.  The second component
   consists of the on-the-wire protocols an NVE uses when interacting
   with the oracle.

   The back-end oracle could provide high performance, high
   resiliency, failover, etc. and could be implemented in
   significantly different ways.  For example, one model uses a
   traditional, centralized "directory-based" database, using
   replicated instances for reliability and failover.  A second model
   involves using and possibly extending an existing routing protocol
   (e.g., BGP, IS-IS, etc.).  To support different architectural
   models, it is useful to have one standard protocol for the
   NVE-oracle interaction while allowing different protocols and
   architectural approaches for the oracle itself.  Separating the two
   allows NVEs to transparently interact with different types of
   oracles, i.e., either of the two architectural models described
   above.
Having separate protocols could also allow for a simplified NVE that only interacts with the oracle for the mapping table entries it needs, and would allow the oracle (and its associated protocols) to evolve independently over time with minimal impact on the NVEs.

A third work area considers the attachment and detachment of VMs (or Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from a specific virtual network instance. When a VM attaches, the Network Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] associates the VM with a specific overlay for the purposes of tunneling traffic sourced from or destined to the VM. When a VM disconnects, it is removed from the overlay and the NVE effectively terminates any tunnels associated with the VM. To achieve this functionality, a standardized interaction between the NVE and hypervisor may be needed, for example in the case where the NVE resides on a separate device from the VM.

In summary, there are three areas of potential work. The first area concerns the oracle itself and any on-the-wire protocols it needs. A second area concerns the interaction between the oracle and NVEs. The third work area concerns protocols associated with attaching and detaching a VM from a particular virtual network instance. All three work areas are important to the development of scalable, interoperable solutions.

4. Related IETF and IEEE Work

The following subsections discuss related IETF and IEEE work in progress; these items are not meant to be complete coverage of all IETF and IEEE data center related work, nor are the descriptions comprehensive. Each area is currently trying to address certain limitations of today's data center networks; e.g., scaling is a common issue for every area listed, and multi-tenancy and VM mobility are important focus areas as well.
Comparing and evaluating the results and progress of each work area listed is out of scope for this document. The intent of this section is to provide a reference for interested readers.

4.1. L3 BGP/MPLS IP VPNs

BGP/MPLS IP VPNs [RFC4364] support multi-tenancy, overlapping tenant addresses, VPN traffic isolation, and address separation between tenants and network infrastructure. The BGP/MPLS control plane is used to distribute the VPN labels, which identify the tenants (or, to be more specific, the particular VPN/VN), and the tenant IP addresses. Deployment of enterprise L3 VPNs has been shown to scale to thousands of VPNs and millions of VPN prefixes. BGP/MPLS IP VPNs are currently deployed in some large enterprise data centers. A potential limitation for deploying BGP/MPLS IP VPNs in data center environments is the practicality of using BGP in the data center, especially reaching into the servers or hypervisors. There may be work force skill-set issues, equipment support issues, and potential new scaling challenges. A combination of BGP and lighter-weight IP signaling protocols, e.g., XMPP, has been proposed to extend the solution into the DC environment [I-D.marques-l3vpn-end-system], while taking advantage of built-in VPN features with their rich policy support; this is especially useful for inter-tenant connectivity.

4.2. L2 BGP/MPLS IP VPNs

Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] provide an emulated L2 service in which each tenant has its own Ethernet network over a common IP or MPLS infrastructure, and a BGP/MPLS control plane is used to distribute the tenant MAC addresses and the MPLS labels that identify the tenants and tenant MAC addresses.
Within the BGP/MPLS control plane, a 32-bit Ethernet Tag is used to identify the broadcast domains (VLANs) associated with a given L2 VLAN service instance, and these Ethernet Tags are mapped to VLAN IDs understood by the tenant at the service edges. This means that the 4096-VLAN limit applies only to an individual tenant service edge, enabling a much higher level of scalability. Interconnection between tenants is also allowed in a controlled fashion.

VM Mobility [I-D.raggarwa-data-center-mobility] introduces the concept of a combined L2/L3 VPN service in order to support the mobility of individual Virtual Machines (VMs) between data centers connected over a common IP or MPLS infrastructure.

4.3. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB-M) [SPBM] is an IS-IS-based overlay for L2 Ethernets. SPB-M supports multi-pathing and addresses a number of shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit I-SID, which can be used to identify virtual network instances. SPB-M is entirely L2 based, extending the L2 Ethernet bridging model.

4.4. ARMD

ARMD is chartered to look at data center scaling issues with a focus on address resolution. ARMD is currently chartered to develop a problem statement and is not currently developing solutions. While an overlay-based approach may address some of the "pain points" that have been raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling tradeoffs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

4.5.
TRILL

TRILL [RFC6325] is an L2-based approach aimed at addressing deficiencies and limitations of current Ethernet networks, and STP in particular. Although it differs from Shortest Path Bridging in many architectural and implementation details, it is similar in that it provides an L2-based service to end systems. TRILL, as defined today, supports only the standard (and limited) 12-bit VLAN model. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.ietf-trill-fine-labeling].

4.6. L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically focused on provider-provisioned L2 VPNs, where the service provider participates in management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches are intended to be used within data centers, where the overlay network is managed by the data center operator rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center.

Other L2VPN approaches, such as L2TP [RFC2661], require significant tunnel state at the encapsulating and decapsulating end points. Overlays require less tunnel state than such approaches, which is important to allow overlays to scale to hundreds of thousands of end points. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the adjacent devices to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.

4.7.
Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

4.8. LISP

LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay in which the inner addresses are end station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header uses a 24-bit Instance ID to support overlapping inner IP addresses.

5. Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) currently experienced within data centers with a large, flat L2 network. Further analysis is needed to characterize the expected improvements.

There are a number of VPN approaches that provide some, if not all, of the desired semantics of virtual networks. A gap analysis will be needed to assess how well existing approaches satisfy the requirements.

6. Summary

This document has argued that network virtualization using overlays addresses a number of issues being faced as data centers scale in size. In addition, careful study of current data center problems is needed for the development of proper requirements and standard solutions.

Three potential work areas were identified. The first involves the interactions that take place when a VM attaches to or detaches from an overlay. A second involves the protocol an NVE would use to communicate with a back-end "oracle" to learn and disseminate mapping information about the VMs the NVE communicates with. The third potential work area involves the back-end oracle itself, i.e., how it provides failover and how it interacts with oracles in other domains.

7.
Acknowledgments

Helpful comments and improvements to this document have come from John Drake, Ariel Hendel, Vinit Jain, Thomas Morin, Benson Schliesser, and many others on the mailing list.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

TBD

10. Informative References

[I-D.ietf-l2vpn-evpn]
Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.

[I-D.ietf-lisp]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-23 (work in progress), May 2012.

[I-D.ietf-trill-fine-labeling]
Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. Dutt, "TRILL: Fine-Grained Labeling", draft-ietf-trill-fine-labeling-01 (work in progress), June 2012.

[I-D.lasserre-nvo3-framework]
Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for DC Network Virtualization", draft-lasserre-nvo3-framework-03 (work in progress), July 2012.

[I-D.marques-l3vpn-end-system]
Marques, P., Fang, L., Pan, P., Shukla, A., Napierala, M., and N. Bitar, "BGP-signaled end-system IP/VPNs", draft-marques-l3vpn-end-system-07 (work in progress), August 2012.

[I-D.raggarwa-data-center-mobility]
Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., and L. Fang, "Data Center Mobility based on BGP/MPLS, IP Routing and NHRP", draft-raggarwa-data-center-mobility-03 (work in progress), June 2012.

[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., and B.
Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

[RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy Mobile IPv6", RFC 5844, May 2010.

[RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, "Generic Routing Encapsulation (GRE) Key Option for Proxy Mobile IPv6", RFC 5845, June 2010.

[RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. Navali, "Generic Routing Encapsulation (GRE) Key Extension for Mobile IPv4", RFC 6245, May 2011.

[RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011.

[SPBM] "IEEE P802.1aq/D4.5 Draft Standard for Local and Metropolitan Area Networks -- Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks, Amendment 8: Shortest Path Bridging", February 2012.

Appendix A. Change Log

A.1. Changes from draft-narten-nvo3-overlay-problem-statement-04.txt

1. This document has only one substantive change relative to draft-narten-nvo3-overlay-problem-statement-04.txt. Two sentences were removed per the discussion that led to WG adoption of this document.

Authors' Addresses

Thomas Narten (editor)
IBM

Email: narten@us.ibm.com

David Black
EMC

Email: david.black@emc.com

Dinesh Dutt

Email: ddutt.ietf@hobbesdutt.com

Luyuan Fang
Cisco Systems
111 Wood Avenue South
Iselin, NJ 08830
USA

Email: lufang@cisco.com

Eric Gray (editor)
Ericsson

Email: eric.gray@ericsson.com

Lawrence Kreeger
Cisco

Email: kreeger@cisco.com

Maria Napierala
AT&T
200 Laurel Avenue
Middletown, NJ 07748
USA

Email: mnapierala@att.com

Murari Sridharan
Microsoft

Email: muraris@microsoft.com