Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                             E. Gray, Ed.
Expires: November 11, 2013                                      Ericsson
                                                                D. Black
                                                                     EMC
                                                                 L. Fang
                                                                   Cisco
                                                              L. Kreeger
                                                                   Cisco
                                                            M. Napierala
                                                                    AT&T
                                                            May 10, 2013

        Problem Statement: Overlays for Network Virtualization
            draft-ietf-nvo3-overlay-problem-statement-03

Abstract

   This document describes issues associated with providing multi-
   tenancy in large data center networks and how these issues may be
   addressed using an overlay-based network virtualization approach.  A
   key multi-tenancy requirement is traffic isolation, so that one
   tenant's traffic is not visible to any other tenant.  Another
   requirement is address space isolation, so that different tenants can
   use the same address space within different virtual networks.
   Traffic and address space isolation is achieved by assigning one or
   more virtual networks to each tenant, where traffic within a virtual
   network can only cross into another virtual network in a controlled
   fashion (e.g., via a configured router and/or a security gateway).
   Additional functionality is required to provision virtual networks,
   associate a virtual machine's network interface(s) with the
   appropriate virtual network, and maintain that association as the
   virtual machine is activated, migrated, and/or deactivated.  Use of
   an overlay-based approach enables scalable deployment on large
   network infrastructures.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   This Internet-Draft will expire on November 11, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Problem Areas
     3.1.  Need For Dynamic Provisioning
     3.2.  Virtual Machine Mobility Limitations
     3.3.  Inadequate Forwarding Table Sizes
     3.4.  Need to Decouple Logical and Physical Configuration
     3.5.  Need For Address Separation Between Virtual Networks
     3.6.  Need For Address Separation Between Virtual Networks and
           Infrastructure
     3.7.  Optimal Forwarding
   4.  Using Network Overlays to Provide Virtual Networks
     4.1.  Overview of Network Overlays
     4.2.  Communication Between Virtual and Non-virtualized Networks
     4.3.  Communication Between Virtual Networks
     4.4.  Overlay Design Characteristics
     4.5.  Control Plane Overlay Networking Work Areas
     4.6.  Data Plane Work Areas
   5.  Related IETF and IEEE Work
     5.1.  BGP/MPLS IP VPNs
     5.2.  BGP/MPLS Ethernet VPNs
     5.3.  802.1 VLANs
     5.4.  IEEE 802.1aq - Shortest Path Bridging
     5.5.  VDP
     5.6.  ARMD
     5.7.  TRILL
     5.8.  L2VPNs
     5.9.  Proxy Mobile IP
     5.10. LISP
   6.  Summary
   7.  Acknowledgments
   8.  Contributors
   9.  IANA Considerations
   10. Security Considerations
   11. Informative References
   Appendix A.  Change Log
     A.1.  Changes From -02 to -03
     A.2.  Changes From -01 to -02
     A.3.  Changes From -00 to -01
     A.4.  Changes from draft-narten-nvo3-overlay-problem-statement-04.txt
   Authors' Addresses

1.  Introduction

   Data Centers are increasingly being consolidated and outsourced in an
   effort to improve the deployment time of applications and reduce
   operational costs.
   This coincides with an increasing demand for
   compute, storage, and network resources from applications.  In order
   to scale compute, storage, and network resources, physical resources
   are being abstracted from their logical representation, in what is
   referred to as server, storage, and network virtualization.
   Virtualization can be implemented in various layers of computer
   systems or networks.

   The demand for server virtualization is increasing in data centers.
   With server virtualization, each physical server supports multiple
   virtual machines (VMs), each running its own operating system,
   middleware, and applications.  Virtualization is a key enabler of
   workload agility, i.e., allowing any server to host any application
   and providing the flexibility of adding, shrinking, or moving
   services within the physical infrastructure.  Server virtualization
   provides numerous benefits, including higher utilization, increased
   security, reduced user downtime, reduced power usage, etc.

   Multi-tenant data centers are taking advantage of the benefits of
   server virtualization to provide a new kind of hosting, a virtual
   hosted data center.  Multi-tenant data centers are ones where
   individual tenants could belong to a different company (in the case
   of a public provider) or a different department (in the case of an
   internal company data center).  Each tenant has the expectation of a
   level of security and privacy separating their resources from those
   of other tenants.  For example, one tenant's traffic must never be
   exposed to another tenant, except through carefully controlled
   interfaces, such as a security gateway (e.g., a firewall).

   To a tenant, virtual data centers are similar to their physical
   counterparts, consisting of end stations attached to a network,
   complete with services such as load balancers and firewalls.
   But
   unlike a physical data center, tenant systems connect to a virtual
   network.  To tenant systems, a virtual network looks like a normal
   network (e.g., providing an Ethernet or L3 service), except that the
   only end stations connected to the virtual network are those
   belonging to a tenant's specific virtual network.

   A tenant is the administrative entity on whose behalf one or more
   specific virtual network instances and their associated services
   (whether virtual or physical) are managed.  In a cloud environment, a
   tenant would correspond to the customer that is using a particular
   virtual network.  However, a tenant may also find it useful to create
   multiple different virtual network instances.  Hence, there is a one-
   to-many mapping between tenants and virtual network instances.  A
   single tenant may operate multiple individual virtual network
   instances, each associated with a different service.

   How a virtual network is implemented does not generally matter to the
   tenant; what matters is that the service provided (L2 or L3) has the
   right semantics, performance, etc.  It could be implemented via a
   pure routed network, a pure bridged network, or a combination of
   bridged and routed networks.  A key requirement is that each
   individual virtual network instance be isolated from other virtual
   network instances, with traffic crossing from one virtual network to
   another only when allowed by policy.

   For data center virtualization, two key issues must be addressed.
   First, address space separation between tenants must be supported.
   Second, it must be possible to place (and migrate) VMs anywhere in
   the data center, without restricting VM addressing to match the
   subnet boundaries of the underlying data center network.

   This document outlines problems encountered in scaling the number of
   isolated virtual networks in a data center.
   Furthermore, the
   document presents issues associated with managing those virtual
   networks in relation to operations such as virtual network
   creation/deletion and end-node membership change.  Finally, the
   document makes the case that an overlay-based approach has a number
   of advantages over traditional, non-overlay approaches.  The purpose
   of this document is to identify the set of issues that any solution
   has to address in building multi-tenant data centers, with the goal
   of enabling standardized, interoperable implementations of multi-
   tenant data centers.

   This document is the problem statement for the "Network
   Virtualization over L3" (NVO3) Working Group.  NVO3 is focused on the
   construction of overlay networks that operate over an IP (L3)
   underlay transport network.  NVO3 expects to provide both L2 service
   and IP service to end devices (though perhaps as two different
   solutions).  Some deployments require an L2 service, others an L3
   service, and some may require both.

   Section 2 gives terminology.  Section 3 describes the problem space
   in detail.  Section 4 describes overlay networks in more detail.
   Section 5 reviews related work, and Section 6 closes with a summary.

2.  Terminology

   This document uses the same terminology as [I-D.ietf-nvo3-framework].
   In addition, this document uses the following terms.

   In-Band Virtual Network:  A Virtual Network that separates tenant
      traffic without hiding tenant forwarding information from the
      physical infrastructure.  The Tenant System may also retain
      visibility of a tenant within the underlying physical
      infrastructure.  IEEE 802.1Q-1998 networks using only C-VIDs are
      an example of an in-band Virtual Network.
   Overlay Virtual Network:  A Virtual Network in which the separation
      of tenants is hidden from the underlying physical infrastructure.
      That is, the underlying transport network does not need to know
      about tenancy separation to correctly forward traffic.  IEEE 802.1
      Provider Backbone Bridging (PBB) [IEEE-802.1Q] is an example of an
      L2 Overlay Network.  PBB uses MAC-in-MAC encapsulation, and the
      underlying transport network forwards traffic using only the B-MAC
      and B-VID in the outer header.  The underlay transport network is
      unaware of the tenancy separation provided by, for example, a
      24-bit I-SID.

   C-VLAN:  This document refers to C-VLANs as implemented by many
      routers, i.e., an L2 virtual network identified by a C-VID.  An
      end station (e.g., a VM) in this context that is part of an L2
      virtual network will effectively belong to a C-VLAN.  Within an
      IEEE 802.1Q-2011 network, other tags may be used as well, but such
      usage is generally not visible to the end station.  Section 5.3
      provides more details on VLANs defined by [802.1Q].

3.  Problem Areas

   The following subsections describe aspects of multi-tenant data
   center networking that pose problems for network infrastructure.
   Different problem aspects may arise based on the network architecture
   and scale.

3.1.  Need For Dynamic Provisioning

   Cloud computing involves on-demand provisioning of resources for
   multi-tenant environments.  A common example of cloud computing is
   the public cloud, where a cloud service provider offers elastic
   services to multiple customers over the same infrastructure.  In
   current systems, it can be difficult to provision resources for
   individual tenants (e.g., QoS) in such a way that provisioned
   properties migrate automatically when services are dynamically moved
   around within the data center to optimize workloads.

3.2.  Virtual Machine Mobility Limitations

   A key benefit of server virtualization is virtual machine (VM)
   mobility.  A VM can be migrated from one server to another, live,
   i.e., while continuing to run and without needing to shut it down and
   restart it at the new location.  A key requirement for live migration
   is that a VM retain critical network state at its new location,
   including its IP and MAC address(es).  Preservation of MAC addresses
   may be necessary, for example, when software licenses are bound to
   MAC addresses.  More generally, any change in the VM's MAC addresses
   resulting from a move would be visible to the VM and thus potentially
   result in unexpected disruptions.  Retaining IP addresses after a
   move is necessary to prevent existing transport connections (e.g.,
   TCP) from breaking and needing to be restarted.

   In data center networks, servers are typically assigned IP addresses
   based on their physical location, for example, based on the Top-of-
   Rack (ToR) switch for the server rack or the C-VLAN configured to the
   server.  Servers can only move to other locations within the same IP
   subnet.  This constraint is not problematic for physical servers,
   which move infrequently, but it restricts the placement and movement
   of VMs within the data center.  Any solution for a scalable multi-
   tenant data center must allow a VM to be placed (or moved) anywhere
   within the data center, without being constrained by the subnet
   boundary concerns of the host servers.

3.3.  Inadequate Forwarding Table Sizes

   Today's virtualized environments place additional demands on the
   forwarding tables of forwarding nodes in the physical infrastructure.
   The core problem is that location independence results in specific
   end-state information being propagated into the forwarding system
   (e.g., /32 host routes in L3 networks, or MAC addresses in L2
   networks).
   In L2 networks, for instance, instead of just one address
   per server, the network infrastructure may have to learn the
   addresses of the individual VMs (which could range in the hundreds
   per server).  This increases the demand on a forwarding node's table
   capacity compared to non-virtualized environments.

3.4.  Need to Decouple Logical and Physical Configuration

   Data center operators must be able to achieve high utilization of
   server and network capacity.  For efficient and flexible allocation,
   operators should be able to spread a virtual network instance across
   servers in any rack in the data center.  It should also be possible
   to migrate compute workloads to any server anywhere in the network
   while retaining the workload's addresses.

   In networks of many types (e.g., IP subnets, MPLS VPNs, VLANs, etc.),
   moving servers elsewhere in the network may require expanding the
   scope of a portion of the network (e.g., subnet, VPN, VLAN, etc.)
   beyond its original boundaries.  While this can be done, it requires
   potentially complex network configuration changes and may (in some
   cases, e.g., a VLAN or L2VPN) conflict with the desire to bound the
   size of broadcast domains.  In addition, when VMs migrate, the
   physical network (e.g., access lists) may need to be reconfigured,
   which can be time consuming and error prone.

   An important use case is cross-pod expansion.  A pod typically
   consists of one or more racks of servers with associated network and
   storage connectivity.  A tenant's virtual network may start off on a
   pod and, due to expansion, require servers/VMs on other pods,
   especially when other pods are not fully utilizing all their
   resources.  This use case requires that virtual networks span
   multiple pods in order to provide connectivity to all of the tenant's
   servers/VMs.
   Such expansion can be difficult to achieve when tenant
   addressing is tied to the addressing used by the underlay network or
   when the expansion requires that the scope of the underlying C-VLAN
   expand beyond its original pod boundary.

3.5.  Need For Address Separation Between Virtual Networks

   Individual tenants need control over the addresses they use within a
   virtual network.  But it can be problematic when different tenants
   want to use the same addresses, or even when the same tenant wants to
   reuse the same addresses in different virtual networks.
   Consequently, virtual networks must allow tenants to use whatever
   addresses they want without concern for what addresses are being used
   by other tenants or other virtual networks.

3.6.  Need For Address Separation Between Virtual Networks and
      Infrastructure

   As in the previous case, a tenant needs to be able to use whatever
   addresses it wants in a virtual network, independent of what
   addresses the underlying data center network is using.  Tenants (and
   the underlay infrastructure provider) should be able to use whatever
   addresses make sense for them, without having to worry about
   collisions between addresses used by tenants and those used by the
   underlay data center network.

3.7.  Optimal Forwarding

   Another problem area relates to the optimal forwarding of traffic
   between peers that are not connected to the same virtual network.
   Such forwarding happens when a host on a virtual network communicates
   with a host not on any virtual network (e.g., an Internet host) as
   well as when a host on a virtual network communicates with a host on
   a different virtual network.  A virtual network may have two (or
   more) gateways for forwarding traffic onto and off of the virtual
   network, and the optimal choice of which gateway to use may depend on
   the set of available paths between the communicating peers.
   The set
   of available gateways may not be equally "close" to a given
   destination.  The issue appears both when a VM is initially
   instantiated on a virtual network and when a VM migrates or is moved
   to a different location.  After a migration, for instance, a VM's
   best-choice gateway for such traffic may change, i.e., the VM may get
   better service by switching to the "closer" gateway, and this may
   improve the utilization of network resources.

   IP implementations in network endpoints typically do not distinguish
   between multiple routers on the same subnet: there may only be a
   single default gateway in use, and any use of multiple routers
   usually considers all of them to be one hop away.  Routing protocol
   functionality is constrained by the requirement to cope with these
   endpoint limitations; for example, VRRP has one router serve as the
   master to handle all outbound traffic.  This problem can be
   particularly acute when the virtual network spans multiple data
   centers, as a VM is likely to receive significantly better service
   when forwarding external traffic through a local router than when
   using a router at a remote data center.

   The optimal forwarding problem applies to both outbound and inbound
   traffic.  For outbound traffic, the choice of outbound router
   determines the path of outgoing traffic from the VM, which may be
   sub-optimal after a VM move.  For inbound traffic, the location of
   the VM within its IP subnet is not visible to the routers beyond the
   virtual network.  Thus, the routing infrastructure has no information
   as to which of the two externally visible gateways leading into the
   virtual network would be the better choice for reaching a particular
   VM.

   The issue is further complicated when middleboxes (e.g., load
   balancers, firewalls, etc.) must be traversed.
   Middleboxes may have
   session state that must be preserved for ongoing communication, and
   traffic must continue to flow through the middlebox, regardless of
   which router is "closest".

4.  Using Network Overlays to Provide Virtual Networks

   Virtual Networks are used to isolate a tenant's traffic from that of
   other tenants (or even traffic within the same tenant network that
   requires isolation).  There are two main characteristics of virtual
   networks:

   1.  Virtual networks isolate the address space used in one virtual
       network from the address space used by another virtual network.
       The same network addresses may be used in different virtual
       networks at the same time.  In addition, the address space used
       by a virtual network is independent from that used by the
       underlying physical network.

   2.  Virtual networks limit the scope of packets sent on the virtual
       network.  Packets sent by Tenant Systems attached to a virtual
       network are delivered as expected to other Tenant Systems on that
       virtual network and may exit a virtual network only through
       controlled exit points, such as a security gateway.  Likewise,
       packets sourced from outside of the virtual network may enter the
       virtual network only through controlled entry points, such as a
       security gateway.

4.1.  Overview of Network Overlays

   To address the problems described in Section 3, a network overlay
   approach can be used.

   The idea behind an overlay is quite straightforward.  Each virtual
   network instance is implemented as an overlay.  The original packet
   is encapsulated by the first-hop network device, called a Network
   Virtualization Edge (NVE).  The encapsulation identifies the device
   that will perform the decapsulation (i.e., the egress NVE), which
   delivers the original packet to the endpoint.
   The rest of the network forwards the packet based on the
   encapsulation header and can be oblivious to the payload that is
   carried inside.

   Overlays are based on what is commonly known as a "map-and-encap"
   architecture.  When processing and forwarding packets, three distinct
   and logically separable steps take place:

   1.  The first-hop overlay device implements a mapping operation that
       determines where the encapsulated packet should be sent to reach
       its intended destination VM.  Specifically, the mapping function
       maps the destination address (either L2 or L3) of a packet
       received from a VM into the corresponding destination address of
       the egress NVE device.  The destination address will be the
       underlay address of the NVE device doing the decapsulation and is
       an IP address.

   2.  Once the mapping has been determined, the ingress overlay NVE
       device encapsulates the received packet within an overlay header.

   3.  The final step is to actually forward the (now encapsulated)
       packet to its destination.  The packet is forwarded by the
       underlay (i.e., the IP network) based entirely on its outer
       address.  Upon receipt at the destination, the egress overlay NVE
       device decapsulates the original packet and delivers it to the
       intended recipient VM.

   Each of the above steps is logically distinct, though an
   implementation might combine them for efficiency or other reasons.
   It should be noted that in L3 BGP/VPN terminology, the above steps
   are commonly known as "forwarding" or "virtual forwarding".

   The first-hop NVE device can be a traditional switch or router or the
   virtual switch residing inside a hypervisor.  Furthermore, the
   endpoint can be a VM, or it can be a physical server.  Examples of
   architectures based on network overlays include BGP/MPLS VPNs
   [RFC4364], TRILL [RFC6325], LISP [RFC6830], and Shortest Path
   Bridging (SPB) [SPB].
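   As a minimal illustration (not part of any specified protocol), the
   three map-and-encap steps can be sketched as follows.  The mapping-
   table contents, the VNI value, and the bare 4-byte overlay header are
   illustrative assumptions; real encapsulations such as VXLAN differ in
   detail.

```python
# Illustrative sketch of the three map-and-encap steps at an ingress
# NVE.  The mapping table, VNI value, and header layout are
# hypothetical, chosen only to make the three steps concrete.
import struct

# Step 1 input: per-virtual-network mapping table, populated by the
# control plane: (VNI, tenant destination) -> underlay IP of egress NVE.
MAPPING_TABLE = {
    (5001, "aa:bb:cc:00:00:01"): "192.0.2.10",
    (5001, "aa:bb:cc:00:00:02"): "192.0.2.20",
}

def lookup_egress_nve(vni: int, tenant_dst: str) -> str:
    """Step 1: map the tenant destination address to the underlay IP
    address of the NVE that will perform the decapsulation."""
    return MAPPING_TABLE[(vni, tenant_dst)]

def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Step 2: prepend an overlay header carrying the virtual network
    identifier (here simply a 4-byte VNI, purely for illustration)."""
    return struct.pack("!I", vni) + inner_frame

def forward(underlay_dst: str, packet: bytes) -> tuple:
    """Step 3: hand the encapsulated packet to the underlay, which
    forwards based entirely on the outer (underlay) address."""
    return (underlay_dst, packet)

# An ingress NVE processing a tenant frame destined to ...:02:
inner = b"\xaa\xbb\xcc\x00\x00\x02" + b"payload"  # tenant frame (toy)
egress = lookup_egress_nve(5001, "aa:bb:cc:00:00:02")
outer_dst, encapsulated = forward(egress, encapsulate(5001, inner))

# The egress NVE strips the overlay header and delivers the original
# frame to the intended recipient VM.
decapsulated = encapsulated[4:]
assert outer_dst == "192.0.2.20" and decapsulated == inner
```

   Note that the three functions are logically distinct here, mirroring
   the text above, even though an implementation might combine them.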
   In the data plane, an overlay header provides a place to carry either
   the virtual network identifier or an identifier that is locally
   significant to the edge device.  In both cases, the identifier in the
   overlay header specifies which specific virtual network the data
   packet belongs to.  Since both routed and bridged semantics can be
   supported by a virtual data center, the original packet carried
   within the overlay header can be an Ethernet frame or just the IP
   packet.

   A key aspect of overlays is the decoupling of the "virtual" MAC
   and/or IP addresses used by VMs from the physical network
   infrastructure and the infrastructure IP addresses used by the data
   center.  If a VM changes location, the overlay edge devices simply
   update their mapping tables to reflect the new location of the VM
   within the data center's infrastructure space.  Because an overlay
   network is used, a VM can now be located anywhere in the data center
   that the overlay reaches, without regard to traditional constraints
   imposed by the underlay network, such as the C-VLAN scope or the IP
   subnet scope.

   Multi-tenancy is supported by isolating the traffic of one virtual
   network instance from traffic of another.  Traffic from one virtual
   network instance cannot be delivered to another instance without
   (conceptually) exiting the instance and entering the other instance
   via an entity (e.g., a gateway) that has connectivity to both virtual
   network instances.  Without the existence of a gateway entity, tenant
   traffic remains isolated within each individual virtual network
   instance.

   Overlays are designed to allow a set of VMs to be placed within a
   single virtual network instance, whether that virtual network
   provides a bridged network or a routed network.

4.2.  Communication Between Virtual and Non-virtualized Networks

   Not all communication will be between devices connected to
   virtualized networks.  Devices using overlays will continue to access
   devices and make use of services on non-virtualized networks, whether
   in the data center, the public Internet, or at remote/branch
   campuses.  Any virtual network solution must be capable of
   interoperating with existing routers, VPN services, load balancers,
   intrusion detection services, firewalls, etc. on external networks.

   Communication between devices attached to a virtual network and
   devices connected to non-virtualized networks is handled
   architecturally by having specialized gateway devices that receive
   packets from a virtualized network, decapsulate them, process them as
   regular (i.e., non-virtualized) traffic, and finally forward them on
   to their appropriate destination (and vice versa).

   A wide range of implementation approaches are possible.  Overlay
   gateway functionality could be combined with other network
   functionality into a network device that implements the overlay
   functionality and then forwards traffic between other internal
   components that implement functionality such as full router service,
   load balancing, firewall support, VPN gateway, etc.

4.3.  Communication Between Virtual Networks

   Communication between devices on different virtual networks is
   handled architecturally by adding specialized interconnect
   functionality among the otherwise isolated virtual networks.  For a
   virtual network providing an L2 service, such interconnect
   functionality could be IP forwarding configured as part of the
   "default gateway" for each virtual network.  For a virtual network
   providing L3 service, the interconnect functionality could be IP
   forwarding configured as part of routing between IP subnets, or it
   can be based on configured inter-virtual-network traffic policies.
In 538 both cases, the implementation of the interconnect functionality 539 could be distributed across the NVEs and could be combined with other 540 network functionality (e.g., load balancing, firewall support) that 541 is applied to traffic forwarded between virtual networks. 543 4.4. Overlay Design Characteristics 545 Below are some of the characteristics of environments that must be 546 taken into account by the overlay technology. 548 1. Highly distributed systems: The overlay should work in an 549 environment where there could be many thousands of access 550 switches (e.g., residing within the hypervisors) and many more 551 Tenant Systems (e.g., VMs) connected to them. This leads to a 552 distributed mapping system that puts a low overhead on the 553 overlay tunnel endpoints. 555 2. Many highly distributed virtual networks with sparse membership: 556 Each virtual network could be highly dispersed inside the data 557 center. Also, along with expectation of many virtual networks, 558 the number of end systems connected to any one virtual network is 559 expected to be relatively low; Therefore, the percentage of NVEs 560 participating in any given virtual network would also be expected 561 to be low. For this reason, efficient delivery of multi- 562 destination traffic within a virtual network instance should be 563 taken into consideration. 565 3. Highly dynamic Tenant Systems: Tenant Systems connected to 566 virtual networks can be very dynamic, both in terms of creation/ 567 deletion/power-on/off and in terms of mobility from one access 568 device to another. 570 4. Be incrementally deployable, without necessarily requiring major 571 upgrade of the entire network: The first hop device (or end 572 system) that adds and removes the overlay header may require new 573 software and may require new hardware (e.g., for improved 574 performance). But the rest of the network should not need to 575 change just to enable the use of overlays. 577 5. 
Work with existing data center network deployments without requiring major changes in operational or other practices: For example, some data centers have not enabled multicast beyond link-local scope. Overlays should be capable of leveraging underlay multicast support where appropriate, but should not require its enablement in order to use an overlay solution.

6. Network infrastructure administered by a single administrative domain: This is consistent with operation within a data center, and not across the Internet.

4.5. Control Plane Overlay Networking Work Areas

There are three specific and separate potential work areas in the area of control plane protocols needed to realize an overlay solution. The areas correspond to different possible "on-the-wire" protocols, where distinct entities interact with each other.

One area of work concerns the address dissemination protocol an NVE uses to build and maintain the mapping tables it uses to deliver encapsulated packets to their proper destination. One approach is to build mapping tables entirely via learning (as is done in 802.1 networks). Another approach is to use a specialized control plane protocol. While there are some advantages to using or leveraging an existing protocol for maintaining mapping tables, the fact that large numbers of NVEs will likely reside in hypervisors places constraints on the resources (CPU and memory) that can be dedicated to such functions.

From an architectural perspective, one can view the address mapping dissemination problem as having two distinct and separable components. The first component consists of a back-end Network Virtualization Authority (NVA) that is responsible for distributing and maintaining the mapping information for the entire overlay system.
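The mapping tables described above can be illustrated with a minimal sketch (the table layout, names, and addresses are hypothetical, not taken from any NVO3 specification): for each virtual network, an inner tenant address maps to the outer IP address of the NVE behind which that Tenant System currently resides.

```python
# Hypothetical sketch of an NVE's mapping table. Keying on the
# (VN Context, inner address) pair is what allows different tenants
# to reuse the same inner address space.

MAPPING_TABLE = {
    # (vn_context, inner_address) -> outer NVE IP address
    (5001, "10.0.0.5"): "192.0.2.10",
    (5001, "10.0.0.7"): "192.0.2.20",
    (5002, "10.0.0.5"): "192.0.2.30",   # same inner address, different VN
}

def outer_destination(vn_context: int, inner_dst: str):
    """Look up the NVE to tunnel a tenant packet to, or None on a miss.

    On a miss, a real NVE would either query the NVA or fall back to
    flooding/learning, depending on the chosen control plane approach.
    """
    return MAPPING_TABLE.get((vn_context, inner_dst))

print(outer_destination(5001, "10.0.0.5"))  # 192.0.2.10
print(outer_destination(5002, "10.0.0.5"))  # 192.0.2.30 (address isolation)
```

Note how the two entries for inner address 10.0.0.5 resolve to different NVEs: the VN Context in the lookup key is what provides address space isolation between tenants.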
For this document, we use the term NVA to refer to an entity that supplies answers, without regard to how it knows the answers it is providing. The second component consists of the on-the-wire protocols an NVE uses when interacting with the NVA.

The back-end NVA could provide high performance, high resiliency, failover, etc. and could be implemented in significantly different ways. For example, one model uses a traditional, centralized "directory-based" database, using replicated instances for reliability and failover. A second model involves using and possibly extending an existing routing protocol (e.g., BGP or IS-IS). To support different architectural models, it is useful to have one standard protocol for the NVE-NVA interaction while allowing different protocols and architectural approaches for the NVA itself. Separating the two allows NVEs to transparently interact with different types of NVAs, i.e., either of the two architectural models described above. Having separate protocols could also allow for a simplified NVE that only interacts with the NVA for the mapping table entries it needs, and it allows the NVA (and its associated protocols) to evolve independently over time with minimal impact on the NVEs.

A third work area considers the attachment and detachment of VMs (or Tenant Systems [I-D.ietf-nvo3-framework] more generally) from a specific virtual network instance. When a VM attaches, the NVE associates the VM with a specific overlay for the purposes of tunneling traffic sourced from or destined to the VM. When a VM disconnects, the NVE should notify the NVA that the Tenant System to NVE address mapping is no longer valid. In addition, if this VM was the last remaining member of the virtual network, then the NVE can also terminate any tunnels used to deliver tenant multi-destination packets within the VN to the NVE.
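The attach/detach bookkeeping described above might look like the following sketch. The class and the NVA interface (register/withdraw callables) are purely illustrative; defining the actual NVE-NVA protocol is precisely the open work area.

```python
# Hypothetical sketch of NVE-side attach/detach handling. All names
# are illustrative and not drawn from any NVO3 specification.

class NVE:
    def __init__(self, nva_register, nva_withdraw):
        self.members = {}          # vn_context -> set of attached VM addresses
        self.nva_register = nva_register
        self.nva_withdraw = nva_withdraw

    def attach(self, vn_context, vm_addr):
        """Associate a VM with a VN and advertise the mapping to the NVA."""
        self.members.setdefault(vn_context, set()).add(vm_addr)
        self.nva_register(vn_context, vm_addr)

    def detach(self, vn_context, vm_addr):
        """Withdraw the mapping; drop VN state if the VM was the last member."""
        self.members[vn_context].discard(vm_addr)
        self.nva_withdraw(vn_context, vm_addr)
        if not self.members[vn_context]:
            del self.members[vn_context]
            # A real NVE would also terminate any tunnels used for
            # multi-destination traffic within this VN.

log = []
nve = NVE(lambda vn, a: log.append(("add", vn, a)),
          lambda vn, a: log.append(("del", vn, a)))
nve.attach(7001, "10.1.1.4")
nve.detach(7001, "10.1.1.4")
print(log)            # [('add', 7001, '10.1.1.4'), ('del', 7001, '10.1.1.4')]
print(nve.members)    # {} - per-VN state removed with its last member
```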
In the case where an NVE and a hypervisor reside on different physical devices separated by an access network, a standardized protocol may be needed.

In summary, there are three areas of potential work. The first area concerns the implementation of the NVA function itself and any protocols it needs (e.g., if implemented in a distributed fashion). A second area concerns the interaction between the NVA and NVEs. The third work area concerns protocols associated with attaching and detaching a VM from a particular virtual network instance. All three work areas are important to the development of scalable, interoperable solutions.

4.6. Data Plane Work Areas

The data plane carries encapsulated packets for Tenant Systems. The data plane encapsulation header carries a VN Context identifier [I-D.ietf-nvo3-framework] for the virtual network to which the data packet belongs. Numerous encapsulation or tunneling protocols already exist that can be leveraged. In the absence of strong and compelling justification, it would not seem necessary or helpful to develop yet another encapsulation format just for NVO3.

5. Related IETF and IEEE Work

The following subsections discuss related IETF and IEEE work. The items are not meant to provide complete coverage of all IETF and IEEE data-center-related work, nor should the descriptions be considered comprehensive. Each area aims to address particular limitations of today's data center networks. In all areas, scaling is a common theme, as are multi-tenancy and VM mobility. Comparing and evaluating the results and progress of each listed work area is out of scope for this document; the intent of this section is to provide a reference for interested readers. Note that NVO3 is scoped to running over an IP/L3 underlay network.

5.1.
BGP/MPLS IP VPNs

BGP/MPLS IP VPNs [RFC4364] support multi-tenancy, VPN traffic isolation, address overlapping, and address separation between tenants and network infrastructure. The BGP/MPLS control plane is used to distribute the VPN labels and the tenant IP addresses that identify the tenants (or, to be more specific, the particular VPN/virtual network). Deployment of enterprise L3 VPNs has been shown to scale to thousands of VPNs and millions of VPN prefixes. BGP/MPLS IP VPNs are currently deployed in some large enterprise data centers. The potential limitation for deploying BGP/MPLS IP VPNs in data center environments is the practicality of using BGP in the data center, especially reaching into the servers or hypervisors. There may be workforce skill-set issues, equipment support issues, and potential new scaling challenges. A combination of BGP and lighter-weight IP signaling protocols, e.g., XMPP, has been proposed to extend the solutions into the data center environment [I-D.ietf-l3vpn-end-system] while taking advantage of the built-in VPN features and rich policy support; this is especially useful for inter-tenant connectivity.

5.2. BGP/MPLS Ethernet VPNs

Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] provide an emulated L2 service in which each tenant has its own Ethernet network over a common IP or MPLS infrastructure. A BGP/MPLS control plane is used to distribute the tenant MAC addresses and the MPLS labels that identify the tenants and tenant MAC addresses. Within the BGP/MPLS control plane, a 32-bit Ethernet Tag is used to identify the broadcast domains (VLANs) associated with a given L2 VLAN service instance, and these Ethernet Tags are mapped to VLAN IDs understood by the tenant at the service edges.
This means that any customer-site VLAN-based limitation is associated with an individual tenant service edge, enabling a much higher level of scalability. Interconnection between tenants is also allowed in a controlled fashion.

VM Mobility [I-D.raggarwa-data-center-mobility] introduces the concept of a combined L2/L3 VPN service in order to support the mobility of individual Virtual Machines (VMs) between data centers connected over a common IP or MPLS infrastructure.

5.3. 802.1 VLANs

VLANs are a well-understood construct in the networking industry, providing an L2 service via an in-band L2 Virtual Network [IEEE-802.1Q]. A VLAN is an L2 bridging construct that provides the semantics of virtual networks mentioned above: a MAC address can be kept unique within a VLAN, but it is not necessarily unique across VLANs. Traffic scoped within a VLAN (including broadcast and multicast traffic) can be kept within the VLAN it originates from. Traffic forwarded from one VLAN to another typically involves router (L3) processing. The forwarding table lookup operation may be keyed on {VLAN, MAC address} tuples.

VLANs are a pure L2 bridging construct, and VLAN identifiers are carried along with data frames to allow each forwarding point to know what VLAN the frame belongs to. Various types of VLANs are available today and can be used for network virtualization, even in combination. The C-VLAN, S-VLAN, and B-VLAN IDs are 12 bits. The 24-bit I-SID allows the support of more than 16 million virtual networks.

5.4. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB) [SPB] is an IS-IS-based overlay that operates over L2 Ethernets. SPB supports multipathing and addresses a number of shortcomings in the original Ethernet Spanning Tree Protocol.
Shortest Path Bridging MAC (SPBM) uses IEEE 802.1ah PBB (MAC-in-MAC) encapsulation and supports a 24-bit I-SID, which can be used to identify virtual network instances. SPBM provides multipathing and supports easy virtual network creation and update.

SPBM extends IS-IS in order to perform link-state routing among core SPBM nodes, obviating the need for learning for communication among core SPBM nodes. Learning is still used to build and maintain the mapping tables of edge nodes to encapsulate Tenant System traffic for transport across the SPBM core.

SPB is compatible with all other 802.1 standards and thus allows other features, e.g., the VSI Discovery and Configuration Protocol (VDP), OAM, or scalability solutions, to be leveraged.

5.5. VDP

VDP is the Virtual Station Interface (VSI) Discovery and Configuration Protocol specified by IEEE P802.1Qbg [IEEE-802.1Qbg]. VDP is a protocol that supports the association of a VSI with a port. VDP is run between the end system (e.g., a hypervisor) and its adjacent switch, i.e., the device on the edge of the network. VDP is used, for example, to communicate to the switch that a Virtual Machine (Virtual Station) is moving, i.e., it is designed for VM migration.

5.6. ARMD

The ARMD WG examined data center scaling issues with a focus on address resolution and developed a problem statement document [RFC6820]. While an overlay-based approach may address some of the "pain points" that were raised in ARMD (e.g., better support for multi-tenancy), analysis will be needed to understand the scaling tradeoffs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

5.7. TRILL

TRILL [RFC6325] is a network protocol that provides an Ethernet L2 service to end systems and is designed to operate over any L2 link type.
TRILL establishes forwarding paths using IS-IS routing and encapsulates traffic within its own TRILL header. TRILL, as defined today, supports only the standard (and limited) 12-bit C-VID identifier. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.ietf-trill-fine-labeling].

5.8. L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on provider-provisioned L2 VPNs, where the service provider participates in the management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches as discussed in this document are intended to be used within data centers, where the overlay network is managed by the data center operator rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center itself.

Other L2VPN approaches, such as L2TP [RFC3931], require significant tunnel state at the encapsulating and decapsulating endpoints. Overlays require less tunnel state than other approaches, which is important to allow overlays to scale to hundreds of thousands of endpoints. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the adjacent devices to which VMs connect) will be part of the overlay network and will be responsible for encapsulating and decapsulating packets.

5.9. Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

5.10.
LISP

LISP [RFC6830] essentially provides an IP-over-IP overlay where the inner addresses are end-station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header uses a 24-bit Instance ID to support overlapping inner IP addresses.

6. Summary

This document has argued that network virtualization using overlays addresses a number of issues being faced as data centers scale in size. In addition, careful study of current data center problems is needed for development of proper requirements and standard solutions.

This document identified three potential control protocol work areas. The first involves a back-end Network Virtualization Authority (NVA) and how it learns and distributes the mapping information NVEs use when processing tenant traffic. A second involves the protocol an NVE would use to communicate with the back-end NVA to obtain the mapping information. The third potential work area concerns the interactions that take place when a VM attaches to or detaches from a specific virtual network instance.

There are a number of approaches that provide some, if not all, of the desired semantics of virtual networks. Each approach needs to be analyzed in detail to assess how well it satisfies the requirements.

7. Acknowledgments

Helpful comments and improvements to this document have come from Lou Berger, John Drake, Ilango Ganga, Ariel Hendel, Vinit Jain, Petr Lapukhov, Thomas Morin, Benson Schliesser, Qin Wu, Xiaohu Xu, Lucy Yong, and many others on the NVO3 mailing list.

Special thanks to Janos Farkas for his persistence and numerous detailed comments related to the lack of precision in the text relating to IEEE 802.1 technologies.

8.
Contributors

Dinesh Dutt and Murari Sridharan were original co-authors of the Internet-Draft that led to the BoF that formed the NVO3 WG. That original draft eventually became the basis for the WG Problem Statement document.

9. IANA Considerations

This memo includes no request to IANA.

10. Security Considerations

Because this document describes the problem space associated with the need for virtualization of networks in complex, large-scale, data-center networks, it does not itself introduce any security risks. However, it is clear that security concerns need to be a consideration of any solutions proposed to address this problem space.

Solutions will need to address both data plane and control plane security concerns. In the data plane, isolation between NVO3 domains is a primary concern. Assurances against spoofing, snooping, transit modification, and denial of service are examples of other important considerations. Some limited environments may even require confidentiality within domains.

In the control plane, the primary security concern is ensuring that unauthorized control information is not installed for use in the data plane. The prevention of the installation of improper control information and other forms of denial of service are also concerns. Here too, some environments may be concerned about confidentiality of the control plane.

11. Informative References

[I-D.ietf-l2vpn-evpn]
           Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F.,
           Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN",
           draft-ietf-l2vpn-evpn-03 (work in progress), February 2013.

[I-D.ietf-l3vpn-end-system]
           Marques, P., Fang, L., Pan, P., Shukla, A., Napierala, M.,
           and N. Bitar, "BGP-signaled end-system IP/VPNs",
           draft-ietf-l3vpn-end-system-01 (work in progress),
           April 2013.
[I-D.ietf-nvo3-framework]
           Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
           Rekhter, "Framework for DC Network Virtualization",
           draft-ietf-nvo3-framework-02 (work in progress),
           February 2013.

[I-D.ietf-trill-fine-labeling]
           Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D.
           Dutt, "TRILL (Transparent Interconnection of Lots of
           Links): Fine-Grained Labeling",
           draft-ietf-trill-fine-labeling-06 (work in progress),
           March 2013.

[I-D.raggarwa-data-center-mobility]
           Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R.,
           Fang, L., and A. Sajassi, "Data Center Mobility based on
           E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP",
           draft-raggarwa-data-center-mobility-04 (work in progress),
           December 2012.

[IEEE-802.1Q]
           IEEE 802.1Q-2011, "IEEE standard for local and
           metropolitan area networks: Media access control (MAC)
           bridges and virtual bridged local area networks",
           August 2011.

[IEEE-802.1Qbg]
           IEEE 802.1Qbg-2012, "IEEE standard for local and
           metropolitan area networks: Media access control (MAC)
           bridges and virtual bridged local area networks --
           Amendment 21: Edge virtual bridging", July 2012.

[RFC3931]  Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling
           Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.

[RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
           Networks (VPNs)", RFC 4364, February 2006.

[RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K.,
           and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

[RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy
           Mobile IPv6", RFC 5844, May 2010.

[RFC5845]  Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung,
           "Generic Routing Encapsulation (GRE) Key Option for Proxy
           Mobile IPv6", RFC 5845, June 2010.

[RFC6245]  Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J.
           Navali, "Generic Routing Encapsulation (GRE) Key Extension
           for Mobile IPv4", RFC 6245, May 2011.

[RFC6325]  Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
           Ghanwani, "Routing Bridges (RBridges): Base Protocol
           Specification", RFC 6325, July 2011.

[RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution
           Problems in Large Data Center Networks", RFC 6820,
           January 2013.

[RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
           Locator/ID Separation Protocol (LISP)", RFC 6830,
           January 2013.

[SPB]      IEEE 802.1aq, "IEEE standard for local and metropolitan
           area networks: Media access control (MAC) bridges and
           virtual bridged local area networks -- Amendment 20:
           Shortest path bridging", June 2012.

Appendix A. Change Log

A.1. Changes From -02 to -03

1. Comments from Janos Farkas, including:

   * Defined C-VLAN and changed VLAN -> C-VLAN where appropriate.

   * Improved references to IEEE work.

   * Removed Section "Further Work".

2. Improved first paragraph in "Optimal Forwarding" Section (per Qin Wu).

3. Replaced "oracle" term with Network Virtualization Authority, to match terminology discussion on list.

4. Reduced number of authors to 6. Still above the usual guideline of 5, but chairs will ask for an exception in this case.

A.2. Changes From -01 to -02

1. Security Considerations changes (Lou Berger)

2. Changes to section on Optimal Forwarding (Xuxiaohu)

3. More wording improvements in L2 details (Janos Farkas)

4. References to ARMD and LISP documents are now RFCs.

A.3. Changes From -00 to -01

1. Numerous editorial and clarity improvements.

2. Picked up updated terminology from the framework document (e.g., Tenant System).

3. Significant changes regarding IEEE 802.1 Ethernets and VLANs. All text moved to the Related Work section, where the technology is summarized.

4.
Removed section on Forwarding Table Size limitations. This issue only occurs in some deployments with L2 bridging and is not considered a motivating factor for the NVO3 work.

5. Added paragraph in Introduction that makes clear that NVO3 is focused on providing both L2 and L3 service to end systems, and that IP is assumed as the underlay transport in the data center.

6. Added new section (2.6) on Optimal Forwarding.

7. Added a section on Data Plane issues.

8. Significant improvement to Section describing SPBM.

9. Added sub-section on VDP in "Related Work".

A.4. Changes from draft-narten-nvo3-overlay-problem-statement-04.txt

1. This document has only one substantive change relative to draft-narten-nvo3-overlay-problem-statement-04.txt. Two sentences were removed per the discussion that led to WG adoption of this document.

Authors' Addresses

Thomas Narten (editor)
IBM

Email: narten@us.ibm.com

Eric Gray (editor)
Ericsson

Email: eric.gray@ericsson.com

David Black
EMC

Email: david.black@emc.com

Luyuan Fang
111 Wood Avenue South
Iselin, NJ 08830
USA

Email: lufang@cisco.com

Lawrence Kreeger
Cisco

Email: kreeger@cisco.com

Maria Napierala
AT&T
200 Laurel Avenue
Middletown, NJ 07748
USA

Email: mnapierala@att.com