Network Working Group                                          L. Dunbar
Internet Draft                                                 Futurewei
Intended status: Informational                                Andy Malis
Expires: September 3, 2022                              Malis Consulting
                                                            C. Jacquenet
                                                                  Orange
                                                                  M. Toy
                                                                 Verizon
                                                           March 7, 2022

        Dynamic Networks to Hybrid Cloud DCs Problem Statement
           draft-ietf-rtgwg-net2cloud-problem-statement-12

Abstract

   This document describes the problems that enterprises face today
   when interconnecting their branch offices with dynamic workloads in
   third party data centers (a.k.a. Cloud DCs). There can be many
   problems associated with networks connecting to or among Clouds,
   many of which are probably out of the IETF scope. The objective of
   this document is to identify some of the problems that need
   additional work in the IETF Routing area. Other problems are out of
   the scope of this document.

   This document focuses on the network problems that many enterprises
   face when they have workloads, applications, and data split among
   different data centers, especially for those enterprises with
   multiple sites that are already interconnected by VPNs (e.g., MPLS
   L2VPN/L3VPN).

   Current operational problems are examined to determine whether there
   is a need to improve existing protocols or whether a new protocol is
   necessary to solve them.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 7, 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Key Characteristics of Cloud Services:
      1.2. Connecting to Cloud Services
      1.3. Reaching App instances in the optimal Cloud DC locations
   2. Definition of terms
   3. High Level Issues of Connecting to Multi-Cloud
      3.1. 5G Edge Clouds
      3.2. Security Issues
      3.3. Authorization and Identity Management
      3.4. API abstraction
      3.5. DNS for Cloud Resources
      3.6. NAT for Cloud Services
      3.7. Cloud Discovery
   4. Interconnecting Enterprise Sites with Cloud DCs
      4.1. Sites to Cloud DC
      4.2. Inter-Cloud Interconnection
   5. Edge Clouds
   6. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs
   7. Problem with using IPsec tunnels to Cloud DCs
      7.1. Scaling Issues with IPsec Tunnels
      7.2. Poor performance over long distance
   8. End-to-End Security Concerns for Data Flows
   9. Requirements for Dynamic Cloud Data Center VPNs
   10. Security Considerations
   11. IANA Considerations
   12. References
      12.1. Normative References
      12.2. Informative References
   13. Acknowledgments

1. Introduction

1.1. Key Characteristics of Cloud Services:

   The key characteristics of Cloud Services are that they are
   on-demand, scalable, highly available, and billed by usage. Cloud
   Services, such as compute, storage, network functions (most likely
   virtual), and third party managed applications, are usually hosted
   and managed by third-party Cloud Operators. Examples of Cloud
   network functions include Virtual Firewall services, Virtual private
   network services, and Virtual PBX services including voice and video
   conferencing systems. A Cloud Data Center (DC) is shared
   infrastructure that hosts Cloud Services for many customers.

1.2. Connecting to Cloud Services

   With the advent of widely available third-party cloud DCs and
   services in diverse geographic locations and the advancement of
   tools for monitoring and predicting application behaviors, it is
   very attractive for enterprises to instantiate applications and
   workloads in locations that are geographically closest to their
   end-users. Such proximity can improve end-to-end latency and overall
   user experience.
   Conversely, an enterprise can easily shut down applications and
   workloads whenever end-users are in motion (thereby modifying the
   networking connection of subsequently relocated applications and
   workloads). In addition, enterprises may wish to take advantage of
   more and more business applications offered by cloud operators.

   The networks that interconnect hybrid cloud DCs must address the
   following requirements:

      - Access to all workloads in the desired cloud DCs: many
        enterprises include the cloud in their disaster recovery
        strategy, such as enforcing periodic backup policies within
        the cloud, or running backup applications in the Cloud.

      - Global reachability from different geographical zones, thereby
        facilitating the proximity of applications as a function of the
        end users' location, to improve latency.
      - Elasticity: prompt connection to newly instantiated
        applications at Cloud DCs when usage increases, and prompt
        release of connections when applications are removed from a
        location because demand has changed.
      - Scalable policy management: apply the appropriate policies to
        the newly instantiated application instances at any Cloud DC
        location.

1.3. Reaching App instances in the optimal Cloud DC locations

   Many applications have multiple instances instantiated in different
   Cloud DCs. Current state-of-the-art solutions are typically based on
   DNS assisted by a load balancer, which responds to an FQDN (Fully
   Qualified Domain Name) query with the IP address of the closest or
   lowest-cost DC that can reach the instance. Here are some problems
   associated with DNS based solutions:

      - Dependent on client behavior
          - Client can cache results indefinitely
          - Client may not receive service even though there are
            servers available (before cache timeout) in other Cloud
            DCs.

      - No inherent leverage of proximity information present in the
        network (routing) layer, resulting in loss of performance
          - Client on the west coast can be mapped to a DC on the east
            coast
      - Inflexible traffic control:
          - The local DNS resolver becomes the unit of traffic
            management. This requires DNS to receive periodic updates
            of network conditions, which is difficult.

2. Definition of terms

   Cloud DC:   Third party Data Centers that usually host applications
               and workloads owned by different organizations or
               tenants.

   Controller: Used interchangeably with SD-WAN controller to manage
               SD-WAN overlay path creation/deletion and monitoring the
               path conditions between two or more sites.

   DSVPN:      Dynamic Smart Virtual Private Network. DSVPN is a secure
               network that exchanges data between sites without
               needing to pass traffic through an organization's
               headquarters virtual private network (VPN) server or
               router.

   Heterogeneous Cloud: applications and workloads split among Cloud
               DCs owned or managed by different operators.

   Hybrid Clouds: Hybrid Clouds refers to an enterprise using its own
               on-premises DCs in addition to Cloud services provided
               by one or more cloud operators (e.g., AWS, Azure,
               Google, Salesforce, SAP, etc.).

   VPC:        Virtual Private Cloud is a virtual network dedicated to
               one client account. It is logically isolated from other
               virtual networks in a Cloud DC. Each client can launch
               his/her desired resources, such as compute, storage, or
               network functions into his/her VPC. Most Cloud
               operators' VPCs only support private addresses; some
               support IPv4 only, others support IPv4/IPv6 dual stack.

3. High Level Issues of Connecting to Multi-Cloud

   There are many problems associated with connecting to hybrid Cloud
   Services, many of which are out of the IETF scope.
   This section identifies some of the high-level problems that can be
   addressed by the IETF, especially by the Routing area. Other
   problems are out of the scope of this document. This section by no
   means covers all problems of connecting to Hybrid Cloud Services;
   e.g., the difficulty of managing cloud spending is not discussed
   here.

3.1. 5G Edge Clouds

   5G edge cloud data centers have routers connecting to the 5G Core
   functions, such as Radio Control Functions, Session Management
   Function (SMF), Access Mobility Functions (AMF), User Plane
   Functions (UPF), etc. Those functions need to be connected to the
   Radio Data Unit (R-DU) on the Cell Tower. The UPFs need to be
   connected to the 5G Local Data Networks' ingress routers, which
   might be co-located with the cloud edge data centers.

   In addition, the 5G edge cloud data centers may host edge computing
   servers for ultra-low latency services that need to be near the UEs
   (User Equipment). Those edge computing applications need to have
   very low latency to the UEs, and also connect to backend servers or
   databases in another location.

3.2. Security Issues

   Cloud Services are built upon shared infrastructure and are
   therefore not secure by nature. Security has been a primary, and
   valid, concern from the start of cloud computing, e.g., not being
   able to see the exact location where the data are stored or to
   trace who has accessed them. Headlines highlighting data breaches,
   compromised credentials, broken authentication, hacked interfaces
   and APIs, and account hijacking haven't helped alleviate concerns.

   Many Cloud operators offer monitoring services for data stored in
   Clouds, such as AWS CloudTrail and Azure Monitor, and there are many
   third-party monitoring tools to improve visibility of data stored in
   Clouds. But there are still underlying security concerns about
   illegitimate access to data and workloads.

   Secure user identity management, authentication, and access control
   mechanisms are important. Developing appropriate security measures
   can enhance the confidence needed by enterprises to fully take
   advantage of Cloud Services.

3.3. Authorization and Identity Management

   One of the more prominent challenges for Cloud Services is Identity
   Management and Authorization. Authorization not only includes user
   authorization, but also the authorization of API calls by
   applications from different Cloud DCs managed by different Cloud
   Operators. In addition, there is authorization for Workload
   Migration, Data Migration, and Workload Management.

   There are many types of users in cloud environments, e.g., end users
   who access applications hosted in Cloud DCs and Cloud-resource users
   who are responsible for setting permissions for the resources based
   on roles, access lists, IP addresses, domains, etc.

   There are many types of Cloud authorizations, including MAC
   (Mandatory Access Control) - where each app owns individual access
   permissions, DAC (Discretionary Access Control) - where each app
   requests permissions from an external permissions app, RBAC (Role-
   based Access Control) - where the authorization service owns roles
   with different privileges on the cloud service, and ABAC (Attribute-
   based Access Control) - where access is based on request attributes
   and policies.

   The IETF has not yet developed a comprehensive specification for
   Identity management and data models for Cloud Authorizations.

3.4. API abstraction

   Different Cloud Operators have different APIs to access their Cloud
   resources, security functions, the NAT, etc.

   It is difficult to move applications built on one Cloud operator's
   APIs to another.
   However, it is highly desirable to have a single and consistent way
   to manage the networks and respective security policies for
   interconnecting applications hosted in different Cloud DCs.

   The desired property would be having a single network fabric to
   which different Cloud DCs and an enterprise's multiple sites can be
   attached or detached, with a common interface for setting desired
   policies.

   The difficulty of connecting applications in different Clouds might
   stem from the fact that the Cloud operators are direct competitors.
   Traffic flowing out of Cloud DCs usually incurs charges. Therefore,
   direct communications between applications in different Cloud DCs
   can be more expensive than intra-Cloud communications.

   It is desirable to have a common API shim layer or abstraction for
   different Cloud providers to make it easier to move applications
   from one Cloud DC to another.

3.5. DNS for Cloud Resources

   DNS name resolution is essential for on-premises and cloud-based
   resources. For customers with hybrid workloads, which include on-
   premises and cloud-based resources, extra steps are necessary to
   configure DNS to work seamlessly across both environments.

   Cloud operators have their own DNS to resolve resources within their
   Cloud DCs and to well-known public domains. A Cloud's DNS can be
   configured to forward queries to customer managed authoritative DNS
   servers hosted on-premises, and to respond to DNS queries forwarded
   by on-premises DNS servers.

   For enterprises utilizing Cloud services from different cloud
   operators, it is necessary to establish policies and rules on how
   and where to forward DNS queries. When applications in one Cloud
   need to communicate with applications hosted in another Cloud,
   there could be DNS queries from one Cloud DC being forwarded to the
   enterprise's on-premises DNS, which in turn forwards them to the DNS
   service in another Cloud.
   Needless to say, configuration can be complex depending on the
   application communication patterns.

   However, even with carefully managed policies and configurations,
   collisions can still occur. If an organization uses an internal name
   like .cloud and then wants its services to be available via or
   within some other cloud provider which also uses .cloud, then it
   cannot work. Therefore, it is better to use the global domain name
   even when an organization does not make all its namespace globally
   resolvable. An organization's globally unique DNS can include
   subdomains that cannot be resolved at all outside certain restricted
   paths, zones that resolve differently based on the origin of the
   query, and zones that resolve the same globally for all queries from
   any source.

   Globally unique names do not equate to globally resolvable names or
   even global names that resolve the same way from every perspective.
   Globally unique names do prevent any possibility of collision at
   present or in the future, and they make DNSSEC trust manageable.
   Consider using a registered and fully qualified domain name (FQDN)
   from global DNS as the root for enterprise and other internal
   namespaces.

3.6. NAT for Cloud Services

   Cloud resources, such as VM instances, are usually assigned private
   IP addresses. By configuration, some private subnets can have the
   NAT function to reach external networks, and some private subnets
   are internal to the Cloud only.

   Different Cloud operators support different levels of NAT functions.
   For example, AWS NAT Gateway does not currently support connections
   towards, or from, VPC Endpoints, VPN, AWS Direct Connect, or VPC
   Peering; see
   https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-other-services.
   AWS Direct Connect/VPN/VPC Peering does not currently support any
   NAT functionality.

   Google's Cloud NAT allows Google Cloud virtual machine (VM)
   instances without external IP addresses and private Google
   Kubernetes Engine (GKE) clusters to connect to the Internet. Cloud
   NAT implements outbound NAT in conjunction with a default route to
   allow instances to reach the Internet. It does not implement inbound
   NAT. Hosts outside the VPC network can only respond to established
   connections initiated by instances inside the Google Cloud; they
   cannot initiate their own, new connections to Cloud instances via
   NAT.

   For enterprises with applications running in different Cloud DCs,
   proper configuration of NAT has to be performed in each Cloud DC and
   in their on-premises DC.

3.7. Cloud Discovery

   One of the concerns of using Cloud services is not knowing where a
   resource is located, especially since Cloud operators can move
   application instances from one place to another. When applications
   in the Cloud communicate with on-premises applications, it may not
   be clear where the Cloud applications are located or to which VPCs
   they belong.

   It is highly desirable to have tools to discover cloud services in
   much the same way as on-premises infrastructure is discovered. A
   significant difference is that cloud discovery uses the cloud
   vendor's API to extract data on the cloud services, rather than the
   direct access used in scanning on-premises infrastructure.

   Standard data models, APIs, or tools can alleviate the concerns of
   enterprises utilizing Cloud Resources, e.g., having a Cloud service
   scan that connects to the API of the cloud provider and collects
   information directly.

4. Interconnecting Enterprise Sites with Cloud DCs

   Considering that many enterprises already have existing VPNs (e.g.,
   MPLS based L2VPN or L3VPN) interconnecting branch offices and on-
   premises data centers, connecting to Cloud services will involve a
   mix of different types of networks.
   When an enterprise's existing VPN service providers do not have
   direct connections to the corresponding cloud DCs that the
   enterprise prefers to use, the enterprise has to face additional
   infrastructure and operational costs to utilize the Cloud services.

4.1. Sites to Cloud DC

   Most Cloud operators offer some type of network gateway through
   which an enterprise can reach its workloads hosted in the Cloud
   DCs. AWS (Amazon Web Services) offers the following options to reach
   workloads in AWS Cloud DCs:

      - AWS Internet gateway allows communication between instances in
        AWS VPC and the internet.
      - AWS Virtual gateway (vGW), where IPsec tunnels [RFC6071] are
        established between an enterprise's own gateway and AWS vGW, so
        that the communications between those gateways can be secured
        from the underlay (which might be the public Internet).
      - AWS Direct Connect, which allows enterprises to purchase direct
        connect from network service providers to get a private leased
        line interconnecting the enterprise's gateway(s) and the AWS
        Direct Connect routers. In addition, an AWS Transit Gateway can
        be used to interconnect multiple VPCs in different Availability
        Zones. AWS Transit Gateway acts as a hub that controls how
        traffic is forwarded among all the connected networks, which
        act like spokes.

   Microsoft's ExpressRoute allows extension of a private network to
   any of the Microsoft cloud services, including Azure and Office365.
   ExpressRoute is configured using Layer 3 routing. Customers can opt
   for redundancy by provisioning dual links from their location to two
   Microsoft Enterprise edge routers (MSEEs) located within a third-
   party ExpressRoute peering location. The BGP routing protocol is
   then set up over WAN links to provide redundancy to the cloud. This
   redundancy is maintained from the peering data center into
   Microsoft's cloud network.

   Google's Cloud Dedicated Interconnect offers network connectivity
   options similar to those of AWS and Microsoft. One distinct
   difference, however, is that Google's service allows customers
   access to the entire global cloud network by default. It does this
   by connecting the customer's on-premises network with the Google
   Cloud using BGP and Google Cloud Routers to provide optimal paths to
   the different regions of the global cloud infrastructure.

   The figure below shows an example in which some of a tenant's
   workloads are accessible via a virtual router connected by AWS
   Internet Gateway, some are accessible via AWS vGW, and others are
   accessible via AWS Direct Connect.

   Different types of access require different levels of security
   functions. Sometimes it is not visible to end customers which type
   of network access is used for a specific application instance. To
   get better visibility, separate virtual routers (e.g., vR1 and vR2)
   can be deployed to differentiate traffic to/from different cloud
   GWs. It is important for some enterprises to be able to observe the
   specific behaviors when connected by different connections.

   The Customer Gateway can be a customer owned router or ports
   physically connected to the AWS Direct Connect GW.

   +------------------------+
   |    ,---.         ,---. |
   |   (TN-1 )       ( TN-2)|
   |    `-+-'  +---+  `-+-' |
   |      +----|vR1|----+   |
   |           ++--+        |
   |            |         +-+----+
   |            |        /Internet\ For External
   |            +-------+ Gateway  +----------------------
   |                     \        / to reach via Internet
   |                      +-+----+
   |                        |
   |    ,---.         ,---. |
   |   (TN-1 )       ( TN-2)|
   |    `-+-'  +---+  `-+-' |
   |      +----|vR2|----+   |
   |           ++--+        |
   |            |         +-+----+
   |            |        / virtual\ For IPsec Tunnel
   |            +-------+ Gateway  +----------------------
   |            |        \        / termination
   |            |         +-+----+
   |            |           |
   |            |         +-+----+              +------+
   |            |        /        \ For Direct /customer\
   |            +-------+ Gateway  +----------+ gateway  |
   |                     \        /  Connect   \        /
   |                      +-+----+              +------+
   |                        |
   +------------------------+

       Figure 1: Examples of Multiple Cloud DC connections.

4.2. Inter-Cloud Interconnection

   The connectivity options to Cloud DCs described in the previous
   section are for reaching Cloud providers' DCs, but not for
   connecting cloud DCs to each other. When applications in the AWS
   Cloud need to communicate with applications in Azure, today's
   practice requires a third-party gateway (physical or virtual) to
   interconnect AWS's Layer 2 Direct Connect path with Azure's Layer 3
   ExpressRoute.

   Enterprises can also instantiate their own virtual routers in
   different Cloud DCs and administer IPsec tunnels among them, which
   by itself is not a trivial task. Alternatively, open source VPN
   software such as strongSwan can be used to create an IPsec
   connection to the Azure gateway using a shared key. The strongSwan
   instance within AWS not only can connect to Azure but can also be
   used to facilitate traffic to other nodes within the AWS VPC by
   configuring forwarding and using appropriate routing rules for the
   VPC.

   Most Cloud operators' virtual networks, such as AWS VPC or Azure
   VNET, use non-globally routable CIDR blocks from the private IPv4
   address ranges specified by RFC1918. To establish an IPsec tunnel
   between two Cloud DCs, it is necessary to exchange publicly routable
   addresses for applications in different Cloud DCs.
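
   The addressing constraint above can be illustrated with a short
   sketch. The Python fragment below is only an example, not part of
   any operator's tooling: it checks whether a CIDR block falls inside
   the RFC1918 private ranges and whether address blocks chosen
   independently in different Cloud DCs overlap. The function names,
   network names, and CIDR values are hypothetical.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(cidr: str) -> bool:
    """Return True if the whole CIDR block lies inside an RFC 1918 range."""
    net = ipaddress.ip_network(cidr)
    # subnet_of() requires both networks to be the same IP version.
    return net.version == 4 and any(
        net.subnet_of(block) for block in RFC1918_BLOCKS
    )


def overlapping_pairs(cidrs: dict) -> list:
    """Return name pairs whose address blocks overlap.

    `cidrs` maps a (hypothetical) network name to its CIDR string.
    """
    names = sorted(cidrs)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            net_a = ipaddress.ip_network(cidrs[first])
            net_b = ipaddress.ip_network(cidrs[second])
            if net_a.overlaps(net_b):
                pairs.append((first, second))
    return pairs


# Hypothetical address plan for one enterprise.
EXAMPLE_PLAN = {
    "aws-vpc": "10.0.0.0/16",
    "azure-vnet": "10.0.128.0/17",
    "on-prem": "192.168.1.0/24",
}
```

   In this hypothetical plan, all three blocks are private and the AWS
   and Azure blocks overlap, so traffic between the two Cloud DCs
   cannot be routed on those addresses alone; either renumbering or
   NAT to publicly routable addresses is needed before a tunnel is
   useful.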

   In summary, here are some approaches, available now (which might
   change in the future), to interconnect workloads among different
   Cloud DCs:

     a) Utilize Cloud DC provided inter/intra-cloud connectivity
        services (e.g., AWS Transit Gateway) to connect workloads
        instantiated in multiple VPCs. Such services are provided with
        the cloud gateway to connect to external networks (e.g., AWS
        Direct Connect Gateway).
     b) Hairpin all traffic through the customer gateway, meaning all
        workloads are directly connected to the customer gateway, so
        that communications among workloads within one Cloud DC must
        traverse the customer gateway.
     c) Establish direct tunnels among different VPCs (AWS' Virtual
        Private Clouds) and VNETs (Azure's Virtual Networks) via the
        client's own virtual routers instantiated within Cloud DCs.
        DMVPN (Dynamic Multipoint Virtual Private Network) or DSVPN
        (Dynamic Smart VPN) techniques can be used to establish direct
        multipoint-to-point or multipoint-to-multipoint tunnels among
        the client's own virtual routers.

   Approach a) usually does not work if Cloud DCs are owned and managed
   by different Cloud providers.

   Approach b) adds transmission delay and incurs cost for traffic
   exiting Cloud DCs.

   For Approach c), DMVPN or DSVPN use NHRP (Next Hop Resolution
   Protocol) [RFC2735] so that spoke nodes can register their IP
   addresses and WAN ports with the hub node. The IETF ION
   (Internetworking over NBMA (non-broadcast multiple access)) WG
   standardized NHRP for network address resolution in connection-
   oriented NBMA networks (such as ATM) more than two decades ago.

   There are many differences between virtual routers in Public Cloud
   DCs and the nodes in an NBMA network. NHRP cannot be used for
   registering virtual routers in Cloud DCs unless an extension of such
   protocols is developed for that purpose, e.g., taking NAT or dynamic
   addresses into consideration. Therefore, DMVPN and/or DSVPN cannot
   be used directly for connecting workloads in hybrid Cloud DCs.

5. Edge Clouds

6. Problems with MPLS-based VPNs extending to Hybrid Cloud DCs

   Traditional MPLS-based VPNs have been widely deployed as an
   effective way to support businesses and organizations that require
   network performance and reliability. MPLS shifted the burden of
   managing a VPN service from enterprises to service providers. The
   CPEs attached to MPLS VPNs are also simpler and less expensive,
   because they do not need to manage routes to remote sites; they
   simply pass all outbound traffic to the MPLS VPN PEs to which the
   CPEs are attached (albeit multi-homing scenarios require more
   processing logic on CPEs). MPLS has addressed the problems of
   scale, availability, and fast recovery from network faults, and
   incorporated traffic-engineering capabilities.

   However, traditional MPLS-based VPN solutions are suboptimal for
   connecting end-users to dynamic workloads/applications in cloud DCs
   because:

      - The Provider Edge (PE) nodes of the enterprise's VPNs might not
        have direct connections to third party cloud DCs that are used
        for hosting workloads with the goal of providing easy access
        to enterprises' end-users.

      - It takes some time to deploy provider edge (PE) routers at new
        locations. When an enterprise's workloads are moved from one
        cloud DC to another (i.e., removed from one DC and re-
        instantiated at another location when demand changes), the
        enterprise branch offices need to be connected to the new cloud
        DC, but the network service provider might not have PEs located
        at the new location.
593 One of the main drivers for moving workloads into the cloud is 594 the widely available cloud DCs at geographically diverse 595 locations, where apps can be instantiated so that they can be 596 as close to their end-users as possible. When the user base 597 changes, the applications may be migrated to a new cloud DC 598 location closest to the new user base. 600 - Most of the cloud DCs do not expose their internal networks. An 601 enterprise with a hybrid cloud deployment can use an MPLS-VPN 602 to connect to a Cloud provider at multiple locations. The 603 connection locations often correspond to gateways of different 604 Cloud DC locations from the Cloud provider. The different 605 Cloud DCs are interconnected by the Cloud provider's own 606 internal network. At each connection location (gateway), the 607 Cloud provider uses BGP to advertise all of the prefixes in the 608 enterprise's VPC, regardless of which Cloud DC a given prefix 609 is actually in. This can result in inefficient routing for the 610 end-to-end data path. 612 Another roadblock is the lack of a standard way to express and 613 enforce consistent security policies for workloads that not only use 614 virtual addresses, but in which are also very likely hosted in 615 different locations within the Cloud DC [RFC8192]. The current VPN 616 path computation and bandwidth allocation schemes may not be 617 flexible enough to address the need for enterprises to rapidly 618 connect to dynamically instantiated (or removed) workloads and 619 applications regardless of their location/nature (i.e., third party 620 cloud DCs). 622 7. Problem with using IPsec tunnels to Cloud DCs 623 As described in the previous section, many Cloud operators expose 624 their gateways for external entities (which can be enterprises 625 themselves) to directly establish IPsec tunnels. Enterprises can 626 also instantiate virtual routers within Cloud DCs to connect to 627 their on-premises devices via IPsec tunnels. 629 7.1. 
Scaling Issues with IPsec Tunnels

631      If there is only one enterprise location that needs to reach the
632      Cloud DC, an IPsec tunnel is a very convenient solution.

634      However, many medium-to-large enterprises have multiple sites and
635      multiple data centers. For multiple sites to communicate with
636      workloads and apps hosted in cloud DCs, Cloud DC gateways have to
637      maintain many IPsec tunnels to all those locations. In addition,
638      each of those IPsec tunnels requires pair-wise periodic key
639      refresh. For a company with hundreds or thousands of locations,
640      there could be hundreds (or even thousands) of IPsec tunnels
641      terminating at the cloud DC gateway, which is very
642      processing-intensive. That is why many cloud operators only allow a
643      limited number of IPsec tunnels and limited bandwidth to each customer.

645      Alternatively, a group encryption solution could be used, in which
646      only a single IPsec SA is needed at the gateway; the drawback is the
647      need for key distribution and for maintaining a key server.

649   7.2. Poor performance over long distance

651      When enterprise CPEs or gateways are far away from cloud DC gateways
652      or across country/continent boundaries, performance of IPsec tunnels
653      over the public Internet can be problematic and unpredictable. Even
654      though there are many monitoring tools available to measure delay
655      and various performance characteristics of the network, the
656      measurement for paths over the Internet is passive, and past
657      measurements may not represent future performance.

659      Many cloud providers can replicate workloads in different
660      availability zones. An app instantiated in a cloud DC closest to
661      clients may have to cooperate with another app (or its mirror image)
662      in another region or database server(s) in the on-premises DC. This
663      kind of coordination requires predictable networking
664      behavior/performance among those locations.

666   8.
End-to-End Security Concerns for Data Flows

668      When IPsec tunnels established from enterprise on-premises CPEs
669      are terminated at the Cloud DC gateway where the workloads or
670      applications are hosted, some enterprises have concerns regarding
671      traffic to/from their workloads being exposed to others behind the
672      data center gateway (e.g., exposed to other organizations that
673      have workloads in the same data center).

675      To ensure that traffic to/from workloads is not exposed to
676      unwanted entities, IPsec tunnels may go all the way to the
677      workload (servers, or VMs) within the DC.

679   9. Requirements for Dynamic Cloud Data Center VPNs

681      To address the aforementioned issues, any solution for enterprise
682      VPNs that includes connectivity to dynamic workloads or applications
683      in cloud data centers should satisfy a set of requirements:

685      - The solution should allow enterprises to take advantage of the
686        current state-of-the-art in VPN technology, in both traditional
687        MPLS-based VPNs and IPsec-based VPNs (or any combination
688        thereof) that run over the public Internet.
689      - The solution should not require an enterprise to upgrade all
690        their existing CPEs.
691      - The solution should support scalable IPsec key management among
692        all nodes involved in DC interconnect schemes.
693      - The solution should support easy, fast, on-the-fly VPN
694        connections to dynamic workloads and applications in third
695        party data centers, and easily allow these workloads to migrate
696        both within a data center and between data centers.
697      - The solution should allow VPNs to provide bandwidth and other
698        performance guarantees.
699      - The solution should be cost-effective for enterprises that
700        incorporate dynamic cloud-based applications and workloads into
701        their existing VPN environment.

703   10. Security Considerations

705      This document discusses security requirements as a part of the
706      problem space, particularly in sections 4, 5, and 8.
708      Solution drafts resulting from this work will address security
709      concerns inherent to the solution(s), including both protocol
710      aspects and the importance (for example) of securing workloads in
711      cloud DCs and the use of secure interconnection mechanisms.

713   11. IANA Considerations

715      This document requires no IANA actions. RFC Editor: Please remove
716      this section before publication.

718   12. References

720   12.1. Normative References

722   12.2. Informative References

724      [RFC2735] B. Fox, et al., "NHRP Support for Virtual Private
725                Networks", Dec. 1999.

727      [RFC8192] S. Hares, et al., "Interface to Network Security Functions
728                (I2NSF) Problem Statement and Use Cases", July 2017.

730      [ITU-T-X1036] ITU-T Recommendation X.1036, "Framework for creation,
731                storage, distribution and enforcement of policies for
732                network security", Nov. 2007.

734      [RFC6071] S. Frankel and S. Krishnan, "IP Security (IPsec) and
735                Internet Key Exchange (IKE) Document Roadmap", Feb. 2011.

737      [RFC4364] E. Rosen and Y. Rekhter, "BGP/MPLS IP Virtual Private
738                Networks (VPNs)", Feb. 2006.

740      [RFC4664] L. Andersson and E. Rosen, "Framework for Layer 2 Virtual
741                Private Networks (L2VPNs)", Sept. 2006.

743   13. Acknowledgments

745      Many thanks to Alia Atlas, Chris Bowers, Paul Vixie, Paul Ebersman,
746      Timothy Morizot, Ignas Bagdonas, Michael Huang, Liu Yuan Jiao,
747      Katherine Zhao, and Jim Guichard for the discussion and
748      contributions.

750   Authors' Addresses

752      Linda Dunbar
753      Futurewei
754      Email: Linda.Dunbar@futurewei.com

756      Andrew G. Malis
757      Malis Consulting
758      Email: agmalis@gmail.com

760      Christian Jacquenet
761      Orange
762      Rennes, 35000
763      France
764      Email: Christian.jacquenet@orange.com

766      Mehmet Toy
767      Verizon
768      One Verizon Way
769      Basking Ridge, NJ 07920
770      Email: mehmet.toy@verizon.com