rtgwg                                                              Y. Li
Internet-Draft                                                L. Iannone
Intended status: Informational                       Huawei Technologies
Expires: May 4, 2021                                               J. He
                                            City University of Hong Kong
                                                                 L. Geng
                                                                  P. Liu
                                                            China Mobile
                                                                  Y. Cui
                                                     Tsinghua University
                                                        October 31, 2020


      Architecture of Dynamic-Anycast in Compute First Networking
                             (CFN-Dyncast)
               draft-li-rtgwg-cfn-dyncast-architecture-00

Abstract

   Compute First Networking (CFN) Dynamic Anycast refers to in-network
   edge computing, where a single service offered by a provider has
   multiple instances attached to multiple edge sites.
   In this scenario, flows are assigned and consistently forwarded to a
   specific instance through an anycast approach based on the network
   status as well as the status of the different instances.

   This document describes an architecture for Dynamic Anycast
   (Dyncast) in Compute First Networking (CFN).  It provides an
   overview, a description of the various components, and a workflow
   example showing how to provide a balanced multi-edge based service in
   terms of both computing and networking resources through dynamic
   anycast in real time.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definition of Terms
   3.  CFN-Dyncast Architecture Overview
   4.  Architectural Components and Interactions
     4.1.  Service Identity and Bindings
     4.2.  Service Notification between Instances and CFN node
     4.3.  CFN Dyncast Control Plane
     4.4.  Service Demand Dispatching
     4.5.  CFN Dispatcher
   5.  Summary of the key elements of CFN Dyncast Architecture
   6.  Conclusion (and call for contributions)
   7.  Security Considerations
   8.  IANA Considerations
   9.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

   The Dynamic Anycast in Compute First Networking (CFN-Dyncast) use
   cases and problem statement document
   [I-D.geng-rtgwg-cfn-dyncast-ps-usecase] describes usage scenarios
   that require an edge to be selected dynamically from multiple edge
   sites to serve an edge computing service demand, based on the
   computing resources available at each site and on the network status
   in real time.  Multiple edges provide service equivalency and service
   dynamism in CFN.  The current network architecture in edge computing
   provides relatively static service dispatching, for example, to the
   closest edge, or to the server with the most computing resources
   without considering the network status.
   Dynamic Anycast uses the dynamic computing load as well as the
   network status as metrics for deciding how to dispatch a flow, and
   at the same time maintains flow affinity for the lifetime of the
   service flow.

   The CFN-Dyncast architecture presents anycast-based service and
   access models.  The aim is to solve the problematic aspects of
   existing network layer edge computing service deployment, including
   unawareness of the computing resource information of a service,
   static edge selection, isolated network and computing metrics,
   and/or slow refresh of status.

   CFN-Dyncast assumes there are multiple equivalent edge instances
   implementing the same single service (think of the same service
   function instantiated on several edge nodes).  A single edge node has
   limited computing resources attached, and different edge nodes may
   have different resources available, such as CPU or GPU.  Because
   multiple edge nodes are interconnected and can collaborate with each
   other, it is possible to balance the service load and network load in
   CFN.  The computing resources available to serve a request are
   usually considered the main metric for assigning a service demand to
   an instance of the service.  However, the status of the network, in
   particular of the paths toward the instances, varies over time and
   paths may become congested; network status is therefore another key
   attribute to be considered.  CFN-Dyncast aims at providing a layer 3
   protocol framework able to dispatch the service demand to the "best"
   edge node in terms of both computing resources and network status, in
   real time and with no application- and/or service-specific
   dependencies.

   This document describes a general architecture for service
   notification, status update and service dispatch in CFN edge
   computing.

2.  Definition of Terms

   CFN:  Compute First Networking

   SID:  Service ID, an anycast IP address that represents a service and
      that clients use to access the service.  The SID is independent
      of which service instance serves the service demand.  Usually
      multiple service instances serve a single service.

   BID:  Binding ID, an address to reach a service instance for a given
      SID.  It is usually a unicast IP address.  A service can be
      provided by multiple service instances with different BIDs.

   CFN-Dyncast:  as defined in [I-D.geng-rtgwg-cfn-dyncast-ps-usecase].

3.  CFN-Dyncast Architecture Overview

   Service instances can be hosted on servers, virtual machines, access
   routers or gateways in an edge data center.  The CFN node is the glue
   that allows a CFN-Dyncast network both to exchange computing resource
   information about the service instances attached to it and to forward
   flows consistently toward such instances.

   Figure 1 shows the architecture of CFN-Dyncast.  CFN nodes are
   usually deployed at the edges of the operator infrastructure, where
   clients are connected.  As such, clients can be considered logically
   connected to CFN nodes.  The purpose of a CFN node is to consistently
   direct flows coming from clients to an instance of the service the
   flow is supposed to go through.  Service instances are instantiated
   at different edge sites, where a CFN node is also running.  A single
   service can have a large number of instances attached to different
   CFN nodes.  A "Service ID" (SID) uniquely identifies a service and,
   at the same time, the whole set of instances of that service, no
   matter where those instances are running.  Several instances of the
   service can run on the same CFN node (e.g., one instance per CPU
   core), and instances can also run on several different CFN nodes
   (e.g., one instance per PGW-U in a 5G network).
   Each instance is associated with a "Binding ID" (BID) indicating
   where the instance is running.  Hence, there is a dynamic binding
   between a SID (the service) and a set of BIDs (the instances of the
   service).  Such bindings are enriched with information concerning the
   network state and the available resources, so that at each new
   service request (a new flow) CFN nodes can decide which instance is
   the most appropriate to handle the request.  This highlights the
   anycast part of CFN-Dyncast, since flows are routed toward one
   service end-point among a set of equivalent ones, i.e.,
   one-to-one-out-of-many.

   When a client sends a service demand, it will be delivered to the
   most appropriate instance of the service attached to a CFN node.  A
   service demand is normally the first packet of a data flow, not
   necessarily an explicit out-of-band service request.  Once the CFN
   node has decided which instance has to serve the flow, flow affinity
   must be guaranteed, meaning that all packets belonging to the same
   flow have to go through the same service instance.

     edge site 1         edge site 2         edge site 3

                                                +------------+
     +------------+                           +------------+ |
   +-+----------+ |                         +------------+ |-+
   |  service   | |                         |  service   | |
   |  instance  |-+                         |  instance  |-+
   +------------+                           +------------+
         |                                        |
         |           +-----------------+          |
    +----------+     |                 |     +----------+
    |CFN node 1| ----| Infrastructure  |---- |CFN node 3|
    +----------+     |                 |     +----------+
         |           +-----------------+
         |                    |
         |                    |
    +----------+         +----------+
    |   CFN    |         |CFN node 2|
    |Dispatcher|         +----------+
    +----------+              |
         |                    |
         |                    |
      +-----+               +------+
    +------+|             +------+ |
    |client|+             |client|-+
    +------+              +------+

                  Figure 1: CFN-Dyncast Architecture

4.  Architectural Components and Interactions

   Figure 1 also shows that the local components of the architecture are
   the service instance, the CFN node, the CFN dispatcher and the
   client.
   The following subsections provide an overview of how some of these
   architectural components interact.  The figures accompanying the
   examples do not show the interconnecting infrastructure, to avoid
   cluttering them.

4.1.  Service Identity and Bindings

   As previously stated, the CFN-Dyncast architecture uses the Service
   ID (SID) and the Binding ID (BID) to identify services and their
   instances.

   The Service ID (SID) is an anycast service identifier (which may or
   may not be a routable IP address).  It is used to access a specific
   service no matter which service instance eventually handles the
   client's flow.  CFN nodes must know the SIDs (and their bindings) in
   advance and must be able to identify which flow needs which service.
   This can be achieved in different ways, for example, by using a
   special range or coding of anycast IP addresses as SIDs, or by using
   DNS.

   The Binding ID (BID) is a unicast IP address.  It is usually the
   interface IP address of a service instance.  The mapping and binding
   from a SID to a BID is dynamic and depends on the computing resources
   and network state at the time the service demand is made.  The CFN
   node must be able to guarantee flow affinity, i.e., to always steer a
   flow toward the same instance.

   Figure 2 shows an abstract example of the use of SIDs and BIDs.
   There are three services, namely SID1, SID2, and SID3.  In
   particular, SID2 has two instances on different CFN nodes (CFN node 2
   and CFN node 3).
   In this case the complete list of bindings (only in terms of SID and
   BID, without network or resource state) is:

   o  SID1:BID21

   o  SID2:BID22,BID32

   o  SID3:BID33

   SID: Service ID
   BID: Binding ID

                                                SID1
                                             +--------+  service
                                          +--|  BID21 |  instance1
                                          |  +--------+
                       +----------+       |
                +------|CFN node 2|-------|     SID2
                |      +----------+       |  +--------+  service
                |                         +--|  BID22 |  instance2
                |                            +--------+
                |
   +------+   +----------+
   |client|---|CFN node 1|                      SID2
   +------+   +----------+                   +--------+  service
                |                         +--|  BID32 |  instance3
                |                         |  +--------+
                |      +----------+       |
                +------|CFN node 3|-------|     SID3
                       +----------+       |  +--------+  service
                                          +--|  BID33 |  instance4
                                             +--------+

            Figure 2: CFN-Dyncast Architectural Concept Example

4.2.  Service Notification between Instances and CFN node

   The service side of CFN-Dyncast is responsible for notifying the CFN
   node it is attached to about the SID-to-BID mapping information when
   a service instance is instantiated or terminated, or when its
   metrics (e.g., load) change, as shown in Figure 3.

   SID: Service ID
   BID: Binding ID                      service info
                                   (SID1, BID21, metrics)
                                   (SID2, BID22, metrics)
                                      <--------------->
                                                SID1
                                             +--------+  service
                                          +--|  BID21 |  instance1
                                          |  +--------+
                       +----------+       |
                +------|CFN node 2|-------|     SID2
                |      +----------+       |  +--------+  service
                |                         +--|  BID22 |  instance2
                |                            +--------+
                |
   +------+   +----------+
   |client|---|CFN node 1|                      SID2
   +------+   +----------+                   +--------+  service
                |                         +--|  BID32 |  instance3
                |                         |  +--------+
                |      +----------+       |
                +------|CFN node 3|-------|     SID3
                       +----------+       |  +--------+  service
                                          +--|  BID33 |  instance4
                                             +--------+

                                      <---------------->
                                        service info
                                   (SID2, BID32, metrics)
                                   (SID3, BID33, metrics)

                Figure 3: CFN-Dyncast Service Notification

   Computing resource information of service instances is key
   information in CFN-Dyncast.
   Some of it is relatively static, such as CPU/GPU capacity, and some
   is very dynamic, for example, CPU/GPU utilization, number of
   associated sessions, and number of queued requests.  The service side
   has to notify this information to the CFN node it is attached to and
   keep it refreshed.  Various ways can be used, for instance a protocol
   or an API of the management system.  Conceptually, a CFN node keeps
   track of the SIDs and computing metrics of all service instances
   attached to it in real time.

4.3.  CFN Dyncast Control Plane

   CFN Dyncast needs a control plane that allows information about
   resources and costs to be shared.  Through the control plane, CFN
   nodes share and update among themselves the service information and
   the associated computing metrics for the service instances attached
   to each of them.  As a network node, a CFN node also monitors the
   network state toward other CFN nodes.  In this way, each CFN node is
   able to aggregate the information and build a complete view of the
   resources available and the cost to reach them.  For instance, for
   the scenario in Figure 3, the different CFN nodes will learn that
   there exist two instances of SID2, each of which has a certain
   computational capacity expressed in the metrics.  Different
   mechanisms can be used to update the status, for instance, BGP
   [RFC4760], an IGP or a controller-based mechanism.

   An important question CFN Dyncast raises is how to represent the
   computing metrics.  A single numeric value calculated from weighted
   attributes such as CPU/GPU consumption and/or number of associated
   sessions may be the easiest.  However, it may not accurately reflect
   the computing resources of interest.  Multi-dimensional variables may
   give finer information; however, the structure and the algorithmic
   processing should be sufficiently general to accommodate different
   types of services (i.e., metrics).
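
   As one possible reading of the "single value" option discussed
   above, the following Python sketch collapses a few per-instance
   attributes into one weighted load score.  The attribute names, the
   weights and the example values are illustrative assumptions; this
   document does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceMetrics:
    """Hypothetical per-instance computing metrics (not defined here)."""
    cpu_utilization: float  # fraction in [0.0, 1.0]
    gpu_utilization: float  # fraction in [0.0, 1.0]
    sessions: int           # sessions currently associated
    max_sessions: int       # assumed session capacity of the instance


# Illustrative weights only; chosen so that they sum to 1.0.
WEIGHTS = {"cpu": 0.5, "gpu": 0.3, "sessions": 0.2}


def load_score(m: InstanceMetrics) -> float:
    """Collapse the metrics into one value in [0, 1]; lower is better."""
    if m.max_sessions > 0:
        session_ratio = min(m.sessions / m.max_sessions, 1.0)
    else:
        session_ratio = 1.0  # no capacity: treat as fully loaded
    return (WEIGHTS["cpu"] * m.cpu_utilization
            + WEIGHTS["gpu"] * m.gpu_utilization
            + WEIGHTS["sessions"] * session_ratio)


# Bindings for SID2 as in Figure 3, with made-up metric values.
bindings = {
    ("SID2", "BID22"): InstanceMetrics(0.80, 0.10, 90, 100),
    ("SID2", "BID32"): InstanceMetrics(0.30, 0.20, 40, 100),
}
least_loaded = min(bindings, key=lambda b: load_score(bindings[b]))
```

   A single score is easy to advertise and compare, but it hides which
   resource is actually scarce; a multi-dimensional representation
   would carry the individual attributes instead.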

   A second important issue is related to system stability and signaling
   overhead.  As computing metrics may change very frequently, it has to
   be determined when and how frequently such information should be
   exchanged among CFN nodes.  A spectrum of approaches can be employed:
   interval-based update, threshold-based update, policy-based update,
   etc.

4.4.  Service Demand Dispatching

   Assuming that the set of metrics is well defined and that the update
   rate is tailored so as to yield a stable system, the CFN Dyncast data
   plane has the task of dispatching flows to the "best" service
   instance.  When a new flow arrives at a CFN ingress node, that node
   selects the most appropriate CFN egress in terms of the network
   status and the computing resources of the attached service instances,
   and guarantees flow affinity for the flow from then on.

   Flow affinity is one of the critical features that CFN-Dyncast should
   support.  Flow affinity means that the packets of the same flow for a
   service should always be sent to the same CFN egress, to be processed
   by the same service instance.

   When a new flow arrives and the most appropriate CFN egress and
   service instance are determined, a flow binding table should store
   the flow binding information, which may include the flow identifier,
   the selected CFN node, the affinity timeout value, etc.  The
   subsequent packets of the flow are forwarded based on the table.
   Figure 4 shows an example of what a flow binding table at a CFN
   ingress node can look like.

   +-----------------------------------------+------------+--------+
   |             Flow Identifier             |            |        |
   +------+--------+---------+--------+------+ CFN egress | timeout|
   |src_IP| dst_IP |src_port |dst_port|proto |            |        |
   +------+--------+---------+--------+------+------------+--------+
   |  X   |  SID2  |    -    |  8888  | tcp  | CFN node 2 |  xxx   |
   +------+--------+---------+--------+------+------------+--------+
   |  Y   |  SID2  |    -    |  8888  | tcp  | CFN node 3 |  xxx   |
   +------+--------+---------+--------+------+------------+--------+

                Figure 4: Example of flow binding table

   A flow entry in the flow binding table can be identified using the
   classic 5-tuple value.  However, it is worth noting that different
   services may need different granularities of flow identification.
   For instance, an RTP video stream may use different port numbers for
   video and audio, and would be identified as two flows if a 5-tuple
   flow identifier were used, although both should be treated as the
   same flow.  A 3-tuple based flow identifier is therefore more
   suitable for this case.  Hence, a certain level of flexibility in
   identifying flows is desirable in order to apply flow affinity.

   Flow affinity attributes can be configured per service in advance.
   For each service, the information can include the flow identifier
   type, the affinity timeout value, etc.  The flow identifier type
   indicates which values are used as the flow identifier, for instance,
   5-tuple, 3-tuple or anything else.  Because each flow belongs to a
   single service, the matching rules have to be disjoint, meaning that
   two different services must have non-overlapping sets of matching
   flows.

4.5.  CFN Dispatcher

   When a CFN node maintains the flow binding table, the memory consumed
   is determined by the number of flows that the CFN ingress node
   handles.
   The ingress node can be an edge data center gateway, hence it may
   cover hundreds of thousands of users, and each user may have tens of
   flows.  The memory consumed by the binding table at the CFN ingress
   node can therefore be a concern.  To alleviate it, a functional
   entity called the CFN Dispatcher can help.

   The CFN Dispatcher is deployed closer to the clients and normally
   handles the flows of a limited number of clients.  In this case, the
   memory space required by the binding table will be much smaller.  The
   CFN dispatcher is a client-side entity which directs traffic to a CFN
   egress node.  It is not a CFN node itself, that is to say, it does
   not participate in the status updates about network and computing
   metrics among CFN nodes.  The CFN dispatcher does not determine by
   itself the best CFN egress to forward the packets of a new flow to.
   It has to learn this information from a CFN node and maintain it to
   ensure flow affinity for the subsequent packets.  In this way, the
   CFN node simply selects the most appropriate egress for new flows and
   informs the CFN dispatcher, explicitly or implicitly.  The CFN node
   is thus relieved from flow binding table maintenance.

   Figure 5 shows the interaction between a CFN Dispatcher and a CFN
   node.  After the CFN node makes the service demand dispatch decision,
   it informs the CFN dispatcher about the selected CFN egress node for
   the flow.  The CFN dispatcher then maintains the flow binding table
   to ensure flow affinity.  The message exchange between the CFN
   dispatcher and its corresponding CFN node needs to be defined.  The
   CFN dispatcher can simply forward the first packet of a flow to the
   CFN node, which decides which instance to use and pushes this
   information into the flow binding table of the CFN dispatcher.
   However, in case of failures, e.g., when a CFN egress is no longer
   reachable, further interaction is needed between the CFN dispatcher
   and the CFN node.
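
   The interaction described above can be sketched in Python as
   follows.  This is a minimal illustration under stated assumptions,
   not a definition of the message exchange: the class and function
   names are hypothetical, the CFN node's decision is modelled as a
   plain callback, and a 3-tuple is assumed to be (src_IP, dst_IP,
   proto), which this document does not specify.

```python
import time
from typing import Callable, Dict, Tuple

Packet = Dict[str, object]    # hypothetical packet header fields
FlowKey = Tuple[object, ...]


def flow_key(pkt: Packet, id_type: str) -> FlowKey:
    """Build the flow identifier configured for the service.

    Assumption: a 3-tuple is (src_IP, dst_IP, proto).
    """
    if id_type == "3-tuple":
        return (pkt["src_ip"], pkt["dst_ip"], pkt["proto"])
    return (pkt["src_ip"], pkt["dst_ip"],
            pkt["src_port"], pkt["dst_port"], pkt["proto"])


class Dispatcher:
    """Hypothetical CFN dispatcher keeping the flow binding table."""

    def __init__(self, ask_cfn_node: Callable[[Packet], str],
                 id_type: str = "5-tuple", timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ask = ask_cfn_node  # stands in for the CFN node's decision
        self._id_type = id_type
        self._timeout = timeout   # affinity timeout in seconds
        self._clock = clock
        # flow identifier -> (CFN egress, expiry time)
        self._table: Dict[FlowKey, Tuple[str, float]] = {}

    def egress_for(self, pkt: Packet) -> str:
        """Return the CFN egress for pkt, preserving flow affinity."""
        key = flow_key(pkt, self._id_type)
        now = self._clock()
        entry = self._table.get(key)
        if entry is not None and entry[1] > now:
            egress = entry[0]        # existing flow: reuse the binding
        else:
            egress = self._ask(pkt)  # new or expired flow: node decides
        # Store the binding and (re)start the affinity timer.
        self._table[key] = (egress, now + self._timeout)
        return egress
```

   With the 3-tuple identifier, the audio and video packets of the RTP
   example in Section 4.4 map to the same table entry and therefore to
   the same CFN egress.  Failure handling, such as an unreachable
   egress, is deliberately left out of this sketch.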

   SID: Service ID
   BID: Binding ID

                                                        SID1
                                                     +--------+
                                                  +--|  BID21 |
        binding info                              |  +--------+
      (flow1,egress2)              +----------+   |
      (flow2,egress3)           +--|CFN node 2|---|     SID2
          <-----                |  +----------+   |  +--------+
   +------+                     |                 +--|  BID22 |
   |Client|-+                   |                    +--------+
   +------+  \                  |
              \                 |
           +--------------+     +----------+
           |CFN Dispatcher|-----|CFN node 1|
           +--------------+     +----------+
              /                 |                       SID2
   +------+  /                  |                    +--------+
   |Client|-+                   |                 +--|  BID32 |
   +------+                     |                 |  +--------+
                                |  +----------+   |
                                +--|CFN node 3|---|     SID3
                                   +----------+   |  +--------+
                                                  +--|  BID33 |
                                                     +--------+

          Figure 5: Service Demand Dispatch with CFN Dispatcher

5.  Summary of the key elements of CFN Dyncast Architecture

   o  CFN Control Plane:

      *  SID: CFN nodes have to be made aware of existing services
         through the existence of the corresponding SID.  This can be
         achieved in different ways, for example, by using a special
         range or coding of anycast IP addresses as service IDs, or by
         using DNS.

      *  BID bindings: a SID is bound to a set of BIDs representing the
         different instances of the service.  Associated with each BID
         there is also a set of metrics describing the state of the
         instance.  These bindings have to be shared among the CFN nodes
         so that they are aware of the different instances and their
         computing resource status.

      *  Network state: CFN nodes have to be able to share network
         status, so as to estimate the impact of the dispatching
         decision in terms of link congestion.

      *  Metric and network status updates need to be sparse enough to
         limit the signaling overhead and keep the system stable, but
         frequent enough to keep the system reactive to sudden traffic
         fluctuations.

   o  CFN Data Plane:

      *  In case of a new flow: the CFN ingress node selects the most
         appropriate CFN egress in terms of the network status and the
         computing resources of the service instances attached to the
         egresses.

      *  Flow affinity: CFN ingress nodes make sure the subsequent
         packets of an existing flow are always delivered to the same
         CFN egress node so that they can be served by the same service
         instance.

6.  Conclusion (and call for contributions)

   This document introduces an architecture for CFN Dyncast that enables
   a service demand to be sent to an optimal edge in order to improve
   overall system load balancing.  It can dynamically adapt to changes
   in computing resource consumption and network status, and avoids
   overloading single edges.  CFN-Dyncast is a network-based
   architecture that supports a large number of edges and is independent
   of the applications or services hosted on the edge.

   The present document is a strawman for defining the CFN-Dyncast
   architecture.

   More discussion on control plane and data plane approaches is
   welcome.

7.  Security Considerations

   TBD

8.  IANA Considerations

   No IANA action is required so far.

9.  Informative References

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.

   [I-D.geng-rtgwg-cfn-dyncast-ps-usecase]
              Geng, L., Liu, P., and P. Willis, "Dynamic-Anycast in
              Compute First Networking (CFN-Dyncast) Use Cases and
              Problem Statement",
              draft-geng-rtgwg-cfn-dyncast-ps-usecase-00 (work in
              progress), October 2020.

Acknowledgements

   TBD

Authors' Addresses

   Yizhou Li
   Huawei Technologies

   Email: liyizhou@huawei.com


   Luigi Iannone
   Huawei Technologies

   Email: Luigi.iannone@huawei.com


   Jianfei He
   City University of Hong Kong

   Email: jianfeihe2-c@my.cityu.edu.hk


   Liang Geng
   China Mobile

   Email: gengliang@chinamobile.com


   Peng Liu
   China Mobile

   Email: liupengyjy@chinamobile.com


   Yong Cui
   Tsinghua University

   Email: cuiyong@tsinghua.edu.cn