Network Working Group                                              Z. Li
Internet-Draft                                                    K. Yao
Intended status: Informational                                     Y. Li
Expires: 26 April 2022                                      China Mobile
                                                         23 October 2021

  A Compute Resources Oriented Scheduling Mechanism based on Dataplane
                             Programmability
             draft-li-coinrg-compute-resource-scheduling-00

Abstract

   As massive amounts of data accumulate on the Internet, how to use
   compute resources effectively has become an important question.
   To relieve the pressure on today's large data centers, some compute
   resources have been moved towards the edge, gradually forming a
   distributed Compute Force Network.  In physics, a force is a cause
   that can change the state of motion of an object.  We borrow this
   definition and extend its philosophy to the network: in the future,
   the network can act as a compute force that facilitates the
   integration of different kinds of compute resources, whether
   hardware or software, making computation fast and effective.  In
   this draft, we present a compute resources oriented scheduling
   mechanism based on dataplane programmability, which can effectively
   schedule and manage compute resources in the network.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 26 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Design
     3.1.  Network Topology
     3.2.  Mechanism Statement
   4.  Typical Way of Realization
   5.  Security Considerations
   6.  IANA Considerations
   7.  Normative References
   Authors' Addresses

1.  Introduction

   As Moore's law gradually reaches its limits, the computation of
   massive data and diverse computational requirements can no longer
   be satisfied by simply upgrading the computation resources on a
   single chip.  There is an emerging trend that domain-specific
   computation resources such as GPUs, DPUs, and programmable switches
   are becoming more and more popular, generating diverse use cases in
   the network, for example in-network computing and in-memory
   computing.  In-network computing means using programmable switches
   or DPUs to offload network functions so as to accelerate the
   network.  In-memory computing means that computer memory does not
   only serve as storage but also provides computation.
   With the development of these domain-specific architectures, the
   network should serve as a force that facilitates the integration of
   all these different types of computation resources, in turn forming
   a Compute Force Network (CFN).  In a CFN, how to effectively
   schedule these computation resources is a topic worth studying.

   Current approaches to compute resource allocation include extending
   protocols such as DNS to realize the awareness and scheduling of
   compute resources, but the management of these compute resources
   must be done by a centralized controller.  When a DNS client wants
   to run a computing task, e.g., machine learning model training, it
   sends a request to the DNS server, and the DNS server informs the
   client which compute node is available at the moment.  However,
   activating and deactivating this compute node, e.g., creating a
   virtual machine, is done by the centralized controller, which we
   consider neither efficient nor timely given the massive amount of
   data waiting to be computed in the network.  This weakness has
   motivated the idea of realizing the scheduling and management of
   compute resources by extending current routing protocols such as
   SRv6 with the help of programmable network elements.  The detailed
   design is presented in this draft.

2.  Conventions Used in This Document

2.1.  Terminology

   CFN   Compute Force Network

   DNS   Domain Name Service

   SRv6  Segment Routing over IPv6

   GPU   Graphics Processing Unit

   DPU   Data Processing Unit

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Design

   The detailed design of the mechanism is presented in this section.
   A typical topology is shown below and the definition of each part
   of the network topology is given; the whole procedure is then
   explained in the second subsection.

3.1.  Network Topology

   The network topology is shown in the figure below.  It contains
   several major parts: the consumer, the computation management node,
   compute nodes with programmable DPUs, and programmable network
   elements.

                         +------------------+
                         |Compute node with |
                         |programmable DPU  |
                         +--------+---------+
                                  |
   +------------------+  +--------+------+   +-----------------+
   |Compute node with |  | programmable  |   |Compute node with|
   |programmable DPU  +--+network element+---+programmable DPU |
   +------------------+  +---+-------+---+   +-----------------+
                             |       |
   +---------------+         |       |         +---------------+
   | programmable  +---------+       +---------+ programmable  |
   |network element|                           |network element|
   +------+--------+                           +-------+-------+
          |                                            |
          |               +-----+----+                 |
          +---------------+ Consumer +-----------------+
                          +-----+----+
                                |
                      +---------+---------+
                      |    Computation    |
                      |  management node  |
                      +-------------------+

                       Figure 1: Network Topology

   -  Consumer: an end node generating computing tasks that need to be
      done by compute resources.

   -  Compute node: a network node that has the resources to finish
      computing tasks generated by consumers, e.g., a server or a
      cluster of servers.

   -  Programmable DPU: a unit that is connected to a compute node and
      a programmable network element, responsible for the lifetime
      management of the compute node and the communication with the
      programmable network element.

   -  Programmable network element: a network device that communicates
      with consumers and programmable DPUs, forwarding messages
      bidirectionally, including requests for compute resources,
      activation or deactivation of a specific compute resource, and
      other routing messages.
   -  Computation management node: a network node that has a full view
      of the computation resources in the network, dynamically
      managing these resources and generating consumption receipts.

3.2.  Mechanism Statement

   This section describes, step by step, the communication between the
   consumer and the computation management node, which passes through
   the programmable network element, the programmable DPU, and the
   compute node.

     1. Computation
        Request         +---------------+
   +----------+ ------> | Programmable  |
   | Consumer |         |Network Element|
   +----------+ <------ +----+-----+----+
     4. Computation          |     ^
        Response             |     |
                             |     |
     2. Compute Resource     |     |  3. Registration
        Consuming Request    |     |     Response
        Registration         |     |
                             v     |
                    +--------+-----+--+
                    |   Computation   |
                    | management node |
                    +-----------------+

               Figure 2: Computation Request Procedure

   *  Step 1: computation request registration.  When a consumer wants
      to do some computing tasks, e.g., machine learning model
      training, it first sends a request message to the computation
      management node for computation resource pre-allocation.  The
      message passes through the programmable network element, where
      some modification of the packet header can be done on the
      dataplane.  Information such as the computation category and a
      configuration template can be added to the packet header, which
      notifies the computation management node what kind of
      computation resource needs to be scheduled, e.g., how many GPUs
      the task needs.  Afterwards, the management node sends back a
      message into which the IP address of the specific computation
      node is inserted.  If no such computation node is available at
      the moment, the management node sends back a refusal.  Finally,
      the programmable network element forwards the message to the
      consumer.
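   The draft does not define a concrete message format or allocation
   policy for this registration exchange.  Purely as an illustration,
   the following Python sketch models the pre-allocation decision at
   the computation management node; the registry, field names, and
   addresses are invented for the example and are not part of this
   specification.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry kept by the computation management node:
# compute node IP -> number of currently idle GPUs.
FREE_GPUS = {"2001:db8::10": 8, "2001:db8::11": 2}

@dataclass
class ComputationRequest:
    consumer: str        # consumer's address
    category: str        # computation category, e.g. "ml-training"
    gpus_needed: int     # added to the header by the network element

def handle_registration(req: ComputationRequest) -> Optional[str]:
    """Return the IP of a compute node that can satisfy the request,
    or None as a 'refusal' when no node has enough free GPUs."""
    for node_ip, free in FREE_GPUS.items():
        if free >= req.gpus_needed:
            FREE_GPUS[node_ip] = free - req.gpus_needed  # pre-allocate
            return node_ip
    return None

req = ComputationRequest(consumer="2001:db8::1",
                         category="ml-training", gpus_needed=4)
print(handle_registration(req))  # a node IP, or None if refused
```

   The refusal case maps to the management node's negative reply in
   Step 1; a real implementation would of course track more than a GPU
   count per node.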
     1. Computation
        Task            +---------------+
   +----------+ ------> | Programmable  |
   | Consumer |         |Network Element|
   +----------+ <------ +----+-----+----+
                             |     ^
                             |     |  2. Computation
                             |     |     Message Routing
                             v     |
   +----------+         +----+-----+----+
   | Compute  | <------ | Programmable  |
   |   Node   |         |      DPU      |
   +----------+ ------> +---------------+
     3. Activation

                  Figure 3: Computation Activation

   *  Step 2: computation activation.  The consumer sends the actual
      computation task to the programmable network element, which
      modifies the packet.  An activation message for the compute node
      is encapsulated into the packet, enabling lifetime management of
      the computation and tracking of the compute node's working
      progress.  The message is then forwarded to the programmable DPU
      directly connected to the compute node, where the packet is
      decapsulated.  The DPU tells the compute node to start working
      and dynamically monitors the state of the compute node until the
      task is finished.
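   The encapsulation at the network element and the decapsulation at
   the DPU are not specified in detail by this draft.  As a rough
   sketch only (the length-prefixed JSON header used here is an
   invented stand-in, not an on-wire format defined by this document),
   the idea can be expressed as:

```python
import json

def encapsulate(task_payload: bytes, node_ip: str) -> bytes:
    """Network-element side: prepend an activation header so the DPU
    knows to wake the compute node and monitor its progress."""
    header = json.dumps({"action": "activate",
                         "target": node_ip,
                         "monitor": True}).encode()
    # A 2-byte length prefix lets the DPU locate the header/payload
    # boundary when decapsulating.
    return len(header).to_bytes(2, "big") + header + task_payload

def decapsulate(packet: bytes):
    """DPU side: strip the activation header, recover the task."""
    hlen = int.from_bytes(packet[:2], "big")
    header = json.loads(packet[2:2 + hlen])
    payload = packet[2 + hlen:]
    return header, payload

pkt = encapsulate(b"ml-training-task", "2001:db8::10")
hdr, task = decapsulate(pkt)
```

   In the SRv6-based realization suggested later in this draft, such
   activation metadata would instead ride in the routing header rather
   than in an ad hoc prefix.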
                        +---------------+
                        |  Computation  |
                        |Management Node|
                        +----+-----+----+
                             |     ^
             3. Response     |     |  2. Finish
                             |     |     Notification
                             |     |
     1. Consumption          |     |
        Finish               v     |
        Request         +----+-----+----+
   +----------+ ------> | Programmable  |
   | Consumer |         |Network Element|
   +----------+ <------ +----+-----+----+
                             |     ^
                             |     |
     4. Deactivation         |     |
                             v     |
   +----------+         +----+-----+----+
   | Compute  | ------> | Programmable  |
   |   Node   |         |      DPU      |
   +----------+ <------ +---------------+
                          5. Resource
                             Reclaim

                    Figure 4: Consumption Finish

   *  Step 3: consumption finish.  When the compute node notifies the
      consumer that the task has been finished, the consumer decides
      whether any task is still waiting.  If not, the consumer sends a
      consumption finish request to the computation management node.
      As with computation request registration, the programmable
      network element inserts information about the compute node and
      forwards the notification message to the computation management
      node.  When the programmable network element receives a response
      message, it starts the deactivation procedure and tells the
      compute node to reclaim the resources used for the previous
      computation.  This ends the lifetime of the computation of a
      single task.

4.  Typical Way of Realization

   The mechanism stated in the section above can be realized by
   extending protocols such as SRv6.  The lifetime management messages
   can be inserted dynamically on the dataplane with the help of
   programmable hardware.  Such modification can be done flexibly and
   at line rate.

5.  Security Considerations

   TBD.

6.  IANA Considerations

   TBD.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.
   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Zhiqiang Li
   China Mobile
   Beijing
   100053
   China

   Email: lizhiqiangyjy@chinamobile.com

   Kehan Yao
   China Mobile
   Beijing
   100053
   China

   Email: yaokehan@chinamobile.com

   Yang Li
   China Mobile
   Beijing
   100053
   China

   Email: liyangzn@chinamobile.com