RTGWG                                                           D. Huang
Internet-Draft                                                    B. Tan
Intended status: Standards Track                         ZTE Corporation
Expires: 8 September 2022                                         P. Liu
                                                            China Mobile
                                                            7 March 2022

                 Computing Delivery in Routing Network
          draft-huang-computing-delivery-in-routing-network-01
Abstract

   This document proposes an architecture of Computing Delivery in
   Routing Network, which incorporates both computing and networking
   metrics into routing policies and enables the network to sense and
   schedule computing services on top of traditional networking
   services.  A mechanism of two-granularity computing status and two-
   segment routing is illustrated for end-to-end networking and
   computing service in cloud sites, and the major networking and
   computing actors are defined in terms of functionality.  An example
   workflow is demonstrated, and considerations for both control plane
   and data plane solutions are proposed.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 September 2022.
Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.
Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Terminology
   3.  Computing delivery in routing network reference architecture
     3.1.  Hierarchical granularity routing scheme
     3.2.  Two-segment routing and forwarding
     3.3.  Cross-domain computing routing and forwarding
     3.4.  CSI routing
     3.5.  Traffic affinity
   4.  Computing delivery in routing network architecture workflow
     4.1.  Computing resource and service update workflow
     4.2.  Service flow routing and forwarding workflow
   5.  Control plane
     5.1.  Centralized control plane
     5.2.  Distributed control plane
     5.3.  Hybrid control plane
   6.  Data plane
     6.1.  CSI encapsulation
     6.2.  CSI for GCR, CUR and LCR
   7.  Summary
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. Informative References
   Authors' Addresses
1.  Introduction

   Computing-related services have so far been provided in one of two
   ways: computing resources are either confined within isolated sites
   (data centers, MECs, etc.) without coordination among multiple
   sites, or they are coordinated and managed within specific, closed
   service systems without fine-grained networking facilitation.
   Meanwhile, the industry is entering an era in which computing
   resources migrate from centralized data centers to distributed edge
   nodes.

   Substantial cost and efficiency benefits, resulting from economies
   of scale, could therefore be brought to multiple industries by
   intelligently and dynamically connecting the distributed computing
   resources and rendering them as a unified, virtual resource pool.
   On top of the cost and efficiency gains, applications and services
   could be served in a more sophisticated way, in which computing and
   networking resources are aligned more efficiently and agilely than
   in the conventional approach where the two are delivered by
   separate systems.  Drafts such as [I-D.liu-dyncast-ps-usecases] and
   [I-D.li-dyncast-architecture] analyze the benefits of routing-based
   solutions and give a reference architecture and preliminary test
   results.  End applications could thus be served not only by fine-
   grained computing services but also by fine-grained networking
   services, rather than by the best-effort networking service that
   would otherwise be available without routing-network involvement.
   The cost is the burden of maintaining and sensing computing
   resource status in the networking layer.  The architecture proposed
   here is designed to be as compatible with the existing routing
   architecture as possible.
1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].
2.  Terminology

   *  Global Computing-related Routing Node (GCR): a routing node that
      maintains computing resource and service status from remote
      cloud sites and executes cross-site routing policies in terms of
      that status and the identification of the computing resource and
      service.  A GCR usually resides at the network edge and works as
      the ingress of the end-to-end service flow.

   *  Local Computing-related Routing Node (LCR): a routing node that
      maintains computing resource and service status from the
      geographically local cloud sites and is responsible for the last
      hop of the service flow towards the computing resource and
      service instance in a specific cloud site.  An LCR usually
      resides at the network edge and works as the egress of the end-
      to-end service flow.

   *  Computing Unaware Routing Node (CUR): a routing node that is
      unaware of computing resource and service status and disregards
      the encapsulated computing service identification.  A CUR
      usually resides between GCR and LCR, works as an ordinary
      routing node, and stays uninvolved in computing delivery.

   *  Global Computing Resource and Service Status (GCRS): the general
      cloud site status of computing resource and service, consisting
      of the overall resource occupation and the types of computing
      service (algorithms, functions, etc.) a specific cloud site
      provides.  GCRS is maintained at the GCR and is expected to
      remain relatively stable and change at a low frequency.

   *  Local Computing Resource and Service Status (LCRS): the fine-
      grained cloud site status of computing resource and service,
      consisting of the status of each active computing service
      instance as well as the parameters that affect how the instance
      would be selected and visited by the LCR.  LCRS is maintained at
      the LCR and is expected to stay quite active and change at a
      high frequency.

   *  Computing Service Identification (CSI): a globally unique
      identification of a computing service with optional parameters;
      it could be an IPv6-like address or a specifically designed
      identification structure.

   *  Instantiated Computing Service (ICS): an active instance of a
      computing service identification, residing in a host that is
      typically a server, container or virtual machine.
3.  Computing delivery in routing network reference architecture

   The computing delivery in routing network architecture illustrated
   in Figure 1 enables the routing network to sense the computing
   resource and service status of the cloud sites and to route service
   flows according to both network and computing metrics.  The
   architecture is a horizontal convergence of cloud and network: the
   network maintains the converged resource status and is thus able to
   execute an end-to-end routing and forwarding policy from the
   perspective of both cloud and network resources.  PE1 maintains
   GCRS with a whole picture of the multiple cloud sites and executes
   the routing policy for the network segment between PE1 and PE2 or
   PE3, namely between ingress and egress, while PE2 maintains LCRS
   with a focused picture of the cloud site where S1 resides and
   establishes a connection towards S1.  S1 is an active instance of a
   specific computing service type (CSI).  On top of the LCR role of
   maintaining LCRS, PE2 and PE3 also fulfill the GCR role of
   maintaining GCRS from neighboring cloud sites.  P provides
   traditional routing and forwarding functionality for the computing
   service flow and remains unaware of any computing-related status as
   well as of CSI encapsulations.
                                  +--------+        +--------+
                         +------->|LCR/GCR |------->|  ICS   |
                         |        +--------+        +--------+
   +--------+  +--------+|           PE2                S1
   |  GCR   |->|  CUR   |+
   +--------+  +--------+|           PE3                S2
      PE1          P     |        +--------+        +--------+
                         +------->|LCR/GCR |------->|  ICS   |
                                  +--------+        +--------+
   |<------------ Network domain ------------>|<--Computing->|
                                                    domain

                               Figure 1
3.1.  Hierarchical granularity routing scheme

   Status updates of computing resource and service in the cloud sites
   span a quite broad range, from relatively stable service types and
   overall resource occupation to extremely dynamic capacity changes
   and the busy/idle cycles of service instances.  Building all of
   these status updates into the network layer would be impractical,
   as it would produce overburdened and volatile routing tables.

   It is therefore reasonable to divide the wide range of computing
   resources and services into categories with differentiated
   characteristics from the routing perspective.  GCRS and LCRS
   correspond to the cross-site domain and the local site domain
   respectively: GCRS aggregates the computing resource and service
   status with low update frequency from multiple cloud sites, while
   LCRS focuses only upon the high-frequency status in the local site.
   Under this two-granularity scheme, the computing-related routing
   table of GCRS in the GCR remains roughly as stable as the
   traditional routing table, while the LCRS in the LCR maintains a
   near-synchronized state table of the highly dynamic updates of
   computing service instances in the local cloud site.  An LCRS
   focusing upon a single, local cloud site is the normal case; an
   LCRS spanning multiple sites should be the exception, if not
   impossible.
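   The division can be made concrete with a small dispatcher that
   routes each status update into the coarse or fine table.  The
   metric names below are assumptions of this sketch; this document
   defines no metric vocabulary:

```python
# Hypothetical metric names: slowly changing, site-wide metrics belong
# to the coarse GCRS table; volatile, per-instance metrics belong to
# the fine LCRS table.
GCRS_METRICS = {"service_types", "overall_occupation"}   # low frequency
LCRS_METRICS = {"instance_capacity", "instance_busy"}    # high frequency

def granularity(metric):
    """Return which table ('GCRS' or 'LCRS') a status update with the
    given metric name should be installed into."""
    if metric in GCRS_METRICS:
        return "GCRS"
    if metric in LCRS_METRICS:
        return "LCRS"
    raise ValueError(f"unclassified metric: {metric}")
```

   A GCR would then only ever see updates classified as "GCRS", which
   is what keeps its table roughly as stable as a traditional routing
   table.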
3.2.  Two-segment routing and forwarding

   When it comes to end-to-end service flow routing and forwarding,
   there is a status information gap between GCRS and LCRS; therefore
   a two-segment mechanism has to be in place, in line with the two-
   granularity routing scheme demonstrated in 3.1.  As illustrated in
   Figure 2, R1, as ingress, determines the specific service flow's
   egress, which turns out to be R2, according to policy calculation
   from GCRS.  In particular, the CSI obtained either in-band (user
   plane) or out-of-band (control plane) is the only index for R1 to
   calculate and determine the egress, and it is highly feasible to
   make this egress calculation in terms of both the networking
   (bandwidth, latency, etc.) and the computing Service Level
   Agreement (SLA).  Nevertheless, the two SLA routing optimizations
   could be decoupled to such a degree that the traditional routing
   algorithms remain as they are.  The convergence of the SLA policies
   as well as the methods to make the GCR aware of the two SLAs are
   out of scope of this proposal.
   +--------+     +--------+     +--------+     +--------+
   |  GCRS  |---->|        |---->|  LCRS  |---->|  ICS   |
   +--------+     +--------+     +--------+     +--------+
       R1             R              R2             S1
   |<----------- GCRS segment ---------->|<---- LCRS ---->|
                                                segment

                               Figure 2
   When the service flow arrives at R2, R2 terminates the GCRS segment
   routing and determines S1, the service instance selected according
   to the LCRS maintained at R2.  Again, CSI is the only index for the
   LCRS segment routing process.
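   The two selections can be sketched as two independent functions,
   one per segment.  The field names, the linear cost combination, and
   the weight are illustrative assumptions of this sketch, not part of
   the proposal:

```python
def select_egress(csi, gcrs, net_cost, w=0.5):
    # First segment, at the ingress GCR: among sites offering the
    # requested CSI, minimize a combined networking + computing cost.
    # net_cost maps egress -> normalized network metric (e.g. latency);
    # the site's load is the computing metric; w is an arbitrary weight.
    candidates = [e for e in gcrs if csi in e["services"]]
    if not candidates:
        return None
    best = min(candidates,
               key=lambda e: w * net_cost[e["egress"]]
                           + (1 - w) * e["load"])
    return best["egress"]

def select_instance(csi, lcrs):
    # Second segment, at the egress LCR: pick an idle instance of the
    # requested CSI with the most remaining capacity.
    candidates = [i for i in lcrs if i["csi"] == csi and not i["busy"]]
    if not candidates:
        return None
    return max(candidates, key=lambda i: i["capacity"])["instance"]
```

   Decoupling the two functions mirrors the decoupling claimed above:
   select_egress never inspects per-instance state, and select_instance
   never inspects network metrics.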
3.3.  Cross-domain computing routing and forwarding

   Coordinated computing resource scheduling among multiple regions,
   which are usually connected by multiple network domains, is, as
   illustrated in Section 1, an important part of the intended
   scenarios and a key reason why computing-based scheduling and
   routing is proposed in the first place.  The two-segment routing
   and forwarding scheme illustrated in 3.2 is a typical use case of
   cross-domain computing routing and forwarding and a good building
   block for the full-domain solution.  Computing metric information
   is brought into the network domain to enable the latter to schedule
   routing policies beyond network metrics.  However, a particular
   scheme has to be put in place to ensure mild and acceptable impacts
   upon the existing IP routing scheme.  The enhanced full-domain
   routing and forwarding solution is a consistent CSI across
   terminal, network (multiple domains) and cloud, along with
   hierarchical CSI-associated computing resource and service status
   corresponding to the different network domains.  Each domain
   maintains the corresponding computing resource and service status
   at its edge node and performs the computing-based routing for its
   domain-related segment, which is connected to the neighboring
   segments.
3.4.  CSI routing

   CSI, encapsulated in packet headers and maintained in LCRS and
   GCRS, indicates an abstract service type rather than a
   geographically explicit destination label.  The routing scheme
   based upon CSI is therefore a two-part, two-layer process: CSI only
   indicates the routing intention, namely the user's requested
   computing service type, and does not itself materialize in the
   forwarding plane; the explicit routing destination is determined by
   LCRS and GCRS.  The actual routing thus falls within the
   traditional routing scheme, which remains intact.

   Apart from indicating the computing service routing intention, CSI
   could also indicate specific network service requirements by being
   associated with a networking service policy in GCRS, which would
   then schedule network resources such as an SR tunnel, guaranteed
   bandwidth, etc. at the egress.

   Therefore, GCRS and LCRS in the control plane, along with CSI
   encapsulation in the user plane, enable a logical computing routing
   sub-layer that is aware of the computing status of the cloud sites
   and forwards the service flow in terms of computing services as
   well as computing resources.  Nevertheless, this logical sub-layer
   is only relevant at the ingress and egress nodes and is simply
   about computing node selection rather than executing the real
   forwarding and routing actions.
3.5.  Traffic affinity

   Since CSI carries only the semantics of a service type, which may
   be deployed as multiple instances within a specific cloud site or
   across multiple cloud sites, a CSI in the destination field is not
   explicit enough for all packets of a service flow to be forwarded
   to a specific destination.  Traffic affinity therefore has to be
   guaranteed at both GCR and LCR.  Once the egress is determined at
   the GCR, the binding between the egress and the service flow's
   unique identification (5-tuple or other specifically designed
   labels) is maintained, and the subsequent packets of the flow can
   be forwarded upon this binding table.  Likewise, the LCR maintains
   the binding between the service flow identification and the
   selected service instance.

   Traffic affinity could also be guaranteed by mechanisms beyond the
   routing layer, but these are not in the scope of this proposal.
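   The binding behavior can be sketched as a small table keyed by the
   flow identification; the same structure serves at the GCR (flow to
   egress) and at the LCR (flow to instance).  This is an illustrative
   sketch only, not a defined mechanism:

```python
class AffinityTable:
    # Sticky binding from a flow's unique identification (e.g. its
    # 5-tuple) to the destination selected for its first packet, so
    # that all subsequent packets of the flow follow the same path.
    def __init__(self, select):
        self._select = select    # callback choosing a destination once
        self._bindings = {}      # flow identification -> destination

    def destination(self, flow_id):
        if flow_id not in self._bindings:
            self._bindings[flow_id] = self._select(flow_id)
        return self._bindings[flow_id]
```

   At the GCR the select callback would consult GCRS; at the LCR it
   would consult LCRS.  Either way, selection runs once per flow, not
   once per packet.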
4.  Computing delivery in routing network architecture workflow

4.1.  Computing resource and service update workflow

   The full range of computing resource and service status from a
   specific cloud site is registered at the LCR, which maintains the
   LCRS itself and notifies the GCRS part to remote GCRs, where the
   GCRS is then maintained and updated.  As illustrated in Figure 3,
   the GCRS in R1 for site 1 and site 2 is updated by R2 and R3
   respectively, while the LCRS of site 1 in R2 is updated by S1 and
   the LCRS of site 2 in R3 is updated by S2.  The GCRS in R2 and R3
   is updated by each other.  The edge routers associated with local
   cloud sites establish a mesh fabric to propagate the corresponding
   GCRS across the whole network domain; the computing resources and
   services in the distributed cloud sites are thus connected and can
   be utilized by applications as a single pool rather than as
   isolated islands.
                                    +--------+        +--------+
        +---------------------------|LCR/GCR |<-------|  ICS   |
        |                           +--------+        +--------+
   +----V---+     +--------+          A  R2               S1
   |  GCR   |     |  CUR   |          |
   +----A---+     +--------+          V  R3               S2
   R1   |             P             +--------+        +--------+
        +---------------------------|LCR/GCR |<-------|  ICS   |
                                    +--------+        +--------+
   |<--------- GCRS update domain ----------->|<-----LCRS------>|
                                                     domain

                               Figure 3
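   The workflow at an LCR can be sketched as: absorb each fine-grained
   instance update into the local LCRS, then derive the coarse GCRS
   summary to advertise over the mesh.  All field names here are
   assumptions of this sketch:

```python
def on_instance_update(lcr, update):
    # 'lcr' holds this edge router's state; 'update' is one status
    # report from an instance in the attached cloud site.
    lcr["lcrs"][update["instance"]] = update          # fine granularity
    instances = lcr["lcrs"].values()
    # Coarse summary: service types plus overall occupation, i.e. the
    # GCRS contribution this site advertises to remote GCRs.
    summary = {
        "site": lcr["site"],
        "services": {i["csi"] for i in instances},
        "overall_occupation":
            sum(i["load"] for i in instances) / len(lcr["lcrs"]),
    }
    return summary    # to be flooded to remote GCRs over the mesh
```

   Note that the high-frequency update terminates at the LCR; only the
   slowly changing summary leaves the site, which is what keeps the
   GCRS update domain stable.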
4.2.  Service flow routing and forwarding workflow

   From the perspective of the service flow, most details have already
   been demonstrated in 3.2 and 3.3.  Unlike the traditional
   destination-oriented routing mechanism and segment routing, in
   which the ingress router is explicitly aware of a specific
   destination, in the computing delivery in routing network
   architecture the CSI, an abstract label without the semantics of a
   physical address, works as the destination required by the user.
   Therefore the service flow has to be routed and forwarded segment
   by segment, with the two segment destinations determined by GCRS
   and LCRS respectively.
5.  Control plane

5.1.  Centralized control plane

   The volatility of LCRS makes it infeasible to maintain and control
   in a centralized entity; hence, in a centralized control plane for
   the computing delivery in routing network architecture, GCRS is the
   chief computing resource and service status information to be
   collected and managed in the controller.  Routing and forwarding
   policies calculated from GCRS in the centralized controller, as
   demonstrated in 3.2, apply only to the segment from ingress to
   egress, while the second-segment routing policy, from egress to the
   selected service instance in the cloud site, is determined by the
   LCRS at the egress.

   A hierarchically centralized control plane architecture would be
   strongly recommended under the circumstances of nationwide network
   and cloud management.
5.2.  Distributed control plane

   GCRS is updated among the edge routers, which are connected in a
   mesh such that each pair of edge routers can exchange GCRS with
   each other, while LCRS is unidirectionally updated from the cloud
   site to the associated edge router, where the LCRS is maintained
   and its update process terminates.

   The protocols over which GCRS and LCRS are updated are out of the
   scope of this proposal and will be illustrated in another draft.
5.3.  Hybrid control plane

   In a limited network and cloud domain, it should be more efficient
   to update the GCRS in a distributed way than in a centralized way
   in terms of routing request and response, but the opposite holds in
   a nationwide circumstance.  A hybrid control plane could therefore
   be deployed in such a scheme that overall optimization is achieved.
6.  Data plane

6.1.  CSI encapsulation

   The computing service identification is the predominant index
   across the entire computing delivery in routing network
   architecture, under which a new virtual routing sub-layer is
   employed with CSI working as the virtual destination.  The data
   plane determines the routing and forwarding direction for a CSI by
   inquiring GCRS and LCRS at the GCR and LCR respectively.  CSI
   encapsulation could be achieved either by extending an existing
   packet header or by designing a dedicated shim layer; both options,
   along with the specific structure of CSI, are out of the scope of
   this proposal and will be illustrated in another draft.
6.2.  CSI for GCR, CUR and LCR

   The GCR encapsulates the CSI in a designated header format, acting
   as a proxy by translating the user-originated CSI format, makes the
   first-segment routing policy, and starts routing and forwarding the
   service traffic.  The CUR ignores the CSI and simply forwards the
   traffic as usual.  The LCR decapsulates the CSI, makes the second-
   segment routing policy, and completes the last-hop routing and
   forwarding.
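   The three per-node behaviors can be sketched together.  The dict-
   based "packet" and its "csi" field stand in for the real
   encapsulation, whose format this document defers to another draft;
   all names here are assumptions of the sketch:

```python
def process(role, packet, gcrs=None, lcrs=None):
    # GCR: translate the user-originated CSI into the (assumed) transit
    # encapsulation and make the first-segment routing decision.
    if role == "GCR":
        packet["csi"] = packet.pop("user_csi")
        packet["next_hop"] = gcrs[packet["csi"]]      # egress from GCRS
    # CUR: ignore the CSI entirely; forward as an ordinary router.
    elif role == "CUR":
        pass
    # LCR: decapsulate the CSI and make the second-segment (last-hop)
    # routing decision towards a service instance.
    elif role == "LCR":
        packet["next_hop"] = lcrs[packet.pop("csi")]  # instance from LCRS
    return packet
```

   The CUR branch being a no-op reflects the design intent that
   computing-unaware nodes require no change at all.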
7.  Summary

   Connecting and coordinating the distributed computing resources and
   services, and even more so further converging networking and
   computing resources, would significantly benefit the industry.
   Uncertainty about the potential impacts on the existing network
   architecture is the main reason for the community to think twice.
   By dividing the end-to-end routing and forwarding path into two
   segments, the impacts of computing status and metrics are reduced
   to a degree at which they become as acceptable as networking status
   and metrics.  In particular, the employment of CSI in the computing
   delivery in routing network architecture enables a new service
   routing possibility that is compatible with the existing routing
   architecture.
8.  Acknowledgements

   To be added upon contributions, comments and suggestions.

9.  IANA Considerations

   This memo includes no request to IANA.
10.  Security Considerations

   Since GCRS and LCRS originate from third parties (the cloud sites)
   and would be frequently updated in the network domain, both
   security threats against the routing mechanisms and the credibility
   and security issues of the computing services should be taken into
   account in the architecture design.  Detailed analysis as well as
   solution considerations will be proposed in an updated version of
   this draft.
11.  Informative References

   [I-D.li-dyncast-architecture]
              Li, Y., "Dynamic-Anycast Architecture", Work in
              Progress, February 2021.

   [I-D.liu-dyncast-ps-usecases]
              Liu, P., "Dynamic-Anycast (Dyncast) Use Cases and
              Problem Statement", Work in Progress, February 2021.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.
Authors' Addresses

   Daniel Huang
   ZTE Corporation
   Nanjing

   Phone: +86 13770311052
   Email: huang.guangping@zte.com.cn

   Bin Tan
   ZTE Corporation
   Nanjing

   Phone: +86 13918622159
   Email: tan.bin@zte.com.cn

   Peng Liu
   China Mobile
   Beijing

   Phone: +86 13810146105
   Email: liupengyjy@chinamobile.com