2 ALTO J. Ros-Giralt 3 Internet-Draft S. Yellamraju 4 Intended status: Informational Qualcomm 5 Expires: 23 September 2022 Q. Wu 6 Huawei 7 L.M. Contreras 8 Telefonica 9 R. Yang 10 Yale University 11 K. Gao 12 Sichuan University 13 22 March 2022 15 Supporting Bottleneck Structure Graphs in ALTO: Use Cases and 16 Requirements 17 draft-giraltyellamraju-alto-bsg-requirements-01 19 Abstract 21 This document proposes an extension to the base Application-Layer 22 Traffic Optimization (ALTO) protocol to support bottleneck structures 23 as an efficient representation of the state of a network.
Bottleneck 24 structures are efficient computational graphs that allow network 25 operators and application service providers to optimize application 26 performance in a variety of communication problems including routing, 27 flow control, flow scheduling, bandwidth prediction, and network 28 slicing, among others. This document introduces a new abstraction 29 called Bottleneck Structure Graph (BSG) and the necessary 30 requirements to integrate it into the ALTO standard. 32 About This Document 34 This note is to be removed before publishing as an RFC. 36 The latest revision of this draft can be found at 37 https://giralt.github.io/draft-ietf-alto-gradient-graph/draft- 38 giraltyellamraju-alto-bsg-requirements.html. Status information for 39 this document may be found at https://datatracker.ietf.org/doc/draft- 40 giraltyellamraju-alto-bsg-requirements/. 42 Discussion of this document takes place on the ALTO Working Group 43 mailing list (mailto:alto@ietf.org), which is archived at 44 https://mailarchive.ietf.org/arch/browse/alto/. 46 Source for this draft and an issue tracker can be found at 47 https://github.com/giralt/draft-ietf-alto-gradient-graph. 49 Status of This Memo 51 This Internet-Draft is submitted in full conformance with the 52 provisions of BCP 78 and BCP 79. 54 Internet-Drafts are working documents of the Internet Engineering 55 Task Force (IETF). Note that other groups may also distribute 56 working documents as Internet-Drafts. The list of current Internet- 57 Drafts is at https://datatracker.ietf.org/drafts/current/. 59 Internet-Drafts are draft documents valid for a maximum of six months 60 and may be updated, replaced, or obsoleted by other documents at any 61 time. It is inappropriate to use Internet-Drafts as reference 62 material or to cite them other than as "work in progress." 64 This Internet-Draft will expire on 23 September 2022. 66 Copyright Notice 68 Copyright (c) 2022 IETF Trust and the persons identified as the 69 document authors.
All rights reserved. 71 This document is subject to BCP 78 and the IETF Trust's Legal 72 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 73 license-info) in effect on the date of publication of this document. 74 Please review these documents carefully, as they describe your rights 75 and restrictions with respect to this document. Code Components 76 extracted from this document must include Revised BSD License text as 77 described in Section 4.e of the Trust Legal Provisions and are 78 provided without warranty as described in the Revised BSD License. 80 Table of Contents 82 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 83 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 84 3. Brief Introduction to Bottleneck Structures . . . . . . . . . 4 85 3.1. Example of Bottleneck Structure . . . . . . . . . . . . . 4 86 3.2. Propagation Properties . . . . . . . . . . . . . . . . . 7 87 3.3. Forces of Interaction among Flows and Links . . . . . . . 7 88 3.4. Ripple Effects in a Communication Network . . . . . . . . 8 89 3.5. Not all Bottleneck Links Are Born Equal . . . . . . . . . 8 90 3.6. Quantifying the Ripple Effects . . . . . . . . . . . . . 9 91 3.7. Types of Bottleneck Structures . . . . . . . . . . . . . 11 92 3.8. Computing Optimized Network Reconfigurations . . . . . . 12 93 3.9. Types of Network Reconfigurations . . . . . . . . . . . . 12 94 4. ALTO Bottleneck Structure Service Use Cases . . . . . . . . . 15 95 4.1. Application Rate Limiting for CDN and Edge Cloud 96 Applications . . . . . . . . . . . . . . . . . . . . . . 15 98 4.2. Time-bound Constrained Flow Acceleration for Large Data Set 99 Transfers . . . . . . . . . . . . . . . . . . . . . . . . 15 100 4.3. Application Performance Optimization Through AI 101 Modeling . . . . . . . . . . . . . . . . . . . . . . . . 18 102 4.4. Optimized Joint Routing and Congestion Control . . . . . 19 103 4.5. Service Placement for Edge Computing . . . . . . . . . . 
20 104 4.6. Training Neural Networks and AI Inference for Edge Clouds, 105 Data Centers and Planet-Scale Networks . . . . . . . . . 21 106 5. Example: Application Layer Traffic Optimization using 107 Bottleneck Structures . . . . . . . . . . . . . . . . . . 23 108 6. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 28 109 6.1. Requirement 1: Bottleneck Structure Graph (BSG) 110 Abstraction . . . . . . . . . . . . . . . . . . . . . . . 28 111 6.2. Requirement 2: Information Received from the Network . . 29 112 6.3. Requirement 3: Information Passed to the Application . . 30 113 6.4. Requirement 4: Features Needed to Support the Use 114 Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 30 115 7. Security Considerations . . . . . . . . . . . . . . . . . . . 31 116 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32 117 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 33 118 9.1. Normative References . . . . . . . . . . . . . . . . . . 33 119 9.2. Informative References . . . . . . . . . . . . . . . . . 33 120 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 122 1. Introduction 124 Bottleneck structures have been recently introduced in [G2-SIGCOMM] 125 and [G2-SIGMETRICS] as efficient computational graphs that embed 126 information about the topology, routing and flow information of a 127 network. These computational graphs allow network operators and 128 application service providers to compute network derivatives that can 129 be used to make traffic optimization decisions. For instance, using 130 the bottleneck structure of a network, a real-time communication 131 (RTC) application can efficiently infer the multi-hop end-to-end 132 available bandwidth, and use that information to tune the encoder's 133 transmission rate and optimize the user's Quality of Experience 134 (QoE). 
Bottleneck structures can be used by the application to 135 address a wide variety of communication optimization problems, 136 including routing, flow control, flow scheduling, bandwidth 137 prediction, and network slicing, among others. 139 This document introduces a new abstraction called Bottleneck 140 Structure Graph (BSG) and the necessary requirements to integrate it 141 into the existing ALTO services (Network Map, Cost Map, Entity 142 Property Map and Endpoint Cost Map), exposing the properties of the 143 bottleneck structure to help optimize application performance. Use 144 cases are also introduced to motivate the relevance of bottleneck 145 structures in the context of the ALTO standard and to support the 146 description of the integration requirements. 148 2. Conventions and Definitions 150 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 151 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 152 "OPTIONAL" in this document are to be interpreted as described in 153 BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all 154 capitals, as shown here. 156 3. Brief Introduction to Bottleneck Structures 158 [G2-SIGMETRICS] and [G2-SIGCOMM] introduce a new mathematical 159 framework to optimize network performance called the Quantitative 160 Theory of Bottleneck Structures (QTBS). The core building block of 161 QTBS is a computational graph called _bottleneck structure_, which 162 allows one to qualify and quantify the forces of interaction that flows 163 and bottleneck links exert on each other. QTBS builds the bottleneck 164 structure by assuming that flows in a network aim at maximizing 165 throughput while ensuring fairness. This general principle holds for 166 all the relevant congestion control algorithms used in IP networks 167 (e.g., TCP Cubic, BBR, QUIC, etc.). 169 3.1.
Example of Bottleneck Structure 171 Consider as an example the following network configuration: 173 f3 f6 f1 174 | | | 175 | | | 176 +--+-+-+---+ 177 | | | | | 178 | | | +---+--- l1 179 | | | | c1=25 180 | | | | 181 +--+-+-----+ 182 | | 183 | | +----- f2 184 | | | 185 +--+-+-+---+ 186 | | | | | 187 ----+--+ | | | l2 188 | | | | c2=50 189 f4 ----+--+ | | | 190 | | | | 191 +--+-+-+---+ 192 | | | 193 | | +----- 194 | | 195 +--+-+-----+ 196 | | | | 197 ----+--+ +-----+---- l3 198 | | c3=100 199 ----+----+ | 200 | | | 201 +----+-----+ 202 | 203 | 204 +----+-----+ 205 | | | l4 206 f5 ---+----+ | c4=75 207 | | 208 | | 209 +----------+ 211 Figure 1: Network configuration example. 213 Each link li is represented by a squared box and has a capacity ci. 214 For instance, link l1 is represented by the top most squared box and 215 has a capacity of c1=25 units of bandwidth. In addition, each flow 216 is represented by a line that passes through the set of links it 217 traverses. For instance, flow f6 traverses links l1, l2 and l3. 219 The bottleneck structure of this network corresponds to the following 220 digraph (see [G2-TREP] for details on how a bottleneck structure is 221 computed): 223 +------+ +------+ +------+ 224 | | | | | | 225 | f1 <--> l1 <---------------> f6 | 226 | | | | +------------+ | 227 +------+ +--^---+ | +---+--+ 228 | | | 229 | | | 230 +--v---+ | | 231 | | | | 232 | f3 | | | 233 | | | | 234 +--+---+ | | 235 | | | 236 | | | 237 +--v---+ | +------+ +--v---+ +------+ 238 | <--+ | | | | | | 239 | l2 <---->| f4 +---> l3 | | l4 | 240 | | | | | | | | 241 +--^---+ +------+ +--^---+ +---^--+ 242 | | | 243 +--v---+ | +------+ | 244 | | | | | | 245 | f2 | +-> f5 <-+ 246 | | | | 247 +------+ +------+ 249 Figure 2: Bottleneck structure of the network in Figure 1. 251 The bottleneck structure is interpreted as follows: 253 * Links and flows are represented by vertices in the graph. 
255 * There is a directed edge from a link l to a flow f if and only if 256 flow f is bottlenecked at link l. 258 * There is a directed edge from a flow f to a link l if and only if 259 flow f traverses link l. 261 For instance, in Figure 2 we have that flow f3 is bottlenecked at 262 link l1 (since there is a directed edge from l1 to f3) and it 263 traverses links l1 and l2 (since there is a directed edge from f3 to 264 l1 and from f3 to l2). Note that when a flow is bottlenecked at a 265 link, the edge connecting them in the bottleneck structure must 266 necessarily be bidirectional. This is because a flow that is 267 bottlenecked at a link must necessarily traverse that link too. 268 Indeed, in Figure 2 we can see that all the directed edges from a 269 link to a flow are in fact bidirectional edges. This is important 270 to ensure that bottleneck structures correctly model how 271 perturbations in a network propagate, as we explain in the next 272 section. 274 3.2. Propagation Properties 276 Under the assumption of max-min fairness [GALLAGER], QTBS 277 demonstrates the following two properties [G2-SIGMETRICS]: 279 * *Property 1. Flow perturbation*. An infinitesimal change in the 280 transmission rate of a flow f will have an effect on the 281 transmission rate of a flow f' if and only if the bottleneck 282 structure has a directed path from flow f to flow f'. 284 * *Property 2. Link perturbation*. An infinitesimal change in the 285 capacity of a link l will have an effect on the transmission rate 286 of a flow f' if and only if the bottleneck structure has a 287 directed path from link l to flow f'. 289 The above two properties qualitatively relate to the classic question 290 in chaos theory: Can the flap of a butterfly's wings in Brazil set 291 off a tornado in Texas?
[LORENZ] Obviously, a butterfly alone cannot 292 create a tornado, but every element is interconnected in a 293 distributed system, and even the flap of a butterfly's wings in 294 Brazil will have an effect in Texas. Bottleneck structures are 295 graphs that characterize and quantify such effects in a 296 communication network. In particular, a bottleneck structure reveals 297 how a perturbation propagates through the network, describing which 298 flows will be affected and by what magnitude. 300 3.3. Forces of Interaction among Flows and Links 302 Bottleneck structures are powerful computational graphs because they 303 are able to capture the forces of interaction that flows and 304 bottleneck links exert on each other. These forces of interaction 305 are in general non-intuitive, even for a small, simple network 306 configuration like the one in Figure 1. For instance, from Property 307 2, the bottleneck structure reveals that a small variation in the 308 capacity of link l2 (e.g., in a wireless network, a variation in the 309 capacity of a link could be due to a change in the signal-to-noise 310 ratio of the communication channel) will propagate through the 311 network and have an impact on the transmission rate of flows f2, f4 312 and f5 (since from Property 2, the bottleneck structure has a 313 directed path from link l2 to each of these flows). However, such a 314 perturbation will have no effect on the transmission rate of flows 315 f1, f3 and f6 (since there is no path from l2 to any of these other 316 flows). Similarly, from Property 1, a small perturbation on the rate 317 of flow f4 (e.g., this could be due to the effect of a traffic shaper 318 altering the transmission rate of flow f4) will have an impact on 319 the rate of flows f2 and f5, but it will have no effect on the rate 320 of flows f1, f3 and f6. 322 3.4.
Ripple Effects in a Communication Network 324 As another example, given the network in Figure 1, it is also not 325 intuitive to foresee that flows f1 and f5 are related to each other 326 by the forces of interaction inherent to the communication network, 327 even though they do not traverse any common link. Specifically, flow 328 f1 traverses link l1, while flow f5 traverses links l3 and l4. In 329 between both flows, there is an additional hop (link l2) further 330 separating them. Despite not being directly connected, the 331 bottleneck structure reveals that a small perturbation on the 332 performance of flow f1 (i.e., a change in its transmission rate) 333 will create a ripple effect that will reach flow f5, affecting its 334 transmission rate. In particular, the perturbation on flow f1 will 335 propagate through the bottleneck structure and reach flow f5 via the 336 following two paths: 338 f1 -> l1 -> f3 -> l2 -> f4 -> l3 -> f5 340 f1 -> l1 -> f6 -> l3 -> f5 342 It is also not intuitive to see that the reverse is not true. That 343 is, a small perturbation on flow f5 will have no effect on flow f1, 344 since the bottleneck structure has no directed path from vertex f5 to 345 vertex f1. In [G2-SIGMETRICS], extensive empirical validation of 346 these results is presented for a variety of congestion-controlled IP 347 networks. 349 3.5. Not all Bottleneck Links Are Born Equal 351 Bottleneck structures also reveal that not all bottleneck links have 352 the same relevance. In Figure 2, links at the top of the graph have 353 a higher impact on the overall performance of the network than links 354 at the bottom. For instance, consider link l1. A variation in its 355 capacity will create a ripple effect that will impact the performance 356 of all the flows in the network, since all flow vertices are 357 reachable from vertex l1 according to the bottleneck structure.
In 358 contrast, link l3 has a much smaller impact on the overall 359 performance of the network, since a variation of its capacity will 360 affect flow f5, but have no impact on any of the other flows. This 361 information is valuable to network operators and application service 362 providers as it can be used to make informed network optimization 363 decisions. For instance, in edge computing, an operator could choose 364 to place a containerized service (e.g., for extended reality, XR) on 365 compute nodes that would yield communication paths traversing 366 bottleneck links with lower impact on the overall performance of the 367 network (See the use case in Section 4.5 for more details). 368 Similarly, in network slicing (or capacity planning in general), 369 operators could choose to allocate more bandwidth on those links that 370 are more influential (i.e., those links that are at lower levels in 371 the bottleneck structure) according to the expected traffic pattern 372 in the network slice. 374 Overall, bottleneck structures provide a mechanism to rank bottleneck 375 links according to their impact on the overall performance of the 376 network. This information can be used in a variety of optimization 377 problems, such as traffic engineering, routing, capacity planning, or 378 resilience analysis, among others. 380 3.6. Quantifying the Ripple Effects 382 Bottleneck structures not only allow network operators to reason 383 about the qualitative nature of the forces that flows and links exert 384 on each other, but they also provide a mathematical framework to 385 quantify such forces. In particular, the Quantitative Theory of 386 Bottleneck Structures (QTBS) introduced in [G2-TREP] provides a 387 mathematical framework that uses bottleneck structures as efficient 388 computational graphs to quantify the impact that perturbations in a 389 network have on all of its flows. 
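Before turning to the quantitative machinery, the qualitative propagation arguments of Sections 3.2 to 3.5 can be checked mechanically. The following fragment is an illustrative sketch only and is not part of any ALTO protocol message; the names EDGES and reachable_flows are ours. It encodes the edge list transcribed from Figure 2 and uses breadth-first reachability to reproduce the claims made above about links l1, l2, l3 and flow f4.

```python
from collections import deque

# Edge list transcribed from Figure 2.  A link->flow edge means the flow
# is bottlenecked at the link (always paired with the reverse traversal
# edge); a flow->link edge means the flow traverses the link.
EDGES = [
    ("l1", "f1"), ("f1", "l1"),
    ("l1", "f3"), ("f3", "l1"),
    ("l1", "f6"), ("f6", "l1"),
    ("f3", "l2"), ("f6", "l2"),
    ("l2", "f2"), ("f2", "l2"),
    ("l2", "f4"), ("f4", "l2"),
    ("f4", "l3"), ("f6", "l3"),
    ("l3", "f5"), ("f5", "l3"),
    ("l4", "f5"), ("f5", "l4"),
]

def reachable_flows(start):
    """Flow vertices reachable from `start` via a directed path; by
    Properties 1 and 2, these are the flows affected by perturbing
    `start`."""
    adj = {}
    for u, v in EDGES:
        adj.setdefault(u, []).append(v)
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return {v for v in seen - {start} if v.startswith("f")}

# Property 2: perturbing link l2 affects exactly flows f2, f4 and f5.
assert reachable_flows("l2") == {"f2", "f4", "f5"}
# Property 1: perturbing flow f4 affects f2 and f5 but not f1, f3, f6.
assert reachable_flows("f4") == {"f2", "f5"}
# Section 3.5: l1 influences every flow in the network.
assert len(reachable_flows("l1")) == 6
```

Sorting the links by the size of reachable_flows(l) yields the ranking described in Section 3.5, with l1 as the most influential bottleneck.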
391 One of the core building blocks of the QTBS framework is the concept 392 of _link and flow equations_, which mathematically characterize how a 393 perturbation in a network propagates through each of the link and 394 flow vertices in the bottleneck structure. (See [G2-TREP] for an 395 exact mathematical formulation.) Because quantifying the effect of a 396 perturbation on a system is nothing more than computing a derivative 397 of the system's performance with respect to the parameter that's been 398 perturbed, bottleneck structures can be used as efficient and 399 scalable computational graphs to calculate flow and link derivatives 400 in a communication network. 402 Consider for instance the computation of the following derivative: 404 dF()/dri 406 where F() represents the total throughput of the network (the sum of 407 all flows' throughput) and ri is the transmission rate of flow fi. 408 For example, this expression can be used to compute the effect of 409 traffic shaping a flow (slightly reducing its rate) on the total 410 throughput of the network. Computing this derivative using a 411 traditional calculus approach is both very complex and costly, since 412 it requires modeling the congestion control algorithm in the function 413 F(), for which there is no closed form solution. Using bottleneck 414 structures, however, the computation of this derivative is both 415 simple and inexpensive. It is simple because it can be done by 416 applying an infinitesimal change to the rate of flow fi and then 417 using the link and flow equations to measure how this perturbation 418 propagates through the bottleneck structure [G2-TREP], [G2-SIGCOMM]. 419 It is also very efficient because the computation is performed by 420 applying delta calculations on the bottleneck structure, without 421 involving links and flows that are not affected by the perturbation. 
422 For instance, in Figure 1, the computation of dF()/dr4 only requires 423 recomputing the transmission rates of flows f2 and f5, without the 424 need to recompute the rates of f1, f3 and f6, since these other flows 425 are not affected by the perturbation. In practice, QTBS provides a 426 methodology to compute network derivatives two or three orders of 427 magnitude faster than general-purpose methods such as linear 428 programming [G2-SC]. 430 We finish this brief introduction to QTBS by stating the monotonic 431 bandwidth allocation property that all bottleneck structures satisfy: 433 * *Property 3. Monotonic bandwidth allocation (MBA).* Let si be the 434 transmission rate of the flows bottlenecked at link li. Then, for 435 any path in the bottleneck structure of the form 437 l1 -> f1 -> l2 -> f2 -> (...) -> ln -> fn 439 we have that 441 s1 < s2 < (...) < sn 443 The MBA property is relevant in that it states that bottlenecks 444 located at higher levels in the bottleneck structure will have more 445 bandwidth available than those located at lower levels. For 446 instance, this property indicates that an application requiring high 447 bandwidth should route its traffic through paths that involve links 448 at higher levels in the bottleneck structure. We will be using the 449 MBA property to reason about application performance in some of the 450 examples described in this document. 452 3.7. Types of Bottleneck Structures 454 While QTBS introduces a core definition of bottleneck structure (see 455 Section 3.1), there exist multiple types of bottleneck structures 456 that can be computed depending on the level of granularity and 457 information desired by the operator. Next, we introduce three types 458 of bottleneck structures that will be used in this document and that 459 are suitable to optimize application performance in the context of 460 the ALTO standard: 462 * *Flow gradient graph (FGG)*.
This type of bottleneck structure 463 corresponds to the base definition introduced in Section 3.1. The 464 FGG has the finest level of granularity, including a vertex in 465 the graph for each link and flow in the network. Therefore, an 466 FGG can be relatively large (e.g., with millions of vertices). 468 * *Path gradient graph (PGG)*. One technique to reduce the size of 469 the bottleneck structure without affecting its accuracy is to 470 collapse all the vertices of the flows that follow the same path 471 into a single vertex called a _path vertex_. The resulting 472 bottleneck structure is called the path gradient graph (PGG). A 473 PGG usually has 2 or 3 orders of magnitude fewer vertices than the 474 FGG. 476 * *QoS-Path gradient graph (Q-PGG)*. Some networks assign different 477 types of traffic to different QoS classes. A Q-PGG can model QoS 478 by collapsing all the vertices of the flows that follow the same 479 path and have the same QoS class into a single vertex called a _Q- 480 path vertex_. A Q-PGG is slightly larger than a PGG (with 481 about |Q| times more vertices, where |Q| is the number of QoS 482 classes supported by the network) but still significantly smaller 483 than the FGG. 485 For most applications, it is recommended to use a PGG, or a 486 Q-PGG if the network supports QoS classes, since these bottleneck 487 structures are significantly smaller and faster to process than an 488 FGG, and it is often the case that the operator does not need to know 489 flow-level information in order to make proper application 490 performance optimization decisions. Note also that the PGG and the 491 Q-PGG provide the additional security advantage of hiding flow-level 492 information from the graph. This can be important to operators that 493 are sensitive about security and privacy. 495 3.8.
Computing Optimized Network Reconfigurations 497 A central element of the theory of bottleneck structures is the 498 ability to efficiently compute derivatives on a network. Derivatives 499 are a core building block of the optimization framework, as they 500 reveal the directions (gradients) in the feasible set that can help 501 bring the network to a higher level of performance. In this 502 document, we will refer to these directions in the feasible set as 503 _network reconfigurations_, since that's what they effectively are in 504 the physical world. 506 For instance, an example of network reconfiguration can be the action 507 of rate limiting a flow carrying XR traffic to match the available 508 bandwidth along its path with the goal of improving its QoE. Another 509 example of network reconfiguration is the action of rerouting traffic 510 through a new path in order to accelerate the transfer of a large 511 backup data set between two cloud data centers. A third example can 512 be the deployment of a new network slice in a 5G network in order to 513 ensure the QoS of a V2X service. In each of these actions, the 514 network configuration is moved along a direction (a gradient, if the 515 change maximally improves the performance objective) within the 516 feasible set of possible configurations. 518 While derivatives describe how the performance of a network changes 519 when a very small (infinitesimal) change is applied to its 520 configuration, network reconfigurations can accept changes to the 521 network that are arbitrarily large. For instance, traffic shaping a 522 set of flows to reduce their rates by 10 Mbps is a network 523 reconfiguration that is not infinitesimal. We note that bottleneck 524 structures can also be used to compute optimized network 525 reconfigurations consisting of non-infinitesimal changes in the 526 network.
This can be done by first computing derivatives using the 527 bottleneck structure to find a direction (gradient) in the feasible 528 set, and then reconfiguring the network by following that direction. 529 This process can be repeated iteratively until a final optimized 530 reconfiguration is achieved. (See [G2-SIGCOMM] and 531 [G2-TREP] for examples of algorithms using this technique.) 533 In the next section, we summarize some of the network 534 reconfigurations that can be optimized by using bottleneck 535 structures. 537 3.9. Types of Network Reconfigurations 539 The following is a list of some of the network reconfigurations that 540 can be efficiently computed and optimized using bottleneck 541 structures: 543 * *Flow routing*. Both routing a new flow and 544 rerouting an existing flow on a network can be modeled as a 545 perturbation, whose impact can be efficiently measured using 546 bottleneck structures. In particular, QTBS can be used to resolve 547 the joint congestion control and routing optimization problem for 548 individual flows (see Section 3.1 in [G2-TREP]). 550 * *Traffic shaping*. Traffic shaping a flow corresponds to the 551 action of taking a derivative with respect to the rate of the 552 flow. Bottleneck structures can be used by network operators and 553 application service providers to compute such perturbations. For 554 instance, to accelerate a large-scale data transfer, an 555 application can use bottleneck structures to identify optimal 556 traffic shaping configurations (see Section 3.3 in [G2-TREP]). 558 * *Bandwidth enforcement*. In high-performance networks that target 559 close to 100% link utilization such as Google's B4 network 560 [B4-SIGCOMM], a centralized SDN controller is used to collect the 561 state of the network and compute an optimized multipath bandwidth 562 allocation vector.
The solution is then deployed at the edge of 563 the network using a technique known as bandwidth enforcement 564 [BE-SIGCOMM]. By using bottleneck structures to efficiently 565 compute changes in the bandwidth allocated to each flow path, 566 operators can derive improved bandwidth allocation 567 vectors. 569 * *Flow scheduling*. When a flow starts transmitting data on a 570 network, it uses bandwidth along its path, creating a ripple 571 effect that impacts the performance of other flows in the network. 572 Similarly, the termination of a flow frees bandwidth along its 573 path, creating another perturbation that propagates through the 574 network. Bottleneck structures can efficiently model and compute 575 the effect of flow arrival and departure in a communication 576 network by using simple delta calculations according to the link 577 and flow equations (see Section 3.6 and [G2-SIGCOMM]). This 578 information can be used by applications that need to perform bulk 579 data transfer to decide when to schedule a flow. In 580 particular, it can be used to enhance the ALTO Cost Calendar 581 service [RFC8896]. 583 * *Service placement*. Deploying application services in a network 584 requires deciding the location of the compute and storage 585 resources needed to run the service. For instance, in edge 586 computing, an extended reality (XR) server could be deployed at 587 the distributed unit (DU), the central unit (CU), the mobile core 588 (MC) or the central cloud [PETERSON]. Bottleneck structures can 589 be used to measure the effect of placing a service on each of the 590 candidate locations, helping the application service provider to 591 make optimized decisions. 593 * *Multi-job scheduling*. Running a job on a network implies that a 594 number of flows will be initiated and terminated throughout the 595 execution of the job. The ripple effects generated from the 596 execution of a job can also be measured using bottleneck 597 structures.
This can be used to decide when to optimally launch 598 one or more jobs. For instance, in a data center, bottleneck 599 structure analysis can help the application decide how to 600 optimally schedule multiple AI training or inference jobs that are 601 sharing the same interconnect [G2-SIGCOMM]. 603 * *Link capacity upgrades*. In capacity planning, operators often 604 have a fixed budget and need to decide how to optimally add 605 capacity to a network in order to maximize its performance. The 606 effect of a link upgrade operation can be computed as a derivative 607 with respect to a change (an increase) in the capacity of a link. 608 Through the processing of historical flow information from the 609 network (e.g., NetFlow logs), bottleneck structures can 610 efficiently compute the effect of each link upgrade and identify 611 those that yield maximal performance. 613 * *Path shortcuts*. Operators in wide area networks need to decide 614 whether a communication path should be set up as purely optical 615 (bypassing layer 3 routing) or undergo an optical-to-electrical- 616 to-optical (OEO) conversion at certain routers in order to perform 617 layer 3 routing [SH-SIGCOMM]. The trade-off is one of cost- 618 efficiency versus better routing control of the network. 619 Bottleneck structures can be used to search for paths that are 620 optimally suitable for being offloaded to a purely optical path. 621 These are also known in the literature as path shortcuts 622 [SH-SIGCOMM]. 624 4. ALTO Bottleneck Structure Service Use Cases 626 Applications of bottleneck structure analysis span a broad 627 class of optimization problems that include traffic engineering, 628 routing, flow scheduling, resiliency analysis, network slicing, 629 service level agreement (SLA) management, network design and capacity 630 planning, to name only a few. In this section, we briefly describe 631 some of the use cases that relate to the objectives of the IETF ALTO 632 Standard. 634 4.1.
Application Rate Limiting for CDN and Edge Cloud Applications 636 In applications such as CDN, XR or gaming, it is important to 637 throttle the transmission rate of flows to match the true available 638 capacity along their communication path. Transmitting at a lower 639 rate than the available bandwidth leads to lower quality of 640 experience (QoE). Transmitting at a higher rate increases packet 641 losses, which wastes network resources and also leads to a lower QoE. 643 Estimating the available bandwidth for a flow is complex because it 644 depends on multiple factors including the network topology, the 645 routing configuration and the set of dynamic flows using the network 646 resources. Bottleneck structures capture in a single digraph these 647 three factors, creating a model that makes it possible to estimate 648 the performance of each flow. See for instance Sections 3.1 and 3.2 in 649 [G2-TREP] for examples on how bottleneck structures can be used to 650 estimate the available bandwidth of an application. 652 An ALTO server could help the application service provider obtain the 653 available bandwidth on a given path by exposing the bottleneck 654 structure of the network. With this information alone, the provider 655 could directly obtain the available bandwidth. Alternatively, the 656 application service could query the ALTO server by passing the path 657 for which the available bandwidth needs to be computed, and the ALTO 658 server could return this value without the need to share the complete 659 bottleneck structure. 661 4.2. Time-bound Constrained Flow Acceleration for Large Data Set 662 Transfers 664 Bulk data transfer is an important application for both commercial and 665 government-supported networks. For instance, Google's B4 network 666 supports large-scale data push synchronizing state across multiple 667 global data centers [B4-SIGCOMM].
Another common use case is found 668 in science networks, where massive data sets such as those originating 669 from the Large Hadron Collider at CERN, Switzerland, need to be 670 shared with scientific labs around the world. In this section, we 671 show how bottleneck structures can be used to reconfigure a network 672 to accelerate a given data transfer with the goal of meeting a 673 certain time constraint. 675 To illustrate this use case, we will assume the simple bottleneck 676 structure shown in Figure 3. 678 +------+ 679 | | 680 | l1 <-----------+ 681 | | | 682 +--^---+ | 683 | -1 | +1 684 | | 685 +--v---+ | 686 | | | 687 | f1 | | 688 | | | 689 +------+ | 690 | 691 +------+ +--v---+ 692 | | +1 | | 693 | l2 <--------+ f2 | 694 | | | | 695 +--^---+ +--+---+ 696 | -1 | +1 697 | | 698 +--v---+ +--v---+ 699 | | | | 700 | f3 | | l3 | 701 | | | | 702 +--+---+ +--^---+ 703 | -1 | -1 704 | | 705 +--v---+ +--v---+ 706 | | -1 | | 707 | l4 <--------+ f4 | 708 | | | | 709 +--^---+ +------+ 710 | +2 711 | 712 +--v---+ 713 | | 714 | f5 | 715 | | 716 +------+ 718 Figure 3: Reducing the rate of flow f1 maximally accelerates flow f5. 720 Suppose our goal is to accelerate flow f5. To achieve this 721 objective, we will also assume that we are allowed to traffic shape 722 (reduce) the rate of any of the other flows. Effectively, for each 723 flow fi different from f5, we need to compute the following 724 derivative 726 -dr5/d_ri 728 and then pick the maximum value. Note that in the above expression, 729 we take the left-derivative (d_), since a traffic shaper reduces the 730 rate of a flow. We also negate the derivative (-d), since we are 731 interested in a positive impact induced on flow f5 when reducing the 732 rate of another flow fi. 734 Such a calculation can be efficiently performed using the bottleneck 735 structure. As an example, Figure 3 illustrates how the value of (- 736 dr5/d_r1) is computed. First, we reduce the rate of flow f1 by 1 737 unit.
This perturbation propagates through the bottleneck structure 738 reaching flow f5 via two paths: 740 l1 -> f2 -> l2 -> f3 -> l4 -> f5 742 l1 -> f2 -> l3 -> f4 -> l4 -> f5 744 Following the link and flow equations (Section 3.6), the 745 perturbation simply flips its sign every time it traverses a link 746 vertex along each path. (The reason why the sign is flipped at each link vertex 747 is explained by the link and flow equations that dictate how 748 perturbations propagate through the bottleneck structure. Further 749 mathematical descriptions to explain this effect are outside the 750 scope of this document. For detailed mathematical derivations and 751 additional examples, please see [G2-TREP]). 753 When reaching vertex f5, we find that each path contributes 1 unit of 754 bandwidth. Thus we have: 756 -dr5/d_r1 = 1 + 1 = 2 758 In fact, it can be seen that this derivative is maximal. That is, 759 traffic shaping any other flow would yield a smaller increase in the 760 rate of f5. Thus, an operator can conclude that traffic shaping flow 761 f1 yields an optimal strategy to maximally accelerate the rate of 762 flow f5. Note also that in this case, there is a positive multiplier 763 effect, since reducing flow f1's rate by 1 unit leads to an increase 764 in flow f5's rate of more than 1 unit. This is known as a power 765 gradient [G2-SIGCOMM]. 767 Although outside the scope of this document, bottleneck structures 768 can also be used to efficiently compute the value of the optimal 769 traffic shaper (i.e., in our example, to find by how much we should 770 traffic shape flow f1) and to quantify the impact on the flow being 771 accelerated. This information can also be used by the application to 772 estimate the flow's completion time. 774 An ALTO server could help the application service provider identify 775 an optimized traffic shaping strategy by exposing the bottleneck 776 structure of the network.
With this information alone, the provider 777 could efficiently compute an optimized set of traffic shapers. 778 Alternatively, the application service could query the ALTO server by 779 passing the set of flows that are allowed to be traffic shaped and 780 the flow that needs to be accelerated, and the ALTO server could 781 return the set of recommended traffic shapers. 783 4.3. Application Performance Optimization Through AI Modeling 785 A relevant and emerging area in the field of application performance 786 is AI-based network modeling. Several global initiatives are being 787 undertaken to apply AI to the field of understanding and predicting 788 network performance. For instance, OpenNetLab [BE-ONL] provides a 789 distributed networking platform with many collaborative nodes 790 (universities and companies) and common benchmarking datasets for 791 researchers to collect real networking data and train their AI models 792 for various networking environments, including the Internet, cloud, 793 and wireless networks. There also exist global benchmarks and 794 challenges to foster innovation in this field, such as the ACM MMSys 795 Challenge [MMSYS], which focuses on novel AI-based bandwidth 796 estimation algorithms to enable superior overall QoE on a global 797 production testbed built for real-time communications (RTC) of video 798 and audio. 800 Modeling communication networks using purely AI frameworks such as 801 deep learning is challenging as it requires very large production 802 data sets that are often not available. They also require many 803 years of intense, global collaborative R&D. To address these 804 challenges, it has been observed that hybrid models built by combining AI 805 with a "physics" model of the network can outperform purely AI 806 models, as they may require less training data to achieve the same or 807 better performance.
For instance, the top two winners of the ACM 808 MMSys 2021 Bandwidth Prediction Challenge [MMSYS] were based on 809 hybrid models. 811 Because bottleneck structures provide a "physics" model of the 812 network that can both qualify and quantify the forces of interactions 813 among flows and links, they can be used in combination with AI to 814 enable better performance than purely AI-based models. For instance, 815 this area is being discussed in the IETF ALTO WG (e.g., [BE-ONL]) as 816 a potential use case in the ALTO Standard to help optimize the 817 performance of RTC applications. In particular, a key building block 818 to optimize the QoE performance of RTC applications is the bandwidth 819 estimation module. This module runs on the endpoint of a real-time 820 video application and aims at dynamically adapting the video bitrate 821 to stay within the available network capacity. A limitation in the 822 current algorithms, however, is their lack of network state 823 visibility. This requires the algorithms to rely entirely on local 824 indicators such as packet loss or latency, which leads to poor 825 training and inference performance. Information provided by the 826 bottleneck structure (which includes topological, routing and flow 827 information of the network in a single digraph) exposed via the ALTO 828 service could help unlock a richer set of machine learning algorithms 829 to optimize application performance. 831 An ALTO server could help the application service provider implement 832 AI-assisted prediction algorithms by exposing the bottleneck 833 structure of the network. Alternatively, ALTO could implement an AI- 834 assisted prediction module with the help of bottleneck structures. 835 The application would then query the ALTO server to obtain the 836 predicted value. 838 4.4. 
Optimized Joint Routing and Congestion Control 840 In traditional IP networks, the problems of flow routing and 841 congestion control are separately resolved by following a two-step 842 process: first, a routing protocol is used to determine the path 843 between any two nodes in a network; then, flows are routed according 844 to such paths and their transmission rates are regulated using a 845 congestion control algorithm. This layered and disjoint approach is 846 known to be scalable but suboptimal because the routing algorithm 847 identifies paths without taking into account the flow transmission 848 rates assigned by the congestion control algorithm. 850 Suppose that an application is trying to launch a new flow between 851 two endpoints with the goal to maximize the available bandwidth. One 852 can be tempted to think that, to identify the path with maximal 853 available bandwidth, it suffices to look at the current state of the 854 network and find the least congested path offering the highest 855 capacity. This approach, however, is naive since it does not take 856 into account the fact that the placement of the new flow onto the 857 network will itself create a perturbation in the network, potentially 858 making the chosen path suboptimal or, even more troublesome, 859 negatively affecting the performance of other priority flows. 861 The goal of the joint routing and congestion control problem between 862 two given endpoints E1 and E2 consists in finding the path from E1 to 863 E2 that will yield the highest throughput _after_ the flow is placed 864 on the network (i.e., taking into account the effect of placing the 865 flow). 867 The solution to this problem is introduced in [G2-TREP] by employing 868 a strategy that combines the strengths of both the Dijkstra algorithm 869 and the insights revealed by the bottleneck structure. 
The algorithm 870 can both compute the optimal path and measure the overall network- 871 wide impact of deploying the new flow on the path. It also enables a 872 framework to identify new good-performing paths that have a limited 873 negative impact on the rest of the flows in the network. This allows 874 network and application providers to identify paths that can 875 provide good performance to the newly added application flow while 876 preserving the performance of the existing high-priority flows. 878 An ALTO server could help the application service provider optimize 879 the path selection decision by exposing the bottleneck structure of 880 the network. With this information alone, the provider could 881 efficiently compute the optimal path (e.g., using the algorithm 882 introduced in [G2-TREP]). Alternatively, the application service 883 could query the ALTO server by passing the information of the two 884 endpoints that need to be connected, and the ALTO server could return 885 a list of the top-N paths with the highest throughput and their 886 expected performance. 888 4.5. Service Placement for Edge Computing 890 Determining the proper location to deploy an application service in 891 the edge cloud is critical to ensure a good quality of experience 892 (QoE) for its users. Yet the service placement problem is known to 893 be NP-Hard [JSP-INFOCOM], requiring heuristics to compute good 894 (albeit suboptimal) solutions. 896 In [G2-SIGCOMM], it is shown that bottleneck structures can also be 897 used as highly scalable network simulators to evaluate the 898 performance of a network reconfiguration such as the placement of a 899 new service on an edge cloud. In particular, bottleneck structures 900 can very efficiently (1) compute the performance of each flow in the 901 network and (2) quantify the effects of the arrival (departure) of 902 new (existing) flows to (from) the network.
This makes it possible to simulate 903 the full transmission of an application traffic pattern very 904 efficiently, three or more orders of magnitude faster than 905 traditional packet simulators. 907 Network and application providers can use this capability in two 908 ways: 910 * Given a set of possible placement strategies, bottleneck 911 structures can be used to simulate them in real time, helping the 912 operator select the one that provides the best performance while 913 guaranteeing the service level agreements (SLAs) of the other 914 existing applications. 916 * Despite the service placement problem being intractable, bottleneck 917 structures provide a framework to identify good candidate 918 solutions. In particular, by capturing the topology, routing, and 919 flow information in a single computational graph, they can be used 920 to efficiently explore directions in the feasible set that yield 921 incrementally better performance. By moving in these incremental 922 directions, the placement algorithm can progress within the 923 enormous feasible set towards the optimal solution. 925 An ALTO server could help the application service provider optimize 926 the placement decision by exposing the bottleneck structure of the 927 network. With this information alone, the provider could compute the 928 effect of placing the service in one location versus another. 929 Alternatively, the application service could query the ALTO server by 930 passing the information of the possible locations where it can be 931 placed, and the ALTO server could return an ordered list of the 932 locations and their expected performance. 934 4.6. Training Neural Networks and AI Inference for Edge Clouds, Data 935 Centers and Planet-Scale Networks 937 Neural network training and inference using distributed computing 938 systems are the subject of intense research and one of the leading 939 target applications in today's communication networks.
[TOPOOPT-MIT] 940 [FLEXFLOW-STFORD] [SINGULARITY-MSFT]. To illustrate this use case, 941 we will focus our discussion on three types of networks: edge clouds, 942 data centers and planet-scale networks. 944 5G and edge clouds make it possible, for the first time, to provide 945 intelligence at the edge of the network. This capability is 946 disruptive in that humans and machines will have access to 947 unprecedented compute power to perform AI inference in real time. 948 For instance, using augmented reality (AR), humans will be able to 949 make better-informed decisions as they navigate through an 950 environment by leveraging AI-inference on video and audio signals 951 captured in real time from their user equipments (UEs). Similarly, 952 machines such as vehicles or factory robots will be able to use AI 953 inference to optimize their actions. 955 Two resources are needed to perform inference: (1) Input data from 956 the environment (e.g., image and audio signals captured from a video 957 camera) and (2) compute (typically in the form of GPUs and CPUs). 959 The input data needs to be transmitted from the location where it is 960 captured (e.g., a micro-camera running on a human's glasses) to the 961 location where it is to be processed for inference. The transmission 962 of the input data requires communication resources, whereas the 963 inference process requires computing resources. Since computing 964 resources in the edge cloud (Figure 4) are distributed across the 965 user equipment (UE), the radio unit (RU), the distributed unit (DU) 966 and the central unit (CU) [PETERSON], the problem of efficiently 967 performing AI-inference is one of optimizing the communication-compute 968 trade-off as follows: compute (communication) power is 969 more scarce (abundant) if the inference is performed closer to the 970 UE, and more abundant (scarce) if performed closer to the CU.
For 971 instance, if an AR application running on a UE needs to perform an 972 inference task at a time when the communication path from the RU to 973 the DU is highly congested, then it will have an incentive to perform 974 such a task directly in the UE or in the RU. If instead the network 975 offers an uncongested path to the DU and the CU, it will have an 976 incentive to run the inference task on these other nodes since they 977 offer more compute power. 979 +------+ +------+ +------+ +------+ 980 | | | | | | | | 981 | UE +-------+ RU +-------+ DU +-------+ CU + 982 | | | | | | | | 983 +------+ Air +------+ +------+ +------+ 984 Interface Fronthaul Backhaul 986 Figure 4: An AI-inference application in the edge cloud needs to 987 place the inference task on a compute node location (UE, RU, DU 988 or CU) that will perform well from both a compute and a 989 communication standpoint. 991 Using ALTO path vector [I-D.ietf-alto-path-vector] and performance 992 metrics [I-D.ietf-alto-performance-metrics] features, the application 993 could retrieve the amount of compute resources located in the RU, DU 994 and CU. By extending ALTO to support bottleneck structures, the 995 application would also be able to estimate in real time the available 996 bandwidth for the paths UE-RU, UE-RU-DU, and UE-RU-DU-CU. Further, 997 using bottleneck structure methods described in [G2-SIGCOMM], the 998 application would be able to estimate the time to complete the 999 inference task for each of the four possible scenarios (running in 1000 the UE, the RU, the DU, or the CU) and choose the configuration 1001 with the fastest execution. 1003 Similar joint compute-communication optimization problems appear when 1004 performing neural network training in large-scale data centers. 1005 Large-scale data centers with millions of compute nodes are used to 1006 train gigantic neural networks (with potentially trillions of 1007 parameters).
Such a massive task needs to be broken down into 1008 smaller subtasks that are then executed on the nodes. Once again, 1009 compute and communication need to be jointly optimized (see 1010 [TOPOOPT-MIT] and [FLEXFLOW-STFORD]) in order to ensure regions in 1011 the network do not become bottlenecks. By exposing bottleneck 1012 structure information using ALTO, the AI-training application can 1013 make better subtask placement decisions that avoid potential network 1014 bottlenecks. 1016 Finally, AI-training using planet-scale networks generalizes the same 1017 joint compute and communication problem to an Internet level 1018 [SINGULARITY-MSFT], with the need to implement a global scheduler 1019 that is responsible for placing workloads onto clusters of globally- 1020 distributed compute nodes. Here too, enabling better network state 1021 visibility using ALTO and bottleneck structure graphs could help the 1022 scheduler make better task placement decisions. 1024 5. Example: Application Layer Traffic Optimization using Bottleneck 1025 Structures 1027 In this section we provide an example illustrating how bottleneck 1028 structures can be used to optimize application performance. This 1029 example will then be referenced in Section 6 to discuss and introduce 1030 the necessary requirements to integrate the BSG service into the ALTO 1031 standard. It is worth noting that, as shown in Section 4, 1032 bottleneck structures have numerous applications. This section 1033 provides a complete example for just one of the use cases. In 1034 particular, the focus of the next example is on the joint routing and 1035 congestion control use case of Section 4.4. 1037 Figure 5 provides a view of Google's B4 network as presented in 1038 [B4-SIGCOMM], connecting 12 data centers distributed 1039 across the world (two in Asia, six in America and four in Europe).
1041 +-----+ +-----+ +-----+ +-----+ +------+ +------+ 1042 | | | | | | | | | | | | 1043 | DC1 +---+ DC3 +--+ DC4 +--+ DC7 +---+ DC11 +----+ DC12 | 1044 | | | | | | | | | | | | 1045 +---+-+ +--+--+ +--+-++ +----++ +-+--+-+ +----+-+ 1046 | | | | | | | | | 1047 | +------+ | +----+ | | | +--------+ | 1048 | | | | | | | | | 1049 | | | | | | | | | 1050 | +------|-+ +----|-+ | | +-------|--+ 1051 | | | | | | | | | 1052 +---+-+ +--+--+ ++---++ ++---++ +-+---++ +-+---+ 1053 | | | | | | | | | | | | 1054 | DC2 +---+ DC5 +--+ DC6 +--+ DC8 +---+ DC10 +----+ DC9 | 1055 | | | | | | | | | | | | 1056 +-----+ +-----+ +-----+ +-----+ +------+ +-----+ 1058 |-----| |-----------------------| |-----------------| 1059 Asia America Europe 1061 Figure 5: Google's B4 network introduced in [B4-SIGCOMM]. 1063 The 12 data centers are connected via a total of 19 links, labeled 1064 l1, l2, ... l19. Table 1 presents the pair of data centers that each 1065 link connects. 1067 +======+=======================+======+=======================+ 1068 | Link | Adjacent data centers | Link | Adjacent data centers | 1069 +======+=======================+======+=======================+ 1070 | l1 | DC1, DC2 | l11 | DC10, DC12 | 1071 +------+-----------------------+------+-----------------------+ 1072 | l2 | DC1, DC3 | l12 | DC4, DC5 | 1073 +------+-----------------------+------+-----------------------+ 1074 | l3 | DC3, DC4 | l13 | DC5, DC6 | 1075 +------+-----------------------+------+-----------------------+ 1076 | l4 | DC2, DC5 | l14 | DC11, DC12 | 1077 +------+-----------------------+------+-----------------------+ 1078 | l5 | DC3, DC6 | l15 | DC4, DC7 | 1079 +------+-----------------------+------+-----------------------+ 1080 | l6 | DC6, DC7 | l16 | DC4, DC8 | 1081 +------+-----------------------+------+-----------------------+ 1082 | l7 | DC7, DC8 | l17 | DC7, DC8 | 1083 +------+-----------------------+------+-----------------------+ 1084 | l8 | DC8, DC10 | l18 | DC9, DC11 | 1085
+------+-----------------------+------+-----------------------+ 1086 | l9 | DC9, DC10 | l19 | DC10, DC11 | 1087 +------+-----------------------+------+-----------------------+ 1088 | l10 | DC7, DC11 | | | 1089 +------+-----------------------+------+-----------------------+ 1091 Table 1: Link connectivity (adjacency matrix) in the B4 1092 network. 1094 For the sake of illustration, we will assume a simple configuration 1095 consisting of a pair of flows (one for each direction) connecting 1096 every data center in the US with every data center in Europe, with 1097 all flows routed along a shortest path from source to destination. 1098 Since there are six data centers in the US and four in Europe, this 1099 configuration has a total of 48 flows. All links are assumed to have 1100 a capacity of 10 Gbps except for the transatlantic links (DC7-DC11 1101 and DC8-DC10), which are configured at 25 Gbps. 1103 The next Figure presents the bottleneck structure of Google's B4 1104 network with the assumed flow configuration. Please see Section 3.1 1105 for a description of how to interpret the bottleneck structure. (See 1106 also [G2-SC], [G2-TREP] for details on the algorithm used to compute 1107 the bottleneck structure.) 1108 +--------+ +--------+ 1109 | | | | 1110 | l15 | | l7 | 1111 | | | | 1112 +-^----^-+ +--^--^--+ 1113 | | | | 1114 | | | +--------------------------+ 1115 | | | | 1116 | +---------|------------------+ | 1117 | | | | 1118 +-v------+ +--v----+ +---v---+ +---v---+ 1119 | | | | | | | | 1120 | f1 | | f2 | | f3 | | f10 | (...) 
1121 | | | | | | | | 1122 +-+-+-+--+ +-+---+-+ +--+--+-+ +-------+ 1123 | | | | | | | 1124 | | +---------|---|-------------|--|--------------+ 1125 | | | | | | | 1126 | | +---------+ +-----------+ | | | 1127 | | | | | | | 1128 | +-|---------+ +----------|-+ +---------+ | 1129 | | | | | | | 1130 | | | | | | | 1131 +-v---v--+ +-v----v-+ +-v----+ +-v----v-+ 1132 | | | | | | | | 1133 | l8 | | l10 | | l5 | | l3 | 1134 | | | | | | | | 1135 +-^---^--+ +-^---^--+ +------+ +--------+ 1136 | | | | 1137 | | | +-----------------------+ 1138 | | | | 1139 | +---------|---------------+ | 1140 | | | | 1141 +-v----+ +---v---+ +---v---+ +---v---+ 1142 | | | | | | | | 1143 | f6 | | f9 | | f23 | | f18 | (...) 1144 | | | | | | | | 1145 +------+ +-------+ +-------+ +-------+ 1147 Figure 6: Bottleneck structure of Google's B4 network example. 1149 For the sake of compactness, Figure 6 only includes the bottleneck 1150 links and a subset of the flow vertices that are part of the complete 1151 bottleneck structure. In particular, out of the 19 links that are 1152 part of B4, six links (l15, l7, l8, l10, l5, l3) are bottlenecks. 1154 The bottleneck structure graph shows the existence of two bottleneck 1155 levels in our configuration example: 1157 * The first level at the top of the bottleneck structure includes 1158 links l15 and l7. Flows f1, f2, f3, f10, etc. are bottlenecked at 1159 this level. 1161 * The second level of the bottleneck structure includes links l8, 1162 l10, l5 and l3. Flows f6, f9, f23, f18, etc. are bottlenecked at 1163 this level. 1165 From the MBA Property (Property 3), we know that flows bottlenecked 1166 by a link at level 2 will enjoy higher available bandwidth than flows 1167 bottlenecked at level 1.
For instance, consider the following 1168 directed path in the bottleneck structure: 1170 l15 -> f1 -> l8 -> f6 1172 Using the MBA property, we have that since l15 precedes l8, it must 1173 be that s15 < s8, where s15 is the rate of flow f1 bottlenecked at 1174 l15 and s8 is the rate of flow f6 bottlenecked at l8. 1176 Suppose now that an application needs to place a new flow on Google's 1177 B4 network to transfer a large data set from data center 11 (DC11) to 1178 data center 4 (DC4). The application needs to select and configure a 1179 path from DC11 to DC4 (for instance, this could be done by using SR 1180 [RFC8402]). The shortest path DC11 -> l10 -> DC7 -> l15 -> DC4 is 1181 often used as the default option. Doing so, however, implies that 1182 the flow will be bottlenecked at link l15 at the upper level of the 1183 bottleneck structure, leading to a lower transmission rate. If 1184 instead we choose the non-shortest path DC11 -> l19 -> DC10 -> l8 -> 1185 DC8 -> l16 -> DC4, now the flow will be bottlenecked at link l8 (at 1186 the lower level of the bottleneck structure), leading to a higher 1187 transmission rate. 1189 Using QTBS, we can also numerically compute the transmission rate of 1190 the flow on each of the two path options. (See Section 3.1 in 1191 [G2-TREP] for a detailed description of how to compute the 1192 transmission rate assigned to the flow on each of these paths.) In 1193 particular, we obtain that when the application chooses the shortest 1194 path (bottlenecked at level 1 of the bottleneck structure), it gets a 1195 transmission rate of 1.429 Gbps. If instead the application chooses 1196 the slightly longer path (bottlenecked at level 2 of the bottleneck 1197 structure), then it gets a transmission rate of 2.5 Gbps, an increase 1198 of 74.95% with respect to the shortest path solution. 
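The per-flow rates quoted above follow from max-min fair bandwidth allocation over the network configuration. As a rough, self-contained illustration of this kind of computation, the sketch below implements a basic water-filling allocation; it is not the QTBS algorithm of [G2-TREP] itself, and the link names, capacities and flow paths are hypothetical, not those of the B4 example:

```python
def max_min_fair_rates(capacity, flow_paths):
    """Water-filling max-min fair allocation: repeatedly find the
    link offering the smallest fair share of its remaining capacity,
    and freeze every flow traversing it at that share."""
    rates = {}
    remaining = dict(capacity)                 # spare capacity per link
    active = {f: set(p) for f, p in flow_paths.items()}
    while active:
        # fair share each link can offer the flows not yet allocated
        share = {l: remaining[l] / n for l in remaining
                 if (n := sum(1 for p in active.values() if l in p))}
        bl = min(share, key=share.get)         # next bottleneck link
        for f in [f for f, p in active.items() if bl in p]:
            rates[f] = share[bl]               # f is bottlenecked at bl
            for l in active.pop(f):
                remaining[l] -= share[bl]
    return rates

# Hypothetical example: l1 (10 Gbps) carries f1 and f2; l2 (20 Gbps)
# carries f2 and f3.  l1 converges first with a fair share of 5 Gbps.
print(max_min_fair_rates({"l1": 10, "l2": 20},
                         {"f1": ["l1"], "f2": ["l1", "l2"], "f3": ["l2"]}))
```

Each iteration identifies the next bottleneck link, and the order in which links converge corresponds to the top-to-bottom levels of the bottleneck structure discussed above.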
1200 [G2-TREP] also introduces a very efficient routing algorithm that 1201 uses the bottleneck structure to find the maximal throughput path for 1202 a flow in O(V+E*log(V)) steps, where V is the number of routers and E 1203 is the number of links in the network. 1205 Overall, this example illustrates that, equipped with knowledge about 1206 the bottleneck structure of a network, an application can make better- 1207 informed decisions on how to route a flow. In the next sections, we 1208 will use this example to support a discussion on the requirements for 1209 integrating the Bottleneck Structure Graph (BSG) service into the 1210 ALTO standard. 1212 6. Requirements 1214 This section provides a discussion on the necessary requirements to 1215 integrate the BSG service into the ALTO standard. 1217 6.1. Requirement 1: Bottleneck Structure Graph (BSG) Abstraction 1219 The first base requirement consists in extending the ALTO server with 1220 the capability to compute bottleneck structures. For instance, with 1221 this capability, given the network configuration in Figure 5, the 1222 ALTO server would be able to compute the bottleneck structure shown 1223 in Figure 6: 1225 * Requirement 1A (R1A). The ALTO server MUST compute the bottleneck 1226 structure graph to allow applications to optimize their performance 1227 using the BSG service. 1229 We note that the alternative, which would consist of ALTO simply 1230 providing the necessary information for applications to compute their 1231 own bottleneck structures, would not scale due to the following 1232 issues: 1234 * Suppose that one ALTO server is providing support to N ALTO clients. 1235 Then, requiring each application to compute the bottleneck 1236 structure would imply performing N identical and redundant 1237 computations. By computing the bottleneck structure in the ALTO 1238 server, a one-time computation can be leveraged by all N clients.
1239 We also note that [G2-SC] demonstrates that bottleneck structures 1240 can be efficiently computed in real time by the server even for 1241 large-scale networks. 1243 * A production-ready high-speed implementation of QTBS is relatively 1244 sophisticated. Requiring the applications to implement their own 1245 QTBS optimization library would impose unnecessary and (perhaps 1246 more importantly) out-of-scope requirements on the application, 1247 which should focus on providing its service rather than 1248 implementing a network optimization math library. 1250 The next requirement focuses on the type of bottleneck structure an 1251 ALTO server must compute: 1253 * Requirement 1B (R1B). The ALTO server MUST at least support the 1254 computation of one bottleneck structure type from Section 3.7. 1255 Depending on the network information available (e.g., presence of 1256 QoS class information), the ALTO server MAY support all three 1257 bottleneck structure types, in which case the ALTO client MAY be 1258 able to choose the bottleneck structure type for retrieval. Also, 1259 it is RECOMMENDED that the ALTO server support the computation of 1260 the path gradient graph (PGG) as the default bottleneck structure 1261 implementation for retrieval by the ALTO clients. 1263 6.2. Requirement 2: Information Received from the Network 1265 To compute a bottleneck structure, two pieces of information are 1266 required: 1268 * Topology Object (T). The T Object is a data structure that 1269 includes: 1271 (1) A Topology Graph (V, E), where V is the set of routers and E 1272 is the set of links connecting the routers in the network. 1274 (2) A Capacity Dictionary (C), a key-value table mapping each link 1275 to its capacity (in bps). 1277 * Flow Dictionary (F). The F Dictionary is a key-value table 1278 mapping every flow to the set of links it traverses. 1280 As shown in [G2-TREP], the above information is enough to compute the 1281 bottleneck structure.
In fact, with only the F and C dictionaries, 1282 one can compute the bottleneck structure. The topology graph (V, E) 1283 is needed to perform optimal routing computations (for instance, to 1284 find new paths in the network that yield higher throughput, as 1285 illustrated in Section 5). 1287 The above discussion leads to the following requirement: 1289 * Requirement 2A (R2A). The ALTO server MUST collect information 1290 about (1) the set of routers and links in a network, (2) the 1291 capacity of each link, and (3) the set of links traversed by each 1292 flow. 1294 Information about the set of routers, links and link capacities is 1295 typically available from protocols and technologies such as SNMP, 1296 BGP-LS, SDN, or domain-specific topology logs. This information is 1297 enough to construct the T Object. Information about the set of links 1298 traversed by each flow can be obtained from protocols such as 1299 NetFlow, sFlow, and IPFIX. See [FLOWDIR] and [G2-SC] for examples 1300 of how requirement R2A is implemented in real-world production 1301 networks. 1303 6.3. Requirement 3: Information Passed to the Application 1305 The following requirement is necessary so that applications can 1306 optimize their performance using bottleneck structure information: 1308 * Requirement 3A (R3A). The ALTO client MUST be able to query the 1309 ALTO server to obtain the current bottleneck structure of the 1310 network, represented as a digraph. 1312 In addition, the current ALTO services can be extended with 1313 additional information obtained from the bottleneck structure: 1315 * Requirement 3B (R3B). One or more ALTO services (the Network Map, 1316 the Cost Map, the Entity Property Map or the Endpoint Cost Map) 1317 MUST support reporting to ALTO clients additional network state 1318 information derived from the bottleneck structure.
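As a purely illustrative sketch of the digraph an ALTO client might retrieve under R3A (this is not a proposed wire format; all field names below are hypothetical), the bottleneck structure can be modeled as two node sets, links and flows, plus a set of directed edges along which perturbations propagate:

```python
# Hypothetical, non-normative shape of a BSG digraph response.
# Field names are illustrative only, not a protocol design.
bsg_response = {
    "bsg-type": "flow-gradient-graph",  # one of the types in Section 3.7
    "nodes": {
        "links": {"l1": {"capacity-bps": 100e6},
                  "l2": {"capacity-bps": 50e6}},
        "flows": {"f1": {}, "f2": {}, "f3": {}},
    },
    # Directed edges of the bottleneck structure: an edge (u, v) means
    # that a perturbation on u propagates to v, e.g., a bottleneck link
    # points to the flows bottlenecked at it, and a flow points to the
    # other links it traverses.
    "edges": [("l2", "f1"), ("l1", "f2"), ("l2", "f3"), ("f1", "l1")],
}

def successors(bsg, node):
    """Nodes directly affected by a perturbation on `node`."""
    return [v for (u, v) in bsg["edges"] if u == node]
```

With such a representation, a client can trace how a change on link l2 ripples to the flows bottlenecked there, and from those flows onward to other links.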
1321 For example, the ALTO Cost Map Service can be extended with a new 1322 cost metric that corresponds to the estimated available bandwidth 1323 between two endpoints according to the bottleneck structure model. 1325 6.4. Requirement 4: Features Needed to Support the Use Cases 1327 Bottleneck structures offer a rich framework to optimize application 1328 performance for a variety of use cases. In addition to the base 1329 requirement to construct the bottleneck structure graph (see R1A), in 1330 this section we enumerate additional capabilities that must be 1331 supported by the ALTO BSG service to ensure it is effective in 1332 helping applications optimize their performance for each of the 1333 supported use cases. 1335 * Requirement 4A (R4A). The ALTO BSG service MUST be able to 1336 compute the effect of network reconfigurations using bottleneck 1337 structure analysis, according to the types described in 1338 Section 3.9. 1340 For example, an extended reality (XR) application might need to 1341 choose where to place a containerized instance of the XR service 1342 among a set of possible server racks located in various edge cloud 1343 locations. The application would query the ALTO BSG service to 1344 obtain the projected performance results of placing the new service 1345 instance at each possible location, allowing it to select the one 1346 that would yield the highest performance. 1348 The following requirement is necessary to ensure that the information 1349 provided by the BSG service is not stale: 1351 * Requirement 4B (R4B). The BSG service MUST be able to update the 1352 bottleneck structure graph in near-real time, i.e., at least once 1353 per minute. 1355 In [G2-SC] it is shown that bottleneck structures can be computed in 1356 a fraction of a second for a production US-wide network with about 1357 100 routers and 500 links.
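One way a server could meet R4B without recomputing the graph on every request (a hypothetical sketch; this document mandates no particular caching strategy, and the class below is illustrative) is to cache the BSG and refresh it whenever it is older than a configurable period:

```python
import time

class BsgCache:
    """Recompute the BSG at most once per `period_s` seconds (sketch).

    `compute` is any callable returning the bottleneck structure graph;
    `clock` is injectable for testing. Illustrative only.
    """
    def __init__(self, compute, period_s=60.0, clock=time.monotonic):
        self._compute = compute
        self._period = period_s
        self._clock = clock
        self._bsg = None    # last computed graph
        self._ts = None     # time of last computation

    def get(self):
        """Return the cached BSG, recomputing it if stale."""
        now = self._clock()
        if self._ts is None or now - self._ts >= self._period:
            self._bsg = self._compute()
            self._ts = now
        return self._bsg
```

Serving a cached, periodically refreshed graph also bounds per-query computation, which is relevant to the availability considerations discussed in Section 7.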
Thus, the above requirement should be 1358 achievable with a good implementation of the bottleneck structure 1359 algorithm [G2-TREP]. 1361 7. Security Considerations 1363 Future versions of this document may extend the base ALTO protocol, 1364 so the Security Considerations [RFC7285] of the base ALTO protocol 1365 fully apply when this proposed extension is provided by an ALTO 1366 server. 1368 The Bottleneck Structure Graph extension requires additional scrutiny 1369 on three security considerations discussed in the base protocol: 1370 confidentiality of ALTO information (Section 15.3 of [RFC7285]), 1371 potential undesirable guidance from authenticated ALTO information 1372 (Section 15.2 of [RFC7285]), and availability of the ALTO service 1373 (Section 15.5 of [RFC7285]). 1375 For confidentiality of ALTO information, a network operator should be 1376 aware that this extension may introduce a new risk: as the Bottleneck 1377 Structure information may reveal more fine-grained internal network 1378 structures than the base protocol, an attacker may identify the 1379 bottleneck link and start a distributed denial-of-service (DDoS) 1380 attack that uses a minimal number of flows to cause in-network congestion. 1381 Given the potential risk of leaking sensitive information, the BSG 1382 extension is mainly applicable in scenarios where: 1384 * The properties of the Bottleneck Structure Graph do not pose 1385 security risks to the ALTO service provider, e.g., by not carrying 1386 sensitive information. 1388 * The ALTO server and client have established a reliable trust 1389 relationship, for example, they are operated in the same administrative 1390 domain, or managed by business partners with legal contracts and 1391 proper authentication and privacy protocols. 1393 * The ALTO server implements protection mechanisms to reduce 1394 information exposure or obfuscate the real information.
1395 Implementations involving reduction or obfuscation of the 1396 Bottleneck Structure information SHOULD consider reduction/ 1397 obfuscation mechanisms that can preserve the integrity of ALTO 1398 information, for example, by using minimal feasible region 1399 compression algorithms [NOVA] or obfuscation protocols such as RESA 1400 [MERCATOR]. We note that these obfuscation methods are 1401 experimental and their practical applicability to the generic 1402 capability provided by this extension has not been fully assessed. 1404 We note that operators that are sensitive about disclosing flow-level 1405 information (even if it is anonymized) SHOULD 1406 consider using the Path Gradient Graph (PGG) or the QoS-Path Gradient 1407 Graph (Q-PGG), since these objects hide flow-level information from 1408 the graph, which provides an additional security advantage. 1410 For undesirable guidance, the ALTO server must be aware that, if 1411 information reduction/obfuscation methods are implemented, they may 1412 lead to potentially misleading guidance from authenticated ALTO 1413 information. In such cases, the Protection Strategies described in 1414 Section 15.2.2 of [RFC7285] MUST be considered. 1416 For availability of the ALTO service, an ALTO server should be cognizant 1417 that using Bottleneck Structures introduces a new risk: frequently 1418 querying the BSG service might consume excessive amounts of 1419 computation and storage on the server side. For example, if an ALTO 1420 server implementation dynamically computes the Bottleneck Structure 1421 for each request, the BSG service may become an entry point for 1422 denial-of-service attacks on the availability of an ALTO server. 1424 To mitigate this risk, an ALTO server may consider using 1425 optimizations such as precomputation-and-projection mechanisms 1426 [MERCATOR] to reduce the overhead for processing each query.
An ALTO 1427 server may also protect itself from malicious clients by monitoring 1428 client behavior and declining to serve clients that exhibit suspicious 1429 behavior (e.g., sending requests at an excessively high frequency). 1431 8. IANA Considerations 1433 Future versions of this document may register new entries to the ALTO 1434 Cost Metric Registry, the ALTO Cost Mode Registry, the ALTO Domain 1435 Entity Type Registry and the ALTO Entity Property Type Registry. 1437 9. References 1439 9.1. Normative References 1441 [I-D.ietf-alto-path-vector] 1442 Gao, K., Lee, Y., Randriamasy, S., Yang, Y. R., and J. J. 1443 Zhang, "An ALTO Extension: Path Vector", Work in Progress, 1444 Internet-Draft, draft-ietf-alto-path-vector-25, 20 March 1445 2022, . 1448 [I-D.ietf-alto-performance-metrics] 1449 Wu, Q., Yang, Y. R., Lee, Y., Dhody, D., Randriamasy, S., 1450 and L. M. C. Murillo, "ALTO Performance Cost Metrics", 1451 Work in Progress, Internet-Draft, draft-ietf-alto- 1452 performance-metrics-28, 21 March 2022, 1453 . 1456 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1457 Requirement Levels", BCP 14, RFC 2119, 1458 DOI 10.17487/RFC2119, March 1997, 1459 . 1461 [RFC7285] Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S., 1462 Previdi, S., Roome, W., Shalunov, S., and R. Woundy, 1463 "Application-Layer Traffic Optimization (ALTO) Protocol", 1464 RFC 7285, DOI 10.17487/RFC7285, September 2014, 1465 . 1467 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1468 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1469 May 2017, . 1471 [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., 1472 Decraene, B., Litkowski, S., and R. Shakir, "Segment 1473 Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, 1474 July 2018, . 1476 [RFC8896] Randriamasy, S., Yang, R., Wu, Q., Deng, L., and N. 1477 Schwan, "Application-Layer Traffic Optimization (ALTO) 1478 Cost Calendar", RFC 8896, DOI 10.17487/RFC8896, November 1479 2020, . 1481 9.2.
Informative References 1483 [B4-SIGCOMM] 1484 Jain, S., et al., "B4: Experience with a Globally-Deployed 1485 Software Defined WAN", ACM SIGCOMM, 2013. 1487 [BE-ONL] "Bandwidth Estimation on OpenNetLab", IETF Plenary 112, 1488 IETF ALTO WG, 2021, 1489 . 1492 [BE-SIGCOMM] 1493 Kumar, A., et al., "BwE: Flexible, Hierarchical Bandwidth 1494 Allocation for WAN Distributed Computing", ACM SIGCOMM, 1495 2015. 1497 [FLEXFLOW-STFORD] 1498 Jia, Z., et al., "Beyond Data And Model Parallelism For Deep 1499 Neural Networks", n.d., 1500 . 1502 [FLOWDIR] Pujol, E., Poese, I., Zerwas, J., Smaragdakis, G., and A. 1503 Feldmann, "Steering Hyper-Giants' Traffic at Scale", ACM 1504 CoNEXT, 2019. 1506 [G2-SC] Amsel, N., Ros-Giralt, J., Yellamraju, S., von Hoffe, B., 1507 and R. Lethin, "Computing Bottleneck Structures at Scale 1508 for High-Precision Network Performance Analysis", IEEE 1509 International Workshop on Innovating the Network for Data 1510 Intensive Science (INDIS), Supercomputing, 2020. 1512 [G2-SIGCOMM] 1513 Ros-Giralt, J., Amsel, N., Yellamraju, S., Ezick, J., 1514 Lethin, R., Jiang, Y., Feng, A., Tassiulas, L., Wu, Z., 1515 and K. Bergman, "Designing data center networks using 1516 bottleneck structures", ACM SIGCOMM, 2021. 1518 [G2-SIGMETRICS] 1519 Ros-Giralt, J., Bohara, A., Yellamraju, S., Langston, H., 1520 Lethin, R., Jiang, Y., Tassiulas, L., Li, J., Tan, Y., and 1521 M. Veeraraghavan, "On the Bottleneck Structure of 1522 Congestion-Controlled Networks", ACM SIGMETRICS, 2020. 1524 [G2-TREP] Ros-Giralt, J., Amsel, N., Yellamraju, S., Ezick, J., 1525 Lethin, R., Jiang, Y., Feng, A., Tassiulas, L., Wu, Z., 1526 and K. Bergman, "A Quantitative Theory of Bottleneck 1527 Structures for Data Networks", Reservoir Labs (Qualcomm) 1528 Technical Report, 2021. 1530 [GALLAGER] Gallager, R. and D. Bertsekas, "Data Networks", 1992.
1532 [JSP-INFOCOM] 1533 Poularakis, D., et al., "Joint Service Placement and Request 1534 Routing in Multi-cell Mobile Edge Computing Networks", 1535 n.d.. 1537 [LORENZ] Lorenz, E., "Does the flap of a butterfly's wings in 1538 Brazil set off a tornado in Texas?", American Association 1539 for the Advancement of Science, 139th Meeting, 1972. 1541 [MERCATOR] Xiang, Q., Zhang, J., Wang, X., Guok, C., Le, F., 1542 MacAuley, J., Newman, H., and Y. Yang, "Toward Fine- 1543 Grained, Privacy-Preserving, Efficient Multi-Domain 1544 Network Resource Discovery", IEEE Journal on Selected 1545 Areas in Communications 37(8): 1546 1924-1940, 2019, 1547 . 1549 [MMSYS] "Bandwidth Estimation for Real-Time Communications", 2021, 1550 . 1552 [NOVA] Gao, K., Xiang, Q., Wang, X., Yang, Y., and J. Bi, "An 1553 objective-driven on-demand network abstraction for 1554 adaptive applications", IEEE/ACM Transactions on 1555 Networking (TON), Vol. 27, no. 2, pp. 805-818, 2019, 1556 . 1558 [PETERSON] Peterson, L. and O. Sunay, "5G Mobile Networks: A Systems 1559 Approach", Open Networking Foundation, 2020. 1561 [SH-SIGCOMM] 1562 Singh, R., et al., "Cost-effective capacity provisioning in 1563 wide area networks with Shoofly", ACM SIGCOMM, 2021. 1565 [SINGULARITY-MSFT] 1566 Shukla, D., et al., "Singularity: Planet-Scale, Preemptive 1567 and Elastic Scheduling of AI Workloads", n.d., 1568 . 1570 [TOPOOPT-MIT] 1571 Wang, W., et al., "TOPOOPT: Optimizing the Network Topology 1572 for Distributed DNN Training", n.d., 1573 . 1575 Authors' Addresses 1577 Jordi Ros-Giralt 1578 Qualcomm 1579 Email: jros@qti.qualcomm.com 1580 Sruthi Yellamraju 1581 Qualcomm 1582 Email: yellamra@qti.qualcomm.com 1584 Qin Wu 1585 Huawei 1586 Email: bill.wu@huawei.com 1588 Luis Miguel Contreras 1589 Telefonica 1590 Email: luismiguel.contrerasmurillo@telefonica.com 1592 Richard Yang 1593 Yale University 1594 Email: yry@cs.yale.edu 1596 Kai Gao 1597 Sichuan University 1598 Email: kaigao@scu.edu.cn