DETNET                                                         T. Eckert
Internet-Draft                                Futurewei Technologies USA
Intended status: Informational                                 S. Bryant
Expires: January 13, 2022                             Stewart Bryant Ltd
                                                           July 12, 2021

    Problems with existing DetNet bounded latency queuing mechanisms
            draft-eckert-detnet-bounded-latency-problems-00

Abstract

   The purpose of this memo is to explain the challenges and limitations
   of existing (standardized) bounded latency queuing mechanisms when
   applied to desirable (large scale) MPLS and/or IP based networks that
   are to support DetNet services.  These challenges relate to low-cost,
   high-speed hardware implementations, desirable network design
   approaches, system complexity, reliability, scalability, cost of
   signaling, and the performance and jitter experienced by DetNet
   applications.  Many of these problems are rooted in the use of per-
   hop, per-flow (DetNet) forwarding and queuing state, but highly
   accurate network wide time synchronization can be another challenge
   for some networks.

   This memo does not propose a specific queuing solution; instead, in
   the same way in which it describes the challenges of existing
   mechanisms, it reviews how those problems are addressed by currently
   proposed new queuing mechanisms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 13, 2022.
Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Summary
     1.1.  Problem: High speed forwarding, high scale fan-in/fan-out
     1.2.  Solution goal: Lightweight, per-hop, per-flow stateless
           transit hop forwarding
     1.3.  Requirement: Support for existing stateless/steering
           solutions
     1.4.  Requirement: PCE to ingress/egress LSR only flow signaling
     1.5.  Requirement: Support for DiffServ QoS model on transit hops
     1.6.  Requirement: Low jitter bounded latency solutions
     1.7.  Requirement: Dynamic, application signalled DetNet flows
   2.  Evolution of IP/MPLS network technologies and designs
     2.1.  Guaranteed Service with RSVP
     2.2.  Hardware forwarding and DiffServ
     2.3.  MPLS and RSVP-TE
     2.4.  Path Computation Engines (PCE)
     2.5.  Segment Routing (SR)
     2.6.  BIER
     2.7.  Summary
   3.  Additional current considerations
     3.1.  Impact of application based state in networks
     3.2.  Experience from IP multicast
     3.3.  Service Provider and Private MPLS Networks
     3.4.  Mission-specific vs. shared infrastructures
     3.5.  PTP and challenges with clock synchronization
     3.6.  Jitter - in-time versus on-time
   4.  Challenges for high-speed packet forwarding hardware
   5.  A reference network design
   6.  Standardized Bounded Latency algorithms
     6.1.  Guaranteed Service (GS)
     6.2.  TSN Asynchronous Traffic Shaping (TSN-ATS)
     6.3.  Cyclic Queuing and Forwarding (CQF)
   7.  Candidate solution directions
     7.1.  Packet tagging based CQF
     7.2.  Packet tagging based CQF with SR
     7.3.  Per-hop latency indications for Segment Routing
     7.4.  Latency Based Forwarding
   8.  Conclusions
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Informative References
   Authors' Addresses

1.  Summary

   The architectural evolution of IP/MPLS networks (Section 2) in
   service provider and other "larger-than-building" (Section 3.3),
   shared-infrastructure service networks (Section 3.4) has led to a
   range of requirements on per-hop forwarding mechanisms that are not
   met by the current DetNet MPLS forwarding plane [RFC8964] and its
   per-hop, per-flow queuing model ([RFC8655], Section 3.2), especially
   with respect to the QoS support of per-hop bounded latency.  The
   authors of this memo think that solutions for these requirements can
   be added to the existing DetNet architecture with relative ease, by
   adding support for already existing and/or proposed, but not yet
   standardized, per-hop forwarding and queuing options.

   The following sub-sections summarize the problem, solution goals and
   requirements as perceived by the authors.  The reasoning behind them
   is explained in the subsequent sections.

   Note that the requirements overlap in so far as solving one of them
   also solves others, but each addresses the problems from a different
   perspective and is therefore easier to understand for different
   stakeholders.  For example, operators that want to see DetNet
   support for Segment Routing (SR) would not "naturally" consider this
   to be the same as DetNet supporting the DiffServ architecture, even
   though solutions would be hard-pressed to support only one of the
   two.

1.1.  Problem: High speed forwarding, high scale fan-in/fan-out

   Forwarders with bounded latency need to support interface speeds of
   100 Gbps up to Tbps, likely over a period of 10 years from the
   initial deployment of possible DetNet solutions.  Hundreds of
   interfaces may need to be supported in a single forwarder (fan-in/
   fan-out).
   Supporting bounded latency at these speeds and fan-in/fan-out scales
   raises cost and feasibility challenges beyond those that had led to
   the past IETF IntServ (GS) standards ([RFC2210], [RFC2212]) or to
   the more recent TSN bounded latency solutions.

   Note that these high speed and scale requirements cause challenges
   even when DetNet bounded latency traffic is intended to make up only
   a small percentage of the interface traffic.

1.2.  Solution goal: Lightweight, per-hop, per-flow stateless transit
      hop forwarding

   Both high-speed hardware and network architecture design (for
   reasons of simplicity and minimization of shared risk functions)
   favor architectures with a lightweight transit hop forwarding plane
   that requires no forwarding plane or control plane operations whose
   scale depends on the number of services/service-instances (e.g.,
   DetNet flows) offered, but at most on the size of the network (e.g.,
   no per-flow, per-hop state).

1.3.  Requirement: Support for existing stateless/steering solutions

   There should be DetNet bounded latency options that work in
   conjunction with per-transit-hop stateless traffic forwarding, such
   as Shortest Path First (SPF) routing with IP/MPLS, engineered
   steering (e.g., SR) and stateless replication, such as Bit Index
   Explicit Replication with/without Tree Engineering (BIER, BIER-TE).

1.4.  Requirement: PCE to ingress/egress LSR only flow signaling

   There should be DetNet bounded latency options that, for the purpose
   of traffic engineering (including assurance of bounded latency
   across the network), only require per-flow Path Computation Engine
   (PCE) signaling to the network ingress/egress routers, but not to
   transit hop routers.

1.5.  Requirement: Support for DiffServ QoS model on transit hops
   There should be DetNet bounded latency options that support the
   DiffServ QoS model instead of only the IntServ model.

1.6.  Requirement: Low jitter bounded latency solutions

   There should be DetNet bounded latency options that, together with
   the other requirements, also provide better than worst-case jitter
   for DetNet traffic.

1.7.  Requirement: Dynamic, application signalled DetNet flows

   The DetNet architecture should support signaling and forwarding that
   make support for automatically application instantiated DetNet flows
   scalable and lightweight to operate.

2.  Evolution of IP/MPLS network technologies and designs

   To help readers understand especially the per-hop stateless
   requirement from above, the following sections summarize the
   historical evolution of technologies and operational principles that
   the authors think are relevant to understanding the requirements
   outlined above and that they ask to see supported in DetNet.

2.1.  Guaranteed Service with RSVP

   The original (first and only) IETF standardized packet forwarding
   layer queuing option for bounded latency is "Guaranteed Service"
   (GS), [RFC2212]; see the DetNet bounded latency document [DNBL],
   Section 6.5.  At the time the RFC was published (1997), the proposed
   standardized signaling was RSVP [RFC2205], and the use of RSVP with
   GS was standardized in [RFC2210].

   The function required to support GS bounded latency in the
   forwarding plane is per-flow reshaping on every forwarder hop along
   the path where GS packets of one flow may get delayed in the egress
   interface queue by packets from other GS flows.  In typical
   networks, this is every hop along the path.

   Early (1990s/2000s) forwarders on which RSVP was implemented used
   so-called "software" forwarding.
   This meant that the forwarding plane was implemented on a general
   purpose CPU without additional hardware support for QoS functions
   such as shaping or queuing.  While these forwarders did support
   traffic flow shaping, GS was never implemented on them, and their
   RSVP implementations did not support (but ignored) the RSVP TSPEC/
   RSPEC signaling parameters used for bounded latency.  Instead, RSVP
   implementations only supported the parameters for bandwidth
   reservation, which was henceforth called Call Admission Control
   (CAC).

   In one instance, a software forwarder implementation with RSVP
   supported the Controlled Load (CL) service [RFC2211], which provides
   not bounded but controlled latency.  This service is achieved by
   creating a per-flow queue and applying weighted fair queuing (WFQ)
   with weights according to the reserved bandwidth of the flows (see
   [RFC2211], Section 11).  This functionality did not proliferate into
   later generations of routers because the execution cost of WFQ was
   too high for a multitude of flows, and the scheduling was too
   inaccurate in interrupt driven CPU software forwarding with higher
   speed interfaces (100 Mbps...1 Gbps).

2.2.  Hardware forwarding and DiffServ

   With the rise of forwarding planes "accelerated" through ASIC based
   Forwarding Plane Elements (FPE) instead of general purpose CPUs and/
   or dedicated QoS hardware, the ability of forwarders to support
   shaping evolved to the point where shaping was only supported, if at
   all, on DiffServ (DS) boundary nodes, but not on DS interior nodes.
   This applied both to shaping and to complex queuing such as WFQ.
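   For reference, the per-flow end-to-end delay bound that GS
   (Section 2.1) derives from the flow's TSPEC/RSPEC can be sketched
   numerically.  The helper below uses the [RFC2212] bound in its
   commonly cited form, b/R + C_tot/R + D_tot (valid when the reserved
   rate R is at least the token rate); the function name and the
   example values are illustrative only:

```python
def gs_delay_bound(b, R, C_tot, D_tot):
    """Guaranteed Service end-to-end queuing delay bound (RFC 2212,
    simplified form, assuming reserved rate R >= token rate r).

    b     : token bucket depth of the flow (bits)
    R     : reserved service rate (bits/s)
    C_tot : sum of the rate-dependent error terms C of all hops (bits)
    D_tot : sum of the rate-independent error terms D of all hops (s)
    """
    return b / R + C_tot / R + D_tot

# Example: a 16 kbit burst on a 10 Mbit/s reservation across 5 hops,
# each hop contributing C = 1500 bytes * 8 bits and D = 100 us.
bound = gs_delay_bound(b=16_000, R=10_000_000,
                       C_tot=5 * 1500 * 8, D_tot=5 * 100e-6)
```

   Note how the bound grows with both the burst size of the flow and
   the per-hop error terms, independently of how much competing traffic
   is actually present.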
   The DS architecture [RFC2475] was specifically targeted at enabling
   the evolving, now common Service Provider network services
   architecture, in which "high-touch" service functions are only
   performed on so-called Provider Edge (PE) routers, which act as DS
   boundary nodes where required, whereas the hop-by-hop forwarding
   through so-called Provider (P) (core) routers is meant to utilize
   only a reduced set of forwarding functions, specifically excluding
   per-hop, per-flow QoS forwarding plane functions such as shaping or
   policing.  DiffServ therefore made it possible to build higher
   speed, lower cost forwarding plane P routers.  It equally enabled
   higher speed, lower cost PE routers by supporting boundary node
   functions only on (lower speed) customer facing interfaces/line
   cards, but not on core facing interfaces.

2.3.  MPLS and RSVP-TE

   With the advent of MPLS [RFC3031], RSVP was extended to support MPLS
   through the RSVP-TE [RFC3209] extensions.  RSVP-TE manages p2p
   (later on also p2mp) MPLS Label Switched Paths (LSPs), which, when
   signaled through RSVP-TE, are also called RSVP-TE tunnels.  These
   can be seen as the equivalent of the IP flows that RSVP manages for
   IP.  RSVP-TE tunnels can support a variety of traffic engineering
   functions, but none of the implementations known to the authors ever
   implemented the GS or CL services, specifically because hardware
   forwarding for service provider networks was not designed to support
   these QoS functions on P Label Switched Routers (LSRs).

   Because CL/GS were not targeted with RSVP-TE, the signaling
   extensions for Interior Gateway Protocols (IGPs) required in the
   classical RSVP-TE reservation model (such as [RFC8570] for IS-IS)
   have no parameters to signal per-hop GS queuing latency or buffer
   capacity utilization.
   As a result, the existing IGP signaling for RSVP-TE only supports
   RSVP-TE in performing bandwidth and non-queuing path latency
   resource calculations, and therefore no queuing-latency based
   traffic engineering.

2.4.  Path Computation Engines (PCE)

   Even though RSVP-TE implementations support only DiffServ (but not
   GS/CL) with respect to per-hop QoS functions, their traffic-steering
   (path selection) and signaling model introduced per-flow (per-
   tunnel) control plane and forwarding plane overhead onto every
   P-hop.  Through the 200x's, this RSVP-TE overhead was seen as
   undesirable complexity by many service providers using it.  There
   was also a much larger number of service providers that desired some
   of the benefits provided by RSVP-TE, but who were not willing to
   commit to the complexity, costs and operational risk introduced into
   the network by the complex per-flow signaling of RSVP-TE.  The on-
   path, per-hop signaling of RSVP-TE, for example, introduced so much
   overhead that reconvergence of RSVP-TE paths after a failure or
   recovery took as much as 20 minutes in networks with 10,000 or more
   RSVP-TE tunnels.

   The design of RSVP-TE's (decentralized) on-path signaling model
   proved particularly problematic under high resource utilization.  In
   the original, decentralized RSVP-TE deployment model, ingress PE
   LSRs would perform so-called Constrained Shortest Path First (CSPF)
   calculations to determine the shortest path with enough free
   resources for a new flow.  Afterwards, the ingress PE would signal
   the path via RSVP-TE.  The IGP would signal to all ingress PEs how
   many (bandwidth) resources were left on every link.  Under high
   load, when multiple ingress PEs were performing this process in
   parallel, this would cause high load, churn and reservation
   collisions.
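   The CSPF computation described above can be illustrated as a
   shortest-path search restricted to links with enough unreserved
   bandwidth for the new flow.  This is a toy sketch under our own data
   model, not any particular router implementation:

```python
import heapq

def cspf(links, src, dst, demand):
    """Toy Constrained Shortest Path First (CSPF): consider only links
    whose unreserved bandwidth can fit `demand`, then find the least
    IGP cost path among them (Dijkstra).

    links: dict node -> list of (neighbor, igp_cost, unreserved_bw)
    Returns (total_cost, path) or None if no feasible path exists.
    """
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, igp_cost, bw in links.get(node, []):
            if bw >= demand and nbr not in visited:  # bandwidth constraint
                heapq.heappush(heap, (cost + igp_cost, nbr, path + [nbr]))
    return None
```

   The churn mentioned above arises because every ingress PE runs such
   a computation against its own, possibly stale, view of the
   unreserved bandwidth, so parallel reservations can collide.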
   These problems of decentralized RSVP-TE plus IGP signaling led to
   the introduction of the so-called Path Computation Element (PCE)
   based architecture, in which the (competing and uncoordinated)
   traffic engineering computations on every decentralized RSVP-TE
   ingress LSR were replaced by a centralized PCE function (or at least
   a coordinated PCE function), which would send the calculated results
   back as a path object to the headend LSR, thereby limiting the
   functions of RSVP-TE to the signaling of a steered traffic path
   through the network to establish the hop-by-hop LSP.  The use of a
   PCE can likewise eliminate all the reservation state dependent
   signaling from the RSVP-TE IGP extensions, because all the
   reservation calculations need to happen only on the PCE.
   Nevertheless, the PCE does not eliminate the per-hop signaling
   overhead of RSVP-TE to establish LSPs, and hence it did not
   eliminate, for example, the majority of the platform and convergence
   cost of RSVP-TE in the network, especially for the control plane of
   P nodes, and could hence not resolve the concerns of service
   providers who had chosen not to adopt RSVP-TE.

2.5.  Segment Routing (SR)

   The introduction of the centralized PCE had obsoleted most of the
   reasons for RSVP: headends did not need to do path calculation, and
   P routers did not need to manage the available and allocated
   bandwidth for TE tunnels.  In most service-provider use-cases this
   left RSVP-TE serving only as a very complex solution for traffic
   steering, with the PCE doing the rest.  This ultimately led to the
   design of the Segment Routing [RFC8402] architecture and its mapping
   to the MPLS forwarding plane, SR-MPLS [RFC8660].  Later, a mapping
   to IPv6 was defined with SRv6 [RFC8986].
   SR relies on strict or loose hop-by-hop source routing information,
   contained in each packet header, therefore eliminating the need to
   set up per-path flow state via RSVP-TE.  In conjunction with
   DiffServ for hop-by-hop QoS, this allowed for a completely per-hop,
   per-flow stateless forwarding solution, arguably therefore
   lightweight, easy to implement at high performance and scalable to a
   large number of flows.

2.6.  BIER

   In the same way as SR eliminated the need for hop-by-hop traffic
   steering forwarding state from RSVP-TE in P-routers for unicast
   traffic, Bit Index Explicit Replication (BIER) [RFC8279] solves this
   problem for shortest path multicast replication state across
   P-routers by replacing it with a BIER packet header [RFC8296],
   thereby eliminating any per-application/flow, per-hop forwarding
   state for multicast in P-routers.  BIER also removed the associated
   overhead of the prior ingress replication solutions that Service
   Providers were looking into to avoid the per-hop state.

   Finally, BIER-TE [I-D.ietf-bier-te-arch] adds traffic steering with
   replication to the BIER architecture and calls this Tree
   Engineering.  Likewise, this is without the need for per-hop/per-
   flow steering or replication state.

2.7.  Summary

   Service Provider networks have evolved, especially over the past 25
   years, into an architecture where high speed, low cost and high
   reliability are based on designs that eliminate or reduce as much as
   possible any form of unnecessary control-plane and, even more so,
   per-flow, per-application forwarding plane complexity in P-routers/
   transit-nodes.

   This has led to the development of the DiffServ QoS architecture
   that eliminated IntServ/per-flow QoS from P-routers, and later on to
   the evolution from MPLS/RSVP-TE to SR and BIER that eliminated per-
   flow/tunnel forwarding/steering and replication state from the same
   P-nodes.
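   The per-hop stateless replication summarized here can be illustrated
   with BIER's bitstring forwarding (Section 2.6): a transit router
   makes one copy per next hop that leads to at least one set
   destination bit, with each copy carrying only the bits reachable via
   that next hop.  This is a toy sketch; the bit-index-to-next-hop
   table (`bift`) and all names are ours:

```python
def bier_forward(bitstring, bift):
    """Toy BIER replication: partition the destination bitstring by
    next hop, so each copy carries only the bits reached via that hop.

    bitstring: int; bit i set => egress router with index i wants the packet
    bift: dict mapping bit position -> next-hop name (illustrative BIFT)
    Returns dict next_hop -> bitstring carried by the copy sent there.
    """
    copies = {}
    for bit, next_hop in bift.items():
        if bitstring & (1 << bit):
            copies[next_hop] = copies.get(next_hop, 0) | (1 << bit)
    return copies
```

   Note that the number of copies is bounded by the number of next
   hops, not by the number of egress PEs, and the only state a P-router
   needs is the routing-derived bit-to-next-hop table, independent of
   how many multicast flows exist.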
   Finally, early experience with Traffic Engineering churn under high
   load, and today's requirements for often NP-complete optimizations,
   led to an architectural preference for an off-path/centralized model
   for TE calculations via PCE, to also free P-routers from signaling
   complexity and to perform dynamic/service-dependent signaling only
   to PE-routers.

3.  Additional current considerations

   The following subsections look further into the background of why
   per-hop, per-flow state can be problematic and discuss problems
   beyond this core issue.

3.1.  Impact of application based state in networks

   RSVP-TE was (and is) solely used for services where the operator of
   a domain explicitly provisions RSVP-TE tunnels across its domain
   (for example using a PCE) and can therefore fairly easily know the
   worst-case scaling impact.  For example, the number of tunnels is
   not a chance value arising through dynamic subscriber action;
   instead, it is primarily impacted by topological changes and the
   (over time relatively rare) occurrences of additional services and/
   or service instances being provisioned.  For RSVP-TE there was never
   (to the knowledge of the authors) an end-to-end application layer
   interface such as there was for RSVP over IP, for example as
   supported by the QoS enabled IP sockets of earlier versions of
   Microsoft Windows.

   When per-flow operations, including per-hop signaling or, even
   worse, per-hop forwarding plane or QoS state, are not the result of
   well-controlled provisioning or well plannable/predictable failure
   behavior, but are instead driven by applications not under the
   control of network operators, the per-hop state requirements can
   become much more of an operational and cost problem because of their
   unpredictability.

3.2.  Experience from IP multicast

   The widest experience with dynamic, application based signaling in
   Service Provider networks likely exists for IP multicast, where the
   creation of per-hop forwarding/replication state is triggered not by
   applications under the control of network operations but by customer
   managed applications/application-instances.  Managing the amount of
   state and the control plane load on P-routers was and is one of the
   major concerns when operationalizing IP Multicast services in SPs.

   Service Provider L2-VPN and L3-VPN services can offer IP Multicast
   via architectures such as [RFC6513] that attempt to solve/reduce the
   problem of customer application driven, per-multicast-application
   state in a variety of ways, but they all come with their own
   problems:

   o  In ingress replication, the ingress-PE sends a separate unicast
      copy to every egress-PE.  This creates significant excess traffic
      on links close to the ingress-PE and potentially requires higher-
      cost ingress-PE attachment speeds.

   o  In L3VPN aggregate trees, the traffic of multiple trees is sent
      across a common tree reaching the superset of all egress-PEs of
      all included trees.  This reduces the number of trees from one
      per customer application to a lower number of aggregates, but it
      creates potentially significant excess traffic towards egress-PEs
      that do not need all the aggregated traffic, and it may even
      result in a requirement for faster core access link speeds for
      those egress routers.

   Finally, the per-P-router stateless BIER solution solved these
   issues.  It does not require any per-P-router, per-tree state
   creation, and achieves up to 256x better traffic efficiency than
   ingress replication (with 256 bit long BIER bit strings).

3.3.  Service Provider and Private MPLS Networks

   With DetNet services being targeted primarily at so-called private
   networks, such as (but not limited to) those for industrial
   networks, theme parks, power supply systems, and road, river,
   airport and train transportation networks, it is important to
   understand how the concerns for SP networks apply to such private
   networks:

   While the aforementioned evolution of MPLS networks focused on
   large-scale service provider networks, the very same architectural
   evolution is or will also be happening in private MPLS networks, in
   the same way as the DiffServ architecture equally became the only
   widely adopted QoS architecture in larger scale (campus or beyond)
   private networks.

   While some of the scaling, cost, performance and reliability issues
   mentioned above for service providers may not equally apply to
   smaller scale private networks, past experience has shown that it is
   unlikely for a critical mass for different solutions to develop
   across a large variety of vertical private types of networks.  For
   this reason, in the past, larger scale enterprise networks have
   preferred to adopt solutions that had proven themselves through SP
   deployments and that were based on cross-vendor IETF architecture
   principles and widely interoperable vendor implementations.

   Another reason for private network operators to look to service
   provider class designs is that doing so also simplifies potential
   service provider based management of the network and/or outsourcing
   of the network to a service provider.  This was often seen when
   large enterprises that had to support multiple tenants evolved from
   ad-hoc network virtualization solutions (such as VRF-lite) over to
   BGP/MPLS-VPN designs and later outsourced those very networks.
   In that same line of future proofing, networking technologies first
   developed for enterprises would also be picked up and reused in
   Service Provider networks as long as they fit.  IP Multicast, for
   example, had (since about 1996) ca. 10 years of deployment for
   business critical enterprise use cases (such as financial market
   data distribution) before it was adopted widely for IPTV in service
   providers.

3.4.  Mission-specific vs. shared infrastructures

   Whereas the previous section points to the practice and benefits of
   sharing technologies between private and SP networks, this section
   highlights one core additional requirement of SP networks that is
   not found in most of the private networks from which pre-DetNet
   deterministic service requirements will likely originate.

   In architectural terms, the desire and need to minimize or avoid
   per-application/flow forwarding/control-plane state and per-hop
   control plane interactions (be it through on-path signaling or
   direct PCE to P-router signaling) is not primarily a matter of SP
   vs. private networks, nor even of size, but foremost a matter of
   whether the network itself is seen as (a) the communications fabric
   of a large distributed application or (b) an independently running
   shared infrastructure across a potentially wide variety of
   applications/services with diverging requirements.

   (a) is the dominant view of the network specifically in many
   (single) mission specific networks, such as many industrial networks
   and even non-public High Performance Compute (HPC) center
   architectures.  In either of these cases, a single architectural
   entity controls both the network infrastructure and the application
   and can build a mission optimized compound.
   For example, switches in HPC Data Centers traditionally had very
   shallow interface packet buffering for cost reasons, resulting in
   inferior performance under peak load with the predominant older TCP
   congestion control stacks.  Instead of using better, more expensive
   switches, it was easier to improve the TCP stacks of the application
   devices, leading for example to BBR TCP.  While this is very much in
   line with the desired Internet architecture, which puts significant
   responsibility onto transport layer protocols in hosts (not limited
   to TCP) to behave "fairly" or "ideally", the reality even in many
   private mission centric networks, such as manufacturing plants, is
   different.  Dealing with misbehaving user devices or applications is
   one of the main challenges.  In the example, that is the case when a
   DC is offering public cloud services, where TCP stacks cannot be
   controlled, and hence deeper buffers and/or better AQM are a core
   requirement.

   In general: In networks following the (b) shared infrastructure
   design principle, any resource that needs to be shared across
   different services or even service instances becomes a potential
   three-party reliability and costing issue between the provider
   running the network and the two (or more) parties whose services
   utilize the common resource.  Minimizing the total amount of shared
   resources that are complex, failure-prone and hard to quantify in a
   cost-effective manner is thus at the base of any shared
   infrastructure network design.

   This again points to the model where all network control can happen
   at the edge, and where, due to the absence of per-hop, per-flow
   state, there simply is no shared flow state table that needs to be
   managed across multiple different services/subscribers.

3.5.  PTP and challenges with clock synchronization

   Some bounded latency solutions require accurate clock
   synchronization across the network nodes performing the bounded
   latency algorithm.  The most commonly used (family of) protocol(s)
   for this is the Precision Time Protocol (PTP), standardized in
   IEEE 1588 and various market specific profiles thereof.

   PTP can achieve long-term Maximum Time Interval Errors (MTIE) of as
   little as tens of nanoseconds.  MTIE is the maximum time difference
   between the clocks of two PTP nodes measured over a long period of
   time.

   Implementing PTP in devices comes with a range of design
   requirements.  At high degrees of accuracy, PTP requires accordingly
   accurate local oscillators, including hardware such as regulated
   heating to operate under constant temperature.  It requires accurate
   distribution of the clock across all components of the system, which
   can be especially challenging in modular, large-scale devices, as
   well as accurate insertion and retrieval of timestamp fields into
   and from packet headers.

   While PTP is becoming more and more widely available, consistent
   support of high accuracy across all target types of switches and
   routers in wide area networks cannot be taken for granted as a
   feasible new requirement raised by DetNet when it did not exist
   before.  Today, PTP is often found in mobile network fronthauls, but
   not in their backhauls or in other broadband aggregation,
   distribution or core networks.  This is because there is, as of
   today, no strong business case for PTP at high precision in those
   networks, whereas technologies such as eCPRI raise such requirements
   against mobile fronthauls.  Instead, those other networks most often
   resort to, at best, msec accuracy NTP deployments, which is
   typically sufficient for control-plane and operational event tracing
   as their main, accuracy defining use-case.
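   The MTIE figure quoted above can be made concrete: for a series of
   measured clock offsets between two nodes, the MTIE at a given
   observation window is the largest peak-to-peak offset variation seen
   within any window of that length.  A simplified, sample-based sketch
   (real MTIE is defined over continuous observation intervals; names
   are ours):

```python
def mtie(offsets, window):
    """Simplified MTIE: the maximum peak-to-peak variation of clock
    offset samples within any sliding window of `window` samples."""
    worst = 0
    for i in range(len(offsets) - window + 1):
        chunk = offsets[i:i + window]
        worst = max(worst, max(chunk) - min(chunk))
    return worst

samples = [0, 2, -3, 7, -1, 0]   # measured offsets, e.g. in nanoseconds
worst = mtie(samples, 3)         # worst excursion in any 3-sample window
```

   A single brief excursion thus dominates the long-term MTIE, which is
   why sustaining tens-of-nanoseconds MTIE requires stable oscillators
   and careful timestamping, not merely good average accuracy.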
The larger the network and the more varied the multi-vendor equipment deployed, the higher the operational cost of maintaining and controlling the accuracy of a PTP service will be. This has primarily been cited in the past as a reason not to deploy PTP even when the hardware supported it. This operational challenge will especially apply when PTP support may be required for only a small percentage of traffic in a high-speed wide area network: the revenue from the service needs to cover the operational cost incurred by its exclusive components (hardware, software and operations).

3.6.  Jitter - in-time versus on-time

This section discusses how low-jitter bounded latency services can be highly beneficial for DetNet applications.

Depending on the bounded latency algorithm, the jitter experienced by packets varies based on the amount of competing traffic. In algorithms and their resulting end-to-end service which this memo calls "in-time", such as GS and [TSN-ATS], the experienced latency in the absence of any competing traffic is zero, and in the presence of the maximum amount of permissible competing traffic, latency is the maximum, guaranteed bounded latency. As a result, the jitter provided by these algorithms is the highest possible.

In algorithms and their resulting end-to-end service which this memo calls "on-time", the experienced latency is completely or mostly independent of the amount of competing traffic, and the jitter is therefore zero or minimal. In these algorithms, the network buffers packets when they are earlier than guaranteed, whereas in-time algorithms deliver packets (almost) as fast as possible.

This memo argues that on-time queuing algorithms provide an additional value-add over in-time algorithms, especially for use in metropolitan or wide-area networks.
Whatever algorithm is used, the receiving application only has a guarantee for the maximum bounded latency, and the real (shorter) latency of any received packet is no indication of the latency of the next packet. Instead, the receiving application has to be prepared for each and every future packet to arrive with the worst possible latency, i.e., the bounded latency.

The majority of applications require some higher-layer function to operate synchronously with the sender application: rendering of audio/video and other media information needs to happen at the same frequency or event intervals at which the media was encoded. When these applications receive packets earlier than the time at which they can be processed (which is equal or close to the bounded latency), these applications buffer the media in a so-called playout buffer and release it only at that target time. Likewise, remote control loops, including industrial Programmable Logic Controller (PLC) loops or remote controlling of robots or cars, are typically based on synchronous operations. In these applications, early packets are also delayed to then be processed "synchronously" later.

In all cases where applications need to buffer (or otherwise remember) received data when it is too early, in-time queuing latency challenges application developers to predict the network's worst possible jitter, and this can be particularly challenging for embedded, if not constrained, receiver devices with minimal memory to buffer/remember. When these devices are designed against one particular type of network with well-known low jitter, they will not necessarily operate correctly in networks with larger jitter.
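The playout buffer behavior described above can be sketched as a receiver turning an in-time service into on-time delivery by releasing each packet exactly one latency bound after it was sent. This is an illustrative model; the names and the 50 ms bound are hypothetical, not from any cited specification.

```python
# Sketch: a receiver-side playout (de-jitter) buffer. Packets arriving
# anywhere within the latency bound are all released at the same fixed
# offset from their send time, removing the network's jitter.

import heapq

BOUNDED_LATENCY = 50.0  # ms, assumed network-guaranteed latency bound

def playout(arrivals):
    """arrivals: list of (send_time, arrival_time) pairs.
    Returns the release time of each packet."""
    heap = []
    for send, arrive in arrivals:
        heapq.heappush(heap, (send + BOUNDED_LATENCY, arrive))
    out = []
    while heap:
        release, arrive = heapq.heappop(heap)
        assert arrive <= release, "packet exceeded its latency bound"
        out.append(release)
    return out

# Three packets sent 20 ms apart with very different network latencies:
print(playout([(0, 5), (20, 68), (40, 41)]))  # [50.0, 70.0, 90.0]
```

The memory needed by this buffer grows with the jitter the network can produce, which is exactly the dimensioning problem the text describes for constrained receivers.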
And in metropolitan and WAN networks, jitter with in-time services can be highly variable based on the network's design and the relative location of the communicating nodes in the topology (see Section 5 for an example network design).

One example of such issues was encountered when digital TV receivers (Set Top Boxes, STB) designed for (mostly synchronous) digital cable transmission were evolved to become IPTV STBs, but the playout buffer of < 50 msec was not sufficient to compensate for the > 50 msec jitter experienced in IP metropolitan networks.

Note that this section does not claim that all applications will benefit from on-time service, nor that no application would benefit more from in-time service than from on-time service. Nevertheless, the authors are not aware of instances of [RFC8578] applications for which in-time service would be more beneficial than on-time service. Of course, this comparison is only about the benefit to the application; other factors, such as the cost/scale of the service for the network itself, also have to be taken into account.

4.  Challenges for high-speed packet forwarding hardware

The problem of cost and operational feasibility in shared-infrastructure networks specifically applies to the scaling of hardware resources such as per-application-flow forwarding or QoS state in high-speed network routers: even if the business case makes it clear that only, e.g., 1 Gbps worth of traffic may require this advanced state (such as multicast replication or per-flow shaping for bounded latency), it will be more expensive to build this functionality into a 100 Gbps transit switch/router than into a 1 Gbps switch/router. This too is based on experience from migrating services of low-speed, mission-specific networks, such as IP multicast, onto high-speed, shared-infrastructure service provider networks.
The reason for this higher cost at higher speed is that the 1 Gbps worth of "advanced" traffic still has to be handled by 100 times faster hardware, and each of the "advanced" packets forwarded would need to be replicated/shaped 100 times faster.

While this packet processing issue may look like it applies equally to per-hop, per-flow stateful forwarding and to solely in-packet based mechanisms, in practice per-flow state may require a lot more high-speed memory access because of the need to access an entry from a state table. In most cases, this table space can only be made to work at line-rate packet processing when it is on-chip; hence it is not only the most expensive, it is also crucial to scale right. And as the 1 vs. 100 Gbps example above showed, it is very hard to come by an appropriate scale smaller than "would work for 100% of traffic" - because network operators providing shared infrastructure networks really do not want to be responsible for predicting how individual services may grow in adoption by making a specific hardware selection that constrains any such growth.

Last, but not least, on-chip high-speed state tables become even more expensive when they do not only have to be read, but also have to be written at line rate, and even worse, when they have to support line-rate read/write/read control loops.

The main issue with scaling state in hardware routers is that designers will be hesitant to work against unclear growth predictions. Even if at some point in time only 1 Gbps of DetNet traffic was expected to be required on a 100 Gbps platform, hardware designers will much more likely want to scale against the worst (best) case service growth expectation, so that customers will not feel that they would buy into a product that becomes obsolete under success.
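The "100 times faster" argument can be made concrete with back-of-the-envelope arithmetic on the per-packet time budget; the minimum-size Ethernet frame assumption is ours, purely for illustration.

```python
# Illustrative arithmetic: the per-packet processing budget shrinks
# linearly with line rate, so even a small share of "advanced" traffic
# must be handled within the fast platform's per-packet budget.

def packet_budget_ns(line_rate_gbps, frame_bytes=64, overhead_bytes=20):
    """Time available per packet (ns) at line rate for minimum-size
    Ethernet frames (20 bytes preamble + inter-frame gap overhead)."""
    bits = (frame_bytes + overhead_bytes) * 8
    return bits / line_rate_gbps  # 1 Gbps == 1 bit/ns, so units cancel

print(round(packet_budget_ns(1), 1))    # 672.0 ns per packet at 1 Gbps
print(round(packet_budget_ns(100), 1))  # 6.7 ns per packet at 100 Gbps
```

A state-table lookup, update, and write-back that comfortably fits into 672 ns may be infeasible within a 6.7 ns budget, regardless of how little of the traffic actually needs it.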
Whereas steering state, such as MPLS label entries, can easily scale to hundreds of thousands of entries, the same is not clear for shapers or interleaved regulators. They are more challenging because they require fast (on-chip) read-write memory for the state variables, especially when forwarding is parallelized across multiple execution units. This incurs additional complexity to split up the state and its packets across multiple execution units and/or to provide consistent cross-execution-unit shared read/writeable memory.

Even only writeable (but not cross-execution-unit readable) memory has traditionally been a scarce resource, the more so the faster the forwarding engines are. This can be seen from the (often very limited) scale of packet monitoring state such as for IPFIX.

But the main issue of per-hop, per-flow forwarding state that could be quite dynamic, because it might be triggered by applications, is the control-plane to forwarding-plane state interaction. Updating hardware forwarding engine state tables is often one of the key performance limits of routers. Adding significant additional state with likely ongoing changes is easily seen as a big contributor to churn in the control plane and a likely reason for instability and reduced peak performance under key events such as reconvergence of all or large parts of IGP or BGP routing tables.

5.  A reference network design

The following picture shows an example, worst-case network topology of interest (in the opinion of the authors) for bounded latency considerations. This section does not claim that greenfield rollouts may or want to use all aspects of this topology.
What this memo does claim is that many existing brownfield networks, especially in large metropolitan areas, show all or many of these aspects, and that it would be prudent for bounded latency network technologies to support networks like these so as to not create new constraints for network designers by only supporting physical network topologies optimized for a particular type of service (bounded latency).

          Subscribers, Towers, IoT devices
          .............
          ...Access....              National-Core
          PE100 ... PE199           Exchanges/
            |  |                    Peerings
            |  \----\              /        \
            |        \            /          \
             --- P11 --- P12 --- P13 --- P14 --
            /                                  \
   Edge -----P21                               P15
   DC PE     |                                  |
         ------P21                             P17
             \                                 /
              --- P20 --- ......... --- P18 --
             /                                \
   Edge --- P30                               P40
   DC PE     \                                /
          ----- P31 -- .... P38 --- P39
                \                  /
                 \                /
                  PE200...PE299
                  ...Access....
                  .............
          Subscribers, Towers, IoT devices

                Figure 1: Reference Network Topology

An example metropolitan-scale network as shown in Figure 1 may consist of one or more rings of forwarders. A ring provides the minimum-cost n+1 redundancy between the ring nodes, especially when, as is common in metropolitan networks, new fibre cannot cost-effectively be put into new optimum trenches, but existing fibre and/or trenches have to be used. This is specifically true when the area includes not densely populated suburban areas (higher cost per subscriber and mile for rollouts).

Multiple, so-called subtended rings typically occur when existing networks are expanded into new areas: a new ring is simply connected at the two most economic points into the existing infrastructure. Likewise, such a topology may become more complicated over time by addition of capacity, which, resulting from TE planning calculations, may not follow any of the pre-existing ring paths.
Edge Data-Center (DC) sites, connections to Exchanges/Peerings or to national cores of the provider itself, as well as all subscribers, including Mobile Network Towers and IoT devices, connect to these rings directly via PE edge forwarders and (more often) via additional CE-type devices. P nodes may also double as PE nodes.

In densely populated regions, P or PE nodes may have a high number of attached devices, shown in the picture with the example of 100 PE forwarders connecting to a single P forwarder (or rather two P forwarders for redundancy and therefore support of PREOF).

In summary, the following aspects of these networks are relevant for bounded latency:

o  Link speeds today are at least 100 Gbps and will be Tbps in the near future. Even if only a small percentage of that traffic has to support bounded latency, the queuing mechanisms need to support these high-speed interfaces.

o  Fan-in/out at PE or P nodes may be (worst case) on the order of hundred(s) of incoming interfaces. Bounded latency mechanisms whose number of queues depends on the number (#I) of interfaces in a more than linear fashion, such as (#I^2) in the case of [TSN-ATS], may introduce significant challenges for cost-effective hardware.

o  Through the advent of decentralized edge Data Centers and peerings between different operators and content providers, traffic flows of interest will not solely be hub&spoke between one central site and subscribers. Instead, arbitrary, traffic-engineered paths across the topology between any two edges need to be supportable at scale by the bounded latency queuing mechanism.

o  The total number of edge (#E) nodes (PE or CE) for a bounded latency service can easily be in the thousands.
Aggregation of bounded latency flows on the order of (#E^2) flows, which is the best option in per-hop, per-flow solutions such as [TSN-ATS], is likely insufficient to significantly reduce the number of flows that need to be managed across P nodes in such bounded latency queuing mechanisms.

o  The total number of P nodes may be in the hundreds and the number of bounded latency flows in the tens of thousands. It should also be expected that such flows are not necessarily long-term static but may need to be provisionable on a time scale comparable to, for example, telephone calls (such as flows supporting remote control of devices or operations). Bounded latency solutions that require per-flow, per-node state maintenance on the P nodes themselves may therefore be undesirable from a network operational/complexity/reliability perspective, but also from a hardware engineering cost perspective, especially with respect to the control-plane cost of dynamically setting up per-flow bounded latency for each new flow, or for all flows whenever there are topology or load changes that make rerouting desirable.

Beyond queuing concerns, path selection, specifically for deterministic services, is also a challenge in these networks:

o  Path lengths may be significantly longer than, e.g., 3 hops. In large metropolitan networks, they can reach 20 or more hops. Speed-of-light end-to-end latency in these networks will be on the order of a low number of msec. End-to-end queuing latency can be in the same range, if not higher.

o  To avoid undesirable re-routing under failure when PREOF and engineered disjoint paths are used, traffic steering needs to be supported efficiently hop-by-hop. In networks designed for source routing (e.g., SR), efficiently encoded strict hop-by-hop steering for as many as those (e.g., 20) hops may be desirable.

6.
Standardized Bounded Latency algorithms

[DNBL] gives an overview of the math for the most well-known existing deterministic bounded latency algorithms/solutions. This section reviews the relevant currently standardized algorithms from the perspective of the above-listed problems for high-speed, high-scale, shared services infrastructures and provides additional background about them.

6.1.  Guaranteed Service (GS)

GS is described in section 6.5 of [DNBL]. Section 2.1 describes its historical evolution and challenges. We skip further detailing of its issues here to concentrate on IEEE Time-Sensitive Networking (TSN) Asynchronous Traffic Shaping [TSN-ATS], which in general is seen as superior to GS for high-speed hardware implementation. All the concerns described in the TSN-ATS section apply equally or even more to GS.

6.2.  TSN Asynchronous Traffic Shaping (TSN-ATS)

Section 6.4 of [DNBL] describes the bounded latency model used for TSN Asynchronous Traffic Shaping [TSN-ATS]. Like GS, this bounded latency solution also relies on per-flow shaper state, except that it uses optimized shapers called "Interleaved Regulators", as explained in section 4.2.1 of [DNBL].

The concept of interleaved regulators and their simplification over traditional shapers result from mathematical work done in the last 10 years, starting with [UBS].

In a system with, e.g., N=10,000 flows, each with a shaper, the forwarder needs to have 10,000 shapers, each of which would need to calculate the earliest feasible send time of the first queued packet of its flow, and all these send times would need to be compared by a scheduler picking the absolute first packet to send.
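The scheduler burden just described can be illustrated with a toy sketch (ours, not pseudocode from [TSN-ATS]): with one shaper per flow, every transmit decision is a scan across the earliest feasible send times of all N flow heads.

```python
# Illustrative only: selecting the next packet across N per-flow shapers
# costs O(N) comparisons per transmission. Interleaved regulators reduce
# the comparison set to the heads of M = #IIF * #PRIO FIFOs instead.

def next_packet(flow_queues, now):
    """flow_queues: flow -> (earliest_send_time, head_packet) for each
    flow's shaper queue. Returns the flow whose head packet is feasible
    (send time <= now) and earliest, or None."""
    best_flow, best_time = None, None
    for flow, (t, _pkt) in flow_queues.items():
        if t <= now and (best_time is None or t < best_time):
            best_flow, best_time = flow, t
    return best_flow

flows = {"f1": (7.0, "p1"), "f2": (3.0, "p2"), "f3": (5.0, "p3")}
print(next_packet(flows, now=10.0))  # f2: earliest feasible send time
```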
Of course, it is unlikely that the router would have at least one packet queued in every queue at any point in time, but the complexity of implementing the scheduler scales with N.

With interleaved regulators, there is still per-flow state required to hold each flow's traffic parameters and its next-packet earliest departure time, but instead of requiring a scheduler to compare N entries, packets are queued into one out of (#IIF,#PRIO) FIFO queues: one queue for all the packets arriving from the same Incoming InterFace (IIF) and targeting the same worst-case queuing latency/PRIOrity (PRIO) on this hop. The shaper now only needs to calculate the earliest departure time of the head of each of these M = #IIF * #PRIO queues, and the complexity of a scheduler to select the first packet across those interleaved regulators is therefore reduced by a factor of O(N/M).

Unfortunately, while industrial Ethernet switches today often have no more than 24 IIF, aggregation routers in metropolitan networks may have thousands of IIF, so the benefit of interleaved regulators over per-flow shapers will likely be much higher in classical TSN environments than it would be for likely DetNet target routers in metropolitan networks.

In addition, the aforementioned core problems for shapers (Section 4), namely control plane, read/write/read cycle access and scale, equally apply to interleaved regulators, so the main optimization benefit of interleaved regulators is for the original targets of [UBS] / [TSN-ATS]: low-speed (1..10 Gbps) switches with a limited number of interfaces - but to a much lower degree for likely important types of DetNet deployments.

6.3.
Cyclic Queuing and Forwarding (CQF)

TSN Cyclic Queuing and Forwarding, as described in [DNBL], section 6.6, is a per-flow, per-transit-hop stateless forwarding mechanism, which avoids the per-hop, per-flow state issues described earlier in this memo. It also provides an on-time service in which the per-hop and end-to-end jitter is very small, namely on the order of a cycle time.

[CQF] operates by forwarders sending packets in periodic cycles. These cycles are derived from clock synchronization: the start of each cycle (and by implication the end of the prior cycle) are simply periodically increasing clock timestamps that have to be synchronized across adjacent forwarders, usually via PTP. This method of operating cycles allows [CQF] to operate without additional [CQF] data packet headers, but it is also the reason for the two issues of [CQF], both of which relate to the so-called dead time (DT).

For the receiving node to correctly associate a [CQF] packet with the same cycle as the sending node, the last bit of the last packet in the cycle on the sending node needs to be received by the receiving node before the cycle ends.

[DNBL] explains that DT is the sum of latencies 1,2,3,4 of [DNBL] Figure 1, but that is missing the MTIE between the forwarders: if a cycle is for example 10 usec, and the PTP MTIE is 1 usec, then only 9 usec of the cycle can be used (without even considering the other factors contributing to DT). If MTIE is not taken into account, a packet might arrive in time from the perspective of the sending forwarder, but not from the perspective of the receiving node, whose clock is 1 usec earlier.

In practice, MTIE should be equal to or lower than 1% of the cycle time. When forwarders and links increase in speed, cycle times could become proportionally smaller to reduce per-hop cycle time latency.
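The dead-time arithmetic of this section can be captured in a short, purely illustrative calculation; the 200,000 km/s fibre figure and the example values come from the surrounding text, and all other DT contributors are deliberately omitted.

```python
# Illustrative: fraction of a CQF cycle usable for bounded latency
# traffic once MTIE and link propagation latency are subtracted as DT.

def usable_cycle_fraction(cycle_us, mtie_us=0.0, link_km=0.0):
    # Speed of light in fibre ~200,000 km/s -> 5 usec of latency per km.
    link_latency_us = link_km * 5.0
    dead_time_us = mtie_us + link_latency_us
    return max(0.0, (cycle_us - dead_time_us) / cycle_us)

print(usable_cycle_fraction(10, mtie_us=1))    # 0.9: 9 of 10 usec usable
print(usable_cycle_fraction(100, link_km=10))  # 0.5: 50 usec DT on a 10 km link
```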
When cycle times are reduced in this way, MTIE needs to become equally smaller, raising the cost of the solution. Therefore, [CQF] has a challenge with higher-speed networks.

The second and even more important problem is that DT includes the link latency (2 in [DNBL], Figure 1). With a speed of light in fibre of 200,000 km/s, link latency is 10 usec for 2 km. This makes [CQF] very problematic and limited in metropolitan and wide-area networks. If the longest link of a network was 10 km, this would cause a DT on that link of 50 usec, and with a cycle time of 100 usec, only 50% of the bandwidth could be used for cycle-time (bounded latency) traffic (excluding all other DT factors).

When links are subject to thermal expansion, also known as sag on hanging wires, such as broadband copper wires (cable networks), their length can also change by as much as 20% between noon and night temperatures, which, without changes in the design, has to be taken into account as part of DT.

In conclusion, [CQF] solves many of the problems discussed in this memo, but its reliance on timestamp-synchronized cycles may pose undesirable challenges with the required accuracy of PTP in high-speed networks and especially limits [CQF]'s ability to support wider-scale networks due to DT.

7.  Candidate solution directions

As this memo outlines, per-hop, per-flow stateless forwarding is the one core requirement to support Gbps-speed metropolitan or wide-area networks.

This section gives an overview and evaluation, from the perspective of the authors of this memo, of currently known non-standardized proposals for per-hop stateless forwarding with the explicit goal and/or possibility of bounded latency forwarding, in relationship to the concerns and desires described in the previous sections.

7.1.
Packet tagging based CQF

To overcome the challenges outlined in Section 6.3, [I-D.qiang-DetNet-large-scale-DetNet] and [I-D.dang-queuing-with-multiple-cyclic-buffers] (tagged-CQF) propose a modified [CQF] mechanism in which the timestamp-based cycle indication of [CQF] is replaced by indicating the sender's cycle in an appropriate packet header field, so that the receiver can accordingly map the received packet to the right local cycle.

This approach completely eliminates the link latency as a factor impacting the effectiveness of the mechanism, because in this approach the link latency does not impact the DT. Instead, the link latency is used to calculate which cycle from the sender needs to be mapped to which cycle on the receiver, and this is programmed during setup of links into the receiving router's cycle mapping table.

Depending on the number of cycles configured, it is also possible to compensate for variability in the link latency and higher MTIE (picture TBD). If one more cycle is used, for example, this would allow for MTIE on the order of one cycle time, as opposed to a likely target of 1% of cycle time as in [CQF], reducing the required PTP clock accuracy by a factor of 100. This possible reduction in required accuracy of operations by appropriate configuration does not only cover PTP but also extends to any forwarding operation within the nodes; e.g., it could also reduce the cost of implementation of forwarding hardware at higher speeds accordingly.

In MPLS networks, packet-tagged CQF with a small number of cycle tags (such as 3 or 4) could easily be realized and standardized by relying on E-LSP, where 3 or 4 EXP code points would be used to indicate the cycle value.
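Neither draft prescribes the exact mapping computation, so the following is a hypothetical sketch of what a receiver-side cycle mapping table could look like; the function names, the four-tag assumption, and the shift formula are ours.

```python
# Hypothetical sketch of tagged-CQF receive processing: the sender's
# cycle tag carried in the packet (e.g., an EXP code point) is mapped to
# a local cycle via a table programmed once at link setup. Link latency
# only shifts the mapping; it no longer eats into the dead time.

NUM_CYCLES = 4  # number of distinct cycle tags carried in packets

def build_cycle_map(link_latency_us, cycle_us):
    """Programmed per link at setup: sender tag -> local cycle tag."""
    shift = (link_latency_us // cycle_us) % NUM_CYCLES
    return {tag: int((tag + shift) % NUM_CYCLES) for tag in range(NUM_CYCLES)}

def local_cycle(packet_tag, cycle_map):
    # Per-packet work is a single table lookup; no per-flow state.
    return cycle_map[packet_tag]

cmap = build_cycle_map(link_latency_us=25, cycle_us=10)  # 2+ cycles of latency
print([local_cycle(t, cmap) for t in range(4)])  # [2, 3, 0, 1]
```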
Given that such deterministic bounded latency traffic is not subject to congestion control, it also does not require additional ECN EXP code points, so those would be available for, e.g., best-effort traffic that shares the same E-LSP.

7.2.  Packet tagging based CQF with SR

[I-D.chen-DetNet-sr-based-bounded-latency] applies the tagged-CQF mechanisms to Segment Routing (SR) by proposing SR-style header elements to indicate the per-segment/hop cycle. This eliminates the need to set up a cycle mapping table on every hop.

It is unclear to the authors of this memo how big a saving this is, given that the PCE would need to update all the ingress router per-flow configurations where header imposition happens when links change, whereas the mapping table approach would require only localized changes on the affected routers.

7.3.  Per-hop latency indications for Segment Routing

[I-D.stein-srtsn] describes a mechanism in which a source-routed header in the spirit of a Segment Routing (SR) header can be used to enable per-transit-hop, per-flow stateless latency control. For every hop, a maximum latency is specified. The draft outlines a control plane which, similarly to packet tagging based CQF or [TSN-ATS], would put the work of admitting flows, determining their paths and reserving their resources along those paths onto some form of PCE/SDN controller.

The basic principle of forwarding in this proposal is to put received packets into a priority heap and schedule them in order of their urgency (shortest latency) for this hop.

The draft explicitly does not prescribe specific algorithms for the forwarders to take the indicated latency for the hop into account in a way that the controller can calculate the resource availability, such as specific queuing or scheduling algorithms.
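The priority-heap principle described for [I-D.stein-srtsn] can be sketched as follows; since the draft deliberately leaves the per-hop algorithm open, the field names and the local-deadline derivation are our assumptions.

```python
# Sketch: a forwarder keeps a priority heap keyed by a local deadline
# derived from the per-hop latency budget carried in the source-routed
# header, and always transmits the most urgent head packet.

import heapq

class DeadlineScheduler:
    def __init__(self):
        self.heap = []  # (local_deadline, sequence, packet) tuples
        self.seq = 0    # tie-breaker keeps heap comparisons well-defined

    def receive(self, now, packet):
        # packet["hop_latency"]: max latency budget for this hop, taken
        # from the header element for this segment (hypothetical field).
        heapq.heappush(self.heap, (now + packet["hop_latency"], self.seq, packet))
        self.seq += 1

    def transmit(self):
        """Send the queued packet whose local deadline is nearest."""
        if not self.heap:
            return None
        _, _, packet = heapq.heappop(self.heap)
        return packet

s = DeadlineScheduler()
s.receive(0, {"id": "bulk", "hop_latency": 100})
s.receive(1, {"id": "urgent", "hop_latency": 10})
print(s.transmit()["id"])  # urgent: deadline 11 beats deadline 100
```

As the next paragraphs discuss, such deadline-priority queuing alone does not by itself yield a provable per-hop latency bound under multi-hop burst aggregation.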
It is not entirely clear to the authors of this memo whether the sole indication of such deadline latencies is sufficient to completely eliminate per-transit-hop, per-flow state and still achieve deterministic latency, because of the [UBS] work. Consider that the packet's latency for a hop could be used to derive a priority for queuing on the hop relative to other packets with higher or lower latency for this hop.

As was shown in the research work leading up to [TSN-ATS], priority queuing on each hop alone is not sufficient to achieve a simple, solely per-hop calculated latency bound under high load, because of the problem of multi-hop burst aggregation and the resulting hard-to-calculate incurred upper latency bound. To overcome that calculation issue, shapers or, as in [TSN-ATS], their optimization, interleaved regulators, are used in [TSN-ATS] and GS. Shapers and interleaved regulators require maintaining per-flow state across packets of the same flow.

Nevertheless, it may be possible to develop appropriate mathematical models for SDN controllers that yield deterministic per-hop forwarding models relying not only on the per-hop indicated latency but also on additional constraints, such as a limited number of hops or a sufficiently low maximum admitted amount of traffic. Alternatively, this may be used for yet-to-be-developed latency models that are not 100% deterministic, but close enough in probability that the amount of late packets would be in the same order as otherwise unavoidable problems such as BER-based packet loss.

To that end, the author of [I-D.stein-srtsn] has conducted simulations of the proposed mechanism, contrasting it with other mechanisms. These results, which will be published elsewhere, show that this mechanism excels in cases with high load and a small number of flows with tight budgets.
However, some small percentage of packets will miss their end-to-end latency bounds and must be treated as lost packets.

Depending on the algorithms chosen, solutions may rely on strong, weak, or no clock synchronization across nodes.

7.4.  Latency Based Forwarding

"High-Precision Latency Forwarding over Packet-Programmable Networks", NOMS 2020 conference [LBF], describes a framework for per-transit-hop, per-flow stateless forwarding based on three packet parameters: the minimum and maximum desired end-to-end latency, set by the sender and not changed by the network, and the experienced latency, updated by every hop. Routers supporting this LBF mechanism also extend their routing (e.g., IGP) to be able to calculate the non-queuing latency towards the destination. The in-packet parameters and the future latency prediction are used to prioritize packets in queuing, including giving them higher priority when they are late due to latency incurred at prior hops, or delaying them when they are too early.

LBF was started as more fundamental research into how application experience could be improved when applications are allowed to indicate such differential min/max latency Service Level Objectives (SLO). Benefits include the ability to compensate for queuing latency incurred at prior hops, but also the ability to automatically prioritize packets on a single hop based on their future path length, all without the need for any explicit admission control.

The LBF algorithm requires no clock synchronization across nodes. Instead, it assumes mechanisms to know or learn link latencies, while the remaining latencies (as defined in the DetNet architecture) can be calculated locally (e.g., latency through a forwarder).
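Our reading of the LBF ranking idea can be sketched as follows; this is illustrative only, with field names and the exact ranking function assumed by us, not taken from the published [LBF] algorithm.

```python
# Sketch: rank packets by remaining slack against their max latency SLO,
# and hold packets that would otherwise arrive before their min latency
# SLO (on-time behavior). All latencies in the same unit, e.g. msec.

def lbf_action(pkt, predicted_remaining_latency):
    """pkt carries sender-set SLOs and the network-updated experienced
    latency; the router supplies its routing-derived remaining latency."""
    slack = pkt["max_e2e"] - pkt["experienced"] - predicted_remaining_latency
    earliness = pkt["min_e2e"] - pkt["experienced"] - predicted_remaining_latency
    if earliness > 0:
        return ("delay", earliness)          # hold the too-early packet
    return ("queue_with_priority", slack)    # less slack -> more urgent

# A late packet (much queuing already incurred) versus an early one:
late  = {"min_e2e": 0,  "max_e2e": 20, "experienced": 15}
early = {"min_e2e": 18, "max_e2e": 20, "experienced": 2}
print(lbf_action(late,  predicted_remaining_latency=3))   # ('queue_with_priority', 2)
print(lbf_action(early, predicted_remaining_latency=3))   # ('delay', 13)
```

Note how the late packet is automatically prioritized over packets with more slack, and the early one is delayed toward its min latency SLO, without any per-flow state on the hop.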
The authors have not yet tried to define a mathematical model that would allow deriving completely deterministic behavior for this original LBF algorithm in conjunction with a PCE/SDN controller. Due to the absence of per-flow state (shaper/interleaved regulator), the authors believe that deterministic solutions would, as outlined above for SRTSN (Section 7.3), likely only be possible under additional assumed constraints.

8.  Conclusions

Bounded latency solutions for DetNet have been designed by trying to adopt solutions developed either several decades ago (GS) or recently for limited-scope and limited-scale L2 networks [TSN-ATS].

To allow DetNet solutions to explore opportunities in higher-speed and larger-scale shared network infrastructures, both private and service provider networks, it is highly desirable for the DetNet WG (and/or other IETF WGs claiming responsibility in conjunction with DetNet as the driver) to explore the opportunities to standardize additional, and in the opinion of the authors better, per-hop forwarding models in support of (near) deterministic bounded latency by means of standardizing per-flow stateless/"DiffServ"-style per-hop forwarding behavior (PHB) with appropriate network packet header parameters.

9.  Security Considerations

This document has no security considerations (yet?).

10.  IANA Considerations

This document has no IANA considerations.

11.  Acknowledgements

Thanks to Yaakov Stein for reviewing and proposing text for Section 7.3.

12.  Informative References

[CQF]      IEEE Time-Sensitive Networking (TSN) Task Group, "IEEE Std 802.1Qch-2017: IEEE Standard for Local and Metropolitan Area Networks -- Bridges and Bridged Networks -- Amendment 29: Cyclic Queuing and Forwarding", 2017.

[DNBL]     Finn, N., Boudec, J. L., Mohammadpour, E., Zhang, J., Varga, B., and J.
           Farkas, "DetNet Bounded Latency",
           draft-ietf-detnet-bounded-latency-06 (work in progress),
           May 2021.

[I-D.chen-DetNet-sr-based-bounded-latency]
           Chen, M., Geng, X., and Z. Li, "Segment Routing (SR) Based
           Bounded Latency", draft-chen-DetNet-sr-based-bounded-
           latency-01 (work in progress), May 2019.

[I-D.dang-queuing-with-multiple-cyclic-buffers]
           Liu, B. and J. Dang, "A Queuing Mechanism with Multiple
           Cyclic Buffers", draft-dang-queuing-with-multiple-cyclic-
           buffers-00 (work in progress), February 2021.

[I-D.ietf-bier-te-arch]
           Eckert, T., Cauchie, G., and M. Menth, "Tree Engineering
           for Bit Index Explicit Replication (BIER-TE)", draft-ietf-
           bier-te-arch-10 (work in progress), July 2021.

[I-D.qiang-DetNet-large-scale-DetNet]
           Qiang, L., Geng, X., Liu, B., Eckert, T., Geng, L., and G.
           Li, "Large-Scale Deterministic IP Network", draft-qiang-
           DetNet-large-scale-DetNet-05 (work in progress), September
           2019.

[I-D.stein-srtsn]
           Stein, Y., "Segment Routed Time Sensitive Networking",
           draft-stein-srtsn-00 (work in progress), February 2021.

[LBF]      Clemm, A. and T. Eckert, "High-Precision Latency
           Forwarding over Packet-Programmable Networks", IEEE/IFIP
           Network Operations and Management Symposium (NOMS 2020),
           DOI 10.1109/NOMS47738.2020.9110431, April 2020.

[RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
           Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
           Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
           September 1997, <https://www.rfc-editor.org/info/rfc2205>.

[RFC2210]  Wroclawski, J., "The Use of RSVP with IETF Integrated
           Services", RFC 2210, DOI 10.17487/RFC2210, September 1997,
           <https://www.rfc-editor.org/info/rfc2210>.

[RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
           Network Element Service", RFC 2211, DOI 10.17487/RFC2211,
           September 1997, <https://www.rfc-editor.org/info/rfc2211>.

[RFC2212]  Shenker, S., Partridge, C., and R.
           Guerin, "Specification of Guaranteed Quality of Service",
           RFC 2212, DOI 10.17487/RFC2212, September 1997,
           <https://www.rfc-editor.org/info/rfc2212>.

[RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
           and W. Weiss, "An Architecture for Differentiated
           Services", RFC 2475, DOI 10.17487/RFC2475, December 1998,
           <https://www.rfc-editor.org/info/rfc2475>.

[RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
           Label Switching Architecture", RFC 3031,
           DOI 10.17487/RFC3031, January 2001,
           <https://www.rfc-editor.org/info/rfc3031>.

[RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
           and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
           Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001,
           <https://www.rfc-editor.org/info/rfc3209>.

[RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
           BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
           2012, <https://www.rfc-editor.org/info/rfc6513>.

[RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
           Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
           Explicit Replication (BIER)", RFC 8279,
           DOI 10.17487/RFC8279, November 2017,
           <https://www.rfc-editor.org/info/rfc8279>.

[RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
           Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
           for Bit Index Explicit Replication (BIER) in MPLS and Non-
           MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January
           2018, <https://www.rfc-editor.org/info/rfc8296>.

[RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
           Decraene, B., Litkowski, S., and R. Shakir, "Segment
           Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
           July 2018, <https://www.rfc-editor.org/info/rfc8402>.

[RFC8570]  Ginsberg, L., Ed., Previdi, S., Ed., Giacalone, S., Ward,
           D., Drake, J., and Q. Wu, "IS-IS Traffic Engineering (TE)
           Metric Extensions", RFC 8570, DOI 10.17487/RFC8570, March
           2019, <https://www.rfc-editor.org/info/rfc8570>.

[RFC8578]  Grossman, E., Ed., "Deterministic Networking Use Cases",
           RFC 8578, DOI 10.17487/RFC8578, May 2019,
           <https://www.rfc-editor.org/info/rfc8578>.

[RFC8655]  Finn, N., Thubert, P., Varga, B., and J.
           Farkas, "Deterministic Networking Architecture", RFC 8655,
           DOI 10.17487/RFC8655, October 2019,
           <https://www.rfc-editor.org/info/rfc8655>.

[RFC8660]  Bashandy, A., Ed., Filsfils, C., Ed., Previdi, S.,
           Decraene, B., Litkowski, S., and R. Shakir, "Segment
           Routing with the MPLS Data Plane", RFC 8660,
           DOI 10.17487/RFC8660, December 2019,
           <https://www.rfc-editor.org/info/rfc8660>.

[RFC8964]  Varga, B., Ed., Farkas, J., Berger, L., Malis, A., Bryant,
           S., and J. Korhonen, "Deterministic Networking (DetNet)
           Data Plane: MPLS", RFC 8964, DOI 10.17487/RFC8964, January
           2021, <https://www.rfc-editor.org/info/rfc8964>.

[RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
           D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
           (SRv6) Network Programming", RFC 8986,
           DOI 10.17487/RFC8986, February 2021,
           <https://www.rfc-editor.org/info/rfc8986>.

[TSN-ATS]  Specht, J., "P802.1Qcr - Bridges and Bridged Networks
           Amendment: Asynchronous Traffic Shaping", IEEE, July 2020.

[UBS]      Specht, J. and S. Samii, "Urgency-Based Scheduler for
           Time-Sensitive Switched Ethernet Networks", IEEE 28th
           Euromicro Conference on Real-Time Systems (ECRTS), 2016.

Authors' Addresses

Toerless Eckert
Futurewei Technologies USA
2220 Central Expressway
Santa Clara, CA 95050
USA

Email: tte@cs.fau.de

Stewart Bryant
Stewart Bryant Ltd

Email: sb@stewartbryant.com