2 MOPS J. Holland 3 Internet-Draft Akamai Technologies, Inc. 4 Intended status: Informational A. Begen 5 Expires: 19 March 2022 Networked Media 6 S. Dawkins 7 Tencent America LLC 8 15 September 2021 10 Operational Considerations for Streaming Media 11 draft-ietf-mops-streaming-opcons-07 13 Abstract 15 This document provides an overview of operational networking issues 16 that pertain to quality of experience in streaming of video and other 17 high-bitrate media over the Internet. 19 Status of This Memo 21 This Internet-Draft is submitted in full conformance with the 22 provisions of BCP 78 and BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF). Note that other groups may also distribute 26 working documents as Internet-Drafts. The list of current Internet- 27 Drafts is at https://datatracker.ietf.org/drafts/current/. 29 Internet-Drafts are draft documents valid for a maximum of six months 30 and may be updated, replaced, or obsoleted by other documents at any 31 time. It is inappropriate to use Internet-Drafts as reference 32 material or to cite them other than as "work in progress." 34 This Internet-Draft will expire on 19 March 2022. 36 Copyright Notice 38 Copyright (c) 2021 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 43 license-info) in effect on the date of publication of this document. 44 Please review these documents carefully, as they describe your rights 45 and restrictions with respect to this document.
Code Components 46 extracted from this document must include Simplified BSD License text 47 as described in Section 4.e of the Trust Legal Provisions and are 48 provided without warranty as described in the Simplified BSD License. 50 Table of Contents 52 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 53 1.1. Notes for Contributors and Reviewers . . . . . . . . . . 4 54 1.1.1. Venues for Contribution and Discussion . . . . . . . 5 55 1.1.2. History of Public Discussion . . . . . . . . . . . . 5 56 2. Bandwidth Provisioning . . . . . . . . . . . . . . . . . . . 6 57 2.1. Scaling Requirements for Media Delivery . . . . . . . . . 6 58 2.1.1. Video Bitrates . . . . . . . . . . . . . . . . . . . 6 59 2.1.2. Virtual Reality Bitrates . . . . . . . . . . . . . . 6 60 2.2. Path Bandwidth Constraints . . . . . . . . . . . . . . . 7 61 2.2.1. Know Your Network Traffic . . . . . . . . . . . . . . 8 62 2.3. Path Requirements . . . . . . . . . . . . . . . . . . . . 9 63 2.4. Caching Systems . . . . . . . . . . . . . . . . . . . . . 9 64 2.5. Predictable Usage Profiles . . . . . . . . . . . . . . . 10 65 2.6. Unpredictable Usage Profiles . . . . . . . . . . . . . . 11 66 2.7. Extremely Unpredictable Usage Profiles . . . . . . . . . 12 67 3. Latency Considerations . . . . . . . . . . . . . . . . . . . 13 68 3.1. Ultra Low-Latency . . . . . . . . . . . . . . . . . . . . 14 69 3.2. Low-Latency Live . . . . . . . . . . . . . . . . . . . . 15 70 3.3. Non-Low-Latency Live . . . . . . . . . . . . . . . . . . 16 71 3.4. On-Demand . . . . . . . . . . . . . . . . . . . . . . . . 16 72 4. Adaptive Encoding, Adaptive Delivery, and Measurement 73 Collection . . . . . . . . . . . . . . . . . . . . . . . 16 74 4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 17 75 4.2. Adaptive Encoding . . . . . . . . . . . . . . . . . . . . 17 76 4.3. Adaptive Segmented Delivery . . . . . . . . . . . . . . . 18 77 4.4. Advertising . . . . . . . . . . . . . . . . . . . . . . . 
18 78 4.5. Bitrate Detection Challenges . . . . . . . . . . . . . . 20 79 4.5.1. Idle Time between Segments . . . . . . . . . . . . . 20 80 4.5.2. Head-of-Line Blocking . . . . . . . . . . . . . . . . 21 81 4.5.3. Wide and Rapid Variation in Path Capacity . . . . . . 22 82 4.6. Measurement Collection . . . . . . . . . . . . . . . . . 22 83 4.6.1. CTA-2066: Streaming Quality of Experience Events, 84 Properties and Metrics . . . . . . . . . . . . . . . 23 85 4.6.2. CTA-5004: Common Media Client Data (CMCD) . . . . . . 23 86 4.7. Unreliable Transport . . . . . . . . . . . . . . . . . . 23 87 5. Evolution of Transport Protocols and Transport Protocol 88 Behaviors . . . . . . . . . . . . . . . . . . . . . . . . 24 89 5.1. UDP and Its Behavior . . . . . . . . . . . . . . . . . . 24 90 5.2. TCP and Its Behavior . . . . . . . . . . . . . . . . . . 25 91 5.3. The QUIC Protocol and Its Behavior . . . . . . . . . . . 26 92 6. Streaming Encrypted Media . . . . . . . . . . . . . . . . . . 28 93 6.1. General Considerations for Media Encryption . . . . . . . 29 94 6.2. Considerations for "Hop-by-Hop" Media Encryption . . . . 30 95 6.3. Considerations for "End-to-End" Media Encryption . . . . 31 96 7. Further Reading and References . . . . . . . . . . . . . . . 32 97 7.1. Industry Terminology . . . . . . . . . . . . . . . . . . 32 98 7.2. Surveys and Tutorials . . . . . . . . . . . . . . . . . . 32 99 7.2.1. Encoding . . . . . . . . . . . . . . . . . . . . . . 32 100 7.2.2. Packaging . . . . . . . . . . . . . . . . . . . . . . 33 101 7.2.3. Content Delivery . . . . . . . . . . . . . . . . . . 33 102 7.2.4. ABR Algorithms . . . . . . . . . . . . . . . . . . . 33 103 7.2.5. Server/Client/Network Collaboration . . . . . . . . . 34 104 7.2.6. QoE Metrics . . . . . . . . . . . . . . . . . . . . . 35 105 7.2.7. Point Clouds and Immersive Media . . . . . . . . . . 35 106 7.3. Open-Source Tools . . . . . . . . . . . . . . . . . . . . 36 107 7.4. Technical Events . . . . . . . . . . . . . 
. . . . . . . 36 108 7.5. List of Organizations Working on Streaming Media . . . . 37 109 7.6. Topics to Keep an Eye on . . . . . . . . . . . . . . . . 38 110 7.6.1. 5G and Media . . . . . . . . . . . . . . . . . . . . 38 111 7.6.2. Ad Insertion . . . . . . . . . . . . . . . . . . . . 38 112 7.6.3. Contribution and Ingest . . . . . . . . . . . . . . . 39 113 7.6.4. Synchronized Encoding and Packaging . . . . . . . . . 39 114 7.6.5. WebRTC-Based Streaming . . . . . . . . . . . . . . . 39 115 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 40 116 9. Security Considerations . . . . . . . . . . . . . . . . . . . 40 117 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 40 118 11. Informative References . . . . . . . . . . . . . . . . . . . 40 119 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 48 121 1. Introduction 123 As the internet has grown, an increasingly large share of the traffic 124 delivered to end users has become video. Estimates put the total 125 share of internet video traffic at 75% in 2019, expected to grow to 126 82% by 2022. This estimate projects the gross volume of video 127 traffic will more than double during this time, based on a compound 128 annual growth rate continuing at 34% (from Appendix D of [CVNI]). 130 A substantial part of this growth is due to increased use of 131 streaming video, although the amount of video traffic in real-time 132 communications (for example, online videoconferencing) has also grown 133 significantly. While both streaming video and videoconferencing have 134 real-time delivery and latency requirements, these requirements vary 135 from one application to another. For example, videoconferencing 136 demands an end-to-end (one-way) latency of a few hundred 137 milliseconds, whereas live streaming can tolerate latencies of several 138 seconds.
140 This document specifically focuses on streaming applications and 141 defines streaming as follows: 143 * Streaming is transmission of continuous media from a server to a 144 client and its simultaneous consumption by the client. 146 * Here, continuous media refers to media and associated streams such 147 as video, audio, metadata, etc. In this definition, the critical 148 term is "simultaneous", as it is not considered streaming if one 149 downloads a video file and plays it after the download is 150 completed, which would be called download-and-play. 152 This has two implications. 154 * First, the server's transmission rate must (loosely or tightly) 155 match the client's consumption rate in order to provide 156 uninterrupted playback. That is, the client must not run out of 157 data (buffer underrun) or accept more data than it can buffer 158 before playback (buffer overrun), as any excess media is simply 159 discarded. 161 * Second, the client's consumption rate is limited not only by 162 bandwidth availability but also by real-time constraints. That is, 163 the client cannot fetch media that is not yet available from a 164 server. 166 In many contexts, video traffic can be handled transparently as 167 generic application-level traffic. However, as the volume of video 168 traffic continues to grow, it's becoming increasingly important to 169 consider the effects of network design decisions on application-level 170 performance, with considerations for the impact on video delivery. 172 This document examines networking issues as they relate to quality of 173 experience in internet video delivery. The focus is on capturing 174 characteristics of video delivery that have surprised network 175 designers or transport experts without specific video expertise, 176 since these highlight key differences between common assumptions in 177 existing networking documents and observations of video delivery 178 issues in practice.
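The rate-matching constraint above can be illustrated with a toy playback-buffer model (an illustrative Python sketch; the buffer size, rates, and traffic pattern are assumed values chosen for illustration, not taken from any deployed player):

```python
# Toy model of a streaming client's playback buffer. All values are
# assumed for illustration (seconds of media, per second of wall clock).

BUFFER_CAPACITY = 10.0   # most media the client will hold before playback
CONSUMPTION_RATE = 1.0   # media played per second of wall clock

def simulate(receive_rates, buffer_level=2.0):
    """receive_rates: media received per second, one entry per second.
    Returns (time, event) pairs for underruns and overruns."""
    events = []
    for t, rx in enumerate(receive_rates):
        buffer_level += rx
        if buffer_level > BUFFER_CAPACITY:
            events.append((t, "overrun"))     # excess media is discarded
            buffer_level = BUFFER_CAPACITY
        buffer_level -= CONSUMPTION_RATE
        if buffer_level < 0:
            events.append((t, "underrun"))    # playback stalls
            buffer_level = 0.0
    return events

# A receive rate that dips below the consumption rate stalls playback.
print(simulate([1.0, 1.0, 0.2, 0.2, 0.2]))
```

In this sketch, a receive rate that falls below the consumption rate eventually empties the buffer and stalls playback, while a sustained surplus beyond the buffer capacity is simply discarded.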
180 Making specific recommendations on operational practices aimed at 181 mitigating these issues is out of scope, though some existing 182 mitigations are mentioned in passing. The intent is to provide a 183 point of reference for future solution proposals to use in describing 184 how new technologies address or avoid these existing observed 185 problems. 187 1.1. Notes for Contributors and Reviewers 189 Note to RFC Editor: Please remove this section and its subsections 190 before publication. 192 This section is to provide references to make it easier to review the 193 development and discussion on the draft so far. 195 1.1.1. Venues for Contribution and Discussion 197 This document is in the Github repository at: 199 https://github.com/ietf-wg-mops/draft-ietf-mops-streaming-opcons 200 (https://github.com/ietf-wg-mops/draft-ietf-mops-streaming-opcons) 202 Readers are welcome to open issues and send pull requests for this 203 document. 205 Substantial discussion of this document should take place on the MOPS 206 working group mailing list (mops@ietf.org). 208 * Join: https://www.ietf.org/mailman/listinfo/mops 209 (https://www.ietf.org/mailman/listinfo/mops) 211 * Search: https://mailarchive.ietf.org/arch/browse/mops/ 212 (https://mailarchive.ietf.org/arch/browse/mops/) 214 1.1.2. 
History of Public Discussion 216 Presentations: 218 * IETF 105 BOF: 219 https://www.youtube.com/watch?v=4G3YBVmn9Eo&t=47m21s 220 (https://www.youtube.com/watch?v=4G3YBVmn9Eo&t=47m21s) 222 * IETF 106 meeting: 223 https://www.youtube.com/watch?v=4_k340xT2jM&t=7m23s 224 (https://www.youtube.com/watch?v=4_k340xT2jM&t=7m23s) 226 * MOPS Interim Meeting 2020-04-15: 227 https://www.youtube.com/watch?v=QExiajdC0IY&t=10m25s 228 (https://www.youtube.com/watch?v=QExiajdC0IY&t=10m25s) 230 * IETF 108 meeting: 231 https://www.youtube.com/watch?v=ZaRsk0y3O9k&t=2m48s 232 (https://www.youtube.com/watch?v=ZaRsk0y3O9k&t=2m48s) 234 * MOPS 2020-10-30 Interim meeting: 235 https://www.youtube.com/watch?v=vDZKspv4LXw&t=17m15s 236 (https://www.youtube.com/watch?v=vDZKspv4LXw&t=17m15s) 238 2. Bandwidth Provisioning 240 2.1. Scaling Requirements for Media Delivery 242 2.1.1. Video Bitrates 244 Video bitrate selection depends on many variables including the 245 resolution (height and width), frame rate, color depth, codec, 246 encoding parameters, scene complexity and amount of motion. 247 Generally speaking, as the resolution, frame rate, color depth, scene 248 complexity and amount of motion increase, the encoding bitrate 249 increases. As newer codecs with better compression tools are used, 250 the encoding bitrate decreases. Similarly, a multi-pass encoding 251 generally produces better quality output compared to single-pass 252 encoding at the same bitrate, or delivers the same quality at a lower 253 bitrate. 255 Here are a few common resolutions used for video content, with 256 typical ranges of bitrates for the two most popular video codecs 257 [Encodings]. 
259 +============+================+============+============+ 260 | Name | Width x Height | H.264 | H.265 | 261 +============+================+============+============+ 262 | DVD | 720 x 480 | 1.0 Mbps | 0.5 Mbps | 263 +------------+----------------+------------+------------+ 264 | 720p (1K) | 1280 x 720 | 3-4.5 Mbps | 2-4 Mbps | 265 +------------+----------------+------------+------------+ 266 | 1080p (2K) | 1920 x 1080 | 6-8 Mbps | 4.5-7 Mbps | 267 +------------+----------------+------------+------------+ 268 | 2160p (4K) | 3840 x 2160 | N/A | 10-20 Mbps | 269 +------------+----------------+------------+------------+ 271 Table 1 273 2.1.2. Virtual Reality Bitrates 275 The bitrates given in Section 2.1.1 describe video streams that 276 provide the user with a single, fixed point of view - so, the user 277 has no "degrees of freedom", and the user sees all of the video image 278 that is available. 280 Even basic virtual reality (360-degree) videos that allow users to 281 look around freely (referred to as "three degrees of freedom", or 282 3DoF) require substantially larger bitrates when they are captured 283 and encoded, as such videos require multiple fields of view of the 284 scene. The typical multiplication factor is 8 to 10. Yet, due to 285 smart delivery methods such as viewport-based or tile-based 286 streaming, we do not need to send the whole scene to the user. 287 Instead, the user needs only the portion corresponding to their 288 viewpoint at any given time. 290 In more immersive applications, where limited user movement ("three 291 degrees of freedom plus", or 3DoF+) or full user movement ("six 292 degrees of freedom", or 6DoF) is allowed, the required bitrate grows 293 even further. In this case, immersive content is typically referred 294 to as volumetric media. One way to represent volumetric media is 295 to use point clouds, where streaming a single object may easily 296 require a bitrate of 30 Mbps or higher.
Refer to [MPEGI] and [PCC] 297 for more details. 299 2.2. Path Bandwidth Constraints 301 Even when the bandwidth requirements for video streams along a path 302 are well understood, additional analysis is required to understand 303 the constraints on bandwidth at various points in the network. This 304 analysis is necessary because media servers may react to bandwidth 305 constraints using two independent feedback loops: 307 * Media servers often respond to application-level feedback from the 308 media player that indicates a bottleneck link somewhere along the 309 path, by adjusting the amount of media that the media server will 310 send to the media player in a given timeframe. This is described 311 in greater detail in Section 4. 313 * Media servers also typically implement transport protocols with 314 capacity-seeking congestion controllers that probe for bandwidth, 315 and adjust the sending rate based on transport mechanisms. This 316 is described in greater detail in Section 5. 318 The result is that these two (potentially competing) "helpful" 319 mechanisms each respond to the same bottleneck with no coordination 320 between themselves. Each is unaware of actions taken by the 321 other, and this can result in a quality of experience for users that 322 is significantly lower than what could have been achieved.
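The uncoordinated interaction between the two loops can be sketched with a deliberately simplified model (illustrative Python; the halving on loss, the 60% ABR step-down, and all rate constants are assumptions chosen for illustration, not a faithful congestion-control or ABR implementation):

```python
# Deliberately simplified model of two uncoordinated control loops
# reacting to the same bottleneck. All constants are assumed values.

def simulate(bottleneck=6.0, media_rate=8.0, cwnd_rate=8.0, rounds=5):
    """Rates in Mbps; returns the delivered rate per round."""
    history = []
    for _ in range(rounds):
        delivered = min(media_rate, cwnd_rate, bottleneck)
        history.append(round(delivered, 2))
        if min(media_rate, cwnd_rate) > bottleneck:
            cwnd_rate /= 2                # transport sees loss and backs off
            media_rate = delivered * 0.6  # ABR steps below observed throughput
        elif media_rate >= cwnd_rate:
            cwnd_rate += 0.5              # enough queued media: transport probes
        # else: the media rate is below the transport's limit, so the
        # transport is application-limited and cannot probe for bandwidth
    return history

# One loss event leaves both loops stuck well below the bottleneck rate.
print(simulate())
```

In this toy run, a single loss event drives both controllers down, after which the transport remains application-limited and never probes back toward the actual bottleneck capacity.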
324 In one example, if a media server overestimates the available 325 bandwidth to the media player, 327 * the transport protocol detects loss due to congestion, and reduces 328 its sending window size per round trip, 330 * the media server adapts to application-level feedback from the 331 media player, and reduces its own sending rate, 333 * the transport protocol sends media at the new, lower rate, and 334 confirms that this new, lower rate is "safe", because no 335 transport-level loss is occurring, but 337 * because the media server continues to send at the new, lower rate, 338 the transport protocol's maximum sending rate is now limited by 339 the amount of information the media server queues for 340 transmission, so 342 * the transport protocol can't probe for available path bandwidth by 343 sending at a higher rate. 345 In order to avoid these types of situations, which can potentially 346 affect all the users whose streaming media traverses a bottleneck 347 link, there are several possible mitigations that streaming operators 348 can use, but the first step toward mitigating a problem is knowing 349 when that problem occurs. 351 2.2.1. Know Your Network Traffic 353 There are many reasons why path characteristics might change 354 suddenly, for example: 356 * "cross traffic" that traverses part of the path, especially if 357 this traffic is "inelastic", and does not, itself, respond to 358 indications of path congestion. 360 * routing changes, which can happen in normal operation, especially 361 if the new path now includes path segments that are more heavily 362 loaded, offer lower total bandwidth, or simply cover more 363 distance. 365 Recognizing that a path carrying streaming media is "not behaving the 366 way it normally does" is fundamental.
Analytics that aid in that 367 recognition can be more or less sophisticated, and can be as simple 368 as noticing that the apparent round trip times for media traffic 369 carried over TCP transport on some paths are suddenly and 370 significantly longer than usual. Passive monitors can detect changes 371 in the elapsed time between the acknowledgements for specific TCP 372 segments from a TCP receiver, since TCP octet sequence numbers and 373 acknowledgements for those sequence numbers are "carried in the 374 clear", even if the TCP payload itself is encrypted. See Section 5.2 375 for more information. 377 As transport protocols evolve to encrypt their transport header 378 fields, one side effect of increasing encryption is that the kind of 379 passive monitoring, or even "performance enhancement" ([RFC3135]) 380 that was possible with the older transport protocols (UDP, described 381 in Section 5.1 and TCP, described in Section 5.2) is no longer 382 possible with newer transport protocols such as QUIC (described in 383 Section 5.3). The IETF has specified a "latency spin bit" mechanism 384 in Section 17.4 of [RFC9000] to allow passive latency monitoring from 385 observation points on the network path throughout the duration of a 386 connection, but currently chartered work in the IETF is focusing on 387 end-point monitoring and reporting, rather than on passive 388 monitoring. 390 One example is the "qlog" mechanism [I-D.ietf-quic-qlog-main-schema], 391 a protocol-agnostic mechanism used to provide better visibility for 392 encrypted protocols such as QUIC ([I-D.ietf-quic-qlog-quic-events]) 393 and for HTTP/3 ([I-D.ietf-quic-qlog-h3-events]). 395 2.3. 
Path Requirements 397 The bitrate requirements in Section 2.1 are per end-user actively 398 consuming a media feed, so in the worst case, the bitrate demands can 399 be multiplied by the number of simultaneous users to find the 400 bandwidth requirements for a router on the delivery path with that 401 number of users downstream. For example, at a node with 10,000 402 downstream users simultaneously consuming video streams, 403 approximately 80 Gbps might be necessary in order for all of them to 404 get typical content at 1080p resolution. 406 However, when there is some overlap in the feeds being consumed by 407 end users, it is sometimes possible to reduce the bandwidth 408 provisioning requirements for the network by performing some kind of 409 replication within the network. This can be achieved via object 410 caching with delivery of replicated objects over individual 411 connections, and/or by packet-level replication using multicast. 413 To the extent that replication of popular content can be performed, 414 bandwidth requirements at peering or ingest points can be reduced to 415 as low as a per-feed requirement instead of a per-user requirement. 417 2.4. Caching Systems 419 When demand for content is relatively predictable, and especially 420 when that content is relatively static, caching content close to 421 requesters, and pre-loading caches to respond quickly to initial 422 requests is often useful (for example, HTTP/1.1 caching is described 423 in [RFC7234]). This is subject to the usual considerations for 424 caching - for example, how much data must be cached to make a 425 significant difference to the requester, and how the benefits of 426 caching and pre-loading caches balances against the costs of tracking 427 "stale" content in caches and refreshing that content. 429 It is worth noting that not all high-demand content is "live" 430 content. 
One popular example is when popular streaming content can 431 be staged close to a significant number of requesters, as can happen 432 when a new episode of a popular show is released. This content may 433 be largely stable, so low-cost to maintain in multiple places 434 throughout the Internet. This can reduce demands for high end-to-end 435 bandwidth without having to use mechanisms like multicast. 437 Caching and pre-loading can also reduce exposure to peering point 438 congestion, since less traffic crosses the peering point exchanges if 439 the caches are placed in peer networks, especially when the content 440 can be pre-loaded during off-peak hours, and especially if the 441 transfer can make use of "Lower-Effort Per-Hop Behavior (LE PHB) for 442 Differentiated Services" [RFC8622], "Low Extra Delay Background 443 Transport (LEDBAT)" [RFC6817], or similar mechanisms. 445 All of this depends, of course, on the ability of a content provider 446 to predict usage and provision bandwidth, caching, and other 447 mechanisms to meet the needs of users. In some cases (Section 2.5), 448 this is relatively routine, but in other cases, it is more difficult 449 (Section 2.6, Section 2.7). 451 And as with other parts of the ecosystem, new technology brings new 452 challenges. For example, with the emergence of ultra-low-latency 453 streaming, responses have to start streaming to the end user while 454 still being transmitted to the cache, and while the cache does not 455 yet know the size of the object. Some of the popular caching systems 456 were designed around cache footprint and had deeply ingrained 457 assumptions about knowing the size of objects that are being stored, 458 so the change in design requirements in long-established systems 459 caused some errors in production. 
Incidents occurred where a 460 transmission error in the connection from the upstream source to the 461 cache could result in the cache holding a truncated segment and 462 transmitting it to the end user's device. In this case, players 463 rendering the stream often had the video freeze until the player was 464 reset. In some cases, the truncated object was even cached that way 465 and served later to other players as well, causing continued stalls 466 at the same spot in the video for all players playing the segment 467 delivered from that cache node. 469 2.5. Predictable Usage Profiles 471 Historical data shows that users consume more video, and video at 472 higher bitrates, than they did in the past on their connected devices. 473 Improvements in codec compression efficiency, which reduce encoding 474 bitrates, have not offset the 475 increase in the demand for higher-quality video (higher 476 resolution, higher frame rate, better color gamut, better dynamic 477 range, etc.). In particular, mobile data usage has shown a large 478 jump over the years due to increased consumption of entertainment as 479 well as conversational video. 481 2.6. Unpredictable Usage Profiles 483 Although TCP/IP has been used with a number of widely used 484 applications that have symmetric bandwidth requirements (similar 485 bandwidth requirements in each direction between endpoints), many 486 widely-used Internet applications operate in client-server roles, 487 with asymmetric bandwidth requirements. A common example might be an 488 HTTP GET operation, where a client sends a relatively small HTTP GET 489 request for a resource to an HTTP server, and often receives a 490 significantly larger response carrying the requested resource. When 491 HTTP is used to stream movie-length video, as is common today, the ratio between 492 response size and request size can become arbitrarily large.
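The asymmetry described above can be quantified with rough, assumed numbers (illustrative Python; the request size, segment duration, and bitrate are assumptions, with the bitrate drawn from the 1080p range in Table 1):

```python
# Rough downstream:upstream ratio for segmented HTTP video streaming.
# All sizes are assumed, order-of-magnitude values.

REQUEST_BYTES = 500        # a small HTTP GET request
SEGMENT_SECONDS = 6        # media carried per response
BITRATE_BPS = 6_000_000    # a 1080p stream near the low end of Table 1

def downstream_to_upstream_ratio(duration_seconds):
    """Bytes received per byte sent over a whole viewing session."""
    requests = duration_seconds / SEGMENT_SECONDS
    upstream = requests * REQUEST_BYTES
    downstream = duration_seconds * BITRATE_BPS / 8
    return downstream / upstream

# A two-hour movie under these assumptions:
print(round(downstream_to_upstream_ratio(2 * 3600)))  # -> 9000
```

With these assumptions, each upstream byte elicits roughly 9,000 downstream bytes, which is why access technologies that favor downstream bandwidth have usually served streaming viewers well.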
494 For this reason, operators may pay more attention to downstream 495 bandwidth utilization when planning and managing capacity. In 496 addition, operators have been able to deploy access networks for end 497 users using underlying technologies that are inherently asymmetric, 498 favoring downstream bandwidth (e.g., ADSL, cellular technologies, 499 most IEEE 802.11 variants), assuming that users will need less 500 upstream bandwidth than downstream bandwidth. This strategy usually 501 works, except when it fails because application bandwidth usage 502 patterns have changed in ways that were not predicted. 504 One example of this type of change was when peer-to-peer file sharing 505 applications gained popularity in the early 2000s. To take one well- 506 documented case ([RFC5594]), the BitTorrent application created 507 "swarms" of hosts, uploading and downloading files to each other, 508 rather than communicating with a server. BitTorrent favored peers 509 who uploaded as much as they downloaded, so that new BitTorrent users 510 had an incentive to significantly increase their upstream bandwidth 511 utilization. 513 The combination of the large volume of "torrents" and the peer-to- 514 peer characteristic of swarm transfers meant that end user hosts were 515 suddenly uploading higher volumes of traffic to more destinations 516 than was the case before BitTorrent. This caused at least one large 517 ISP to attempt to "throttle" these transfers, to mitigate the load 518 that these hosts placed on their network. These efforts were met by 519 increased use of encryption in BitTorrent, in a pattern similar to an 520 arms race, and set off discussions about "Net Neutrality" and calls 521 for regulatory action. 523 Especially as end users increase use of video-based social networking 524 applications, it will be helpful for access network providers to 525 watch for increasing numbers of end users uploading significant 526 amounts of content. 528 2.7.
Extremely Unpredictable Usage Profiles 530 The causes of unpredictable usage described in Section 2.6 were more 531 or less the result of human choices, but we were reminded during a 532 post-IETF 107 meeting that humans are not always in control, and 533 forces of nature can cause enormous fluctuations in traffic patterns. 535 In his talk, Sanjay Mishra [Mishra] reported that after the COVID-19 536 pandemic broke out in early 2020, 538 * Comcast's streaming and web video consumption rose by 38%, with 539 their reported peak traffic up 32% overall between March 1 and 540 March 30, 542 * AT&T reported a 28% jump in core network traffic (single day in 543 April, as compared to pre-stay-at-home daily average traffic), 544 with video accounting for nearly half of all mobile network 545 traffic, while social networking and web browsing remained the 546 highest percentage (almost a quarter each) of overall mobility 547 traffic, and 549 * Verizon reported similar trends with video traffic up 36% over an 550 average day (pre-COVID-19). 552 We note that other operators saw similar spikes during this time 553 period. Craig Labovitz [Labovitz] reported 555 * Weekday peak traffic increases over 45%-50% from pre-lockdown 556 levels, 558 * A 30% increase in upstream traffic over their pre-pandemic levels, 559 and 561 * A steady increase in the overall volume of DDoS traffic, with 562 amounts exceeding the pre-pandemic levels by 40%. (He attributed 563 this increase to the significant rise in gaming-related DDoS 564 attacks ([LabovitzDDoS]), as gaming usage also increased.) 566 Subsequently, the Internet Architecture Board (IAB) held a COVID-19 567 Network Impacts Workshop [IABcovid] in November 2020. Given a larger 568 number of reports and more time to reflect, the following 569 observations from the draft workshop report are worth considering.
571 * Participants describing different types of networks reported 572 different kinds of impacts, but all types of networks saw impacts. 574 * Mobile networks saw traffic reductions and residential networks 575 saw significant increases. 577 * Reported traffic increases from ISPs and IXPs over just a few 578 weeks were as big as the traffic growth over the course of a 579 typical year, representing a 15-20% surge in growth to land at a 580 new normal that was much higher than anticipated. 582 * At DE-CIX Frankfurt, the world's largest Internet Exchange Point 583 in terms of data throughput, the year 2020 has seen the largest 584 increase in peak traffic within a single year since the IXP was 585 founded in 1995. 587 * The usage pattern changed significantly as work-from-home and 588 videoconferencing usage peaked during normal work hours, which 589 would have typically been off-peak hours with adults at work and 590 children at school. One might expect that the peak would have had 591 more impact on networks if it had happened during typical evening 592 peak hours for video streaming applications. 594 * The increase in daytime bandwidth consumption reflected both 595 significant increases in "essential" applications such as 596 videoconferencing and VPNs, and entertainment applications as 597 people watched videos or played games. 599 * At the IXP-level, it was observed that port utilization increased. 600 This phenomenon is mostly explained by a higher traffic demand 601 from residential users. 603 3. Latency Considerations 605 Streaming media latency refers to the "glass-to-glass" time duration, 606 which is the delay between the real-life occurrence of an event and 607 the streamed media being appropriately displayed on an end user's 608 device. 
Note that this is different from the network latency 609 (defined as the time for a packet to cross a network from one end to 610 another end) because it includes video encoding/decoding and 611 buffering time, and, in most cases, ingest into an intermediate 612 service such as a CDN or other video distribution service rather 613 than delivery over a direct connection to the end user. 615 Streaming media can be usefully categorized according to the 616 application's latency requirements into a few rough categories: 618 * ultra low-latency (less than 1 second) 619 * low-latency live (less than 10 seconds) 621 * non-low-latency live (10 seconds to a few minutes) 623 * on-demand (hours or more) 625 3.1. Ultra Low-Latency 627 Ultra low-latency delivery of media is defined here as having a 628 glass-to-glass delay target under one second. 630 Some media content providers aim to achieve this level of latency for 631 live media events. This introduces new challenges relative to less- 632 restricted levels of latency requirements because this latency is on the 633 same scale as commonly observed end-to-end network latency variation 634 (for example, due to effects such as bufferbloat ([CoDel]), Wi-Fi 635 error correction, or packet reordering). These effects can make it 636 difficult to achieve this level of latency for the general case, and 637 achieving it may require accepting tradeoffs such as relatively frequent 638 user-visible media artifacts. However, for controlled environments or 639 targeted networks that provide mitigations against such effects, this 640 level of latency is potentially achievable with the right provisioning. 642 Applications requiring ultra low latency for media delivery are 643 usually tightly constrained on the available choices for media 644 transport technologies, and sometimes may need to operate in 645 controlled environments to reliably achieve their latency and quality 646 goals.
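The rough latency categories defined above can be summarized as a small classifier. This is an illustrative sketch only; the function name is invented, and the boundary between the live categories and on-demand is not sharply defined by this document:

```python
def latency_category(glass_to_glass_seconds: float) -> str:
    """Classify a stream by its glass-to-glass delay target, using
    the rough categories from this document (boundaries illustrative)."""
    if glass_to_glass_seconds < 1:
        return "ultra low-latency"
    if glass_to_glass_seconds < 10:
        return "low-latency live"
    if glass_to_glass_seconds < 300:   # "10 seconds to a few minutes"
        return "non-low-latency live"
    return "on-demand"                 # "hours or more"
```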
648 Most applications operating over IP networks and requiring latency 649 this low use the Real-time Transport Protocol (RTP) [RFC3550] or 650 WebRTC [RFC8825], which uses RTP for the media transport as well as 651 several other protocols necessary for safe operation in browsers. 653 Worth noting is that many applications for ultra low-latency delivery 654 do not need to scale to more than a few users at a time, which 655 simplifies many delivery considerations relative to other use cases. 657 Recommended reading for applications adopting an RTP-based approach 658 also includes [RFC7656]. For increasing the robustness of the 659 playback by implementing adaptive playout methods, refer to [RFC4733] 660 and [RFC6843]. 662 Applications with further-specialized latency requirements are out of 663 scope for this document. 665 3.2. Low-Latency Live 667 Low-latency live delivery of media is defined here as having a glass- 668 to-glass delay target under 10 seconds. 670 This level of latency is targeted to have a user experience similar 671 to traditional broadcast TV delivery. A frequently cited problem 672 with failing to achieve this level of latency for live sporting 673 events is the user experience failure from having crowds within 674 earshot of one another who react audibly to an important play, or 675 from users who learn of an event in the match via some other channel, 676 for example social media, before it has happened on the screen 677 showing the sporting event. 679 Applications requiring low-latency live media delivery are generally 680 feasible at scale with some restrictions. This typically requires 681 the use of a premium service dedicated to the delivery of live video, 682 and some tradeoffs may be necessary relative to what's feasible in a 683 higher latency service. 
The tradeoffs may include higher costs, 684 lower quality video, or reduced flexibility in the adaptive 685 bitrates and available resolutions offered, so that 686 fewer devices can receive an encoding tuned for their display. Low- 687 latency live delivery is also more susceptible to user-visible 688 disruptions due to transient network conditions than higher latency 689 services. 691 Implementation of a low-latency live video service can be achieved 692 with the use of low-latency extensions of HLS (called LL-HLS) 693 [I-D.draft-pantos-hls-rfc8216bis] and DASH (called LL-DASH) 694 [LL-DASH]. These extensions use the Common Media Application Format 695 (CMAF) standard [MPEG-CMAF] that allows the media to be packaged into 696 and transmitted in units smaller than segments, which are called 697 chunks in CMAF language. This way, the latency can be decoupled from 698 the duration of the media segments. Without CMAF-like packaging, 699 lower latencies can only be achieved by using very short segment 700 durations. However, shorter segments mean more frequent intra-coded 701 frames, which is detrimental to video encoding quality. CMAF 702 allows the use of longer segments (improving encoding quality) 703 without penalizing latency. 705 While an LL-HLS client retrieves each chunk with a separate HTTP GET 706 request, an LL-DASH client uses the chunked transfer encoding feature 707 of HTTP [CMAF-CTE], which allows the LL-DASH client to fetch all 708 the chunks belonging to a segment with a single GET request. An HTTP 709 server can transmit the CMAF chunks to the LL-DASH client as they 710 arrive from the encoder/packager. A detailed comparison of LL-HLS 711 and LL-DASH is given in [MMSP20]. 713 3.3. Non-Low-Latency Live 715 Non-low-latency live delivery of media is defined here as a live 716 stream that does not have a latency target shorter than 10 seconds.
718 This level of latency is the historically common case for segmented 719 video delivery using HLS [RFC8216] and DASH [MPEG-DASH]. This level 720 of latency is often considered adequate for content like news or pre- 721 recorded content. This level of latency is also sometimes reached 722 as a fallback state when some part of the delivery system or the 723 client-side players lack the support for 724 the features necessary for low-latency live streaming. 726 This level of latency can typically be achieved at scale with 727 commodity CDN services for HTTP(S) delivery. In some cases, the 728 increased time window allows production of a wider range of 729 encoding options, relative to the requirements of a lower-latency 730 service, without increasing the hardware footprint; this 731 can allow for wider device interoperability. 733 3.4. On-Demand 735 On-Demand media streaming refers to playback of pre-recorded media 736 based on a user's action. In some cases on-demand media is produced 737 as a by-product of a live media production, using the same segments 738 as the live event, but freezing the manifest after the live event has 739 finished. In other cases, on-demand media is constructed out of pre- 740 recorded assets with no streaming necessarily involved during the 741 production of the on-demand content. 743 On-demand media generally is not subject to latency concerns, but 744 other timing-related considerations can still be as important or even 745 more important to the user experience than the same considerations 746 with live events. These considerations include the startup time, the 747 stability of the media stream's playback quality, and avoidance of 748 stalls and video artifacts during the playback under all but the most 749 severe network conditions.
751 In some applications, optimizations are available to on-demand video 752 that are not always available to live events, such as pre-loading the 753 first segment so that startup does not have to wait for a 754 network download to begin. 756 4. Adaptive Encoding, Adaptive Delivery, and Measurement Collection 757 4.1. Overview 759 A simple model of video playback can be described as a video stream 760 consumer, a buffer, and a transport mechanism that fills the buffer. 761 The consumption rate is fairly static and is represented by the 762 content bitrate. The size of the buffer is also commonly fixed. 763 The fill process needs to be at least fast enough to ensure 764 that the buffer is never empty; however, it can also have significant 765 complexity when things like personalization or ad workflows are 766 introduced. 768 The challenges in filling the buffer in a timely way fall into two 769 broad categories: 1. content selection and 2. content variation. 770 Content selection comprises all of the steps needed to determine 771 which content variation to offer the client. Content variation is 772 the number of content options that exist at any given selection 773 point. A common example, easily visualized, is Adaptive BitRate 774 (ABR), described in more detail below. The mechanism used to select 775 the bitrate is part of the content selection, and the content 776 variation is the set of different bitrate renditions.
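The fill/drain model described in this overview can be sketched as a toy simulation. All names and numbers below are illustrative, not drawn from any real player:

```python
def simulate_playback(fill_rates_bps, content_bitrate_bps,
                      buffer_capacity_bits, tick_seconds=1.0):
    """Toy fill/drain model: returns the number of ticks in which the
    buffer cannot supply a full tick's worth of media (a stall)."""
    buffered = 0.0
    stalls = 0
    for rate in fill_rates_bps:
        # Transport fills the buffer, up to its fixed capacity.
        buffered = min(buffered + rate * tick_seconds, buffer_capacity_bits)
        # Consumer drains at the (fairly static) content bitrate.
        demand = content_bitrate_bps * tick_seconds
        if buffered >= demand:
            buffered -= demand
        else:
            stalls += 1      # buffer ran empty: playback stalls
            buffered = 0.0
    return stalls
```

For example, a steady 6 Mbps fill comfortably feeds a 5 Mbps stream, while a sustained throughput dip below the content bitrate eventually empties the buffer and stalls playback.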
778 Adaptive BitRate (ABR) is an application-level response 779 strategy in which the streaming client attempts to detect the 780 available bandwidth of the network path by observing the successful 781 application-layer download speed, and then chooses a bitrate for each of 782 the video, audio, subtitles and metadata (among the limited number of 783 available options) that fits within that bandwidth. The client typically 784 adjusts its choices as the available bandwidth in the network changes, or as 785 device capabilities (such as available 786 memory, CPU, display size, etc.) change during the playback. 788 4.2. Adaptive Encoding 790 Media servers can provide media streams at various bitrates because 791 the media has been encoded at various bitrates. This is a so-called 792 "ladder" of bitrates that can be offered to media players as part of 793 the manifest that describes the media being requested by the media 794 player, so that the media player can select among the available 795 bitrate choices. 797 The media server may also choose to alter which bitrates are made 798 available to players by adding or removing bitrate options from the 799 ladder delivered to the player in subsequent manifests built and sent 800 to the player. This way, both the player, through its selection of 801 bitrate to request from the manifest, and the server, through its 802 construction of the bitrates offered in the manifest, are able to 803 affect network utilization. 805 4.3. Adaptive Segmented Delivery 807 ABR playback is commonly implemented by streaming clients using HLS 808 [RFC8216] or DASH [MPEG-DASH] to perform a reliable segmented 809 delivery of media over HTTP. Different implementations use different 810 strategies [ABRSurvey], often relying on proprietary algorithms 811 (called rate adaptation or bitrate selection algorithms) to perform 812 available bandwidth estimation/prediction and the bitrate selection.
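A deliberately naive version of such a bitrate selection algorithm can be sketched as follows. The ladder values and safety margin are invented for illustration; as noted above, real players use considerably more sophisticated, often proprietary, algorithms:

```python
# Illustrative bitrate ladder, in bits per second (not from any spec).
LADDER_BPS = [400_000, 1_000_000, 2_500_000, 5_000_000, 8_000_000]

def select_bitrate(estimated_bandwidth_bps: float,
                   safety_margin: float = 0.8) -> int:
    """Pick the highest rendition that fits within a fraction of the
    bandwidth the client believes is available, falling back to the
    lowest rung when even that does not fit."""
    budget = estimated_bandwidth_bps * safety_margin
    candidates = [r for r in LADDER_BPS if r <= budget]
    return max(candidates) if candidates else min(LADDER_BPS)
```

The safety margin reflects the common design choice of requesting somewhat less than the measured throughput, since (as Section 4.5 describes) application-layer bandwidth estimates can diverge from the true path capacity.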
814 Many server-player systems will do an initial probe or a very simple 815 throughput speed test at the start of a video playback. This is done 816 to get a rough sense of the highest video bitrate in the ABR ladder 817 that the network between the server and player will likely be able to 818 provide under initial network conditions. After the initial testing, 819 clients tend to rely upon passive network observations and will make 820 use of player side statistics such as buffer fill rates to monitor 821 and respond to changing network conditions. 823 The choice of bitrate occurs within the context of optimizing for 824 some metric monitored by the client, such as highest achievable video 825 quality or lowest chances for a rebuffering event (playback stall). 827 4.4. Advertising 829 A variety of business models exist for producers of streaming media. 830 Some content providers derive the majority of the revenue associated 831 with streaming media directly from consumer subscriptions or one-time 832 purchases. Others derive the majority of their streaming media 833 associated revenue from advertising. Many content providers derive 834 income from a mix of these and other sources of funding. The 835 inclusion of advertising alongside or interspersed with streaming 836 media content is therefore common in today's media landscape. 838 Some commonly used forms of advertising can introduce potential user 839 experience issues for a media stream. This section provides a very 840 brief overview of a complex and evolving space, but a complete 841 coverage of the potential issues is out of scope for this document. 843 The same techniques used to allow a media player to switch between 844 renditions of different bitrates at segment or chunk boundaries can 845 also be used to enable the dynamic insertion of advertisements. 847 Ads may be inserted either with Client Side Ad Insertion (CSAI) or 848 Server Side Ad Insertion (SSAI). 
In CSAI, the ABR manifest will 849 generally include links to an external ad server for some segments of 850 the media stream, while in SSAI the server will remain the same 851 during advertisements, but will include media segments that contain 852 the advertising. In SSAI, the media segments may or may not be 853 sourced from an external ad server, as with CSAI. 855 In general, the more targeted the ad request is, the more requests 856 the ad service needs to be able to handle concurrently. If 857 connectivity to the ad service is poor, this can cause rebuffering 858 even if the underlying video assets (both content and ads) can 859 be accessed quickly. The less targeted the ad request, the more likely the ad 860 requests can be consolidated and can leverage the same caching 861 techniques as the video content. 863 In some cases, especially with SSAI, advertising space in a stream is 864 reserved for a specific advertiser and can be integrated with the 865 video so that the segments share the same encoding properties such as 866 bitrate, dynamic range, and resolution. However, in many cases ad 867 servers integrate with a Supply Side Platform (SSP) that offers 868 advertising space in real-time auctions via an Ad Exchange, with bids 869 for the advertising space coming from Demand Side Platforms (DSPs) 870 that collect money from advertisers for delivering the 871 advertisements. Most such Ad Exchanges use application-level 872 protocol specifications published by the Interactive Advertising 873 Bureau [IAB-ADS], an industry trade organization. 875 This ecosystem balances several competing objectives, and integrating 876 with it naively can produce surprising user experience results. For 877 example, ad server provisioning and/or the bitrate of the ad segments 878 might be different from that of the main video, either of which can 879 sometimes result in video stalls.
For another example, since the 880 inserted ads are often produced independently, they might have a 881 different base volume level than the main video, which can make for a 882 jarring user experience. 884 Additionally, this market historically has had incidents of ad fraud 885 (misreporting of ad delivery to end users for financial gain). As a 886 mitigation for concerns driven by those incidents, some SSPs have 887 required the use of players with features like reporting of ad 888 delivery, or providing information that can be used for user 889 tracking. Some of these and other measures have raised privacy 890 concerns for end users. 892 In general, this is a rapidly developing space with many 893 considerations, and media streaming operators engaged in advertising 894 may need to research these and other concerns to find solutions that 895 meet their user experience, user privacy, and financial goals. For 896 further reading on mitigations, [BAP] has published some standards 897 and best practices based on user experience research. 899 4.5. Bitrate Detection Challenges 901 This kind of bandwidth-measurement system can experience trouble in 902 several ways that are affected by networking issues. Because 903 adaptive application-level response strategies often use rates 904 as observed by the application layer, there are sometimes inscrutable 905 transport-level protocol behaviors that can produce surprising 906 measurement values when the application-level feedback loop is 907 interacting with a transport-level feedback loop. 909 A few specific examples of surprising phenomena that affect bitrate 910 detection measurements are described in the following subsections. 911 As these examples will demonstrate, it's common to encounter cases 912 that can deliver application-level measurements that are too low, too 913 high, or (possibly) correct but varying more quickly than a lab- 914 tested selection algorithm might expect.
916 These effects and others that cause transport behavior to diverge 917 from lab modeling can sometimes have a significant impact on ABR 918 bitrate selection and on user quality of experience, especially where 919 players use naive measurement strategies and selection algorithms 920 that don't account for the likelihood of bandwidth measurements that 921 diverge from the true path capacity. 923 4.5.1. Idle Time between Segments 925 When the bitrate selection is chosen substantially below the 926 available capacity of the network path, the response to a segment 927 request will typically complete in much less absolute time than the 928 duration of the requested segment, leaving significant idle time 929 between segment downloads. This can have a few surprising 930 consequences: 932 * TCP slow-start when restarting after idle requires multiple RTTs 933 to re-establish throughput at the network's available capacity. 934 When the active transmission time for segments is substantially 935 shorter than the time between segments, the idle gap 936 between segments triggers a restart of TCP slow-start, and the 937 estimate of the successful download speed coming from the 938 application-visible receive rate on the socket can thus end up 939 much lower than the actual available network capacity. This in 940 turn can prevent a shift to the most appropriate bitrate. 941 [RFC7661] provides some mitigations for this effect at the TCP 942 transport layer, for senders who anticipate a high incidence of 943 this problem. 945 * Mobile flow-bandwidth spectrum and timing mapping can be impacted 946 by idle time in some networks. The carrier capacity assigned to a 947 link can vary with activity. Depending on the idle time 948 characteristics, this can result in a lower available bitrate than 949 would be achievable with a steadier transmission in the same 950 network. 952 Some receiver-side ABR algorithms such as [ELASTIC] are designed to 953 try to avoid this effect.
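The throughput underestimate caused by slow-start restart can be illustrated numerically. This is a rough back-of-the-envelope model, not a faithful TCP simulation; the function name and all numbers are invented for illustration:

```python
def effective_rate_after_idle(capacity_bps, rtt_s, segment_bits,
                              initial_window_bits):
    """Approximate the application-visible download rate for one
    segment when TCP slow-start restarts after an idle gap: the
    congestion window begins small and doubles each RTT until the
    path capacity limits it."""
    sent = 0.0
    elapsed = 0.0
    window = initial_window_bits
    while sent < segment_bits:
        # At most one bandwidth-delay product can be delivered per RTT.
        burst = min(window, capacity_bps * rtt_s)
        sent += burst
        elapsed += rtt_s
        window *= 2          # slow-start doubling
    return segment_bits / elapsed

# With a 50 Mbps path, 50 ms RTT, and a 2 MB (16 Mbit) segment,
# restarting from a small window yields a measured rate well below
# the true 50 Mbps capacity.
measured = effective_rate_after_idle(50e6, 0.05, 16e6, 120_000)
```

A client feeding such measurements into its bitrate selection would conclude the path supports much less than its actual capacity, which is the effect [RFC7661] and the techniques above aim to mitigate.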
955 Another way to mitigate this effect is with the help of two 956 simultaneous TCP connections, as explained in [MMSys11] for Microsoft 957 Smooth Streaming. In some cases, the system-level TCP slow-start 958 restart can also be disabled, for example as described in 959 [OReilly-HPBN]. 961 4.5.2. Head-of-Line Blocking 963 In the event of a lost packet on a TCP connection with SACK support 964 (a common case for segmented delivery in practice), the loss 965 can provide a confusing bandwidth signal to the receiving 966 application. Because of the sliding window in TCP, many packets may 967 be accepted by the receiver without being available to the 968 application until the missing packet arrives. Upon arrival of the 969 one missing packet after retransmit, the receiver will suddenly get 970 access to a lot of data at the same time. 972 To a receiver measuring bytes received per unit time at the 973 application layer, and interpreting it as an estimate of the 974 available network bandwidth, this appears as high jitter in the 975 goodput measurement. This can appear as a stall for some time, 976 followed by a sudden leap that can far exceed the actual capacity of 977 the transport path from the server when the hole in the received data 978 is filled by a later retransmission. 980 It's worth noting that more modern transport protocols such as QUIC 981 have mitigation of head-of-line blocking as a protocol design goal. 982 See Section 5.3 for more details. 984 4.5.3. Wide and Rapid Variation in Path Capacity 986 As many end devices have moved to wireless connectivity for the final 987 hop (Wi-Fi, 5G, or LTE), new problems in bandwidth detection have 988 emerged from radio interference and signal strength effects. 990 Each of these technologies can experience sudden changes in capacity 991 as the end user device moves from place to place and encounters new 992 sources of interference.
Microwave ovens, for example, can cause a 993 throughput degradation of more than a factor of 2 while active 994 [Micro]. 5G and LTE likewise can easily see rate variation by a 995 factor of 2 or more over a span of seconds as users move around. 997 These swings in actual transport capacity can result in user 998 experience issues that can be exacerbated by insufficiently 999 responsive ABR algorithms. 1001 4.6. Measurement Collection 1003 In addition to measurements media players use to guide their segment- 1004 by-segment adaptive streaming requests, streaming media providers may 1005 also rely on measurements collected from media players to provide 1006 analytics that can be used for decisions such as whether the adaptive 1007 encoding bitrates in use are the best ones to provide to media 1008 players, or whether current media content caching is providing the 1009 best experience for viewers. To that effect, the Consumer Technology 1018 Association (CTA), which owns the Web Application Video Ecosystem (WAVE) 1019 project, has published two important specifications. 1021 4.6.1. CTA-2066: Streaming Quality of Experience Events, Properties and 1022 Metrics 1024 [CTA-2066] specifies a set of media player events, properties, 1025 quality of experience (QoE) metrics and associated terminology for 1026 representing streaming media quality of experience across systems, 1027 media players and analytics vendors.
While all these events, 1028 properties, metrics and associated terminology are used across a 1029 number of proprietary analytics and measurement solutions, they were 1030 used in slightly (or vastly) different ways that led to 1031 interoperability issues. CTA-2066 attempts to address this issue by 1032 defining a common terminology as well as how each metric should be 1033 computed for consistent reporting. 1035 4.6.2. CTA-5004: Common Media Client Data (CMCD) 1037 Many assume that CDNs have a holistic view into the health and 1038 performance of the streaming clients. However, this is not the case. 1039 The CDNs produce millions of log lines per second across hundreds of 1040 thousands of clients and they have no concept of a "session" as a 1041 client would have, so CDNs are decoupled from the metrics the clients 1042 generate and report. A CDN cannot tell which request belongs to 1043 which playback session, the duration of any media object, the 1044 bitrate, or whether any of the clients have stalled and are 1045 rebuffering or are about to stall and will rebuffer. The consequence 1046 of this decoupling is that a CDN cannot prioritize delivery for when 1047 the client needs it most, prefetch content, or trigger alerts when 1048 the network itself may be underperforming. One approach to couple 1049 the CDN to the playback sessions is for the clients to communicate 1050 standardized media-relevant information to the CDNs while they are 1051 fetching data. [CTA-5004] was developed exactly for this purpose. 1053 4.7.
Unreliable Transport 1055 In contrast to segmented delivery, several applications use 1056 unreliable UDP, or SCTP with its "partial reliability" extension 1057 [RFC3758], to deliver media encapsulated in RTP [RFC3550] or raw MPEG 1058 Transport Stream ("MPEG-TS")-formatted video [MPEG-TS]. This approach is 1059 common in situations such as broadcast and live 1060 streaming that better tolerate occasional packet loss without 1061 retransmission. 1063 Under congestion and loss, this approach generally experiences more 1064 video artifacts with fewer delay or head-of-line blocking effects. 1065 Often one of the key goals is to reduce latency, in order to better support 1066 applications like videoconferencing or other live-action video 1067 with interactive components, such as some sporting events. 1069 The Secure Reliable Transport protocol [SRT] also uses UDP in an 1070 effort to achieve lower latency for streaming media, although it adds 1071 reliability at the application layer. 1073 Congestion avoidance strategies for deployments using unreliable 1074 transport protocols vary widely in practice, ranging from being 1075 entirely unresponsive to congestion, to using feedback signaling to 1076 change encoder settings (as in [RFC5762]), to using fewer enhancement 1077 layers (as in [RFC6190]), to using proprietary methods to detect 1078 "quality of experience" issues and turn off video in order to allow 1079 less bandwidth-intensive media such as audio to be delivered. 1081 More details about congestion avoidance strategies used with 1082 unreliable transport protocols are included in Section 5.1. 1084 5. Evolution of Transport Protocols and Transport Protocol Behaviors 1086 Because networking resources are shared between users, a good place 1087 to start our discussion is how contention between users, and 1088 mechanisms to resolve that contention in ways that are "fair" between 1089 users, impact streaming media users.
These topics are closely tied 1090 to transport protocol behaviors. 1092 As noted in Section 4, Adaptive Bitrate response strategies such as 1093 HLS [RFC8216] or DASH [MPEG-DASH] are attempting to respond to 1094 changing path characteristics, and underlying transport protocols are 1095 also attempting to respond to changing path characteristics. 1097 For most of the history of the Internet, these transport protocols, 1098 described in Section 5.1 and Section 5.2, have had relatively 1099 consistent behaviors that have changed slowly, if at all, over time. 1100 Newly standardized transport protocols like QUIC [RFC9000] can behave 1101 differently from existing transport protocols, and these behaviors 1102 may evolve over time more rapidly than currently-used transport 1103 protocols. 1105 For this reason, we have included a description of how the path 1106 characteristics that streaming media providers may see are likely to 1107 evolve over time. 1109 5.1. UDP and Its Behavior 1111 For most of the history of the Internet, we have trusted UDP-based 1112 applications to limit their impact on other users. One 1113 strategy was to use UDP for simple query-response application 1114 protocols, such as DNS, which is often used to send a single-packet 1115 request to look up the IP address for a DNS name, and return a 1116 single-packet response containing the IP address. Although it is 1117 possible to saturate a path between a DNS client and DNS server with 1118 DNS requests, in practice, that was rare enough that DNS included few 1119 mechanisms to resolve contention between DNS users and other users 1120 (whether they are also using DNS, or using other application 1121 protocols).
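The single-packet query pattern described above can be illustrated by constructing a minimal DNS query message by hand, following the wire format of RFC 1035; the transaction ID and the name being looked up are arbitrary examples:

```python
import struct

def build_dns_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a one-packet DNS query for an A record (RFC 1035 format):
    a 12-byte header followed by a single question."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question: each label length-prefixed, terminated by a zero byte,
    # then QTYPE=A (1) and QCLASS=IN (1).
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

query = build_dns_query("example.com")
# The entire request fits comfortably in one UDP datagram and could be
# sent with a single sendto() on a UDP socket, e.g.:
#   sock.sendto(query, (resolver_ip, 53))
```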
1123 In recent times, the usage of UDP-based applications that were not 1124 simple query-response protocols has grown substantially, and since 1125 UDP does not provide any feedback mechanism to senders to help limit 1126 impacts on other users, application-level protocols such as RTP 1127 [RFC3550] have been responsible for the decisions that TCP-based 1128 applications have delegated to TCP - what to send, how much to send, 1129 and when to send it. So, the way some UDP-based applications 1130 interact with other users has changed. 1132 It's also worth pointing out that because UDP has no transport-layer 1133 feedback mechanisms, UDP-based applications that send and receive 1134 substantial amounts of information are expected to provide their own 1135 feedback mechanisms. This expectation is most recently codified in 1136 Best Current Practice [RFC8085]. 1138 RTP relies on RTCP Sender and Receiver Reports [RFC3550] as its own 1139 feedback mechanism, and even includes Circuit Breakers for Unicast 1140 RTP Sessions [RFC8083] for situations when normal RTP congestion 1141 control has not been able to react sufficiently to RTP flows sending 1142 at rates that result in sustained packet loss. 1144 The notion of "Circuit Breakers" has also been applied to other UDP 1145 applications in [RFC8084], such as tunneling packets over UDP that 1146 are potentially not congestion-controlled (for example, 1147 "Encapsulating MPLS in UDP", as described in [RFC7510]). If 1148 streaming media is carried in tunnels encapsulated in UDP, these 1149 media streams may encounter "tripped circuit breakers", with 1150 resulting user-visible impacts. 1152 5.2. TCP and Its Behavior 1154 For most of the history of the Internet, we have trusted the TCP 1155 protocol to limit the impact of applications that sent a significant 1156 number of packets, in either or both directions, on other users. 
1157 Although early versions of TCP were not particularly good at limiting 1158 this impact [RFC0793], the addition of Slow Start and Congestion 1159 Avoidance, as described in [RFC2001], was critical in allowing TCP- 1160 based applications to "use as much bandwidth as possible, but to 1161 avoid using more bandwidth than was possible". Although dozens of 1162 RFCs have been written refining TCP decisions about what to send, how 1163 much to send, and when to send it, since 1988 [Jacobson-Karels] the 1164 signals available to TCP senders have remained unchanged - end-to-end 1165 acknowledgements for packets that were successfully sent and 1166 received, and packet timeouts for packets that were not. 1168 The success of the largely TCP-based Internet is evidence that the 1169 mechanisms TCP used to achieve equilibrium quickly, at a point where 1170 TCP senders do not interfere with other TCP senders for sustained 1171 periods of time, have been largely successful. The Internet 1172 continued to work even when the specific mechanisms used to reach 1173 equilibrium changed over time. Because TCP provides a common tool to 1174 avoid contention, as some TCP-based applications like FTP were 1175 largely replaced by other TCP-based applications like HTTP, the 1176 transport behavior remained consistent. 1178 In recent times, the TCP goal of probing for available bandwidth, and 1179 "backing off" when a network path is saturated, has been supplanted 1180 by the goal of avoiding growing queues along network paths, which 1181 prevent TCP senders from reacting quickly when a network path is 1182 saturated.
Congestion control mechanisms such as COPA [COPA18] and 1183 BBR [I-D.cardwell-iccrg-bbr-congestion-control] make these decisions 1184 based on measured path delays, assuming that if the measured path 1185 delay is increasing, the sender is injecting packets onto the network 1186 path faster than the receiver can accept them, so the sender should 1187 adjust its sending rate accordingly. 1189 Although TCP protocol behavior has changed over time, the common 1190 practice of implementing TCP as part of an operating system kernel 1191 has acted to limit how quickly TCP behavior can change. Even with 1192 the widespread use of automated operating system update installation 1193 on many end-user systems, streaming media providers could have a 1194 reasonable expectation that they could understand TCP transport 1195 protocol behaviors, and that those behaviors would remain relatively 1196 stable in the short term. 1198 5.3. The QUIC Protocol and Its Behavior 1200 The QUIC protocol, developed from a proprietary protocol into an IETF 1201 standards-track protocol [RFC9000], turns many of the statements made 1202 in Section 5.1 and Section 5.2 on their heads. 1204 Although QUIC provides an alternative to the TCP and UDP transport 1205 protocols, QUIC is itself encapsulated in UDP. As noted elsewhere in 1206 Section 6.1, the QUIC protocol encrypts almost all of its transport 1207 parameters, and all of its payload, so any intermediaries that 1208 network operators may be using to troubleshoot HTTP streaming media 1209 performance issues, perform analytics, or even intercept exchanges in 1210 current applications will not work for QUIC-based applications 1211 without making changes to their networks. Section 6 describes the 1212 implications of media encryption in more detail. 
1214 While QUIC is designed as a general-purpose transport protocol, and 1215 can carry different application-layer protocols, the current 1216 standardized mapping is for HTTP/3 [I-D.ietf-quic-http], which 1217 describes how QUIC transport features are used for HTTP. The 1218 convention is for HTTP/3 to run over UDP port 443 [Port443], but this 1219 is not a strict requirement. 1221 When HTTP/3 is encapsulated in QUIC, which is then encapsulated in 1222 UDP, streaming operators (and network operators) might see UDP 1223 traffic patterns that are similar to HTTP(S) over TCP. Because 1224 earlier versions of HTTP(S) rely on TCP, network operators may block 1225 UDP traffic on all but a few commonly used port numbers, such as 1226 UDP 53 for DNS. Even when UDP ports are not blocked and HTTP/3 can 1227 flow, streaming operators (and network operators) may severely 1228 rate-limit this traffic because they do not expect to see legitimate 1229 high-bandwidth traffic such as streaming media over the UDP ports 1230 that HTTP/3 is using. 1232 As noted in Section 4.5.2, because TCP provides a reliable, in-order 1233 delivery service for applications, any packet loss for a TCP 1234 connection causes "head-of-line blocking", so that no TCP segments 1235 arriving after a packet is lost will be delivered to the receiving 1236 application until the lost packet is retransmitted, allowing in-order 1237 delivery to the application to continue. As described in [RFC9000], 1238 QUIC connections can carry multiple streams, and when packet losses 1239 do occur, only the streams carried in the lost packet are delayed. 1241 A QUIC extension currently being specified ([I-D.ietf-quic-datagram]) 1242 adds the capability for "unreliable" delivery, similar to the service 1243 provided by UDP, but these datagrams are still subject to the QUIC 1244 connection's congestion controller, providing some transport-level 1245 congestion avoidance measures, which UDP does not provide.
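The head-of-line blocking contrast described above can be seen in a toy model. In a single TCP byte stream, one lost packet stalls delivery of every later packet; with QUIC's independent streams, only the stream that lost data is delayed. The packet numbers and stream labels below are invented, and real TCP and QUIC loss recovery are far more involved.

```python
# Toy model: one lost packet under in-order (TCP-like) delivery versus
# per-stream (QUIC-like) delivery.

def tcp_deliverable(packets, lost):
    """In-order delivery: stop at the first missing packet number."""
    delivered = []
    for seq in packets:
        if seq in lost:
            break                       # everything after the gap must wait
        delivered.append(seq)
    return delivered

def quic_deliverable(packets, lost):
    """Per-stream in-order delivery: a loss stalls only its own stream."""
    first_loss = {}                     # stream -> earliest lost packet
    for seq, stream in packets:
        if seq in lost and stream not in first_loss:
            first_loss[stream] = seq
    return [(seq, stream) for seq, stream in packets
            if seq not in lost
            and seq < first_loss.get(stream, float("inf"))]

# Packet 2 (stream A) is lost: TCP delivers nothing past the gap,
# while QUIC still delivers all of stream B.
tcp_result = tcp_deliverable([1, 2, 3, 4], lost={2})
quic_result = quic_deliverable([(1, "A"), (2, "A"), (3, "B"), (4, "B")],
                               lost={2})
```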
1247 As noted in Section 5.2, there is increasing interest in transport 1248 protocol behaviors that respond to delay measurements, instead of 1249 responding to packet loss. These behaviors may deliver improved user 1250 experience, but in some cases have not responded to sustained packet 1251 loss, which exhausts available buffers along the end-to-end path and 1252 may affect other users sharing that path. The QUIC protocol provides 1253 a set of congestion control hooks that can be used for algorithm 1254 agility, and [RFC9002] defines a basic algorithm with transport 1255 behavior that is roughly similar to TCP NewReno [RFC6582]. However, 1256 QUIC senders can and do unilaterally choose to use different 1257 algorithms such as loss-based CUBIC [RFC8312], delay-based COPA or 1258 BBR, or even something completely different. 1259 We do have experience with deploying new congestion controllers 1260 without melting the Internet (CUBIC is one example), but the point 1261 mentioned in Section 5.2 about TCP being implemented in operating 1262 system kernels is also different with QUIC. Although QUIC can be 1263 implemented in operating system kernels, one of the design goals when 1264 this work was chartered was "QUIC is expected to support rapid, 1265 distributed development and testing of features", and to meet this 1266 expectation, many implementers have chosen to implement QUIC in user 1267 space, outside the operating system kernel, and to even distribute 1268 QUIC libraries with their own applications. 1270 The decision to deploy a new version of QUIC is relatively 1271 uncontrolled, compared to other widely used transport protocols, and 1272 this can include new transport behaviors that appear without much 1273 notice except to the QUIC endpoints.
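The delay-responsive behavior described above can be sketched as follows. This is neither COPA nor BBR, just the shared intuition behind delay-based control; the threshold and gain factors are invented for the example.

```python
# Illustrative sketch of delay-based congestion response: if measured
# path delay rises well above the minimum observed delay, queues are
# assumed to be building and the sender reduces its rate.

class DelayBasedRate:
    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self.min_rtt = None

    def on_rtt_sample(self, rtt_s):
        if self.min_rtt is None or rtt_s < self.min_rtt:
            self.min_rtt = rtt_s        # track the queue-free base delay
        if rtt_s > 1.25 * self.min_rtt:
            self.rate_bps *= 0.9        # delay growing: a queue is forming
        else:
            self.rate_bps *= 1.05       # delay near baseline: probe upward

cc = DelayBasedRate(5_000_000)
for rtt in (0.040, 0.041, 0.060, 0.060, 0.042):
    cc.on_rtt_sample(rtt)
```

A sender like this reacts as soon as queueing delay appears, rather than waiting for loss, which is exactly why its interaction with loss-based senders sharing a bottleneck raises the fairness questions discussed in this section.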
At IETF 105, Christian Huitema 1274 and Brian Trammell presented a talk on "Congestion Defense in Depth" 1275 [CDiD] that explored potential concerns about new QUIC congestion 1276 controllers being broadly deployed without the testing and 1277 instrumentation that current major content providers routinely 1278 include. The sense of the room at IETF 105 was that the current 1279 major content providers understood what is at stake when they deploy 1280 new congestion controllers, but this presentation, and the related 1281 discussion in the TSVAREA minutes from IETF 105 ([tsvarea-105]), are 1282 still worth a look for new and rapidly growing content providers. 1284 It is worth considering that if TCP-based HTTP traffic and UDP-based 1285 HTTP/3 traffic are allowed to enter operator networks on roughly 1286 equal terms, questions of fairness and contention will be heavily 1287 dependent on interactions between the congestion controllers in use 1288 for TCP-based HTTP traffic and UDP-based HTTP/3 traffic. 1290 More broadly, [I-D.ietf-quic-manageability] discusses manageability 1291 of the QUIC transport protocol, focusing on the implications of 1292 QUIC's design and wire image on network operations involving QUIC 1293 traffic. It discusses what network operators can consider in some 1294 detail. 1296 6.
Streaming Encrypted Media 1298 "Encrypted Media" has at least three meanings: 1300 * Media encrypted at the application layer, typically using some 1301 sort of Digital Rights Management (DRM) system, and typically 1302 remaining encrypted "at rest", when senders and receivers store 1303 it, 1305 * Media encrypted by the sender at the transport layer, and 1306 remaining encrypted until it reaches the ultimate media consumer 1307 (in this document, referred to as "end-to-end media encryption"), 1308 and 1310 * Media encrypted by the sender at the transport layer, and 1311 remaining encrypted until it reaches some intermediary that is 1312 _not_ the ultimate media consumer, but has credentials allowing 1313 decryption of the media content. This intermediary may examine 1314 and even transform the media content in some way, before 1315 forwarding re-encrypted media content (in this document, referred 1316 to as "hop-by-hop media encryption"). 1318 Both "hop-by-hop" and "end-to-end" encrypted transport may carry 1319 media that is, in addition, encrypted at the application layer. 1321 Each of these encryption strategies is intended to achieve a 1322 different goal. For instance, application-level encryption may be 1323 used for business purposes, such as avoiding piracy or enforcing 1324 geographic restrictions on playback, while transport-layer encryption 1325 may be used to prevent media stream manipulation or to protect 1326 manifests. 1328 This document does not take a position on whether those goals are 1329 "valid" (whatever that might mean). 1331 In this document, we will focus on media encrypted at the transport 1332 layer, whether encrypted "hop-by-hop" or "end-to-end". Because media 1333 encrypted at the application layer will only be processed by 1334 application-level entities, this encryption does not have transport- 1335 layer implications. 1337 Both "End-to-End" and "Hop-by-Hop" media encryption have specific 1338 implications for streaming operators.
These are described in 1339 Section 6.2 and Section 6.3. 1341 6.1. General Considerations for Media Encryption 1343 The use of strong encryption does provide confidentiality for 1344 encrypted streaming media, from the sender to either an intermediary 1345 or the ultimate media consumer, and this does prevent Deep Packet 1346 Inspection by any intermediary that does not possess credentials 1347 allowing decryption. However, even encrypted content streams may be 1348 vulnerable to traffic analysis. An intermediary that can observe an 1349 encrypted media stream without decrypting it may be able to 1350 "fingerprint" encrypted media streams of known content, and then 1351 match a targeted media stream against those fingerprints. This 1352 protection can be lessened if a media provider repeatedly encrypts 1353 the same content. [CODASPY17] is an example of 1354 what is possible when identifying HTTPS-protected videos over TCP 1355 transport, based either on the length of entire resources being 1356 transferred, or on characteristic packet patterns at the beginning of 1357 a resource being transferred. 1359 If traffic analysis is successful at identifying encrypted content 1360 and associating it with specific users, this breaks privacy as 1361 certainly as examining decrypted traffic. 1363 Because HTTPS has historically layered HTTP on top of TLS, which is 1364 in turn layered on top of TCP, intermediaries do have access to 1365 unencrypted TCP-level transport information, such as retransmissions, 1366 and some carriers exploited this information in attempts to improve 1367 transport-layer performance [RFC3135]. The most recent standardized 1368 version of HTTPS, HTTP/3 [I-D.ietf-quic-http], uses the QUIC protocol 1369 [RFC9000] as its transport layer.
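The traffic-analysis technique described above can be sketched in a few lines: an observer records the encrypted segment sizes of a session and matches them against precomputed size "fingerprints" of known titles. The titles, sizes, and matching rule below are invented; published attacks such as [CODASPY17] use more robust statistics.

```python
# Sketch of size-based fingerprinting of encrypted streams. No
# decryption is needed; only observed resource lengths are used.

known_fingerprints = {
    "title-1": [104857, 231204, 198333, 205118],
    "title-2": [99102, 187334, 240112, 210009],
}

def identify(observed_sizes, fingerprints, tolerance=0.01):
    """Return titles whose per-segment sizes all match within tolerance."""
    matches = []
    for title, sizes in fingerprints.items():
        if len(sizes) == len(observed_sizes) and all(
                abs(seen - known) <= tolerance * known
                for seen, known in zip(observed_sizes, sizes)):
            matches.append(title)
    return matches

# Observed sizes need not match exactly for the title to leak.
observed = [104850, 231200, 198340, 205120]
result = identify(observed, known_fingerprints)
```

Countermeasures such as padding or re-packaging content for each session reduce the distinctiveness of these size patterns, at some cost in efficiency.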
QUIC relies on the TLS 1.3 initial 1370 handshake [RFC8446] only for key exchange [RFC9001], and encrypts 1371 almost all transport parameters itself, with the exception of a few 1372 invariant header fields. In the QUIC short header, the only 1373 transport-level parameter that is sent "in the clear" is the 1374 Destination Connection ID [RFC8999], and even in the QUIC long 1375 header, the only transport-level parameters sent "in the clear" are 1376 the Version, Destination Connection ID, and Source Connection ID. 1377 For these reasons, HTTP/3 is significantly more "opaque" than HTTPS 1378 with HTTP/1 or HTTP/2. 1380 6.2. Considerations for "Hop-by-Hop" Media Encryption 1382 Although the IETF has put considerable emphasis on end-to-end 1383 streaming media encryption, there are still important use cases that 1384 require the insertion of intermediaries. 1386 There are a variety of ways to involve intermediaries, and some are 1387 much more intrusive than others. 1389 From a content provider's perspective, a number of considerations are 1390 in play. The first question is likely whether the content provider 1391 intends that intermediaries are explicitly addressed from endpoints, 1392 or whether the content provider is willing to allow intermediaries to 1393 "intercept" streaming content transparently, with no awareness or 1394 permission from either endpoint. 1396 If a content provider does not actively work to avoid interception by 1397 intermediaries, the effect will be indistinguishable from 1398 "impersonation attacks", and endpoints cannot be assured of any level 1399 of privacy. 1401 Assuming that a content provider does intend to allow intermediaries 1402 to participate in content streaming, and does intend to provide some 1403 level of privacy for endpoints, there are a number of possible tools, 1404 either already available or still being specified.
These include: 1406 * Server And Network assisted DASH [MPEG-DASH-SAND] - this 1407 specification introduces explicit messaging between DASH clients 1408 and network elements or between various network elements for the 1409 purpose of improving the efficiency of streaming sessions by 1410 providing information about real-time operational characteristics 1411 of networks, servers, proxies, caches, CDNs, as well as a DASH 1412 client's performance and status. 1414 * "Double Encryption Procedures for the Secure Real-Time Transport 1415 Protocol (SRTP)" [RFC8723] - this specification provides a 1416 cryptographic transform for the Secure Real-time Transport 1417 Protocol that provides both hop-by-hop and end-to-end security 1418 guarantees. 1420 * Secure Media Frames [SFRAME] - [RFC8723] is closely tied to SRTP, 1421 and this close association impeded widespread deployment, because 1422 it could not be used for the most common media content delivery 1423 mechanisms. A more recent proposal, Secure Media Frames [SFRAME], 1424 also provides both hop-by-hop and end-to-end security guarantees, 1425 but can be used with other transport protocols beyond SRTP. 1427 If a content provider chooses not to involve intermediaries, this 1428 choice should be carefully considered. As an example, if media 1429 manifests are encrypted end-to-end, network providers that had been 1430 able to lower the offered quality and reduce the bandwidth consumed 1431 on their networks will no longer be able to do that. Some resources 1432 that might inform this consideration are in [RFC8825] (for WebRTC) 1433 and [I-D.ietf-quic-manageability] (for HTTP/3 and QUIC). 1435 6.3.
Considerations for "End-to-End" Media Encryption 1437 "End-to-end" media encryption offers the potential of providing 1438 privacy for streaming media consumers, with the idea being that if an 1439 unauthorized intermediary can't decrypt streaming media, the 1440 intermediary can't use Deep Packet Inspection (DPI) to examine HTTP 1441 request and response headers and identify the media content being 1442 streamed. 1444 "End-to-end" media encryption has become much more widespread in the 1445 years since the IETF issued "Pervasive Monitoring Is an Attack" 1446 [RFC7258] as a Best Current Practice, describing pervasive monitoring 1447 as a much greater threat than previously appreciated. After the 1448 Snowden disclosures, many content providers made the decision to use 1449 HTTPS protection - HTTP over TLS - for most or all content being 1450 delivered as a routine practice, rather than in exceptional cases for 1451 content that was considered "sensitive". 1453 Unfortunately, as noted in [RFC7258], there is no way to prevent 1454 pervasive monitoring by an "attacker", while allowing monitoring by a 1455 more benign entity who "only" wants to use DPI to examine HTTP 1456 requests and responses in order to provide a better user experience. 1457 If a modern encrypted transport protocol is used for end-to-end media 1458 encryption, intermediary streaming operators are unable to examine 1459 transport and application protocol behavior. As described in 1460 Section 6.2, only an intermediary streaming operator who is 1461 explicitly authorized to examine packet payloads, rather than 1462 intercepting packets and examining them without authorization, can 1463 continue these practices. 1465 [RFC7258] said that "The IETF will strive to produce specifications 1466 that mitigate pervasive monitoring attacks", so streaming operators 1467 should expect the IETF's direction toward preventing unauthorized 1468 monitoring of IETF protocols to continue for the foreseeable future.
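To make the limits of that transport-level visibility concrete, the sketch below parses the only fields an on-path observer can read from a QUIC long header, following the version-independent layout of [RFC8999]: one flags byte, a 32-bit version, then length-prefixed Destination and Source Connection IDs. The example packet bytes are invented; everything beyond these fields is encrypted.

```python
# Sketch: the version-independent QUIC long header fields of RFC 8999
# are all a passive observer can extract without decryption keys.

def parse_long_header_invariants(pkt: bytes):
    """Return (version, dcid, scid) from a QUIC long header packet."""
    if not pkt or not (pkt[0] & 0x80):   # high bit set means long header
        raise ValueError("not a long header packet")
    version = int.from_bytes(pkt[1:5], "big")
    dcid_len = pkt[5]
    dcid = pkt[6:6 + dcid_len]
    scid_len = pkt[6 + dcid_len]
    scid = pkt[7 + dcid_len:7 + dcid_len + scid_len]
    return version, dcid, scid

pkt = bytes([0xC0,                          # flags: long header form
             0x00, 0x00, 0x00, 0x01,        # version 1
             0x04, 0xAA, 0xBB, 0xCC, 0xDD,  # 4-byte Destination Connection ID
             0x02, 0x11, 0x22])             # 2-byte Source Connection ID
fields = parse_long_header_invariants(pkt)
```

In the short header used for most of a connection's lifetime, even less is visible: only the flags byte and the Destination Connection ID, whose length is not carried in the packet at all.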
1470 7. Further Reading and References 1472 Editor's note: This section is to be kept in a living document where 1473 future references, links and/or updates to the existing references 1474 will be reflected. That living document is likely to be an IETF- 1475 owned Wiki: https://tinyurl.com/streaming-opcons-reading 1477 7.1. Industry Terminology 1479 * SVA Glossary: https://glossary.streamingvideoalliance.org/ 1481 * Datazoom Video Player Data Dictionary: 1482 https://help.datazoom.io/hc/en-us/articles/360031323311 1484 * Datazoom Video Metrics Encyclopedia: https://help.datazoom.io/hc/ 1485 en-us/articles/360046177191 1487 7.2. Surveys and Tutorials 1489 7.2.1. Encoding 1491 The following papers describe how video is encoded, different video 1492 encoding standards and tradeoffs in selecting encoding parameters. 1494 * Overview of the Versatile Video Coding (VVC) Standard and its 1495 Applications (https://ieeexplore.ieee.org/document/9503377) 1497 * Video Compression - From Concepts to the H.264/AVC Standard 1498 (https://ieeexplore.ieee.org/document/1369695) 1500 * Developments in International Video Coding Standardization After 1501 AVC, With an Overview of Versatile Video Coding (VVC) 1502 (https://ieeexplore.ieee.org/document/9328514) 1504 * A Technical Overview of AV1 (https://ieeexplore.ieee.org/ 1505 document/9363937) 1507 * CTU Depth Decision Algorithms for HEVC: A Survey 1508 (https://arxiv.org/abs/2104.08328) 1510 7.2.2. Packaging 1512 The following papers summarize the methods for selecting packaging 1513 configurations such as the resolution-bitrate pairs, segment 1514 durations, use of constant vs. variable-duration segments, etc. 1516 * Deep Reinforced Bitrate Ladders for Adaptive Video Streaming 1517 (https://dl.acm.org/doi/10.1145/3458306.3458873) 1519 * Comparing Fixed and Variable Segment Durations for Adaptive Video 1520 Streaming: a Holistic Analysis (https://dl.acm.org/ 1521 doi/10.1145/3339825.3391858) 1523 7.2.3. 
Content Delivery 1525 The following links describe some of the issues and solutions 1526 regarding the interconnecting of the content delivery networks. 1528 * Open Caching: Open standards for Caching in ISP Networks: 1529 https://www.streamingvideoalliance.org/working-group/open-caching/ 1531 * Netflix Open Connect: https://openconnect.netflix.com 1533 7.2.4. ABR Algorithms 1535 The two surveys describe and compare different rate-adaptation 1536 algorithms in terms of different metrics like achieved bitrate/ 1537 quality, stall rate/duration, bitrate switching frequency, fairness, 1538 network utilization, etc. 1540 * A Survey on Bitrate Adaptation Schemes for Streaming Media Over 1541 HTTP (https://ieeexplore.ieee.org/document/8424813) 1543 * A Survey of Rate Adaptation Techniques for Dynamic Adaptive 1544 Streaming Over HTTP (https://ieeexplore.ieee.org/document/7884970) 1546 * Low-Latency Live Streaming: The following papers describe the 1547 peculiarities of adaptive streaming in low-latency live streaming 1548 scenarios. 1550 * Catching the Moment with LoL+ in Twitch-like Low-latency Live 1551 Streaming Platforms (https://ieeexplore.ieee.org/document/9429986) 1553 * Data-driven Bandwidth Prediction Models and Automated Model 1554 Selection for Low Latency (https://ieeexplore.ieee.org/ 1555 document/9154522) 1557 * Performance Analysis of ACTE: A Bandwidth Prediction Method for 1558 Low-latency Chunked Streaming (https://dl.acm.org/ 1559 doi/10.1145/3387921) 1561 * Online Learning for Low-latency Adaptive Streaming 1562 (https://dl.acm.org/doi/10.1145/3339825.3397042) 1564 * Tightrope Walking in Low-latency Live Streaming: Optimal Joint 1565 Adaptation of Video Rate and Playback Speed (https://dl.acm.org/ 1566 doi/10.1145/3458305.3463382) 1568 * Content-aware Playback Speed Control for Low-latency Live 1569 Streaming of Sports (https://dl.acm.org/ 1570 doi/10.1145/3458305.3478437) 1572 7.2.5. 
Server/Client/Network Collaboration 1574 The following papers explain the benefits of server and network 1575 assistance in client-driven streaming systems. There is also a good 1576 reference about how congestion affects video quality and how rate 1577 control works in streaming applications. 1579 * Manus Manum Lavat: Media Clients and Servers Cooperating with 1580 Common Media Client/Server Data (https://dl.acm.org/ 1581 doi/10.1145/3472305.3472886) 1583 * Common media client data (CMCD): initial findings 1584 (https://dl.acm.org/doi/10.1145/3458306.3461444) 1586 * SDNDASH: Improving QoE of HTTP Adaptive Streaming Using Software 1587 Defined Networking (https://dl.acm.org/ 1588 doi/10.1145/2964284.2964332) 1590 * Caching in HTTP Adaptive Streaming: Friend or Foe? 1591 (https://dl.acm.org/doi/10.1145/2578260.2578270) 1593 * A Survey on Multi-Access Edge Computing Applied to Video 1594 Streaming: Some Research Issues and Challenges 1595 (https://ieeexplore.ieee.org/document/9374553) 1597 * The Ultimate Guide to Internet Congestion Control 1598 (https://www.compiralabs.com/ultimate-guide-congestion-control) 1600 7.2.6. QoE Metrics 1602 The following papers describe various QoE metrics one can use in 1603 streaming applications. 1605 * QoE Management of Multimedia Streaming Services in Future 1606 Networks: a Tutorial and Survey (https://ieeexplore.ieee.org/ 1607 document/8930519) 1609 * A Survey on Quality of Experience of HTTP Adaptive Streaming 1610 (https://ieeexplore.ieee.org/document/6913491) 1612 * QoE Modeling for HTTP Adaptive Video Streaming-A Survey and Open 1613 Challenges (https://ieeexplore.ieee.org/document/8666971) 1615 7.2.7. Point Clouds and Immersive Media 1617 The following papers explain the latest developments in the immersive 1618 media domain (for video and audio) and the developing standards for 1619 such media. 
1621 * A Survey on Adaptive 360o Video Streaming: Solutions, Challenges 1622 and Opportunities (https://ieeexplore.ieee.org/document/9133103) 1624 * MPEG Immersive Video Coding Standard (https://ieeexplore.ieee.org/ 1625 document/9374648) 1627 * Emerging MPEG Standards for Point Cloud Compression 1628 (https://ieeexplore.ieee.org/document/8571288) 1630 * Compression of Sparse and Dense Dynamic Point Clouds--Methods and 1631 Standards (https://ieeexplore.ieee.org/document/9457097) 1633 * MPEG Standards for Compressed Representation of Immersive Audio 1634 (https://ieeexplore.ieee.org/document/9444109) 1636 * An Overview of Omnidirectional MediA Format (OMAF) 1637 (https://ieeexplore.ieee.org/document/9380215) 1639 * From Capturing to Rendering: Volumetric Media Delivery with Six 1640 Degrees of Freedom (https://ieeexplore.ieee.org/document/9247522) 1642 7.3. Open-Source Tools 1644 * 5G-MA: https://www.5g-mag.com/reference-tools 1646 * dash.js: http://reference.dashif.org/dash.js/latest/samples/ 1648 * DASH-IF Conformance: https://conformance.dashif.org 1650 * ExoPlayer: https://github.com/google/ExoPlayer 1652 * FFmpeg: https://www.ffmpeg.org/ 1654 * GPAC: https://gpac.wp.imt.fr/ 1656 * hls.js: https://github.com/video-dev/hls.js 1658 * OBS Studio: https://obsproject.com/ 1660 * Shaka Player: https://github.com/google/shaka-player 1662 * Shaka Packager: https://github.com/google/shaka-packager 1664 * Traffic Control CDN: https://trafficcontrol.incubator.apache.org 1666 * VideoLAN: https://www.videolan.org/projects/ 1668 * video.js: https://github.com/videojs/video.js 1670 7.4. 
Technical Events 1672 * ACM Mile High Video (MHV): https://mile-high.video/ 1674 * ACM Multimedia Systems (MMSys): https://acmmmsys.org 1676 * ACM Multimedia (MM): https://acmmm.org 1678 * ACM NOSSDAV: https://www.nossdav.org/ 1680 * ACM Packet Video: https://packet.video/ 1682 * Demuxed and meetups: https://demuxed.com/ and https://demuxed.com/ 1683 events/ 1685 * DVB World: https://www.dvbworld.org 1687 * EBU BroadThinking: https://tech.ebu.ch/events/broadthinking2021 1689 * IBC Conference: https://show.ibc.org/conference/ibc-conference 1690 * IEEE Int. Conf. on Multimedia and Expo (ICME) 1692 * Media Web Symposium: https://www.fokus.fraunhofer.de/de/go/mws 1694 * Live Video Stack: https://sh2021.livevideostack.com 1696 * Picture Coding Symp. (PCS) 1698 * SCTE Expo: https://expo.scte.org/ 1700 7.5. List of Organizations Working on Streaming Media 1702 * 3GPP SA4: https://www.3gpp.org/specifications-groups/sa-plenary/ 1703 sa4-codec 1705 * 5G-MAG: https://www.5g-mag.com/ 1707 * AOM: http://aomedia.org/ 1709 * ATSC: https://www.atsc.org/ 1711 * CTA WAVE: https://cta.tech/Resources/Standards/WAVE-Project 1713 * DASH Industry Forum: https://dashif.org/ 1715 * DVB: https://dvb.org/ 1717 * HbbTV: https://www.hbbtv.org/ 1719 * HESP Alliance: https://www.hespalliance.org/ 1721 * IAB: https://www.iab.com/ 1723 * MPEG: https://www.mpegstandards.org/ 1725 * Streaming Video Alliance: https://www.streamingvideoalliance.org/ 1727 * SCTE: https://www.scte.org/ 1729 * SMPTE: https://www.smpte.org/ 1731 * SRT Alliance: https://www.srtalliance.org/ 1733 * Video Services Forum: https://vsf.tv/ 1735 * VQEG: https://www.its.bldrdoc.gov/vqeg/vqeg-home.aspx 1737 * W3C: https://www.w3.org/ 1739 7.6. Topics to Keep an Eye on 1741 7.6.1. 5G and Media 1743 5G new radio and systems technologies provide new functionalities for 1744 video distribution. 5G targets not only smartphones, but also new 1745 devices such as augmented reality glasses or automotive receivers. 
1746 Higher bandwidth, lower latencies, edge and cloud computing 1747 functionalities, service-based architectures, low power consumption, 1748 broadcast/multicast functionalities and other network functions come 1749 hand in hand with new media formats and processing capabilities 1750 promising better and more consistent quality for traditional video 1751 streaming services as well as enabling new experiences such as 1752 immersive media and augmented realities. 1754 * 5G Multimedia Standardization (https://www.riverpublishers.com/ 1755 journal_read_html_article.php?j=JICTS/6/1/8) 1757 7.6.2. Ad Insertion 1759 Ads can be inserted at different stages in the streaming workflow, on 1760 the server side or client side. The DASH-IF guidelines detail 1761 server-side ad-insertion with period replacements based on 1762 manipulating the manifest. HLS interstitials provide a similar 1763 approach. The idea is that the manifest can be changed to point to 1764 a sub-playlist of segments, possibly located at a different location. 1765 This approach results in efficient resource usage in the network, as 1766 duplicate caching is avoided, but some intelligence at the player is 1767 needed to deal with content transitions (e.g., codec changes, 1768 timeline gaps, etc.). Player support for such content is gradually 1769 maturing. Other important technologies for ad insertion include 1770 signalling of ads and breaks, which is still typically based on 1771 SCTE-35 for HLS and SCTE-214 for DASH. Such signals provide useful 1772 information for scheduling the ads and contacting ad servers. The 1773 usage of SCTE-35 for ad insertion is popular in the broadcast 1774 industry, while the exact usage in the OTT space is still being 1775 discussed in SCTE. Another important technology is identification of 1776 ads, such as that based on Ad-ID or other commercial entities that 1777 provide such services. The identification of the ad in a manifest or 1778 stream is usually standardized by SMPTE.
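The manifest-manipulation idea described above can be illustrated with a toy splice: at a cue point, the playlist is rewritten so the player fetches ad segments from a different location, then resumes the content. All URLs and the cue position below are invented; real deployments use DASH period replacement or HLS interstitials as noted.

```python
# Toy illustration of server-side ad insertion by rewriting a
# manifest's segment list at a cue point.

def splice_ad(content_segments, ad_segments, cue_index):
    """Return a playlist with the ad break inserted at the cue point."""
    return (content_segments[:cue_index]
            + ad_segments
            + content_segments[cue_index:])

playlist = splice_ad(
    ["https://cdn.example/seg1.m4s", "https://cdn.example/seg2.m4s"],
    ["https://ads.example/ad1.m4s"],
    cue_index=1,
)
```

The player then has to handle the transition at each splice boundary (codec changes, timeline gaps), which is the "intelligence at the player" mentioned above.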
Other key technologies for ad 1779 insertion include tracking of viewer impressions, usually based on 1780 Video Ad Serving Template (VAST) defined by IAB. 1782 * DASH-IF Ad Insertion Guidelines: https://dashif.org/docs/CR-Ad- 1783 Insertion-r7.pdf 1785 * SCTE-214-1: https://www.scte.org/standards-development/library/ 1786 standards-catalog/ansiscte-214-1-2016/ 1788 * RP 2092-1:2015 - SMPTE Recommended Practice - Advertising Digital 1789 Identifier (Ad-ID) Representations: https://ieeexplore.ieee.org/ 1790 document/7291518 1792 * IAB Tech Lab Digital Video Studio: https://iabtechlab.com/audio- 1793 video/tech-lab-digital-video-suite/ 1795 7.6.3. Contribution and Ingest 1797 There are different contribution and ingest specifications dealing 1798 with different use cases. A common case is contribution that 1799 previously happened over satellite to a broadcast or streaming 1800 headend. RIST and SRT are examples of such contribution protocols. 1801 Within a streaming headend the encoder and packager/CDN may have an 1802 ingest/contribution interface as well. This is specified by the 1803 DASH-IF Ingest. 1805 * DASH-IF Ingest: https://github.com/Dash-Industry-Forum/Ingest 1807 * RIST: https://www.rist.tv/ 1809 * SRT: https://github.com/Haivision/srt 1811 7.6.4. Synchronized Encoding and Packaging 1813 Practical streaming headends need redundant encoders and packagers to 1814 operate without glitches and blackouts. The redundant operation 1815 requires synchronization between two or more encoders and also 1816 between two or more packagers that possibly handle different inputs 1817 and outputs, generating compatible inter-changeable output 1818 representations. This problem is important for anyone developing a 1819 streaming headend at scale, and the synchronization problem is 1820 currently under discussion in the wider community. Follow the 1821 developments at: https://sites.google.com/view/encodersyncworkshop/ 1822 home 1824 7.6.5. 
WebRTC-Based Streaming 1826 WebRTC is increasingly being used for streaming of time-sensitive 1827 content such as live sporting events. Innovations in cloud computing 1828 allow implementers to efficiently scale delivery of content using 1829 WebRTC. Support for WebRTC communication is available on all modern 1830 web browsers and is available on native clients for all major 1831 platforms. 1833 * DASH-IF WebRTC Discussions: https://dashif.org/webRTC/ 1835 * Overview of WebRTC: https://webrtc.org/ 1837 8. IANA Considerations 1839 This document requires no actions from IANA. 1841 9. Security Considerations 1843 This document introduces no new security issues. 1845 10. Acknowledgments 1847 Thanks to Alexandre Gouaillard, Aaron Falk, Dave Oran, Glenn Deen, 1848 Kyle Rose, Leslie Daigle, Lucas Pardue, Mark Nottingham, Matt Stock, 1849 Mike English, Roni Even, and Will Law for very helpful suggestions, 1850 reviews and comments. 1852 (If we missed your name, please let us know!) 1854 11. Informative References 1856 [ABRSurvey] 1857 Taani, B., Begen, A.C., Timmerer, C., Zimmermann, R., and 1858 A. Bentaleb et al, "A Survey on Bitrate Adaptation Schemes 1859 for Streaming Media Over HTTP", IEEE Communications 1860 Surveys & Tutorials , 2019, 1861 . 1863 [BAP] "The Coalition for Better Ads", n.d., 1864 . 1866 [CDiD] Huitema, C. and B. Trammell, "(A call for) Congestion 1867 Defense in Depth", July 2019, 1868 . 1871 [CMAF-CTE] Law, W., "Ultra-Low-Latency Streaming Using Chunked- 1872 Encoded and Chunked Transferred CMAF", October 2018, 1873 . 1876 [CODASPY17] 1877 Reed, A. and M. Kranch, "Identifying HTTPS-Protected 1878 Netflix Videos in Real-Time", ACM CODASPY , March 2017, 1879 . 1881 [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", 1882 Communications of the ACM, Volume 55, Issue 7, pp. 42-50 , 1883 July 2012. 1885 [COPA18] Arun, V. and H. 
Balakrishnan, "Copa: Practical Delay-Based 1886 Congestion Control for the Internet", USENIX NSDI , April 1887 2018, . 1889 [CTA-2066] Consumer Technology Association, "Streaming Quality of 1890 Experience Events, Properties and Metrics", March 2020, 1891 . 1894 [CTA-5004] CTA, ., "Common Media Client Data (CMCD)", September 2020, 1895 . 1898 [CVNI] "Cisco Visual Networking Index: Forecast and Trends, 1899 2017-2022 White Paper", 27 February 2019, 1900 . 1904 [ELASTIC] De Cicco, L., Caldaralo, V., Palmisano, V., and S. 1905 Mascolo, "ELASTIC: A client-side controller for dynamic 1906 adaptive streaming over HTTP (DASH)", Packet Video 1907 Workshop , December 2013, 1908 . 1910 [Encodings] 1911 Apple, Inc, ., "HLS Authoring Specification for Apple 1912 Devices", June 2020, 1913 . 1917 [I-D.cardwell-iccrg-bbr-congestion-control] 1918 Cardwell, N., Cheng, Y., Yeganeh, S. H., and V. Jacobson, 1919 "BBR Congestion Control", Work in Progress, Internet- 1920 Draft, draft-cardwell-iccrg-bbr-congestion-control-00, 3 1921 July 2017, . 1924 [I-D.draft-pantos-hls-rfc8216bis] 1925 Pantos, R., "HTTP Live Streaming 2nd Edition", Work in 1926 Progress, Internet-Draft, draft-pantos-hls-rfc8216bis-09, 1927 27 April 2021, . 1930 [I-D.ietf-quic-datagram] 1931 Pauly, T., Kinnear, E., and D. Schinazi, "An Unreliable 1932 Datagram Extension to QUIC", Work in Progress, Internet- 1933 Draft, draft-ietf-quic-datagram-04, 8 September 2021, 1934 . 1937 [I-D.ietf-quic-http] 1938 Bishop, M., "Hypertext Transfer Protocol Version 3 1939 (HTTP/3)", Work in Progress, Internet-Draft, draft-ietf- 1940 quic-http-34, 2 February 2021, 1941 . 1944 [I-D.ietf-quic-manageability] 1945 Kuehlewind, M. and B. Trammell, "Manageability of the QUIC 1946 Transport Protocol", Work in Progress, Internet-Draft, 1947 draft-ietf-quic-manageability-13, 2 September 2021, 1948 . 1951 [I-D.ietf-quic-qlog-h3-events] 1952 Marx, R., Niccolini, L., and M. 
Seemann, "HTTP/3 and QPACK 1953 event definitions for qlog", Work in Progress, Internet- 1954 Draft, draft-ietf-quic-qlog-h3-events-00, 10 June 2021, 1955 . 1958 [I-D.ietf-quic-qlog-main-schema] 1959 Marx, R., Niccolini, L., and M. Seemann, "Main logging 1960 schema for qlog", Work in Progress, Internet-Draft, draft- 1961 ietf-quic-qlog-main-schema-00, 10 June 2021, 1962 . 1965 [I-D.ietf-quic-qlog-quic-events] 1966 Marx, R., Niccolini, L., and M. Seemann, "QUIC event 1967 definitions for qlog", Work in Progress, Internet-Draft, 1968 draft-ietf-quic-qlog-quic-events-00, 10 June 2021, 1969 . 1972 [IAB-ADS] "IAB", n.d., . 1974 [IABcovid] Arkko, J., Farrel, S., Kühlewind, M., and C. Perkins, 1975 "Report from the IAB COVID-19 Network Impacts Workshop 1976 2020", November 2020, . 1979 [Jacobson-Karels] 1980 Jacobson, V. and M. Karels, "Congestion Avoidance and 1981 Control", November 1988, 1982 . 1984 [Labovitz] Labovitz, C., "Network traffic insights in the time of 1985 COVID-19: April 9 update", April 2020, 1986 . 1989 [LabovitzDDoS] 1990 Takahashi, D., "Why the game industry is still vulnerable 1991 to DDoS attacks", May 2018, 1992 . 1996 [LL-DASH] DASH-IF, ., "Low-latency Modes for DASH", March 2020, 1997 . 1999 [Micro] Taher, T.M., Misurac, M.J., LoCicero, J.L., and D.R. Ucci, 2000 "Microwave Oven Signal Interference Mitigation For Wi-Fi 2001 Communication Systems", 2008 5th IEEE Consumer 2002 Communications and Networking Conference 5th IEEE, pp. 2003 67-68 , 2008. 2005 [Mishra] Mishra, S. and J. Thibeault, "An update on Streaming Video 2006 Alliance", April 2020, 2007 . 2012 [MMSP20] Durak, K. and . et al, "Evaluating the performance of 2013 Apple's low-latency HLS", IEEE MMSP , September 2020, 2014 . 2016 [MMSys11] Akhshabi, S., Begen, A.C., and C. Dovrolis, "An 2017 experimental evaluation of rate-adaptation algorithms in 2018 adaptive streaming over HTTP", ACM MMSys , February 2011, 2019 . 
   [MPEG-CMAF]
              "ISO/IEC 23000-19:2020 Multimedia application format
              (MPEG-A) - Part 19: Common media application format
              (CMAF) for segmented media", March 2020.

   [MPEG-DASH]
              "ISO/IEC 23009-1:2019 Dynamic adaptive streaming over
              HTTP (DASH) - Part 1: Media presentation description and
              segment formats", December 2019.

   [MPEG-DASH-SAND]
              "ISO/IEC 23009-5:2017 Dynamic adaptive streaming over
              HTTP (DASH) - Part 5: Server and network assisted DASH
              (SAND)", February 2017.

   [MPEG-TS]  "H.222.0: Information technology - Generic coding of
              moving pictures and associated audio information:
              Systems", 29 August 2018.

   [MPEGI]    Boyce, J.M., et al., "MPEG Immersive Video Coding
              Standard", Proceedings of the IEEE, n.d.

   [OReilly-HPBN]
              "High Performance Browser Networking (Chapter 2: Building
              Blocks of TCP)", May 2021.

   [PCC]      Schwarz, S., et al., "Emerging MPEG Standards for Point
              Cloud Compression", IEEE Journal on Emerging and Selected
              Topics in Circuits and Systems, March 2019.

   [Port443]  "Service Name and Transport Protocol Port Number
              Registry", April 2021.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981,
              <https://www.rfc-editor.org/info/rfc793>.

   [RFC2001]  Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
              Retransmit, and Fast Recovery Algorithms", RFC 2001,
              DOI 10.17487/RFC2001, January 1997,
              <https://www.rfc-editor.org/info/rfc2001>.

   [RFC3135]  Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
              Shelby, "Performance Enhancing Proxies Intended to
              Mitigate Link-Related Degradations", RFC 3135,
              DOI 10.17487/RFC3135, June 2001,
              <https://www.rfc-editor.org/info/rfc3135>.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003, <https://www.rfc-editor.org/info/rfc3550>.
   [RFC3758]  Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
              Conrad, "Stream Control Transmission Protocol (SCTP)
              Partial Reliability Extension", RFC 3758,
              DOI 10.17487/RFC3758, May 2004,
              <https://www.rfc-editor.org/info/rfc3758>.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals", RFC
              4733, DOI 10.17487/RFC4733, December 2006,
              <https://www.rfc-editor.org/info/rfc4733>.

   [RFC5594]  Peterson, J. and A. Cooper, "Report from the IETF
              Workshop on Peer-to-Peer (P2P) Infrastructure, May 28,
              2008", RFC 5594, DOI 10.17487/RFC5594, July 2009,
              <https://www.rfc-editor.org/info/rfc5594>.

   [RFC5762]  Perkins, C., "RTP and the Datagram Congestion Control
              Protocol (DCCP)", RFC 5762, DOI 10.17487/RFC5762, April
              2010, <https://www.rfc-editor.org/info/rfc5762>.

   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
              Eleftheriadis, "RTP Payload Format for Scalable Video
              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
              <https://www.rfc-editor.org/info/rfc6190>.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida,
              "The NewReno Modification to TCP's Fast Recovery
              Algorithm", RFC 6582, DOI 10.17487/RFC6582, April 2012,
              <https://www.rfc-editor.org/info/rfc6582>.

   [RFC6817]  Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind,
              "Low Extra Delay Background Transport (LEDBAT)", RFC
              6817, DOI 10.17487/RFC6817, December 2012,
              <https://www.rfc-editor.org/info/rfc6817>.

   [RFC6843]  Clark, A., Gross, K., and Q. Wu, "RTP Control Protocol
              (RTCP) Extended Report (XR) Block for Delay Metric
              Reporting", RFC 6843, DOI 10.17487/RFC6843, January 2013,
              <https://www.rfc-editor.org/info/rfc6843>.

   [RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
              RFC 7234, DOI 10.17487/RFC7234, June 2014,
              <https://www.rfc-editor.org/info/rfc7234>.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is
              an Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014, <https://www.rfc-editor.org/info/rfc7258>.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015,
              <https://www.rfc-editor.org/info/rfc7510>.
   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC
              7656, DOI 10.17487/RFC7656, November 2015,
              <https://www.rfc-editor.org/info/rfc7656>.

   [RFC7661]  Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
              TCP to Support Rate-Limited Traffic", RFC 7661,
              DOI 10.17487/RFC7661, October 2015,
              <https://www.rfc-editor.org/info/rfc7661>.

   [RFC8083]  Perkins, C. and V. Singh, "Multimedia Congestion Control:
              Circuit Breakers for Unicast RTP Sessions", RFC 8083,
              DOI 10.17487/RFC8083, March 2017,
              <https://www.rfc-editor.org/info/rfc8083>.

   [RFC8084]  Fairhurst, G., "Network Transport Circuit Breakers",
              BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017,
              <https://www.rfc-editor.org/info/rfc8084>.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

   [RFC8216]  Pantos, R., Ed. and W. May, "HTTP Live Streaming",
              RFC 8216, DOI 10.17487/RFC8216, August 2017,
              <https://www.rfc-editor.org/info/rfc8216>.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance
              Networks", RFC 8312, DOI 10.17487/RFC8312, February 2018,
              <https://www.rfc-editor.org/info/rfc8312>.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS)
              Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446,
              August 2018, <https://www.rfc-editor.org/info/rfc8446>.

   [RFC8622]  Bless, R., "A Lower-Effort Per-Hop Behavior (LE PHB) for
              Differentiated Services", RFC 8622, DOI 10.17487/RFC8622,
              June 2019, <https://www.rfc-editor.org/info/rfc8622>.

   [RFC8723]  Jennings, C., Jones, P., Barnes, R., and A.B. Roach,
              "Double Encryption Procedures for the Secure Real-Time
              Transport Protocol (SRTP)", RFC 8723,
              DOI 10.17487/RFC8723, April 2020,
              <https://www.rfc-editor.org/info/rfc8723>.

   [RFC8825]  Alvestrand, H., "Overview: Real-Time Protocols for
              Browser-Based Applications", RFC 8825,
              DOI 10.17487/RFC8825, January 2021,
              <https://www.rfc-editor.org/info/rfc8825>.

   [RFC8999]  Thomson, M., "Version-Independent Properties of QUIC",
              RFC 8999, DOI 10.17487/RFC8999, May 2021,
              <https://www.rfc-editor.org/info/rfc8999>.
   [RFC9000]  Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
              Multiplexed and Secure Transport", RFC 9000,
              DOI 10.17487/RFC9000, May 2021,
              <https://www.rfc-editor.org/info/rfc9000>.

   [RFC9001]  Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
              QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021,
              <https://www.rfc-editor.org/info/rfc9001>.

   [RFC9002]  Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection
              and Congestion Control", RFC 9002, DOI 10.17487/RFC9002,
              May 2021, <https://www.rfc-editor.org/info/rfc9002>.

   [SFRAME]   "Secure Media Frames Working Group (Home Page)", n.d.

   [SRT]      Sharabayko, M., "Secure Reliable Transport (SRT) Protocol
              Overview", 15 April 2020.

   [tsvarea-105]
              "TSVAREA Minutes - IETF 105", July 2019.

Authors' Addresses

   Jake Holland
   Akamai Technologies, Inc.
   150 Broadway
   Cambridge, MA 02144
   United States of America

   Email: jakeholland.net@gmail.com

   Ali Begen
   Networked Media
   Turkey

   Email: ali.begen@networked.media

   Spencer Dawkins
   Tencent America LLC
   United States of America

   Email: spencerdawkins.ietf@gmail.com