Independent Submission                                           L. Dong
Internet-Draft                                              K. Makhijani
Intended status: Informational                                     R. Li
Expires: 23 April 2022                       Futurewei Technologies Inc.
                                                         20 October 2021

  A Use Case of Packets' Significance Difference with Media Scalability
             draft-dong-usecase-packet-significance-diff-01

Abstract

   This document introduces a use case of packets' significance difference embedded with media scalability. With the dominance of video traffic on the Internet, selectively dropping packets or parts of packets from competing media streams becomes a complementary mechanism when dealing with network congestion.

   The document describes the characteristics of media scalability and some limitations of existing end-to-end congestion control mechanisms based on rate control and adaptation, and explains why the current practice of dropping entire packets at the traffic class level using in-network active queue management is not the most appropriate way to meet end users' Quality of Service expectations. The document identifies that a "significance difference" exists among packets, or even among parts of the packets within a flow, and brings out a new set of requirements for applications and networks to support packet significance difference and thereby improve the Quality of Experience of end users.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
   Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terms and Abbreviations
   3. Media Scalability and Congestion Control
   4. Packet Dropping
   5. Significance Difference Among Packets and Within Packets
   6. New Requirements
   7. IANA Considerations
   8. Security Considerations
   9. Acknowledgements
   10. Informative References
   Authors' Addresses

1. Introduction

   Recent studies [CiscoNetworkingIndex] forecast that IP video traffic would be 82 percent of all consumer Internet traffic globally by 2021, up from 73 percent in 2016. Live video was forecast to grow 15-fold from 2016 to 2021 and to account for 13 percent of Internet video traffic by 2021. VR (Virtual Reality) and AR (Augmented Reality) traffic was forecast to increase 20-fold between 2016 and 2021, at a CAGR (Compound Annual Growth Rate) of 82 percent. With the rapid growth of multimedia streaming traffic, it is increasingly likely that multiple streaming flows share a bottleneck link, which would inevitably cause network congestion. Today's transport protocols and Internet protocols are oblivious to multimedia streaming applications and to end users' QoE (Quality of Experience) expectations. From the perspective of user experience and user expectation, the following two observations can be made.

   *  A user may well prefer to receive the media content at a somewhat degraded quality that is still above the tolerance threshold rather than receive nothing at all for a few seconds.

   *  A user may be particularly interested in certain groups of blocks belonging to the objects of interest in the media content (i.e., the Region of Interest, RoI). It is necessary to prevent the RoI blocks from being lost during transmission.

   This document first discusses the different types of scalability in current video codecs, which facilitate the rate control and adaptation mechanisms applied to video segments when dealing with network congestion during media streaming. It is acknowledged that such mechanisms have effectively improved users' QoE. However, packets on the wire can still be dropped entirely when bottleneck network nodes cannot retain them because of buffer overflow during congestion. Because of the scalability characteristics designed into the video codecs, the importance or significance of different packets within a media streaming flow, or even of different parts of a single packet, can vary in their usefulness for decoding and recovering the media content to meet the receiver's expectation. The document highlights the requirement of making the network aware of the user's preferences and the application context to help further improve the QoE of media streaming. Accordingly, the network could treat the packets or different parts of the packets according to the characteristics of the packets and end users' preferences.

2. Terms and Abbreviations

   The terms and abbreviations used in this document are listed below.

   *  AR: Augmented Reality

   *  CAGR: Compound Annual Growth Rate

   *  DASH: Dynamic Adaptive Streaming over HTTP

   *  GOP: Group of Pictures

   *  HAS: HTTP Adaptive Streaming

   *  HTTP: Hypertext Transfer Protocol

   *  QoE: Quality of Experience

   *  QoS: Quality of Service

   *  SNR: Signal-to-Noise Ratio

   *  SVC: Scalable Video Coding

   *  VR: Virtual Reality

   The above terminology is defined in greater detail in the remainder of this document.

3. Media Scalability and Congestion Control

   A visual scene is represented in digital form by sampling the real scene spatially on a rectangular grid in the video image plane and sampling temporally at regular time intervals as a sequence of still frames. Correspondingly, modern media codecs [Conklin2001] [Kim2001] incorporate three types of scalability: temporal scalability, spatial scalability, and quality scalability. These adapt the media bitstream by adding portions to it or removing portions from it in order to match the different needs or preferences of end users as well as the network conditions.

   Temporal scalability allows the frame rate of the video bitstream to be varied using interlayer prediction. Spatial scalability represents the spatial resolution variations with respect to the original image frame. The lower layer provides the basic spatial resolution. The enhancement layer employs the spatially interpolated lower layers and reconstructs the source video at its full spatial resolution. Quality scalability is also commonly referred to as fidelity or SNR (Signal-to-Noise Ratio) scalability. Each spatial layer could have many quality layers. For example, SVC (Scalable Video Coding) [SVC] is an H.264 [H.264] extension that divides a single video bitstream into multiple representations or layers.
   This hierarchical layered structure comprises a base layer and one or more enhancement layers. The media may be scaled up by adding the enhancement layer(s) or scaled down by dropping the enhancement layer(s). The levels of scalability included in the media stream affect the quality of media presented on the end users' devices.

   Bursty loss and longer-than-expected delay have a catastrophic effect on end users' QoE in media streaming. They are usually caused by network congestion. Many congestion control mechanisms have been developed in the community over the decades [Saadi2019] [Adams2013], but they often target different goals, e.g., link utilization improvement, loss reduction, or fairness enhancement. For media streaming, rate control and media adaptation methods can often minimize the possibility of network congestion by leveraging the flexibility and variety of media qualities provided by the different types of media scalability.

   Existing rate control and adaptation methods [Bentaleb2019] [Wu2001] can be source-side or receiver-side, carried out at servers and end devices, respectively.

   *  In source-based schemes [Wu2000], the source regulates the sending rate to keep the packet loss ratio below a threshold by using feedback from probing experiments, or the source determines the sending rate through a TCP-friendly model. However, some constraints exist: media codecs can usually adjust their output rates only in a much more coarse-grained fashion than, for example, TCP. Users' QoE would also suffer if encoding rates are switched too frequently.

   *  HTTP (Hypertext Transfer Protocol)-based dynamic video adaptation methods [Kua2017] can be driven by the source.
      The server collects feedback from the network and the client (e.g., dynamic variation of network bandwidth and the receiving buffer capacity of the client) and adapts the quality of the streamed video accordingly. On the other hand, adaptation techniques have also been proposed at the receiver side, which mainly use DASH (Dynamic Adaptive Streaming over HTTP) [MPEG-DASH-SAND] [MPEG-DASH] and HAS (HTTP Adaptive Streaming) to stream the adapted video data.

   *  Receiver-based rate control [McCanne1996] is typically used in multicasting scalable media content, which is split into multiple layers, with each layer corresponding to one channel in the multicast tree. Receivers can regulate their own receiving rates by adding or dropping channels. Receiver-based rate control therefore has limited use in unicast. All these techniques consider full quality while streaming from sender to receivers; hence, they consume more resources in the network.

4. Packet Dropping

   While acknowledging the benefits offered by various congestion control and congestion avoidance mechanisms, we would like to point out that feedback and rate adaptation might not be prompt enough to prevent packets from being dropped on the wire.

   In the current Internet, a packet is treated as the minimal, independent, and self-sufficient unit that gets classified, forwarded, or dropped completely by a network node, according to the local configuration and congestion condition. Although congestion discard can be mitigated by a mixture of ingress traffic shaping and active queue management mechanisms [Thiruchelvi2008] [Adams2013] that avoid overdrawing network resources, this approach is not feasible to deploy on a large scale and wastes network resources by provisioning for the worst possible scenario.

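   As a rough illustration of the whole-packet treatment described above, the following Python sketch models a drop-tail buffer at a bottleneck node. The class name DropTailQueue, the Packet fields, and the capacity value are assumptions made for this example; they do not come from this document or from any particular router implementation. The sketch shows only that the node decides on buffer occupancy alone, so a packet is either kept whole or discarded whole, regardless of what its payload carries.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: str
    size: int          # packet size in bytes
    payload_hint: str  # what the payload carries; opaque to the queue


class DropTailQueue:
    """Hypothetical bottleneck buffer that drops whole packets on overflow."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.buffer = deque()
        self.dropped = []

    def enqueue(self, pkt: Packet) -> bool:
        # The decision uses buffer occupancy only; payload_hint is never read.
        if self.used_bytes + pkt.size > self.capacity_bytes:
            self.dropped.append(pkt)  # the entire packet is discarded
            return False
        self.buffer.append(pkt)
        self.used_bytes += pkt.size
        return True


# Three packets from two competing flows arrive at a 3000-byte buffer.
queue = DropTailQueue(capacity_bytes=3000)
arrivals = [
    Packet("flow-a", 1500, "enhancement-layer data"),
    Packet("flow-b", 1500, "enhancement-layer data"),
    Packet("flow-a", 1500, "base-layer data"),
]
results = [queue.enqueue(p) for p in arrivals]
```

   In this run the third packet is the one discarded, simply because it arrived when the buffer was already full, even though it is labeled here as carrying base-layer data.
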
   DiffServ [RFC2475] is used to manage resources such as bandwidth and queuing buffers on a per-hop basis between different classes of traffic. Internet traffic may be separated into different classes with differentiated priorities. This allows preferential treatment for latency- or loss-sensitive traffic over more tolerant applications, for example those that can afford retransmission. However, with video traffic dominating Internet traffic, flows of media streaming applications in the same class still compete for network resources at bottleneck links during network congestion. Preference decided by traffic class would therefore not be effective in eliminating the possibility of degraded service levels or packet drops caused by these flows colliding with each other.

   The routers treat every bit/byte in the packet payload equally, which means every bit/byte has the same significance to the routers. Each to-be-dropped packet is discarded completely. If the transport layer protocol is TCP, after a timeout or after duplicate acknowledgements are received at the sender, the sender may retry sending the dropped packet until the maximum number of retransmissions is reached. Retransmission of packets wastes network resources, reduces the overall throughput of the connection, and causes longer latency for packet delivery. The study [RFC8836] has shown that a loss rate of 1% is tolerable to users while a loss rate of 3% is intolerable to most users, who found the quality to be annoying (or worse), according to subjective opinions of the effects of packet loss on media quality. Therefore, the current way of handling network congestion by discarding the packet entirely and retransmitting packets without regard to application context is not very suitable for media streaming.

5. Significance Difference Among Packets and Within Packets

   With the various types of scalability implemented in the media codec, some bits of an encoded media stream are more important than others. Bits belonging to the base layer are usually more significant to the decoder than bits belonging to enhancement layers. For example, I-frames hold complete picture data [Orosz2015] and are frequently referenced by the subsequent frames. An I-frame is inserted by the encoder when the scene changes. Losing the first I-frame in a GOP (Group of Pictures) could cause the video picture to go missing for a few seconds, because the P- and B-frames referencing the I-frame could be neither decoded nor displayed. Thus, I-frames are the most essential frames in the media stream: they have the greatest effect on perceived video quality, and that effect can last through the whole GOP. P- and B-frames are inserted at appropriate places to reduce the video size or bitrate and are tuned to maintain a certain video quality level. P-frame stands for Predicted Frame and allows macroblocks to be compressed using temporal prediction in addition to spatial prediction. A P-frame might be referenced by a P-frame after it, or by a B-frame before or after it. B-frame stands for bi-directional frame, which can be predicted using backward prediction and forward prediction. A B-frame can act as a reference, and if so, it is termed a reference B-frame. If a B-frame is not used as a reference, it is called a non-reference B-frame. Video scenes with a low level of movement are less sensitive to both B-frame and P-frame packet loss, whereas video scenes with a high level of movement are more sensitive to both. A lost P-frame can impact the remaining part of the GOP. A lost B-frame has only local effects in slowly moving content or in content with a large static background.
   In a scene with dynamically moving content, losing a B-frame has a more dramatic impact, and its scale can be as far-reaching as that of a P-frame loss.

   As another example, macroblocks that are identified as representing the objects in the RoI are likely more important than macroblocks of non-RoI regions. Packets carrying RoI macroblocks in the media stream need to have higher priority to be retained than packets carrying non-RoI macroblocks.

   According to the characteristics of the frames contained in the video packet payload (namely, frame type, whether the frames are referenced by other frames, the movement level of the pictures, whether the picture contained in the packet belongs to the RoI, etc.), a significance difference can exist among packets with respect to video decoding at the receiver side and the QoE improvement of end users. The dropping priority could be implemented at the packet level in the network.

   On the other hand, suppose that end users can reveal their preferences to the network, e.g., the degree of tolerance to quality degradation of the decoded media content, which might show visually as resolution reduction or missing objects in non-RoI regions. The network could then selectively drop packets in a differentiated manner according to such information. This avoids retransmission or delay of the packets with higher significance, reduces the end-to-end latency experienced by end users, and maintains continuous streaming of the media. This is achieved at the cost of dropping lower-significance packets.

6. New Requirements

   We have discussed in the previous sections that, due to the various types of scalability implemented in the media codecs, a "significance difference" exists among packets or even among parts of the packets.
   In other words, some packets containing the more important macroblocks (e.g., RoI macroblocks, base layer macroblocks) show higher significance than other packets for media decoding at the receiver side and for improving the QoE of end users. In order for the network to be able to treat the packets of media streams in a differentiated manner and at a finer granularity than DiffServ, the application shall reveal some information to the network to enable selective packet dropping or partial packet dropping. For example, an API could be implemented to accept such information or metadata from the application, which might be mapped to an IPv6 extension header, IPv4 options, or a dedicated metadata field in the IP header. Some examples of such information or metadata are listed below:

   *  The receiving end user's preference on media quality, e.g., tolerable quality degradation with regard to, for example, resolution.

   *  Characteristics of the media content contained in the packets, e.g., frame type, whether the packet contains frames that are referenced by other frames, and the movement level of the video sample contained in the packet.

   *  Labeling of the packets, or of some parts of the packets, that correspond to the receiver's objects of interest as RoI.

   Correspondingly, the network shall be able to leverage the above information revealed by the application and, when network congestion happens, selectively drop packets or parts of packets from competing media streaming flows in order of precedence. Retransmission could thereby be eliminated to the greatest possible extent. The receiving end user is able to consume as many of the delivered packets as possible in time and with acceptable quality.

7. IANA Considerations

   This document requires no actions from IANA.

8. Security Considerations

   This document introduces no new security issues.

9. Acknowledgements

10. Informative References

   [Adams2013]
      Adams, R., "Active Queue Management: A Survey", IEEE Communications Surveys and Tutorials, vol. 15, no. 3, pp. 1425-1476, 2013.

   [Bentaleb2019]
      Bentaleb, A., Taani, B., Begen, A. C., Timmerer, C., and R. Zimmermann, "A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP", IEEE Communications Surveys and Tutorials, vol. 21, no. 1, pp. 562-585, 2019.

   [CiscoNetworkingIndex]
      Cisco, "Cisco Visual Networking Index: Forecast and Methodology, 2016 to 2021", June 2017.

   [Conklin2001]
      Conklin, G. J., Greenbaum, G. S., Lillevold, K. O., Lippman, A. F., and Y. A. Reznik, "Video Coding for Streaming Media Delivery on the Internet", IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 269-281, 2001.

   [H.264]
      ITU-T, "H.264 : Advanced Video Coding for Generic Audiovisual Services", 2019.

   [Kim2001]
      Kim, T., "Scalable video Streaming Over Internet", Ph.D. Thesis, School of Electrical and Computer Engineering, Georgia Institute of Technology, January 2005.

   [Kua2017]
      Kua, J., Armitage, G., and P. Branch, "A Survey of Rate Adaptation Techniques for Dynamic Adaptive Streaming Over HTTP", IEEE Communications Surveys and Tutorials, vol. 19, no. 3, pp. 1842-1866, 2017.

   [McCanne1996]
      McCanne, S., Jacobson, V., and M. Vetterli, "Receiver-Driven Layered Multicast", ACM Sigcomm, pp. 117-130, 1996.

   [MPEG-DASH]
      ISO/IEC, "23009-1:2019, Dynamic Adaptive Streaming over HTTP (DASH) - Part 1: Media Presentation Description and Segment Formats", 2019.

   [MPEG-DASH-SAND]
      ISO/IEC, "23009-5:2017, Dynamic Adaptive Streaming over HTTP (DASH) - Part 5: Server and Network Assisted DASH (SAND)", February 2017.

   [Orosz2015]
      Orosz, P., Skopko, T., and P.
      Varga, "Towards Estimating Video QoE Based on Frame Loss Statistics of the Video Streams", DOI: 10.1109/INM.2015.7140482, IFIP/IEEE International Symposium on Integrated Network Management (IM), pp. 1282-1285, 2015.

   [RFC2475]
      Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

   [RFC8836]
      Jesup, R. and Z. Sarker, "Congestion Control Requirements for Interactive Real-Time Media", RFC 8836, January 2021.

   [Saadi2019]
      Al-Saadi, R., Armitage, G., But, J., and P. Branch, "A Survey of Delay-Based and Hybrid TCP Congestion Control Algorithms", IEEE Communications Surveys and Tutorials, vol. 21, no. 4, pp. 3609-3638, 2019.

   [SVC]
      Schwarz, H., Marpe, D., and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103-1120, 2007.

   [Thiruchelvi2008]
      Thiruchelvi, G. and J. Raja, "A Survey On Active Queue Management Mechanisms", International Journal of Computer Science and Network Security, vol. 8, 2008.

   [Wu2000]
      Wu, D., Hou, Y., and Y. Zhang, "Transporting Real-Time Video Over the Internet: Challenges and approaches", Proceedings of the IEEE, vol. 88, no. 12, pp. 1855-1875, 2000.

   [Wu2001]
      Wu, D., Hou, Y., Zhu, W., Zhang, Y., and J. Peha, "Streaming Video Over the Internet: Approaches and Directions", IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 282-300, 2001.

Authors' Addresses

   Lijun Dong
   Futurewei Technologies Inc.

   Email: lijun.dong@futurewei.com

   Kiran Makhijani
   Futurewei Technologies Inc.

   Email: kiran.ietf@gmail.com

   Richard Li
   Futurewei Technologies Inc.

   Email: richard.li@futurewei.com