idnits 2.17.00 (12 Aug 2021) /tmp/idnits35918/draft-lshuo-avt-rtp-avsp2-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 2120. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2091. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2098. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2104. ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1, updated by RFC 4748 (on line 2108), which is fine, but *also* found old RFC 3978, Section 5.4, paragraph 1 text on line 35. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 9 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 428 has weird spacing: '... Type payl...' == Line 463 has weird spacing: '... Type stru...' 
== Line 635 has weird spacing: '... Type struc...' == Line 1553 has weird spacing: '...am that the o...' == Line 1966 has weird spacing: '...its are remov...' == (1 more instance...) -- The exact meaning of the all-uppercase expression 'MAY NOT' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. == The expression 'MAY NOT', while looking like RFC 2119 requirements text, is not defined in RFC 2119, and should not be used. Consider using 'MUST NOT' instead (if that is what you mean). Found 'MAY NOT' in this paragraph: Some parameters provide a receiver with the properties of the stream that will be sent. The name of all these parameters starts with "sprop" for stream properties. Some of these "sprop" parameters are limited by other payload or codec configuration parameters. The media sender selects all "sprop" parameters rather than the receiver. This uncommon characteristic of the "sprop" parameters MAY NOT be compatible with some signaling protocol concepts, in which case the use of these parameters SHOULD be avoided. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (August 2007) is 5392 days in the past. Is this intentional? 
-- Found something which looks like a code comment -- if you have code
   sections in the document, please surround them with '<CODE BEGINS>' and
   '<CODE ENDS>' lines.

Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

   (See RFCs 3967 and 4897 for information about using normative references
   to lower-maturity documents in RFCs)

== Missing Reference: '10' is mentioned on line 247, but not defined
== Missing Reference: '11' is mentioned on line 1712, but not defined
== Missing Reference: '13' is mentioned on line 1797, but not defined
== Missing Reference: '14' is mentioned on line 1799, but not defined
-- Possible downref: Non-RFC (?) normative reference: ref. '1'
** Obsolete normative reference: RFC 2327 (ref. '4') (Obsoleted by RFC 4566)
** Obsolete normative reference: RFC 4288 (ref. '5') (Obsoleted by RFC 6838)
** Obsolete normative reference: RFC 3555 (ref. '6') (Obsoleted by RFC 4855,
   RFC 4856)
** Obsolete normative reference: RFC 3548 (ref. '7') (Obsoleted by RFC 4648)

Summary: 7 errors (**), 0 flaws (~~), 15 warnings (==), 11 comments (--).

Run idnits with the --verbose option for more detailed information about
the items above.

--------------------------------------------------------------------------------

2 Internet Engineering Task Force                               L. Huo
3 Internet Draft                                     Peking University
4 Document: draft-lshuo-avt-rtp-avsp2-00.txt                   L. Wang
5 Expires: February 2008                          Beijing Univ. of P&T
6                                                          August 2007

8                 RTP Payload Format for AVS-P2 Video

10 Status of this Memo

12 By submitting this Internet-Draft, each author represents that any
13 applicable patent or other IPR claims of which he or she is aware
14 have been or will be disclosed, and any of which he or she becomes
15 aware will be disclosed, in accordance with Section 6 of BCP 79.

17 Internet-Drafts are working documents of the Internet Engineering
18 Task Force (IETF), its areas, and its working groups.
Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Copyright Notice 35 Copyright (C) The Internet Society (2007). 37 Abstract 39 This memo specifies an RTP payload format for encapsulating AVS-P2 40 compressed video bit streams, as defined by the Audio Video Coding 41 Standard Workgroup of China (AVS Workgroup). The payload format has 42 wide applicability, as it supports applications from simple low 43 bit-rate conversational usage, to Internet video streaming with 44 interleaved transmission, to high bit-rate video-on-demand. 46 Table of Contents 48 1. Introduction...................................................2 49 1.1 Overview of AVS-P2 Video Codec.............................2 50 1.2 Conventions used in this document..........................4 52 2. Scope..........................................................4 53 3. Definitions and Abbreviations..................................4 54 3.1 Definitions................................................4 55 3.2 Abbreviation...............................................5 56 4. NAL Unit.......................................................5 57 5. 
RTP Payload Format.............................................7 58 5.1 RTP Header Usage...........................................7 59 5.2 Common Structure of the RTP Payload Format.................8 60 5.3 Packetization Modes........................................9 61 5.4 Decoding Order Number.....................................10 62 5.5 Single NAL Unit Packet....................................11 63 5.6 Aggregation Packets.......................................12 64 5.7 Fragmentation Units (FUs).................................18 65 6. Packetization Rules............ ..............................21 66 6.1 Common Packetization Rules................................21 67 6.2 Single NAL Unit Mode......................................21 68 6.3 Non-Interleaved Mode......................................22 69 6.4 Interleaved Mode..........................................22 70 7. Payload Format Parameters.....................................22 71 7.1 Media Type Registration...................................22 72 7.2 SDP Parameters............................................29 73 7.3 Considerations for Sequence Header........................34 74 8. Security Considerations.......................................35 75 9. Congestion Control............................................36 76 10. IANA Considerations..........................................36 77 11. De-Packetization Process (Informative).......................36 78 11.1. Single NAL Unit and Non-Interleaved Mode................37 79 11.2. Interleaved Mode........................................37 80 11.3. Additional De-Packetization Guidelines..................39 81 12. References...................................................39 82 12.1 Normative references.....................................39 83 12.2 Informative references...................................40 85 1. 
Introduction

87 1.1 Overview of AVS-P2 Video Codec

89 This memo specifies an RTP payload format for the video
90 coding standard known as AVS-P2. The official name for AVS-P2 is
91 "Information Technology - Advanced Audio and Video Coding Part 2:
92 Video", which was defined by the Audio Video Coding Standard
93 Workgroup of China (AVS Workgroup), and approved as
94 GB/T 20090.2-2006 by the Standardization Administration of China
95 and enacted on March 1, 2006 [1]. In this memo the AVS-P2 acronym
96 is used for the codec and the standard.

98 The AVS-P2 video codec has a very broad application range that
99 covers all forms of digital compressed video, from low bit-rate
100 Internet streaming applications to HDTV broadcast and Digital Cinema
101 applications with nearly lossless coding. The overall performance of
102 AVS-P2 is such that bit rate savings of more than 50% are reported
103 when compared against MPEG-2. AVS-P2 has compression performance
104 comparable to that of H.264/AVC, with the valuable feature
105 of lower computational complexity [9]. AVS-P2 has been adopted by a
106 number of applications including Chinese IPTV operators, Mobile TV
107 operators as well as digital terrestrial TV broadcasting operators.

109 The AVS-P2 specification [1] defines the AVS-P2 bit stream syntax and
110 specifies constraints that must be met by AVS-P2 conformant bit
111 streams. It also specifies the complete process required to decode
112 the bit stream. However, it does not specify the AVS-P2 compression
113 algorithm, thus allowing for different ways to implement an AVS-P2
114 encoder.

116 AVS-P2 is a hybrid coding scheme based on spatial and temporal
117 prediction, 8x8 transform and entropy coding. It has one profile,
118 called the Jizhun profile. This profile has four levels: 4.0,
119 4.2, 6.0 and 6.2.

121 The AVS-P2 bit stream is defined as a hierarchy of layers.
This is
122 conceptually similar to the notion of a protocol stack of networking
123 protocols. The outermost layer is called the video sequence layer.
124 The other layers are picture, slice, macroblock and block. A video
125 sequence begins with a sequence header, followed by a series of one
126 or more coded pictures. Each picture begins with a picture header,
127 followed by a series of one or more slices. A slice comprises one
128 or more contiguous rows of macroblocks. Each macroblock consists of
129 one 16x16 luma block and two 8x8 chroma blocks for 4:2:0 format and
130 four 8x8 chroma blocks for 4:2:2.

132 AVS-P2 has intra pictures (I-pictures), forward predicted pictures
133 (P-pictures), and bi-directional predicted pictures (B-pictures). At
134 most two prediction reference pictures are used. Prediction
135 operates at integer-pel and quarter-pel
136 resolution. It uses a 4-tap filter for half-pel interpolation and a
137 4-tap filter for quarter-pel interpolation. It uses in-loop
138 deblocking. It uses two-dimensional context-adaptive variable length
139 coding with 19 look-up tables. It uses 8x8 intra prediction
140 from the upper row and left column pels, with five prediction
141 angles.

143 Each picture can be coded as an I-picture, P-picture, or B-picture.
144 Random access points are defined in AVS-P2. The sequence header
145 can occur repeatedly in the AVS-P2 video bit stream before any
146 random access point.

148 In the Jizhun profile, each sequence header, picture header and slice
149 is considered a Coding Data Unit (CDU). A CDU is always byte-aligned
150 and is defined as a unit that can be parsed (i.e., syntax decoded)
151 independently of other information in the same layer. The beginning
152 of each CDU is signaled by an identifier called Start Code.
153 Macroblocks and blocks are not CDUs and thus do not have a Start
154 Code and are not necessarily byte-aligned.

156 The Start Code consists of four bytes.
The first three bytes are the
157 Start Code Prefix with the fixed value of 0x000001. The fourth byte
158 is called the Start Code Value and it is used to indicate the type of
159 the CDU that follows the Start Code. The Start Code is always
160 byte-aligned and is transmitted in network byte order. To prevent
161 accidental emulation of the Start Code in the coded bit stream,
162 AVS-P2 defines an encapsulation mechanism that uses byte stuffing.
163 There are also other types of CDUs defined in AVS-P2. See Table 1
164 (in Section 7) of the AVS-P2 specification [1] for a complete list of
165 CDUs and their corresponding Start Code Values.

167 1.2 Conventions used in this document

169 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
170 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
171 document are to be interpreted as described in BCP 14, RFC 2119 [2].

173 2. Scope
174 The applications of this memo include video telephony, video
175 conferencing, Internet media streaming, IPTV, video-on-demand, etc.

177 3. Definitions and Abbreviations

179 This memo uses definitions and abbreviations of AVS-P2 [1].
180 Additionally, the following definitions and abbreviations also
181 apply to this specification.

183 3.1 Definitions

185 NAL unit (Network Abstraction Layer Unit)
186    Before packetizing an AVS-P2 video bit stream using the RTP
187    Payload Format defined in this memo, the bit stream MUST first
188    be transformed into a NAL unit stream, i.e., mapping the CDU data
189    between every two consecutive Start Code prefixes (0x000001) in
190    the AVS-P2 video bit stream into a NAL unit. The detailed
191    definition of the NAL unit and the transformation from AVS-P2 video
192    bitstream into NAL unit stream are described in Section 4 of this
193    memo.

195 NAL unit stream
196    A sequence composed of one or more NAL units.

198 NAL unit decoding order
199    The order of NALUs in a NAL unit stream.
201 Decoding order number (DON)
202    A field in the RTP payload structure or a derived variable
203    indicating NAL unit decoding order. Values of DON are in the
204    range of 0 to 65535, inclusive. After reaching the maximum value,
205    the value of DON wraps around to 0.

207 Transmission order
208    The order of packets in ascending RTP sequence number order (in
209    modulo arithmetic). Within an aggregation packet, the NAL unit
210    transmission order is the same as the order of appearance of NAL
211    units in the packet.

213 Access Unit
214    A series of NAL units that constitute one coded picture (frame).
215    Besides picture header and slice data, an access unit can also
216    contain other types of coding data. All data within the same
217    access unit MUST have the same timestamp value for RTP
218    packetization.

220 Media Aware Network Element (MANE)
221    A network element, such as a middlebox or application layer
222    gateway that is capable of parsing certain aspects of the RTP
223    payload headers or the RTP payload and reacting to the contents.

225 3.2 Abbreviation

227 DON:    Decoding Order Number
228 DONB:   Decoding Order Number Base
229 DOND:   Decoding Order Number Difference
230 FEC:    Forward Error Correction
231 FU:     Fragmentation Unit
232 MTAP:   Multi-Time Aggregation Packet
233 MTAP16: MTAP with 16-bit timestamp offset
234 MTAP24: MTAP with 24-bit timestamp offset
235 MTU:    Maximum Transfer Unit
236 NAL:    Network Abstraction Layer
237 NALU:   NAL Unit
238 PSI:    Payload Structure Indicator
239 RTP:    Real-time Transport Protocol
240 STAP:   Single-Time Aggregation Packet
241 STAP-A: STAP type A
242 STAP-B: STAP type B

244 4. NAL Unit

246 The syntax of the NAL unit used in this memo resembles the one defined
247 for H.264/AVC in IETF RFC 3984 [10]. A NAL unit is composed of two
248 parts: NAL unit header and NAL unit data. The NAL unit header
249 consists of exactly one byte, while the NAL unit data consists of a
250 series of one or more bytes.
252 The conversion process from AVS-P2 video bit stream to NAL unit
253 stream is as follows: first map the CDU data between every two
254 consecutive Start Code Prefixes (0x000001) in the AVS-P2 video bit
255 stream (including the first Start Code Value but excluding Start
256 Code Prefixes) into the NAL unit data, and then insert a NAL unit
257 header before the NAL unit data according to the Start Code Value and
258 its context.

260 The format of the NAL unit header is shown in Figure 1.

262 +---------------+
263 |0|1|2|3|4|5|6|7|
264 +-+-+-+-+-+-+-+-+
265 |F|NRI|  Type   |
266 +---------------+

268 Figure 1. NAL unit header format

270 The syntax and semantics of the NAL unit header are as follows:

272 F: 1 bit
273    Forbidden zero bit; its value SHOULD be 0.

275 NRI: 2 bits
276    NAL Reference Identification (nal_ref_idc). A non-zero value
277    means that the data contained in this NAL unit is sequence header
278    or reference frame data. Value of 0 means that the data contained
279    in this NAL unit is not reference frame data. For sequence header
280    NAL unit, nal_ref_idc SHOULD NOT be 0. For a certain frame, if
281    nal_ref_idc of one NAL unit is 0, then nal_ref_idc of all NAL
282    units in the same frame SHOULD be 0. The nal_ref_idc of NAL units
283    for I frames SHOULD NOT be 0.

285 Type: 5 bits
286    NAL unit type (nal_unit_type). The value of this field is decided
287    according to the start code value and the picture header
288    contained in the following NAL unit data, and their context, as
289    shown in Table 1.

291 Table 1.
Value assignment for the NAL unit type field in NAL
292 unit header

294 NALU   Corresponding      Reason for type assignment
295 type   CDU in NALU data
296 -----------------------------------------------------------------
297 0      reserved
298 1      Sequence header    Start code value is 0xB0
299 2      Video extension    Start code value is 0xB5
300 3      User data          Start code value is 0xB2
301 4      Video edit         Start code value is 0xB7
302 5      Picture header     Start code value is 0xB3
303        of I frame
304 6      Picture header     Start code value is 0xB6, and the
305        of P frame         picture coding type of the picture
306                           header is 01b
307 7      Picture header     Start code value is 0xB6, and the
308        of B frame         picture coding type of the picture
309                           header is 10b
310 8      Slice data         Start code value is 0x00~0xAF, and the
311        of I frame         start code value of the last picture
312                           header before this NALU is 0xB3
313 9      Slice data         Start code value is 0x00~0xAF, and the
314        of P frame         start code value of the last picture
315                           header before this NALU is 0xB6, and
316                           the picture coding type of the picture
317                           header is 01b
318 10     Slice data         Start code value is 0x00~0xAF, and the
319        of B frame         start code value of the last picture
320                           header before this NALU is 0xB6, and
321                           the picture coding type of the picture
322                           header is 10b
323 11-23  reserved
324 24-31  undefined

326 When the decoder receives NAL unit stream, before decoding, it MUST
327 discard every NAL unit header of an NAL unit, and then insert a
328 Start Code Prefix (0x000001) in the same position to transform the
329 NAL unit stream back into an AVS-P2 video bit stream.

331 5. RTP Payload Format

333 5.1 RTP Header Usage

335 The RTP header format is defined in IETF RFC 3550 [3] as shown in
336 Figure 2.
338  0                   1                   2                   3
339  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
340 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
341 |V=2|P|X|  CC   |M|     PT      |       sequence number         |
342 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
343 |                           Timestamp                           |
344 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
345 |           synchronization source (SSRC) identifier            |
346 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
347 |            contributing source (CSRC) identifiers             |
348 |                             ....                              |
349 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

351 Figure 2. RTP header according to RFC 3550

353 For the usage of the RTP header, this document follows the rules
354 defined in RFC 3550, except for some enhancements for the M and
355 Timestamp fields:

357 Marker bit (M): 1 bit
358    Set for the very last packet of the access unit indicated by the
359    RTP timestamp, in line with the normal use of the M bit in video
360    formats, to allow an efficient playout buffer handling. For
361    aggregation packets (STAP and MTAP), the marker bit in the RTP
362    header MUST be set to the value that the marker bit of the last
363    NAL unit of the aggregation packet would have been if it were
364    transported in its own RTP packet. Decoders MAY use this bit as
365    an early indication of the last packet of an access unit, but
366    MUST NOT rely on this property.

368 Timestamp: 32 bits
369    The RTP timestamp is set to the sampling timestamp of the
370    content. A 90 kHz clock rate MUST be used, which means that the
371    time unit of RTP timestamp is 1/90000 second. If the NAL unit
372    has no timing properties of its own, such as sequence header
373    NALU, its RTP timestamp is set to the timestamp of the first
374    following coded picture after it. The setting of the RTP
375    Timestamp for MTAPs is defined in section 5.6.2.
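
The 90 kHz timestamp rule in Section 5.1 can be illustrated with a short
Python sketch. It is not part of the payload format: the function name, the
integer frame rate, and the initial_offset parameter (standing in for the
random initial timestamp that RFC 3550 recommends) are assumptions made for
this example only.

```python
RTP_CLOCK_RATE_HZ = 90_000  # clock rate required by Section 5.1


def rtp_timestamp(frame_index: int, frame_rate: int, initial_offset: int = 0) -> int:
    """Return the 32-bit RTP timestamp of the picture at frame_index.

    Assumes a constant integer frame rate (hypothetical simplification).
    Every NAL unit of the same access unit gets this same value; a
    sequence header NALU would reuse the value of the next coded picture.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    ticks = (frame_index * RTP_CLOCK_RATE_HZ) // frame_rate
    # The RTP timestamp field is 32 bits wide, so it wraps modulo 2**32.
    return (initial_offset + ticks) & 0xFFFFFFFF
```

At 25 frames per second, consecutive pictures are 3600 ticks apart in this
sketch.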
377 5.2 Common structure of the RTP Payload format

379 This document defines three basic payload structures, each of
380 which may be further divided into sub-types.

382 Single NALU packet
383    Contains one and only one NAL unit in a single RTP payload.

385 Aggregation packet
386    Multiple NAL units are aggregated into a single RTP payload.
387    This packet exists in four versions, the Single-Time Aggregation
388    Packet type A (STAP-A), the Single-Time Aggregation Packet type B
389    (STAP-B), Multi-Time Aggregation Packet (MTAP) with 16-bit offset
390    (MTAP16), and Multi-Time Aggregation Packet (MTAP) with 24-bit
391    offset (MTAP24).

393 Fragmentation Units (FUs)
394    Used to fragment a single NAL unit over multiple RTP packets.
395    Exists in two versions, FU-A and FU-B.

397 Different payload structures are identified by the first byte of the
398 RTP payload, which is called PSI (payload structure indication). PSI
399 has the same format as the NALU header, as shown in Figure 3.

401 +---------------+
402 |0|1|2|3|4|5|6|7|
403 +-+-+-+-+-+-+-+-+
404 |F|NRI|  Type   |
405 +---------------+

407 Figure 3: Format of PSI

409 F: 1 bit
410    Value 0 means that there are no bit errors and other semantic
411    errors in the RTP payload. Value 1 means that there may be bit
412    errors and other semantic errors in the RTP payload. If a bit
413    error is found in the RTP payload, a MANE SHOULD set F to 1.

415 NRI: 2 bits
416    In addition to the rules defined in Section 4 for NALUs, the NRI
417    value in PSI indicates the relative transmission priority of the
418    RTP packet. A MANE can use this information to protect more
419    important RTP packets. The highest priority is 11b, followed by
420    10b, and then 01b, and 00b. The NRI value for sequence header and
421    I pictures SHOULD be set to 11b.

423 Type: 5 bits
424    Indicates the RTP payload structure, as shown in Table 2.

426 Table 2.
Summary of RTP payload structure types

428 Type   Payload      Description                           Section
429        structure
430 -----------------------------------------------------------------
431 0      Undefined                                          -
432 1-23   Single NALU  Single NAL unit packet                5.5
433 24     STAP-A       Single-time aggregation packet - A    5.6.1
434 25     STAP-B       Single-time aggregation packet - B    5.6.1
435 26     MTAP16       Multi-time aggregation packet         5.6.2
436                     with 16 bit offset
437 27     MTAP24       Multi-time aggregation packet         5.6.2
438                     with 24 bit offset
439 28     FU-A         Fragmentation unit - A                5.7
440 29     FU-B         Fragmentation unit - B                5.7
441 30-31  Undefined                                          -

443 5.3 Packetization Modes

445 This payload format specifies three packetization modes:
446 Single NAL unit mode, Non-interleaved mode and Interleaved mode.
447 In the Single NAL unit mode or the non-interleaved mode, NAL units
448 are transmitted in NAL unit decoding order. The interleaved mode
449 allows transmission of NAL units out of NAL unit decoding order.

451 The packetization mode governs which payload structures are
452 allowed in RTP payloads. The packetization mode in use MAY be
453 signaled by the value of the OPTIONAL packetization-mode media type
454 parameter or by external means. Table 3 summarizes the allowed
455 payload structures for each packetization mode. Some payload
456 structure values are reserved for future extensions.

458 Table 3.
Summary of allowed payload structures for each
459 packetization mode (yes = allowed, no = disallowed,
460 ig = ignore)

462 PSI    payload     Single NALU   Non-Interleaved   Interleaved
463 Type   structure   mode          mode              mode
464 -----------------------------------------------------------------
465 0      Undefined   ig            ig                ig
466 1-23   NALU        yes           yes               no
467 24     STAP-A      no            yes               no
468 25     STAP-B      no            no                yes
469 26     MTAP16      no            no                yes
470 27     MTAP24      no            no                yes
471 28     FU-A        no            yes               yes
472 29     FU-B        no            no                yes
473 30-31  Undefined   ig            ig                ig

475 5.4 Decoding Order Number

477 In the interleaved packetization mode, the transmission order of NAL
478 units is allowed to differ from the decoding order of the NAL units.
479 Decoding order number (DON) is a field in the payload structure or a
480 derived variable that indicates the NAL unit decoding order.

482 The coupling of transmission and decoding order is controlled by the
483 OPTIONAL sprop-interleaving-depth media type parameter as follows.
484 When the value of the OPTIONAL sprop-interleaving-depth parameter is
485 equal to 0 (explicitly or per default) or transmission of NAL units
486 out of their decoding order is disallowed by external means, the
487 transmission order of NAL units MUST conform to the NAL unit
488 decoding order. When the value of the OPTIONAL
489 sprop-interleaving-depth parameter is greater than 0 or transmission
490 of NAL units out of their decoding order is allowed by external means,

492 o  the order of NAL units in an MTAP16 and an MTAP24 is NOT REQUIRED
     to be the NAL unit decoding order, and

495 o  the order of NAL units generated by decapsulating STAP-Bs, MTAPs,
496    and FUs in two consecutive packets is NOT REQUIRED to be the NAL
497    unit decoding order.

499 The RTP payload structures for a single NAL unit packet, an STAP-A,
500 and an FU-A do not include DON. STAP-B and FU-B structures include
501 DON, and the structure of MTAPs enables derivation of DON as
502 specified in section 5.6.2.
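
As an illustration of Table 3 in Section 5.3, the following Python sketch
encodes the table as a lookup that a receiver might consult before parsing a
payload. The mode labels ("single-nalu", "non-interleaved", "interleaved")
and the function name are hypothetical names chosen for this example; the
memo itself does not define them.

```python
# PSI Type values allowed in each packetization mode, copied from Table 3.
ALLOWED_PSI_TYPES = {
    "single-nalu": set(range(1, 24)),                  # single NALU packets only
    "non-interleaved": set(range(1, 24)) | {24, 28},   # plus STAP-A and FU-A
    "interleaved": {25, 26, 27, 28, 29},               # STAP-B, MTAPs, FU-A, FU-B
}

# PSI Type values marked "ig" (ignore) in every mode.
UNDEFINED_PSI_TYPES = {0, 30, 31}


def classify_psi_type(mode: str, psi_type: int) -> str:
    """Return "yes", "no" or "ig" for a PSI Type value in the given mode."""
    if not 0 <= psi_type <= 31:
        raise ValueError("PSI Type is a 5-bit field (0..31)")
    if psi_type in UNDEFINED_PSI_TYPES:
        return "ig"
    return "yes" if psi_type in ALLOWED_PSI_TYPES[mode] else "no"
```

For example, an FU-A (Type 28) is accepted in both the non-interleaved and
the interleaved mode, while an STAP-A (Type 24) is accepted only in the
non-interleaved mode.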
504 In the single NAL unit packetization mode, the transmission order of
505 NAL units, determined by the RTP sequence number, MUST be the same
506 as their NAL unit decoding order. In the non-interleaved
507 packetization mode, the transmission order of NAL units in single
508 NAL unit packets, STAP-As, and FU-As MUST be the same as their NAL
509 unit decoding order. The NAL units within an STAP MUST appear in the
510 NAL unit decoding order. Thus, the decoding order is first provided
511 through the implicit order within a STAP, and second provided
512 through the RTP sequence number for the order between STAPs, FUs,
513 and single NAL unit packets.

515 Signaling of the value of DON for NAL units carried in STAP-B, MTAP,
516 and a series of fragmentation units starting with an FU-B is
517 specified in sections 5.6.1, 5.6.2, and 5.7, respectively. The DON
518 value of the first NAL unit in transmission order may be set to any
519 value. Values of DON are in the range of 0 to 65535, inclusive.
520 After reaching the maximum value, the value of DON wraps around to 0.

522 The decoding order of two NAL units contained in any STAP-B, MTAP,
523 or a series of fragmentation units starting with an FU-B is
524 determined as follows. Let DON(i) be the decoding order number of
525 the NAL unit having index i in the transmission order. Function
526 don_diff(m,n) is specified as follows:

528    If DON(m) == DON(n), don_diff(m,n) = 0

530    If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
531    don_diff(m,n) = DON(n) - DON(m)

533    If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
534    don_diff(m,n) = 65536 - DON(m) + DON(n)

536    If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
537    don_diff(m,n) = - (DON(m) + 65536 - DON(n))

539    If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
540    don_diff(m,n) = - (DON(m) - DON(n))

542 When don_diff(m,n) is equal to 0, then the NAL unit decoding order
543 of the two NAL units can be in either order.
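
The five cases of don_diff can be written as a single function. The Python
sketch below is only an illustration: it takes the two DON values directly
rather than transmission-order indices, and the constant names are chosen
for this example.

```python
DON_MODULUS = 65536   # DON is a 16-bit value that wraps around to 0
HALF_RANGE = 32768    # differences of this size or more are read as wrap-around


def don_diff(don_m: int, don_n: int) -> int:
    """Signed decoding-order distance from DON(m) to DON(n).

    A positive result means the NAL unit with DON(n) follows the one
    with DON(m) in decoding order; a negative result means it precedes it.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < HALF_RANGE:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= HALF_RANGE:
        return DON_MODULUS - don_m + don_n
    if don_m < don_n and don_n - don_m >= HALF_RANGE:
        return -(don_m + DON_MODULUS - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < HALF_RANGE
    return -(don_m - don_n)
```

For instance, DON values 65535 and 0 are one step apart across the
wrap-around, so the function returns 1 for (65535, 0) and -1 for (0, 65535).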
A positive value of
544 don_diff(m,n) indicates that the NAL unit having transmission order
545 index n follows, in decoding order, the NAL unit having transmission
546 order index m. A negative value of don_diff(m,n) indicates that the
547 NAL unit having transmission order index n precedes, in decoding
548 order, the NAL unit having transmission order index m.

550 Values of DON related fields (DON, DONB, and DOND; see section 5.6)
551 MUST be such that the decoding order determined by the values of
552 DON, as specified above, conforms to the NAL unit decoding order.
553 If the order of two NAL units in NAL unit decoding order is switched
554 and the new order does not conform to the NAL unit decoding order,
555 the NAL units MUST NOT have the same value of DON. If the order of
556 two consecutive NAL units in the NAL unit stream is switched and the
557 new order still conforms to the NAL unit decoding order, the NAL
558 units may have the same value of DON. Consequently, NAL units having
559 the same value of DON can be decoded in any order, and two NAL units
560 having a different value of DON SHOULD be passed to the decoder in
561 the order specified above. When two consecutive NAL units in the NAL
562 unit decoding order have a different value of DON, the value of DON
563 for the second NAL unit in decoding order SHOULD be the value of DON
564 for the first, incremented by one.

566 5.5 Single NAL Unit Packet

568 The single NAL unit packet MUST contain one and only one NAL unit as
569 defined in Section 4. This means that neither an aggregation packet
570 nor a fragmentation unit can be used within a single NAL unit
571 packet. In the payload structure of single NAL unit, the first byte
572 of the RTP payload (i.e., PSI) co-serves as the NALU header which is
573 directly followed by the NALU data, as shown in Figure 4.
575  0                   1                   2                   3
576  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
577 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
578 |F|NRI|  Type   |<---PSI/NALU Header                            |
579 +-+-+-+-+-+-+-+-+                                               |
580 |                                                               |
581 |          Bytes 2..n of a Single NAL unit (NALU data)          |
582 |                                                               |
583 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
584 |                               :...OPTIONAL RTP padding        |
585 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

587 Figure 4. Single NAL Unit Packet RTP payload format

589 5.6 Aggregation Packets

591 Two types of aggregation packets are defined in this document:

593 o  Single-time aggregation packet (STAP): aggregates NAL units with
594    identical NALU-time. Two types of STAPs are defined, one without
595    DON (STAP-A) and another including DON (STAP-B).

597 o  Multi-time aggregation packet (MTAP): aggregates NAL units with
598    potentially differing NALU-time. Two different MTAPs are defined,
599    differing in the length of the NAL unit timestamp offset: MTAP16
600    and MTAP24.

602 The term NALU-time is defined as the value that the RTP timestamp
603 would have if that NAL unit would be transported in its own RTP
604 packet. Each NAL unit to be carried in an aggregation packet is
605 encapsulated in an aggregation unit. Please see below for the four
606 different aggregation units and their characteristics. Figure 5
607 shows the structure of the RTP payload format for aggregation
608 packets.

610  0                   1                   2                   3
611  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
612 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
613 |F|NRI|  Type   |<---PSI                                        |
614 +-+-+-+-+-+-+-+-+                                               |
615 |                                                               |
616 |                 one or more aggregation units                 |
617 |                                                               |
618 |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
619 |                               :...OPTIONAL RTP padding        |
620 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

622 Figure 5. RTP payload format for aggregation packets

624 The RTP timestamp MUST be set to the earliest of the NALU times of
625 all the NAL units to be aggregated.
The type field of PSI MUST be 626 set to the appropriate value, as indicated in Table 4. The F bit 627 of PSI MUST be cleared if all F bits of the aggregated NAL units 628 are zero; otherwise, it MUST be set to 1. The value of NRI in PSI 629 MUST be the maximum of all the NAL units carried in the aggregation 630 packet. 632 Table 4. Type field of PSI for STAPs and MTAPs 634 PSI Payload Timestamp offset DON related fields 635 Type structure field length£¨in bits£© £¨DON¡¢DONB¡¢DOND£© 636 present 637 ----------------------------------------------------------------- 638 24 STAP-A 0 NO 639 25 STAP-B 0 YES 640 26 MTAP16 16 YES 641 27 MTAP24 24 YES 643 The marker bit (M) in the RTP header is set to the value that the 644 marker bit of the last NAL unit of the aggregated packet would have 645 if it were transported in its own RTP packet. 647 Following PSI, the payload of an aggregation packet consists of one 648 or more aggregation units. See sections 5.6.1 and 5.6.2 for the four 649 different types of aggregation units. An aggregation packet can 650 carry as many aggregation units as necessary; however, the total 651 amount of data in an aggregation packet obviously MUST fit into an 652 IP packet, and the size SHOULD be chosen so that the resulting IP 653 packet is smaller than the MTU size. An aggregation packet MUST NOT 654 contain fragmentation units specified in section 5.7. Aggregation 655 packets MUST NOT be nested; i.e., an aggregation packet MUST NOT 656 contain another aggregation packet. 658 5.6.1 Single-Time Aggregation Packet 660 Single-time aggregation packet (STAP) SHOULD be used whenever NAL 661 units are aggregated that all share the same NALU-time. The payload 662 of an STAP-A does not include DON and consists of at least one 663 single-time aggregation unit, as presented in Figure 6. 
   The payload of an STAP-B consists of a 16-bit unsigned decoding
   order number (DON) (in network byte order) followed by at least one
   single-time aggregation unit, as presented in Figure 7.

   The DON field specifies the value of DON for the first NAL unit in
   an STAP-B in transmission order. For each successive NAL unit in
   appearance order in an STAP-B, the value of DON is equal to (the
   value of DON of the previous NAL unit in the STAP-B + 1) % 65536,
   in which '%' stands for the modulo operation.

   A single-time aggregation unit consists of 16-bit unsigned size
   information (in network byte order) that indicates the size of the
   following NAL unit in bytes (excluding these two octets, but
   including the NAL unit type octet of the NAL unit), followed by the
   NAL unit itself, including its NAL unit type byte. A single-time
   aggregation unit is byte aligned within the RTP payload, but it may
   not be aligned on a 32-bit word boundary. Figure 8 presents the
   structure of the single-time aggregation unit.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|0|0|0|                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                                                               |
   |           one or more single-time aggregation units           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6. Payload format for STAP-A

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|0|0|1|              DON              |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |           one or more single-time aggregation units           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7. Payload format for STAP-B

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :           NALU size           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                             NALU                              |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8. Structure for Single-Time aggregation unit

   Figure 9 shows an example of an RTP packet that contains an STAP-A.
   The STAP contains two single-time aggregation units, labeled as 1
   and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|0|0|0|          NALU 1 Size          | NALU 1 Header |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          NALU 1 Data                          |
   :                                                               :
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |          NALU 2 Size          | NALU 2 Header |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          NALU 2 Data                          |
   :                                                               :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 9. An example of an RTP packet including an STAP-A and two
             single-time aggregation units

   Figure 10 shows an example of an RTP packet that contains an STAP-B.
   The STAP contains two single-time aggregation units, labeled as 1
   and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|0|0|1|              DON              |  NALU 1 Size  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  NALU 1 Size  | NALU 1 Header |          NALU 1 Data          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   :                                                               :
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |          NALU 2 Size          | NALU 2 Header |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          NALU 2 Data                          |
   :                                                               :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 10. An example of an RTP packet including an STAP-B and
              two single-time aggregation units

5.6.2 Multi-Time Aggregation Packets (MTAPs)

   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
   order number base (DONB) (in network byte order) and one or more
   multi-time aggregation units, as shown in Figure 11. DONB MUST
   contain the value of DON for the first NAL unit in the NAL unit
   decoding order among the NAL units of the MTAP.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   |             DONB              |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |           one or more multi-time aggregation units            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 11. MTAP payload format

   Two different multi-time aggregation units are defined in this
   specification.
   Both of them consist of 16-bit unsigned size information of the
   following NAL unit (in network byte order), an 8-bit unsigned
   decoding order number difference (DOND), and n bits (in network byte
   order) of timestamp offset (TS offset) for this NAL unit, whereby n
   can be 16 or 24. The choice between the different MTAP types (MTAP16
   and MTAP24) is application dependent: the larger the timestamp
   offset is, the higher the flexibility of the MTAP, but the lower the
   transport efficiency.

   The structures of the multi-time aggregation units for MTAP16 and
   MTAP24 are presented in Figures 12 and 13, respectively. The
   starting or ending position of an aggregation unit within a packet
   is not required to be on a 32-bit word boundary. The DON of the
   following NAL unit is equal to (DONB + DOND) % 65536, in which %
   denotes the modulo operation. This memo does not specify how the NAL
   units within an MTAP are ordered, but, in most cases, NAL unit
   decoding order SHOULD be used.

   The timestamp offset field MUST be set as follows: If the NALU-time
   is larger than or equal to the RTP timestamp of the packet, then the
   timestamp offset equals (the NALU-time of the NAL unit - the RTP
   timestamp of the packet). If the NALU-time is smaller than the RTP
   timestamp of the packet, then the timestamp offset is equal to the
   NALU-time + (2^32 - the RTP timestamp of the packet).

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :           NALU Size           |     DOND      |   TS offset   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   TS offset   |                                               |
   +-+-+-+-+-+-+-+-+                     NALU                      |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 12. Multi-time aggregation unit for MTAP16

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :           NALU Size           |     DOND      |   TS offset   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           TS offset           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                             NALU                              |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 13. Multi-time aggregation unit for MTAP24

   For the "earliest" multi-time aggregation unit in an MTAP, the
   timestamp offset MUST be zero. Hence, the RTP timestamp of the
   MTAP itself is identical to the earliest NALU-time. The "earliest"
   multi-time aggregation unit means the one that would have the
   smallest extended RTP timestamp among all the aggregation units
   of an MTAP if the aggregation units were encapsulated in single
   NAL unit packets. An extended timestamp is a timestamp that has
   more than 32 bits and is capable of counting the wraparound of
   the timestamp field, thus enabling one to determine the smallest
   value if the timestamp wraps. Such an "earliest" aggregation unit
   may not be the first one in the order in which the aggregation
   units are encapsulated in an MTAP. The "earliest" NAL unit need
   not be the same as the first NAL unit in the NAL unit decoding
   order either.

   Figure 14 presents an example of an RTP packet that contains a
   multi-time aggregation packet of type MTAP16 that contains two
   multi-time aggregation units, labeled as 1 and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|0|1|0|             DONB              |  NALU 1 Size  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 1 Header |                  NALU 1 Data                  |
   +-+-+-+-+-+-+-+-+                                               +
   :                                                               :
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |          NALU 2 Size          |  NALU 2 DOND  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       NALU 2 TS offset        | NALU 2 Header |  NALU 2 Data  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   :                                                               :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 14. An RTP packet including a multi-time aggregation packet
              of type MTAP16 and two multi-time aggregation units

   Figure 15 presents an example of an RTP packet that contains a
   multi-time aggregation packet of type MTAP24 that contains two
   multi-time aggregation units, labeled as 1 and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|0|1|1|             DONB              |  NALU 1 Size  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |NALU 1 TS offs | NALU 1 Header |          NALU 1 Data          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   :                                                               :
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |          NALU 2 Size          |  NALU 2 DOND  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               NALU 2 TS offset                | NALU 2 Header |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   :                          NALU 2 Data                          :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 15. An RTP packet including a multi-time aggregation packet
              of type MTAP24 and two multi-time aggregation units

5.7 Fragmentation Units (FUs)

   This payload type allows fragmenting a NAL unit into several RTP
   packets. Doing so on the application layer, instead of relying on
   lower-layer fragmentation (e.g., by IP), makes the payload format
   capable of transporting NAL units larger than 64 kbytes over an
   IPv4 network. Such NAL units may be present in prerecorded video,
   particularly in High Definition formats.

   Fragmentation is defined only for a single NAL unit and not for any
   aggregation packets. A fragment of a NAL unit consists of an integer
   number of consecutive octets of that NAL unit. Each octet of the NAL
   unit MUST be part of exactly one fragment of that NAL unit.
   Fragments of the same NAL unit MUST be sent in consecutive order
   with ascending RTP sequence numbers (with no other RTP packets
   within the same RTP packet stream being sent between the first and
   last fragment). Similarly, a NAL unit MUST be reassembled in RTP
   sequence number order.

   A NAL unit that is fragmented is called a fragmented NAL unit. STAPs
   and MTAPs MUST NOT be fragmented. FUs MUST NOT be nested; i.e., an
   FU MUST NOT contain another FU.

   The RTP timestamp of an RTP packet carrying an FU is set to the
   NALU-time of the fragmented NAL unit.

   Figure 16 presents the RTP payload format for FU-As. An FU-A
   consists of a PSI (1 byte), a fragmentation unit header (1 byte),
   and a fragmentation unit payload.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|1|0|0|   FU Header   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |                          FU payload                           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 16. RTP payload for FU-A

   Figure 17 presents the RTP payload format for FU-Bs. An FU-B
   consists of a PSI (1 byte), a fragmentation unit header (1 byte),
   a decoding order number (DON, in network byte order), and a
   fragmentation unit payload.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|1|1|1|0|0|   FU header   |              DON              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                          FU payload                           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 17. RTP payload for FU-B

   NAL unit type FU-B MUST be used in the interleaved packetization
   mode for the first fragmentation unit of a fragmented NAL unit. NAL
   unit type FU-B MUST NOT be used in any other case. In other words,
   in the interleaved packetization mode, each NALU that is fragmented
   has an FU-B as the first fragment, followed by one or more FU-A
   fragments.

   Figure 18 shows the format of the FU header:

   +---------------+
   |0|1|2|3|4|5|6|7|
   +-+-+-+-+-+-+-+-+
   |S|E|R|  Type   |
   +---------------+

   Figure 18. FU header format

   S: 1 bit
      When set to one, the Start bit indicates the start of a
      fragmented NAL unit. When the following FU payload is not the
      start of a fragmented NAL unit payload, the Start bit is set
      to zero.

   E: 1 bit
      When set to one, the End bit indicates the end of a fragmented
      NAL unit, i.e., the last byte of the payload is also the last
      byte of the fragmented NAL unit. When the following FU payload
      is not the last fragment of a fragmented NAL unit, the End bit
      is set to zero.

   R: 1 bit
      The Reserved bit MUST be equal to 0 and MUST be ignored by the
      receiver.

   Type: 5 bits
      The NAL unit payload type as defined in Table 1 of Section 4.

   The value of DON in FU-Bs is selected as described in section 5.4.

   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
   Start bit and End bit MUST NOT both be set to one in the same FU
   header.

   The FU payload consists of fragments of the payload of the
   fragmented NAL unit so that if the fragmentation unit payloads of
   consecutive FUs are sequentially concatenated, the payload of the
   fragmented NAL unit can be reconstructed.
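
   As an illustration of the FU header layout in Figure 18, the
   following Python sketch decodes the header octet and rejects a
   header in which both S and E are set. The names FuHeader and
   parse_fu_header are hypothetical and are not defined by this memo;
   only the bit positions are taken from the figure.

```python
from typing import NamedTuple


class FuHeader(NamedTuple):
    """Decoded FU header fields (hypothetical helper type)."""
    start: bool     # S bit
    end: bool       # E bit
    nal_type: int   # 5-bit NAL unit payload type


def parse_fu_header(octet: int) -> FuHeader:
    """Decode the one-byte FU header laid out as S|E|R|Type."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("FU header must be a single octet")
    start = bool(octet & 0x80)
    end = bool(octet & 0x40)
    # The R bit (0x20) is ignored on receipt.
    nal_type = octet & 0x1F
    if start and end:
        raise ValueError("S and E must not both be set in one FU header")
    return FuHeader(start, end, nal_type)
```

   For instance, parse_fu_header(0x81) describes the first fragment of
   a NAL unit whose type field is 1.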
   The NAL unit type octet of the fragmented NAL unit is not included
   as such in the fragmentation unit payload; rather, the information
   of the NAL unit type octet of the fragmented NAL unit is conveyed in
   the F and NRI fields of the FU indicator octet (the PSI) of the
   fragmentation unit and in the type field of the FU header. An FU
   payload MAY have any number of octets and MAY be empty.

   If a fragmentation unit is lost, the receiver SHOULD discard all
   following fragmentation units in transmission order corresponding
   to the same fragmented NAL unit.

   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
   fragments of a NAL unit to an (incomplete) NAL unit, even if
   fragment n of that NAL unit is not received. In this case, the
   forbidden_zero_bit of the NAL unit MUST be set to one to indicate
   a syntax violation.

6. Packetization Rules

6.1 Common Packetization Rules

   All senders MUST enforce the following packetization rules
   regardless of the packetization mode in use:

   o  Coded picture header or slice NAL units belonging to the same
      coded picture (and thus sharing the same RTP timestamp value) MAY
      be sent in any order permitted by the applicable profiles defined
      in AVS-P2; however, for delay-critical systems, they SHOULD be
      sent in their original coding order to minimize the delay.

   o  Sequence headers are handled in accordance with the rules and
      recommendations given in section 7.3.

   o  Senders (including MANEs) MUST NOT duplicate any NAL unit except
      for sequence header or picture header NAL units. Sequence header
      NAL units MUST NOT be duplicated in a way that affects any active
      sequence header. Duplicated picture header NAL units MUST be
      followed by the picture's slice NAL units (though not necessarily
      by the first slice of the picture). Duplication SHOULD be
      performed on the application layer and not by duplicating RTP
      packets (with identical sequence numbers).

   Senders using the non-interleaved mode or the interleaved mode MUST
   enforce the following packetization rule:

   o  MANEs MAY convert multiple single NAL unit packets into one
      aggregation packet, convert an aggregation packet into several
      single NAL unit packets, or mix both concepts, in an RTP
      translator. The RTP translator SHOULD take into account at least
      the following parameters: path MTU size, unequal protection
      mechanisms, bearable latency of the system, and buffering
      capabilities of the receiver.

6.2 Single NAL Unit Mode

   This mode is in use when the value of the OPTIONAL
   packetization-mode media type parameter is equal to 0, the
   packetization-mode parameter is not present, or no other
   packetization mode is signaled by external means. All receivers
   MUST support this mode. Only single NAL unit packets MAY be used in
   this mode. STAPs, MTAPs, and FUs MUST NOT be used. The transmission
   order of single NAL unit packets MUST comply with the NAL unit
   decoding order.

6.3 Non-Interleaved Mode

   This mode is in use when the value of the OPTIONAL
   packetization-mode media type parameter is equal to 1 or the mode
   is turned on by external means. Only single NAL unit packets,
   STAP-As, and FU-As MAY be used in this mode. STAP-Bs, MTAPs, and
   FU-Bs MUST NOT be used. The transmission order of NAL units MUST
   comply with the NAL unit decoding order.

6.4 Interleaved Mode

   This mode is in use when the value of the OPTIONAL
   packetization-mode media type parameter is equal to 2 or the mode
   is turned on by external means. Some receivers MAY support this
   mode. Only STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used. Single
   NAL unit packets and STAP-As MUST NOT be used.
   The transmission order of packets and NAL units is constrained as
   specified in section 5.4.

7. Payload Format Parameters

   This section specifies the media type parameters that MAY be used
   to select optional features of the payload format and certain
   features of the bitstream. The parameters are specified here as
   part of the media subtype registration for the AVS-P2 video
   specification. A mapping of the parameters into the Session
   Description Protocol (SDP) [4] is also provided for applications
   that use SDP. Equivalent parameters could be defined elsewhere
   for use with control protocols that do not use media type
   parameters or SDP.

   Some parameters provide a receiver with the properties of the stream
   that will be sent. The names of all these parameters start with
   "sprop" for stream properties. Some of these "sprop" parameters are
   limited by other payload or codec configuration parameters. The
   media sender, rather than the receiver, selects all "sprop"
   parameters. This uncommon characteristic of the "sprop" parameters
   may not be compatible with some signaling protocol concepts, in
   which case the use of these parameters SHOULD be avoided.

7.1 Media Type Registration

   This registration uses the template defined in IETF RFC 4288 [5]
   and follows IETF RFC 3555 [6].

   The media subtype for the AVS-P2 video is allocated from the IETF
   tree. The receiver MUST ignore any unspecified parameter.

   Media Type name:
      video

   Media subtype name:
      AVS1-P2

   Required parameters:
      none

   Optional parameters:

      profile-level-id:
         A base16 [7] (hexadecimal) representation of the following
         two bytes in the sequence header of AVS-P2: profile_id and
         level_id.
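
         As a sketch of how a receiver might split this value, the
         following Python function turns a four-digit hexadecimal
         string into the two byte values. The function name
         parse_profile_level_id is hypothetical, and the sample value
         used in the note below is illustrative only; it is not taken
         from the AVS-P2 profile or level tables.

```python
import string
from typing import Tuple


def parse_profile_level_id(value: str) -> Tuple[int, int]:
    """Return (profile_id, level_id) from a base16 profile-level-id."""
    # Two bytes in base16 are exactly four hexadecimal digits.
    if len(value) != 4 or any(ch not in string.hexdigits for ch in value):
        raise ValueError("expected exactly four hexadecimal digits")
    profile_id = int(value[:2], 16)  # first byte
    level_id = int(value[2:], 16)    # second byte
    return profile_id, level_id
```

         With the illustrative input "2040", the function returns the
         pair (0x20, 0x40).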

         If the profile-level-id parameter is used to indicate
         properties of an AVS-P2 bit stream, it indicates the profile
         and level that a decoder has to support in order to decode
         the stream.

         If the profile-level-id parameter is used for capability
         exchange or a session setup procedure, it indicates the
         profile that the codec supports and the highest level
         supported for the signaled profile.

         If no profile-level-id is present, the Jizhun Profile
         without additional constraints at Level 4.0 MUST be implied.

      max-mbps, max-fs, max-dpb, and max-br:
         These parameters MAY be used to signal the capabilities of a
         receiver implementation. These parameters MUST NOT be used
         for any other purposes. The profile-level-id parameter MUST
         be present in the same receiver capability description that
         contains any of these parameters. The level conveyed in the
         value of the profile-level-id parameter MUST be one that the
         receiver is fully capable of supporting. These four
         parameters MAY be used to indicate capabilities of the
         receiver that extend the required capabilities of the
         signaled level, as specified below.

         When more than one of the four parameters is present, the
         receiver MUST support all signaled capabilities
         simultaneously. For example, if both max-mbps and max-br
         are present, the receiver supports the signaled level with
         both the frame rate and the bit rate extended. That is, the
         receiver is able to decode bit streams in which the
         macroblock processing rate is up to max-mbps (inclusive),
         the bit rate is up to max-br (inclusive), the coded picture
         buffer size is derived as specified in the semantics of the
         max-br parameter below, and other properties comply with the
         level specified in the value of the profile-level-id
         parameter.
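
         The combination of a signaled level with these extensions
         can be sketched in Python as follows. This is only an
         illustration: the function effective_limits, the dictionary
         keys, and any level limit values passed in are hypothetical
         and are not taken from AVS-P2. The scale factors reflect the
         units given in the parameter definitions (max-dpb in units of
         1000 bits, max-br in units of 1000 bits per second).

```python
from typing import Dict

# Optional parameter -> (hypothetical level-limit key, scale factor).
_EXTENSIONS = {
    "max-mbps": ("max_mb_per_second", 1),
    "max-fs": ("max_mb_per_frame", 1),
    "max-dpb": ("bbv_buffer_bits", 1000),
    "max-br": ("max_bit_rate", 1000),
}


def effective_limits(level_limits: Dict[str, int],
                     params: Dict[str, int]) -> Dict[str, int]:
    """Replace level limits with the signaled max-* values, if any."""
    limits = dict(level_limits)
    for name, (key, scale) in _EXTENSIONS.items():
        if name not in params:
            continue
        value = params[name] * scale
        # Each extension must be at least the limit of the signaled level.
        if value < level_limits[key]:
            raise ValueError(name + " is below the limit of the signaled level")
        limits[key] = value
    return limits
```

         All extensions present in params apply at once, which mirrors
         the requirement that the receiver support every signaled
         capability simultaneously.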

         A receiver MUST NOT signal values of max-mbps, max-fs,
         max-dpb, and max-br that meet the requirements of a higher
         level, referred to as level A herein, compared to the level
         specified in the value of the profile-level-id parameter, if
         the receiver can support all the properties of level A.

      max-mbps:
         The value of max-mbps is an integer indicating the maximum
         macroblock processing rate in units of macroblocks per
         second. The max-mbps parameter signals that the receiver is
         capable of decoding video at a higher rate than is required
         by the signaled level conveyed in the value of the
         profile-level-id parameter. When max-mbps is signaled, the
         receiver MUST be able to decode AVS-P2 bit streams that
         conform to the signaled level, with the exception that the
         value of maximum macroblocks per second in Tables B.4 and
         B.5 of AVS-P2 [1] for the signaled level is replaced with
         the value of max-mbps. The value of max-mbps MUST be greater
         than or equal to the value of maximum macroblocks per second
         for the level given in Tables B.4 and B.5 of AVS-P2. Senders
         MAY use this knowledge to send pictures of a given size at a
         higher frame rate than is indicated in the signaled level.

      max-fs:
         The value of max-fs is an integer indicating the maximum
         frame size in units of macroblocks. The max-fs parameter
         signals that the receiver is capable of decoding larger
         picture sizes than are required by the signaled level
         conveyed in the value of the profile-level-id parameter.
         When max-fs is signaled, the receiver MUST be able to decode
         bit streams that conform to the signaled level, with the
         exception that the value of maximum macroblocks per frame in
         Tables B.4 and B.5 of AVS-P2 for the signaled level is
         replaced with the value of max-fs. The value of max-fs MUST
         be greater than or equal to the value of maximum macroblocks
         per frame for the level given in Tables B.4 and B.5 of
         AVS-P2. Senders MAY use this knowledge to send larger
         pictures at a proportionally lower frame rate than is
         indicated in the signaled level.

      max-dpb:
         The value of max-dpb is an integer indicating the maximum
         decoded picture buffer size in units of 1000 bits. The
         max-dpb parameter signals that the receiver has more memory
         than the minimum amount of decoded picture buffer memory
         required by the signaled level conveyed in the value of the
         profile-level-id parameter. When max-dpb is signaled, the
         receiver MUST be able to decode bit streams that conform to
         the signaled level, with the exception that the value of BBV
         buffer size in Tables B.4 and B.5 of AVS-P2 for the signaled
         level is replaced with the value of 1000*(max-dpb). The
         value of 1000*(max-dpb) MUST be greater than or equal to the
         value of BBV buffer size for the level given in Tables B.4
         and B.5 of AVS-P2. Senders MAY use this knowledge to
         construct coded streams with improved compression compared
         to the BBV buffer size of the signaled profile.

      max-br:
         The value of max-br is an integer indicating the maximum
         video bit rate in units of 1000 bits per second. The max-br
         parameter signals that the video decoder of the receiver is
         capable of decoding video at a higher bit rate than is
         required by the signaled level conveyed in the value of the
         profile-level-id parameter. When max-br is signaled, the
         video codec of the receiver MUST be able to decode bit
         streams that conform to the signaled level, conveyed in the
         profile-level-id parameter, with the exception that the
         value of maximum bit rate in Tables B.4 and B.5 of AVS-P2
         for the signaled level is replaced with 1000*(max-br). The
         value of 1000*(max-br) MUST be greater than or equal to the
         value of maximum bit rate for the signaled level given in
         Tables B.4 and B.5 of AVS-P2. Senders MAY use this knowledge
         to send higher bit rate video, as allowed in the level
         definition, to achieve improved video quality.

      sprop-parameter-sets:
         This parameter MAY be used to convey any sequence header bit
         stream. The parameter MUST NOT be used to indicate codec
         capability in any capability exchange procedure. The value
         of the parameter is the base64 [7] representation of the
         sequence header bit stream. The headers are conveyed in
         decoding order, and a comma is used to separate any pair of
         headers in the list.

      parameter-add:
         This parameter MAY be used to signal whether the receiver of
         this parameter is allowed to add headers in its signaling
         response using the sprop-parameter-sets parameter. The value
         of this parameter is either 0 (deny) or 1 (allow). If the
         parameter is not present, its value MUST be 1.

      packetization-mode:
         This parameter signals the properties of an RTP payload type
         or the capabilities of a receiver implementation. Only a
         single configuration point can be indicated; thus, when
         capabilities to support more than one packetization-mode are
         declared, multiple configuration points (RTP payload types)
         MUST be used. When the value of packetization-mode is equal
         to 0 or packetization-mode is not present, the single NAL
         unit mode, as defined in section 6.2, MUST be used. When the
         value of packetization-mode is equal to 1, the
         non-interleaved mode, as defined in section 6.3, MUST be
         used. When the value of packetization-mode is equal to 2,
         the interleaved mode, as defined in section 6.4, MUST be
         used. The value of packetization-mode MUST be an integer in
         the range of 0 to 2, inclusive.
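
         The relationship between packetization-mode and the packet
         structures permitted by sections 6.2 to 6.4 can be sketched
         in Python as a lookup table. The names ALLOWED_PACKET_KINDS
         and packet_kind_allowed, and the string labels for packet
         kinds, are hypothetical helpers introduced only for this
         illustration.

```python
from typing import Optional

# Packet kinds permitted in each packetization mode (sections 6.2 to 6.4).
ALLOWED_PACKET_KINDS = {
    0: {"single"},
    1: {"single", "STAP-A", "FU-A"},
    2: {"STAP-B", "MTAP16", "MTAP24", "FU-A", "FU-B"},
}


def packet_kind_allowed(packetization_mode: Optional[int], kind: str) -> bool:
    """Return True if `kind` may be used under the given mode."""
    # An absent packetization-mode parameter selects single NAL unit mode.
    mode = 0 if packetization_mode is None else packetization_mode
    if mode not in ALLOWED_PACKET_KINDS:
        raise ValueError("packetization-mode must be 0, 1, or 2")
    return kind in ALLOWED_PACKET_KINDS[mode]
```

         A receiver could apply such a check to each incoming packet
         once the payload structure has been identified from the type
         field of the PSI.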

      sprop-interleaving-depth:
         This parameter MUST NOT be present when packetization-mode
         is not present or the value of packetization-mode is equal
         to 0 or 1. This parameter MUST be present when the value of
         packetization-mode is equal to 2.

         This parameter signals the properties of a NAL unit stream.
         It specifies the maximum number of NAL units that precede
         any NAL unit in the NAL unit stream in transmission order
         and follow the NAL unit in decoding order. Consequently, it
         is guaranteed that receivers can reconstruct NAL unit
         decoding order when the buffer size for NAL unit decoding
         order recovery is at least the value of
         sprop-interleaving-depth + 1 in terms of NAL units. The
         value of sprop-interleaving-depth MUST be an integer in the
         range of 0 to 32767, inclusive.

      sprop-deint-buf-req:
         This parameter MUST NOT be present when packetization-mode
         is not present or the value of packetization-mode is equal
         to 0 or 1. It MUST be present when the value of
         packetization-mode is equal to 2.

         sprop-deint-buf-req signals the required size of the
         deinterleaving buffer for the NAL unit stream. The value of
         the parameter MUST be greater than or equal to the maximum
         buffer occupancy (in units of bytes) required in such a
         deinterleaving buffer that is specified in section 11.2.

         The value of sprop-deint-buf-req MUST be an integer in the
         range of 0 to 4294967295, inclusive.

      deint-buf-cap:
         This parameter signals the capabilities of a receiver
         implementation and indicates the amount of deinterleaving
         buffer space in units of bytes that the receiver has
         available for reconstructing the NAL unit decoding order. A
         receiver is able to handle any stream for which the value of
         the sprop-deint-buf-req parameter is smaller than or equal
         to this parameter.
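
         The two checks described above can be sketched in Python as
         follows. The function names are hypothetical, and the input
         is assumed to be a list giving, for each NAL unit in
         transmission order, its position in decoding order; the memo
         itself does not prescribe this representation.

```python
from typing import Sequence


def interleaving_depth(decoding_order: Sequence[int]) -> int:
    """Largest count of NAL units sent earlier but decoded later.

    decoding_order[i] is the decoding position of the i-th NAL unit in
    transmission order.
    """
    depth = 0
    for i, current in enumerate(decoding_order):
        sent_earlier_decoded_later = sum(
            1 for earlier in decoding_order[:i] if earlier > current
        )
        depth = max(depth, sent_earlier_decoded_later)
    return depth


def receiver_can_deinterleave(deint_buf_cap: int,
                              sprop_deint_buf_req: int) -> bool:
    """True if the receiver buffer covers the stream's requirement."""
    return sprop_deint_buf_req <= deint_buf_cap
```

         For a stream sent in decoding order the computed depth is 0;
         swapping adjacent NAL units, as in the transmission order
         0 2 1 3 5 4, gives a depth of 1.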
1344 If the parameter is not present, then a value of 0 MUST be 1345 used for deint-buf-cap. The value of deint-buf-cap MUST be 1346 an integer in the range of 0 to 4294967295, inclusive. 1348 sprop-init-buf-time: 1350 This parameter MAY be used to signal the properties of a 1351 NAL unit stream. The parameter MUST NOT be present if the 1352 value of packetization-mode is equal to 0 or 1. 1354 The parameter signals the initial buffering time that a 1355 receiver MUST buffer before starting decoding to recover the 1356 NAL unit decoding order from the transmission order. The 1357 parameter is the maximum value of (transmission time of a 1358 NAL unit - decoding time of the NAL unit), assuming reliable 1359 and instantaneous transmission, the same timeline for 1360 transmission and decoding, and that decoding starts when the 1361 first packet arrives. 1363 An example of specifying the value of sprop-init-buf-time 1364 follows. A NAL unit stream is sent in the following 1365 interleaved order, in which the value corresponds to the 1366 decoding time and the transmission order is from left to 1367 right: 1369 0 2 1 3 5 4 6 8 7 ... 1371 Assuming a steady transmission rate of NAL units, the 1372 transmission times are: 1374 0 1 2 3 4 5 6 7 8 ... 1376 Subtracting the decoding time from the transmission time 1377 column-wise results in the following series: 1379 0 -1 1 0 -1 1 0 -1 1 ... 1381 Thus, in terms of intervals of NAL unit transmission times, 1382 the value of sprop-init-buf-time in this example is 1. 1383 The parameter is coded as a non-negative base-10 integer 1384 representation in clock ticks of a 90-kHz clock. If the 1385 parameter is not present, then no initial buffering time value 1386 is defined. Otherwise the value of sprop-init-buf-time MUST be 1387 an integer in the range of 0 to 4294967295, inclusive.
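The worked example above can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, and the 40 ms NAL unit interval used in the usage note is an assumed value, not one taken from this document.

```python
def init_buf_intervals(decoding_times):
    """Return max(transmission time - decoding time) over all NAL units,
    in units of NAL unit transmission intervals.

    decoding_times is listed in transmission order; the k-th unit is
    assumed to be transmitted at time k (steady transmission rate).
    """
    return max(sent - decoded for sent, decoded in enumerate(decoding_times))


def to_90khz_ticks(intervals, nal_interval_seconds):
    """Convert a number of transmission intervals into ticks of a
    90-kHz clock, the unit in which sprop-init-buf-time is coded."""
    return round(intervals * nal_interval_seconds * 90000)
```

For the order 0 2 1 3 5 4 6 8 7, init_buf_intervals returns 1. If one NAL unit were sent every 40 ms (an assumption for this example), the signaled value would be to_90khz_ticks(1, 0.04), that is, 3600.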
1389 In addition to the signaled sprop-init-buf-time, receivers 1390 SHOULD take into account the transmission delay jitter 1391 buffering, including buffering for the delay jitter caused 1392 by any network elements. 1394 sprop-max-don-diff: 1395 This parameter MAY be used to signal the properties of a NAL 1396 unit stream. It MUST NOT be used to signal transmitter or 1397 receiver or codec capabilities. The parameter MUST NOT be 1398 present if the value of packetization-mode is equal to 0 or 1. 1399 sprop-max-don-diff is an integer in the range of 0 to 32767, 1400 inclusive. If sprop-max-don-diff is not present, the value of 1401 the parameter is unspecified. sprop-max-don-diff is calculated 1402 as follows: 1404 sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)}, 1405 for any i and any j>i, 1407 where i and j indicate the index of the NAL unit in the 1408 transmission order and AbsDON denotes a decoding order number 1409 of the NAL unit that does not wrap around to 0 after 65535. 1410 In other words, AbsDON is calculated as follows: 1412 Let m and n be consecutive NAL units in transmission order. 1413 For the very first NAL unit in transmission order (whose 1414 index is 0), AbsDON(0) = DON(0). For other NAL units, AbsDON 1415 is calculated as follows: 1417 If DON(m) == DON(n), AbsDON(n) = AbsDON(m) 1419 If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), 1420 AbsDON(n) = AbsDON(m) + DON(n) - DON(m) 1422 If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), 1423 AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n) 1425 If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), 1426 AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n)) 1428 If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), 1429 AbsDON(n) = AbsDON(m) - (DON(m) - DON(n)) 1431 where DON(i) is the decoding order number of the NAL unit 1432 having index i in the transmission order. The decoding order 1433 number is specified in section 5.5.
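The AbsDON rules and the sprop-max-don-diff formula above can be sketched in Python as follows. This is a non-normative illustration; the function names are hypothetical, and clamping the result at 0 for streams that are never reordered is an assumption made here so that the result stays within the stated range of 0 to 32767.

```python
def abs_don_sequence(dons):
    """Map 16-bit DON values (in transmission order) to AbsDON values
    that do not wrap around, following the five cases listed above."""
    result = []
    for index, don in enumerate(dons):
        if index == 0:
            result.append(don)  # AbsDON(0) = DON(0)
            continue
        prev_don, prev_abs = dons[index - 1], result[-1]
        if prev_don == don:
            current = prev_abs
        elif prev_don < don and don - prev_don < 32768:
            current = prev_abs + don - prev_don
        elif prev_don > don and prev_don - don >= 32768:
            current = prev_abs + 65536 - prev_don + don   # forward wrap
        elif prev_don < don and don - prev_don >= 32768:
            current = prev_abs - (prev_don + 65536 - don)  # backward wrap
        else:  # prev_don > don and prev_don - don < 32768
            current = prev_abs - (prev_don - don)
        result.append(current)
    return result


def max_don_diff(dons):
    """Return max{AbsDON(i) - AbsDON(j)} over all j > i.

    Assumption of this sketch: the result is clamped at 0 when no NAL
    unit is ever sent out of decoding order.
    """
    abs_dons = abs_don_sequence(dons)
    if not abs_dons:
        return 0
    best = 0
    running_max = abs_dons[0]
    for value in abs_dons[1:]:
        best = max(best, running_max - value)
        running_max = max(running_max, value)
    return best
```

For instance, the DON values 65534, 65535, 0, 1 map to the AbsDON values 65534, 65535, 65536, 65537, and the transmission order 0 2 1 3 5 4 gives a maximum difference of 1.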
1435 max-rcmd-nalu-size: 1436 This parameter MAY be used to signal the capabilities of a 1437 receiver. The parameter MUST NOT be used for any other 1438 purposes. The value of the parameter indicates the largest 1439 NALU size in bytes that the receiver can handle efficiently. 1440 The parameter value is a recommendation, not a strict upper 1441 boundary. The sender MAY create larger NALUs but must be 1442 aware that the handling of these may come at a higher cost 1443 than NALUs conforming to the limitation. 1445 The value of max-rcmd-nalu-size MUST be an integer in the 1446 range of 0 to 4294967295, inclusive. If this parameter is 1447 not specified, no known limitation to the NALU size exists. 1448 Senders still have to consider the MTU size available 1449 between the sender and the receiver and SHOULD run MTU 1450 discovery for this purpose. 1452 Encoding considerations: 1453 This media type is framed and contains binary data. 1455 Security considerations: 1456 See Section 8 of RFC xxxx. 1458 Interoperability considerations: 1459 None. 1461 Public specification: 1462 RFC xxxx. 1464 Applications that use this media type: 1465 Video telephone, video conferencing, Internet media streaming, 1466 IPTV, video-on-demand, etc. 1468 Additional information: 1469 None. 1471 Person and email address to contact for further information: 1472 lshuo@jdl.ac.cn 1474 Intended usage: 1475 COMMON. 1477 Restrictions on usage: 1478 This media type depends on RTP framing; therefore, it is only 1479 defined for transfer via RTP (IETF RFC 3550). 1481 File extensions: 1482 None. 1484 Macintosh file type code: 1485 None. 1487 Object identifier or OID: 1488 None. 1490 Author: 1491 lshuo@jdl.ac.cn 1493 Change controller: 1494 IETF Audio/Video Transport Working Group delegated from the IESG. 1496 7. 
2 SDP Parameters 1498 7.2.1 Mapping of Media Type Parameters to SDP 1500 The media type string "video/AVS1-P2" is mapped to fields in the 1501 Session Description Protocol (SDP) [4] as follows: 1503 o The media name in the "m=" line of SDP MUST be video (the type 1504 name). 1506 o The encoding name in the "a=rtpmap" line of SDP MUST be AVS1-P2 1507 (the subtype name). 1509 o The clock rate in the "a=rtpmap" line MUST be 90000. 1511 o The OPTIONAL parameters "profile-level-id", "max-mbps", 1512 "max-fs", "max-dpb", "max-br", "sprop-parameter-sets", 1513 "parameter-add", "packetization-mode", 1514 "sprop-interleaving-depth", "sprop-deint-buf-req", 1515 "deint-buf-cap", "sprop-init-buf-time", "sprop-max-don-diff", 1516 and "max-rcmd-nalu-size", when present, MUST be included in the 1517 "a=fmtp" line of SDP. These parameters are expressed in the form 1518 of a semicolon-separated list of parameter=value pairs. 1520 An example of media representation in SDP is as follows (Baseline 1521 Profile, Level 6.0): 1523 m=video 49170 RTP/AVP 98 1524 a=rtpmap:98 AVS1-P2/90000 1525 a=fmtp:98 profile-level-id=2040; sprop-parameter-sets=[SH#0] 1527 where [SH#0] means a base64 expression of sequence header. 1529 7.2.2 Usage with the SDP Offer/Answer Model 1531 When AVS-P2 is offered over RTP using SDP in an Offer/Answer 1532 model [8] for negotiation for unicast usage, the following 1533 limitations and rules apply: 1535 o The parameters identifying an AVS1-P2 video media format are 1536 "profile-level-id", "packetization-mode" and 1537 "sprop-deint-buf-req" (if "packetization-mode" is equal to 2). 1538 These three parameters MUST be used symmetrically, which means 1539 the answerer MUST either maintain all configuration parameters 1540 or remove the media format (payload type) completely, if one or 1541 more of the parameter values are not supported.
1543 To simplify handling and matching of these configurations, the same 1544 RTP payload type number used in the offer SHOULD also be used in the 1545 answer, as specified in [8]. An answer MUST NOT contain a payload 1546 type number used in the offer unless the configuration 1547 ("profile-level-id", "packetization-mode", and if present 1548 "sprop-deint-buf-req") is the same as in the offer. 1550 o The parameters "sprop-parameter-sets", "sprop-deint-buf-req", 1551 "sprop-interleaving-depth", "sprop-max-don-diff", 1552 and "sprop-init-buf-time" describe the properties of the AVS-P2 1553 bit stream that the offerer or answerer is sending for this 1554 media format configuration. This differs from the normal usage 1555 of the Offer/Answer parameters: normally such parameters declare 1556 the properties of the stream that the offerer or the answerer is 1557 able to receive. When dealing with AVS-P2, the offerer assumes 1558 that the answerer will be able to receive media encoded using 1559 the configuration being offered. 1561 o The capability parameters "max-mbps", "max-fs", "max-dpb", 1562 "max-br", and "max-rcmd-nalu-size" MAY be used to declare further 1563 capabilities. Their interpretation depends on the direction 1564 attribute. When the direction attribute is sendonly, then the 1565 parameters describe the limits of the RTP packets and the NAL 1566 unit stream that the sender is capable of producing. When the 1567 direction attribute is sendrecv or recvonly, then the parameters 1568 describe the limitations of what the receiver accepts. 1570 o As specified above, an offerer has to include the size of the 1571 deinterleaving buffer in the offer for an interleaved AVS-P2 1572 stream. To enable the offerer and answerer to inform each other 1573 about their capabilities for deinterleaving buffering, both 1574 parties are RECOMMENDED to include "deint-buf-cap".
This 1575 information MAY be used when the value for "sprop-deint-buf-req" 1576 is selected in a second round of offer and answer. For interleaved 1577 streams, it is also RECOMMENDED to consider offering multiple 1578 payload types with different buffering requirements when the 1579 capabilities of the receiver are unknown. 1581 o The "sprop-parameter-sets" parameter is used as described above. 1582 In addition, an answerer MUST maintain all sequence headers 1583 received in the offer in its answer. Depending on the value of 1584 the "parameter-add" parameter, different rules apply: If 1585 "parameter-add" is 0, the answerer MUST NOT add any additional 1586 headers. If "parameter-add" is 1, the answerer, in its answer, 1587 MAY add additional headers to the "sprop-parameter-sets" 1588 parameter. The answerer MUST also, independent of the value of 1589 "parameter-add", be prepared to receive a video stream using the 1590 sprop-parameter-sets it declared in the answer. 1592 For streams being delivered over multicast, the following rules 1593 apply in addition: 1595 o The stream properties parameters "sprop-parameter-sets", 1596 "sprop-deint-buf-req", "sprop-interleaving-depth", 1597 "sprop-max-don-diff", and "sprop-init-buf-time" MUST NOT be 1598 changed by the answerer. Thus, a payload type can either be 1599 accepted unaltered or removed. 1601 o The receiver capability parameters "max-mbps", "max-fs", 1602 "max-dpb", "max-br", and "max-rcmd-nalu-size" MUST be supported 1603 by the answerer for all streams declared as sendrecv or recvonly; 1604 otherwise, the media format is removed, or the session rejected. 1606 Below are the complete lists of how the different parameters SHALL 1607 be interpreted in the different combinations of offer or answer and 1608 direction attribute.
1610 o In offers and answers for which "a=sendrecv" or no direction 1611 attribute is used, or in offers and answers for which 1612 "a=recvonly" is used, the following interpretation of the 1613 parameters MUST be used. 1615 Declaring actual configuration or properties for receiving: 1616 - profile-level-id 1617 - packetization-mode 1619 Declaring actual properties of the stream to be sent (applicable 1620 only when "a=sendrecv" or no direction attribute is used): 1621 - sprop-deint-buf-req 1622 - sprop-interleaving-depth 1623 - sprop-parameter-sets 1624 - sprop-max-don-diff 1625 - sprop-init-buf-time 1627 Declaring receiver implementation capabilities: 1628 - max-mbps 1629 - max-fs 1630 - max-dpb 1631 - max-br 1632 - deint-buf-cap 1633 - max-rcmd-nalu-size 1635 Declaring how Offer/Answer negotiation SHALL be performed: 1636 - parameter-add 1638 o In an offer or answer for which the direction attribute 1639 "a=sendonly" is included for the media stream, the following 1640 interpretation of the parameters MUST be used: 1642 Declaring actual configuration and properties of stream proposed 1643 to be sent: 1644 - profile-level-id 1645 - packetization-mode 1646 - sprop-deint-buf-req 1647 - sprop-max-don-diff 1648 - sprop-init-buf-time 1649 - sprop-parameter-sets 1650 - sprop-interleaving-depth 1652 Declaring the capabilities of the sender when it receives a 1653 stream: 1654 - max-mbps 1655 - max-fs 1656 - max-dpb 1657 - max-br 1658 - deint-buf-cap 1659 - max-rcmd-nalu-size 1661 Declaring how Offer/Answer negotiation SHALL be performed: 1663 - parameter-add 1665 Furthermore, the following considerations are necessary: 1667 o Parameters used for declaring receiver capabilities are in 1668 general downgradable, i.e., they express the upper limit for a 1669 sender's possible behavior. Thus a sender MAY select to set its 1670 encoder using only lower/lesser or equal values of these 1671 parameters. 
"sprop-parameter-sets" MUST NOT be used in a 1672 sender's declaration of its capabilities, as the limits of the 1673 values that are carried inside the parameter sets are implicit 1674 with the profile and level used. 1676 o Parameters declaring a configuration point are not downgradable, 1677 with the exception of the level part of the "profile-level-id" 1678 parameter. This expresses values a receiver expects to be used 1679 and must be used verbatim on the sender side. 1681 o When a sender's capabilities are declared, and non-downgradable 1682 parameters are used in this declaration, then these parameters 1683 express a configuration that is acceptable. In order to achieve 1684 high interoperability levels, it is often advisable to offer 1685 multiple alternative configurations; e.g., for the packetization 1686 mode. It is impossible to offer multiple configurations in a 1687 single payload type. Thus, when multiple configuration offers 1688 are made, each offer requires its own RTP payload type 1689 associated with the offer. 1691 o A receiver SHOULD understand all media type parameters, even if 1692 it only supports a subset of the payload format's functionality. 1693 This ensures that a receiver is capable of understanding when an 1694 offer to receive media can be downgraded to what is supported by 1695 the receiver of the offer. 1697 o An answerer MAY extend the offer with additional media format 1698 configurations. However, to enable their usage, in most cases a 1699 second offer is required from the offerer to provide the stream 1700 properties parameters that the media sender will use. This also 1701 has the effect that the offerer has to be able to receive this 1702 media format configuration, not only to send it. 1704 o If an offerer wishes to have non-symmetric capabilities between 1705 sending and receiving, the offerer has to offer different RTP 1706 sessions; i.e., different "m=" lines declared as "recvonly" and 1707 "sendonly", respectively. 
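The symmetric use of the configuration parameters described in section 7.2.2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the dictionary representation of an offer (payload type number mapped to its "a=fmtp" parameters as strings), and the supports callback are all hypothetical and are not defined by this document.

```python
def config_key(params):
    """Return the tuple of parameters that identify a media format:
    profile-level-id, packetization-mode (0 when absent) and, for
    interleaved mode only, sprop-deint-buf-req."""
    mode = int(params.get("packetization-mode", "0"))
    key = (params.get("profile-level-id"), mode)
    if mode == 2:
        key += (params.get("sprop-deint-buf-req"),)
    return key


def build_answer(offer, supports):
    """Keep each offered payload type unchanged if the answerer supports
    its configuration; otherwise remove the payload type completely.

    offer maps payload type numbers to parameter dictionaries; supports
    is a caller-supplied predicate (an assumption of this sketch).
    """
    return {pt: dict(params) for pt, params in offer.items() if supports(params)}


def answer_is_consistent(offer, answer):
    """Check that every payload type number reused from the offer keeps
    the same identifying configuration in the answer."""
    return all(
        config_key(answer[pt]) == config_key(offer[pt])
        for pt in answer
        if pt in offer
    )
```

An answerer that cannot handle interleaved mode would, for example, pass a predicate that rejects packetization-mode 2; the resulting answer then retains the remaining payload types with their parameters unchanged.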
1709 7.2.3 Usage in Declarative Session Descriptions 1711 When AVS-P2 video over RTP is offered with SDP in a declarative 1712 style, as in RTSP [11] or SAP [12], the following considerations 1713 are necessary. 1715 o All parameters capable of indicating the properties of both an 1716 AVS-P2 bit stream and a receiver are used to indicate the 1717 properties of an AVS-P2 bit stream. For example, in this case, 1718 the parameter "profile-level-id" declares the values used by the 1719 stream, instead of the capabilities of the sender. As a result, 1720 the following interpretation of the parameters MUST be 1721 used: 1723 Declaring actual configuration or properties: 1724 - profile-level-id 1725 - sprop-parameter-sets 1726 - packetization-mode 1727 - sprop-interleaving-depth 1728 - sprop-deint-buf-req 1729 - sprop-max-don-diff 1730 - sprop-init-buf-time 1732 Not usable: 1733 - max-mbps 1734 - max-fs 1735 - max-dpb 1736 - max-br 1737 - redundant-pic-cap 1738 - max-rcmd-nalu-size 1739 - parameter-add 1740 - deint-buf-cap 1742 o A receiver of the SDP is REQUIRED to support all parameters and 1743 values of the parameters provided; otherwise, the receiver MUST 1744 reject (RTSP) the session. It falls on the creator of the session 1745 to use values that are expected to be supported by the receiving 1746 application. 1748 7.3 Considerations for Sequence Header 1750 The sequence headers play a vital role in the operation of the AVS1-P2 1751 video codec. Due to their importance for the decoding process, lost 1752 or erroneously transmitted sequence headers can hardly be concealed 1753 locally at the receiver. A reference to a corrupt header 1754 normally has fatal results for the decoding process. Corruption could 1755 occur, for example, due to the erroneous transmission or loss of a 1756 header data structure, or due to the untimely transmission of a 1757 header update.
Therefore, the following recommendations are provided 1758 as a guideline for the implementer of the RTP sender: 1760 Sequence header NALUs can be transported using three different 1761 principles: 1763 A. Using a session control protocol (out-of-band) prior to the 1764 actual RTP session. 1766 B. Using a session control protocol (out-of-band) during an ongoing 1767 RTP session. 1769 C. Within the RTP stream in the payload (in-band) during an ongoing 1770 RTP session. 1772 It is necessary to implement principles A and B within a session 1773 control protocol. Principle C is supported by the RTP payload format 1774 defined in this document. 1776 Principle A SHOULD be used for the transmission of the initial sequence 1777 header of the whole sequence. Principle B SHOULD be used for update 1778 of in-band sequence header. Principle C SHOULD be used for update of 1779 in-band sequence header. 1781 During a session, the sequence header SHOULD be transmitted 1782 out-of-band using principle A, and updated using principles B or C. 1783 At least one sequence header MAY be useful using out-of-band 1784 transmission of initial sequence header, and update when new header 1785 is coming. 1787 If principle B is used for updating sequence headers, it is 1788 impossible to ensure the synchronization between the sequence header 1789 and the in-band transmitted NAL units. This will cause confusion in 1790 both senders and receivers. Therefore it is RECOMMENDED to only use 1791 principle C to update the sequence header. 1793 8. Security Considerations 1795 RTP packets using the payload format defined in this document are 1796 subject to the security considerations discussed in IETF RFC 3550, 1797 and in any appropriate RTP profile (for example, IETF RFC 3551 [13]). 1798 This implies that confidentiality of the media streams is achieved 1799 by encryption; for example, IETF RFC 3711 [14].
Because the data 1800 compression used with this payload format is applied end-to-end, any 1801 encryption needs to be performed after compression. 1803 A potential denial-of-service threat exists for data encodings using 1804 compression techniques that have non-uniform receiver-end 1805 computational load. The attacker can inject pathological datagrams 1806 into the stream that are complex to decode and that cause the 1807 receiver to be overloaded. AVS-P2 is particularly vulnerable to such 1808 attacks, as it is extremely simple to generate user-data that affect 1809 the decoding process of future bit stream. Therefore, the usage of 1810 data origin authentication and data integrity protection of at least 1811 the RTP packet is RECOMMENDED; for example, IETF RFC 3711. 1813 Note that the appropriate mechanism to ensure confidentiality and 1814 integrity of RTP packets and their payloads is very dependent on the 1815 application and on the transport and signaling protocols employed. 1816 Thus, although IETF RFC 3711 is given as an example above, other 1817 possible choices exist. 1819 End-to-End security with either authentication, integrity or 1820 confidentiality protection will prevent a MANE from performing 1821 media-aware operations other than discarding complete packets. And 1822 in the case of confidentiality protection it will even be prevented 1823 from performing discarding of packets in a media aware way. To allow 1824 any MANE to perform its operations, it will be REQUIRED to be a 1825 trusted entity which is included in the security context 1826 establishment. 1828 9. Congestion Control 1830 Congestion control for RTP SHALL be used in accordance with RFC 1831 3550, and with any applicable RTP profile; e.g., IETF RFC 3551. An 1832 additional requirement if best-effort service is being used is: 1833 users of this payload format MUST monitor packet loss to ensure 1834 that the packet loss rate is within acceptable parameters. 
Packet 1835 loss is considered acceptable if a TCP flow across the same network 1836 path, and experiencing the same network conditions, would achieve 1837 an average throughput, measured on a reasonable timescale, that is 1838 not less than the RTP flow is achieving. This condition can be 1839 satisfied by implementing congestion control mechanisms to adapt 1840 the transmission rate, or the number of layers subscribed for a 1841 layered multicast session, or by arranging for a receiver to leave 1842 the session if the loss rate is unacceptably high. 1844 The bit rate adaptation necessary for obeying the congestion control 1845 principle is easily achievable when real-time encoding is used. 1846 However, when pre-encoded content is being transmitted, bandwidth 1847 adaptation requires the availability of more than one coded 1848 representation of the same content, at different bit rates, or the 1849 existence of non-reference pictures in the bitstream. The switching 1850 between the different representations can normally be performed in 1851 the same RTP session; e.g., in the I-Frames. Only when 1852 non-downgradable parameters (such as the profile part of the 1853 profile/level ID) are REQUIRED to be changed does it become 1854 necessary to terminate and re-start the media stream. This may be 1855 accomplished by using a different RTP payload type. 1857 10. IANA Considerations 1859 Apply to IANA for registering one new media type; see section 7.1. 1861 11. De-Packetization Process (Informative) 1863 The de-packetization process is implementation dependent. Therefore, 1864 the following description SHOULD be seen as an example of a 1865 suitable implementation. Other schemes may be used as well. 1866 Optimizations relative to the described algorithms are likely 1867 possible. 
Section 11.1 presents the de-packetization process for 1868 the single NAL unit and non-interleaved packetization modes, whereas 1869 section 11.2 describes the process for the interleaved mode. Section 1870 11.3 includes additional decapsulation guidelines for receivers. 1872 All normal RTP mechanisms related to buffer management apply. In 1873 particular, duplicated or outdated RTP packets (as indicated by the 1874 RTP sequence number and the RTP timestamp) are removed. To 1875 determine the exact time for decoding, factors such as a possible 1876 intentional delay to allow for proper inter-stream synchronization 1877 must be factored in. 1879 11.1. Single NAL Unit and Non-Interleaved Mode 1881 The receiver includes a receiver buffer to compensate for 1882 transmission delay jitter. The receiver stores incoming packets in 1883 reception order into the receiver buffer. Packets are decapsulated 1884 in RTP sequence number order. If a decapsulated packet is a single 1885 NAL unit packet, the NAL unit contained in the packet is passed 1886 directly to the decoder. If a decapsulated packet is an STAP-A, the 1887 NAL units contained in the packet are passed to the decoder in the 1888 order in which they are encapsulated in the packet. If a 1889 decapsulated packet is an FU-A, all the fragments of the fragmented 1890 NAL unit (if any) are concatenated and passed to the decoder. 1892 11.2. Interleaved Mode 1894 The general concept behind these de-packetization rules is to 1895 reorder NAL units from transmission order to the NAL unit decoding 1896 order. 1898 The receiver includes a receiver buffer, which is used to compensate 1899 for transmission delay jitter. In this section, the receiver 1900 operation is described under the assumption that there is no 1901 transmission delay jitter.
To distinguish it from a practical 1902 receiver buffer that is also used for compensation of transmission 1903 delay jitter, the receiver buffer is called the deinterleaving 1904 buffer in this section. 1906 11.2.1 Size of the Deinterleaving Buffer 1908 When the SDP Offer/Answer model or any other capability exchange 1909 procedure is used in session setup, the properties of the received 1910 stream SHOULD be such that the receiver capabilities are not 1911 exceeded. In the SDP Offer/Answer model, the receiver can indicate 1912 its capabilities to allocate a deinterleaving buffer with the 1913 deint-buf-cap media type parameter. The sender indicates the 1914 requirement for the deinterleaving buffer size with the 1915 sprop-deint-buf-req parameter. It is therefore RECOMMENDED to set 1916 the deinterleaving buffer size, in terms of number of bytes, equal 1917 to or greater than the value of the sprop-deint-buf-req parameter. 1919 When a declarative session description is used in session setup, 1920 the sprop-deint-buf-req parameter signals the requirement for the 1921 deinterleaving buffer size. It is therefore RECOMMENDED to set the 1922 deinterleaving buffer size, in terms of number of bytes, equal to or 1923 greater than the value of the sprop-deint-buf-req parameter. 1925 11.2.2 Deinterleaving Process 1927 There are two buffering states in the receiver: initial buffering 1928 and buffering while playing. Initial buffering occurs when the RTP 1929 session is initialized. After initial buffering, decoding and 1930 playback are started, and the buffering-while-playing mode is used. 1932 Regardless of the buffering state, the receiver stores incoming NAL 1933 units, in reception order, in the deinterleaving buffer as follows. 1934 NAL units of aggregation packets are stored in the deinterleaving 1935 buffer individually. The value of DON is calculated and stored for 1936 all NAL units.
1938 The receiver operation is described below with the help of the 1939 following functions and constants: 1941 o Function AbsDON is specified in section 7.1. 1943 o Function don_diff is specified in section 5.4. 1945 o Constant N is the value of the OPTIONAL sprop-interleaving-depth 1946 media type parameter (see section 7.1) incremented by 1. 1948 Initial buffering lasts until one of the following conditions is 1949 fulfilled: 1951 o There are N NAL units in the deinterleaving buffer. 1953 o If sprop-max-don-diff is present, don_diff(m,n) is greater than 1954 the value of sprop-max-don-diff, in which n corresponds to the 1955 NAL unit having the greatest value of AbsDON among the received 1956 NAL units and m corresponds to the NAL unit having the smallest 1957 value of AbsDON among the received NAL units. 1959 o Initial buffering has lasted for the duration equal to or greater 1960 than the value of the OPTIONAL sprop-init-buf-time parameter. 1962 The NAL units to be removed from the deinterleaving buffer are 1963 determined as follows: 1965 o If the deinterleaving buffer contains at least N NAL units, NAL 1966 units are removed from the deinterleaving buffer and passed to 1967 the decoder in the order specified below until the buffer 1968 contains (N-1) NAL units. 1970 o If sprop-max-don-diff is present, all NAL units m for which 1971 don_diff(m,n) is greater than sprop-max-don-diff are removed from 1972 the deinterleaving buffer and passed to the decoder in the order 1973 specified below. Herein, n corresponds to the NAL unit having the 1974 greatest value of AbsDON among the received NAL units and m 1975 corresponds to the NAL unit being evaluated. 1977 The order in which NAL units removed from the 1978 deinterleaving buffer are passed to the decoder is specified as 1979 follows: 1981 o Let PDON be a variable that is initialized to 0 at the beginning 1982 of an RTP session.
1984 o For each NAL unit associated with a value of DON, a DON distance 1985 is calculated as follows: If the value of DON of the NAL unit 1986 is larger than the value of PDON, the DON distance is equal to 1987 DON - PDON. Otherwise, the DON distance is equal to 1988 65535 - PDON + DON + 1. 1990 o NAL units are delivered to the decoder in ascending order of DON 1991 distance. If several NAL units share the same value of DON 1992 distance, they can be passed to the decoder in any order. 1994 o When only (N-1) NAL units remain in the deinterleaving buffer, the value of 1995 PDON is set to the value of DON for the last NAL unit passed to 1996 the decoder. 1998 11.3. Additional De-Packetization Guidelines 2000 The following additional de-packetization rules may be used to 2001 implement an operational AVS-P2 Video de-packetizer: 2003 o RTP receivers (e.g., in gateways) may identify lost coded slice 2004 data partitions A (DPAs). If a lost DPA is found, a gateway may 2005 decide not to send the corresponding coded slice data partitions, 2006 as their information is meaningless for AVS-P2 Video decoders. In 2007 this way a MANE can reduce network load by discarding useless 2008 packets without parsing a complex bitstream. 2010 o Receivers having to discard packets or NALUs SHOULD first discard 2011 all packets/NALUs in which the value of the NRI field of the NAL 2012 unit type octet is equal to 0. This will minimize the impact on 2013 user experience and keep the reference pictures intact. If more 2014 packets have to be discarded, then packets with a numerically 2015 lower NRI value SHOULD be discarded before packets with a 2016 numerically higher NRI value. However, discarding any packets 2017 with an NRI greater than 0 very likely leads to decoder drift and 2018 SHOULD be avoided. 2020 12.
References 2022 12.1 Normative references 2024 [1] Standardization Administration of China, "GB/T 20090.2-2006, 2025 Information technology - Advanced coding of audio and video, 2026 Part 2: Video", March, 2006. 2028 [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement 2029 Levels", BCP 14, RFC 2119, March 1997. 2030 [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, 2031 "RTP: A Transport Protocol for Real-Time Applications", STD 64, 2032 RFC 3550, July 2003. 2033 [4] Handley, M. and V. Jacobson, "SDP: Session Description 2034 Protocol", RFC 2327, April 1998. 2035 [5] Freed, N. and Klensin, J., "Media Type Specifications and 2036 Registration Procedures", BCP 13, RFC 4288, December 2005. 2037 [6] Casner, S. and P. Hoschka, "MIME Type Registration of RTP 2038 Payload Formats", RFC 3555, July 2003. 2039 [7] Josefsson, S., Ed., "The Base16, Base32, and Base64 Data 2040 Encodings", RFC 3548, July 2003. 2041 [8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with 2042 Session Description Protocol (SDP)", RFC 3264, June 2002. 2044 12.2 Informative references 2046 [9] Wang X.F., and Zhao D.B., "Performance Comparison of AVS and 2047 H.264/AVC Video Coding Standards," Journal of Computer Science 2048 & Technology. Vol. 21, No. 3, pp310-314, May 2006. 2049 [10]Wenger S., Hannuksela M.M., Stockhammer T., Westerlund M., 2050 and Singer D., "RTP Payload Format for H.264 Video", RFC 3984, 2051 February 2005. 2052 [11]Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming 2053 Protocol (RTSP)", RFC 2326, April 1998. 2054 [12] Handley, M., Perkins, C., and E. Whelan, "Session Announcement 2055 Protocol", RFC 2974, October 2000. 2056 [13]Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video 2057 Conferences with Minimal Control", STD 65, RFC 3551, July 2003. 2058 [14]Baugher, M., McGrew, D., Naslund, M., Carrara, E., and 2059 K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", 2060 RFC 3711, March 2004. 
Authors' Addresses

   Longshe Huo
   Peking University
   School of EE & CS
   #5 YiHeYuan Road, Haidian District
   Beijing, 100871
   P.R. China
   Email: lshuo@jdl.ac.cn

   Lei Wang
   Beijing Univ. of P&T
   School of Telecom Engineering
   Beijing University of Posts and Telecommunications
   #10 XiTuCheng Road, Haidian District
   Beijing, 100876
   P.R. China
   Phone: +861062282720
   Email: wanglei_elf@bbn.cn

IPR Notices

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.