Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Informational                                  Ericsson
Expires: September 13, 2012                                   C. Perkins
                                                   University of Glasgow
                                                          March 12, 2012


                     RTP Multiplexing Architecture
           draft-westerlund-avtcore-multiplex-architecture-01

Abstract

   The Real-time Transport Protocol (RTP) is a flexible protocol that
   can be used in a wide range of applications and of network and
   system topologies. This flexibility and the implications of
   different choices should be understood by any application developer
   using RTP. To facilitate that understanding, this document contains
   an in-depth discussion of the usage of RTP's multiplexing points:
   the RTP session, the Synchronization Source Identifier (SSRC), and
   the payload type. The focus is put on the first two, trying to give
   guidance and source material for an analysis of the most suitable
   choices for the application being designed.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Requirements Language
     2.2.  Terminology
   3.  RTP Multiplex Points
     3.1.  Session
     3.2.  SSRC
     3.3.  CSRC
     3.4.  Payload Type
   4.  Multiple Streams Alternatives
   5.  RTP Topologies and Issues
     5.1.  Point to Point
       5.1.1.  RTCP Reporting
       5.1.2.  Compound RTCP Packets
     5.2.  Point to Multipoint Using Multicast
     5.3.  Point to Multipoint Using an RTP Translator
     5.4.  Point to Multipoint Using an RTP Mixer
     5.5.  Point to Multipoint using Multiple Unicast flows
     5.6.  De-composite End-Point
   6.  Multiple Streams Discussion
     6.1.  Introduction
     6.2.  RTP/RTCP Aspects
       6.2.1.  The RTP Specification
       6.2.2.  Handling Varying sets of Senders
       6.2.3.  Cross Session RTCP Requests
       6.2.4.  Binding Related Sources
       6.2.5.  Forward Error Correction
       6.2.6.  Transport Translator Sessions
     6.3.  Interworking
       6.3.1.  Interworking Applications
       6.3.2.  Multiple SSRC Legacy Considerations
     6.4.  Signalling Aspects
       6.4.1.  Session Oriented Properties
       6.4.2.  SDP Prevents Multiple Media Types
       6.4.3.  Media Stream Usage
     6.5.  Network Aspects
       6.5.1.  Quality of Service
       6.5.2.  NAT and Firewall Traversal
       6.5.3.  Multicast
       6.5.4.  Multiplexing multiple RTP Session on a Single
               Transport
     6.6.  Security Aspects
       6.6.1.  Security Context Scope
       6.6.2.  Key-Management for Multi-party session
       6.6.3.  Complexity Implications
     6.7.  Multiple Media Types in one RTP session
   7.  Arch-Types
     7.1.  Single SSRC per Session
     7.2.  Multiple SSRCs of the Same Media Type
     7.3.  Multiple Sessions for one Media type
     7.4.  Multiple Media Types in one Session
     7.5.  Summary
   8.  Guidelines
   9.  Proposal for Future Work
   10. RTP Specification Clarifications
     10.1.  RTCP Reporting from all SSRCs
     10.2.  RTCP Self-reporting
     10.3.  Combined RTCP Packets
   11. IANA Considerations
   12. Security Considerations
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  Dismissing Payload Type Multiplexing
   Authors' Addresses

1.  Introduction

   Real-time Transport Protocol (RTP) [RFC3550] is a commonly used
   protocol for real-time media transport. It is a protocol that
   provides great flexibility and can support a large set of different
   applications. RTP has several multiplexing points designed for
   different purposes. These enable support of multiple media streams
   and switching between different encoding or packetization of the
   media. By using multiple RTP sessions, sets of media streams can be
   structured for efficient processing or identification. Thus the
   question for any RTP application designer is how to best use the RTP
   session, the SSRC and the payload type to meet the application's
   needs.

   The purpose of this document is to provide clear information about
   the possibilities of RTP when it comes to multiplexing. The RTP
   application designer should understand the implications that come
   from a particular choice of RTP multiplexing points. The document
   will recommend against some usages as being unsuitable, in general or
   for particular purposes.

   RTP was from the beginning designed for multiple participants in a
   communication session. This design is not restricted to multicast,
   as some may believe; it also provides functionality over unicast,
   using either multiple transport flows below RTP or a network node
   that re-distributes the RTP packets. The re-distributing node can for
   example be a transport translator (relay) that forwards the packets
   unchanged, a translator performing media translation in addition to
   forwarding, or an RTP mixer that creates new conceptual sources from
   the received streams. In addition, multiple streams may occur when a
   single end-point has multiple media sources, like multiple cameras
   or microphones that need to be sent simultaneously.

   This document has been written due to increased interest in more
   advanced usage of RTP, resulting in questions regarding the most
   appropriate RTP usage. The limitations in some implementations,
   RTP/RTCP extensions, and signalling have also been exposed. It is
   expected that some limitations will be addressed by updates or new
   extensions resolving the shortcomings. The authors also hope that
   clarification on the usefulness of some functionalities in RTP will
   result in more complete implementations in the future.

   The document starts with some definitions and then goes into the
   existing RTP functionalities around multiplexing. Both the desired
   behavior and the implications of a particular behavior depend on
   which topologies are used, which requires some consideration. This
   is followed by a discussion of some choices in multiplexing behavior
   and their impacts. Some arch-types of RTP usage are discussed.

   Finally, some recommendations and examples are provided.

   This document is currently an individual contribution, but it is the
   intention of the authors that this should become a WG document that
   objectively describes and provides suitable recommendations for which
   there is WG consensus. Currently this document only represents the
   views of the authors. The authors gladly accept any feedback on the
   document and will be happy to discuss suitable recommendations.

2.  Definitions

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.2.  Terminology

   The following terms and abbreviations are used in this document:

   End-point: A single entity sending or receiving RTP packets. It may
      be decomposed into several functional blocks, but as long as it
      behaves as a single RTP stack entity it is classified as a single
      end-point.

   Media Stream: A sequence of RTP packets using a single SSRC that
      together carries part or all of the content of a specific Media
      Type from a specific sender source within a given RTP session.

   Media Source: The originator or source of a particular Media Stream.
      It can either be a single media capturing device such as a video
      camera, a microphone, or a specific output of a media production
      function, such as an audio mixer, or some video editing function.

   Media Aggregate: All Media Streams related to a particular Source.

   Media Type: Audio, video, text or data whose form and meaning are
      defined by a specific real-time application.

   Multiplex: The operation of taking multiple entities as input,
      aggregating them onto some common resource while keeping the
      individual entities addressable such that they can later be fully
      and unambiguously separated (de-multiplexed) again.

   RTP Session: As defined by [RFC3550], the end-points belonging to
      the same RTP Session are those that share a single SSRC space.
      That is, those end-points can see an SSRC identifier transmitted
      by any one of the other end-points. An end-point can receive an
      SSRC either as SSRC or as CSRC in RTP and RTCP packets. Thus, the
      RTP Session scope is decided by the end-points' network
      interconnection topology, in combination with RTP and RTCP
      forwarding strategies deployed by end-points and any
      interconnecting middle nodes.

   Source: See Media Source.

3.  RTP Multiplex Points

   This section describes the existing RTP tools that enable
   multiplexing of different media streams.

3.1.  Session

   The RTP Session is the highest semantic level in RTP and contains all
   of the RTP functionality.

   Identifier: RTP in itself does not contain any Session identifier,
      but relies either on the underlying transport or on the used
      signalling protocol, depending on the context in which the
      identifier is used (e.g. transport or signalling). Due to this, a
      single RTP Session may have multiple associated identifiers
      belonging to different contexts.

      Position: Depending on underlying transport and signalling
         protocol. For example, when running RTP on top of UDP, an RTP
         endpoint can identify and delimit an RTP Session from other RTP
         Sessions through the UDP source and destination transport
         address, consisting of network address and port number(s).
         Commonly, RTP and RTCP use separate ports and the destination
         transport address is in fact an address pair, but in the case
         of RTP/RTCP multiplex [RFC5761] there is only a single port.
         Another example is SDP signalling [RFC4566], where the grouping
         framework [RFC5888] uses an identifier per "m="-line.
         If there is a one-to-one mapping between "m="-line and RTP
         Session, that grouping framework identifier can identify a
         single RTP Session.

      Usage: Identify separate RTP Sessions.

      Uniqueness: Globally unique within the general communication
         context for the specific end-point.

      Inter-relation: Depending on the underlying transport and
         signalling protocol.

   Special Restrictions: None.

   A source that changes its source transport address during a session
   must also choose a new SSRC identifier to avoid being interpreted as
   a looped source.

   The set of participants considered part of the same RTP Session is
   defined by [RFC3550] as those that share a single SSRC space. That
   is, those participants that can see an SSRC identifier transmitted by
   any one of the other participants. A participant can receive an SSRC
   either as SSRC or CSRC in RTP and RTCP packets. Thus, the RTP
   Session scope is decided by the participants' network interconnection
   topology, in combination with RTP and RTCP forwarding strategies
   deployed by end-points and any interconnecting middle nodes.

3.2.  SSRC

   An RTP Session serves one or more Media Sources, each sending a Media
   Stream.

   Identifier: Synchronization Source (SSRC), 32-bit unsigned number.

      Position: In every RTP and RTCP packet header. May be present in
         RTCP payload. May be present in SDP signalling.

      Usage: Identify individual Media Sources within an RTP Session.
         Refer to individual Media Sources in RTCP messages and SDP
         signalling.

      Uniqueness: Randomly chosen, globally unique within an RTP
         Session and not dependent on network address.

      Inter-relation: SSRCs belonging to the same synchronization
         context (originating from the same end-point), within or
         between RTP Sessions, are indicated through use of identical
         SDES CNAME items in RTCP compound packets with those SSRCs as
         originating source.
         SDP signalling can provide explicit SSRC
         grouping [RFC5576]. When CNAME is inappropriate or
         insufficient, there exist a few other methods to relate
         different SSRCs. One such case is session-based RTP
         retransmission [RFC4588]. In some cases, the same SSRC
         Identifier value is used to relate streams in two different RTP
         Sessions, such as in Multi-Session Transmission of scalable
         video [RFC6190].

   Special Restrictions: All RTP implementations must be prepared to
      use procedures for SSRC collision handling, which results in an
      SSRC number change. A Media Source that changes its RTP Session
      identifier (e.g. source transport address) during a session must
      also choose a new SSRC identifier to avoid being interpreted as
      a looped source. Note that RTP sequence number and RTP timestamp
      are scoped by SSRC and thus independent between different SSRCs.

   A media source having an SSRC identifier can be of different types:

   Real: Connected to a "physical" media source, for example a camera
      or microphone.

   Conceptual: A source with some attributed property generated by some
      network node, for example a filtering function in an RTP mixer
      that provides the most active speaker based on some criteria, or a
      mix representing a set of other sources.

   Virtual: A source that does not generate any RTP media stream in
      itself (e.g. an end-point only receiving in an RTP session), but
      still needs a sender SSRC for use as source in RTCP reports.

   Note that a "multimedia source" that generates more than one media
   type, e.g. a conference participant sending both audio and video,
   need not (and commonly should not) use the same SSRC value across RTP
   sessions. RTCP Compound packets containing the CNAME SDES item are
   the designated method to bind an SSRC to a CNAME, effectively cross-
   correlating SSRCs within and between RTP Sessions as coming from the
   same end-point.
   The main property attributed to SSRCs associated
   with the same CNAME is that they are from a particular
   synchronization context and may be synchronized at playback.

   Note also that RTP sequence number and RTP timestamp are scoped by
   SSRC and thus independent between different SSRCs.

   An RTP receiver receiving a previously unseen SSRC value must
   interpret it as a new source. It may in fact be a previously
   existing source that had to change SSRC number due to an SSRC
   conflict. However, the originator of the previous SSRC should have
   ended the conflicting source by sending an RTCP BYE for it prior to
   starting to send with the new SSRC, so the new SSRC is in any case
   effectively a new source.

   Some RTP extension mechanisms already require the RTP stacks to
   handle additional SSRCs, like SSRC multiplexed RTP retransmission
   [RFC4588]. However, that still only requires handling a single media
   decoding chain per pair of SSRCs.

3.3.  CSRC

   The Contributing Source (CSRC) can arguably be seen as a sub-part of
   a specific SSRC and thus a multiplexing point. It is optionally
   included in the RTP header, shares the SSRC number space and
   specifies which set of SSRCs has contributed to the RTP payload.
   However, even though each RTP packet and SSRC can be tagged with the
   contained CSRCs, the media representation of an individual CSRC is in
   general not possible to extract from the RTP payload since it is
   typically the result of a media mixing (merge) operation (by an RTP
   mixer) on the individual media streams corresponding to the CSRC
   identifiers. Due to these restrictions, CSRC will not be considered
   a fully qualified multiplex point and will be disregarded in the rest
   of this document.

3.4.  Payload Type

   Each Media Stream can be represented in various encoding formats.

   Identifier: Payload Type number.

      Position: In every RTP header and in SDP signalling.

      Usage: Identify a specific Media Stream encoding format. The
         format definition may be taken from [RFC3551] for statically
         allocated Payload Types, but should be explicitly defined in
         signalling, such as SDP, both for static and dynamic Payload
         Types. The term "format" here includes whatever can be
         described by out-of-band signalling means. In SDP, the term
         "format" includes media type, RTP timestamp sampling rate,
         codec, codec configuration, payload format configurations, and
         various robustness mechanisms such as redundant encodings
         [RFC2198].

      Uniqueness: Scoped by sending end-point within an RTP Session.
         To avoid any potential for ambiguity, it is desirable that
         payload types are unique across all sending end-points within
         an RTP session, but this is often not true in practice. All
         SSRCs in an RTP session sent from a single end-point share the
         same Payload Type definitions. The RTP Payload Type is
         designed such that only a single Payload Type is valid at any
         time instant in the SSRC's RTP timestamp time line, effectively
         time-multiplexing different Payload Types if any change occurs.
         The Payload Type used may change on a per-packet basis for an
         SSRC, for example when a speech codec makes use of generic
         Comfort Noise [RFC3389].

      Inter-relation: There are some uses where Payload Type numbers
         need to be unique across RTP Sessions. This is for example the
         case in Media Decoding Dependency [RFC5583] where Payload Types
         are used to describe media dependency across RTP Sessions.
         Another example is session-based RTP retransmission [RFC4588].

   Special Restrictions: Using different RTP timestamp clock rates for
      the RTP Payload Types in use in the same RTP Session has issues
      such as loss of synchronization.
      Payload Type clock rate
      switching requires some special consideration that is described in
      the multiple clock rates specification
      [I-D.ietf-avtext-multiple-clock-rates].

   If there is a true need to send multiple Payload Types for the same
   SSRC that are valid for the same RTP Timestamps, then redundant
   encodings [RFC2198] can be used. Several constraints in addition to
   the ones mentioned above need to be met to enable this use, one of
   which is that the combined payload sizes of the different Payload
   Types must not exceed the transport MTU.

   Other aspects of RTP payload format use are described in RTP Payload
   HowTo [I-D.ietf-payload-rtp-howto].

4.  Multiple Streams Alternatives

   This section reviews the alternatives to enable multi-stream
   handling. It starts by describing mechanisms that could enable
   multiple media streams, independent of the purpose for having
   multiple streams.

   SSRC Multiplexing: Each additional Media Stream gets its own SSRC
      within an RTP Session.

   Session Multiplexing: Using additional RTP Sessions to handle
      additional Media Streams.

   Payload Type Multiplexing: Using different RTP payload types for
      different additional streams.

   Independent of the reason to use additional media streams, achieving
   it using payload type multiplexing is not a good choice, as can be
   seen in Appendix A. The RTP payload type alone is not suitable
   for cases where additional media streams are required. Streams need
   their own SSRCs, so that they get their own sequence number space.
   The SSRC itself is also important so that the media stream can be
   referenced and reported on.

   This leaves us with two main choices, either using SSRC multiplexing
   to have multiple SSRCs from one end-point in one RTP session, or
   creating an additional RTP session to hold that additional SSRC.
   As the below discussion will show, in reality we cannot choose a
   single one of the two solutions. To utilize RTP well and as
   efficiently as possible, both are needed. The real issue is finding
   the right guidance on when to create RTP sessions and when
   additional SSRCs in an RTP session are the right choice.

   In the below discussion, please keep in mind that the reasons for
   having multiple media streams vary and include but are not limited to
   the following:

   o  Multiple Media Sources

   o  Retransmission streams

   o  FEC stream

   o  Alternative Encodings

   o  Scalability layers

   Thus the choice made due to one reason may not be the choice suitable
   for another reason. In the above list, the different items have
   different levels of maturity in the discussion on how to solve them.
   The clearest understanding is associated with multiple media sources
   of the same media type. However, all warrant discussion and
   clarification on how to deal with them.

5.  RTP Topologies and Issues

   The impact of how RTP Multiplex is performed will in general vary
   with how the RTP Session participants are interconnected, that is,
   with the RTP Topology [RFC5117]. This section describes the
   topologies and attempts to highlight the important behaviors
   concerning RTP multiplexing and multi-stream handling. It lists any
   identified issues regarding RTP and RTCP handling, and introduces
   additional topologies that are supported by RTP beyond those
   included in RTP Topologies [RFC5117]. The RTP Topologies that do not
   follow the RTP specification or do not attempt to utilize the
   facilities of RTP are ignored in this document.

5.1.  Point to Point

   This is the most basic use case with an RTP session containing two
   end-points. Each end-point has one or more SSRCs.

                        +---+         +---+
                        | A |<------->| B |
                        +---+         +---+

                        Figure 1: Point to Point

5.1.1.  RTCP Reporting

   In cases when an end-point uses multiple SSRCs, we have found two
   closely related issues. The first is whether every SSRC shall report
   on all other SSRCs, even the ones originating from the same
   end-point. The reason for this would be to ensure that no monitoring
   function suspects a breakage in the RTP session.

   The second issue around RTCP reporting arises when an end-point
   receives one or more media streams, and when the receiving end-point
   itself sends multiple SSRCs in the same RTP session. As transport
   statistics are gathered per end-point and shared between the nodes,
   all of the end-point's SSRCs will report based on the same received
   data; the only difference will be which SSRC sends the report. This
   could be considered unnecessary overhead, but for consistency it
   might be simplest to always have all sending SSRCs send RTCP reports
   on all media streams the end-point receives.

   The current RTP text is silent about sending RTCP Receiver Reports
   for an end-point's own sources, but does not preclude either sending
   or omitting them. The uncertainty in the expected behavior in those
   cases has likely caused variations in the implementation strategy.
   This could cause an interoperability issue where it is not possible
   to determine if the lack of reports is a true transport issue, or
   simply a result of implementation.

   Although this issue is valid already for the simple point to point
   case, it needs to be considered in all topologies. From the
   perspective of an end-point, any solution needs to take into account
   what a particular end-point can determine without explicit
   information of the topology. For example, a Transport Translator
   (Relay) topology will look quite similar to point to point on a
   transport level but is different on RTP level.
   Assume a first scenario with two SSRCs being sent from an end-point
   to a Transport Translator, and a second scenario with two single-SSRC
   remote end-points sending to the same Transport Translator. The main
   differences between those two scenarios are that in the second
   scenario, the RTT may vary between the SSRCs (but it is not
   guaranteed), and the SSRCs may also have different CNAMEs.

5.1.2.  Compound RTCP Packets

   When an end-point has multiple SSRCs and it needs to send RTCP
   packets on behalf of these SSRCs, the question arises if and how RTCP
   packets with different source SSRCs can be sent in the same compound
   packet. If it is allowed, then some consideration of the
   transmission scheduling is needed.

5.2.  Point to Multipoint Using Multicast

   This section discusses Point to Multipoint using Multicast to
   interconnect the session participants. This needs to consider both
   Any Source Multicast (ASM) and Source-Specific Multicast (SSM).
   There are large commercial deployments of multicast for applications
   like IPTV.

                              +-----+
                   +---+     /       \    +---+
                   | A |----/         \---| B |
                   +---+   /   Multi-  \  +---+
                          +    Cast     +
                   +---+   \  Network  /  +---+
                   | C |----\         /---| D |
                   +---+     \       /    +---+
                              +-----+

        Figure 2: Point to Multipoint Using Any Source Multicast

   In Any Source Multicast, any of the participants can send to all the
   other participants, simply by sending a packet to the multicast
   group.
   That is not possible in Source Specific Multicast [RFC4607]
   where only a single source (Distribution Source) can send to the
   multicast group, creating a topology that looks like the one below:

       +--------+       +-----+
       |Media   |       |     |      Source-specific
       |Sender 1|<----->| D S |          Multicast
       +--------+       | I O |  +--+----------------> R(1)
                        | S U |  |  |                    |
       +--------+       | T R |  |  +-----------> R(2)   |
       |Media   |<----->| R C |->+  |           :   |    |
       |Sender 2|       | I E |  |  +------> R(n-1) |    |
       +--------+       | B   |  |  |          |    |    |
           :            | U   |  +--+--> R(n)  |    |    |
           :            | T +-|          |     |    |    |
           :            | I | |<---------+     |    |    |
       +--------+       | O |F|<---------------+    |    |
       |Media   |       | N |T|<--------------------+    |
       |Sender M|<----->|   | |<-------------------------+
       +--------+       +-----+       RTCP Unicast

       FT = Feedback Target
       Transport from the Feedback Target to the Distribution
       Source is via unicast or multicast RTCP if they are not
       co-located.

      Figure 3: Point to Multipoint using Source Specific Multicast

   In this topology a number of Media Senders (1 to M) are allowed to
   send media to the SSM group. They send media to the Distribution
   Source, which then forwards the media streams to the multicast
   group. The media streams reach the Receivers (R(1) to R(n)). The
   Receivers' RTCP cannot be sent to the multicast group. To support
   RTCP, an RTP extension for SSM [RFC5760] was defined to use unicast
   transmission to send RTCP from the receivers to one or more Feedback
   Targets (FT).

   As multicast is a one to many distribution system, this must be taken
   into consideration. For example, the only practical method for
   adapting the bit-rate sent towards a given receiver for large groups
   is to use a set of multicast groups, where each multicast group
   represents a particular bit-rate. Otherwise the whole group gets
   media adapted to the participant with the worst conditions.
   The media encoding is either scalable, where multiple layers can be
   combined, or simulcast, where a single version is selected.  By
   either selecting or combining multicast groups, the receiver can
   control the bit-rate sent on the path to itself.  It is also common
   that streams that improve transport robustness are sent in their own
   multicast group to allow for interworking with legacy receivers or
   to support different levels of protection.

   The result of this is three common behaviors for RTP multicast:

   1.  Use of multiple RTP sessions for the same media type.

   2.  The need for identifying RTP sessions that are related in one of
       several possible ways.

   3.  The need for binding related SSRCs in different RTP sessions
       together.

   This indicates that multicast is an important consideration when
   working with the RTP multiplexing and multi-stream architecture
   questions.  It is also important to note that so far there is no
   special mode that distinguishes the basic behavior of multicast and
   unicast usages of RTP.  There are extensions targeted at
   multicast-specific cases, but the general applicability does need to
   be considered.

5.3.  Point to Multipoint Using an RTP Translator

   Transport Translators (Relays) are a very important consideration
   for this document as they result in an RTP session situation that is
   very similar to how an ASM group RTP session would behave.

   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |            |      +---+
              | Translator |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

                Figure 4: Transport Translator (Relay)

   One of the most important aspects of the simple relay is that it is
   both easy to implement and requires a minimal amount of resources,
   as only transport headers are rewritten; no RTP modifications or
   media transcoding occur.
   Thus it is most likely the cheapest and most generally deployable
   method for multi-point sessions.  The most obvious downside of this
   basic relaying is that the translator has no control over how many
   streams need to be delivered to a receiver.  Nor can it simply
   choose to deliver only certain streams, as that creates session
   inconsistencies.  If some middlebox temporarily stops a stream, this
   prevents some receivers from reporting on it.  From the sender's
   perspective it will look like a transport failure.  Applications
   that need to stop or switch streams in the central node should
   consider using an RTP mixer to avoid this issue.

   The Transport Translator does not need to have an SSRC of its own,
   nor does it need to send any RTCP reports on the flows that pass
   through it, but it may choose to do so.

   Use of a transport translator means that any of the end-points will
   receive multiple SSRCs over a single unicast transport flow from the
   translator.  That is independent of whether the other end-points
   have only a single SSRC or several.  End-points that have multiple
   SSRCs put further requirements on how SSRCs can be related or bound
   within and across RTP sessions and how they can be identified on an
   application level.  The transport translator has a signalling
   requirement that also exists in Any Source Multicast: all of the
   participants will need to have the same RTP and payload type
   configuration.  Otherwise, A could for example be using payload type
   97 for the video codec H.264 while B thinks it is MPEG-2.  It should
   be noted that SDP offer/answer [RFC3264] has issues with ensuring
   this property.

   A Media Translator can perform a large variety of media functions
   affecting the media stream passing the translator, coming from one
   source and destined to a particular end-point.
   The translator can transcode to a different bit-rate, transcode to
   use another encoder, change the packetization of the media stream,
   add FEC streams, or terminate RTP retransmissions.  The latter
   behaviors require the translator to use SSRCs that only exist in a
   particular sub-domain of the RTP session, and it may also create
   additional sessions when the mechanism applied on one side so
   requires.

5.4.  Point to Multipoint Using an RTP Mixer

   The most commonly used topology in centralized conferencing is based
   on the RTP Mixer.  The main reason for this is that it provides a
   very consistent view of the RTP session towards each participant.
   That is accomplished by the mixer having its own SSRCs; any media
   sent to the participants is sent using those SSRCs.  If the mixer
   wants to identify the underlying media sources for its conceptual
   streams, it can identify them using CSRC.  The media stream the
   mixer provides can be an actual media mixing of multiple media
   sources.  It might also be as simple as selecting one of the
   underlying sources based on some mixer policy or control signalling.

   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |            |      +---+
              |   Mixer    |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

                         Figure 5: RTP Mixer

   In the case where the mixer does stream selection, an application
   may in fact desire multiple simultaneous streams but only as many as
   the mixer can handle.  As long as the mixer and an end-point can
   agree on the maximum number of streams and how the streams that are
   delivered are selected, this provides very good functionality.  As
   these streams are forwarded using the mixer's SSRCs, there are no
   inconsistencies within the session.

5.5.  Point to Multipoint using Multiple Unicast flows

   Based on the RTP session definition, it is clearly possible to have
   a joint RTP session over multiple transport flows, like the three
   end-point joint session below.  In this case, A needs to send its
   media streams and RTCP packets to both B and C over their respective
   transport flows.  As long as all participants do the same, everyone
   will have a joint view of the RTP session.

   +---+      +---+
   | A |<---->| B |
   +---+      +---+
     ^         ^
      \       /
       \     /
        v   v
        +---+
        | C |
        +---+

     Figure 6: Point to Multi-Point using Multiple Unicast Transports

   This doesn't create any additional requirements beyond the need to
   have multiple transport flows associated with a single RTP session.
   Note that an end-point may use a single local port to receive all
   these transport flows, or it might have separate local reception
   ports for each of the end-points.

   There exists an alternative structure for establishing the above
   communication scenario (Figure 6) which uses independent RTP
   sessions between each pair of peers, i.e. three different RTP
   sessions.  Unless independently adapted, the same RTP media stream
   could be sent in both of the RTP sessions an end-point has.  The
   difference exists in the behaviors around RTCP, for example common
   RTCP bandwidth for one joint session, rather than three independent
   pools, and the awareness based on RTCP reports between the peers of
   how that third leg is doing.

5.6.  De-composite End-Point

   There is some possibility that an RTP end-point implementation in
   fact resides on multiple devices, each with its own network address.
   A very basic use case for this would be to separate audio and video
   processing for a particular end-point, like a conference room, into
   one device handling the audio and another handling the video,
   interconnected by some control functions allowing them to behave as
   a single end-point.

   +---------------------+
   | End-point A         |
   | Local Area Network  |
   |      +------------+ |
   |   +->| Audio      |<+----\
   |   |  +------------+ |     \    +------+
   |   |  +------------+ |      +-->|      |
   |   +->| Video      |<+--------->|  B   |
   |   |  +------------+ |      +-->|      |
   |   |  +------------+ |     /    +------+
   |   +->| Control    |<+----/
   |      +------------+ |
   +---------------------+

                   Figure 7: De-composite End-Point

   In the above usage, let us assume that the RTP sessions are
   different for audio and video.  The audio and video parts will use a
   common CNAME and also have a common clock to ensure that
   synchronization and clock drift handling work despite the
   decomposition.

   However, if the audio and video were in a single RTP session, then
   this use case becomes problematic.  This is because all transport
   flow receivers will need to receive all the other media streams that
   are part of the session.  Thus the audio component will also receive
   all the video media streams, while the video component will receive
   all the audio ones, doubling the site's bandwidth requirements from
   all other session participants.  With a joint RTP session it also
   becomes evident that a given end-point, as interpreted from a CNAME
   perspective, has two sets of transport flows for receiving the
   streams and the decomposition is not hidden.

   The requirement that can be derived from the above usage is that the
   transport flows for each RTP session might be under common control
   but still go to what looks like different end-points based on
   addresses and ports.
   A conclusion can also be reached that decomposition without using
   separate RTP sessions has downsides and potential for RTP/RTCP
   issues.

   There exists another use case which might be considered as a
   de-composite end-point.  However, as will be shown, this should be
   considered a translator instead.  An example of this is when an
   end-point A sends a media flow to B.  On the path there is a device
   C that on A's behalf does something with the media streams, for
   example adds an RTP session with FEC information for A's media
   streams.  C will in this case need to bind the new FEC streams to
   A's media stream by using the same CNAME as A.

   +------+        +------+         +------+
   |      |        |      |         |      |
   |  A   |------->|  C   |-------->|  B   |
   |      |        |      |---FEC-->|      |
   +------+        +------+         +------+

             Figure 8: When De-composition is a Translator

   This type of functionality, where C does something with the media
   stream on behalf of A, is clearly covered under the media translator
   definition (Section 5.3).

6.  Multiple Streams Discussion

6.1.  Introduction

   Using multiple media streams is a well supported feature of RTP.
   However, it can be unclear to most implementers, and to people
   writing RTP/RTCP applications or extensions that attempt to apply
   multiple streams, when it is most appropriate to add an additional
   SSRC in an existing RTP session and when it is better to use
   multiple RTP sessions.  This section discusses the various
   considerations needed.  The next section then concludes with some
   guidelines.

6.2.  RTP/RTCP Aspects

   This section discusses RTP and RTCP aspects worth considering when
   selecting between SSRC multiplexing and Session multiplexing.

6.2.1.  The RTP Specification

   RFC 3550 contains some recommendations and a bullet list with 5
   arguments for different aspects of RTP multiplexing.
   Let's review Section 5.2 of [RFC3550], reproduced below:

   "For efficient protocol processing, the number of multiplexing points
   should be minimized, as described in the integrated layer processing
   design principle [ALF].  In RTP, multiplexing is provided by the
   destination transport address (network address and port number) which
   is different for each RTP session.  For example, in a teleconference
   composed of audio and video media encoded separately, each medium
   SHOULD be carried in a separate RTP session with its own destination
   transport address.

   Separate audio and video streams SHOULD NOT be carried in a single
   RTP session and demultiplexed based on the payload type or SSRC
   fields.  Interleaving packets with different RTP media types but
   using the same SSRC would introduce several problems:

   1.  If, say, two audio streams shared the same RTP session and the
       same SSRC value, and one were to change encodings and thus
       acquire a different RTP payload type, there would be no general
       way of identifying which stream had changed encodings.

   2.  An SSRC is defined to identify a single timing and sequence
       number space.  Interleaving multiple payload types would require
       different timing spaces if the media clock rates differ and would
       require different sequence number spaces to tell which payload
       type suffered packet loss.

   3.  The RTCP sender and receiver reports (see Section 6.4) can only
       describe one timing and sequence number space per SSRC and do not
       carry a payload type field.

   4.  An RTP mixer would not be able to combine interleaved streams of
       incompatible media into one stream.

   5.  Carrying multiple media in one RTP session precludes: the use of
       different network paths or network resource allocations if
       appropriate; reception of a subset of the media if desired, for
       example just audio if video would exceed the available bandwidth;
       and receiver implementations that use separate processes for the
       different media, whereas using separate RTP sessions permits
       either single- or multiple-process implementations.

   Using a different SSRC for each medium but sending them in the same
   RTP session would avoid the first three problems but not the last
   two.

   On the other hand, multiplexing multiple related sources of the same
   medium in one RTP session using different SSRC values is the norm for
   multicast sessions.  The problems listed above don't apply: an RTP
   mixer can combine multiple audio sources, for example, and the same
   treatment is applicable for all of them.  It may also be appropriate
   to multiplex streams of the same medium using different SSRC values
   in other scenarios where the last two problems do not apply."

   Let's consider one argument at a time.  The first is an argument for
   using a different SSRC for each individual media stream, which is
   still very applicable.

   The second argument advocates against using payload type
   multiplexing, which still stands, as can be seen from the extensive
   list of issues found in Appendix A.

   The third argument is yet another argument against payload type
   multiplexing.

   The fourth is an argument against multiplexing media streams that
   require different handling into the same session.  This is to
   simplify the processing at any receiver of the media stream.  If all
   media streams that exist in an RTP session are of one media type and
   one particular purpose, there is no need for deeper inspection of
   the packets before processing them in both end-points and RTP-aware
   middle nodes.
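
   As a rough illustration of the first and fourth arguments, the
   following Python sketch shows a receiver that keeps one demultiplexer
   per RTP session and separates streams only by SSRC.  The class name
   SessionDemux and the sample packet are hypothetical and exist only
   for this example; the header layout is the 12-byte fixed RTP header
   from RFC 3550.  Because the session carries a single media type, the
   receiver can hand every stream to the same kind of processing
   without inspecting the payload.

```python
import struct


def parse_rtp_header(packet):
    """Return (payload_type, sequence_number, timestamp, ssrc)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Fixed header: V/P/X/CC, M/PT, sequence number, timestamp, SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    return b1 & 0x7F, seq, ts, ssrc


class SessionDemux:
    """Hypothetical receiver state for one RTP session."""

    def __init__(self, media_type):
        # One media type per session, so no per-packet type inspection.
        self.media_type = media_type
        self.streams = {}  # SSRC -> list of (sequence number, payload type)

    def receive(self, packet):
        pt, seq, _ts, ssrc = parse_rtp_header(packet)
        # Each SSRC has its own sequence number and timing space.
        self.streams.setdefault(ssrc, []).append((seq, pt))
        return ssrc


# Usage: an audio-only session receiving one packet (PT 0, seq 1).
audio = SessionDemux("audio")
pkt = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD)
ssrc = audio.receive(pkt)
```

   A real receiver would also handle CSRC lists, header extensions and
   SSRC collisions, all of which this sketch leaves out.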

   The fifth argument discusses network aspects that we will discuss
   more below in Section 6.5.  It also goes into aspects of
   implementation, like decomposed end-points where different processes
   or inter-connected devices handle different aspects of the whole
   multi-media session.

   A summary of RFC 3550's view on multiplexing is to use unique SSRCs
   for anything that is its own media/packet stream, and secondly to
   use different RTP sessions for media streams that don't share media
   type and purpose, to maximize flexibility when it comes to
   processing and handling of the media streams.

   This mostly agrees with the discussion and recommendations in this
   document.  However, there has been an evolution of RTP since that
   text was written which needs to be reflected in the discussion.
   Additional clarifications for specific cases are also needed.

6.2.1.1.  Different Media Types Recommendations

   The above quote from RTP [RFC3550] includes a strong recommendation:

      "For example, in a teleconference composed of audio and video
      media encoded separately, each medium SHOULD be carried in a
      separate RTP session with its own destination transport address."

   It has been identified in "Why RTP Sessions Should Be Content
   Neutral" [I-D.alvestrand-rtp-sess-neutral] that the above statement
   is poorly supported by any of the motivations provided in the RTP
   specification.  This document has a more detailed analysis of
   potential issues in having multiple media types in the same RTP
   session in Section 6.7.  An important influence on the thinking
   underlying the RTP design, and likely on this statement, can be
   found in the academic paper by David Clark and David Tennenhouse
   "Architectural considerations for a new generation of protocols"
   [ALF].

6.2.2.  Handling Varying sets of Senders

   A potential issue that some application designers may need to
   consider is the case where the set of simultaneously active sources
   varies within a larger set of session members.  As each media
   decoding chain may contain state, it is important that this type of
   usage ensures that a receiver can flush the decoding state for an
   inactive source and, if that source becomes active again, does not
   assume that this previous state exists.

   This behavior will cause similar issues independent of SSRC or
   Session multiplexing.  It might be possible in certain applications
   to limit the changes to a subset of communication session
   participants by having the subset use particular RTP sessions.

6.2.3.  Cross Session RTCP Requests

   There currently exists no functionality to make truly synchronized
   and atomic RTCP messages with some type of request semantics across
   multiple RTP sessions.  Instead, separate RTCP messages will have to
   be sent in each session.  This gives SSRC multiplexed streams a
   slight advantage, as RTCP messages for different streams in the same
   session can be sent in a compound RTCP packet, thus providing an
   atomic operation if different modifications of different streams are
   requested at the same time.

   In Session multiplexed cases, the RTCP timing rules in the sessions
   and the transport aspects, such as packet loss and jitter, prevent a
   receiver from relying on atomic operations, forcing it to use more
   robust and forgiving mechanisms.

6.2.4.  Binding Related Sources

   A common problem in a number of RTP extensions has been how to bind
   related sources together.
   This issue is common to SSRC multiplexing and Session multiplexing,
   and any solution and recommendation related to the problem should
   work equally well with both methods to avoid creating barriers
   between using session multiplexing and SSRC multiplexing.

   The current solutions do not have these properties.  There exists
   one solution for grouping RTP sessions together in SDP [RFC5888] to
   know which RTP session contains, for example, the FEC data for the
   source data in another session.  However, this mechanism does not
   work on individual media flows and is thus not directly applicable
   to the problem.  The other solution is also SDP based and can group
   SSRCs within a single RTP session [RFC5576].  Thus this mechanism
   can bind media streams in SSRC multiplexed cases.  Both solutions
   have the shortcoming of being restricted to SDP based signalling and
   also do not work in cases where the session's dynamic properties are
   such that it is difficult or resource consuming to keep the list of
   related SSRCs up to date.

   One possible solution could be to mandate the same SSRC being used
   in all RTP sessions in case of session multiplexing.  We do note
   that Section 8.3 of the RTP Specification [RFC3550] recommends using
   a single SSRC space across all RTP sessions for layered coding.
   However, this recommendation has some downsides and is less
   applicable beyond the field of layered coding.  Using the same
   sender SSRC in all RTP sessions from a particular end-point can
   cause issues if an SSRC collision occurs.  If the same SSRC is used
   as the required binding between the streams, then all streams in the
   related RTP sessions must change their SSRC.  This is extra likely
   to cause problems if the participant populations are different in
   the different sessions.
   For example, in the case of a large number of receivers having
   selected totally random SSRC values in each RTP session as RFC 3550
   specifies, a change due to an SSRC collision in one session can then
   cause a new collision in another session.  This cascading effect is
   not severe but there is an increased risk that this occurs for well
   populated sessions.  In addition, being forced to change the SSRC
   affects all the related media streams; instead of having to
   re-synchronize only the originally conflicting stream, all streams
   will suddenly need to be re-synchronized with each other.  This also
   prevents the media streams that did not have an actual collision
   from being usable during the re-synchronization and increases the
   time until synchronization is finalized.  In addition, it requires
   exception handling in the SSRC generation.

   The above collision issue does not occur when there is only one SSRC
   space across all sessions and all participants are part of at least
   one session, like the base layer in layered encoding.  In that case
   the only downside is the special behavior that needs to be well
   defined by anyone using this.  But having an exception behavior
   where the SSRC space is common across all sessions is an issue, as
   this behavior does not fit all the RTP extensions or payload
   formats.  It is possible to create a situation where the different
   mechanisms cannot be combined due to the non-standard SSRC
   allocation behavior.

   Existing mechanisms with known issues:

   RTP Retransmission (RFC4588):  Has two modes, one for SSRC
      multiplexing and one for Session multiplexing.  The session
      multiplexing requires the same CNAME and mandates that the same
      SSRC is used in both sessions.  Using the same SSRC does work but
      will potentially have issues in certain cases.
      In SSRC multiplexed mode the CNAME is used to bind media and
      retransmission streams together.  However, if multiple media
      streams are sent from the same end-point in the same session,
      this does not provide non-ambiguous binding.  Therefore, when the
      first retransmission request for a media stream is sent, one must
      not have another retransmission request outstanding for an SSRC
      which doesn't have a binding between the original SSRC and the
      retransmission stream's SSRC.  This works but creates some
      limitations that can be avoided by a more explicit mechanism.
      The SDP based ssrc-group mechanism is sufficient in this case as
      long as the application can rely on the signalling based
      solution.

   Scalable Video Coding (RFC6190):  As an example of scalable coding,
      SVC [RFC6190] has various modes.  The Multi Session Transmission
      (MST) mode uses Session multiplexing to separate scalability
      layers.  However, this specification has failed to be explicit on
      how these layers are bound together in cases where CNAME is not
      sufficient.  CNAME is no longer sufficient when more than one
      media source with the same CNAME occurs within a session, for
      example due to multiple video cameras capturing the same lecture
      hall.  This likely implies that a single SSRC space as
      recommended by Section 8.3 of RTP [RFC3550] is to be used.

   Forward Error Correction:  If some type of FEC or redundancy stream
      is being sent, it needs its own SSRC, with the exception of
      constructions like redundancy encoding [RFC2198].  Thus in case
      of transmitting the FEC in the same session as the source data,
      the inter-SSRC relation within a session is needed.  In case of
      sending the redundant data in a separate session from the source,
      the SSRCs in each session need to be related.  This occurs for
      example in RFC5109 when using session separation of original and
      FEC data.
      SSRC multiplexing is not supported; only the use of redundant
      encoding is supported.

   This issue appears to need action to harmonize and avoid future
   shortcomings in extension specifications.  A proposed solution for
   handling this issue is [I-D.westerlund-avtext-rtcp-sdes-srcname].

6.2.5.  Forward Error Correction

   There exist a number of Forward Error Correction (FEC) based schemes
   for how to reduce the packet loss of the original streams.  Most of
   the FEC schemes will protect a single source flow.  The protection
   is achieved by transmitting a certain amount of redundant
   information that is encoded such that it can repair one or more
   packet losses over the set of packets it protects.  This sequence of
   redundant information also needs to be transmitted as its own media
   stream, or in some cases instead of the original media stream.  Thus
   many of these schemes create a need for binding the related flows as
   discussed above.  They also create additional flows that need to be
   transported.  Looking at the history of these schemes, there are
   both SSRC multiplexed and Session multiplexed solutions and some
   schemes that support both.

   Using a Session multiplexed solution provides good support for
   legacy when deploying FEC or changing the scheme used, in the sense
   that it supports the case where some set of receivers may not be
   able to utilize the FEC information.  By placing it in a separate
   RTP session, it can easily be ignored.

   In usages involving multicast, having the FEC information on its own
   multicast group and RTP session allows for flexibility, for example
   when using Rapid Acquisition of Multicast Groups (RAMS) [RFC6285].
   During the RAMS burst where data is received over unicast and where
   it is possible to combine with unicast based retransmission
   [RFC4588], there is no need to burst the FEC data related to the
   burst of the source media streams needed to catch up with the
   multicast group.  This saves bandwidth to the receiver during the
   burst, enabling quicker catch up.  When the receiver has caught up
   and joins the multicast group(s) for the source, it can at the same
   time join the multicast group with the FEC information.  Having the
   source stream and the FEC in separate groups allows for easy
   separation in the Burst/Retransmission Source (BRS) without having
   to individually classify packets.

6.2.6.  Transport Translator Sessions

   A basic Transport Translator relays any incoming RTP and RTCP
   packets to the other participants.  The main difference between SSRC
   multiplexing and Session multiplexing resulting from this use case
   is that for SSRC multiplexing it is not possible for a particular
   session participant to decide to receive a subset of media streams.
   When using separate RTP sessions for the different sets of media
   streams, a single participant can choose to leave one of the
   sessions but not the other.

6.3.  Interworking

   There are several different kinds of interworking, and this section
   discusses two related ones.  The first is the interworking between
   different applications and the implications of potentially different
   choices in the usage of RTP's multiplexing points.  The second
   relates to what limitations may have to be considered when working
   with some legacy applications.

6.3.1.  Interworking Applications

   It is not uncommon that applications or services of similar usage,
   especially the ones intended for interactive communication, end up
   in a situation where one wants to interconnect two or more of these
   applications.
   From an RTP perspective this could be problem free if all the
   applications have made the same multiplexing choices and have the
   same capabilities in number of simultaneous media streams, combined
   with the same set of RTP/RTCP extensions being supported.
   Unfortunately this may not always be true.

   In these cases one ends up in a situation where one might use a
   gateway to interconnect applications.  This gateway then needs to
   change the multiplexing structure or adhere to limitations in each
   application.  If one's goal is to minimize the amount of work in
   such a gateway, there are some multiplexing choices that one should
   avoid.  The lowest amount of work represents solutions where one can
   take an SSRC from one RTP session in one application and forward it
   into another RTP session.  For example, if one application has
   multiple SSRCs for one media type in one session and another
   application instead has chosen to use multiple RTP sessions with
   only a single SSRC per end-point in each of these sessions, then
   mapping an SSRC from the side with one session into an RTP session
   is possible.  However, mapping SSRCs from different RTP sessions
   into a single RTP session has the potential of creating SSRC
   collisions, especially if an end-point has not generated independent
   random SSRC values in each RTP session.  This issue is even more
   likely in a case where one side uses a single RTP session with
   multiple media types and the other uses different RTP sessions for
   different media or robustness mechanisms such as retransmission
   [RFC4588].  Then it is more likely or maybe even required to use the
   same SSRC in the different RTP sessions.

   In cases where the used structure is incompatible, the gateway will
   need to perform SSRC translation.  This incurs overhead and some
   potential loss of functionality.
   First of all, if one translates the SSRC in an RTP header, then one
   will be forced to decrypt and re-encrypt if one uses SRTP, and thus
   also needs to be part of the security association.  Secondly,
   changing the SSRC also means that one needs to translate all RTCP
   messages.  This can be more complex, but is important so that the
   gateway does not end up having to terminate the end-to-end RTCP
   chain.  In that case the gateway will need to be able to take the
   role of a true end-point in each session, which may include
   functions such as bit-rate adaptation and correctly responding to
   whatever RTCP extensions are being used, and then translate them or
   locally respond to them.  Thirdly, an SSRC translation may require
   that one changes RTP payloads; for example, an RTP retransmission
   packet contains an original sequence number that must match the
   sequence number used for the corresponding packet with the new SSRC.
   For FEC packets this is even worse, as the original SSRC is included
   as part of the data for which FEC redundant data is calculated.  A
   fourth issue is the potential for these gateways to block evolution
   of the applications by blocking unknown RTP and RTCP extensions that
   the regular application has been extended with.

   If one uses security functions, like SRTP, they can, as seen above,
   incur both additional risk, due to the gateway needing to be in the
   security association between the end-points unless the gateway is on
   the transport level, and additional complexity in the form of the
   decrypt-encrypt cycles needed for each forwarded packet.
   SRTP, due to its keying structure, also makes it hard to move a flow
   from one RTP session to another, as each RTP session will have one
   or more different master keys and these must not be the same in
   multiple RTP sessions, as that can result in two-time pads that
   completely break the confidentiality of the packets.

   An additional issue around interworking is that for multi-party
   applications it can be impossible to judge which different RTP
   multiplexing behaviors will be used by end-points that attempt to
   join a session.  Thus if one attempts to use a multiplexing choice
   that has poor interworking, one may have to switch at a later stage
   when someone wants to participate in a multi-party session using an
   RTP application supporting only another behavior.  It is likely
   difficult to implement the switch without some media disruption.

   To summarize, certain types of applications are likely to be
   inter-worked.  Sets of applications of similar type should strive to
   use the same multiplexing structure to avoid the need for an RTP
   session level gateway.  Such a gateway incurs complexity costs, can
   be forced to be part of security associations, and can be forced to
   perform SSRC translation and even payload translation, which is also
   a potential hindrance to application evolution.

6.3.2.  Multiple SSRC Legacy Considerations

   Historically, the most common RTP use cases have been point to point
   Voice over IP (VoIP) or streaming applications, commonly with no
   more than one media source per end-point and media type (typically
   audio and video).  Even in conferencing applications, especially
   voice only, the conference focus or bridge has provided a single
   stream with a mix of the other participants to each participant.  It
   is also common to have individual RTP sessions between each
   end-point and the RTP mixer.

When establishing RTP sessions that may contain end-points that aren't updated to handle multiple streams following these recommendations, a particular application can have issues with multiple SSRCs within a single session. To avoid these issues, an end-point needs to:

   1.  Handle more than one stream simultaneously rather than replacing an already existing stream with a new one.

   2.  Be capable of decoding multiple streams simultaneously.

   3.  Be capable of rendering multiple streams simultaneously.

RTP session multiplexing could potentially avoid these issues if there is only a single SSRC at each end-point, and in topologies which appear like point to point as seen from the end-point. However, forcing the usage of session multiplexing for this reason would be a great mistake, as it is likely that a significant set of applications will need a combination of SSRC multiplexing of several media sources and session multiplexing for other aspects such as encoding alternatives, adding robustness or simply to support legacy. However, this issue does need consideration when deploying multiple media streams within an RTP session where legacy end-points may occur.

6.4. Signalling Aspects

There exist various signalling solutions for establishing RTP sessions. Many are SDP [RFC4566] based; however, SDP functionality is also dependent on the signalling protocols carrying the SDP. RTSP [RFC2326] and SAP [RFC2974] both use SDP in a declarative fashion, while SIP [RFC3261] uses SDP with the additional definition of Offer/Answer [RFC3264]. The impact on signalling, and especially SDP, needs to be considered as it can greatly affect how to deploy a certain multiplexing point choice.

6.4.1. Session Oriented Properties

One aspect of the existing signalling is that it is focused around sessions, or at least in the case of SDP the media description.
There are a number of things that are signalled on a session level/media description, but those are not necessarily strictly bound to an RTP session and could be of interest to signal specifically for a particular media stream (SSRC) within the session. The following properties have been identified as being potentially useful to signal not only on RTP session level:

   o  Bitrate/Bandwidth, which today exists only as an aggregate or as a common limit applying to any media stream

   o  Which SSRC will use which RTP Payload Types

Some of these issues are clearly SDP's problem rather than RTP limitations. However, if the aim is to deploy an SSRC multiplexed solution that contains several sets of media streams with different properties (encoding/packetization parameter, bit-rate, etc), putting each set in a different RTP session would directly enable negotiation of the parameters for each set. If insisting on SSRC multiplexing only, a number of signalling extensions are needed to clarify that there are multiple sets of media streams with different properties and that they shall in fact be kept different, since a single set will not satisfy the application's requirements.

This does in fact create a strong driver to use RTP session multiplexing for any case where different sets of media streams with different requirements exist.

6.4.2. SDP Prevents Multiple Media Types

SDP has, encoded in its structure, a barrier against using multiple media types in the same RTP session. A media description in SDP can only have a single media type: audio, video, text, image, or application. This media type is used as the top-level media type for identifying the actual payload format bound to a particular payload type using the rtpmap attribute. Thus a high fence against using multiple media types in the same session was created.
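
To illustrate how the media type of a media description combines with the rtpmap attribute, the following Python sketch derives the full payload format name for each payload type in a small SDP body. The function name, the example SDP and its addresses, ports and payload type numbers are assumptions made for this illustration only; the parsing is deliberately minimal and is not a complete SDP parser.

```python
def payload_formats(sdp: str) -> dict:
    """Map (media description index, payload type) to 'media type/encoding name'.

    The top-level media type is taken from the enclosing m= line, since a
    media description carries exactly one media type.
    """
    formats = {}
    media_index = -1
    media_type = None
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media_index += 1
            media_type = line[2:].split()[0]
        elif line.startswith("a=rtpmap:") and media_type is not None:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
            pt_str, encoding = line[len("a=rtpmap:"):].split(None, 1)
            encoding_name = encoding.split("/")[0]
            formats[(media_index, int(pt_str))] = f"{media_type}/{encoding_name}"
    return formats


# Hypothetical SDP body with one audio and one video media description.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

if __name__ == "__main__":
    for key, name in sorted(payload_formats(EXAMPLE_SDP).items()):
        print(key, name)
```

Because the media type comes from the m= line, a payload type can only be resolved within its own media description, which is the structural fence described above.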

There is an accepted WG item in the MMUSIC WG to define how multiple media lines describe a single underlying transport [I-D.holmberg-mmusic-sdp-bundle-negotiation], and thus it becomes possible in SDP to define one RTP session with multiple media types.

6.4.3. Media Stream Usage

Media streams being transported in RTP have some particular usage in an RTP application. This usage of the media stream has in many applications so far been implicitly signalled. For example, all audio media streams arriving in the only audio RTP session are to be decoded, mixed and played out. However, in more advanced applications that use multiple media streams there will be more than a single usage or purpose among the set of media streams being sent or received. RTP applications will need to signal this usage somehow. Here the choice of SSRC multiplexing versus session multiplexing will have significant impact. If one uses SSRC multiplexing to its full extent, one will have to explicitly indicate for each SSRC what its usage and purpose are, using some signalling between the application instances.

This SSRC usage signalling will have some impact on the application and also on any central RTP nodes. It is important in the design to consider the implications of the need for additional signalling between the nodes. One consideration is whether a receiver can utilize the media stream at all before it has received the signalling message describing the media stream and its usage. Another consideration is that any RTP central node, like an RTP mixer or translator that selects, mixes or processes streams, will in most cases need to receive the same signalling to know how to treat media streams with different usage in the right fashion.

Application designers should consider putting media streams of the same usage and/or receiving the same treatment in middleboxes in the same RTP sessions, and using the RTP session as an explicit indication of how to deal with media streams. By having a session level indication of usage, and having different RTP sessions for different usages, the need for stream specific signalling can be reduced. This applies especially to signalling that is time critical and needs to be provided prior to the media stream being available.

6.5. Network Aspects

The multiplexing choice has impact on network level mechanisms that need to be considered by the implementor.

6.5.1. Quality of Service

When it comes to Quality of Service mechanisms, they are either flow based or marking based. RSVP [RFC2205] is an example of a flow based mechanism, while Diff-Serv [RFC2474] is an example of a marking based one. For a marking based scheme, the method of multiplexing will not affect the possibility to use QoS.

However, for a flow based scheme there is a clear difference between the methods. SSRC multiplexing will result in all media streams being part of the same 5-tuple (protocol, source address, destination address, source port, destination port), which is the most common selector for flow based QoS. Thus, separation of the level of QoS between media streams is not possible. That is however possible for session based multiplexing, where media streams needing different QoS can be placed in different RTP sessions that are sent over different 5-tuples.

6.5.2. NAT and Firewall Traversal

In today's networks there exist a large number of middleboxes. The ones that normally have most impact on RTP are Network Address Translators (NAT) and Firewalls (FW).

Below we analyze and comment on the impact of requiring more underlying transport flows in the presence of NATs and Firewalls:

   End-Point Port Consumption: A given IP address only has 65536 available local ports per transport protocol for all consumers of ports that exist on the machine. This is normally never an issue for an end-user machine. It can become an issue for servers that handle a large number of simultaneous streams. However, if the application uses ICE to authenticate STUN requests, a server can serve multiple end-points from the same local port, and use the whole 5-tuple (source and destination address, source and destination port, protocol) as identifier of flows after having securely bound them to the remote end-point address using the STUN request. In theory the minimum number of media server ports needed is the maximum number of simultaneous RTP sessions a single end-point may use. In practice, implementations will probably benefit from using more server ports to simplify implementation or avoid performance bottlenecks.

   NAT State: If an end-point sits behind a NAT, each flow it generates to an external address will result in state that has to be kept in the NAT. That state is a limited resource. In home or Small Office/Home Office (SOHO) NATs, memory or processing are usually the most limited resources. For large scale NATs serving many internal end-points, available external ports are typically the scarce resource. Port limitation is primarily a problem for larger centralized NATs where end-point independent mapping requires each flow to use one port for the external IP address. This affects the maximum number of internal users per external IP address.
   However, it is worth pointing out that a real-time video conference session with audio and video is likely using less than 10 UDP flows, compared to certain web applications that can use 100+ TCP flows to various servers from a single browser instance.

   NAT Traversal Excess Time: Making the NAT/FW traversal takes a certain amount of time for each flow. This time is also spent in a fairly critical phase of the communication, between accepting to communicate and the media path being established. The best case scenario for how much extra time it can take following the specified ICE procedures is: 1.5*RTT + Ta*(Additional_Flows-1), where Ta is the pacing timer, which ICE specifies to be no smaller than 20 ms. That assumes a message in one direction, and then an immediate triggered check back. This is because ICE first finds one candidate pair that works prior to establishing multiple flows. Thus, there is no extra time until one has found a working candidate pair. Based on that working pair, the extra time needed is the time to establish, in parallel, the additional flows, in most cases 2-3.

   NAT Traversal Failure Rate: Due to the need to establish more than a single flow through the NAT, there is some risk that establishing the first flow succeeds but that one or more of the additional flows fail. The risk that this happens is hard to quantify, but it should be fairly low as one flow from the same interfaces has just been successfully established. Thus only rare events such as NAT resource overload, or selecting particular port numbers that are filtered etc, should be reasons for failure.

   Deep Packet Inspection and Multiple Streams: Firewalls differ in how deeply they inspect packets. There exists some potential that deeply inspecting firewalls will have similar legacy issues with multiple SSRCs as some stack implementations.
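
As an illustration of the best-case estimate given under "NAT Traversal Excess Time" above, the following Python sketch evaluates 1.5*RTT + Ta*(Additional_Flows-1). The function name, the parameter names, and the choice to return zero when no additional flows are needed are assumptions made for this example; they are not taken from the ICE specification or from this document.

```python
MIN_TA_SECONDS = 0.020  # the text states ICE allows a pacing timer no smaller than 20 ms


def ice_extra_setup_time(rtt_s: float, additional_flows: int,
                         ta_s: float = MIN_TA_SECONDS) -> float:
    """Best-case extra setup time in seconds for the additional flows.

    Implements 1.5*RTT + Ta*(Additional_Flows - 1) from the text.
    Assumption of this sketch: with no additional flows there is no extra time.
    """
    if ta_s < MIN_TA_SECONDS:
        raise ValueError("pacing timer Ta must not be smaller than 20 ms")
    if additional_flows < 1:
        return 0.0
    return 1.5 * rtt_s + ta_s * (additional_flows - 1)


if __name__ == "__main__":
    # Hypothetical numbers: 100 ms round-trip time, three additional flows.
    extra = ice_extra_setup_time(rtt_s=0.100, additional_flows=3)
    print(f"best-case extra time: {extra * 1000:.0f} ms")
```

With these hypothetical inputs the estimate is 150 ms for the check and triggered check back plus 40 ms of pacing, so the RTT term dominates unless the number of additional flows is large.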

SSRC multiplexing keeps additional media streams within one RTP session and does not introduce any additional NAT traversal complexities per media stream. In contrast, session multiplexing uses one RTP session per media stream. Thus additional lower layer transport flows will be required, unless an explicit de-multiplexing layer is added between RTP and the transport protocol. A proposal for how to multiplex multiple RTP sessions over the same single lower layer transport exists in [I-D.westerlund-avtcore-single-transport-multiplexing].

6.5.3. Multicast

Multicast groups provide powerful semantics for a number of real-time applications, especially the ones that desire broadcast-like behaviors with one end-point transmitting to a large number of receivers, like in IPTV. But those same semantics do result in a certain number of limitations.

One limitation is that for any group, sender side adaptation to the actual receiver properties degrades the quality for all participants to what is supported by the receiver with the worst conditions among the group participants. In most cases this is not acceptable. Instead, various receiver based solutions are employed to ensure that the receivers achieve the best possible performance. By using scalable encoding and placing each scalability layer in a different multicast group, the receiver can control the amount of traffic it receives. To have each scalability layer on a different multicast group, one RTP session per multicast group is used.

If instead a single RTP session over multiple transports were to be deployed, i.e. multicast groups with each layer as its own SSRC, then very different views of the RTP session would exist. One receiver may see only a single layer (SSRC), while another may see three SSRCs if it joined three multicast groups.
This would cause disjoint RTCP reports where a management system would not be able to determine if a receiver isn't reporting on a particular SSRC because it is not a member of that multicast group, or because it doesn't receive it as a result of a transport failure.

Thus it appears easiest and most straightforward to use multiple RTP sessions. In addition, the transport flow considerations in multicast are a bit different from unicast. First of all there is no shortage of port space, as each multicast group has its own port space.

6.5.4. Multiplexing multiple RTP Session on a Single Transport

For applications that don't need flow based QoS and would like to save ports and NAT/FW traversal costs, and where usage of multiple media types in one RTP session is not suitable, there is a proposal for how to achieve multiplexing of multiple RTP sessions over the same lower layer transport [I-D.westerlund-avtcore-single-transport-multiplexing]. Using such a solution would allow session multiplexing without most of the perceived downsides of additional RTP sessions creating a need for additional transport flows.

6.6. Security Aspects

On the basic level there is no significant difference in security between having one RTP session and having multiple. However, there are a few more detailed considerations that might need to be taken into account in certain usages.

6.6.1. Security Context Scope

When using SRTP [RFC3711] the security context scope is important and can be a necessary differentiation in some applications. As SRTP's crypto suites (so far) are built around symmetric keys, the receiver will need to have the same key as the sender. As a result, no one in a multi-party session can be certain that a received packet really was sent by the claimed sender and not by another party having access to the key.
In most cases this is a sufficient security property, but there are a few cases where this does create issues.

The first case is when someone leaves a multi-party session and one wants to ensure that the party that left can no longer access the media streams. This requires that everyone re-keys without disclosing the keys to the excluded party.

A second case is when using security as an enforcing mechanism for differentiation. Take for example a scalable layer or a high quality simulcast version which only premium users are allowed to access. The mechanism preventing a receiver from getting the high quality stream can be based on the stream being encrypted with a key that the user can't access without paying premium, with the key-management limiting access to the key.

In the latter case it is likely easiest from a signalling, transport (if done over multicast) and security perspective to use a different RTP session. That way the user(s) not intended to receive a particular stream can easily be excluded. There is no need to have SSRC specific keys, which many of the key-management systems cannot handle.

6.6.2. Key-Management for Multi-party session

Performing key-management for multi-party sessions can be a challenge. This section considers some of the issues.

Transport translator based sessions cannot use Security Description [RFC4568] or DTLS-SRTP [RFC5764] without an extension, as each end-point provides its own set of keys. In a centralized conference, the signalling counterpart is a conference server and the media plane unicast counterpart (to which DTLS messages would be sent) is the translator. Thus an extension like Encrypted Key Transport [I-D.ietf-avt-srtp-ekt] is needed, or a MIKEY [RFC3830] based solution that allows for keying all session participants with the same master key.

Keying of multicast transported SRTP faces similar challenges as the transport translator case.

6.6.3. Complexity Implications

The usage of security functions can surface complexity implications of the choice of multiplexing and topology. This becomes especially evident in RTP topologies having any type of middlebox that processes or modifies RTP/RTCP packets. While there is very small overhead for an unsecured RTP translator or mixer to rewrite an SSRC value in the RTP packet, the cost of doing it when using cryptographic security functions is higher. For example if using SRTP [RFC3711], the actual security context and exact crypto key are determined by the SSRC field value. If one changes it, the encryption and the authentication tag calculation must be performed using another key. Thus changing the SSRC value implies a decryption using the old SSRC and its security context followed by an encryption using the new one.

There exist many valid cases where a middlebox will be forced to perform such cryptographic operations due to the intended purpose of the middlebox. For example, a media transcoding RTP translator cannot avoid performing these operations, as it will produce a different payload compared to the input. However, there exist some cases where another topology and/or multiplexing choice could avoid the complexities.

6.7. Multiple Media Types in one RTP session

Having different media types, like audio and video, in the same RTP session is not forbidden, only recommended against as earlier discussed in Section 6.2.1.1. When using multiple media types, there are a number of considerations:

   Payload Type gives Media Type: This solution is dependent on getting the media type from the Payload Type. This overloads this de-multiplexing point in a receiver, making it serve two purposes.
   First, to provide the main media type and determine the processing chain, and later, to give the exact configuration of the encoder and packetization.

   Payload Type field limitations: The total number of Payload Types available to use in an RTP session is fairly limited, especially if Multiplexing RTP Data and Control Packets on a Single Port [RFC5761] is used. For certain applications negotiating a large set of codecs and configurations this may become an issue.

   An SSRC cannot use two clock rates simultaneously: The RTP clock rate used for an SSRC is determined from the payload type. As discussed in Appendix A it is not possible to simultaneously use two different clock rates for the same SSRC. Even switching clock rate once has potential issues if packet loss occurs at the same time. Different media types commonly have different clock rates, which prevents or creates issues for using two different media types for the same SSRC.

   Do not switch media types for an SSRC: The primary reason to avoid switching from sending, for example, audio to sending video using the same SSRC is the implications for a receiver. When this happens, the processing chain in the receiver will have to switch from one media type to another. As the entire processing chains of the different media types are different and are connected to different outputs, it is difficult to reuse the decoding chain, which a normal codec change likely can. Instead the entire processing chain has to be torn down and replaced. In addition, there is likely a clock rate switching problem, possibly resulting in synchronization loss at the point of switching media type if some packet loss occurs. So this is a behavior that shall be avoided.

   RTCP Bit-rate Issues: If the media types are significantly different in bit-rate, the RTCP bandwidth rates assigned to each source in a session can result in interesting effects, such as the RTCP bit-rate share for an audio stream being larger than the actual audio bit-rate. In itself this doesn't cause any conflicts, only potentially unnecessary overhead. It is possible to avoid this using AVPF or SAVPF and setting the trr-int parameter, which can bring down unnecessary regular reporting while still allowing for rapid feedback.

   De-composite end-points: De-composite nodes that rely on the regular network to separate audio and video to different devices do not work well with this session setup. If they are forced to work, all media receiver parts of a de-composite end-point will receive all media, thus doubling the bit-rate consumption for the end-point.

   Flow based QoS Separation: Flow based QoS mechanisms will see all the media streams in the RTP session as part of a single flow. Therefore there is no possibility to provide separated QoS behavior for the different media types or flows.

   RTP Mixers and Translators: An RTP mixer or Media Translator will also have to support this particular session setup, where it before could rely on the RTP session to determine what processing options should be applied to the incoming packets.

   Legacy Implementations: The use of multiple media types has the potential for even larger issues with legacy implementations than single media type SSRC multiplexing, due to the occurrence of multiple media types among the payload type configurations.

As can be seen, there is nothing in here that prevents using a single RTP session for multiple media types; however, it does create a number of limitations and special case implementation requirements.
So anyone considering using this setup should carefully review if the reasons for using a single RTP session are sufficient to motivate the needed special handling.

7. Arch-Types

This section discusses some arch-types of how RTP multiplexing can be used in applications to achieve certain goals and a summary of their implications. For each arch-type there is discussion of benefits and downsides.

7.1. Single SSRC per Session

In this arch-type each end-point in a point-to-point session has only a single SSRC, thus the RTP session contains only two SSRCs, one local and one remote. This session can be used either unidirectionally, i.e. with only a single media stream, or bidirectionally, i.e. with both end-points having one media stream each. If the application needs additional media flows between the end-points, they will have to establish additional RTP sessions.

The Pros:

   1.  This arch-type has great legacy interoperability potential as it will not tax any RTP stack implementations.

   2.  The signalling has good possibilities to negotiate and describe the exact formats and bit-rates for each media stream, especially using today's tools in SDP.

   3.  It does not matter if usage or purpose of the media stream is signalled on media stream level or session level as there is no difference.

   4.  It is possible to control security association per RTP session with current key-management.

The Cons:

   a.  This arch-type requires the highest number of RTP sessions, which has the following implications:

       *  Linear growth of the amount of NAT/FW state with number of media streams.

       *  Increased delay and resource consumption from NAT/FW traversal.

       *  Likely larger signalling message and signalling processing requirement due to the amount of session related information.

       *  Higher potential for a single media stream to fail during transport between the end-points.

   b.  When the number of RTP sessions grows, the amount of explicit state for relating media streams also grows, linearly or possibly exponentially, depending on how the application needs to relate media streams.

   c.  The port consumption may become a problem for centralized services, where the central node's port consumption grows rapidly with the number of sessions.

   d.  For applications where the media streams are highly dynamic in their usage, i.e. entering and leaving, the amount of signalling can grow high. Issues with the timely establishment of additional RTP sessions can also arise.

   e.  A need for cross-session RTCP requests is likely to exist and may cause issues.

   f.  If the same SSRC value is reused in multiple RTP sessions rather than being randomly chosen, interworking with applications that use another multiplexing structure than this application will have issues and require SSRC translation.

   g.  Cannot be used with Any Source Multicast (ASM) as one cannot guarantee that only two end-points participate as packet senders. Using SSM, it is possible to restrict to these requirements if no RTCP feedback is used.

   h.  For most security mechanisms, each RTP session or transport flow requires individual key-management and security association establishment, thus increasing the overhead.

   i.  Does not support multi-party sessions within a session. Instead each multi-party participant will require an individual RTP session to a given end-point, even if a central node is used.

RTP applications that need to inter-work with legacy RTP applications, like VoIP and video conferencing, can potentially benefit from this structure.
However, a large number of media descriptions in SDP can also run into issues with existing implementations. For any application needing a larger number of media flows, the overhead can become very significant. This structure is also not suitable for multi-party sessions, as any given media stream from each participant, although having the same usage in the application, must have its own RTP session. In addition, the dynamic behavior that can arise in multi-party applications can tax the signalling system and make timely media establishment more difficult.

7.2. Multiple SSRCs of the Same Media Type

In this arch-type, each RTP session serves only a single media type. The RTP session can contain multiple media streams, either from a single end-point or due to multiple end-points. This commonly creates a low number of RTP sessions, typically only two, one for audio and one for video, with a corresponding need for two listening ports when using RTP and RTCP multiplexing.

The Pros:

   1.  Low number of RTP sessions needed compared to single SSRC case. This implies:

       *  Reduced NAT/FW state

       *  Lower NAT/FW Traversal Cost in both processing and delay.

   2.  Allows for early de-multiplexing in the processing chain in RTP applications where all media streams of the same type have the same usage in the application.

   3.  Works well with media type de-composite end-points.

   4.  Enables Flow-based QoS with different prioritization between media types.

   5.  For applications with dynamic usage of media streams, i.e. they come and go frequently, having much of the state associated with the RTP session rather than an individual SSRC can avoid the need for in-session signalling of meta-information about each SSRC.

   6.  Low overhead for security association establishment.

The Cons:

   a.
       May have some need for cross session RTCP requests for things that affect both media types in an asynchronous way.

   b.  Some potential for concern with legacy implementations that do not support the RTP specification fully when it comes to handling multiple SSRCs per end-point.

   c.  Will not be able to control security association for sets of media streams within the same media type with today's key-management mechanisms, only between SDP media descriptions.

For RTP applications where all media streams of the same media type share the same usage, this structure provides efficiency gains in the amount of network state used and provides more fate sharing with other media flows of the same type. At the same time, it still maintains almost all functionality when it comes to negotiation in the signalling of the properties for the individual media type, and also enables flow based QoS prioritization between media types. It handles multi-party sessions well, independently of multicast or centralized transport distribution, as additional sources can dynamically enter and leave the session.

7.3. Multiple Sessions for one Media type

In this arch-type one goes one step further than in the above (Section 7.2) by using multiple RTP sessions also for a single media type. The main reason for going in this direction is that the RTP application needs separation of the media streams due to their usage. Some typical reasons for going to this arch-type are scalability over multicast, simulcast, need for extended QoS prioritization of media streams due to their usage in the application, or the need for fine granular signalling using today's tools.

The Pros:

   1.  More suitable for Multicast usage where receivers can individually select which RTP sessions they want to participate in, assuming each RTP session has its own multicast group.

   2.
       Detailed indication of the application's usage of the media stream, where multiple different usages exist.

   3.  Less need for SSRC specific explicit signalling for each media stream and thus reduced need for explicit and timely signalling.

   4.  Enables detailed QoS prioritization for flow based mechanisms.

   5.  Works well with de-composite end-points.

   6.  Handles dynamic usage of media streams well.

   7.  For transport translator based multi-party sessions, this structure allows for improved control of which type of media streams an end-point receives.

   8.  The scope for who is included in a security association can be structured around the different RTP sessions, thus enabling such functionality with existing key-management.

The Cons:

   a.  Increases the number of RTP sessions compared to Multiple SSRCs of the Same Media Type.

   b.  Increased amount of session configuration state.

   c.  May need synchronized cross-session RTCP requests and require some consideration due to this.

   d.  For media streams that are part of scalability, simulcast or transport robustness, it will be necessary to bind sources, and that binding must support multiple RTP sessions.

   e.  Some potential for concern with legacy implementations that do not support the RTP specification fully when it comes to handling multiple SSRCs per end-point.

   f.  Higher overhead for security association establishment.

   g.  If the applications need finer control than on media type level over which session participants are included in different sets of security associations, most of today's key-management will have difficulties establishing such a session.
   For more complex RTP applications that have several different usages
   for media streams of the same media type and/or use scalability or
   simulcast, this solution can enable those functions at the cost of
   increased overhead associated with the additional sessions.  This
   type of structure is suitable for more advanced applications as well
   as multicast-based applications requiring differentiation to
   different participants.

7.4.  Multiple Media Types in one Session

   This arch-type is to use a single RTP session for multiple different
   media types, like audio and video, and possibly also transport
   robustness mechanisms like FEC or Retransmission.  Each media stream
   will use its own SSRC, and a particular end-point will never use a
   given SSRC value for more than a single media type.

   The Pros:

   1.  Single RTP session which implies:

       *  Minimal NAT/FW state.

       *  Minimal NAT/FW Traversal Cost.

       *  Fate-sharing for all media flows.

   2.  Enables separation of the different media types based on the
       payload types so media type specific end-point or central
       processing can still be supported despite the single session.

   3.  Can handle dynamic allocations of media streams well on an RTP
       level.  Depends on the application's needs for explicit
       indication of the stream usage and how timely that can be
       signalled.

   4.  Minimal overhead for security association establishment.

   The Cons:

   a.  Not suitable for interworking with other applications that use
       individual RTP sessions per media type or multiple sessions for a
       single media type, due to high risk of forced SSRC translation.

   b.  Negotiation of bandwidth for the different media types is
       currently not possible in SDP.  This requires SDP extensions to
       enable payload or source specific bandwidth.  Likely to be a
       problem due to media type asymmetry in required bandwidth.

   c.
       Forces higher bandwidth and processing on de-composite
       end-points.

   d.  Flow-based QoS cannot provide separate treatment to some media
       streams compared to others in the single RTP session.

   e.  If there is significant asymmetry between the media streams'
       RTCP reporting needs, there are some challenges in configuration
       and usage to avoid wasting RTCP reporting on the media stream
       that does not need that frequent reporting.

   f.  Not suitable for applications where some receivers want to
       receive only a subset of the media streams, especially if
       multicast or a transport translator is being used.

   g.  Additional concern with legacy implementations that do not
       fully support the RTP specification when it comes to handling
       multiple SSRCs per end-point, as multiple simultaneous media
       types also need to be handled.

   h.  If the application needs finer control over which session
       participants are included in different sets of security
       associations, most key-management will have difficulties
       establishing such a session.

   The analysis in this document and the considerations in Section 6.7
   imply that this is suitable only in a set of restricted use cases.
   The aspect in the above list that can be most difficult to judge long
   term is likely the potential need for interworking with other
   applications and services.

7.5.  Summary

   There are some clear relations between these arch-types.  Both the
   "single SSRC per RTP session" and the "multiple media types in one
   session" are cases which require full explicit signalling of the
   media stream relations.  However, they operate on two different
   levels where the first primarily enables session level binding, and
   the second needs to do it all on SSRC level.  From another
   perspective, the two solutions are the two extreme points when it
   comes to the number of RTP sessions required.
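
   As a rough illustration of how far apart these extremes are, the
   following Python sketch counts the RTP sessions that each arch-type
   of Section 7 would use for the streams sent by one end-point.  The
   Stream tuple, the sessions_needed function and the example streams
   are hypothetical names introduced for this sketch.  In particular,
   mapping "multiple sessions for one media type" to one session per
   (media type, usage) pair is a simplifying assumption, not something
   this document mandates.

```python
from collections import namedtuple

# Hypothetical description of one media stream sent by an end-point.
Stream = namedtuple("Stream", ["media_type", "usage"])


def sessions_needed(streams):
    """Return the number of RTP sessions each arch-type would use.

    Assumption: "multiple sessions for one media type" separates
    streams by (media type, usage); real applications may split
    further, for example per simulcast version.
    """
    media_types = {s.media_type for s in streams}
    type_usage_pairs = {(s.media_type, s.usage) for s in streams}
    return {
        # One session per stream: the upper extreme.
        "single SSRC per RTP session": len(streams),
        # One session per media type.
        "multiple SSRCs of the same media type": len(media_types),
        # One session per media type and usage.
        "multiple sessions for one media type": len(type_usage_pairs),
        # Everything in one session: the lower extreme.
        "multiple media types in one session": 1 if streams else 0,
    }


streams = [
    Stream("audio", "main"),
    Stream("video", "main"),
    Stream("video", "main"),
    Stream("video", "slides"),
]
counts = sessions_needed(streams)
for arch_type, count in counts.items():
    print(f"{arch_type}: {count}")
```

   For these four example streams the sketch yields 4, 2, 3 and 1
   sessions respectively, so the first and last arch-types bracket the
   other two.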
   The two other arch-types, "Multiple SSRCs of the Same Media Type" and
   "Multiple Sessions for one Media Type", are examples of two other
   cases that first of all allow for some implicit mapping of the role
   or usage of the media streams based on which RTP session they appear
   in.  They thus potentially allow for less signalling and in
   particular reduced need for real-time signalling in dynamic sessions.
   They also represent points in between the first two when it comes to
   the number of RTP sessions established, i.e. representing an attempt
   to reduce the number of sessions as much as possible without
   compromising the functionality the session provides both on network
   level and on signalling level.

8.  Guidelines

   This section contains a number of recommendations for implementors or
   specification writers when it comes to handling multiple streams.

   Do not Require the same SSRC across Sessions:  As discussed in
      Section 6.2.4 there exist drawbacks in using the same SSRC in
      multiple RTP sessions as a mechanism to bind related media streams
      together.  It is instead recommended that a mechanism to
      explicitly signal the relation is used, either in RTP/RTCP or in
      the signalling mechanism used to establish the RTP session(s).

   Use SSRC multiplexing for additional Media Sources:  In cases where
      an RTP end-point needs to transmit additional media source(s) of
      the same media type and purpose in the application, it is
      recommended to send them as additional SSRCs in the same RTP
      session.  For example, in a tele-presence room where there are
      three cameras, and each camera captures two persons sitting at
      the table, sending each camera as its own SSRC within a single
      RTP session is recommended.
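
      The tele-presence example can be sketched in Python as follows.
      The function name allocate_ssrcs and the camera labels are
      hypothetical; the only property relied upon is that an SSRC is a
      randomly chosen 32-bit value that is unique within its RTP
      session [RFC3550].  Collision handling against remote SSRCs
      learned later is not shown.

```python
import secrets


def allocate_ssrcs(source_names, in_use=frozenset()):
    """Pick a distinct random 32-bit SSRC for each local media source.

    `in_use` holds SSRCs already known in the RTP session, so that the
    new values do not collide with them.
    """
    taken = set(in_use)
    ssrcs = {}
    for name in source_names:
        ssrc = secrets.randbits(32)
        while ssrc in taken:
            # Extremely unlikely, but draw again on a collision.
            ssrc = secrets.randbits(32)
        taken.add(ssrc)
        ssrcs[name] = ssrc
    return ssrcs


# Three cameras of the same media type and purpose share one RTP
# session; each camera is sent as its own SSRC.
video_session = allocate_ssrcs(
    ["camera-left", "camera-center", "camera-right"]
)
for name, ssrc in video_session.items():
    print(f"{name}: SSRC 0x{ssrc:08x}")
```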
   Use additional RTP sessions for streams with different purposes:
      When media streams have different purposes or processing
      requirements it is recommended that the different types of streams
      are put in different RTP sessions.

   When using Session Multiplexing use grouping:  When using Session
      Multiplexing solutions, it is recommended to explicitly group the
      involved RTP sessions using the signalling mechanism, for example
      the Session Description Protocol (SDP) Grouping Framework
      [RFC5888], using some appropriate grouping semantics.

   RTP/RTCP Extensions May Support SSRC and Session Multiplexing:  When
      defining an RTP or RTCP extension, the creator needs to consider
      if this extension is applicable in both SSRC multiplexed and
      Session multiplexed usages.  Any extension intended to be generic
      is recommended to support both.  Extensions that are not as
      generally applicable will have to consider if interoperability is
      better served by defining a single solution or providing both
      options.

   Transport Support Extensions:  When defining new RTP/RTCP extensions
      intended for transport support, like the retransmission or FEC
      mechanisms, they are recommended to include support for both SSRC
      and Session multiplexing so that application developers can choose
      freely from the set of mechanisms without concerning themselves
      with which of the multiplexing choices a particular solution
      supports.

9.  Proposal for Future Work

   The above discussion and guidelines indicate that a small set of
   extension mechanisms could greatly improve the situation when it
   comes to using multiple streams independently of Session multiplexing
   or SSRC multiplexing.  These extensions are:

   Media Source Identification:  A Media source identification that can
      be used to bind together media streams that are related to the
      same media source.
      A proposal
      [I-D.westerlund-avtext-rtcp-sdes-srcname] exists for a new SDES
      item SRCNAME that also can be used with the a=ssrc SDP attribute
      to provide signalling layer binding information.

   SSRC limitations within RTP sessions:  A signalling solution that
      allows the signalling peers to explicitly express both support
      and limitations on how many simultaneous media streams an
      end-point can handle within a given RTP Session.  That ensures
      that usage of SSRC multiplexing occurs when supported and without
      overloading an end-point.  This extension is proposed in
      [I-D.westerlund-avtcore-max-ssrc].

10.  RTP Specification Clarifications

   This section describes a number of clarifications to the RTP
   specifications that are likely necessary for aligned behavior when
   RTP sessions contain more SSRCs than one local and one remote.

10.1.  RTCP Reporting from all SSRCs

   When an RTP node has multiple SSRCs, each of these SSRCs must send
   RTCP SR or RR for as long as the SSRC exists.  It is not sufficient
   that only one SSRC in the node sends report blocks on the incoming
   RTP streams.  The reason for this is that a third party monitor may
   not necessarily be able to determine that all these SSRCs are in fact
   co-located and originate from the same stack instance that gathers
   report data.

10.2.  RTCP Self-reporting

   For any RTP node that sends more than one SSRC, there is the question
   of whether SSRC1 needs to report its reception of SSRC2 and vice
   versa.  The reason that they in fact need to report on all other
   local streams as being received is report consistency.  A third party
   monitor that considers the full matrix of media streams and all known
   SSRC reports on these media streams would detect a gap in the
   reports, which could indicate a transport issue unless the sources
   are identified as in fact being in the same node.

10.3.
Combined RTCP Packets

   When a node contains multiple SSRCs, it is unclear whether an RTCP
   compound packet can only contain RTCP packets from a single SSRC or
   whether multiple SSRCs can include their packets in a joint compound
   packet.  The high-level question is what any receiver processing
   should expect.  In addition to that question there is the issue of
   how to use the RTCP timer rules in these cases, as the existing rules
   are focused on determining when a single SSRC can send.

11.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

12.  Security Considerations

   There is discussion of the security implications of choosing SSRC vs
   Session multiplexing in Section 6.6.

13.  Acknowledgements

   The authors would like to thank Harald Alvestrand for providing
   input into the discussion regarding multiple media types in a single
   RTP session.

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

14.2.  Informative References

   [ALF]      Clark, D. and D. Tennenhouse, "Architectural
              Considerations for a New Generation of Protocols", SIGCOMM
              Symposium on Communications Architectures and
              Protocols (Philadelphia, Pennsylvania), pp. 200-208, IEEE
              Computer Communications Review, Vol. 20(4),
              September 1990.

   [I-D.alvestrand-rtp-sess-neutral]
              Alvestrand, H., "Why RTP Sessions Should Be Content
              Neutral", draft-alvestrand-rtp-sess-neutral-00 (work in
              progress), December 2011.

   [I-D.holmberg-mmusic-sdp-bundle-negotiation]
              Holmberg, C. and H.
              Alvestrand, "Multiplexing Negotiation
              Using Session Description Protocol (SDP) Port Numbers",
              draft-holmberg-mmusic-sdp-bundle-negotiation-00 (work in
              progress), October 2011.

   [I-D.ietf-avt-srtp-ekt]
              Wing, D., McGrew, D., and K. Fischer, "Encrypted Key
              Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03
              (work in progress), October 2011.

   [I-D.ietf-avtext-multiple-clock-rates]
              Petit-Huguenin, M., "Support for multiple clock rates in
              an RTP session", draft-ietf-avtext-multiple-clock-rates-02
              (work in progress), January 2012.

   [I-D.ietf-payload-rtp-howto]
              Westerlund, M., "How to Write an RTP Payload Format",
              draft-ietf-payload-rtp-howto-01 (work in progress),
              July 2011.

   [I-D.westerlund-avtcore-max-ssrc]
              Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc (work in progress),
              October 2011.

   [I-D.westerlund-avtcore-single-transport-multiplexing]
              Westerlund, M., "Multiple RTP Session on a Single
              Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing (work in
              progress), October 2011.

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname (work in
              progress), October 2011.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, September 1997.

   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, September 1997.

   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
              Streaming Protocol (RTSP)", RFC 2326, April 1998.
   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
              Announcement Protocol", RFC 2974, October 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              August 2004.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, June 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, August 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B.
              Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
              Dependency in the Session Description Protocol (SDP)",
              RFC 5583, July 2009.

   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
              Protocol (RTCP) Extensions for Single-Source Multicast
              Sessions with Unicast Feedback", RFC 5760, February 2010.

   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
              Control Packets on a Single Port", RFC 5761, April 2010.

   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
              Security (DTLS) Extension to Establish Keys for the Secure
              Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

   [RFC6285]  Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax,
              "Unicast-Based Rapid Acquisition of Multicast RTP
              Sessions", RFC 6285, June 2011.

Appendix A.  Dismissing Payload Type Multiplexing

   This section documents a number of reasons why using the payload type
   as a multiplexing point for most things related to multiple streams
   is unsuitable.  If one attempts to use Payload type multiplexing
   beyond its defined usage, that has well known negative effects on
   RTP.
   To use Payload type as the single discriminator for multiple
   streams implies that all the different media streams are being sent
   with the same SSRC, thus using the same timestamp and sequence number
   space.  This has many effects:

   1.   It puts constraints on the RTP timestamp rate for the
        multiplexed media.  For example, media streams that use
        different RTP timestamp rates cannot be combined, as the
        timestamp values need to be consistent across all multiplexed
        media frames.  Thus streams are forced to use the same rate.
        When this is not possible, Payload Type multiplexing cannot be
        used.

   2.   Many RTP payload formats may fragment a media object over
        multiple packets, like parts of a video frame.  These payload
        formats need to determine the order of the fragments to
        correctly decode them.  Thus it is important to ensure that all
        fragments related to a frame or a similar media object are
        transmitted in sequence and without interruptions within the
        object.  This can be solved relatively simply on the sender
        side by ensuring that the fragments of each media stream are
        sent in sequence.

   3.   Some media formats require uninterrupted sequence number space
        between media parts.  These are media formats where any missing
        RTP sequence number will result in decoding failure or invoking
        of a repair mechanism within a single media context.  The
        text/T140 payload format [RFC4103] is an example of such a
        format.  These formats will need a sequence numbering
        abstraction function between RTP and the individual media
        stream before being used with Payload Type multiplexing.

   4.   Sending multiple streams in the same sequence number space makes
        it impossible to determine which Payload Type and thus which
        stream a packet loss relates to.

   5.
        If RTP Retransmission [RFC4588] is used and there is a loss, it
        is possible to ask for the missing packet(s) by SSRC and
        sequence number, not by Payload Type.  If only some of the
        Payload Type multiplexed streams are of interest, there is no
        way of telling which missing packet(s) belong to the interesting
        stream(s) and all lost packets must be requested, wasting
        bandwidth.

   6.   The current RTCP feedback mechanisms are built around providing
        feedback on media streams based on stream ID (SSRC), packet
        (sequence numbers) and time interval (RTP Timestamps).  There is
        almost never a field to indicate which Payload Type is reported,
        so sending feedback for a specific media stream is difficult
        without extending existing RTCP reporting.

   7.   The current RTCP media control messages [RFC5104] specification
        is oriented around controlling particular media flows, i.e.
        requests are done addressing a particular SSRC.  Such mechanisms
        would need to be redefined to support Payload Type multiplexing.

   8.   The number of payload types is inherently limited.
        Accordingly, using Payload Type multiplexing limits the number
        of streams that can be multiplexed and does not scale.  This
        limitation is exacerbated if one uses solutions like RTP and
        RTCP multiplexing [RFC5761] where a number of payload types are
        blocked due to the overlap between RTP and RTCP.

   9.   At times, there is a need to group multiplexed streams and this
        is currently possible for RTP Sessions and for SSRC, but there
        is no defined way to group Payload Types.

   10.  It is currently not possible to signal bandwidth requirements
        per media stream when using Payload Type Multiplexing.

   11.  Most existing SDP media level attributes cannot be applied on a
        per Payload Type level and would require re-definition in that
        context.

   12.
        A legacy end-point that doesn't understand the indication that
        different RTP payload types are different media streams may be
        slightly confused by the large number of possibly overlapping or
        identically defined RTP Payload Types.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   United Kingdom

   Email: csp@csperkins.org