idnits 2.17.00 (12 Aug 2021)

/tmp/idnits21604/draft-lennox-avt-app-sharing-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate. You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1.a on line 19.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 941.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 918.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 925.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 931.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to
     RFC 4748.

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document.
     That text should be removed or replaced:

       This document is an Internet-Draft and is subject to all provisions
       of Section 3 of RFC 3667. By submitting this Internet-Draft, each
       author represents that any applicable patent or other IPR claims of
       which he or she is aware have been or will be disclosed, and any of
       which he or she becomes aware will be disclosed, in accordance with
       Section 6 of BCP 79.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 2004) is 6359 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '' and '' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '15' is defined on line 858, but no explicit reference
     was found in the text

  == Unused Reference: '17' is defined on line 865, but no explicit reference
     was found in the text

  == Outdated reference: draft-ietf-mmusic-sdp-new has been published as
     RFC 4566

  == Outdated reference: draft-ietf-avt-rtp-framing-contrans has been
     published as RFC 4571

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  == Outdated reference: A later version (-02) exists of
     draft-barnes-xcon-framework-00

  == Outdated reference: draft-ietf-avt-srtp has been published as RFC 3711

  == Outdated reference: draft-ietf-mmusic-comedia-tls has been published as
     RFC 4572

  -- Obsolete informational reference (is this intentional?): RFC 2032 (ref.
     '15') (Obsoleted by RFC 4587)

     Summary: 5 errors (**), 0 flaws (~~), 10 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Audio/Video Transport                 J. Lennox
3 Internet-Draft                        H. Schulzrinne
4 Expires: June 1, 2005                 J. Nieh
5                                       R. Barrato
6                                       Columbia U.
7                                       December 2004

9 Protocols for Application and Desktop Sharing
10 draft-lennox-avt-app-sharing-00

12 Status of this Memo

14 This document is an Internet-Draft and is subject to all provisions
15 of section 3 of RFC 3667. By submitting this Internet-Draft, each
16 author represents that any applicable patent or other IPR claims of
17 which he or she is aware have been or will be disclosed, and any of
18 which he or she becomes aware will be disclosed, in accordance with
19 RFC 3668.

21 Internet-Drafts are working documents of the Internet Engineering
22 Task Force (IETF), its areas, and its working groups. Note that
23 other groups may also distribute working documents as
24 Internet-Drafts.

26 Internet-Drafts are draft documents valid for a maximum of six months
27 and may be updated, replaced, or obsoleted by other documents at any
28 time. It is inappropriate to use Internet-Drafts as reference
29 material or to cite them other than as "work in progress."

31 The list of current Internet-Drafts can be accessed at
32 http://www.ietf.org/ietf/1id-abstracts.txt.

34 The list of Internet-Draft Shadow Directories can be accessed at
35 http://www.ietf.org/shadow.html.

37 This Internet-Draft will expire on June 1, 2005.

39 Copyright Notice

41 Copyright (C) The Internet Society (2004).

43 Abstract

45 This document defines several protocols to support accessing general
46 graphical user interface (GUI) desktops and applications remotely,
47 either by a single remote user or embedded into a multiparty
48 conference. The protocols are designed to allow sharing of, and
49 access to, general windowing system applications that are not
50 expressly written to be accessed remotely.

52 Table of Contents

54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
55 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
56 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
57 3.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . 5
58 3.2 Protocol Components . . . . . . . . . . . . . . . . . . . 5
59 4. Common Protocol Elements . . . . . . . . . . . . . . . . . . . 6
60 5. Output Protocols . . . . . . . . . . . . . . . . . . . . . . . 6
61 5.1 Window Identifiers and Output Meta-Format . . . . . . . . 6
62 5.2 Window State Protocol . . . . . . . . . . . . . . . . . . 7
63 5.3 Window Pixel Data . . . . . . . . . . . . . . . . . . . . 9
64 5.4 Pointer Representation . . . . . . . . . . . . . . . . . . 10
65 6.
Input Protocols . . . . . . . . . . . . . . . . . . . . . . . 10
66 6.1 Keyboard Input . . . . . . . . . . . . . . . . . . . . . . 10
67 6.2 Pointer Position . . . . . . . . . . . . . . . . . . . . . 15
68 7. Implementation Notes . . . . . . . . . . . . . . . . . . . . . 16
69 8. Open issues . . . . . . . . . . . . . . . . . . . . . . . . . 16
70 9. Security Considerations . . . . . . . . . . . . . . . . . . . 18
71 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . 18
72 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
73 11.1 Normative References . . . . . . . . . . . . . . . . . . . . 18
74 11.2 Informative References . . . . . . . . . . . . . . . . . . . 19
75 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 20
76 Intellectual Property and Copyright Statements . . . . . . . . 22

78 1. Introduction

80 While two-party and multi-party conferencing using standards-based
81 protocols is now common and well-developed, protocols for sharing
82 applications are largely proprietary or based on the aging T.120 [8]
83 suite of protocols. In this document, we define a set of protocols
84 for application and desktop sharing.

86 We note that there are strong similarities between remote access to an
87 application ("remote desktop") and multiple users sharing an
88 application within a collaboration setting such as a multimedia call
89 or multiparty conference. The protocols defined in this document
90 therefore support both.

92 Remote access differs from video transmission of the sort for which
93 most video encodings have been designed. In particular, screen
94 encoding may need to be lossless and typically operates on artificial
95 rather than natural (photographic) video input. The video input is
96 characterized by large areas of the screen that remain unchanged for
97 long periods of time, while others change rapidly.
(However,
98 rendering the output of a modern computer-generated animation
99 application such as a video game blurs the distinction between
100 traditional motion video output and screen sharing.)

102 Unlike earlier systems, such as T.120, we believe that application
103 sharing should be integrated into the existing IETF session model,
104 encompassing session descriptions using the Session Description
105 Protocol (SDP) [1] or successors and the Session Initiation Protocol
106 (SIP) [9]. Application sharing needs many of the same control
107 functions as other multimedia sessions, such as address binding and
108 session feature and media negotiation. We believe that use of the
109 session model is also beneficial for the remote desktop case, as it
110 allows re-use of many of the well-developed session components and
111 easily supports hybrid models, such as the delivery of desktop audio
112 to the remote user.

114 Remote access to graphical applications and desktops, as defined in
115 this document, has two important characteristics. First, the access
116 protocol is unaware of any semantic characteristics of the
117 applications being shared; it only transmits the visual
118 characteristics of the windows. This is different, therefore, from
119 shared-drawing or shared-editing tools that allow distributed
120 modification of documents. Second, the protocol is designed to
121 work with applications which were not written to be used remotely, by
122 intercepting or simulating their connections to their native window
123 systems. In this way, it is distinguished from systems such as the X
124 Window System [10], which allow natively-written applications to be
125 displayed on remote viewers.

127 We distinguish between local and remote users. Local users employ
128 normal operating system mechanisms to interact with the running
129 application. Remote users interact via the delivery protocols
130 described here.

132 The application sharing problem can be divided into four components:
133 (1) setting up a session to the node running the application, (2)
134 transporting user input events from the remote viewers such as
135 conference participants to the application, (3) delivering screen
136 output from the application to the participants, and (4) moderating
137 access to shared human interface devices such as pointing devices
138 (e.g., mouse, joystick, trackball) and text input (keyboard). We
139 refer to components (2) and (3) as the "remoting protocol". They are
140 the focus of this document, and are described in Section 6 and
141 Section 5 respectively.

143 Session negotiation and description can be provided by existing
144 session setup protocols; user input access can be moderated by a
145 floor control protocol. Thus, these two components are beyond the
146 scope of this document, although they are important for an acceptable
147 overall user experience.

149 Applications are more than just windows; they are a stack of related
150 windows which serve the same task and are usually associated with the
151 same process on the server. Some applications impose special
152 constraints on the user input, e.g., through modal dialogs, which
153 temporarily exclusively acquire input focus, and floating
154 (always-on-top) windows.

156 The protocols described in this document are intended to fulfil the
157 requirements described in the Internet-Draft Sharing and Remote
158 Access to Applications [11].

160 The rest of this document is laid out as follows. Section 2 defines
161 the terminology used for normative requirements. Section 3 gives an
162 overview of the protocol's architecture and components. Section 4
163 defines common elements of the output and input protocols, which are
164 then further described in Section 5 and Section 6 respectively.
165 Section 7 gives implementation notes, and Section 8 discusses open
166 issues with the design of the protocol.
Finally, Section 9 discusses
167 security considerations, and Section 10 gives IANA considerations.

169 2. Terminology

171 In this document, the key words "MUST", "MUST NOT", "REQUIRED",
172 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
173 and "OPTIONAL" are to be interpreted as described in RFC 2119 [2] and
174 indicate requirement levels for compliant implementations.

176 3. Overview

178 3.1 Architecture

180 Application and desktop sharing consists of two classes of
181 components: "viewers" and "application hosts". Viewers receive
182 remote graphics and provide input. Application hosts receive input
183 from local and remote users, and host and transmit applications and
184 graphics.

186 The application and desktop sharing models defined in this document
187 are integrated into the IETF conferencing model. In particular, the
188 Session Initiation Protocol (SIP) [9] is used to initiate and control
189 remote access. This allows the use of existing SIP mechanisms for
190 confidentiality, authentication and authorization, user location,
191 conferencing, etc.

193 In the IETF conferencing model, media sessions can consist of multiple
194 participants; this document's protocols are designed to work in this
195 case. The various Centralized Conferencing (XCON) [12] control
196 protocols can be used for floor control, to determine which member of
197 a conference is permitted to send input to an application host at any
198 given time. Conferencing also gives rise to issues of late-joiners;
199 the protocol is designed to make it relatively easy for a protocol
200 relay, which receives input from one application host and forwards it
201 to multiple viewers, to send all the necessary information about a
202 sharing session to new conference arrivals. Similarly, it is
203 possible to record and replay a shared application session without
204 semantic awareness of the protocol.

206 3.2 Protocol Components

208 The three core components of desktop sharing are: input protocols
209 which represent user input from keyboards or pointing devices such as
210 mice, trackballs, or touchscreens; an output protocol which can
211 represent screen pixels and related data; and a negotiation mechanism
212 which can convey attributes of the session such as the desired size
213 of the screen. In addition, application sharing requires a mechanism
214 to represent window state, position and stacking. In this document,
215 the negotiation mechanism is defined in Section 4; output protocols,
216 including window-state handling, are defined in Section 5; and input
217 protocols are defined in Section 6.

219 Additional, optional mechanisms can enhance both window and
220 application sharing. Additional input mechanisms such as joysticks
221 or other game controllers can be supported; audio streams can be
222 associated with a desktop or application; viewer-side scaling and
223 porthole requests can be used to optimize transmission of data to
224 viewers with a small screen; and it is often useful to allow
225 copy-and-paste between applications running on a viewer and those
226 running on an application host. This document does not define any
227 such extensions; they may be defined elsewhere.

229 4. Common Protocol Elements

231 Protocol negotiation is carried out using the Session Description
232 Protocol (SDP) [1], while all input and output protocols run over the
233 Real-time Transport Protocol (RTP) [3]. In most use cases for
234 application and desktop sharing, reliability is more important than
235 latency, and flow control and dynamic bandwidth adjustment are crucial.
236 As such, viewers and application hosts SHOULD use RTP Framing [4] to
237 send the RTP packets over TCP, unless there is a strong reason, such as
238 the need to distribute a desktop session over multicast, to do otherwise.

240 5.
Output Protocols

242 5.1 Window Identifiers and Output Meta-Format

244 A shared application consists of a set of overlapping windows,
245 usually rectangular. Each window needs a unique identifier, and most
246 output data (other than audio or other non-visual output mechanisms
247 not specified here) needs to be associated with a particular window.

249 Windows in an application need to be created and destroyed relatively
250 frequently. Re-negotiating SDP descriptions whenever a window is
251 opened or closed would therefore not be practical, and so data for
252 multiple windows needs to be multiplexed into a single RTP stream.
253 Since multiple windows are from a single source, multiplexing on the
254 RTP SSRC (synchronization source) field would not be appropriate.
255 Instead, each window is assigned a unique window identifier.

257 Rather than require each payload type to define a field to carry this
258 window identifier, we instead define a payload meta-format which
259 precedes each payload. The meta-format carries the relevant window
260 identifier and coordinates, and is then followed by the actual data
261 for the particular payload being sent. (This allows existing payload
262 type definitions to be re-used in the context of a window.)

264 This protocol has three mandatory format-specific parameters, which
265 are carried in an SDP "a=fmtp:" parameter. The parameter "height"
266 indicates the desktop height in pixels; "width" indicates the desktop
267 width in pixels. All images and other coordinates sent for this
268 protocol must lie within these boundaries. The third parameter is
269 "mode", which can take the value "desktop", indicating one big
270 drawing pane, or "application", indicating that individual windows
271 will be created and destroyed as needed, and all drawing will occur in
272 individual windows.

274 In application mode, SDP "i=" lines for this protocol SHOULD contain
275 a human-readable description of the application.
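
The three format-specific parameters described above could be checked by a viewer roughly as follows. This is only a sketch under stated assumptions: the draft names the parameters "height", "width" and "mode" but does not fix their syntax, so the semicolon-separated name=value form, the payload type number 96 used in the usage note, and the helper names parse_sharing_fmtp and within_desktop are all hypothetical.

```python
def parse_sharing_fmtp(line):
    """Parse a hypothetical 'a=fmtp:<pt> name=value;...' line.

    Returns (width, height, mode). The parameter syntax is an assumption;
    the draft only names the three mandatory parameters.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    # Split off the format (payload type) token; it is not interpreted here.
    _payload_type, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            params[name.strip().lower()] = value.strip()
    missing = {"height", "width", "mode"}.difference(params)
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(sorted(missing)))
    if params["mode"] not in ("desktop", "application"):
        raise ValueError("mode must be 'desktop' or 'application'")
    width, height = int(params["width"]), int(params["height"])
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height, params["mode"]


def within_desktop(x, y, w, h, desktop_width, desktop_height):
    """True if the rectangle lies inside the negotiated desktop boundaries."""
    return x >= 0 and y >= 0 and x + w <= desktop_width and y + h <= desktop_height
```

For instance, parse_sharing_fmtp("a=fmtp:96 height=768;width=1024;mode=application") yields a 1024 by 768 desktop in application mode, and within_desktop can then be applied to each incoming image or coordinate before it is drawn.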

277  0                   1                   2                   3
278  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
279 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
280 .                                                               .
281 .                          RTP header                           .
282 .                                                               .
283 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
284 |                           X Offset                            |
285 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
286 |                           Y Offset                            |
287 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
288 |           Window ID           |       MBZ       |     PT      |
289 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
290 .                                                               .
291 .                            Payload                            .
292 .                                                               .
293 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

295 Figure 1: RTP output payload meta-protocol

297 Figure 1 shows how all output data formats are defined. The "X
298 Offset" and "Y Offset" fields indicate the upper-left corner of the
299 area the message is describing. (In desktop mode, these coordinates
300 are relative to the desktop; in application mode, these coordinates
301 are relative to the enclosing window, except for window state
302 protocol messages.) The Window ID is a 16-bit unique identifier for
303 each window; these are assigned arbitrarily by the application host,
304 and SHOULD be recycled on a least-recently-used basis. If the
305 meta-protocol is running in desktop mode, the Window ID MUST always
306 be set to zero by the server and SHOULD be ignored by viewers. In
307 application mode, Window IDs are defined by the window state
308 protocol, defined in Section 5.2. Messages other than Window State
309 Protocol messages which reference unknown Window IDs SHOULD be
310 ignored. The PT indicates the actual payload type of the rest of the
311 data. The MBZ bits must be zero.

313 Fields of the RTP header other than the PT field are set as
314 appropriate for the enclosed payload type; the meta-protocol does not
315 define any specific uses for them.
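
To make the meta-protocol header of Figure 1 concrete, the following sketch packs and parses the twelve bytes that follow the RTP header. It rests on assumptions that this document does not state: all fields are taken to be in network byte order, the X and Y offsets are treated as unsigned 32-bit values, and the PT is taken to occupy the low 7 bits of the last 16-bit word (the width of the RTP payload type field), leaving 9 MBZ bits above it. The function names are illustrative only.

```python
import struct

# X Offset (32 bits), Y Offset (32 bits), Window ID (16 bits),
# then one 16-bit word assumed to hold 9 MBZ bits followed by a 7-bit PT.
_META_HEADER = struct.Struct("!IIHH")


def pack_meta_header(x_offset, y_offset, window_id, payload_type):
    """Build the meta-protocol header; the MBZ bits are always sent as zero."""
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("payload type must fit in 7 bits")
    return _META_HEADER.pack(x_offset, y_offset, window_id, payload_type)


def unpack_meta_header(data):
    """Return (x_offset, y_offset, window_id, payload_type, payload)."""
    x_offset, y_offset, window_id, tail = _META_HEADER.unpack_from(data)
    if tail >> 7:
        # Under the assumed layout, any bit above the low 7 is an MBZ bit.
        raise ValueError("MBZ bits are not zero")
    return x_offset, y_offset, window_id, tail & 0x7F, data[_META_HEADER.size:]
```

In desktop mode a sender would pass window_id=0, and a viewer would simply disregard the returned Window ID. Because valid payload types are below 128, a header built this way carries zero in every bit of the word other than the low seven.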

317 5.2 Window State Protocol

319 The window state protocol is an output protocol which handles the
320 creation, destruction, resizing, raising and lowering, positioning,
321 and characteristics of application windows when the protocol is being
322 run in application mode. It MUST NOT be used in desktop mode.

324 The specific details and packet format of this protocol are not yet
325 defined. The rest of this section describes it at a high level.

327 Creating, moving, resizing, and raising or lowering a window are
328 indicated by the same message. This message starts with the common
329 meta-protocol header, whose coordinates indicate the position of the
330 window relative to the desktop. Following this is a code indicating
331 that this is a "window state" message, and the window's X and Y sizes
332 as 32-bit integers. Finally, the message contains a list of all the
333 application's windows, in Z order, bottom to top. (Listing the
334 entire Z order in every message helps prevent the Z order list from
335 getting out of sync between viewers and the application host.) A flag
336 in the Z order list can indicate that some windows should be
337 considered "always on top", and float above all other non-"on top"
338 windows on the viewer (shared or not).

340 The size and position given for a window include all the "trim"
341 provided by its window manager -- title bars, frames, and the like.
342 (Otherwise, the remoting protocol would need to include input and
343 output messages to indicate window state changes and manipulation.)

345 Non-rectangular and translucent windows can additionally have an
346 alpha channel specified. This is sent as a black-and-white or
347 grayscale PNG [5] image corresponding to the window's transparent or
348 translucent pixels. Window alpha channels can change dynamically.

350 Window removal consists of the meta-protocol header followed by a
351 "window remove" code.
On receipt of this message, a viewer erases
352 the corresponding window and removes it from the Z order list. The X
353 and Y coordinates given in the meta-protocol are ignored and SHOULD
354 be zero.

356 An additional message indicates "pointer capture", in which a
357 window indicates that it should exclusively receive all pointer
358 events until it indicates otherwise. This is necessary when menus
359 are pulled down, for example; a window with a pulled-down menu
360 receives a "release menu" mouse click whether or not the cursor is
361 still over the original window. "Stop pointer capture" is the same
362 message, with a flag set.

364 The window state protocol may need to carry additional information,
365 as well; see the open issues list in Section 8.

367 5.3 Window Pixel Data

369 There are three basic window data operations: pixel images, fills,
370 and block copies. Each operation uses the common meta-protocol
371 header. Each operation has its own MIME type, and thus a unique RTP
372 payload type in the meta-protocol PT field.

374 Pixel images contain arbitrary graphical data to be applied to
375 windows. They are conveyed as PNG [5] images. The PNG image follows
376 the meta-protocol header (which indicates the offset of the image
377 within the window or desktop) and consists of an area of the screen
378 to be updated. If the PNG image contains an alpha channel, the image
379 is composited with the existing contents of the window or desktop.
380 (The PNG images defining the initial graphical contents of a window
381 or desktop MUST NOT contain alpha channels.) In window mode, if a
382 window has an alpha channel with completely transparent pixels --
383 i.e., if a window is non-rectangular -- the corresponding pixels in
384 the PNG image are ignored.

386 As an optimization, two additional window data operations are
387 defined. A fill defines an area of a window to be filled by a single
388 solid color.
Following the meta-protocol header, it consists of a
389 height and width (specified as 32-bit coordinates), followed by the
390 fill color. Colors are specified as one byte each of Red, Green, and
391 Blue, i.e., as PNG color type 2 with 8-bit sample depth. (Color
392 sample depths greater than 8 bits per channel cannot be specified
393 with the fill operation, and must use the general PNG pixel image
394 form.)

396 The block copy operation copies a region of a desktop or window from
397 one position to another. Following the meta-protocol header (which
398 indicates the destination position) are the source position and size
399 (both as 32-bit coordinates). The destination region MAY overlap the
400 source region. The source and destination regions MUST NOT
401 extend beyond the boundaries of the window (in window mode) or
402 desktop (in desktop mode). In window mode, cross-window moves are
403 not supported. Portions of the source region which do not overlap
404 with the destination region remain unmodified.

406 Additionally, if the viewer and application host negotiate support
407 for other video/* MIME types, video streams can be sent following the
408 meta-protocol header. For video this will often be more efficient
409 than sending raw screen images.

411 In window mode, graphics to be drawn MUST NOT extend beyond the
412 boundaries of the window; in either mode, images to be drawn MUST NOT
413 extend beyond the defined borders of the desktop. (Window-related
414 images such as drop-down menus or tooltips which can extend beyond
415 the boundaries of a window SHOULD be transmitted as separate
416 windows.)

418 5.4 Pointer Representation

420 For efficiency, pointers can be represented separately from other
421 window data. This is accomplished by transmitting, in a special
422 protocol, PNG [5] images with alpha channels and hotspots for the
423 pointers' images, and then RFC 2862 [6] streams to indicate pointers'
424 positions.
These protocols still need to be defined in detail.

426 6. Input Protocols

428 6.1 Keyboard Input

430 The viewer represents keyboard input to the server by sending a list
431 of depressed keys, updated whenever this state changes. (Note that
432 this is unlike how keys are represented in most window systems, which
433 instead use individual key-down and key-up events. The latter can be
434 derived from the former.) Key repetition is handled by the
435 application host.

437 There are two types of keys that can be represented. "Encoding keys"
438 are keys that encode a specific Unicode [7] character, whereas
439 "virtual keys" do not. Encoding keys are indicated by the Unicode
440 value of the character they encode.

442 Virtual keys are indicated by codes from the tables of virtual
443 keycodes listed in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6,
444 Figure 7, Figure 8, and Figure 9. These codes are those of the X
445 Window System [10], taken from the header file .
446 They are all of X's keycodes with 0xFFnn values, except for the
447 XK_KP_* keypad keys.

449 Unicode control characters -- characters in the range 0x0000 - 0x001f
450 or 0x007f - 0x009f -- MUST NOT be sent. Instead, the corresponding
451 virtual key, or the virtual key Control (Left) or Control (Right)
452 (0xE3 or 0xE4) plus a Unicode codepoint, should be used.

454 Name                       Code  Note
455 ----                       ----  ----
456 Backspace                  0x08  Back space, back char
457 Tab                        0x09
458 Linefeed                   0x0A  Linefeed, LF
459 Clear                      0x0B
460 Return                     0x0D  Return, enter
461 Pause                      0x13  Pause, hold
462 Scroll Lock                0x14
463 Sys Req                    0x15
464 Escape                     0x1B
465 Delete                     0xFF  Delete, rubout

467 These codes have been chosen to map to ASCII, for convenience of
468 programming, but could have been arbitrary (at the cost of lookup
469 tables in viewer code).

471 Figure 2: Virtual keycodes: teletype keys

473 Name                       Code  Note
474 ----                       ----  ----
475 Multi key                  0x20  Multi-key character compose
476 Code Input                 0x37
477 Single Candidate           0x3C
478 Multiple Candidate         0x3D
479 Previous Candidate         0x3E

481 Figure 3: Virtual keycodes: international and multi-key character
482 composition

484 Name                       Code  Note
485 ----                       ----  ----
486 Kanji                      0x21  Kanji, Kanji convert
487 Muhenkan                   0x22  Cancel Conversion
488 Henkan Mode                0x23  Start/Stop Conversion
489 Romaji                     0x24  to Romaji
490 Hiragana                   0x25  to Hiragana
491 Katakana                   0x26  to Katakana
492 Hiragana/Katakana          0x27  Hiragana/Katakana toggle
493 Zenkaku                    0x28  to Zenkaku
494 Hankaku                    0x29  to Hankaku
495 Zenkaku/Hankaku            0x2A  Zenkaku/Hankaku toggle
496 Touroku                    0x2B  Add to Dictionary
497 Massyo                     0x2C  Delete from Dictionary
498 Kana Lock                  0x2D  Kana Lock
499 Kana Shift                 0x2E  Kana Shift
500 Eisu Shift                 0x2F  Alphanumeric Shift
501 Eisu Toggle                0x30  Alphanumeric toggle
502 Kanji Bangou               0x37  Codeinput
503 Zen Koho                   0x3D  Multiple/All Candidate(s)
504 Mae Koho                   0x3E  Previous Candidate

506 Note that some of these codes are also used for equivalent Hangul
507 keyboard keys listed in Figure 9.

509 Figure 4: Virtual keys: Japanese keyboard support

511 Name                       Code  Note
512 ----                       ----  ----
513 Home                       0x50
514 Left                       0x51  Move left, left arrow
515 Up                         0x52  Move up, up arrow
516 Right                      0x53  Move right, right arrow
517 Down                       0x54  Move down, down arrow
518 Prior                      0x55  Prior, previous
519 Page Up                    0x55
520 Next                       0x56  Next
521 Page Down                  0x56
522 End                        0x57  EOL
523 Begin                      0x58  BOL

525 Figure 5: Virtual keycodes: cursor control and motion

527 Name                       Code  Note
528 ----                       ----  ----
529 Select                     0x60  Select, mark
530 Print                      0x61
531 Execute                    0x62  Execute, run, do
532 Insert                     0x63  Insert, insert here
533 Undo                       0x65  Undo, oops
534 Redo                       0x66  Redo, again
535 Menu                       0x67
536 Find                       0x68  Find, search
537 Cancel                     0x69  Cancel, stop, abort, exit
538 Help                       0x6A  Help
539 Break                      0x6B
540 Mode switch                0x7E  Character set switch (*)
541 Num Lock                   0x7F

543 (*) The "Mode switch" key is variously used on Katakana, Arabic,
544 Greek, Hebrew, and Hangul keyboards to switch between the Roman and
545 native alphabets.

547 Figure 6: Virtual keycodes: miscellaneous functions

549 Name        Code
550 ----        ----
551 F1          0xBE
552 F2          0xBF
553 F3          0xC0
554 F4          0xC1
555 F5          0xC2
556 F6          0xC3
557 F7          0xC4
558 F8          0xC5
559 F9          0xC6
560 F10         0xC7
561 F11/L1      0xC8
562 F12/L2      0xC9
563 F13/L3      0xCA
564 F14/L4      0xCB
565 F15/L5      0xCC
566 F16/L6      0xCD
567 F17/L7      0xCE
568 F18/L8      0xCF
569 F19/L9      0xD0
570 F20/L10     0xD1
571 F21/R1      0xD2
572 F22/R2      0xD3
573 F23/R3      0xD4
574 F24/R4      0xD5
575 F25/R5      0xD6
576 F26/R6      0xD7
577 F27/R7      0xD8
578 F28/R8      0xD9
579 F29/R9      0xDA
580 F30/R10     0xDB
581 F31/R11     0xDC
582 F32/R12     0xDD
583 F33/R13     0xDE
584 F34/R14     0xDF
585 F35/R15     0xE0

587 Keyboards from Sun and a few other manufacturers have additional Left
588 and Right function key groups on the left and/or right sides of the
589 keyboard.

591 Figure 7: Virtual keycodes: auxiliary functions

593 Name                       Code  Note
594 ----                       ----  ----
595 Shift (Left)               0xE1
596 Shift (Right)              0xE2
597 Control (Left)             0xE3
598 Control (Right)            0xE4
599 Caps Lock                  0xE5
600 Shift Lock                 0xE6

602 Meta (Left)                0xE7  Windows (Microsoft); Option (Macintosh)
603 Meta (Right)               0xE8  Windows (Microsoft); Option (Macintosh)
604 Alt (Left)                 0xE9  Command (Macintosh)
605 Alt (Right)                0xEA  Command (Macintosh)
606 Super (Left)               0xEB
607 Super (Right)              0xEC
608 Hyper (Left)               0xED
609 Hyper (Right)              0xEE

611 Application hosts which lack right-hand versions of modifiers SHOULD
612 treat them as though the left-hand version had been received.

614 Figure 8: Virtual keys: modifiers

616 Name                       Code  Note
617 ----                       ----  ----
618 Hangul                     0x31  Hangul start/stop(toggle)
619 Hangul Start               0x32  Hangul start
620 Hangul End                 0x33  Hangul end, English start
621 Hangul Hanja               0x34  Start Hangul->Hanja Conversion
622 Hangul Jamo                0x35  Hangul Jamo mode
623 Hangul Romaja              0x36  Hangul Romaja mode
624 Hangul Code Input          0x37  Hangul code input mode
625 Hangul Jeonja              0x38  Jeonja mode
626 Hangul Banja               0x39  Banja mode
627 Hangul Pre-Hanja           0x3A  Pre Hanja conversion
628 Hangul Post-Hanja          0x3B  Post Hanja conversion
629 Hangul Single Candidate    0x3C  Single candidate
630 Hangul Multiple Candidate  0x3D  Multiple candidate
631 Hangul Previous Candidate  0x3E  Previous candidate
632 Hangul Special             0x3F  Special symbols

634 Figure 9: Virtual keys: Hangul (Korean)

636  0                   1                   2                   3
637  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
638 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
639 |V|P|    MBZ    |     Unicode codepoint or virtual keycode      |
640 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

642 Figure 10: One entry in the keyboard state list

644 Figure 10 illustrates the format of entries in the keyboard state
645 list. If the "V" bit is set, the codepoint field holds a virtual
646 keycode; otherwise it holds a Unicode
647 codepoint.
        If the "P" bit is set, the entry refers to the
648     version of the key found on a separate keypad. If the application
649     host has no concept of a separate keypad, or the key indicated does
650     not appear on a keypad, the "P" bit MAY be ignored.

652     The highest Unicode codepoint is 0x10FFFF; thus, all Unicode
653     characters fit comfortably in the 24-bit codepoint field of the
654     keycode format. For non-virtual, non-keypad keys, for which the V
655     and P bits are both zero, the format is identical to a UTF-32BE
656     encoding of the same character. (The format can thus alternatively
657     be considered UTF-32BE bitwise-ORed with 0x80000000 for "V" and
658     0x40000000 for "P".)

660  6.2 Pointer Position

662     The viewer indicates pointer position to the application host using
663     the media type video/pointer, defined in RFC 2862, the RTP Payload
664     Format for Real-Time Pointers [6].

666  7. Implementation Notes

668     Application hosts should not blindly send every screen update they
669     receive down the RTP channel. Instead, they should monitor the state
670     of their TCP transmission buffers (through mechanisms such as the
671     select() system call) and send only the most recent screen data when
672     there is no backlog. This prevents latency from building up for
673     rapidly changing images, where a viewer usually needs to see only the
674     final state of the image.

676     To conserve bandwidth, application hosts SHOULD use PNG's palette or
677     grayscale image format wherever possible, and SHOULD use a minimal
678     palette and image bit depth, subject to encoding delay constraints.
679     In particular, two-color images, and one-color images drawn over the
680     existing image, SHOULD use a one-bit PNG with a two-entry palette, in
681     the latter case with a transparency chunk.

683     In window mode, application hosts SHOULD be aware of unshared local
684     windows on the host.
        If an unshared window obscures a shared window,
685     the application host SHOULD obscure its contents (through a mechanism
686     such as transmitting a neutral color) so that the viewer experience
687     reflects as closely as possible the experience on the host.
688     Application hosts MAY choose to treat portions of windows obscured by
689     other shared windows the same way.

691  8. Open issues

693     We need to determine what mechanism to recommend to secure the input
694     and output streams. The two logical possibilities would be Secure
695     RTP [13] and Transport-Layer Security (TLS) [14]. Either would work;
696     neither is currently specified for connection-oriented RTP (neither
697     TCP/RTP/SAVP nor TCP/TLS/RTP/AVP is defined). One or the other ought
698     to be recommended to facilitate interoperability.

700     It seems likely that "beep" needs to be defined specially, as an
701     output type, and needs to be defined separately from other audio
702     channels. Many systems allow beeps to be rendered visually, either
703     for accessibility for the deaf or because systems are being used in
704     quiet environments.

706     We need a name for the meta-protocol of Section 5.1. It's also
707     unclear whether it should have an application/* or video/* MIME type.

709     SDP doesn't normally allow you to send traffic with different
710     top-level MIME types over the same RTP channel. Do we need to add an
711     extension to work around this? Some of the RTP payloads described in
712     this document should pretty clearly be application/*, some should be
713     video/*, some (beeping) audio/*, PNG is already defined as image/png,
714     etc.

716     All the payload types need MIME-type assignments and RTP payload
717     characteristics (sample rate, use of marker bit, etc.) defined.

719     Is there anything useful that can be done with the MBZ bits of the
720     meta-protocol?

722     What other information needs to be carried in the window state
723     protocol? One particular concern is taskbar support.
        Window
724     information that might be carried to support taskbars includes the
725     window title, a list of minimized windows, and whether each window
726     should be listed in the taskbar. Taskbar support also requires an
727     input protocol to support taskbar actions (right-clicking on a
728     taskbar item): unminimize, maximize, close, etc.

730     Another flag that might be carried by the window state protocol is
731     whether a window is "taggable", i.e., whether it is a good candidate
732     for a viewer-side marker indicating that this is a shared
733     application. Top-level windows would typically get this, while
734     subsidiary windows such as dialog boxes would not. (This is a
735     feature of T.128.)

737     Do we need to define what an un-drawn-upon window looks like? T.128
738     seems to assume that windows are transparent until drawn to, and uses
739     this fact: a server can define a full-screen window on top of all
740     others, and draw directly to it. I don't think this is necessary,
741     but it's an important consideration.

743     Should pointer images be window-associated? Should there be a
744     pointer image cache? (T.128 has one; VNC doesn't.)

746     RFC 2862 supports only 12-bit positioning for the mouse pointer. I
747     believe this is already too small: screens wider than 4096 pixels
748     already exist, especially virtual desktops. We also want to carry
749     further mouse information, most notably mousewheel events. A
750     protocol obsoleting RFC 2862 is probably in order.

752     Do we want to support a mechanism by which viewers can request a full
753     screen refresh, analogous to the Full Intra-frame Request (FIR) RTCP
754     packet of RFC 2032 [15]?

756     One other optimized pixel transmission operation that could be used
757     is the tiling operation: transmit one image together with the number
758     of times it should be repeated horizontally and vertically. Would
759     this be worth the bandwidth/complexity tradeoff? (Note that it can be
760     emulated without too much overhead by the copy operation.)
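
   The note above, that tiling can be emulated by the copy operation,
   can be illustrated with a short sketch. The Python below is
   illustrative only and is not part of the protocol. The function name
   tiling_as_copies and the tuple layout (src_x, src_y, width, height,
   dst_x, dst_y) are assumptions made for this example; they stand in
   for whatever rectangle-copy operation the pixel transmission protocol
   provides. The sketch assumes the image has been transmitted once and
   that each copy reads pixels already present on the viewer's screen.
   Doubling the copied region at each step keeps the number of copies
   logarithmic in the repeat counts.

```python
def tiling_as_copies(tile_w, tile_h, nx, ny):
    """Return copy operations that repeat one drawn tile nx by ny times.

    Each operation is a hypothetical tuple
    (src_x, src_y, width, height, dst_x, dst_y), with coordinates
    relative to the top-left corner of the tile that was sent once.
    """
    ops = []

    # Fill the first row: copy everything drawn so far to its right,
    # doubling the covered width until nx tiles are present.
    done = 1
    while done < nx:
        n = min(done, nx - done)
        ops.append((0, 0, n * tile_w, tile_h, done * tile_w, 0))
        done += n

    # Fill the remaining rows the same way, copying full-width strips.
    done = 1
    while done < ny:
        n = min(done, ny - done)
        ops.append((0, 0, nx * tile_w, n * tile_h, 0, done * tile_h))
        done += n

    return ops
```

   Under these assumptions, a 64 x 64 repetition needs 12 copy
   operations after the initial image, rather than 4095 further
   transmissions of the tile.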
762     Is it necessary for viewers and application hosts to be able to
763     negotiate the maximum supported size and color depth of pointers?
764     This would presumably be a format parameter on the pointer image
765     representation payload type.

767     Do we want to support multiple payloads in one RTP packet, either by
768     reusing RFC 2198 [16] or with a format inspired by it?

770     How should "lock" key state (caps lock, num lock, scroll lock) be
771     represented in the keyboard input protocol? Should there be flags of
772     some sort? Should there be separate virtual keycodes for "lock key
773     depressed" and "lock state enabled"? Should this simply be handled
774     on the viewer side, altering the keycodes that are sent? (This works
775     for caps and num lock, not so well for scroll lock.)

777  9. Security Considerations

779     Both input and output data may be highly sensitive. For example,
780     input data may contain user passwords. Thus, encryption of all user
781     input is likely to be required. For some applications, such as
782     sharing slides during a public lecture, confidentiality for user
783     output may not be required. Given the broad set of applications,
784     viewers and application hosts MUST support or be able to leverage
785     end-to-end confidentiality and integrity protection mechanisms.

787     Application sharing inherently exposes the shared applications to
788     risks from malicious participants. Such participants may, for
789     example, access resources beyond the application itself, e.g., by
790     installing or running scripts. It may be difficult to constrain
791     access to specific user data, e.g., a specific set of slides, unless
792     the user application can be sandboxed or run in some kind of "jail",
793     with the sandbox control outside the view of the remoting protocol.

795  10. IANA Considerations

797     TODO: MIME type definitions for everything.

799  11. References

801  11.1 Normative References

803     [1] Handley, M., Jacobson, V. and C.
        Perkins, "SDP: Session
804         Description Protocol", draft-ietf-mmusic-sdp-new-21 (work in
805         progress), October 2004.

807     [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
808         Levels", BCP 14, RFC 2119, March 1997.

810     [3] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
811         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
812         RFC 3550, July 2003.

814     [4] Lazzaro, J., "Framing RTP and RTCP Packets over
815         Connection-Oriented Transport",
816         draft-ietf-avt-rtp-framing-contrans-03 (work in progress), July
817         2004.

819     [5] Duce, D., "Portable Network Graphics (PNG) Specification (Second
820         Edition)", W3C REC REC-PNG-20031110, November 2003.

822     [6] Civanlar, M. and G. Cash, "RTP Payload Format for Real-Time
823         Pointers", RFC 2862, June 2000.

825     [7] International Organization for Standardization, "Information
826         Technology - Universal Multiple-octet coded Character Set
827         (UCS)", ISO Standard 10646, December 2003.

829  11.2 Informative References

831     [8] International Telecommunication Union, "Data Protocols for
832         Multimedia Conferencing", ITU-T Recommendation T.120, July
833         1996.

835     [9] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
836         Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
837         Session Initiation Protocol", RFC 3261, June 2002.

839     [10] Scheifler, R., "X Window System Protocol", X Consortium
840          Standard X Version 11, Release 6.7, November 2004.

842     [11] Schulzrinne, H., "Sharing and Remote Access to Applications",
843          draft-schulzrinne-mmusic-sharing-00 (work in progress), October
844          2004.

846     [12] Barnes, M. and C. Boulton, "A Framework for Centralized
847          Conferencing", draft-barnes-xcon-framework-00 (work in
848          progress), October 2004.

850     [13] Baugher, M., "The Secure Real-time Transport Protocol",
851          draft-ietf-avt-srtp-09 (work in progress), July 2003.
853     [14] Lennox, J., "Connection-Oriented Media Transport over the
854          Transport Layer Security (TLS) Protocol in the Session
855          Description Protocol (SDP)", draft-ietf-mmusic-comedia-tls-02
856          (work in progress), October 2004.

858     [15] Turletti, T., "RTP Payload Format for H.261 Video Streams", RFC
859          2032, October 1996.

861     [16] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley,
862          M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP
863          Payload for Redundant Audio Data", RFC 2198, September 1997.

865     [17] International Telecommunication Union, "Multipoint Application
866          Sharing", ITU-T Recommendation T.128, February 1998.

868  Authors' Addresses

870     Jonathan Lennox
871     Columbia University Department of Computer Science
872     450 Computer Science
873     1214 Amsterdam Ave., M.C. 0401
874     New York, NY 10027
875     US

877     Phone: +1 212 939 7018
878     EMail: lennox@cs.columbia.edu

880     Henning Schulzrinne
881     Columbia University Department of Computer Science
882     450 Computer Science
883     1214 Amsterdam Ave., M.C. 0401
884     New York, NY 10027
885     US

887     Phone: +1 212 939 7004
888     EMail: hgs+mmusic@cs.columbia.edu

890     Jason Nieh
891     Columbia University Department of Computer Science
892     450 Computer Science
893     1214 Amsterdam Ave., M.C. 0401
894     New York, NY 10027
895     US

897     Phone: +1 212 939 7000
898     EMail: nieh@cs.columbia.edu
899     Ricardo Baratto
900     Columbia University Department of Computer Science
901     450 Computer Science
902     1214 Amsterdam Ave., M.C.
        0401
903     New York, NY 10027
904     US

906     Phone: +1 212 939 7000
907     EMail: ricardo@cs.columbia.edu

909  Intellectual Property Statement

911     The IETF takes no position regarding the validity or scope of any
912     Intellectual Property Rights or other rights that might be claimed to
913     pertain to the implementation or use of the technology described in
914     this document or the extent to which any license under such rights
915     might or might not be available; nor does it represent that it has
916     made any independent effort to identify any such rights. Information
917     on the procedures with respect to rights in RFC documents can be
918     found in BCP 78 and BCP 79.

920     Copies of IPR disclosures made to the IETF Secretariat and any
921     assurances of licenses to be made available, or the result of an
922     attempt made to obtain a general license or permission for the use of
923     such proprietary rights by implementers or users of this
924     specification can be obtained from the IETF on-line IPR repository at
925     http://www.ietf.org/ipr.

927     The IETF invites any interested party to bring to its attention any
928     copyrights, patents or patent applications, or other proprietary
929     rights that may cover technology that may be required to implement
930     this standard. Please address the information to the IETF at
931     ietf-ipr@ietf.org.

933  Disclaimer of Validity

935     This document and the information contained herein are provided on an
936     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
937     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
938     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
939     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
940     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
941     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

943  Copyright Statement

945     Copyright (C) The Internet Society (2004).
        This document is subject
946     to the rights, licenses and restrictions contained in BCP 78, and
947     except as set forth therein, the authors retain all their rights.

949  Acknowledgment

951     Funding for the RFC Editor function is currently provided by the
952     Internet Society.