idnits 2.17.00 (12 Aug 2021) /tmp/idnits26432/draft-hoffman-utf16-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 585 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Abstract section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'MIME' is mentioned on line 493, but not defined ** Obsolete normative reference: RFC 2278 (ref. 'CHARSET-REG') (Obsoleted by RFC 2978) -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' ** Obsolete normative reference: RFC 2279 (ref. 'UTF-8') (Obsoleted by RFC 3629) ** Downref: Normative reference to an Informational RFC: RFC 2130 (ref. 'WORKSHOP') Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Draft Paul Hoffman 2 Internet Mail Consortium 3 April 19, 1999 Francois Yergeau 4 Alis Technologies 6 UTF-16, an encoding of ISO 10646 8 Status of this Memo 10 This document is an Internet-Draft and is in full conformance with all 11 provisions of Section 10 of RFC2026. 13 Internet-Drafts are working documents of the Internet Engineering Task 14 Force (IETF), its areas, and its working groups. Note that other groups 15 may also distribute working documents as Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six months 18 and may be updated, replaced, or obsoleted by other documents at any 19 time.
It is inappropriate to use Internet-Drafts as reference material 20 or to cite them other than as "work in progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 25 The list of Internet-Draft Shadow Directories can be accessed at 26 http://www.ietf.org/shadow.html. 28 Copyright (C) The Internet Society (1999). All Rights Reserved. 30 1. Introduction 32 This document describes the UTF-16 encoding of Unicode/ISO-10646, 33 addresses the issues of serializing UTF-16 as an octet stream for 34 transmission over the Internet, defines MIME charset naming as 35 described in [CHARSET-REG], and contains the registration for three 36 MIME charset parameter values: UTF-16BE (big-endian), UTF-16LE 37 (little-endian), and UTF-16. 39 1.1 Background and motivation 41 The Unicode Standard [UNICODE], and ISO/IEC 10646 [ISO-10646] jointly 42 define a coded character set (CCS), hereafter referred to as Unicode, 43 which encompasses most of the world's writing systems [WORKSHOP]. 44 UTF-16, the object of this specification, is one of the standard ways 45 of encoding Unicode character data; it has the characteristics of 46 encoding all currently defined characters (in plane 0, the BMP) in 47 exactly two octets and of being able to encode all other characters 48 likely to be defined (the next 16 planes) in exactly four octets. 50 The Unicode Standard further defines additional character properties 51 and other application details of great interest to implementors. Up to 52 the present time, changes in Unicode and amendments to ISO/IEC 10646 53 have tracked each other, so that the character repertoires and code 54 point assignments have remained in sync. The relevant standardization 55 committees have committed to maintain this very useful synchronism, as 56 well as not to assign characters outside of the 17 planes accessible to 57 UTF-16. 
59 The IETF policy on character sets and languages [CHARPOLICY] says that 60 IETF protocols MUST be able to use the UTF-8 charset [UTF-8]. Although 61 UTF-8 has many beneficial properties, such as the direct encoding of 62 US-ASCII characters, re-synchronization after loss of octets, and 63 immunity to the byte-order issue (see 3.1 below), it is a 64 variable-width encoding and is less dense than UTF-16 for characters 65 whose values are between 0x0800 and 0xFFFF. Some products and network 66 standards already specify UTF-16, making it an important encoding for 67 the Internet. 69 1.2 Terminology 71 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 72 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 73 document are to be interpreted as described in RFC 2119 [MUSTSHOULD]. 75 Throughout this document, character values are shown in hexadecimal 76 notation. For example, "0x013C" is the character assigned the 77 integer value 316 (decimal) in the CCS. 79 2. UTF-16 definition 81 In ISO 10646, each character is assigned a number, which Unicode calls 82 the Unicode scalar value. This number is the same as the UCS-4 value of 83 the character, and this document will refer to it as the "character 84 value" for brevity. In the UTF-16 encoding, characters are represented 85 using either one or two unsigned 16-bit integers, depending on the 86 character value. Serialization of these integers for transmission as a 87 byte stream is discussed in Section 3. 89 The rules for how characters are encoded in UTF-16 are: 91 - Characters with values less than 0x10000 are represented as a single 92 16-bit integer with a value equal to that of the character number.
94 - Characters with values between 0x10000 and 0x10FFFF are represented 95 by a 16-bit integer with a value between 0xD800 and 0xDBFF (within 96 the so-called high-half zone or high surrogate area) followed by a 97 16-bit integer with a value between 0xDC00 and 0xDFFF (within the 98 so-called low-half zone or low surrogate area). 100 - Characters with values greater than 0x10FFFF cannot be encoded in 101 UTF-16. 103 2.1 Encoding UTF-16 105 Encoding of a single character from an ISO 10646 character value to 106 UTF-16 proceeds as follows. Let U be the character number, no greater 107 than 0x10FFFF. 109 1) If U < 0x10000, encode U as a 16-bit unsigned integer and terminate. 111 2) Let U' = U - 0x10000. Because U is less than or equal to 0x10FFFF, 112 U' must be less than or equal to 0xFFFFF. That is, U' can be 113 represented in 20 bits. 115 3) Initialize two 16-bit unsigned integers, W1 and W2, to 0xD800 and 116 0xDC00, respectively. These integers each have 10 bits free to encode 117 the character value, for a total of 20 bits. 119 4) Assign the 10 high-order bits of the 20-bit U' to the 10 low-order 120 bits of W1 and the 10 low-order bits of U' to the 10 low-order bits of 121 W2. Terminate. 123 Graphically, steps 2 through 4 look like: 124 U' = yyyyyyyyyyxxxxxxxxxx 125 W1 = 110110yyyyyyyyyy 126 W2 = 110111xxxxxxxxxx 128 2.2 Decoding UTF-16 130 Decoding of a single character from UTF-16 to an ISO 10646 character 131 value proceeds as follows. Let W1 be the next 16-bit integer in the 132 sequence of integers representing the text. Let W2 be the (eventual) 133 next integer following W1. 135 1) If W1 < 0xD800 or W1 > 0xDFFF, the character value U is the value of 136 W1. Terminate. 138 2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence 139 is in error and no valid character can be obtained using W1. Terminate. 141 3) If there is no W2 (that is, the sequence ends with W1), or if W2 is 142 not between 0xDC00 and 0xDFFF, the sequence is in error. 
Terminate. 144 4) Construct a 20-bit unsigned integer U', taking the 10 low-order bits 145 of W1 as its 10 high-order bits and the 10 low-order bits of W2 as its 146 10 low-order bits. 148 5) Add 0x10000 to U' to obtain the character value U. Terminate. 150 Note that steps 2 and 3 indicate errors. Error recovery is not 151 specified by this document. When terminating with an error in step 2 152 or 3, it may be wise to set U to the value of W1 to help the caller 153 diagnose the error and not lose information. Also note that a string 154 decoding algorithm, as opposed to the single-character decoding 155 described above, need not terminate upon detection of an error, if 156 proper error reporting and/or recovery is provided. 158 3. Labelling UTF-16 text 160 This specification contains registration for three MIME charsets: 161 "UTF-16BE", "UTF-16LE", and "UTF-16". MIME charsets represent the 162 combination of a CCS and a CES. Here the CCS is Unicode/ISO 10646 and 163 the CES is the same in all three cases, except for the serialization 164 order of the octets in each character, and the external determination 165 of which serialization is used. 167 This section describes which of the three labels to apply to a stream 168 of text. Section 4 describes how to interpret the labels on a stream of 169 text. 171 3.1 Definition of big-endian and little-endian 173 Historically, computer hardware has processed two-octet entities such 174 as 16-bit integers in one of two ways. So-called "big-endian" hardware 175 handles two-octet entities with the higher-order octet first, that is 176 at the lower address in memory; when written out to disk or to a 177 network interface (serializing), the high-order octet thus appears 178 first in the data stream. On the other hand, "little-endian" hardware 179 handles two-octet entities with the lower-order octet first. Hardware 180 of both kinds is common today.
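The single-character procedures of Sections 2.1 and 2.2 can be sketched in C as follows. This is an illustration only, not part of the specification: the function names utf16_encode and utf16_decode, their signatures, and their return-value conventions are assumptions made for this example. The sketch works on 16-bit integers and leaves serialization into octets aside.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for Section 2.1. Encodes character value u into
 * w and returns the number of 16-bit integers written (1 or 2), or 0
 * if u is greater than 0x10FFFF and so cannot be encoded. */
static int utf16_encode(uint32_t u, uint16_t w[2])
{
    if (u > 0x10FFFF)
        return 0;
    if (u < 0x10000) {                         /* step 1 */
        w[0] = (uint16_t)u;
        return 1;
    }
    u -= 0x10000;                              /* step 2: U' fits in 20 bits */
    w[0] = (uint16_t)(0xD800 | (u >> 10));     /* steps 3-4: 10 high-order bits */
    w[1] = (uint16_t)(0xDC00 | (u & 0x3FF));   /* steps 3-4: 10 low-order bits */
    return 2;
}

/* Hypothetical helper for Section 2.2. Decodes one character from the
 * n 16-bit integers at w (n must be at least 1). Returns the number of
 * integers consumed (1 or 2) with the character value in *u, or -1 on
 * error with *u set to W1. */
static int utf16_decode(const uint16_t *w, size_t n, uint32_t *u)
{
    uint16_t w1;

    assert(n >= 1);
    w1 = w[0];
    *u = w1;
    if (w1 < 0xD800 || w1 > 0xDFFF)            /* step 1 */
        return 1;
    if (w1 > 0xDBFF)                           /* step 2: not a high surrogate */
        return -1;
    if (n < 2 || w[1] < 0xDC00 || w[1] > 0xDFFF)   /* step 3 */
        return -1;
    /* steps 4-5: rebuild the 20-bit U' and add 0x10000 */
    *u = ((((uint32_t)w1 & 0x3FF) << 10) | (w[1] & 0x3FF)) + 0x10000;
    return 2;
}
```

For instance, utf16_encode applied to the value 0x12345 used in Section 5 yields 0xD808 followed by 0xDF45, and utf16_decode maps that pair back to 0x12345. On an error in step 2 or 3 the decoder returns -1 and leaves W1 in *u, following the suggestion in Section 2.2.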
182 To illustrate the two serialization orders, the unsigned 16-bit integer that represents the decimal 183 number 258 is 0x0102. The big-endian serialization of that number is 184 the octet 0x01 followed by the octet 0x02. The little-endian 185 serialization of that number is the octet 0x02 followed by the octet 186 0x01. The following C code fragment demonstrates a way to write 16-bit 187 quantities to a file in big-endian order, irrespective of the 188 hardware's native byte order.

    void write_be(unsigned short u, FILE *f)  /* assume short is 16 bits */
    {
        putc(u >> 8, f);    /* output high-order byte */
        putc(u & 0xFF, f);  /* then low-order */
    }

196 The term "network byte order" has been used in many RFCs to indicate 197 big-endian serialization, although that term has yet to be formally 198 defined in a standards-track document. Although ISO 10646 prefers 199 big-endian serialization (section 6.3 of [ISO-10646]), it is likely 200 that little-endian order will also be used on the Internet. 202 3.2 Byte order mark (BOM) 204 The Unicode Standard and ISO 10646 define the character "ZERO WIDTH 205 NON-BREAKING SPACE" (0xFEFF), which is also known informally as "BYTE 206 ORDER MARK" (abbreviated "BOM"). The latter name hints at a second 207 possible usage of the character, in addition to its normal use as a 208 genuine "ZERO WIDTH NON-BREAKING SPACE" within text. This usage, 209 suggested by Unicode section 2.4 and ISO 10646 Annex F (informative), 210 is to prepend a 0xFEFF character to a stream of Unicode characters as a 211 "signature"; a receiver of such a serialized stream may then use the 212 initial character both as a hint that the stream consists of Unicode 213 characters and as a way to recognize the serialization order. In 214 serialized UTF-16 prepended with such a signature, the order is 215 big-endian if the first two octets are 0xFE followed by 0xFF; if they 216 are 0xFF followed by 0xFE, the order is little-endian.
Note that 0xFFFE 217 is not a Unicode character, precisely to preserve the usefulness of 218 0xFEFF as a byte-order mark. 220 It is important to understand that the character 0xFEFF appearing at 221 any position other than the beginning of a stream MUST be interpreted 222 with the semantics for the zero-width non-breaking space, and MUST NOT 223 be interpreted as a byte-order mark. The inverse of that 224 statement is not always true: the character 0xFEFF in the first 225 position of a stream MAY be interpreted as a zero-width non-breaking 226 space, and is not always a byte-order mark. For example, if a process 227 splits a UTF-16 string into many parts, a part might begin with 0xFEFF 228 because there was a zero-width non-breaking space at the beginning of 229 that substring. 231 The Unicode standard further suggests that an initial 0xFEFF character 232 may be stripped before processing the text, the rationale being that 233 such a character in initial position may be an artifact of the encoding 234 (an encoding signature), not a genuine intended "ZERO WIDTH 235 NON-BREAKING SPACE". Note that such stripping might affect an external 236 process at a different layer (such as a digital signature or a count of 237 the characters) that is relying on the presence of all characters in 238 the stream. 240 In particular, in UTF-16 plain text it is likely, but not certain, that 241 an initial 0xFEFF is a signature. When concatenating two strings, it is 242 important to strip out those signatures, because otherwise the 243 resulting string may contain an unintended "ZERO WIDTH NON-BREAKING 244 SPACE" at the connection point. Also, some specifications mandate an 245 initial 0xFEFF character in objects encoded in UTF-16 and specify that 246 this signature is not part of the object.
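The signature test described in Section 3.2 amounts to looking at the first two octets of the serialized stream. A minimal sketch in C follows; the enum bom_kind and the function detect_bom are hypothetical names introduced for this illustration and are not defined by this document.

```c
#include <stddef.h>

enum bom_kind { BOM_NONE, BOM_BIG_ENDIAN, BOM_LITTLE_ENDIAN };

/* Hypothetical helper: classify the first two octets of a serialized
 * stream as a big-endian signature (FE FF), a little-endian signature
 * (FF FE), or no signature at all. */
static enum bom_kind detect_bom(const unsigned char *octets, size_t len)
{
    if (len >= 2) {
        if (octets[0] == 0xFE && octets[1] == 0xFF)
            return BOM_BIG_ENDIAN;
        if (octets[0] == 0xFF && octets[1] == 0xFE)
            return BOM_LITTLE_ENDIAN;
    }
    return BOM_NONE;   /* too short, or starts with something else */
}
```

The sketch only reports what it finds and deliberately does not strip anything. As noted above, an initial 0xFEFF may be a genuine "ZERO WIDTH NON-BREAKING SPACE", and removing it can affect a digital signature or a character count computed at another layer, so the decision to strip is left to the caller.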
248 3.3 Choosing a label for UTF-16 text 250 Any labelling application that uses UTF-16 character encoding, and 251 explicitly labels the text, and knows the serialization order of the 252 characters in text, SHOULD label the text as either "UTF-16BE" or 253 "UTF-16LE", whichever is appropriate based on the endianness of the 254 text. This allows applications processing the text, but unable to look 255 inside the text, to know the serialization definitively. 257 Text in the "UTF-16BE" charset MUST be serialized with the octets which 258 make up a single 16-bit UTF-16 value in big-endian order. Systems 259 labelling UTF-16BE text MUST NOT prepend a BOM to the text. 261 Text in the "UTF-16LE" charset MUST be serialized with the octets which 262 make up a single 16-bit UTF-16 value in little-endian order. Systems 263 labelling UTF-16LE text MUST NOT prepend a BOM to the text. 265 Any labelling application that uses UTF-16 character encoding, and puts 266 an explicit charset label on the text, and does not know the 267 serialization order of the characters in text, MUST label the text as 268 "UTF-16", and SHOULD make sure the text starts with 0xFEFF. 270 An (unfortunate) exception to the "SHOULD" rule of using "UTF-16BE" or 271 "UTF-16LE" is that some document formats mandate a BOM in UTF-16 text, 272 thereby requiring the use of the "UTF-16" tag only. 274 4. Interpreting text labels 276 When a program sees text labelled as "UTF-16BE", "UTF-16LE", or 277 "UTF-16", it can make some assumptions, based on the labelling rules 278 given in the previous section. These assumptions allow the program to 279 then process the text. 281 4.1 Interpreting text labelled as UTF-16BE 283 Text labelled "UTF-16BE" can always be interpreted as being big-endian. 284 The detection of an initial BOM does not affect de-serialization of 285 text labelled as UTF-16BE. Finding 0xFF followed by 0xFE is an error 286 since there is no Unicode character 0xFFFE. 
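De-serializing text labelled "UTF-16BE" as described in Section 4.1 can be sketched in C as follows. The function name utf16be_to_units and its return convention are assumptions made for this illustration. The sketch also makes two choices that the text does not spell out: it treats an odd number of octets as an error, and it rejects the octets 0xFF 0xFE wherever they form a 16-bit integer, not only at the start of the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn octets labelled "UTF-16BE" into 16-bit
 * integers, high-order octet first. Returns the number of integers
 * written to out (which must have room for len / 2 entries), or -1 if
 * len is odd or the value 0xFFFE (octets FF FE) is found. An initial
 * FE FF is not treated specially; it is passed through as 0xFEFF. */
static long utf16be_to_units(const unsigned char *octets, size_t len,
                             uint16_t *out)
{
    size_t i;

    if (len % 2 != 0)
        return -1;                 /* assumption: truncated input is an error */
    for (i = 0; i < len; i += 2) {
        uint16_t unit = (uint16_t)((octets[i] << 8) | octets[i + 1]);
        if (unit == 0xFFFE)
            return -1;             /* there is no Unicode character 0xFFFE */
        out[i / 2] = unit;
    }
    return (long)(len / 2);
}
```

Applied to the first octet sequence of Section 5 (D8 08 DF 45 00 3D 00 52 00 61), this yields the five 16-bit integers 0xD808, 0xDF45, 0x003D, 0x0052, and 0x0061. A little-endian counterpart for Section 4.2 would swap the roles of the two octets and reject the same value 0xFFFE, which in that order corresponds to the octets 0xFE 0xFF.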
288 4.2 Interpreting text labelled as UTF-16LE 290 Text labelled "UTF-16LE" can always be interpreted as being 291 little-endian. The detection of an initial BOM does not affect 292 de-serialization of text labelled as UTF-16LE. Finding 0xFE followed by 293 0xFF is an error since there is no Unicode character 0xFFFE, which 294 would be the interpretation of those octets under little-endian order. 296 4.3 Interpreting text labelled as UTF-16 298 Text labelled with the "UTF-16" charset might be serialized in either 299 big-endian or little-endian order. If the first two octets of the text 300 are 0xFE followed by 0xFF, then the text can be interpreted as being 301 big-endian. If the first two octets of the text are 0xFF followed by 302 0xFE, then the text can be interpreted as being little-endian. If the 303 first two octets of the text are not 0xFE followed by 0xFF, and are not 304 0xFF followed by 0xFE, then the text SHOULD be interpreted as being 305 big-endian. 307 All applications that process text with the "UTF-16" charset label MUST 308 be able to read at least the first two octets of the text and be able 309 to process those octets in order to determine the serialization order 310 of the text. Applications that process text with the "UTF-16" charset 311 label MUST NOT assume the serialization without first checking the 312 first two octets to see if they are a big-endian BOM, a little-endian 313 BOM, or not a BOM. All applications that process text with the "UTF-16" 314 charset label MUST be able to interpret both big-endian and 315 little-endian text. 317 5. Examples 319 For the sake of example, let's suppose that there is a hieroglyphic 320 character representing the Egyptian god Ra with character value 321 0x00012345 (this character does not exist at present in Unicode). 323 The examples here all evaluate to the phrase: 325 *=Ra 327 where the "*" represents the Ra hieroglyph (0x00012345).
329 Text labelled with UTF-16BE, without a BOM: 330 D8 08 DF 45 00 3D 00 52 00 61 332 Text labelled with UTF-16LE, without a BOM: 333 08 D8 45 DF 3D 00 52 00 61 00 335 Big-endian text labelled with UTF-16, with a BOM: 336 FE FF D8 08 DF 45 00 3D 00 52 00 61 338 Little-endian text labelled with UTF-16, with a BOM: 339 FF FE 08 D8 45 DF 3D 00 52 00 61 00 341 6. Versions of the standards 343 ISO/IEC 10646 is updated from time to time by published amendments; 344 similarly, different versions of the Unicode standard exist: 1.0, 1.1, 345 2.0, and 2.1 as of this writing. Each new version replaces the 346 previous one, but implementations, and more significantly data, are not 347 updated instantly. 349 In general, the changes amount to adding new characters, which does not 350 pose particular problems with old data. Amendment 5 to ISO/IEC 10646, 351 however, has moved and expanded the Korean Hangul block, thereby making 352 any previous data containing Hangul characters invalid under the new 353 version. Unicode 2.0 has the same difference from Unicode 1.1. The 354 official justification for allowing such an incompatible change was 355 that no significant implementations and data containing Hangul existed, 356 a statement that is likely to be true but remains unprovable. The 357 incident has been dubbed the "Korean mess", and the relevant committees 358 have pledged to never, ever again make such an incompatible change. 360 New versions, and in particular any incompatible changes, have 361 consequences regarding MIME character encoding labels, to be discussed 362 in Appendix A. 364 7. Security considerations 366 UTF-16 is based on the ISO 10646 character set, which is frequently 367 being added to, as described in Section 6 and Appendix A of this 368 document. Processors must be able to handle characters that are not 369 defined at the time that the processor was created in such a way as to 370 not allow an attacker to harm a recipient by including unknown 371 characters. 
373 Processors that handle any type of text, including text encoded as 374 UTF-16, must be vigilant in checking for control characters that might 375 reprogram a display terminal or keyboard. Similarly, processors that 376 interpret text entities (such as looking for embedded programming 377 code) must be careful not to execute the code without first alerting 378 the recipient. 380 Text in UTF-16 may contain special characters, such as the OBJECT 381 REPLACEMENT CHARACTER (0xFFFC), that might cause external processing, 382 depending on the interpretation of the processing program and the 383 availability of an external data stream that would be executed. This 384 external processing may have side-effects that allow the sender of a 385 message to attack the receiving system. 387 Implementors of UTF-16 need to consider the security aspects of how 388 they handle illegal UTF-16 sequences (that is, sequences involving 389 surrogate pairs that have illegal values or unpaired surrogates). It is 390 conceivable that in some circumstances an attacker would be able to 391 exploit an incautious UTF-16 parser by sending it an octet sequence 392 that is not permitted by the UTF-16 syntax, causing it to behave in 393 some anomalous fashion. 395 8. References 397 [CHARPOLICY] Alvestrand, H., "IETF Policy on Character Sets and 398 Languages", BCP 18, RFC 2277, January 1998. 400 [CHARSET-REG] Freed, N., and J. Postel, "IANA Charset Registration 401 Procedures", BCP 19, RFC 2278, January 1998. 403 [HTTP-1.1] Fielding, R., et al., "Hypertext Transfer Protocol -- 404 HTTP/1.1", RFC 2068, January 1997. 406 [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Information 407 technology -- Universal Multiple-Octet Coded Character Set (UCS) -- 408 Part 1: Architecture and Basic Multilingual Plane. Twelve amendments 409 and two technical corrigenda have been published up to now. UTF-16 is 410 described in Annex Q, published as Amendment 1.
Many other amendments 411 are currently at various stages of standardization. 413 [MUSTSHOULD] Bradner, S., "Key words for use in RFCs to Indicate 414 Requirement Levels", BCP 14, RFC 2119, March 1997. 416 [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 417 2.1", Unicode Technical Report #8. 419 [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 420 2279, January 1998. 422 [WORKSHOP] Weider, C., et al., "Report of the IAB Character Set 423 Workshop", RFC 2130, April 1997. 425 9. Acknowledgments 427 Deborah Goldsmith wrote a great deal of the initial wording for this 428 specification. Martin Duerst proposed numerous significant changes. 429 Other significant contributors include: 431 Mati Allouche 432 Walt Daniels 433 Mark Davis 434 Ned Freed 435 Asmus Freytag 436 Lloyd Honomichl 437 Dan Kegel 438 Murata Makoto 439 Larry Masinter 440 Markus Scherer 441 Ken Whistler 443 Some of the text in this specification was copied from [UTF-8], and 444 that document was worked on by many people. Please see the 445 acknowledgments section in that document for more people who may have 446 contributed indirectly to this document. 448 10. Changes between draft -02 and -03 450 1: Reorganized the sections. Added information about two octets being 451 enough for all current characters and the committees saying they will 452 not go beyond what can be defined in UTF-16. 454 2.1: Reworded step 2 with words to make it easier to read. 456 2.2: Added "U" to step 1. Also added note to the end of the last 457 paragraph about string decoding and errors. 459 3: Added a reference to section 4 about interpreting labels. 461 3.1: Reworded last sentence in last paragraph. 463 4.3: Added requirement that apps that can read UTF-16 must be able to 464 interpret both big-endian and little-endian. 466 5: Corrected the examples due to wrong encoding. 468 11: Moved authors' addresses to Appendix B. 470 A.
Charset registrations 472 This memo is meant to serve as the basis for registration of three MIME 473 charsets [CHARSET-REG]. The proposed charsets are "UTF-16BE", 474 "UTF-16LE", and "UTF-16". These strings label objects containing text 475 consisting of characters from the repertoire of ISO/IEC 10646 including 476 all amendments at least up to amendment 5 (Korean block), encoded to a 477 sequence of octets using the encoding and serialization schemes 478 outlined above. 480 Note that "UTF-16BE", "UTF-16LE", and "UTF-16" are NOT suitable for use 481 in media types under the "text" top-level type, because they do not 482 encode line endings in the way required for MIME "text" media types. An 483 exception to this is HTTP, which uses a MIME-like mechanism, but is 484 exempt from the restrictions on the text top-level type (see section 485 19.4.1 of HTTP 1.1 [HTTP-1.1]). 487 It is noteworthy that the labels described here do not contain a 488 version identification, referring generically to ISO/IEC 10646. This is 489 intentional, the rationale being as follows: 491 A MIME charset is designed to give just the information needed to 492 interpret a sequence of bytes received on the wire into a sequence of 493 characters, nothing more (see RFC 2045, section 2.2, in [MIME]). As 494 long as a character set standard does not change incompatibly, version 495 numbers serve no purpose, because one gains nothing by learning from 496 the tag that newly assigned characters may be received that one doesn't 497 know about. The tag itself doesn't teach anything about the new 498 characters, which are going to be received anyway. 500 Hence, as long as the standards evolve compatibly, the apparent 501 advantage of having labels that identify the versions is only that, 502 apparent. 
But there is a disadvantage to such version-dependent 503 labels: when an older application receives data accompanied by a newer, 504 unknown label, it may fail to recognize the label and be completely 505 unable to deal with the data, whereas a generic, known label would have 506 triggered mostly correct processing of the data, which may well not 507 contain any new characters. 509 The "Korean mess" (ISO/IEC 10646 amendment 5) is an incompatible 510 change, in principle contradicting the appropriateness of a version 511 independent MIME charset as described above. But the compatibility 512 problem can only appear with data containing Korean Hangul characters 513 encoded according to Unicode 1.1 (or equivalently ISO/IEC 10646 before 514 amendment 5), and there is arguably no such data to worry about, this 515 being the very reason the incompatible change was deemed acceptable. 517 In practice, then, a version-independent label is warranted, provided 518 the label is understood to refer to all versions after Amendment 5, and 519 provided no incompatible change actually occurs. Should incompatible 520 changes occur in a later version of ISO/IEC 10646, the MIME charsets 521 defined here will stay aligned with the previous version until and 522 unless the IETF specifically decides otherwise. 
524 A.1 Registration for UTF-16BE 526 To: ietf-charsets@iana.org 527 Subject: Registration of new charset 529 Charset name(s): UTF-16BE 531 Published specification(s): This specification 533 Suitable for use in MIME content types under the 534 "text" top-level type: No 536 Person & email address to contact for further information: 537 Paul Hoffman <phoffman@imc.org> 538 Francois Yergeau <fyergeau@alis.com> 540 A.2 Registration for UTF-16LE 542 To: ietf-charsets@iana.org 543 Subject: Registration of new charset 545 Charset name(s): UTF-16LE 547 Published specification(s): This specification 549 Suitable for use in MIME content types under the 550 "text" top-level type: No 552 Person & email address to contact for further information: 553 Paul Hoffman <phoffman@imc.org> 554 Francois Yergeau <fyergeau@alis.com> 556 A.3 Registration for UTF-16 558 To: ietf-charsets@iana.org 559 Subject: Registration of new charset 561 Charset name(s): UTF-16 563 Published specification(s): This specification 565 Suitable for use in MIME content types under the 566 "text" top-level type: No 568 Person & email address to contact for further information: 569 Paul Hoffman <phoffman@imc.org> 570 Francois Yergeau <fyergeau@alis.com> 572 B. Authors' addresses 574 Paul Hoffman 575 Internet Mail Consortium 576 127 Segre Place 577 Santa Cruz, CA 95060 USA 578 phoffman@imc.org 580 Francois Yergeau 581 Alis Technologies 582 100, boul. Alexis-Nihon, Suite 600 583 Montreal QC H4M 2P2 Canada 584 fyergeau@alis.com