idnits 2.17.00 (12 Aug 2021) /tmp/idnits25893/draft-hoffman-utf16-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Found some kind of copyright notice around line 27 but it does not match any copyright boilerplate known by this tool. Expected boilerplate is as follows today (2022-05-20) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. 
** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 513 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Abstract section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 93 instances of too long lines in the document, the longest one being 3 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'MIME' is mentioned on line 436, but not defined ** Obsolete normative reference: RFC 2278 (ref. 'CHARSET-REG') (Obsoleted by RFC 2978) -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10646' ** Obsolete normative reference: RFC 2279 (ref. 'UTF-8') (Obsoleted by RFC 3629) -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' Summary: 12 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Draft Paul Hoffman 2 Internet Mail Consortium 3 December 13, 1998 Francois Yergeau 4 Alis Technologies 6 UTF-16, an encoding of ISO 10646 8 Status of this Memo 10 This document is an Internet-Draft. Internet-Drafts are working documents 11 of the Internet Engineering Task Force (IETF), its areas, and its working 12 groups. Note that other groups may also distribute working documents as 13 Internet-Drafts. 15 Internet-Drafts are draft documents valid for a maximum of six months. 16 Internet-Drafts may be updated, replaced, or obsoleted by other documents 17 at any time. It is not appropriate to use Internet-Drafts as reference 18 material or to cite them other than as a "working draft" or "work in 19 progress". 21 To view the entire list of current Internet-Drafts, please check the 22 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow 23 Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), 24 ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), 25 ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). 27 Copyright (C) The Internet Society (1998).
All Rights Reserved. 29 1. Introduction 31 This document specifies the UTF-16 encoding of Unicode/ISO-10646 and 32 contains the registration for three MIME charset parameter values: 33 UTF-16BE, UTF-16LE, and UTF-16. 35 1.1 Background 37 The Unicode Standard [UNICODE] and ISO/IEC 10646 [ISO-10646] jointly 38 define a coded character set (CCS), hereafter referred to as Unicode, which 39 encompasses most of the world's writing systems. UTF-16, the object of this 40 specification, is a character encoding scheme (CES) of Unicode that has the 41 characteristics of encoding the vast majority of currently-defined 42 characters in exactly two octets and of being able to encode all other 43 characters that will be defined in exactly four octets. 45 The Unicode Standard further defines additional character properties and 46 other application details of great interest to implementors. Up to the 47 present time, changes in Unicode and amendments to ISO/IEC 10646 have 48 tracked each other, so that the character repertoires and code point 49 assignments have remained in sync. The relevant standardization committees 50 have committed to maintaining this very useful synchronism. 52 1.2 Motivation 54 The UTF-8 transformation of Unicode is described in [UTF-8]. The IETF 55 policy on character sets and languages, [CHARPOLICY], says that IETF 56 protocols MUST be able to use the UTF-8 charset. However, relative to 57 UTF-16, UTF-8 imposes a space penalty for characters whose values are 58 between 0x0800 and 0xFFFF, which take three octets in UTF-8 but two in UTF-16. Also, characters represented in UTF-8 have varying 59 sizes. Using UTF-16 provides a way to transmit character data that is 60 mostly uniform in size. Some products and network standards already specify 61 UTF-16. (Note, however, that UTF-8 has many other advantages over UTF-16 in 62 many protocols, such as the direct encoding of US-ASCII characters and 63 re-synchronization after loss of octets.)
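The size trade-off described above can be checked with a short sketch. The sample characters and the helper name octet_counts are illustrative choices, not part of this specification; Python's built-in "utf-8" and "utf-16-be" codecs stand in for the two encodings.

```python
# Illustrative comparison of encoded sizes; the sample characters are
# arbitrary choices, not taken from this specification.
samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "LATIN SMALL LETTER L WITH CEDILLA": "\u013C",
    "DEVANAGARI LETTER A": "\u0905",
    "CJK ideograph": "\u4E2D",
}


def octet_counts(ch):
    """Return (UTF-8 octets, UTF-16 octets) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for name, ch in samples.items():
    u8, u16 = octet_counts(ch)
    print(f"0x{ord(ch):04X} {name}: UTF-8 {u8} octets, UTF-16 {u16} octets")
```

Characters below 0x0080 are smaller in UTF-8, those from 0x0080 through 0x07FF are the same size in both, and those from 0x0800 through 0xFFFF are one octet larger in UTF-8.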
65 UTF-16 is a format that allows encoding the first 17 planes of ISO 10646 as 66 a sequence of 16-bit quantities. This document addresses the issues of 67 serializing UTF-16 as an octet stream for transmission over the Internet 68 and of MIME charset naming as described in [CHARSET-REG]. 70 1.3 Terminology 72 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 73 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 74 document are to be interpreted as described in RFC 2119 [MUSTSHOULD]. 76 Throughout this document, character values are shown in hexadecimal 77 notation. For example, "0x013C" is the character assigned the 78 integer value 316 (decimal) in the CCS. 80 2. UTF-16 definition 82 In ISO 10646, each character is assigned a number, which Unicode calls the 83 Unicode scalar value. This number is the same as the UCS-4 value of the 84 character, and this document will refer to it as the "character value" for 85 brevity. In the UTF-16 encoding, characters are represented using either 86 one or two unsigned 16-bit integers, depending on the character value. 87 Serialization of these integers for transmission as a byte stream is 88 discussed in Section 3. 90 The rules for how characters are encoded in UTF-16 are: 92 - Characters with values less than 0x10000 are represented as a single 93 16-bit integer with a value equal to that of the character number. 95 - Characters with values between 0x10000 and 0x10FFFF are represented by a 96 16-bit integer with a value between 0xD800 and 0xDBFF (within the 97 so-called high-half zone or high surrogate area) followed by a 16-bit 98 integer with a value between 0xDC00 and 0xDFFF (within the so-called 99 low-half zone or low surrogate area). 101 - Characters with values greater than 0x10FFFF cannot be encoded in 102 UTF-16. 104 2.1 Encoding UTF-16 106 Encoding of a single character proceeds as follows. Let U be the character 107 number, no greater than 0x10FFFF.
109 1) If U < 0x10000, encode U as a 16-bit unsigned integer and terminate. 111 2) Let U' = U - 0x10000. Note that because U <= 0x10FFFF, U' <= 0xFFFFF, 112 that is, U' can be represented in 20 bits. 114 3) Initialize two 16-bit unsigned integers, W1 and W2, to 0xD800 and 115 0xDC00, respectively. These integers each have 10 bits free to encode the 116 character value, for a total of 20 bits. 118 4) Assign the 10 high-order bits of the 20-bit U' to the 10 low-order bits 119 of W1 and the 10 low-order bits of U' to the 10 low-order bits of W2. 120 Terminate. 122 Graphically, steps 2 through 4 look like: 123 U' = yyyyyyyyyyxxxxxxxxxx 124 W1 = 110110yyyyyyyyyy 125 W2 = 110111xxxxxxxxxx 127 2.2 Decoding UTF-16 129 Decoding of a single character proceeds as follows. Let W1 be the next 130 16-bit integer in the sequence of integers representing the text. Let W2 be 131 the (eventual) next integer following W1. 133 1) If W1 < 0xD800 or W1 > 0xDFFF, the character value is the value of W1. 134 Terminate. 136 2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence is in 137 error and no valid character can be obtained using W1. Terminate. 139 3) If there is no W2 (that is, the sequence ends with W1), or if W2 is not 140 between 0xDC00 and 0xDFFF, the sequence is in error. Terminate. 142 4) Construct a 20-bit unsigned integer U', taking the 10 low-order bits of 143 W1 as its 10 high-order bits and the 10 low-order bits of W2 as its 10 144 low-order bits. 146 5) Add 0x10000 to U' to obtain the character value U. Terminate. 148 Note that steps 2 and 3 indicate errors. Error recovery is not specified by 149 this document. 151 3. Serialization of characters 153 3.1 Definition of big-endian and little-endian 155 Historically, computer hardware has processed two-octet entities such as 156 16-bit integers in one of two ways. 
So-called "big-endian" hardware handles 157 two-octet entities with the higher-order octet first, that is at the lower 158 address in memory; when written out to disk or to a network interface 159 (serializing), the high-order octet thus appears first in the data stream. 160 "Little-endian" hardware handles two-octet entities with the lower-order 161 octet first. Most modern hardware is little-endian, but there are many 162 current examples of big-endian hardware. 164 For example, the unsigned 16-bit integer that represents the decimal number 165 258 is 0x0102. The big-endian serialization of that number is the octet 166 0x01 followed by the octet 0x02. The little-endian serialization of that 167 number is the octet 0x02 followed by the octet 0x01. 169 The term "network byte order" has been used in many RFCs to indicate 170 big-endian serialization, although that term has never been formally 171 defined in a standards-track document. ISO 10646 prefers big-endian 172 serialization (section 6.3 of [ISO-10646]), but it is nonetheless 173 considered likely that little-endian order will also be used on the 174 Internet. 176 This specification thus contains registration for three charsets: 177 "UTF-16BE", "UTF-16LE", and "UTF-16". The character encoding schemes these 178 charsets use are identical except for the serialization order of the octets 179 in each character, and the external determination of which serialization is 180 used. 182 The Unicode Standard and ISO 10646 define the character "ZERO WIDTH 183 NON-BREAKING SPACE" (0xFEFF), which is also known informally as "BYTE ORDER 184 MARK" (abbreviated "BOM"). The latter name hints at a second possible usage 185 of the character, in addition to its normal use as a genuine "ZERO WIDTH 186 NON-BREAKING SPACE" within text. 
This usage, suggested by Unicode section 187 2.4 and ISO 10646 Annex F (informative), is to prepend a 0xFEFF character 188 to a stream of Unicode characters as a "signature"; a receiver of such a 189 serialized stream may then use the initial character both as a hint that 190 the stream consists of Unicode characters and as a way to recognize the 191 serialization order. In serialized UTF-16 prepended with such a signature, 192 the order is big-endian if the first two octets are 0xFE followed by 0xFF; 193 if they are 0xFF followed by 0xFE, the order is little-endian. Note that 194 0xFFFE is not a Unicode character, precisely to preserve the usefulness of 195 0xFEFF as a byte-order mark. 197 It is important to understand that the character 0xFEFF appearing at any 198 position other than the beginning of a stream MUST be interpreted with the 199 semantics for the zero-width non-breaking space, and MUST NOT be 200 interpreted as a byte-order mark. The inverse of that statement is 201 not always true: the character 0xFEFF in the first position of a stream MAY 202 be interpreted as a zero-width non-breaking space, and is not always a 203 byte-order mark. 205 The Unicode standard further suggests that an initial 0xFEFF character may 206 be stripped before processing the text, the rationale being that such a 207 character in initial position may be an artifact of the encoding (an 208 encoding signature), not a genuine intended "ZERO WIDTH NON-BREAKING 209 SPACE". Nevertheless, such stripping MUST NOT take place before any 210 MIME-related operations (such as hash algorithms, digest, or byte-count 211 computations) have been completed. Such operations depend on the exact 212 bytes of the data, which therefore may not be modified in any way.
After 213 all MIME-related operations have been completed (for instance after a MIME 214 processor has handed an entity to a specific media type processor), an 215 initial 0xFEFF MAY be removed if appropriate, although this will prevent 216 later comparison with the original MIME object. In particular, in UTF-16 217 plain text it is likely that an initial 0xFEFF is a signature; when 218 concatenating two strings, it is important to strip out those signatures, 219 for otherwise the resulting string may contain an unintended "ZERO WIDTH 220 NON-BREAKING SPACE" at the connection point. Also, some specifications 221 mandate an initial 0xFEFF character in objects encoded in UTF-16 and 222 specify that this signature is not part of the object. 224 3.2 Serialization in UTF-16BE 226 Text in the "UTF-16BE" charset MUST be serialized with the octets which 227 make up a single 16-bit UTF-16 value in big-endian order. The detection of 228 an initial BOM does not affect de-serialization of text labelled as 229 UTF-16BE. Finding 0xFF followed by 0xFE is an error since there is no 230 Unicode character 0xFFFE. 232 3.3 Serialization in UTF-16LE 234 Text in the "UTF-16LE" charset MUST be serialized with the octets which 235 make up a single 16-bit UTF-16 value in little-endian order. The detection 236 of an initial BOM does not affect de-serialization of text labelled as 237 UTF-16LE. Finding 0xFE followed by 0xFF is an error since there is no Unicode 238 character 0xFFFE, which is the interpretation of the 0xFEFF character under 239 little-endian order. 241 3.4 Serialization in UTF-16 243 Text in the "UTF-16" charset MAY be serialized in either big-endian or 244 little-endian order. If the first two octets of the text are 0xFE followed 245 by 0xFF, then the text MUST be big-endian. If the first two octets of the 246 text are 0xFF followed by 0xFE, then the text MUST be little-endian.
If the 247 first two octets of the text are neither 0xFE followed by 0xFF nor 0xFF 248 followed by 0xFE, then the text MUST be big-endian. Big-endian text in the 249 "UTF-16" charset MAY start with the 0xFEFF character, but the 0xFEFF 250 character is not required. 252 All applications that process text in the "UTF-16" charset MUST be able to 253 read at least the first two octets of the text and be able to process those 254 octets in order to determine the serialization of the text. Applications 255 that use the "UTF-16" charset parameter value MUST NOT assume the 256 serialization without first checking the first two octets to see if they 257 are a big-endian BOM or a little-endian BOM or not a BOM. 259 4. Choosing a charset 261 Any labelling application that uses UTF-16 character encoding, and puts an 262 explicit charset label on the text, and knows the serialization of the 263 characters in text, MUST label the text as either "UTF-16BE" or "UTF-16LE", 264 whichever is appropriate. This allows applications that are processing the 265 text that are not able to look inside the text to know the serialization 266 definitively. 268 Any labelling application that uses UTF-16 character encoding, and puts an 269 explicit charset label on the text, and does not know the serialization of 270 the characters in text, MUST label the text as "UTF-16", and SHOULD be sure 271 the text starts with 0xFEFF. An application processing text that is 272 labelled with the "UTF-16" charset parameter value knows that the 273 serialization cannot be determined without looking inside the text itself. 274 Fortunately, the processing application needs only to look at the first 275 character (the first two octets) of the text to determine the 276 serialization.
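The encoding steps of Section 2.1, the decoding steps of Section 2.2, and the first-two-octets check of Section 3.4 can be sketched together as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names encode_char, detect_order, and decode_utf16 are hypothetical, errors are reported by raising ValueError because this document does not specify error recovery, an initial 0xFEFF is kept in the output rather than stripped, and the check for 0xFFFE at the start of "UTF-16BE" or "UTF-16LE" text is omitted.

```python
def encode_char(u):
    """Section 2.1: return the 16-bit integers for character value u."""
    if not 0 <= u <= 0x10FFFF:
        raise ValueError("character value cannot be encoded in UTF-16")
    if u < 0x10000:
        return [u]                                  # step 1
    u_prime = u - 0x10000                           # step 2: fits in 20 bits
    w1 = 0xD800 | (u_prime >> 10)                   # steps 3-4: high 10 bits
    w2 = 0xDC00 | (u_prime & 0x3FF)                 # steps 3-4: low 10 bits
    return [w1, w2]


def detect_order(octets):
    """Section 3.4: return (byte order, BOM present) for "UTF-16" text."""
    first_two = bytes(octets[:2])
    if first_two == b"\xFE\xFF":
        return "big", True
    if first_two == b"\xFF\xFE":
        return "little", True
    return "big", False                             # no BOM: big-endian


def decode_utf16(octets, label="UTF-16"):
    """Return the character values carried by octets under the given label."""
    if label == "UTF-16BE":
        order = "big"
    elif label == "UTF-16LE":
        order = "little"
    else:
        order, _ = detect_order(octets)
    if len(octets) % 2:
        raise ValueError("odd number of octets")
    words = [int.from_bytes(octets[i:i + 2], order)
             for i in range(0, len(octets), 2)]
    values = []
    i = 0
    while i < len(words):
        w1 = words[i]
        i += 1
        if w1 < 0xD800 or w1 > 0xDFFF:              # Section 2.2, step 1
            values.append(w1)
            continue
        if w1 > 0xDBFF:                             # step 2
            raise ValueError("unexpected low-half value")
        if i >= len(words) or not 0xDC00 <= words[i] <= 0xDFFF:
            raise ValueError("high-half value without low-half value")  # step 3
        w2 = words[i]
        i += 1
        u_prime = ((w1 & 0x3FF) << 10) | (w2 & 0x3FF)   # step 4
        values.append(u_prime + 0x10000)                # step 5
    return values
```

For instance, decode_utf16(bytes.fromhex("FFFE3D00")) reads the first two octets as a little-endian BOM and returns the values 0xFEFF and 0x3D; the caller decides whether the initial 0xFEFF is removed afterwards.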
278 Because creating text labelled as being in the "UTF-16" charset forces the 279 recipient to read and understand the first character of the text object, a 280 text-creating program SHOULD create text labelled as "UTF-16BE" or 281 "UTF-16LE" if possible. Text-creating programs that create text using 282 UTF-16 encoding SHOULD emit big-endian text if possible. 284 5. Examples 286 For the sake of example, let's suppose that there is a hieroglyphic 287 character representing the Egyptian god Ra with character value 0x00012345 288 (this character does not exist at present in Unicode). 290 The examples here all evaluate to the phrase: 292 *=Ra 294 where the "*" represents the Ra hieroglyph (0x00012345). 296 Text that is labelled with UTF-16BE, with no BOM: 297 D8 08 DF 45 00 3D 00 52 00 61 299 Text that is labelled with UTF-16BE, with a BOM: 300 FE FF D8 08 DF 45 00 3D 00 52 00 61 302 Text that is labelled with UTF-16LE, with no BOM: 303 08 D8 45 DF 3D 00 52 00 61 00 305 Little-endian text that is labelled with UTF-16: 306 FF FE 08 D8 45 DF 3D 00 52 00 61 00 308 6. Versions of the standards 310 ISO/IEC 10646 is updated from time to time by published amendments; 311 similarly, different versions of the Unicode standard exist: 1.0, 1.1, 2.0, 312 and 2.1 as of this writing. Each new version obsoletes and replaces the 313 previous one, but implementations, and more significantly data, are not 314 updated instantly. 316 In general, the changes amount to adding new characters, which does not 317 pose particular problems with old data. Amendment 5 to ISO/IEC 10646, 318 however, has moved and expanded the Korean Hangul block, thereby making any 319 previous data containing Hangul characters invalid under the new version. 320 Unicode 2.0 has the same difference from Unicode 1.1.
The official 321 justification for allowing such an incompatible change was that no 322 implementations and no data containing Hangul existed, a statement that is 323 likely to be true but remains unprovable. The incident has been dubbed the 324 "Korean mess", and the relevant committees have pledged to never, ever 325 again make such an incompatible change. 327 New versions, and in particular any incompatible changes, have consequences 328 regarding MIME character encoding labels, to be discussed in Appendix A. 330 7. Security considerations 332 UTF-16 is based on the ISO 10646 character set, which is frequently being 333 added to, as described in Section 6 and Appendix A of this document. 334 Processors must be able to handle characters that are not defined at the 335 time that the processor was created in such a way as to not allow an 336 attacker to harm a recipient by including unknown characters. 338 Processors that handle any type of text, including text encoded as UTF-16, 339 must be vigilant in checking for control characters that might reprogram a 340 display terminal or keyboard. Similarly, processors that interpret text 341 entities (such as looking for embedded programming code), must be careful 342 not to execute the code without first alerting the recipient. 344 Text in UTF-16 may contain special characters, such as the OBJECT 345 REPLACEMENT CHARACTER (0xFFFC), that might cause external processing, 346 depending on the interpretation of the processing program and the 347 availability of an external data stream that would be executed. This 348 external processing may have side-effects that allow the sender of a 349 message to attack the receiving system. 351 Implementors of UTF-16 need to consider the security aspects of how they 352 handle illegal UTF-16 sequences (that is, sequences involving surrogate 353 pairs that have illegal values). 
It is conceivable that in some 354 circumstances an attacker would be able to exploit an incautious UTF-16 355 parser by sending it an octet sequence that is not permitted by the UTF-16 356 syntax, causing it to behave in some anomalous fashion. 358 8. References 360 [CHARSET-REG] Freed, N., and J. Postel, "IANA Charset Registration 361 Procedures", BCP 19, RFC 2278, January 1998. 363 [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Information 364 technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: 365 Architecture and Basic Multilingual Plane. Twelve amendments and two 366 technical corrigenda have been published up to now. UTF-16 is described in 367 Annex Q, published as Amendment 1. Many other amendments are currently at 368 various stages of standardization. 370 [MUSTSHOULD] Bradner, S., "Key words for use in RFCs to Indicate 371 Requirement Levels", BCP 14, RFC 2119, March 1997. 373 [CHARPOLICY] Alvestrand, H., "IETF Policy on Character Sets and Languages", 374 BCP 18, RFC 2277, January 1998. 376 [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 377 2279, January 1998. 379 [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 2.1", 380 Unicode Technical Report #8. 382 9. Acknowledgments 384 Deborah Goldsmith wrote a great deal of the initial wording for this 385 specification. Other significant contributors include: 387 Mati Allouche 388 Walt Daniels 389 Mark Davis 390 Martin Duerst 391 Ned Freed 392 Asmus Freytag 393 Lloyd Honomichl 394 Dan Kegel 395 Murata Makoto 396 Ken Whistler 398 Some of the text in this specification was copied from [UTF-8], and that 399 document was worked on by many people. Please see the acknowledgements 400 section in that document for more people who may have contributed 401 indirectly to this document. 403 10. 
Authors' address 405 Paul Hoffman 406 Internet Mail Consortium 407 127 Segre Place 408 Santa Cruz, CA 95060 USA 409 phoffman@imc.org 411 Francois Yergeau 412 Alis Technologies 413 100, boul. Alexis-Nihon, Suite 600 414 Montreal QC H4M 2P2 Canada 415 fyergeau@alis.com 417 A. Charset registrations 419 This memo is meant to serve as the basis for registration of three MIME 420 charsets [CHARSET-REG]. The proposed charsets are "UTF-16BE", "UTF-16LE", 421 and "UTF-16". These strings label objects containing text consisting of 422 characters from the repertoire of ISO/IEC 10646 including all amendments at 423 least up to amendment 5 (Korean block), encoded to a sequence of octets 424 using the encoding and serialization schemes outlined above. 426 Note that "UTF-16BE", "UTF-16LE", and "UTF-16" are NOT suitable for use in 427 media types under the "text" top-level type, because they do not encode 428 line endings in the way required for MIME "text" media types. 430 It is noteworthy that the labels described here do not contain a version 431 identification, referring generically to ISO/IEC 10646. This is 432 intentional, the rationale being as follows: 434 A MIME charset is designed to give just the information needed to interpret 435 a sequence of bytes received on the wire into a sequence of characters, 436 nothing more (see RFC 2045, section 2.2, in [MIME]). As long as a character 437 set standard does not change incompatibly, version numbers serve no 438 purpose, because one gains nothing by learning from the tag that newly 439 assigned characters may be received that one doesn't know about. The tag 440 itself doesn't teach anything about the new characters, which are going to 441 be received anyway. 443 Hence, as long as the standards evolve compatibly, the apparent advantage 444 of having labels that identify the versions is only that, apparent. 
But 445 there is a disadvantage to such version-dependent labels: when an older 446 application receives data accompanied by a newer, unknown label, it may 447 fail to recognize the label and be completely unable to deal with the data, 448 whereas a generic, known label would have triggered mostly correct 449 processing of the data, which may well not contain any new characters. 451 The "Korean mess" (ISO/IEC 10646 amendment 5) is an incompatible change, in 452 principle contradicting the appropriateness of a version independent MIME 453 charset as described above. But the compatibility problem can only appear 454 with data containing Korean Hangul characters encoded according to Unicode 455 1.1 (or equivalently ISO/IEC 10646 before amendment 5), and there is 456 arguably no such data to worry about, this being the very reason the 457 incompatible change was deemed acceptable. 459 In practice, then, a version-independent label is warranted, provided the 460 label is understood to refer to all versions after Amendment 5, and 461 provided no incompatible change actually occurs. Should incompatible 462 changes occur in a later version of ISO/IEC 10646, the MIME charsets 463 defined here will stay aligned with the previous version until and unless 464 the IETF specifically decides otherwise. 
466 A.1 Registration for UTF-16BE 468 To: ietf-charsets@iana.org 469 Subject: Registration of new charset 471 Charset name(s): UTF-16BE 473 Published specification(s): This specification 475 Suitable for use in MIME content types under the 476 "text" top-level type: No 478 Person & email address to contact for further information: 479 Paul Hoffman 480 Francois Yergeau 482 A.2 Registration for UTF-16LE 484 To: ietf-charsets@iana.org 485 Subject: Registration of new charset 487 Charset name(s): UTF-16LE 489 Published specification(s): This specification 491 Suitable for use in MIME content types under the 492 "text" top-level type: No 494 Person & email address to contact for further information: 495 Paul Hoffman 496 Francois Yergeau 498 A.3 Registration for UTF-16 500 To: ietf-charsets@iana.org 501 Subject: Registration of new charset 503 Charset name(s): UTF-16 505 Published specification(s): This specification 507 Suitable for use in MIME content types under the 508 "text" top-level type: No 510 Person & email address to contact for further information: 511 Paul Hoffman 512 Francois Yergeau