idnits 2.17.00 (12 Aug 2021) /tmp/idnits396/draft-ietf-822ext-mime-imb-01.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2022-05-20) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? 
** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([RFC-MIME-HEADERS]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 445: '... in accordance with this document MUST...' RFC 2119 keyword, line 656: '... standard values MUST be documented, r...' RFC 2119 keyword, line 1058: '... MAY be represented as the US-AS...' RFC 2119 keyword, line 1063: '... Octets with values of 9 and 32 MAY be...' RFC 2119 keyword, line 1065: '...espectively, but MUST NOT be so repres...' (9 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1575 has weird spacing: '...of text is "p...' == Line 2045 has weird spacing: '...F (line break...' == Line 2767 has weird spacing: '...ed, the defau...' == Line 3564 has weird spacing: '... no inter...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. 
(See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (November 21, 1994) is 10042 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'RFC-MIME-HEADERS' on line 4051 looks like a reference -- Missing reference section? 'RFC-822' on line 3955 looks like a reference -- Missing reference section? 'RFC-821' on line 3951 looks like a reference -- Missing reference section? 'ATK' on line 3900 looks like a reference -- Missing reference section? 'X400' on line 4064 looks like a reference -- Missing reference section? 'RFC-1049' on line 3969 looks like a reference -- Missing reference section? 'RFC-REG' on line 4056 looks like a reference -- Missing reference section? 'RFC-1341' on line 3978 looks like a reference -- Missing reference section? 'RFC-1342' on line 3984 looks like a reference -- Missing reference section? 'RFC-1521' on line 4018 looks like a reference -- Missing reference section? 'RFC-1522' on line 4024 looks like a reference -- Missing reference section? 'RFC-1123' on line 288 looks like a reference -- Missing reference section? 'RFC-1344' on line 3989 looks like a reference -- Missing reference section? 'RFC-1345' on line 3993 looks like a reference -- Missing reference section? 'RFC-1524' on line 4029 looks like a reference -- Missing reference section? 'RFC-1563' on line 4034 looks like a reference -- Missing reference section? 'RFC-1652' on line 4038 looks like a reference -- Missing reference section? 'RFC-1421' on line 3997 looks like a reference -- Missing reference section? 'ISO-646' on line 3925 looks like a reference -- Missing reference section? 'US-ASCII' on line 4060 looks like a reference -- Missing reference section? 
'ISO-8859' on line 3913 looks like a reference -- Missing reference section? 'GIF' on line 3904 looks like a reference -- Missing reference section? 'PCM' on line 3935 looks like a reference -- Missing reference section? 'MPEG' on line 3930 looks like a reference -- Missing reference section? 'POSTSCRIPT' on line 3939 looks like a reference -- Missing reference section? 'POSTSCRIPT2' on line 3943 looks like a reference -- Missing reference section? 'RFC-959' on line 3964 looks like a reference -- Missing reference section? 'RFC-783' on line 3947 looks like a reference -- Missing reference section? 'RFC821' on line 3135 looks like a reference -- Missing reference section? 'RFC1421' on line 3671 looks like a reference -- Missing reference section? 'ISO-2022' on line 3908 looks like a reference -- Missing reference section? 'RFC-934' on line 3959 looks like a reference -- Missing reference section? 'RFC-1154' on line 3973 looks like a reference -- Missing reference section? 'RFC-1422' on line 4003 looks like a reference -- Missing reference section? 'RFC-1423' on line 4008 looks like a reference -- Missing reference section? 'RFC-1424' on line 4013 looks like a reference -- Missing reference section? 'RFC-1700' on line 4046 looks like a reference Summary: 10 errors (**), 0 flaws (~~), 5 warnings (==), 39 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Nathaniel Borenstein 2 Internet Draft Ned Freed 3 5 Multipurpose Internet Mail Extensions 6 (MIME) Part One: 8 Format of Internet Message Bodies 10 November 21, 1994 12 Status of this Memo 14 This document is an Internet-Draft. Internet-Drafts are 15 working documents of the Internet Engineering Task Force 16 (IETF), its areas, and its working groups. Note that other 17 groups may also distribute working documents as Internet- 18 Drafts. 
20 Internet-Drafts are draft documents valid for a maximum of six 21 months. Internet-Drafts may be updated, replaced, or obsoleted 22 by other documents at any time. It is not appropriate to use 23 Internet-Drafts as reference material or to cite them other 24 than as a "working draft" or "work in progress". 26 To learn the current status of any Internet-Draft, please 27 check the 1id-abstracts.txt listing contained in the 28 Internet-Drafts Shadow Directories on ds.internic.net (US East 29 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), 30 or munnari.oz.au (Pacific Rim). 32 1. Abstract 34 STD 11, RFC 822 defines a message representation protocol 35 specifying considerable detail about message headers, but 36 which leaves the message content, or message body, as flat 37 US-ASCII text. This document redefines the format of message 38 bodies to allow multi-part textual and non-textual message 39 bodies to be represented and exchanged without loss of 40 information. This is based on earlier work documented in RFC 41 934, STD 11, and RFC 1049, but extends and revises them. 42 Because RFC 822 said so little about message bodies, this 43 document is largely orthogonal to (rather than a revision of) 44 RFC 822. 46 In particular, this document is designed to provide facilities 47 to include multiple parts in a single message, to represent 48 body text in character sets other than US-ASCII, to represent 49 formatted multi-font text messages, to represent non-textual 50 material such as images and audio fragments, and generally to 51 facilitate later extensions defining new types of Internet 52 mail for use by cooperating mail agents. 54 This document does NOT extend Internet mail header fields to 55 permit anything other than US-ASCII text data. Such 56 extensions are the subject of [RFC-MIME-HEADERS]. 58 This document is a revision of RFC 1521, which was a revision 59 of RFC 1341. Significant differences from RFC 1521 are 60 summarized in Appendix G. 62 2. 
Table of Contents 64 1 Abstract .............................................. 2 65 2 Table of Contents ..................................... 3 66 3 Introduction .......................................... 5 67 4 Notations, Conventions, and Generic BNF Grammar ....... 9 68 5 MIME Header Fields .................................... 12 69 5.1 MIME-Version Header Field ........................... 12 70 5.2 Content-Type Header Field ........................... 14 71 5.2.1 Syntax of the Content-Type Header Field ........... 15 72 5.2.2 Definition of a Top-Level Content-Type ............ 18 73 5.2.3 Initial Set of Top-Level Content-Types ............ 18 74 5.3 Content-Transfer-Encoding Header Field .............. 21 75 5.3.1 Content-Transfer-Encoding Syntax .................. 21 76 5.3.2 Content-Transfer-Encoding Semantics ............... 22 77 5.3.3 Quoted-Printable Content-Transfer-Encoding ........ 26 78 5.3.4 Base64 Content-Transfer-Encoding .................. 30 79 5.4 Content-ID Header Field ............................. 32 80 5.5 Content-Description Header Field .................... 33 81 5.6 Additional MIME Header Fields ....................... 33 82 6 Predefined Content-Type Values ........................ 34 83 6.1 Discrete Content-Type Values ........................ 34 84 6.1.1 Text Content-Type ................................. 34 85 6.1.1.1 Representation of Line Breaks ................... 35 86 6.1.1.2 Charset Parameter ............................... 35 87 6.1.1.3 Plain Subtype ................................... 38 88 6.1.1.4 Unrecognized Subtypes ........................... 38 89 6.1.2 Image Content-Type ................................ 39 90 6.1.3 Audio Content-Type ................................ 39 91 6.1.4 Video Content-Type ................................ 40 92 6.1.5 Application Content-Type .......................... 40 93 6.1.5.1 Octet-Stream Subtype ............................ 41 94 6.1.5.2 PostScript Subtype .............................. 
42 95 6.1.5.3 Other Application Subtypes ...................... 45 96 6.2 Composite Content-Type Values ....................... 46 97 6.2.1 Multipart Content-Type ............................ 46 98 6.2.1.1 Common Syntax ................................... 48 99 6.2.1.2 Handling Nested Messages and Multiparts ......... 53 100 6.2.1.3 Mixed Subtype ................................... 53 101 6.2.1.4 Alternative Subtype ............................. 53 102 6.2.1.5 Digest Subtype .................................. 56 103 6.2.1.6 Parallel Subtype ................................ 57 104 6.2.1.7 Other Multipart Subtypes ........................ 57 105 6.2.2 Message Content-Type .............................. 57 106 6.2.2.1 RFC822 Subtype .................................. 58 107 6.2.2.2 Partial Subtype ................................. 58 108 6.2.2.2.1 Message Fragmentation and Reassembly .......... 59 109 6.2.2.2.2 Fragmentation and Reassembly Example .......... 60 110 6.2.2.3 External-Body Subtype ........................... 62 111 6.2.2.3.1 General External-Body Parameters .............. 64 112 6.2.2.3.2 The 'ftp' and 'tftp' Access-Types ............. 65 113 6.2.2.3.3 The 'anon-ftp' Access-Type .................... 66 114 6.2.2.3.4 The 'local-file' Access-Type .................. 66 115 6.2.2.3.5 The 'mail-server' Access-Type ................. 66 116 6.2.2.3.6 Examples and Further Explanations ............. 67 117 6.2.2.4 Other Message Subtypes .......................... 70 118 7 Experimental Content-Type Values ...................... 71 119 8 Summary ............................................... 72 120 9 Security Considerations ............................... 73 121 10 Authors' Addresses ................................... 74 122 11 Acknowledgements ..................................... 75 123 A MIME Conformance ...................................... 77 124 B Guidelines For Sending Email Data ..................... 
80 125 C A Complex Multipart Example ........................... 83 126 D Collected Grammar ..................................... 85 127 F Summary of the Seven Content-types .................... 88 128 G Canonical Encoding Model .............................. 91 129 H Changes from RFC 1521 ................................. 94 130 I References ............................................ 97 131 3. Introduction 133 Since its publication in 1982, RFC 822 [RFC-822] has defined 134 the standard format of textual mail messages on the Internet. 135 Its success has been such that the RFC 822 format has been 136 adopted, wholly or partially, well beyond the confines of the 137 Internet and the Internet SMTP transport defined by RFC 821 138 [RFC-821]. As the format has seen wider use, a number of 139 limitations have proven increasingly restrictive for the user 140 community. 142 RFC 822 was intended to specify a format for text messages. 143 As such, non-text messages, such as multimedia messages that 144 might include audio or images, are simply not mentioned. Even 145 in the case of text, however, RFC 822 is inadequate for the 146 needs of mail users whose languages require the use of 147 character sets richer than US-ASCII. Since RFC 822 does not 148 specify mechanisms for mail containing audio, video, Asian 149 language text, or even text in most European languages, 150 additional specifications are needed. 152 One of the notable limitations of RFC 821/822 based mail 153 systems is the fact that they limit the contents of electronic 154 mail messages to relatively short lines of 7-bit US-ASCII. 155 This forces users to convert any non-textual data that they 156 may wish to send into seven-bit bytes representable as 157 printable US-ASCII characters before invoking a local mail UA 158 (User Agent, a program with which human users send and receive 159 mail). 
Examples of such encodings currently used in the 160 Internet include pure hexadecimal, uuencode, the 3-in-4 base 161 64 scheme specified in RFC 1421, the Andrew Toolkit 162 Representation [ATK], and many others. 164 The limitations of RFC 822 mail become even more apparent as 165 gateways are designed to allow for the exchange of mail 166 messages between RFC 822 hosts and X.400 hosts. X.400 [X400] 167 specifies mechanisms for the inclusion of non-textual body 168 parts within electronic mail messages. The current standards 169 for the mapping of X.400 messages to RFC 822 messages specify 170 either that X.400 non-textual body parts must be converted to 171 (not encoded in) IA5Text format, or that they must be 172 discarded, notifying the RFC 822 user that discarding has 173 occurred. This is clearly undesirable, as information that a 174 user may wish to receive is lost. Even though a user agent 175 may not have the capability of dealing with the non-textual 176 body part, the user might have some mechanism external to the 177 UA that can extract useful information from the body part. 178 Moreover, it does not allow for the fact that the message may 179 eventually be gatewayed back into an X.400 message handling 180 system (i.e., the X.400 message is "tunneled" through Internet 181 mail), where the non-textual information would definitely 182 become useful again. 184 This document describes several mechanisms that combine to 185 solve most of these problems without introducing any serious 186 incompatibilities with the existing world of RFC 822 mail. In 187 particular, it describes: 189 (1) A MIME-Version header field, which uses a version 190 number to declare a message to be conformant with this 191 specification and allows mail processing agents to 192 distinguish between such messages and those generated 193 by older or non-conformant software, which are presumed 194 to lack such a field. 
196 (2) A Content-Type header field, generalized from RFC 1049 197 [RFC-1049], which can be used to specify the type and 198 subtype of data in the body of a message and to fully 199 specify the native representation (encoding) of such 200 data. 202 (3) A Content-Transfer-Encoding header field, which can be 203 used to specify an auxiliary encoding that was applied 204 to the data in order to allow it to pass through mail 205 transport mechanisms which may have data or character 206 set limitations. 208 (4) Two additional header fields that can be used to 209 further describe the data in a body, the Content-ID and 210 Content-Description header fields. 212 All of these header fields defined in this document are 213 subject to the general syntactic rules for header fields 214 specified in RFC 822. In particular, all of these header 215 fields can include RFC 822 comments, which have no semantic 216 content and should be ignored during MIME processing. 218 The generalized Content-Type header field values can be used 219 to identify both discrete and composite bodies. The following 220 types of discrete bodies are currently defined: 222 (1) A "text" Content-Type value, which can be used to 223 represent textual information in a number of character 224 sets and formatted text description languages in a 225 standardized manner. 227 (2) An "image" Content-Type value, for transmitting still 228 image (picture) data. 230 (3) An "audio" Content-Type value, for transmitting audio 231 or voice data. 233 (4) A "video" Content-Type value, for transmitting video or 234 moving image data, possibly with audio as part of the 235 composite video data format. 237 (5) An "application" Content-Type value, which can be used 238 to transmit application data or binary data, and hence, 239 among other uses, to implement an electronic mail file 240 transfer service. 
242 Two types of composite bodies are currently defined: 244 (1) A "multipart" Content-Type value, which can be used to 245 combine several body parts, possibly of differing types 246 of data, into a single message. 248 (2) A "message" Content-Type value, for encapsulating 249 another message or part of a message. 251 MIME's Content-Type mechanism has been carefully designed to 252 be extensible, and it is expected that the set of content- 253 type/subtype pairs and their associated parameters will grow 254 significantly with time. Several other MIME entities, most 255 notably the list of the names of character sets registered for 256 MIME usage, are likely to have new values defined over time. 257 In order to ensure that the set of such values is developed in 258 an orderly, well-specified, and public manner, MIME sets up a 259 registration process which uses the Internet Assigned Numbers 260 Authority (IANA) as a central registry for MIME's extension 261 areas. The registration process is described in RFC REG [RFC- 262 REG]. 264 Finally, to specify and promote interoperability, Appendix A 265 of this document provides a basic applicability statement for 266 a subset of the above mechanisms that defines a minimal level 267 of "conformance" with this document. 269 HISTORICAL NOTE: Several of the mechanisms described in this 270 document may seem somewhat strange or even baroque at first 271 reading. It is important to note that compatibility with 272 existing standards AND robustness across existing practice 273 were two of the highest priorities of the working group that 274 developed this document. In particular, compatibility was 275 always favored over elegance. 277 MIME was first defined and published as RFC 1341 [RFC-1341] 278 and RFC 1342 [RFC-1342], then revised in RFC 1521 [RFC-1521] 279 and RFC 1522 [RFC-1522]. This document is a relatively minor 280 update of RFC 1521, and is intended to supersede it. 
The 281 companion document RFC MIME-HEADERS [RFC-MIME-HEADERS] in turn 282 supersedes RFC 1522. 284 The differences between this document and RFC 1521 are 285 summarized in Appendix G. Please refer to the current edition 286 of the "IAB Official Protocol Standards" for the 287 standardization state and status of this protocol. RFC 822 and 288 RFC 1123 [RFC-1123] also provide essential background for MIME 289 since no conforming implementation of MIME can violate them. 290 In addition, several other informational RFC documents will be 291 of interest to the MIME implementor, in particular RFC 1344 292 [RFC-1344], RFC 1345 [RFC-1345], and RFC 1524 [RFC-1524]. 294 4. Notations, Conventions, and Generic BNF Grammar 296 Although the mechanisms specified in this document are all 297 described in prose, most are also described formally in the 298 augmented BNF notation of RFC 822. Implementors will need to 299 be familiar with this notation in order to understand this 300 specification, and are referred to RFC 822 for a complete 301 explanation of the augmented BNF notation. 303 Some of the augmented BNF in this document makes reference to 304 syntactic entities that are defined in RFC 822 and not in this 305 document. A complete formal grammar, then, is obtained by combining 306 Appendix D of this document, the collected grammar, with the 307 BNF of RFC 822 plus the modifications to RFC 822 defined in 308 RFC 1123, which specifically changes the syntax for `return', 309 `date' and `mailbox'. 311 The term CRLF, in this document, refers to the sequence of the 312 two US-ASCII characters CR (decimal value 13) and LF (decimal 313 value 10) which, taken together, in this order, denote a line 314 break in RFC 822 mail. 316 The term "character set" is used in this document to refer to 317 a method used with one or more tables to convert a sequence of 318 octets into a sequence of characters. 
Note that unconditional 319 conversion in the other direction is not required, in that not 320 all characters may be available in a given character set and a 321 character set may provide more than one sequence of octets to 322 represent a particular character. This definition is intended 323 to allow various kinds of character encodings, from simple 324 single-table mappings such as US-ASCII to complex table 325 switching methods such as those that use ISO 2022's 326 techniques. However, the definition associated with a MIME 327 character set name must fully specify the mapping to be 328 performed from octets to characters. In particular, use of 329 external profiling information to determine the exact mapping 330 is not permitted. 332 The term "message", when not further qualified, means either 333 the (complete or "top-level") message being transferred on a 334 network, or a message encapsulated in a body part of type 335 "message". 337 The term "body part", in this document, refers to either a 338 single-part message or one of the parts in the body of a 339 multipart entity. A body part has a header and a body, so it 340 makes sense to speak about the body of a body part. 342 The term "entity", in this document, means either a message or 343 a body part. All kinds of entities share the property that 344 they have a header and a body. 346 The term "body", when not further qualified, means the body of 347 an entity, that is, the body of either a message or a body 348 part. 350 NOTE: The previous four definitions are clearly circular. 351 This is unavoidable, since the overall structure of a MIME 352 message is indeed recursive. 354 "7bit data" refers to data that is all represented as short 355 lines of US-ASCII. CR (decimal value 13) and LF (decimal 356 value 10) characters only occur as part of CRLF line 357 separation sequences and no NULs (US-ASCII value 0) are 358 allowed. 
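The "7bit data" definition above can be read as a simple predicate over octets. The following Python sketch is one possible reading, not a normative test. The function name is_7bit and the 998-octet line limit are assumptions made for illustration; this section says only "short lines" and does not give a number.

```python
# Assumption: this section only says "short lines"; 998 octets
# (excluding the CRLF) is used here purely as an illustrative limit.
MAX_LINE_OCTETS = 998


def is_7bit(data: bytes, max_line: int = MAX_LINE_OCTETS) -> bool:
    """Return True if data fits the "7bit data" definition above."""
    # No NULs and no octets with the high-order bit set.
    if any(b == 0 or b > 127 for b in data):
        return False
    # CR and LF may occur only as part of a CRLF pair, so once the data
    # is split on CRLF no line may contain a bare CR or a bare LF.
    for line in data.split(b"\r\n"):
        if b"\r" in line or b"\n" in line:
            return False
        if len(line) > max_line:
            return False
    return True
```

A caller might use such a check to decide whether a body can be sent as is or needs a Content-Transfer-Encoding applied first.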
360 (1) "8bit data" refers to data that is all represented as 361 short lines, but there may be non-US-ASCII characters 362 (octets with the high-order bit set) present. As with 363 "7bit data", CR and LF characters only occur as part of 364 CRLF line separation sequences and no NULs are allowed. 366 (2) "Binary data" refers to data where any sequence of 367 octets whatsoever is allowed. 369 "Lines" are defined as sequences of octets separated by CRLF 370 sequences. This is consistent with both RFC 821 and RFC 822. 371 Lines in MIME bodies must also be terminated with a CRLF, but 372 the terminating CRLF on the last line of the body may properly 373 be part of a subsequent boundary marker rather than being part 374 of the body itself. 376 In this document, all numeric and octet values are given in 377 decimal notation. All Content-Type values, subtypes, and 378 parameter names as defined in this document are case- 379 insensitive. However, parameter values are case-sensitive 380 unless otherwise specified for the specific parameter. 382 FORMATTING NOTE: Notes, such as this one, provide additional 383 nonessential information which may be skipped by the reader 384 without missing anything essential. The primary purpose of 385 these non-essential notes is to convey information about the 386 rationale of this document, or to place this document in the 387 proper historical or evolutionary context. Such information 388 may in particular be skipped by those who are focused entirely 389 on building a conformant implementation, but may be of use to 390 those who wish to understand why certain design choices were 391 made. 393 5. MIME Header Fields 395 MIME defines a number of new RFC 822 header fields that are 396 used to describe the content of messages. These header fields 397 occur in two contexts: 399 (1) As part of a regular RFC 822 message header. 401 (2) In a MIME body part header within a multipart 402 construct. 
404 The formal definition of these header fields is as follows: 406 MIME-message-headers := fields 407 version CRLF 408 [ content CRLF ] 409 [ encoding CRLF ] 410 [ id CRLF ] 411 [ description CRLF ] 412 *( mime-extension-field CRLF ) 413 ; The ordering of the header 414 ; fields implied by this BNF 415 ; definition should be ignored 417 MIME-part-headers := [ content CRLF ] 418 [ encoding CRLF ] 419 [ id CRLF ] 420 [ description CRLF ] 421 *( mime-extension-field CRLF ) 422 ; The ordering of the header 423 ; fields implied by this BNF 424 ; definition should be ignored 426 The syntax of the various specific MIME header fields will be 427 described in the following sections. 429 5.1. MIME-Version Header Field 431 Since RFC 822 was published in 1982, there has really been 432 only one format standard for Internet messages, and there has 433 been little perceived need to declare the format standard in 434 use. This document is an independent document that 435 complements RFC 822. Although the extensions in this document 436 have been defined in such a way as to be compatible with RFC 437 822, there are still circumstances in which it might be 438 desirable for a mail-processing agent to know whether a 439 message was composed with the new standard in mind. 441 Therefore, this document defines a new header field, "MIME- 442 Version", which is to be used to declare the version of the 443 Internet message body format standard in use. 445 Messages composed in accordance with this document MUST 446 include such a header field, with the following verbatim text: 448 MIME-Version: 1.0 450 The presence of this header field is an assertion that the 451 message has been composed in compliance with this document. 453 Since it is possible that a future document might extend the 454 message format standard again, a formal BNF is given for the 455 content of the MIME-Version field: 457 version := "MIME-Version" ":" 1*DIGIT "." 
1*DIGIT 459 Thus, future format specifiers, which might replace or extend 460 "1.0", are constrained to be two integer fields, separated by 461 a period. If a message is received with a MIME-version value 462 other than "1.0", it cannot be assumed to conform with this 463 specification. 465 Note that the MIME-Version header field is required at the top 466 level of a message. It is not required for each body part of 467 a multipart entity. It is required for the embedded headers 468 of a body of type "message" if and only if the embedded 469 message is itself claimed to be MIME-conformant. 471 It is not possible to fully specify how a mail reader that 472 conforms with MIME as defined in this document should treat a 473 message that might arrive in the future with some value of 474 MIME-Version other than "1.0". 476 It is also worth noting that version control for specific 477 content-types is not accomplished using the MIME-Version 478 mechanism. In particular, some formats (such as 479 application/postscript) have version numbering conventions 480 that are internal to the document format. Where such 481 conventions exist, MIME does nothing to supersede them. Where 482 no such conventions exist, a MIME type might use a "version" 483 parameter in the content-type field if necessary. 485 NOTE TO IMPLEMENTORS: When checking MIME-Version values any 486 RFC 822 comment strings that are present must be ignored. In 487 particular, the following four MIME-Version fields are 488 equivalent: 490 MIME-Version: 1.0 492 MIME-Version: 1.0 (produced by MetaSend Vx.x) 494 MIME-Version: (produced by MetaSend Vx.x) 1.0 496 MIME-Version: 1.(produced by MetaSend Vx.x)0 498 5.2. Content-Type Header Field 500 The purpose of the Content-Type field is to describe the data 501 contained in the body fully enough that the receiving user 502 agent can pick an appropriate agent or mechanism to present 503 the data to the user, or otherwise deal with the data in an 504 appropriate manner. 
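Returning briefly to the NOTE TO IMPLEMENTORS in Section 5.1: the rule that RFC 822 comments are ignored when checking MIME-Version can be sketched in Python as follows. The helper names strip_comments and parse_mime_version are hypothetical. The sketch assumes the field body contains no quoted strings, which the "version" grammar does not allow anyway, and it discards all white space between tokens.

```python
import re

_VERSION_RE = re.compile(r"([0-9]+)\.([0-9]+)")


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments (which may nest) from a field body."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it only when outside a comment
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_mime_version(value: str):
    """Return (major, minor) for a MIME-Version field body, else None."""
    # Comments and white space carry no meaning here, so drop both.
    cleaned = "".join(strip_comments(value).split())
    m = _VERSION_RE.fullmatch(cleaned)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

With this sketch, all four field bodies listed in the note yield (1, 0). A result other than (1, 0) would mean the message cannot be assumed to conform to this document.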
506 HISTORICAL NOTE: The Content-Type header field was first 507 defined in RFC 1049. RFC 1049 Content-types used a simpler 508 and less powerful syntax, but one that is largely compatible 509 with the mechanism given here. 511 The Content-Type header field is used to specify the nature of 512 the data in the body of an entity, by giving type and subtype 513 identifiers, and by providing auxiliary information that may 514 be required for certain types. After the type and subtype 515 names, the remainder of the header field is simply a set of 516 parameters, specified in an attribute/value notation. The 517 ordering of parameters is not significant. 519 In general, the top-level Content-Type is used to declare the 520 general type of data, while the subtype specifies a specific 521 format for that type of data. Thus, a Content-Type of 522 "image/xyz" is enough to tell a user agent that the data is an 523 image, even if the user agent has no knowledge of the specific 524 image format "xyz". Such information can be used, for 525 example, to decide whether or not to show a user the raw data 526 from an unrecognized subtype -- such an action might be 527 reasonable for unrecognized subtypes of text, but not for 528 unrecognized subtypes of image or audio. For this reason, 529 registered subtypes of text, image, audio, and video should 530 not contain embedded information that is really of a different 531 type. Such compound formats should be represented using the 532 "multipart" or "application" types. 534 Parameters are modifiers of the content-subtype, and as such 535 do not fundamentally affect the nature of the content. The set 536 of meaningful parameters depends on the content-type and 537 subtype. Most parameters are associated with a single specific 538 subtype. However, a given top-level content-type may define 539 parameters which are applicable to any subtype of that type. 
540 For example, the "charset" parameter is applicable to any 541 subtype of "text", while the "boundary" parameter is required 542 for any subtype of the "multipart" content-type. 544 There are NO globally-meaningful parameters that apply to all 545 content-types. Truly global mechanisms are best addressed, in 546 the MIME model, by the definition of additional Content-* 547 header fields. 549 An initial set of seven top-level Content-Types is defined by 550 this document. Five of these are discrete types whose content 551 is essentially opaque as far as MIME processing is concerned. 552 The remaining two are composite types whose contents require 553 additional handling by MIME processors. 555 This set of top-level Content-Types is intended to be 556 substantially complete. It is expected that additions to the 557 larger set of supported types can generally be accomplished by 558 the creation of new subtypes of these initial types. In the 559 future, more top-level types may be defined only by a 560 standards-track extension to this standard. If another top- 561 level type is to be used for any reason, it must be given a 562 name starting with "X-" to indicate its non-standard status 563 and to avoid a potential conflict with a future official name. 565 5.2.1. 
Syntax of the Content-Type Header Field 567 In the Augmented BNF notation of RFC 822, a Content-Type 568 header field value is defined as follows: 570 content := "Content-Type" ":" type "/" subtype 571 *(";" parameter) 572 ; Matching of type and subtype is 573 ; ALWAYS case-insensitive 575 type := discrete-type / composite-type 577 discrete-type := "text" / "image" / "audio" / "video" / 578 "application" / extension-token 580 composite-type := "message" / "multipart" / extension-token 582 extension-token := iana-token / ietf-token / x-token 584 iana-token := 588 ietf-token := 592 x-token := 595 subtype := extension-token 597 parameter := attribute "=" value 599 attribute := token 601 value := token / quoted-string 603 token := 1* 606 tspecials := "(" / ")" / "<" / ">" / "@" / 607 "," / ";" / ":" / "\" / <"> 608 "/" / "[" / "]" / "?" / "=" 609 ; Must be in quoted-string, 610 ; to use within parameter values 612 Note that the definition of "tspecials" is the same as the RFC 613 822 definition of "specials" with the addition of the three 614 characters "/", "?", and "=", and the removal of ".". 616 Note also that a subtype specification is MANDATORY -- it may 617 not be omitted from a Content-Type header field. As such, 618 there are no default subtypes. 620 The type, subtype, and parameter names are not case sensitive. 621 For example, TEXT, Text, and TeXt are all equivalent top-level 622 Content Types. Parameter values are normally case sensitive, 623 but sometimes are interpreted in a case-insensitive fashion, 624 depending on the intended use. (For example, multipart 625 boundaries are case-sensitive, but the "access-type" parameter 626 for message/External-body is not case-sensitive.) 628 Note that the value of a quoted string parameter does not 629 include the quotes. That is, the quotation marks in a 630 quoted-string are not a part of the value of the parameter, 631 but are merely used to delimit that parameter value. 
In 632 addition, comments are allowed in accordance with RFC 822 633 rules for structured header fields. Thus the following two 634 forms 636 Content-type: text/plain; charset=us-ascii (Plain text) 638 Content-type: text/plain; charset="us-ascii" 640 are completely equivalent. 642 Beyond this syntax, the only syntactic constraint on the 643 definition of subtype names is the desire that their uses must 644 not conflict. That is, it would be undesirable to have two 645 different communities using "Content-Type: application/foobar" 646 to mean two different things. The process of defining new 647 content-subtypes, then, is not intended to be a mechanism for 648 imposing restrictions, but simply a mechanism for publicizing 649 the usages. There are, therefore, two acceptable mechanisms 650 for defining new Content-Type subtypes: 652 (1) Private values (starting with "X-") may be defined 653 bilaterally between two cooperating agents without 654 outside registration or standardization. 656 (2) New standard values MUST be documented, registered 657 with, and approved by IANA, as described in RFC REG. 659 5.2.2. Definition of a Top-Level Content-Type 661 The definition of a top-level content-type consists of: 663 (1) a name and a description of the type, including 664 criteria for whether a particular type would qualify 665 under that type, 667 (2) the names and definitions of parameters, if any, which 668 are defined for all subtypes of that type (including 669 whether such parameters are required or optional), 671 (3) how a user agent and/or gateway should handle unknown 672 subtypes of this type, 674 (4) general considerations on gatewaying objects of this 675 top-level type, if any, and 677 (5) any restrictions on content-transfer-encodings for 678 objects of this top-level type. 680 5.2.3. Initial Set of Top-Level Content-Types 682 The initial seven standard top-level Content-Types are 683 detailed in the bulk of this document. 
The five discrete top- 684 level Content-Types are: 686 (1) text -- textual information. The subtype "plain" in 687 particular indicates plain (unformatted) text. No 688 special software is required to get the full meaning of 689 the text, aside from support for the indicated 690 character set. Other subtypes are to be used for 691 enriched text in forms where application software may 692 enhance the appearance of the text, but such software 693 must not be required in order to get the general idea 694 of the content. Possible subtypes thus include any 695 word processor format that can be read without 696 resorting to software that understands the format. In 697 particular, formats that employ embedded binary 698 formatting information are not considered directly 699 readable. A very simple and portable subtype, richtext, 700 was defined in RFC 1341 [RFC-1341], with a further 701 revision in RFC 1563 [RFC-1563] under the name 702 "enriched". 704 (2) image -- image data. Image requires a display device 705 (such as a graphical display, a graphics printer, or a 706 FAX machine) to view the information. Initial subtypes 707 are defined for two widely-used image formats, jpeg and 708 gif. 710 (3) audio -- audio data. Audio requires an audio output 711 device (such as a speaker or a telephone) to "display" 712 the contents. An initial subtype "basic" is defined in 713 this document. 715 (4) video -- video data. Video requires the capability to 716 display moving images, typically including specialized 717 hardware and software. An initial subtype "mpeg" is 718 defined in this document. 720 (5) application -- some other kind of data, typically 721 either uninterpreted binary data or information to be 722 processed by a mail-based application. The subtype 723 "octet-stream" is to be used in the case of 724 uninterpreted binary data, in which case the simplest 725 recommended action is to offer to write the information 726 into a file for the user.
The "PostScript" subtype is 727 also defined for the transport of PostScript material. 728 Other expected uses for "application" include 729 spreadsheets, data for mail-based scheduling systems, 730 languages for "active" (computational) email, and 731 word processing formats that are not directly readable. 732 Note that security considerations may exist for some 733 types of application data, most notably 734 application/PostScript and any form of active mail. 735 These issues are discussed later in this document. 737 The two composite top-level Content-Types are: 739 (1) multipart -- data consisting of multiple parts of 740 independent data types. Four subtypes are initially 741 defined, including the basic "mixed" subtype specifying 742 a generic mixed set of parts, "alternative" for 743 representing the same data in multiple formats, 744 "parallel" for parts intended to be viewed 745 simultaneously, and "digest" for multipart entities in 746 which each part is of type "message". 748 (2) message -- an encapsulated message. A body of 749 Content-Type "message" is itself all or part of some 750 kind of message object. Such objects may in turn 751 contain other messages and body parts of their own. 752 The "rfc822" subtype is used when the encapsulated 753 content is itself an RFC 822 message. The "partial" 754 subtype is defined for partial RFC 822 messages, to 755 permit the fragmented transmission of bodies that are 756 thought to be too large to be passed through mail 757 transport facilities in one piece. Another subtype, 758 "external-body", is defined for specifying large bodies 759 by reference to an external data source. 761 Default RFC 822 messages without a MIME Content-Type header 762 are taken by this protocol to be plain text in the US-ASCII 763 character set, which can be explicitly specified as: 765 Content-type: text/plain; charset=us-ascii 767 This default is assumed if no Content-Type is specified.
In 768 the presence of a MIME-Version header field, a receiving User 769 Agent can also assume that plain US-ASCII text was the 770 sender's intent. Plain US-ASCII text must still be assumed in 771 the absence of a MIME-Version specification, but the sender's 772 intent might have been otherwise. 774 RATIONALE: In the absence of any Content-Type header field or 775 MIME-Version header field, it is impossible to be certain that 776 a message is actually text in the US-ASCII character set, 777 since it might well be a message that, using some set of 778 nonstandard conventions that predate this document, includes 779 text in another character set or non-textual data in a manner 780 that cannot be automatically recognized (e.g., a uuencoded 781 compressed UNIX tar file). Although there is no fully 782 acceptable alternative to treating such untyped messages as 783 "text/plain; charset=us-ascii", implementors should remain 784 aware that if a message lacks both the MIME-Version and the 785 Content-Type header fields, it may in practice contain almost 786 anything. 788 It should be noted that the list of Content-Type values given 789 here may be augmented in time, via the mechanisms described 790 above, and that the set of subtypes is expected to grow 791 substantially. 793 When a mail reader encounters mail with an unknown Content- 794 type value, it should generally treat it as equivalent to 795 "application/octet-stream", as described later in this 796 document. 798 5.3. Content-Transfer-Encoding Header Field 800 Many Content-Types which could be usefully transported via 801 email are represented, in their "natural" format, as 8-bit 802 character or binary data. Such data cannot be transmitted over 803 some transport protocols. For example, RFC 821 (SMTP) 804 restricts mail messages to 7-bit US-ASCII data with lines no 805 longer than 1000 characters. 
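The transport restriction described above can be illustrated with a small check. This is only a sketch: the function name fits_7bit_transport is hypothetical, and it assumes that the 1000-character limit counts the terminating CRLF, which the text above does not spell out.

```python
def fits_7bit_transport(body: bytes, max_line: int = 1000) -> bool:
    """Return True if body looks safe for a 7-bit, short-line transport.

    Assumption: max_line includes the two octets of the terminating CRLF.
    """
    # Any octet with the high bit set is outside 7-bit US-ASCII.
    if any(octet > 127 for octet in body):
        return False
    # Lines are taken to be delimited by CRLF sequences.
    for line in body.split(b"\r\n"):
        if len(line) + 2 > max_line:
            return False
    return True
```

A body that fails this check would need one of the encodings defined later in this section before it could be sent over such a transport.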
807 It is necessary, therefore, to define a standard mechanism for 808 encoding such data into a 7-bit short-line format. Proper 809 labelling of unencoded material in less restrictive formats 810 for direct use over less restrictive transports is also 811 desirable. This document specifies that such encodings will 812 be indicated by a new "Content-Transfer-Encoding" header 813 field. This field has not been defined by any previous 814 standard. 816 5.3.1. Content-Transfer-Encoding Syntax 818 The Content-Transfer-Encoding field's value is a single token 819 specifying the type of encoding, as enumerated below. 820 Formally: 822 encoding := "Content-Transfer-Encoding" ":" mechanism 824 mechanism := "7bit" / "8bit" / "binary" / 825 "quoted-printable" / "base64" / 826 ietf-token / x-token 828 These values are not case sensitive -- Base64 and BASE64 and 829 bAsE64 are all equivalent. An encoding type of 7BIT requires 830 that the body is already in a 7-bit mail-ready representation. 831 This is the default value -- that is, "Content-Transfer- 832 Encoding: 7BIT" is assumed if the Content-Transfer-Encoding 833 header field is not present. 835 5.3.2. Content-Transfer-Encoding Semantics 837 This single token actually provides two pieces of information. 838 It specifies what sort of encoding transformation the body was 839 subjected to, and it specifies what the domain of the result 840 is. 842 Three transformations are currently defined: identity, the 843 "quoted-printable" encoding, and the "base64" encoding. The 844 domains are "binary", "8bit" and "7bit". 846 The values "7bit", "8bit", and "binary" all mean that the 847 identity (i.e. NO) encoding transformation has been performed. 848 As such, they serve simply as indicators of the domain of the 849 body part data, and provide useful information about the sort 850 of encoding that might be needed for transmission in a given 851 transport system.
The terms "7bit data", "8bit data", and 852 "binary data" are all defined in Section 4. 854 The quoted-printable and base64 encodings transform their 855 input from an arbitrary domain into material in the "7bit" 856 domain, thus making it safe to carry over restricted 857 transports. The specific definitions of the transformations are 858 given below. 860 The proper Content-Transfer-Encoding label must always be 861 used. Labelling unencoded data containing 8-bit characters as 862 "7bit" is not allowed, nor is labelling unencoded non-line- 863 oriented data as anything other than "binary" allowed. 865 Unlike Content-Type subtypes, a proliferation of Content- 866 Transfer-Encoding values is both undesirable and unnecessary. 867 However, establishing only a single transformation into the 868 "7bit" domain does not seem possible. There is a tradeoff 869 between the desire for a compact and efficient encoding of 870 largely-binary data and the desire for a readable encoding of 871 data that is mostly, but not entirely, 7-bit. For this 872 reason, at least two encoding mechanisms are necessary: a 873 "readable" encoding (quoted-printable) and a "dense" encoding 874 (base64). 876 Mail transport for unencoded 8-bit data is defined in RFC 1652 877 [RFC-1652]. As of the publication of this document, there are 878 no standardized Internet mail transports for which it is 879 legitimate to include unencoded binary data in mail bodies. 881 Thus there are no circumstances in which the "binary" 882 Content-Transfer-Encoding is actually valid on the Internet. 883 However, in the event that binary mail transport becomes a 884 reality in Internet mail, or when this document is used in 885 conjunction with any other binary-capable transport mechanism, 886 binary bodies should be labelled as such using this mechanism.
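The two pieces of information carried by the Content-Transfer-Encoding token can be pictured as a lookup table. The sketch below is illustrative only: the names MECHANISMS and describe_encoding are hypothetical, and rejecting unknown tokens with an exception is an assumption of this example, not a requirement of the text.

```python
# Each standard mechanism maps to (transformation, domain of the result).
MECHANISMS = {
    "7bit": ("identity", "7bit"),
    "8bit": ("identity", "8bit"),
    "binary": ("identity", "binary"),
    "quoted-printable": ("quoted-printable", "7bit"),
    "base64": ("base64", "7bit"),
}


def describe_encoding(header_value=None):
    """Return (transformation, domain) for a Content-Transfer-Encoding value.

    A missing header field is treated as "7bit", the stated default.
    Matching is case-insensitive, as the syntax section specifies.
    """
    if header_value is None:
        return MECHANISMS["7bit"]
    key = header_value.strip().lower()
    if key not in MECHANISMS:
        # x-tokens and future tokens are out of scope for this sketch.
        raise ValueError("unrecognized mechanism: " + header_value)
    return MECHANISMS[key]
```

For example, describe_encoding("bAsE64") and describe_encoding("BASE64") both report the base64 transformation with a result in the "7bit" domain.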
888 NOTE: The five values defined for the Content-Transfer- 889 Encoding field imply nothing about the Content-Type other than 890 the algorithm by which it was encoded or the transport system 891 requirements if unencoded. 893 Implementors may, if necessary, define new Content-Transfer- 894 Encoding values, but must use an x-token, which is a name 895 prefixed by "X-", to indicate its non-standard status, e.g., 896 "Content-Transfer-Encoding: x-my-new-encoding". However, 897 unlike Content-Types and subtypes, the creation of new 898 Content-Transfer-Encoding values is STRONGLY discouraged, as 899 it seems likely to hinder interoperability with little 900 potential benefit. Such use is therefore allowed only as the 901 result of an agreement between cooperating user agents. 903 If a Content-Transfer-Encoding header field appears as part of 904 a message header, it applies to the entire body of that 905 message. If a Content-Transfer-Encoding header field appears 906 as part of a body part's headers, it applies only to the body 907 of that body part. If an entity is of type "multipart" the 908 Content-Transfer-Encoding is not permitted to have any value 909 other than "7bit", "8bit" or "binary". Even more severe 910 restrictions apply to some subtypes of the "message" type. 912 It should be noted that email is character-oriented, so that 913 the mechanisms described here are mechanisms for encoding 914 arbitrary octet streams, not bit streams. If a bit stream is 915 to be encoded via one of these mechanisms, it must first be 916 converted to an 8-bit byte stream using the network standard 917 bit order ("big-endian"), in which the earlier bits in a 918 stream become the higher-order bits in an 8-bit byte. A bit 919 stream not ending at an 8-bit boundary must be padded with 920 zeroes. This document provides a mechanism for noting the 921 addition of such padding in the case of the 922 application/octet-stream Content-Type, which has a "padding" 923 parameter.
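The bit-stream conversion described above can be sketched as follows. The function name pack_bits and its return shape are hypothetical, chosen for illustration; the returned padding count is the kind of value that could be recorded in the "padding" parameter.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into octets, earliest bit first.

    Earlier bits become the higher-order bits of each octet. A stream
    that does not end on an 8-bit boundary is padded with zero bits.
    Returns (octets, number_of_padding_bits).
    """
    bits = list(bits)
    padding = (-len(bits)) % 8
    bits += [0] * padding
    octets = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        octets.append(value)
    return bytes(octets), padding
```

For instance, the three-bit stream 1, 0, 1 becomes the single octet 10100000 (hexadecimal A0) with five padding bits.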
925 The encoding mechanisms defined here explicitly encode all 926 data in US-ASCII. Thus, for example, suppose an entity has 927 header fields such as: 929 Content-Type: text/plain; charset=ISO-8859-1 930 Content-transfer-encoding: base64 932 This must be interpreted to mean that the body is a base64 933 US-ASCII encoding of data that was originally in ISO-8859-1, 934 and will be in that character set again after decoding. 936 The following sections will define the two standard encoding 937 mechanisms. The definition of new content-transfer-encodings 938 is explicitly discouraged and should only occur when 939 absolutely necessary. All content-transfer-encoding namespace 940 except that beginning with "X-" is explicitly reserved to the 941 IANA for future use. Private agreements about content- 942 transfer-encodings are also explicitly discouraged. 944 Certain Content-Transfer-Encoding values may only be used on 945 certain Content-Types. In particular, it is EXPRESSLY 946 FORBIDDEN to use any encodings other than "7bit", "8bit", or 947 "binary" with any composite Content-Type, i.e. one that 948 recursively includes other Content-Type fields. Currently the 949 only composite Content-Types are "multipart" and "message". 950 All encodings that are desired for bodies of type multipart or 951 message must be done at the innermost level, by encoding the 952 actual body that needs to be encoded. 954 It should also be noted that, by definition, if a composite 955 entity has a transfer-encoding value such as "7bit", but one 956 of the enclosed parts has a less restrictive value such as 957 "8bit", then either the outer "7bit" labelling is in error, 958 because 8-bit data are included, or the inner "8bit" labelling 959 placed an unnecessarily high demand on the transport system 960 because the actual included data were actually 7-bit-safe. 
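The restriction on composite types, and the labelling consistency point that follows it, can be expressed as two small checks. This is a sketch under stated assumptions: the function names are hypothetical, and treating an inner "quoted-printable" or "base64" part as belonging to the "7bit" domain follows the semantics given earlier in this section.

```python
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}
# Domains ordered from most to least restrictive.
DOMAIN_RANK = {"7bit": 0, "quoted-printable": 0, "base64": 0,
               "8bit": 1, "binary": 2}


def encoding_allowed(content_type: str, encoding: str = "7bit") -> bool:
    """Return False when a composite type carries a non-identity encoding."""
    top_level = content_type.split("/", 1)[0].strip().lower()
    if top_level in COMPOSITE_TYPES:
        return encoding.strip().lower() in IDENTITY_ENCODINGS
    return True


def labels_consistent(outer: str, inner_encodings) -> bool:
    """Return False when an enclosed part claims a less restrictive
    domain than the composite entity that contains it."""
    outer_rank = DOMAIN_RANK[outer.strip().lower()]
    return all(DOMAIN_RANK[e.strip().lower()] <= outer_rank
               for e in inner_encodings)
```

A multipart entity labelled "7bit" that encloses an "8bit" part fails the second check; as noted above, either label may be the one in error.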
962 NOTE ON ENCODING RESTRICTIONS: Though the prohibition against 963 using content-transfer-encodings on composite body data may 964 seem overly restrictive, it is necessary to prevent nested 965 encodings, in which data are passed through an encoding 966 algorithm multiple times, and must be decoded multiple times 967 in order to be properly viewed. Nested encodings add 968 considerable complexity to user agents: Aside from the 969 obvious efficiency problems with such multiple encodings, they 970 can obscure the basic structure of a message. In particular, 971 they can imply that several decoding operations are necessary 972 simply to find out what types of bodies a message contains. 973 Banning nested encodings may complicate the job of certain 974 mail gateways, but this seems less of a problem than the 975 effect of nested encodings on user agents. 977 NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT- 978 TRANSFER-ENCODING: It may seem that the Content-Transfer- 979 Encoding could be inferred from the characteristics of the 980 Content-Type that is to be encoded, or, at the very least, 981 that certain Content-Transfer-Encodings could be mandated for 982 use with specific Content-Types. There are several reasons 983 why this is not the case. First, given the varying types of 984 transports used for mail, some encodings may be appropriate 985 for some Content-Type/transport combinations and not for 986 others. (For example, in an 8-bit transport, no encoding 987 would be required for text in certain character sets, while 988 such encodings are clearly required for 7-bit SMTP.) 990 Second, certain Content-Types may require different types of 991 transfer encoding under different circumstances. For example, 992 many PostScript bodies might consist entirely of short lines 993 of 7-bit data and hence require no encoding at all. 
Other 994 PostScript bodies (especially those using Level 2 PostScript's 995 binary encoding mechanism) may only be reasonably represented 996 using a binary transport encoding. Finally, since Content- 997 Type is intended to be an open-ended specification mechanism, 998 strict specification of an association between Content-Types 999 and encodings effectively couples the specification of an 1000 application protocol with a specific lower-level transport. 1001 This is not desirable since the developers of a Content-Type 1002 should not have to be aware of all the transports in use and 1003 what their limitations are. 1005 NOTE ON TRANSLATING ENCODINGS: The quoted-printable and 1006 base64 encodings are designed so that conversion between them 1007 is possible. The only issue that arises in such a conversion 1008 is the handling of line breaks. When converting from quoted- 1009 printable to base64 a line break must be converted into a CRLF 1010 sequence. Similarly, a CRLF sequence in base64 data must be 1011 converted to a quoted-printable line break, but ONLY when 1012 converting text data. 1014 NOTE ON CANONICAL ENCODING MODEL: There was some confusion, 1015 in earlier drafts of this document, regarding the model for 1016 when email data was to be converted to canonical form and 1017 encoded, and in particular how this process would affect the 1018 treatment of CRLFs, given that the representation of newlines 1019 varies greatly from system to system, and the relationship 1020 between content-transfer-encodings and character sets. A 1021 canonical model for encoding is presented as Appendix F for 1022 this reason. 1024 5.3.3. Quoted-Printable Content-Transfer-Encoding 1026 The Quoted-Printable encoding is intended to represent data 1027 that largely consists of octets that correspond to printable 1028 characters in the US-ASCII character set. It encodes the data 1029 in such a way that the resulting octets are unlikely to be 1030 modified by mail transport. 
If the data being encoded are 1031 mostly US-ASCII text, the encoded form of the data remains 1032 largely recognizable by humans. A body which is entirely US- 1033 ASCII may also be encoded in Quoted-Printable to ensure the 1034 integrity of the data should the message pass through a 1035 character-translating, and/or line-wrapping gateway. 1037 In this encoding, octets are to be represented as determined 1038 by the following rules: 1040 (1) (General 8-bit representation) Any octet, except those 1041 indicating a line break according to the newline 1042 convention of the canonical (standard) form of the data 1043 being encoded, may be represented by an "=" followed by 1044 a two digit hexadecimal representation of the octet's 1045 value. The digits of the hexadecimal alphabet, for 1046 this purpose, are "0123456789ABCDEF". Uppercase 1047 letters must be used when sending hexadecimal data, 1048 though a robust implementation may choose to recognize 1049 lowercase letters on receipt. Thus, for example, the 1050 decimal value 12 (US-ASCII form feed) can be 1051 represented by "=0C", and the decimal value 61 (US- 1052 ASCII EQUAL SIGN) can be represented by "=3D". This 1053 rule must be followed except when the following rules 1054 allow an alternative encoding. 1056 (2) (Literal representation) Octets with decimal values of 1057 33 through 60 inclusive, and 62 through 126, inclusive, 1058 MAY be represented as the US-ASCII characters which 1059 correspond to those octets (EXCLAMATION POINT through 1060 LESS THAN, and GREATER THAN through TILDE, 1061 respectively). 1063 (3) (White Space) Octets with values of 9 and 32 MAY be 1064 represented as US-ASCII TAB (HT) and SPACE characters, 1065 respectively, but MUST NOT be so represented at the end 1066 of an encoded line. Any TAB (HT) or SPACE characters on 1067 an encoded line MUST thus be followed on that line by a 1068 printable character. 
In particular, an "=" at the end 1069 of an encoded line, indicating a soft line break (see 1070 rule #5) may follow one or more TAB (HT) or SPACE 1071 characters. It follows that an octet with decimal 1072 value 9 or 32 appearing at the end of an encoded line 1073 must be represented according to Rule #1. This rule is 1074 necessary because some MTAs (Message Transport Agents, 1075 programs which transport messages from one user to 1076 another, or perform a part of such transfers) are known 1077 to pad lines of text with SPACEs, and others are known 1078 to remove "white space" characters from the end of a 1079 line. Therefore, when decoding a Quoted-Printable body, 1080 any trailing white space on a line must be deleted, as 1081 it will necessarily have been added by intermediate 1082 transport agents. 1084 (4) (Line Breaks) A line break in a text body, represented 1085 as a CRLF sequence in the text canonical form, must be 1086 represented by a (RFC 822) line break, which is also a 1087 CRLF sequence, in the Quoted-Printable encoding. Since 1088 the canonical representations of types other than text 1089 do not generally include the representation of line 1090 breaks as CRLF sequences, no hard line breaks (i.e. 1091 line breaks that are intended to be meaningful and to 1092 be displayed to the user) should occur in the quoted- 1093 printable encoding of such types. Sequences like "=0D", 1094 "=0A", "=0A=0D" and "=0D=0A" will routinely appear in 1095 non-text data represented in quoted-printable, of 1096 course. 1098 Note that many implementations may elect to encode the 1099 local representation of various content types directly, 1100 as described in Appendix F. In particular, this may 1101 apply to plain text material on systems that use 1102 newline conventions other than CRLF delimiters.
Such 1103 an implementation is permissible, but the generation of 1104 line breaks must be generalized to account for the case 1105 where alternate representations of newline sequences 1106 are used. 1108 (5) (Soft Line Breaks) The Quoted-Printable encoding 1109 REQUIRES that encoded lines be no more than 76 1110 characters long. If longer lines are to be encoded 1111 with the Quoted-Printable encoding, "soft" line breaks 1112 must be used. An equal sign as the last character on an 1113 encoded line indicates such a non-significant ("soft") 1114 line break in the encoded text. 1116 Thus if the "raw" form of the line is a single unencoded line 1117 that says: 1119 Now's the time for all folk to come to the aid of their country. 1121 This can be represented, in the Quoted-Printable encoding, as: 1123 Now's the time = 1124 for all folk to come= 1125 to the aid of their country. 1127 This provides a mechanism with which long lines are encoded in 1128 such a way as to be restored by the user agent. The 76 1129 character limit does not count the trailing CRLF, but counts 1130 all other characters, including any equal signs. 1132 Since the hyphen character ("-") is represented as itself in 1133 the Quoted-Printable encoding, care must be taken, when 1134 encapsulating a quoted-printable encoded body in a multipart 1135 entity, to ensure that the encapsulation boundary does not 1136 appear anywhere in the encoded body. (A good strategy is to 1137 choose a boundary that includes a character sequence such as 1138 "=_" which can never appear in a quoted-printable body. See 1139 the definition of multipart messages later in this document.) 1141 NOTE: The quoted-printable encoding represents something of a 1142 compromise between readability and reliability in transport. 1143 Bodies encoded with the quoted-printable encoding will work 1144 reliably over most mail gateways, but may not work perfectly 1145 over a few gateways, notably those involving translation into 1146 EBCDIC.
A higher level of confidence is offered by the base64 1147 Content-Transfer-Encoding. A way to get reasonably reliable 1148 transport through EBCDIC gateways is to also quote the US- 1149 ASCII characters 1151 !"#$@[\]^`{|}~ 1153 according to rule #1. See Appendix B for more information. 1155 Because quoted-printable data is generally assumed to be 1156 line-oriented, it is to be expected that the representation of 1157 the breaks between the lines of quoted printable data may be 1158 altered in transport, in the same manner that plain text mail 1159 has always been altered in Internet mail when passing between 1160 systems with differing newline conventions. If such 1161 alterations are likely to constitute a corruption of the data, 1162 it is probably more sensible to use the base64 encoding rather 1163 than the quoted-printable encoding. 1165 WARNING TO IMPLEMENTORS: If binary data are encoded in 1166 quoted-printable, care must be taken to encode CR and LF 1167 characters as "=0D" and "=0A", respectively. In particular, a 1168 CRLF sequence in binary data should be encoded as "=0D=0A". 1169 Otherwise, if CRLF were represented as a hard line break, it 1170 might be incorrectly decoded on platforms with different line 1171 break conventions. 1173 For formalists, the syntax of quoted-printable data is 1174 described by the following grammar: 1176 quoted-printable := ([*(ptext / SPACE / TAB) ptext] 1177 ["="] CRLF) 1178 ; Maximum line length of 76 characters 1179 ; excluding CRLF 1181 ptext := octet / safe-char 1183 safe-char := 127, =, 1190 ; SPACE, or TAB, and is recommended for any 1191 ; characters not listed in Appendix B as 1192 ; "mail-safe". 1194 IMPORTANT NOTE: The addition of LWSP between the elements 1195 shown in this BNF is NOT allowed since this BNF does not 1196 specify a structured header field. 1198 5.3.4. 
Base64 Content-Transfer-Encoding 1200 The Base64 Content-Transfer-Encoding is designed to represent 1201 arbitrary sequences of octets in a form that need not be 1202 humanly readable. The encoding and decoding algorithms are 1203 simple, but the encoded data are consistently only about 33 1204 percent larger than the unencoded data. This encoding is 1205 virtually identical to the one used in Privacy Enhanced Mail 1206 (PEM) applications, as defined in RFC 1421 [RFC-1421]. 1208 A 65-character subset of US-ASCII is used, enabling 6 bits to 1209 be represented per printable character. (The extra 65th 1210 character, "=", is used to signify a special processing 1211 function.) 1213 NOTE: This subset has the important property that it is 1214 represented identically in all versions of ISO 646, including 1215 US-ASCII, and all characters in the subset are also 1216 represented identically in all versions of EBCDIC. Other 1217 popular encodings, such as the encoding used by the uuencode 1218 utility and the base85 encoding specified as part of Level 2 1219 PostScript, do not share these properties, and thus do not 1220 fulfill the portability requirements a binary transport 1221 encoding for mail must meet. 1223 The encoding process represents 24-bit groups of input bits as 1224 output strings of 4 encoded characters. Proceeding from left 1225 to right, a 24-bit input group is formed by concatenating 3 1226 8-bit input groups. These 24 bits are then treated as 4 1227 concatenated 6-bit groups, each of which is translated into a 1228 single digit in the base64 alphabet. When encoding a bit 1229 stream via the base64 encoding, the bit stream must be 1230 presumed to be ordered with the most-significant-bit first. 1231 That is, the first bit in the stream will be the high-order 1232 bit in the first 8-bit byte, and the eighth bit will be the 1233 low-order bit in the first 8-bit byte, and so on. 
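The grouping step just described can be sketched for a single full 24-bit input group. The names BASE64_ALPHABET and encode_quantum are hypothetical; the alphabet string is written out in the order given by Table 1, and handling of a short final group is deliberately left out of this example.

```python
# The 64 encoding characters in index order (values 0 through 63).
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_quantum(three_octets: bytes) -> str:
    """Encode exactly three octets as four base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("a full quantum is exactly 3 octets")
    # Concatenate three 8-bit groups, first octet in the high-order bits.
    group = (three_octets[0] << 16) | (three_octets[1] << 8) | three_octets[2]
    # Treat the 24 bits as four 6-bit indices, left to right.
    return "".join(BASE64_ALPHABET[(group >> shift) & 0x3F]
                   for shift in (18, 12, 6, 0))
```

As a usage example, encode_quantum(b"Man") yields "TWFu".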

Each 6-bit group is used as an index into an array of 64
printable characters. The character referenced by the index
is placed in the output string. These characters, identified
in Table 1, below, are selected so as to be universally
representable, and the set excludes characters with particular
significance to SMTP (e.g., ".", CR, LF) and to the
encapsulation boundaries defined in this document (e.g., "-").

                 Table 1: The Base64 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A            17 R            34 i            51 z
       1 B            18 S            35 j            52 0
       2 C            19 T            36 k            53 1
       3 D            20 U            37 l            54 2
       4 E            21 V            38 m            55 3
       5 F            22 W            39 n            56 4
       6 G            23 X            40 o            57 5
       7 H            24 Y            41 p            58 6
       8 I            25 Z            42 q            59 7
       9 J            26 a            43 r            60 8
      10 K            27 b            44 s            61 9
      11 L            28 c            45 t            62 +
      12 M            29 d            46 u            63 /
      13 N            30 e            47 v
      14 O            31 f            48 w         (pad) =
      15 P            32 g            49 x
      16 Q            33 h            50 y

The encoded output stream must be represented in lines of no
more than 76 characters each. All line breaks or other
characters not found in Table 1 must be ignored by decoding
software. In base64 data, characters other than those in
Table 1, line breaks, and other white space probably indicate
a transmission error, about which a warning message or even a
message rejection might be appropriate under some
circumstances.

Special processing is performed if fewer than 24 bits are
available at the end of the data being encoded. A full
encoding quantum is always completed at the end of a body.
When fewer than 24 input bits are available in an input group,
zero bits are added (on the right) to form an integral number
of 6-bit groups. Padding at the end of the data is performed
using the "=" character.
Since all base64 input is an
integral number of octets, only the following cases can arise:
(1) the final quantum of encoding input is an integral
multiple of 24 bits; here, the final unit of encoded output
will be an integral multiple of 4 characters with no "="
padding, (2) the final quantum of encoding input is exactly 8
bits; here, the final unit of encoded output will be two
characters followed by two "=" padding characters, or (3) the
final quantum of encoding input is exactly 16 bits; here, the
final unit of encoded output will be three characters followed
by one "=" padding character.

Because it is used only for padding at the end of the data,
the occurrence of any "=" characters may be taken as evidence
that the end of the data has been reached (without truncation
in transit). No such assurance is possible, however, when the
number of octets transmitted was a multiple of three.

Any characters outside of the base64 alphabet are to be
ignored in base64-encoded data. The same applies to any
invalid sequence of characters in the base64 encoding, such as
"=====".

Care must be taken to use the proper octets for line breaks if
base64 encoding is applied directly to text material that has
not been converted to canonical form. In particular, text
line breaks must be converted into CRLF sequences prior to
base64 encoding. The important thing to note is that this may
be done directly by the encoder rather than in a prior
canonicalization step in some implementations.

NOTE: There is no need to worry about quoting apparent
encapsulation boundaries within base64-encoded parts of
multipart entities because no hyphen characters are used in
the base64 encoding.

5.4.  Content-ID Header Field

In constructing a high-level user agent, it may be desirable
to allow one body to make reference to another.
Accordingly,
bodies may be labelled using the "Content-ID" header field,
which is syntactically identical to the "Message-ID" header
field:

     id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be
generated to be world-unique.

The Content-ID value may be used for uniquely identifying MIME
entities in several contexts, particularly for caching data
referenced by the message/external-body mechanism. Although
the Content-ID header is generally optional, its use is
MANDATORY in implementations which generate data of the
optional MIME Content-type "message/external-body". That is,
each message/external-body entity must have a Content-ID field
to permit caching of such data.

It is also worth noting that the Content-ID value has special
semantics in the case of the multipart/alternative
content-type. This is explained in the section of this
document dealing with multipart/alternative.

5.5.  Content-Description Header Field

The ability to associate some descriptive information with a
given body is often desirable. For example, it may be useful
to mark an "image" body as "a picture of the Space Shuttle
Endeavour." Such text may be placed in the Content-Description
header field. This header field is always optional.

     description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII
character set, although the mechanism specified in RFC
MIME-HEADERS [RFC-MIME-HEADERS] may be used for non-US-ASCII
Content-Description values.

5.6.  Additional MIME Header Fields

Future documents may elect to define additional MIME header
fields for various purposes.
Any new header field that
further describes the content of a message should begin with
the string "Content-" to allow such fields which appear in a
message header to be distinguished from ordinary RFC 822
message header fields.

     MIME-extension-field :=

6.  Predefined Content-Type Values

This document defines seven initial Content-Type values and an
extension mechanism for private or experimental types.
Further standard types must be defined by new published
specifications. It is expected that most innovation in new
types of mail will take place as subtypes of the seven types
defined here. The most essential characteristics of the seven
content-types are summarized in Appendix E.

6.1.  Discrete Content-Type Values

Five of the seven initial Content-Type values refer to
discrete bodies. The content of such entities is handled by
non-MIME mechanisms; they are opaque to MIME processors.

6.1.1.  Text Content-Type

The text Content-Type is intended for sending material which
is principally textual in form. A "charset" parameter may be
used to indicate the character set of the body text for some
text subtypes, notably including the subtype "text/plain",
which indicates plain (unformatted) text. The default
Content-Type for Internet mail if none is specified is
"text/plain; charset=us-ascii".

Beyond plain text, there are many formats for representing
what might be known as "extended text" -- text with embedded
formatting and presentation information. An interesting
characteristic of many such representations is that they are
to some extent readable even without the software that
interprets them. It is useful, then, to distinguish them, at
the highest level, from such unreadable data as images, audio,
or text represented in an unreadable form.
In the absence of
appropriate interpretation software, it is reasonable to show
subtypes of text to the user, while it is not reasonable to do
so with most nontextual data.

Such formatted textual data should be represented using
subtypes of text. Plausible subtypes of text are typically
given by the common name of the representation format, e.g.,
"text/enriched" [RFC-1563].

6.1.1.1.  Representation of Line Breaks

The canonical form of any MIME text type MUST represent a line
break as a CRLF sequence. Similarly, any occurrence of CRLF
in text MUST represent a line break. Use of CR and LF outside
of line break sequences is also forbidden.

This rule applies regardless of format or character set or
sets involved.

6.1.1.2.  Charset Parameter

A critical parameter that may be specified in the Content-Type
field for text/plain data is the character set. This is
specified with a "charset" parameter, as in:

     Content-type: text/plain; charset=iso-8859-1

Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set,
which must be assumed in the absence of a charset parameter,
is US-ASCII.

The specification for any future subtypes of "text" must
specify whether or not they will also utilize a "charset"
parameter, and may possibly restrict its values as well. When
used with a particular body, the semantics of the "charset"
parameter should be identical to those specified here for
"text/plain", i.e., the body consists entirely of characters
in the given charset. In particular, definers of future text
subtypes should pay close attention to the implications of
multioctet character sets for their subtype definitions.
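
As an illustration of the charset rules in this section
(case-insensitive values, US-ASCII assumed when no charset
parameter is present), the following Python sketch uses the
standard library "email" package to read the parameter. The
function name effective_charset is hypothetical, and restricting
the sketch to text entities is a simplifying assumption.

```python
from email import message_from_string


def effective_charset(raw_entity):
    """Return the lowercased charset of a text entity, defaulting to us-ascii.

    Hypothetical helper for illustration; not defined by this document.
    """
    msg = message_from_string(raw_entity)
    # An entity with no Content-Type header is treated as text/plain.
    if msg.get_content_maintype() != "text":
        raise ValueError("this sketch only handles text entities")
    # get_content_charset() lowercases the value, which is harmless because
    # charset values are not case sensitive; the failobj supplies the default.
    return msg.get_content_charset("us-ascii")
```

A header of "Content-type: text/plain; charset=ISO-8859-1" would
give "iso-8859-1", and an entity with no Content-Type header would
give "us-ascii".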

This RFC specifies the definition of the charset parameter for
the purposes of MIME to be the name of a character set, as
"character set" is defined in Section 4 of this document. The
rules regarding line breaks detailed in the previous section
must also be observed -- a character set whose definition does
not conform to these rules cannot be used in a MIME text type.

An initial list of predefined character set names can be found
at the end of this section. Additional character sets may be
registered with IANA as described in RFC REG.

Note that if the specified character set includes 8-bit data,
a Content-Transfer-Encoding header field and a corresponding
encoding on the data are required in order to transmit the
body via some mail transfer protocols, such as SMTP.

The default character set, US-ASCII, has been the subject of
some confusion and ambiguity in the past. Not only were there
some ambiguities in the definition, there have been wide
variations in practice. In order to eliminate such ambiguity
and variations in the future, it is strongly recommended that
new user agents explicitly specify a character set via the
Content-Type header field. "US-ASCII" does not indicate an
arbitrary 7-bit character code, but specifies that the body
uses character coding that uses the exact correspondence of
octets to characters specified in US-ASCII. National use
variations of ISO 646 [ISO-646] are NOT US-ASCII and their use
in Internet mail is explicitly discouraged. The omission of
the ISO 646 character set is deliberate in this regard. The
character set name of "US-ASCII" explicitly refers to ANSI
X3.4-1986 [US-ASCII] only. The character set name "ASCII" is
reserved and must not be used for any purpose.

NOTE: RFC 821 explicitly specifies "ASCII", and references an
earlier version of the American Standard.
Insofar as one of
the purposes of specifying a Content-Type and character set is
to permit the receiver to unambiguously determine how the
sender intended the coded message to be interpreted, assuming
anything other than "strict ASCII" as the default would risk
unintentional and incompatible changes to the semantics of
messages now being transmitted. This also implies that
messages containing characters coded according to national
variations on ISO 646, or using code-switching procedures
(e.g., those of ISO 2022), as well as 8-bit or multiple octet
character encodings MUST use an appropriate character set
specification to be consistent with this specification.

The complete US-ASCII character set is listed in ANSI
X3.4-1986. Note that the control characters including DEL (0-31,
127) have no defined meaning apart from the combination CRLF
(US-ASCII values 13 and 10) indicating a new line. Two of the
characters have de facto meanings in wide use: FF (12) often
means "start subsequent text on the beginning of a new page";
and TAB or HT (9) often (though not always) means "move the
cursor to the next available column after the current position
where the column number is a multiple of 8 (counting the first
column as column 0)." Apart from this, any use of the control
characters or DEL in a body must be part of a private
agreement between the sender and recipient. Such private
agreements are discouraged and should be replaced by the other
capabilities of this document.

NOTE: Beyond US-ASCII, an enormous proliferation of character
sets is possible. It is the opinion of the IETF working group
that a large number of character sets is NOT a good thing. We
would prefer to specify a SINGLE character set that can be
used universally for representing all of the world's languages
in electronic mail.
Unfortunately, existing practice in
several communities seems to point to the continued use of
multiple character sets in the near future. For this reason,
we define names for a small number of character sets for which
a strong constituent base exists.

The defined charset values are:

(1)  US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].

(2)  ISO-8859-X -- where "X" is to be replaced, as
     necessary, for the parts of ISO-8859 [ISO-8859]. Note
     that the ISO 646 character sets have deliberately been
     omitted in favor of their 8859 replacements, which are
     the designated character sets for Internet mail. As of
     the publication of this document, the legitimate values
     for "X" are the digits 1 through 9.

All of these character sets are used as pure 7- or 8-bit sets
without any shift or escape functions. The meaning of shift
and escape sequences in these character sets is not defined.

The character sets specified above are the ones that were
relatively uncontroversial during the drafting of MIME. This
document does not endorse the use of any particular character
set other than US-ASCII, and recognizes that the future
evolution of world character sets remains unclear. It is
expected that in the future, additional character sets will be
registered for use in MIME.

Note that the character set used, if anything other than
US-ASCII, must always be explicitly specified in the Content-Type
field.

No other character set name may be used in Internet mail
without the publication of a formal specification and its
registration with IANA, or by private agreement, in which case
the character set name must begin with "X-".

Implementors are discouraged from defining new character sets
for mail use unless absolutely necessary.
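
The naming rules above can be sketched as a small Python check.
This is a hedged illustration, not a normative procedure: the
function name classify_charset and its return labels are
hypothetical, matching the "X-" prefix without regard to case is
an assumption, and the sketch has no knowledge of names
registered with IANA after this document.

```python
import re

# ISO-8859-X, where X is a single digit 1 through 9 (per this section).
_ISO_8859 = re.compile(r"iso-8859-[1-9]\Z")


def classify_charset(name):
    """Classify a charset name; labels are hypothetical, for illustration."""
    lowered = name.lower()  # charset values are not case sensitive
    if lowered == "us-ascii" or _ISO_8859.match(lowered):
        return "defined"
    if lowered == "ascii":
        # "ASCII" is reserved and must not be used for any purpose.
        return "reserved"
    if lowered.startswith("x-"):
        # Private-agreement names must begin with "X-"
        # (case-insensitive matching is an assumption of this sketch).
        return "private"
    # Anything else would need a formal specification and IANA registration.
    return "unregistered"
```

For instance, "ISO-8859-1" is classified as "defined", while
"ISO-8859-10" falls outside the digits 1 through 9 and is
reported as "unregistered" by this sketch.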

The "charset" parameter has been defined primarily for the
purpose of textual data, and is described in this section for
that reason. However, it is conceivable that non-textual data
might also wish to specify a charset value for some purpose,
in which case the same syntax and values should be used.

In general, mail-sending software should always use the
"lowest common denominator" character set possible. For
example, if a body contains only US-ASCII characters, it
should be marked as being in the US-ASCII character set, not
ISO-8859-1, which, like all the ISO-8859 family of character
sets, is a superset of US-ASCII. More generally, if a
widely-used character set is a subset of another character
set, and a body contains only characters in the widely-used
subset, it should be labelled as being in that subset. This
will increase the chances that the recipient will be able to
view the mail correctly.

6.1.1.3.  Plain Subtype

The simplest and most important subtype of text is "plain".
This indicates plain (unformatted) text. The default
Content-Type for Internet mail, "text/plain;
charset=us-ascii", describes existing Internet practice. That
is, it is the type of body defined by RFC 822.

No other text subtype is defined by this document.

6.1.1.4.  Unrecognized Subtypes

Unrecognized subtypes of text should be treated as subtype
"plain" as long as the MIME implementation knows how to handle
the charset. Unrecognized subtypes which also specify an
unrecognized charset should be treated as
"application/octet-stream".

6.1.2.  Image Content-Type

A Content-Type of "image" indicates that the body contains an
image. The subtype names the specific image format. These
names are not case sensitive. Two initial subtypes are "jpeg"
for the JPEG format, JFIF encoding, and "gif" for GIF format
[GIF].

The list of image subtypes given here is neither exclusive nor
exhaustive, and is expected to grow as more types are
registered with IANA, as described in RFC REG.

Unrecognized subtypes of image should at a minimum be treated
as "application/octet-stream". Implementations may optionally
elect to pass subtypes of image that they do not specifically
recognize to a robust general-purpose image viewing
application, if such an application is available.

6.1.3.  Audio Content-Type

A Content-Type of "audio" indicates that the body contains
audio data. Although there is not yet a consensus on an
"ideal" audio format for use with computers, there is a
pressing need for a format capable of providing interoperable
behavior.

The initial subtype of "basic" is specified to meet this
requirement by providing an absolutely minimal lowest common
denominator audio format. It is expected that richer formats
for higher quality and/or lower bandwidth audio will be
defined by a later document.

The content of the "audio/basic" subtype is single channel
audio encoded using 8-bit ISDN mu-law [PCM] at a sample rate
of 8000 Hz.

Unrecognized subtypes of audio should at a minimum be treated
as "application/octet-stream". Implementations may optionally
elect to pass subtypes of audio that they do not specifically
recognize to a robust general-purpose audio playing
application, if such an application is available.

6.1.4.  Video Content-Type

A Content-Type of "video" indicates that the body contains a
time-varying-picture image, possibly with color and
coordinated sound. The term "video" is used extremely
generically, rather than with reference to any particular
technology or format, and is not meant to preclude subtypes
such as animated drawings encoded compactly.
The subtype
"mpeg" refers to video coded according to the MPEG standard
[MPEG].

Note that although in general this document strongly
discourages the mixing of multiple media in a single body, it
is recognized that many so-called "video" formats include a
representation for synchronized audio, and this is explicitly
permitted for subtypes of "video".

Unrecognized subtypes of video should at a minimum be treated
as "application/octet-stream". Implementations may optionally
elect to pass subtypes of video that they do not specifically
recognize to a robust general-purpose video display
application, if such an application is available.

6.1.5.  Application Content-Type

The "application" Content-Type is to be used for discrete data
which do not fit in any of the other categories, and
particularly for data to be processed by mail-based uses of
application programs. This is information which must be
processed by an application before it is viewable or usable to
a user. Expected uses for Content-Type application include
mail-based file transfer, spreadsheets, data for mail-based
scheduling systems, and languages for "active" (computational)
email. (The latter, in particular, can pose security problems
which must be understood by implementors, and are considered
in detail in the discussion of the application/PostScript
content-type.)

For example, a meeting scheduler might define a standard
representation for information about proposed meeting dates.
An intelligent user agent would use this information to
conduct a dialog with the user, and might then send further
mail based on that dialog. More generally, there have been
several "active" messaging languages developed in which
programs in a suitably specialized language are sent through
the mail and automatically run in the recipient's environment.

Such applications may be defined as subtypes of the
"application" Content-Type. This document defines two
subtypes: octet-stream, and PostScript.

The subtype of application will often be the name of the
application for which the data are intended. This does not
mean, however, that any application program name may be used
freely as a subtype of application. Usage of any subtype
(other than subtypes beginning with "x-") must be registered
with IANA, as described in RFC REG.

6.1.5.1.  Octet-Stream Subtype

The "octet-stream" subtype is used to indicate that a body
contains arbitrary binary data. The set of currently defined
parameters is:

(1)  TYPE -- the general type or category of binary data.
     This is intended as information for the human recipient
     rather than for any automatic processing.

(2)  PADDING -- the number of bits of padding that were
     appended to the bit-stream comprising the actual
     contents to produce the enclosed 8-bit byte-oriented
     data. This is useful for enclosing a bit-stream in a
     body when the total number of bits is not a multiple of
     8.

Both of these parameters are optional.

An additional parameter, "CONVERSIONS", was defined in RFC
1341 but has since been removed. RFC 1341 also defined the
use of a "NAME" parameter which gave a suggested file name to
be used if the data were to be written to a file. This has
been deprecated in anticipation of a separate
Content-Disposition header field, to be defined in a subsequent RFC.

The recommended action for an implementation that receives
application/octet-stream mail is to simply offer to put the
data in a file, with any Content-Transfer-Encoding undone, or
perhaps to use it as input to a user-specified process.
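
To make the PADDING parameter described above concrete, the
following Python sketch packs a bit-stream into octets and reports
how many padding bits were appended. It is an illustration under
stated assumptions: the function name pack_bits is hypothetical,
the bit-stream is represented as a string of "0" and "1"
characters, and the padding bits are assumed to be zero, which this
document does not specify.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into octets.

    Returns (octets, padding), where padding is the number of bits
    appended to reach a multiple of 8. Hypothetical helper; the choice
    of zero-valued padding bits is an assumption of this sketch.
    """
    if any(ch not in "01" for ch in bits):
        raise ValueError("bits must contain only '0' and '1'")
    # Number of bits needed to reach the next multiple of 8 (0 if aligned).
    padding = (-len(bits)) % 8
    padded = bits + "0" * padding
    # Convert each run of 8 bits, most significant bit first, to one octet.
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return octets, padding
```

A five-bit stream such as "10101" would be sent as a single octet
with three padding bits, which a sender could then report in a
field such as "Content-Type: application/octet-stream; padding=3".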

To reduce the danger of transmitting rogue programs through
the mail, it is strongly recommended that implementations NOT
implement a path-search mechanism whereby an arbitrary program
named in the Content-Type parameter (e.g., an "interpreter="
parameter) is found and executed using the mail body as input.

6.1.5.2.  PostScript Subtype

A Content-Type of "application/postscript" indicates a
PostScript program. Currently two variants of the PostScript
language are allowed; the original level 1 variant is
described in [POSTSCRIPT] and the more recent level 2 variant
is described in [POSTSCRIPT2].

PostScript is a registered trademark of Adobe Systems, Inc.
Use of the MIME content-type "application/postscript" implies
recognition of that trademark and all the rights it entails.

The PostScript language definition provides facilities for
internal labelling of the specific language features a given
program uses. This labelling, called the PostScript document
structuring conventions, or DSC, is very general and provides
substantially more information than just the language level.
The use of document structuring conventions, while not
required, is strongly recommended as an aid to
interoperability. Documents which lack proper structuring
conventions cannot be tested to see whether or not they will
work in a given environment. As such, some systems may assume
the worst and refuse to process unstructured documents.

The execution of general-purpose PostScript interpreters
entails serious security risks, and implementors are
discouraged from simply sending PostScript email bodies to
"off-the-shelf" interpreters.
While it is usually safe to
send PostScript to a printer, where the potential for harm is
greatly constrained by typical printer environments,
implementors should consider all of the following before they
add interactive display of PostScript bodies to their mail
readers.

The remainder of this section outlines some, though probably
not all, of the possible problems with sending PostScript
through the mail.

(1)  Dangerous operations in the PostScript language
     include, but may not be limited to, the PostScript
     operators "deletefile", "renamefile", "filenameforall",
     and "file". "File" is only dangerous when applied to
     something other than standard input or output.
     Implementations may also define additional nonstandard
     file operators; these may also pose a threat to
     security. "Filenameforall", the wildcard file search
     operator, may appear at first glance to be harmless.
     Note, however, that this operator has the potential to
     reveal information about what files the recipient has
     access to, and this information may itself be
     sensitive. Message senders should avoid the use of
     potentially dangerous file operators, since these
     operators are quite likely to be unavailable in secure
     PostScript implementations. Message receiving and
     displaying software should either completely disable
     all potentially dangerous file operators or take
     special care not to delegate any special authority to
     their operation. These operators should be viewed as
     being done by an outside agency when interpreting
     PostScript documents. Such disabling and/or checking
     should be done completely outside of the reach of the
     PostScript language itself; care should be taken to
     ensure that no method exists for re-enabling
     full-function versions of these operators.

(2)  The PostScript language provides facilities for exiting
     the normal interpreter, or server, loop. Changes made
     in this "outer" environment are customarily retained
     across documents, and may in some cases be retained
     semipermanently in nonvolatile memory. The operators
     associated with exiting the interpreter loop have the
     potential to interfere with subsequent document
     processing. As such, their unrestrained use constitutes
     a threat of service denial. PostScript operators that
     exit the interpreter loop include, but may not be
     limited to, the exitserver and startjob operators.
     Message sending software should not generate PostScript
     that depends on exiting the interpreter loop to
     operate, since the ability to exit will probably be
     unavailable in secure PostScript implementations.
     Message receiving and displaying software should
     completely disable the ability to make retained changes
     to the PostScript environment by eliminating or
     disabling the "startjob" and "exitserver" operations.
     If these operations cannot be eliminated or completely
     disabled, the password associated with them should at
     least be set to a hard-to-guess value.

(3)  PostScript provides operators for setting system-wide
     and device-specific parameters. These parameter
     settings may be retained across jobs and may
     potentially pose a threat to the correct operation of
     the interpreter. The PostScript operators that set
     system and device parameters include, but may not be
     limited to, the "setsystemparams" and "setdevparams"
     operators. Message sending software should not
     generate PostScript that depends on the setting of
     system or device parameters to operate correctly. The
     ability to set these parameters will probably be
     unavailable in secure PostScript implementations.
     Message receiving and displaying software should
     disable the ability to change system and device
     parameters. If these operators cannot be completely
     disabled, the password associated with them should at
     least be set to a hard-to-guess value.

(4)  Some PostScript implementations provide nonstandard
     facilities for the direct loading and execution of
     machine code. Such facilities are quite obviously open
     to substantial abuse. Message sending software should
     not make use of such features. Besides being totally
     hardware-specific, they are also likely to be
     unavailable in secure implementations of PostScript.
     Message receiving and displaying software should not
     allow such operators to be used if they exist.

(5)  PostScript is an extensible language, and many, if not
     most, implementations of it provide a number of their
     own extensions. This document does not deal with such
     extensions explicitly since they constitute an unknown
     factor. Message sending software should not make use
     of nonstandard extensions; they are likely to be
     missing from some implementations. Message receiving
     and displaying software should make sure that any
     nonstandard PostScript operators are secure and don't
     present any kind of threat.

(6)  It is possible to write PostScript that consumes huge
     amounts of various system resources. It is also
     possible to write PostScript programs that loop
     indefinitely. Both types of programs have the
     potential to cause damage if sent to unsuspecting
     recipients. Message-sending software should avoid the
     construction and dissemination of such programs, which
     is antisocial. Message receiving and displaying
     software should provide appropriate mechanisms to abort
     processing of a document after a reasonable amount of
     time has elapsed.
     In addition, PostScript interpreters
     should be limited to the consumption of only a
     reasonable amount of any given system resource.

(7)  It is possible to include raw binary information inside
     PostScript in various forms. This is not recommended
     for use in email, both because it is not supported by
     all PostScript interpreters and because it
     significantly complicates the use of a MIME
     Content-Transfer-Encoding. (Without such binary, PostScript
     may typically be viewed as line-oriented data. The
     treatment of CRLF sequences becomes extremely
     problematic if binary and line-oriented data are mixed
     in a single PostScript data stream.)

(8)  Finally, bugs may exist in some PostScript interpreters
     which could possibly be exploited to gain unauthorized
     access to a recipient's system. Beyond noting this
     possibility, there is no specific action to take to
     prevent this, apart from the timely correction of such
     bugs if any are found.

6.1.5.3.  Other Application Subtypes

It is expected that many other subtypes of application will be
defined in the future. MIME implementations must at a minimum
treat any unrecognized subtypes as being equivalent to
"application/octet-stream".

6.2.  Composite Content-Type Values

The remaining two of the seven initial Content-Type values
refer to composite entities. Composite entities are handled
using MIME mechanisms -- a MIME processor typically handles
the body directly.

6.2.1.  Multipart Content-Type

In the case of multiple part entities, in which one or more
different sets of data are combined in a single body, a
"multipart" Content-Type field must appear in the entity's
header. The body must then contain one or more "body parts,"
each preceded by an encapsulation boundary, and the last one
followed by a closing boundary.
Each part starts with an 1909 encapsulation boundary, and then contains a body part 1910 consisting of a header area, a blank line, and a body area. 1911 Thus a body part is similar to an RFC 822 message in syntax, 1912 but different in meaning. 1914 A body part is NOT to be interpreted as actually being an RFC 1915 822 message. To begin with, NO header fields are actually 1916 required in body parts. A body part that starts with a blank 1917 line, therefore, is allowed and is a body part for which all 1918 default values are to be assumed. In such a case, the 1919 absence of a Content-Type header indicates that the 1920 corresponding body has a content-type of "text/plain; 1921 charset=US-ASCII". 1923 The only header fields that have defined meaning for body 1924 parts are those the names of which begin with "Content-". All 1925 other header fields are generally to be ignored in body parts. 1926 Although they should generally be retained in mail processing, 1927 they may be discarded by gateways if necessary. Such other 1928 fields are permitted to appear in body parts but must not be 1929 depended on. "X-" fields may be created for experimental or 1930 private purposes, with the recognition that the information 1931 they contain may be lost at some gateways. 1933 NOTE: The distinction between an RFC 822 message and a body 1934 part is subtle, but important. A gateway between Internet and 1935 X.400 mail, for example, must be able to tell the difference 1936 between a body part that contains an image and a body part 1937 that contains an encapsulated message, the body of which is a 1938 GIF image. In order to represent the latter, the body part 1939 must have "Content-Type: message/rfc822", and its body (after 1940 the blank line) must be the encapsulated message, with its own 1941 "Content-Type: image/gif" header field.
The use of similar 1942 syntax facilitates the conversion of messages to body parts, 1943 and vice versa, but the distinction between the two must be 1944 understood by implementors. (For the special case in which 1945 all parts actually are messages, a "digest" subtype is also 1946 defined.) 1948 As stated previously, each body part is preceded by an 1949 encapsulation boundary. The encapsulation boundary MUST NOT 1950 appear inside any of the encapsulated parts. Thus, it is 1951 crucial that the composing agent be able to choose and specify 1952 a unique boundary that will separate the parts. 1954 All present and future subtypes of the "multipart" type must 1955 use an identical syntax. Subtypes may differ in their 1956 semantics, and may impose additional restrictions on syntax, 1957 but must conform to the required syntax for the multipart 1958 type. This requirement ensures that all conformant user 1959 agents will at least be able to recognize and separate the 1960 parts of any multipart entity, even of an unrecognized 1961 subtype. 1963 As stated in the definition of the Content-Transfer-Encoding 1964 field, no encoding other than "7bit", "8bit", or "binary" is 1965 permitted for entities of type "multipart". The multipart 1966 delimiters and header fields are always represented as 7-bit 1967 US-ASCII in any case (though the header fields may encode 1968 non-US-ASCII header text as per RFC MIME-HEADERS), and data 1969 within the body parts can be encoded on a part-by-part basis, 1970 with Content-Transfer-Encoding fields for each appropriate 1971 body part. 1973 Message transport agents, relays, and gateways are commonly 1974 known to alter the top-level header of an RFC 822 message. In 1975 particular, they frequently add, remove, or reorder header 1976 fields. Such alterations are explicitly forbidden for the 1977 headers of any body part which occurs within an enclosing 1978 multipart body part. 1980 6.2.1.1.
Common Syntax 1982 This section defines a common syntax for subtypes of 1983 multipart. All subtypes of multipart must use this syntax. A 1984 simple example of a multipart message also appears in this 1985 section. An example of a more complex multipart message is 1986 given in Appendix C. 1988 The Content-Type field for multipart entities requires one 1989 parameter, "boundary", which is used to specify the 1990 encapsulation boundary. The encapsulation boundary is defined 1991 as a line consisting entirely of two hyphen characters ("-", 1992 decimal value 45) followed by the boundary parameter value 1993 from the Content-Type header field. 1995 NOTE: The hyphens are for rough compatibility with the 1996 earlier RFC 934 method of message encapsulation, and for ease 1997 of searching for the boundaries in some implementations. 1998 However, it should be noted that multipart messages are NOT 1999 completely compatible with RFC 934 encapsulations; in 2000 particular, they do not obey RFC 934 quoting conventions for 2001 embedded lines that begin with hyphens. This mechanism was 2002 chosen over the RFC 934 mechanism because the latter causes 2003 lines to grow with each level of quoting. The combination of 2004 this growth with the fact that SMTP implementations sometimes 2005 wrap long lines made the RFC 934 mechanism unsuitable for use 2006 in the event that deeply-nested multipart structuring is ever 2007 desired. 2009 WARNING TO IMPLEMENTORS: The grammar for parameters on the 2010 Content-type field is such that it is often necessary to 2011 enclose the boundaries in quotes on the Content-type line. 2012 This is not always necessary, but never hurts. Implementors 2013 should be sure to study the grammar carefully in order to 2014 avoid producing invalid Content-type fields. 
Thus, a typical 2015 multipart Content-Type header field might look like this: 2017 Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p 2019 But the following is not valid: 2021 Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p 2023 (because of the colon) and must instead be represented as 2024 Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p" 2026 This Content-Type value indicates that the content consists of 2027 one or more parts, each with a structure that is syntactically 2028 identical to an RFC 822 message, except that the header area 2029 is allowed to be completely empty, and that the parts are each 2030 preceded by the line 2032 --gc0pJq0M:08jU534c0p 2034 The encapsulation boundary MUST occur at the beginning of a 2035 line, i.e., following a CRLF, and the initial CRLF is 2036 considered to be attached to the encapsulation boundary rather 2037 than part of the preceding part. The boundary must be 2038 followed immediately either by another CRLF and the header 2039 fields for the next part, or by two CRLFs, in which case there 2040 are no header fields for the next part (and it is therefore 2041 assumed to be of Content-Type text/plain). 2043 NOTE: The CRLF preceding the encapsulation line is 2044 conceptually attached to the boundary so that it is possible 2045 to have a part that does not end with a CRLF (line break). 2046 Body parts that must be considered to end with line breaks, 2047 therefore, must have two CRLFs preceding the encapsulation 2048 line, the first of which is part of the preceding body part, 2049 and the second of which is part of the encapsulation boundary. 2051 Encapsulation boundaries must not appear within the 2052 encapsulations, and must be no longer than 70 characters, not 2053 counting the two leading hyphens. 2055 The encapsulation boundary following the last body part is a 2056 distinguished delimiter that indicates that no further body 2057 parts will follow. 
Such a delimiter is identical to the 2058 previous delimiters, with the addition of two more hyphens at 2059 the end of the line: 2061 --gc0pJq0M:08jU534c0p-- 2063 There appears to be room for additional information prior to 2064 the first encapsulation boundary and following the final 2065 boundary. These areas should generally be left blank, and 2066 implementations must ignore anything that appears before the 2067 first boundary or after the last one. 2069 NOTE: These "preamble" and "epilogue" areas are generally not 2070 used because of the lack of proper typing of these parts and 2071 the lack of clear semantics for handling these areas at 2072 gateways, particularly X.400 gateways. However, rather than 2073 leaving the preamble area blank, many MIME implementations 2074 have found this to be a convenient place to insert an 2075 explanatory note for recipients who read the message with 2076 pre-MIME software, since such notes will be ignored by MIME- 2077 compliant software. 2079 NOTE: Because encapsulation boundaries must not appear in the 2080 body parts being encapsulated, a user agent must exercise care 2081 to choose a unique boundary. The boundary in the example 2082 above could have been the result of an algorithm designed to 2083 produce boundaries with a very low probability of already 2084 existing in the data to be encapsulated without having to 2085 prescan the data. Alternate algorithms might result in more 2086 "readable" boundaries for a recipient with an old user agent, 2087 but would require more attention to the possibility that the 2088 boundary might appear in the encapsulated part. The simplest 2089 boundary possible is something like "---", with a closing 2090 boundary of "-----". 
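The boundary-selection strategy described in the NOTE above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the specification: the function name make_boundary, the 32-character default length, and the alphanumeric-only alphabet are choices made for this example. Although a random boundary makes a collision very unlikely, the sketch still prescans the parts and retries if the delimiter line happens to occur in one of them.

```python
import secrets
import string

# Alphanumerics only: a subset of the characters the boundary grammar allows,
# chosen here (an assumption) so the parameter value never needs quoting.
SAFE_CHARS = string.ascii_letters + string.digits


def make_boundary(parts, length=32):
    """Return a random boundary whose delimiter occurs in none of `parts`."""
    if not 1 <= length <= 70:
        raise ValueError("boundary must be 1 to 70 characters long")
    while True:
        candidate = "".join(secrets.choice(SAFE_CHARS) for _ in range(length))
        delimiter = "--" + candidate
        # Prescanning is optional insurance; a clash is extremely unlikely.
        if not any(delimiter in part for part in parts):
            return candidate


# Hypothetical body parts, used only to exercise the function.
parts = ["plain text body", "another body\r\n--looks-like-a-boundary"]
boundary = make_boundary(parts)
```

A composer that wants a more "readable" boundary would drop the random alphabet, in which case the prescan stops being optional.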
2092 As a very simple example, the following multipart message has 2093 two parts, both of them plain text, one of them explicitly 2094 typed and one of them implicitly typed: 2096 From: Nathaniel Borenstein 2097 To: Ned Freed 2098 Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST) 2099 Subject: Sample message 2100 MIME-Version: 1.0 2101 Content-type: multipart/mixed; boundary="simple boundary" 2103 This is the preamble. It is to be ignored, though it 2104 is a handy place for mail composers to include an 2105 explanatory note to non-MIME conformant readers. 2107 --simple boundary 2109 This is implicitly typed plain US-ASCII text. 2110 It does NOT end with a linebreak. 2111 --simple boundary 2112 Content-type: text/plain; charset=us-ascii 2113 This is explicitly typed plain US-ASCII text. 2114 It DOES end with a linebreak. 2116 --simple boundary-- 2118 This is the epilogue. It is also to be ignored. 2120 The use of a Content-Type of multipart in a body part within 2121 another multipart entity is explicitly allowed. In such 2122 cases, for obvious reasons, care must be taken to ensure that 2123 each nested multipart entity uses a different boundary 2124 delimiter. See Appendix C for an example of nested multipart 2125 entities. 2127 The use of the multipart Content-Type with only a single body 2128 part may be useful in certain contexts, and is explicitly 2129 permitted. 2131 The only mandatory global parameter for the multipart 2132 Content-Type is the boundary parameter, which consists of 1 to 2133 70 characters from a set of characters known to be very robust 2134 through email gateways, and NOT ending with white space. (If a 2135 boundary appears to end with white space, the white space must 2136 be presumed to have been added by a gateway, and must be 2137 deleted.) 
It is formally specified by the following BNF: 2139 boundary := 0*69<bchars> bcharsnospace 2141 bchars := bcharsnospace / " " 2143 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / 2144 "+" / "_" / "," / "-" / "." / 2145 "/" / ":" / "=" / "?" 2147 Overall, the body of a multipart entity may be specified as 2148 follows: 2150 dash-boundary := "--" boundary 2151 ; boundary taken from Content-Type 2152 ; field. 2154 multipart-body := preamble dash-boundary 2155 [*LWSP-char] CRLF 2156 body-part *encapsulation 2157 close-delimiter [*LWSP-char] 2158 CRLF epilogue 2160 encapsulation := delimiter [*LWSP-char] 2161 CRLF body-part 2163 delimiter := CRLF dash-boundary 2165 close-delimiter := CRLF dash-boundary "--" 2167 preamble := discard-text 2169 epilogue := discard-text 2171 discard-text := *text *(*text CRLF) 2172 ; To be ignored upon receipt. 2174 body-part := <"message" as defined in RFC 822, with all 2175 header fields optional, not starting with the 2176 specified dash-boundary, and with the 2177 delimiter not occurring anywhere in the 2178 message body. Note that the semantics of a 2179 part differ from the semantics of a message, 2180 as described in the text.> 2182 IMPORTANT NOTE: The addition of LWSP between the elements 2183 shown in this BNF is NOT allowed since this BNF does not 2184 specify a structured header field. 2186 NOTE: In certain transport enclaves, RFC 822 restrictions 2187 such as the one that limits bodies to printable US-ASCII 2188 characters may not be in force. (That is, the transport 2189 domains may resemble standard Internet mail transport as 2190 specified in RFC 821 and assumed by RFC 822, but without 2191 certain restrictions.) The relaxation of these restrictions 2192 should be construed as locally extending the definition of 2193 bodies, for example to include octets outside of the US-ASCII 2194 range, as long as these extensions are supported by the 2195 transport and adequately documented in the Content-Transfer- 2196 Encoding header field.
However, in no event are headers 2197 (either message headers or body-part headers) allowed to 2198 contain anything other than US-ASCII characters. 2200 NOTE: Conspicuously missing from the multipart type is a 2201 notion of structured, related body parts. In general, it 2202 seems premature to try to standardize interpart structure yet. 2203 It is recommended that those wishing to provide a more 2204 structured or integrated multipart messaging facility should 2205 define a subtype of multipart that is syntactically identical, 2206 but that always expects the inclusion of a distinguished part 2207 that can be used to specify the structure and integration of 2208 the other parts, probably referring to them by their Content- 2209 ID field. If this approach is used, other implementations 2210 will not recognize the new subtype, but will treat it as the 2211 primary subtype (multipart/mixed) and will thus be able to 2212 show the user the parts that are recognized. 2214 6.2.1.2. Handling Nested Messages and Multiparts 2216 The "message/rfc822" subtype defined in a subsequent section 2217 of this document has no terminating condition other than 2218 running out of data. Similarly, an improperly truncated 2219 multipart object may not have any terminating boundary marker, 2220 and does arise in practice due to mail system malfunctions. 2222 It is essential that such objects be handled correctly when 2223 they are themselves imbedded inside of another multipart 2224 structure. MIME implementations are therefore required to 2225 recognize outer level boundary markers at ANY level of inner 2226 nesting. It is not sufficient to only check for the next 2227 expected marker or other terminating condition. 2229 6.2.1.3. Mixed Subtype 2231 The "mixed" subtype of multipart is intended for use when the 2232 body parts are independent and need to be bundled in a 2233 particular order. 
Any multipart subtypes that an 2234 implementation does not recognize must be treated as being of 2235 subtype "mixed". 2237 6.2.1.4. Alternative Subtype 2239 The multipart/alternative type is syntactically identical to 2240 multipart/mixed, but the semantics are different. In 2241 particular, each of the parts is an "alternative" version of 2242 the same information. 2244 Systems should recognize that the contents of the various parts 2245 are interchangeable. Systems should choose the "best" type 2246 based on the local environment and preferences, in some cases 2247 even through user interaction. As with multipart/mixed, the 2248 order of body parts is significant. In this case, the 2249 alternatives appear in an order of increasing faithfulness to 2250 the original content. In general, the best choice is the LAST 2251 part of a type supported by the recipient system's local 2252 environment. 2254 Multipart/alternative may be used, for example, to send mail 2255 in a fancy text format in such a way that it can easily be 2256 displayed anywhere: 2258 From: Nathaniel Borenstein 2259 To: Ned Freed 2260 Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST) 2261 Subject: Formatted text mail 2262 MIME-Version: 1.0 2263 Content-Type: multipart/alternative; boundary=boundary42 2265 --boundary42 2266 Content-Type: text/plain; charset=us-ascii 2268 ... plain text version of message goes here ... 2270 --boundary42 2271 Content-Type: text/enriched 2273 ... RFC 1563 text/enriched version of same message 2274 goes here ... 2276 --boundary42 2277 Content-Type: application/x-whatever 2279 ... fanciest version of same message goes here ... 2281 --boundary42-- 2283 In this example, users whose mail system understood the 2284 "application/x-whatever" format would see only the fancy 2285 version, while other users would see only the enriched or 2286 plain text version, depending on the capabilities of their 2287 system.
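The rule that the best choice is the LAST part of a supported type can be sketched in Python. This is an illustrative sketch only: the function name choose_alternative and the idea of describing a reader's capabilities as a set of type strings are assumptions made for this example, and a real user agent might instead offer the user a choice.

```python
def choose_alternative(part_types, supported):
    """Return the index of the last part whose type is supported, or None."""
    supported = {t.lower() for t in supported}
    chosen = None
    for index, content_type in enumerate(part_types):
        # Type and subtype names are matched case-insensitively.
        if content_type.lower() in supported:
            chosen = index  # later parts are more faithful, so keep the latest
    return chosen


# Part types taken from the example message above.
part_types = ["text/plain", "text/enriched", "application/x-whatever"]
rich_reader = choose_alternative(part_types, {"text/plain", "text/enriched"})
plain_reader = choose_alternative(part_types, {"text/plain"})
```

Here a reader that understands text/enriched selects the second part, while a plain-text-only reader falls back to the first.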
2289 In general, user agents that compose multipart/alternative 2290 entities must place the body parts in increasing order of 2291 preference, that is, with the preferred format last. For 2292 fancy text, the sending user agent should put the plainest 2293 format first and the richest format last. Receiving user 2294 agents should pick and display the last format they are 2295 capable of displaying. In the case where one of the 2296 alternatives is itself of type "multipart" and contains 2297 unrecognized sub-parts, the user agent may choose either to 2298 show that alternative, an earlier alternative, or both. 2300 NOTE: From an implementor's perspective, it might seem more 2301 sensible to reverse this ordering, and have the plainest 2302 alternative last. However, placing the plainest alternative 2303 first is the friendliest possible option when 2304 multipart/alternative entities are viewed using a non-MIME- 2305 conformant mail reader. While this approach does impose some 2306 burden on conformant mail readers, interoperability with older 2307 mail readers was deemed to be more important in this case. 2309 It may be the case that some user agents, if they can 2310 recognize more than one of the formats, will prefer to offer 2311 the user the choice of which format to view. This makes 2312 sense, for example, if mail includes both a nicely-formatted 2313 image version and an easily-edited text version. What is most 2314 critical, however, is that the user not automatically be shown 2315 multiple versions of the same data. Either the user should be 2316 shown the last recognized version or should be given the 2317 choice. 2319 NOTE ON THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: 2320 Each part of a multipart/alternative entity represents the 2321 same data, but the mappings between the two are not 2322 necessarily without information loss. For example, 2323 information is lost when translating ODA to PostScript or 2324 plain text. 
It is recommended that each part should have a 2325 different Content-ID value in the case where the information 2326 content of the two parts is not identical. And when the 2327 information content is identical -- for example, where several 2328 parts of type "message/external-body" specify alternate ways 2329 to access the identical data -- the same Content-ID field 2330 value should be used, to optimize any caching mechanisms that 2331 might be present on the recipient's end. However, the 2332 Content-ID values used by the parts should NOT be the same 2333 Content-ID value that describes the multipart/alternative as a 2334 whole, if there is any such Content-ID field. That is, one 2335 Content-ID value will refer to the multipart/alternative 2336 entity, while one or more other Content-ID values will refer 2337 to the parts inside it. 2339 6.2.1.5. Digest Subtype 2341 This document defines a "digest" subtype of the multipart 2342 Content-Type. This type is syntactically identical to 2343 multipart/mixed, but the semantics are different. In 2344 particular, in a digest, the default Content-Type value for a 2345 body part is changed from "text/plain" to "message/rfc822". 2346 This is done to allow a more readable digest format that is 2347 largely compatible (except for the quoting convention) with 2348 RFC 934. 2350 A digest in this format might, then, look something like this: 2352 From: Moderator-Address 2353 To: Recipient-List 2354 Date: Mon, 22 Mar 1994 13:34:51 +0000 2355 Subject: Internet Digest, volume 42 2356 MIME-Version: 1.0 2357 Content-Type: multipart/digest; 2358 boundary="---- next message ----" 2360 ------ next message ---- 2362 From: someone-else 2363 Date: Fri, 26 Mar 1993 11:13:32 +0200 2364 Subject: my opinion 2366 ...body goes here ... 2368 ------ next message ---- 2370 From: someone-else-again 2371 Date: Fri, 26 Mar 1993 10:07:13 -0500 2372 Subject: my different opinion 2374 ... another body goes here ... 
2376 ------ next message ------ 2378 6.2.1.6. Parallel Subtype 2380 This document defines a "parallel" subtype of the multipart 2381 Content-Type. This type is syntactically identical to 2382 multipart/mixed, but the semantics are different. In 2383 particular, in a parallel entity, the order of body parts is 2384 not significant. 2386 A common presentation of this type is to display all of the 2387 parts simultaneously on hardware and software that are capable 2388 of doing so. However, composing agents should be aware that 2389 many mail readers will lack this capability and will show the 2390 parts serially in any event. 2392 6.2.1.7. Other Multipart Subtypes 2394 Other multipart subtypes are expected in the future. MIME 2395 implementations must in general treat unrecognized subtypes of 2396 multipart as being equivalent to "multipart/mixed". 2398 6.2.2. Message Content-Type 2400 It is frequently desirable, in sending mail, to encapsulate 2401 another mail message. A special Content-Type, "message", is 2402 defined to facilitate this. In particular, the "rfc822" 2403 subtype of "message" is used to encapsulate RFC 822 messages. 2405 NOTE: It has been suggested that subtypes of message might be 2406 defined for forwarded or rejected messages. However, 2407 forwarded and rejected messages can be handled as multipart 2408 messages in which the first part contains any control or 2409 descriptive information, and a second part, of type 2410 message/rfc822, is the forwarded or rejected message. 2411 Composing rejection and forwarding messages in this manner 2412 will preserve the type information on the original message and 2413 allow it to be correctly presented to the recipient, and hence 2414 is strongly encouraged. 2416 Subtypes of message often impose restrictions on what 2417 encodings are allowed. These restrictions are described in 2418 conjunction with each specific subtype. 
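The encoding restrictions that this document states for the composite types can be collected into one lookup, sketched below in Python. The table and the function name encoding_permitted are assumptions made for illustration. The entries restate the rules given for multipart, message/rfc822, message/partial, and message/external-body. Returning True for any other type is also an assumption, meaning only that the sketch passes no judgment on types it does not cover.

```python
# Content-Transfer-Encoding values permitted on composite entities,
# as stated in the sections for each type.
ALLOWED_ENCODINGS = {
    "multipart/*": {"7bit", "8bit", "binary"},
    "message/rfc822": {"7bit", "8bit", "binary"},
    "message/partial": {"7bit"},
    "message/external-body": {"7bit"},
}


def encoding_permitted(content_type, encoding):
    """Check an encoding against the composite-type restrictions."""
    content_type = content_type.lower()
    encoding = encoding.lower()
    if content_type.startswith("multipart/"):
        key = "multipart/*"  # every multipart subtype shares one rule
    else:
        key = content_type
    allowed = ALLOWED_ENCODINGS.get(key)
    if allowed is None:
        return True  # assumption: types not listed are not checked here
    return encoding in allowed
```

For instance, encoding_permitted("message/partial", "8bit") is False, while encoding_permitted("message/rfc822", "8bit") is True.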
2420 Mail gateways, relays, and other mail handling agents are 2421 commonly known to alter the top-level header of an RFC 822 2422 message. In particular, they frequently add, remove, or 2423 reorder header fields. Such alterations are explicitly 2424 forbidden for the encapsulated headers embedded in the bodies 2425 of messages of type "message." 2427 6.2.2.1. RFC822 Subtype 2429 A Content-Type of "message/rfc822" indicates that the body 2430 contains an encapsulated message, with the syntax of an RFC 2431 822 message. However, unlike top-level RFC 822 messages, the 2432 restriction that each message/rfc822 body must include a 2433 "From", "Date", and at least one destination header is removed 2434 and replaced with the requirement that at least one of "From", 2435 "Subject", or "Date" must be present. 2437 No encoding other than "7bit", "8bit", or "binary" is 2438 permitted for parts of type "message/rfc822". The message 2439 header fields are always US-ASCII in any case, and data within 2440 the body can still be encoded, in which case the Content- 2441 Transfer-Encoding header field in the encapsulated message 2442 will reflect this. Non-US-ASCII text in the headers of an 2443 encapsulated message can be specified using the mechanisms 2444 described in RFC MIME-HEADERS. 2446 It should be noted that, despite the use of the numbers "822", 2447 a message/rfc822 entity can include enhanced information as 2448 defined in this document. In other words, a message/rfc822 2449 message may be a MIME message. 2451 6.2.2.2. Partial Subtype 2453 The "partial" subtype is defined to allow large entities to be 2454 delivered as several separate pieces of mail and automatically 2455 reassembled by the receiving user agent. (The concept is 2456 similar to IP fragmentation and reassembly in the basic 2457 Internet Protocols.) This mechanism can be used when 2458 intermediate transport agents limit the size of individual 2459 messages that can be sent. 
Content-Type "message/partial" 2460 thus indicates that the body contains a fragment of a larger 2461 message. 2463 Three parameters must be specified in the Content-Type field 2464 of type message/partial: The first, "id", is a unique 2465 identifier, as close to a world-unique identifier as possible, 2466 to be used to match the parts together. (In general, the 2467 identifier is essentially a message-id; if placed in double 2468 quotes, it can be ANY message-id, in accordance with the BNF 2469 for "parameter" given earlier in this specification.) The 2470 second, "number", an integer, is the part number, which 2471 indicates where this part fits into the sequence of fragments. 2472 The third, "total", another integer, is the total number of 2473 parts. This third subfield is required on the final part, and 2474 is optional (though encouraged) on the earlier parts. Note 2475 also that these parameters may be given in any order. 2477 Thus, part 2 of a 3-part message may have either of the 2478 following header fields: 2480 Content-Type: Message/Partial; number=2; total=3; 2481 id="oc=jpbe0M2Yt4s@thumper.bellcore.com" 2483 Content-Type: Message/Partial; 2484 id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; 2485 number=2 2487 But part 3 MUST specify the total number of parts: 2489 Content-Type: Message/Partial; number=3; total=3; 2490 id="oc=jpbe0M2Yt4s@thumper.bellcore.com" 2492 Note that part numbering begins with 1, not 0. 2494 When the parts of a message broken up in this manner are put 2495 together, the result is a complete MIME entity, which may have 2496 its own Content-Type header field, and thus may contain any 2497 other data type. 2499 6.2.2.2.1. Message Fragmentation and Reassembly 2501 The semantics of a reassembled partial message must be those 2502 of the "inner" message, rather than of a message containing 2503 the inner message. 
This makes it possible, for example, to 2504 send a large audio message as several partial messages, and 2505 still have it appear to the recipient as a simple audio 2506 message rather than as an encapsulated message containing an 2507 audio message. That is, the encapsulation of the message is 2508 considered to be "transparent". 2510 When generating and reassembling the parts of a 2511 message/partial message, the headers of the encapsulated 2512 message must be merged with the headers of the enclosing 2513 entities. In this process the following rules must be 2514 observed: 2516 (1) All of the header fields from the initial enclosing 2517 entity (part one), except those that start with 2518 "Content-" and the specific header fields "Subject", 2519 "Message-ID", "Encrypted", and "MIME-Version", must be 2520 copied, in order, to the new message. 2522 (2) Only those header fields in the enclosed message which 2523 start with "Content-" and "Subject", "Message-ID", 2524 "Encrypted", and "MIME-Version" must be appended, in 2525 order, to the header fields of the new message. Any 2526 header fields in the enclosed message which do not 2527 start with "Content-" (except for "Message-ID", 2528 "Encrypted", and "MIME-Version") will be ignored. 2530 (3) All of the header fields from the second and any 2531 subsequent messages will be ignored. 2533 6.2.2.2.2. Fragmentation and Reassembly Example 2535 If an audio message is broken into two parts, the first part 2536 might look something like this: 2538 X-Weird-Header-1: Foo 2539 From: Bill@host.com 2540 To: joe@otherhost.com 2541 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) 2542 Subject: Audio mail (part 1 of 2) 2543 Message-ID: 2544 MIME-Version: 1.0 2545 Content-type: message/partial; id="ABC@host.com"; 2546 number=1; total=2 2548 X-Weird-Header-1: Bar 2549 X-Weird-Header-2: Hello 2550 Message-ID: 2551 Subject: Audio mail 2552 MIME-Version: 1.0 2553 Content-type: audio/basic 2554 Content-transfer-encoding: base64 2556 ... 
first half of encoded audio data goes here ... 2558 and the second half might look something like this: 2560 From: Bill@host.com 2561 To: joe@otherhost.com 2562 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) 2563 Subject: Audio mail (part 2 of 2) 2564 MIME-Version: 1.0 2565 Message-ID: 2566 Content-type: message/partial; 2567 id="ABC@host.com"; number=2; total=2 2569 ... second half of encoded audio data goes here ... 2571 Then, when the fragmented message is reassembled, the 2572 resulting message to be displayed to the user should look 2573 something like this: 2575 X-Weird-Header-1: Foo 2576 From: Bill@host.com 2577 To: joe@otherhost.com 2578 Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) 2579 Subject: Audio mail 2580 Message-ID: 2581 MIME-Version: 1.0 2582 Content-type: audio/basic 2583 Content-transfer-encoding: base64 2585 ... first half of encoded audio data goes here ... 2586 ... second half of encoded audio data goes here ... 2588 Because data of type "message" may never be encoded in base64 2589 or quoted-printable, a problem might arise if message/partial 2590 entities are constructed in an environment that supports 2591 binary or 8-bit transport. The problem is that the binary 2592 data would be split into multiple message/partial messages, 2593 each of them requiring binary transport. If such messages 2594 were encountered at a gateway into a 7-bit transport 2595 environment, there would be no way to properly encode them for 2596 the 7-bit world, aside from waiting for all of the fragments, 2597 reassembling the inner message, and then encoding the 2598 reassembled data in base64 or quoted-printable. Since it is 2599 possible that different fragments might go through different 2600 gateways, even this is not an acceptable solution. For this 2601 reason, it is specified that MIME entities of type 2602 message/partial must always have a content-transfer-encoding 2603 of 7-bit (the default). 
In particular, even in environments 2604 that support binary or 8-bit transport, the use of a content- 2605 transfer-encoding of "8bit" or "binary" is explicitly 2606 prohibited for entities of type message/partial. 2608 Because some message transfer agents may choose to 2609 automatically fragment large messages, and because such agents 2610 may use very different fragmentation thresholds, it is 2611 possible that the pieces of a partial message, upon 2612 reassembly, may prove themselves to comprise a partial 2613 message. This is explicitly permitted. 2615 The inclusion of a "References" field in the headers of the 2616 second and subsequent pieces of a fragmented message that 2617 references the Message-Id on the previous piece may be of 2618 benefit to mail readers that understand and track references. 2619 However, the generation of such "References" fields is 2620 entirely optional. 2622 Finally, it should be noted that the "Encrypted" header field 2623 has been made obsolete by Privacy Enhanced Messaging (PEM) 2624 [RFC1421, RFC1422, RFC1423, and RFC1424], but the rules above 2625 are nevertheless believed to describe the correct way to treat 2626 it if it is encountered in the context of conversion to and 2627 from message/partial fragments. 2629 6.2.2.3. External-Body Subtype 2631 The external-body subtype indicates that the actual body data 2632 are not included, but merely referenced. In this case, the 2633 parameters describe a mechanism for accessing the external 2634 data. 2636 When an entity is of type "message/external-body", it consists 2637 of a header, two consecutive CRLFs, and the message header for 2638 the encapsulated message. If another pair of consecutive 2639 CRLFs appears, this of course ends the message header for the 2640 encapsulated message. However, since the encapsulated 2641 message's body is itself external, it does NOT appear in the 2642 area that follows. 
For example, consider the following 2643 message: 2645 Content-type: message/external-body; 2646 access-type=local-file; 2647 name="/u/nsb/Me.gif" 2648 Content-type: image/gif 2649 Content-ID: 2650 Content-Transfer-Encoding: binary 2652 THIS IS NOT REALLY THE BODY! 2654 The area at the end, which might be called the "phantom body", 2655 is ignored for most external-body messages. However, it may 2656 be used to contain auxiliary information for some such 2657 messages, as indeed it is when the access-type is "mail- 2658 server". The only access-type defined in this document that 2659 uses the phantom body is "mail-server", but other access-types 2660 may be defined in the future in other documents that use this 2661 area. 2663 The encapsulated headers in ALL message/external-body entities 2664 MUST include a Content-ID header field to give a unique 2665 identifier by which to reference the data. This identifier 2666 may be used for caching mechanisms, and for recognizing the 2667 receipt of the data when the access-type is "mail-server". 2669 Note that, as specified here, the tokens that describe 2670 external-body data, such as file names and mail server 2671 commands, are required to be in the US-ASCII character set. 2672 If this proves problematic in practice, a new mechanism may be 2673 required as a future extension to MIME, either as newly 2674 defined access-types for message/external-body or by some 2675 other mechanism. 2677 As with message/partial, MIME entities of type 2678 message/external-body MUST have a content-transfer-encoding of 2679 7-bit (the default). In particular, even in environments that 2680 support binary or 8-bit transport, the use of a content- 2681 transfer-encoding of "8bit" or "binary" is explicitly 2682 prohibited for entities of type message/external-body. 2684 6.2.2.3.1. 
General External-Body Parameters 2686 The parameters that may be used with any message/external-body 2687 are: 2689 (1) ACCESS-TYPE -- A word indicating the supported access 2690 mechanism by which the file or data may be obtained. 2691 This word is not case sensitive. Values include, but 2692 are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL- 2693 FILE", and "MAIL-SERVER". Future values, except for 2694 experimental values beginning with "X-", must be 2695 registered with IANA, as described in RFC REG. This 2696 parameter is unconditionally mandatory and MUST be 2697 present on EVERY message/external-body. 2699 (2) EXPIRATION -- The date (in the RFC 822 "date-time" 2700 syntax, as extended by RFC 1123 to permit 4 digits in 2701 the year field) after which the existence of the 2702 external data is not guaranteed. This parameter may be 2703 used with ANY access-type and is ALWAYS optional. 2705 (3) SIZE -- The size (in octets) of the data. The intent 2706 of this parameter is to help the recipient decide 2707 whether or not to expend the necessary resources to 2708 retrieve the external data. Note that this describes 2709 the size of the data in its canonical form, that is, 2710 before any Content-Transfer-Encoding has been applied 2711 or after the data have been decoded. This parameter 2712 may be used with ANY access-type and is ALWAYS 2713 optional. 2715 (4) PERMISSION -- A case-insensitive field that indicates 2716 whether or not it is expected that clients might also 2717 attempt to overwrite the data. By default, or if 2718 permission is "read", the assumption is that they are 2719 not, and that if the data is retrieved once, it is 2720 never needed again. If PERMISSION is "read-write", 2721 this assumption is invalid, and any local copy must be 2722 considered no more than a cache. "Read" and "Read- 2723 write" are the only defined values of permission. This 2724 parameter may be used with ANY access-type and is 2725 ALWAYS optional. 
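The general parameters listed above can be checked mechanically. The following is a minimal sketch in Python, assuming the entity header is available as a string; the function name check_external_body_params and the shape of the returned dictionary are illustrative assumptions and are not defined by this document. It uses only the standard "email" package.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# The only PERMISSION values defined by this document.
DEFINED_PERMISSIONS = {"read", "read-write"}


def check_external_body_params(header_text):
    """Validate the general message/external-body parameters.

    Hypothetical helper: returns a dict of normalized values or
    raises ValueError when a rule stated in the text is violated.
    """
    msg = message_from_string(header_text)
    if msg.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")

    # ACCESS-TYPE is mandatory and not case sensitive.
    access_type = msg.get_param("access-type")
    if access_type is None:
        raise ValueError("access-type is mandatory")
    result = {"access-type": access_type.lower()}

    # SIZE is optional; it counts octets of the canonical form.
    size = msg.get_param("size")
    if size is not None:
        if not size.isdigit():
            raise ValueError("size must be a decimal octet count")
        result["size"] = int(size)

    # PERMISSION is optional, case-insensitive, and defaults to "read".
    permission = msg.get_param("permission", "read").lower()
    if permission not in DEFINED_PERMISSIONS:
        raise ValueError("undefined permission value: " + permission)
    result["permission"] = permission

    # EXPIRATION is optional and uses the RFC 822 date-time syntax.
    expiration = msg.get_param("expiration")
    if expiration is not None:
        try:
            result["expiration"] = parsedate_to_datetime(expiration)
        except (TypeError, ValueError):
            raise ValueError("expiration is not a valid date-time")

    return result


if __name__ == "__main__":
    example = (
        "Content-type: message/external-body;\n"
        " access-type=LOCAL-FILE;\n"
        ' name="/u/nsb/Me.gif"; size=1024\n'
        "\n"
    )
    print(check_external_body_params(example))
```

The sketch deliberately ignores access-type specific parameters such as NAME and SITE, which are described in the sections that follow.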
2727 The precise semantics of the access-types defined here are 2728 described in the sections that follow. 2730 6.2.2.3.2. The 'ftp' and 'tftp' Access-Types 2732 An access-type of FTP or TFTP indicates that the message body 2733 is accessible as a file using the FTP [RFC-959] or TFTP [RFC- 2734 783] protocols, respectively. For these access-types, the 2735 following additional parameters are mandatory: 2737 (1) NAME -- The name of the file that contains the actual 2738 body data. 2740 (2) SITE -- A machine from which the file may be obtained, 2741 using the given protocol. This must be a fully 2742 qualified domain name, not a nickname. 2744 (3) Before any data are retrieved using FTP, the user will 2745 generally need to be asked to provide a login id and a 2746 password for the machine named by the site parameter. 2747 For security reasons, such an id and password are not 2748 specified as content-type parameters, but must be 2749 obtained from the user. 2751 In addition, the following parameters are optional: 2753 (1) DIRECTORY -- A directory from which the data named by 2754 NAME should be retrieved. 2756 (2) MODE -- A case-insensitive string indicating the mode 2757 to be used when retrieving the information. The valid 2758 values for access-type "TFTP" are "NETASCII", "OCTET", 2759 and "MAIL", as specified by the TFTP protocol [RFC- 2760 783]. The valid values for access-type "FTP" are 2761 "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a 2762 decimal integer, typically 8. These correspond to the 2763 representation types "A", "E", "I", and "L n" as specified 2764 by the FTP protocol [RFC-959]. Note that "BINARY" and 2765 "TENEX" are not valid values for MODE and that "OCTET" 2766 or "IMAGE" or "LOCAL8" should be used instead. If MODE 2767 is not specified, the default value is "NETASCII" for 2768 TFTP and "ASCII" otherwise. 2770 6.2.2.3.3.
The 'anon-ftp' Access-Type 2772 The "anon-ftp" access-type is identical to the "ftp" access 2773 type, except that the user need not be asked to provide a name 2774 and password for the specified site. Instead, the ftp 2775 protocol will be used with login "anonymous" and a password 2776 that corresponds to the user's email address. 2778 6.2.2.3.4. The 'local-file' Access-Type 2780 An access-type of "local-file" indicates that the actual body 2781 is accessible as a file on the local machine. Two additional 2782 parameters are defined for this access type: 2784 (1) NAME -- The name of the file that contains the actual 2785 body data. This parameter is mandatory for the "local- 2786 file" access-type. 2788 (2) SITE -- A domain specifier for a machine or set of 2789 machines that are known to have access to the data 2790 file. This optional parameter is used to describe the 2791 locality of reference for the data, that is, the site 2792 or sites at which the file is expected to be visible. 2793 Asterisks may be used for wildcard matching to a part 2794 of a domain name, such as "*.bellcore.com", to indicate 2795 a set of machines on which the data should be directly 2796 visible, while a single asterisk may be used to 2797 indicate a file that is expected to be universally 2798 available, e.g., via a global file system. 2800 6.2.2.3.5. The 'mail-server' Access-Type 2802 The "mail-server" access-type indicates that the actual body 2803 is available from a mail server. Two additional parameters 2804 are defined for this access-type: 2806 (1) SERVER -- The email address of the mail server from 2807 which the actual body data can be obtained. This 2808 parameter is mandatory for the "mail-server" access- 2809 type. 2811 (2) SUBJECT -- The subject that is to be used in the mail 2812 that is sent to obtain the data. Note that keying mail 2813 servers on Subject lines is NOT recommended, but such 2814 mail servers are known to exist. This is an optional 2815 parameter. 
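As an illustration of the "mail-server" access-type, the following Python sketch builds the request that a user agent might send to the address given by SERVER. It assumes the whole message/external-body entity is available as a string; the function name build_mail_server_request and the requester argument are hypothetical and are not part of this document. The auxiliary text that follows the encapsulated header (the "phantom body") is copied unchanged into the request, and the encapsulated Content-ID is returned so that the eventual reply can be matched to the original entity.

```python
from email import message_from_string
from email.message import EmailMessage


def build_mail_server_request(entity_text, requester):
    """Build the mail to send to a mail server (illustrative sketch).

    Returns a (request, content_id) pair.  content_id is the value of
    the encapsulated Content-ID field, kept for matching the reply.
    """
    outer = message_from_string(entity_text)
    if outer.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")
    if outer.get_param("access-type", "").lower() != "mail-server":
        raise ValueError("access-type is not mail-server")

    # SERVER is mandatory for this access-type.
    server = outer.get_param("server")
    if server is None:
        raise ValueError("server parameter is mandatory")

    # The standard parser exposes the encapsulated header as a nested
    # message; that nested message's payload is the phantom body.
    inner = outer.get_payload(0)
    phantom_body = inner.get_payload()

    request = EmailMessage()
    request["From"] = requester
    request["To"] = server
    # SUBJECT is optional; only set it when the sender supplied one.
    subject = outer.get_param("subject")
    if subject is not None:
        request["Subject"] = subject
    request.set_content(phantom_body)

    return request, inner["Content-ID"]
```

Because mail-server retrieval is asynchronous, a caller would typically store the returned Content-ID and compare it with the Content-ID field of each message later received from the server.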
2817 Because mail servers accept a variety of syntaxes, some of 2818 which are multiline, the full command to be sent to a mail 2819 server is not included as a parameter on the content-type 2820 line. Instead, it is provided as the "phantom body" when the 2821 content-type is message/external-body and the access-type is 2822 mail-server. 2824 Note that MIME does not define a mail server syntax. Rather, 2825 it allows the inclusion of arbitrary mail server commands in 2826 the phantom body. Implementations must include the phantom 2827 body in the body of the message they send to the mail server 2828 address to retrieve the relevant data. 2830 Unlike other access-types, mail-server access is asynchronous 2831 and will happen at an unpredictable time in the future. For 2832 this reason, it is important that there be a mechanism by 2833 which the returned data can be matched up with the original 2834 message/external-body entity. MIME mail servers must use the 2835 same Content-ID field on the returned message that was used in 2836 the original message/external-body entity, to facilitate such 2837 matching. 2839 6.2.2.3.6. Examples and Further Explanations 2841 When the external-body mechanism is used in conjunction with 2842 the multipart/alternative Content-Type it extends the 2843 functionality of multipart/alternative to include the case 2844 where the same object is provided in the same format but via 2845 different access mechanisms. When this is done the originator 2846 of the message must order the parts first in terms of preferred 2847 formats and then by preferred access mechanisms. The 2848 recipient's viewer should then evaluate the list both in terms 2849 of format and access mechanisms. 2851 With the emerging possibility of very wide-area file systems, 2852 it becomes very hard to know in advance the set of machines 2853 where a file will and will not be accessible directly from the 2854 file system.
Therefore it may make sense to provide both a 2855 file name, to be tried directly, and the name of one or more 2856 sites from which the file is known to be accessible. An 2857 implementation can try to retrieve remote files using FTP or 2858 any other protocol, using anonymous file retrieval or 2859 prompting the user for the necessary name and password. If an 2860 external body is accessible via multiple mechanisms, the 2861 sender may include multiple parts of type message/external- 2862 body within an entity of type multipart/alternative. 2864 However, the external-body mechanism is not intended to be 2865 limited to file retrieval, as shown by the mail-server 2866 access-type. Beyond this, one can imagine, for example, using 2867 a video server for external references to video clips. 2869 The embedded message header fields which appear in the body of 2870 the message/external-body data must be used to declare the 2871 Content-type of the external body if it is anything other than 2872 plain US-ASCII text, since the external body does not have a 2873 header section to declare its type. Similarly, any Content- 2874 transfer-encoding other than "7bit" must also be declared 2875 here. 
Thus a complete message/external-body message, 2876 referring to a document in PostScript format, might look like 2877 this: 2879 From: Whomever 2880 To: Someone 2881 Date: Whenever 2882 Subject: whatever 2883 MIME-Version: 1.0 2884 Message-ID: 2885 Content-Type: multipart/alternative; boundary=42 2886 Content-ID: 2888 --42 2889 Content-Type: message/external-body; name="BodyFormats.ps"; 2890 site="thumper.bellcore.com"; mode="image"; 2891 access-type=ANON-FTP; directory="pub"; 2892 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 2894 Content-type: application/postscript 2895 Content-ID: 2897 --42 2898 Content-Type: message/external-body; access-type=local-file; 2899 name="/u/nsb/writing/rfcs/RFC-MIME.ps"; 2900 site="thumper.bellcore.com"; 2901 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 2903 Content-type: application/postscript 2904 Content-ID: 2906 --42 2907 Content-Type: message/external-body; 2908 access-type=mail-server; 2909 server="listserv@bogus.bitnet"; 2910 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" 2912 Content-type: application/postscript 2913 Content-ID: 2915 get RFC-MIME.DOC 2917 --42-- 2919 Note that in the above examples, the default Content- 2920 transfer-encoding of "7bit" is assumed for the external 2921 PostScript data. 2923 Like the message/partial type, the message/external-body type 2924 is intended to be transparent, that is, to convey the data 2925 type in the external body rather than to convey a message with 2926 a body of that type. Thus the headers on the outer and inner 2927 parts must be merged using the same rules as for 2928 message/partial. In particular, this means that the Content- 2929 type header is overridden, but the From and Subject headers 2930 are preserved. 2932 Note that since the external bodies are not transported as 2933 mail, they need not conform to the 7-bit and line length 2934 requirements, but might in fact be binary files.
Thus a 2935 Content-Transfer-Encoding is not generally necessary, though 2936 it is permitted. 2938 Note that the body of a message of type "message/external- 2939 body" is governed by the basic syntax for an RFC 822 message. 2940 In particular, anything before the first consecutive pair of 2941 CRLFs is header information, while anything after it is body 2942 information, which is ignored for most access-types. 2944 6.2.2.4. Other Message Subtypes 2946 MIME implementations must in general treat unrecognized 2947 subtypes of message as being equivalent to 2948 "application/octet-stream". 2950 7. Experimental Content-Type Values 2952 A Content-Type value beginning with the characters "X-" is a 2953 private value, to be used by consenting mail systems by mutual 2954 agreement. Any format without a rigorous and public 2955 definition must be named with an "X-" prefix, and publicly 2956 specified values shall never begin with "X-". (Older versions 2957 of the widely used Andrew system use the "X-BE2" name, so new 2958 systems should probably choose a different name.) 2960 In general, the use of "X-" top-level types is strongly 2961 discouraged. Implementors should invent subtypes of the 2962 existing types whenever possible. The invention of new types 2963 is intended to be restricted primarily to the development of 2964 new media types for email, such as digital odors or 2965 holography, and not for new data formats in general. In many 2966 cases, a subtype of application will be more appropriate than 2967 a new top-level type. 2969 8. Summary 2971 Using the MIME-Version, Content-Type, and Content-Transfer- 2972 Encoding header fields, it is possible to include, in a 2973 standardized way, arbitrary types of data objects with RFC 822 2974 conformant mail messages. 
No restrictions imposed by either 2975 RFC 821 or RFC 822 are violated, and care has been taken to 2976 avoid problems caused by additional restrictions imposed by 2977 the characteristics of some Internet mail transport mechanisms 2978 (see Appendix B). The "multipart" and "message" Content-Types 2979 allow mixing and hierarchical structuring of objects of 2980 different types in a single message. Further Content-Types 2981 provide a standardized mechanism for tagging messages or body 2982 parts as audio, image, or several other kinds of data. A 2983 distinguished parameter syntax allows further specification of 2984 data format details, particularly the specification of 2985 alternate character sets. Additional optional header fields 2986 provide mechanisms for certain extensions deemed desirable by 2987 many implementors. Finally, a number of useful Content-Types 2988 are defined for general use by consenting user agents, notably 2989 message/partial, and message/external-body. 2991 9. Security Considerations 2993 Security issues are discussed in the context of the 2994 application/postscript type and in Appendix E. Implementors 2995 should pay special attention to the security implications of 2996 any mail content-types that can cause the remote execution of 2997 any actions in the recipient's environment. In such cases, 2998 the discussion of the application/postscript type may serve as 2999 a model for considering other content-types with remote 3000 execution capabilities. 3002 10. Authors' Addresses 3004 For more information, the authors of this document may be 3005 contacted via Internet mail: 3007 Nathaniel S. Borenstein 3008 First Virtual Holdings 3009 25 Washington Avenue 3010 Morristown, NJ 07960 3011 USA 3013 Email: nsb@nsb.fv.com 3014 Phone: +1 201 540 8967 3015 Fax: +1 201 993 3032 3017 Ned Freed 3018 Innosoft International, Inc. 
3019 1050 East Garvey Avenue South 3020 West Covina, CA 91790 3021 USA 3023 Email: ned@innosoft.com 3024 Phone: +1 818 919 3600 3025 Fax: +1 818 919 3614 3027 MIME is a result of the work of the Internet Engineering Task 3028 Force Working Group on Email Extensions. The chairman of that 3029 group, Greg Vaudreuil, may be reached at: 3031 Gregory M. Vaudreuil 3032 Tigon Corporation 3033 17060 Dallas Parkway 3034 Dallas Texas, 75248 3036 Email: greg.vaudreuil@ons.octel.com 3037 Phone: +1 214 733 2722 3038 11. Acknowledgements 3040 This document is the result of the collective effort of a 3041 large number of people, at several IETF meetings, on the 3042 IETF-SMTP and IETF-822 mailing lists, and elsewhere. Although 3043 any enumeration seems doomed to suffer from egregious 3044 omissions, the following are among the many contributors to 3045 this effort: 3047 Harald Tveit Alvestrand Marc Andreessen 3048 Randall Atkinson Bob Braden 3049 Philippe Brandon Brian Capouch 3050 Kevin Carosso Uhhyung Choi 3051 Peter Clitherow Dave Collier-Brown 3052 Cristian Constantinof John Coonrod 3053 Mark Crispin Dave Crocker 3054 Stephen Crocker Terry Crowley 3055 Walt Daniels Jim Davis 3056 Frank Dawson Axel Deininger 3057 Hitoshi Doi Kevin Donnelly 3058 Steve Dorner Keith Edwards 3059 Chris Eich Dana S. Emery 3060 Johnny Eriksson Craig Everhart 3061 Patrik Faltstrom Erik E. Fair 3062 Roger Fajman Alain Fontaine 3063 Martin Forssen James M.
Galvin 3064 Stephen Gildea Philip Gladstone 3065 Thomas Gordon Keld Simonsen 3066 Terry Gray Phill Gross 3067 James Hamilton David Herron 3068 Mark Horton Bruce Howard 3069 Bill Janssen Olle Jarnefors 3070 Risto Kankkunen Phil Karn 3071 Alan Katz Tim Kehres 3072 Neil Katin Steve Kille 3073 Kyuho Kim Anders Klemets 3074 John Klensin Valdis Kletniek 3075 Jim Knowles Stev Knowles 3076 Bob Kummerfeld Pekka Kytolaakso 3077 Stellan Lagerstrom Vincent Lau 3078 Timo Lehtinen Donald Lindsay 3079 Warner Losh Carlyn Lowery 3080 Laurence Lundblade Charles Lynn 3081 John R. MacMillan Larry Masinter 3082 Rick McGowan Michael J. McInerny 3083 Leo Mclaughlin Goli Montaser-Kohsari 3084 Keith Moore Tom Moore 3085 Erik Naggum Mark Needleman 3086 John Noerenberg Mats Ohrman 3087 Julian Onions Michael Patton 3088 David J. Pepper Erik van der Poel 3089 Jon Postel Blake C. Ramsdell 3090 Christer Romson Luc Rooijakkers 3091 Marshall T. Rose Jonathan Rosenberg 3092 Guido van Rossum Jan Rynning 3093 Harri Salminen Michael Sanderson 3094 Yutaka Sato Markku Savela 3095 Richard Alan Schafer Masahiro Sekiguchi 3096 Mark Sherman Bob Smart 3097 Peter Speck Henry Spencer 3098 Einar Stefferud Michael Stein 3099 Klaus Steinberger Peter Svanberg 3100 James Thompson Steve Uhler 3101 Stuart Vance Peter Vanderbilt 3102 Greg Vaudreuil Ed Vielmetti 3103 Larry W. Virden Ryan Waldron 3104 Rhys Weatherly Jay Weber 3105 Dave Wecker Wally Wedel 3106 Sven-Ove Westberg Brian Wideen 3107 John Wobus Glenn Wright 3108 Rayan Zachariassen David Zimmerman 3110 The authors apologize for any omissions from this list, which 3111 are certainly unintentional. 3113 Appendix A -- MIME Conformance 3115 The mechanisms described in this document are open-ended. It 3116 is definitely not expected that all implementations will 3117 support all of the Content-Types described, nor that they will 3118 all share the same extensions. 
In order to promote 3119 interoperability, however, it is useful to define the concept 3120 of "MIME-conformance" to define a certain level of 3121 implementation that allows the useful interworking of messages 3122 with content that differs from US-ASCII text. In this 3123 section, we specify the requirements for such conformance. 3125 A mail user agent that is MIME-conformant MUST: 3127 (1) Always generate a "MIME-Version: 1.0" header field. 3129 (2) Recognize the Content-Transfer-Encoding header field 3130 and decode all received data encoded with either the 3131 quoted-printable or base64 encodings. Any non-7- 3132 bit data that is sent without encoding must be properly 3133 labelled with a content-transfer-encoding of 8bit or 3134 binary, as appropriate. If the underlying transport 3135 does not support 8bit or binary (as SMTP [RFC821] does 3136 not), the sender is required to both encode and label 3137 data using an appropriate Content-Transfer-Encoding 3138 such as quoted-printable or base64. 3140 (3) Recognize and interpret the Content-Type header field, 3141 and avoid showing users raw data with a Content-Type 3142 field other than text. Be able to send at least 3143 text/plain messages, with the character set specified 3144 as a parameter if it is not US-ASCII. 3146 (4) Explicitly handle the following Content-Type values, to 3147 at least the following extents: 3149 Text: 3151 -- Recognize and display "text" mail with the 3152 character set "US-ASCII." 3153 -- Recognize other character sets at least to the 3154 extent of being able to inform the user about what 3155 character set the message uses. 3157 -- Recognize the "ISO-8859-*" character sets to the 3158 extent of being able to display those characters that 3159 are common to ISO-8859-* and US-ASCII, namely all 3160 characters represented by octet values 0-127.
3162 -- For unrecognized subtypes in a known character 3163 set, show or offer to show the user the "raw" version 3164 of the data after conversion of the content from 3165 canonical form to local form. 3167 -- Treat material in an unknown character set as if 3168 it were "application/octet-stream". 3170 Image, audio, and video: 3172 -- At a minimum, provide facilities to treat any 3173 unrecognized subtypes as if they were 3174 "application/octet-stream". 3176 Application: 3178 -- Offer the ability to remove either of the quoted- 3179 printable or base64 encodings defined in this 3180 document if they were used and put the resulting 3181 information in a user file. 3183 Multipart: 3185 -- Recognize the mixed subtype. Display all relevant 3186 information on the message level and the body part 3187 header level and then display or offer to display 3188 each of the body parts individually. 3190 -- Recognize the "alternative" subtype, and avoid 3191 showing the user redundant parts of 3192 multipart/alternative mail. 3194 -- Recognize the "multipart/digest" subtype, 3195 specifically using "message/rfc822" rather than 3196 "text/plain" as the default content-type for 3197 encapsulations inside "multipart/digest" entities. 3199 -- Treat any unrecognized subtypes as if they were 3200 "mixed". 3202 Message: 3204 -- Recognize and display at least the primary 3205 (RFC822) encapsulation in such a way as to preserve 3206 any recursive structure, that is, displaying or 3207 offering to display the encapsulated data in 3208 accordance with its Content-type. 3210 -- Treat any unrecognized subtypes as if they were 3211 "application/octet-stream". 3213 (5) Upon encountering any unrecognized Content-Type, an 3214 implementation must treat it as if it had a Content- 3215 Type of "application/octet-stream" with no parameter 3216 sub-arguments.
How such data are handled is up to an 3217 implementation, but likely options for handling such 3218 unrecognized data include offering the user to write it 3219 into a file (decoded from its mail transport format) or 3220 offering the user to name a program to which the 3221 decoded data should be passed as input. 3223 A user agent that meets the above conditions is said to be 3224 MIME-conformant. The meaning of this phrase is that it is 3225 assumed to be "safe" to send virtually any kind of properly- 3226 marked data to users of such mail systems, because such 3227 systems will at least be able to treat the data as 3228 undifferentiated binary, and will not simply splash it onto 3229 the screen of unsuspecting users. 3231 There is another sense in which it is always "safe" to send 3232 data in a format that is MIME-conformant, which is that such 3233 data will not break or be broken by any known systems that are 3234 conformant with RFC 821 and RFC 822. User agents that are 3235 MIME-conformant have the additional guarantee that the user 3236 will not be shown data that were never intended to be viewed 3237 as text. 3239 Appendix B -- Guidelines For Sending Email Data 3241 Internet email is not a perfect, homogeneous system. Mail may 3242 become corrupted at several stages in its travel to a final 3243 destination. Specifically, email sent throughout the Internet 3244 may travel across many networking technologies. Many 3245 networking and mail technologies do not support the full 3246 functionality possible in the SMTP transport environment. 3247 Mail traversing these systems is likely to be modified in such 3248 a way that it can be transported. 3250 There exist many widely-deployed non-conformant MTAs in the 3251 Internet. These MTAs, speaking the SMTP protocol, alter 3252 messages on the fly to take advantage of the internal data 3253 structure of the hosts they are implemented on, or are just 3254 plain broken. 
3256 The following guidelines may be useful to anyone devising a 3257 data format (Content-Type) that will survive the widest range 3258 of networking technologies and known broken MTAs unscathed. 3259 Note that anything encoded in the base64 encoding will satisfy 3260 these rules, but that some well-known mechanisms, notably the 3261 UNIX uuencode facility, will not. Note also that anything 3262 encoded in the Quoted-Printable encoding will survive most 3263 gateways intact, but possibly not some gateways to systems 3264 that use the EBCDIC character set. 3266 (1) Under some circumstances the encoding used for data may 3267 change as part of normal gateway or user agent 3268 operation. In particular, conversion from base64 to 3269 quoted-printable and vice versa may be necessary. This 3270 may result in the confusion of CRLF sequences with line 3271 breaks in text bodies. As such, the persistence of 3272 CRLF as something other than a line break must not be 3273 relied on. 3275 (2) Many systems may elect to represent and store text data 3276 using local newline conventions. Local newline 3277 conventions may not match the RFC822 CRLF convention -- 3278 systems are known that use plain CR, plain LF, CRLF, or 3279 counted records. The result is that isolated CR and LF 3280 characters are not well tolerated in general; they may 3281 be lost or converted to delimiters on some systems, and 3282 hence must not be relied on. 3284 (3) TAB (HT) characters may be misinterpreted or may be 3285 automatically converted to variable numbers of spaces. 3286 This is unavoidable in some environments, notably those 3287 not based on the US-ASCII character set. Such 3288 conversion is STRONGLY DISCOURAGED, but it may occur, 3289 and mail formats must not rely on the persistence of 3290 TAB (HT) characters. 3292 (4) Lines longer than 76 characters may be wrapped or 3293 truncated in some environments. 
Line wrapping and line 3294 truncation are STRONGLY DISCOURAGED, but unavoidable in 3295 some cases. Applications which require long lines must 3296 somehow differentiate between soft and hard line 3297 breaks. (A simple way to do this is to use the 3298 quoted-printable encoding.) 3300 (5) Trailing "white space" characters (SPACE, TAB (HT)) on 3301 a line may be discarded by some transport agents, while 3302 other transport agents may pad lines with these 3303 characters so that all lines in a mail file are of 3304 equal length. The persistence of trailing white space, 3305 therefore, must not be relied on. 3307 (6) Many mail domains use variations on the US-ASCII 3308 character set, or use character sets such as EBCDIC 3309 which contain most but not all of the US-ASCII 3310 characters. The correct translation of characters not 3311 in the "invariant" set cannot be depended on across 3312 character converting gateways. For example, this 3313 situation is a problem when sending uuencoded 3314 information across BITNET, an EBCDIC system. Similar 3315 problems can occur without crossing a gateway, since 3316 many Internet hosts use character sets other than US- 3317 ASCII internally. The definition of Printable Strings 3318 in X.400 adds further restrictions in certain special 3319 cases. In particular, the only characters that are 3320 known to be consistent across all gateways are the 73 3321 characters that correspond to the upper and lower case 3322 letters A-Z and a-z, the 10 digits 0-9, and the 3323 following eleven special characters: 3325 "'" (US-ASCII decimal value 39) 3326 "(" (US-ASCII decimal value 40) 3327 ")" (US-ASCII decimal value 41) 3328 "+" (US-ASCII decimal value 43) 3329 "," (US-ASCII decimal value 44) 3330 "-" (US-ASCII decimal value 45) 3331 "." (US-ASCII decimal value 46) 3332 "/" (US-ASCII decimal value 47) 3333 ":" (US-ASCII decimal value 58) 3334 "=" (US-ASCII decimal value 61) 3335 "?" 
(US-ASCII decimal value 63) 3337 A maximally portable mail representation, such as the 3338 base64 encoding, will confine itself to relatively 3339 short lines of text in which the only meaningful 3340 characters are taken from this set of 73 characters. 3342 (7) Some mail transport agents will corrupt data that 3343 includes certain literal strings. In particular, a 3344 period (".") alone on a line is known to be corrupted 3345 by some (incorrect) SMTP implementations, and a line 3346 that starts with the five characters "From " (the fifth 3347 character is a SPACE) is commonly corrupted as well. 3348 A careful composition agent can prevent these 3349 corruptions by encoding the data (e.g., in the quoted- 3350 printable encoding, "=46rom " in place of "From " at 3351 the start of a line, and "=2E" in place of "." alone on 3352 a line). 3354 Please note that the above list is NOT a list of recommended 3355 practices for MTAs. RFC 821 MTAs are prohibited from altering 3356 the character of white space or wrapping long lines. These 3357 BAD and invalid practices are known to occur on established 3358 networks, and implementations should be robust in dealing with 3359 the bad effects they can cause. 3361 Appendix C -- A Complex Multipart Example 3363 What follows is the outline of a complex multipart message. 3364 This message has five parts to be displayed serially: two 3365 introductory plain text parts, an embedded multipart message, 3366 a text/enriched part, and a closing encapsulated text message 3367 in a non-ASCII character set. The embedded multipart message 3368 has two parts to be displayed in parallel, a picture and an 3369 audio fragment. 3371 MIME-Version: 1.0 3372 From: Nathaniel Borenstein 3373 To: Ned Freed 3374 Date: Fri, 07 Oct 1994 16:15:05 -0700 (PDT) 3375 Subject: A multipart example 3376 Content-Type: multipart/mixed; 3377 boundary=unique-boundary-1 3379 This is the preamble area of a multipart message.
3380 Mail readers that understand multipart format 3381 should ignore this preamble. 3383 If you are reading this text, you might want to 3384 consider changing to a mail reader that understands 3385 how to properly display multipart messages. 3387 --unique-boundary-1 3389 ... Some text appears here ... 3391 [Note that the blank between the boundary and the start 3392 of the text in this part means no header fields were 3393 given and this is text in the US-ASCII character set. 3394 It could have been done with explicit typing as in the 3395 next part.] 3397 --unique-boundary-1 3398 Content-type: text/plain; charset=US-ASCII 3400 This could have been part of the previous part, but 3401 illustrates explicit versus implicit typing of body 3402 parts. 3404 --unique-boundary-1 3405 Content-Type: multipart/parallel; boundary=unique-boundary-2 3407 --unique-boundary-2 3408 Content-Type: audio/basic 3409 Content-Transfer-Encoding: base64 3411 ... base64-encoded 8000 Hz single-channel 3412 mu-law-format audio data goes here ... 3414 --unique-boundary-2 3415 Content-Type: image/gif 3416 Content-Transfer-Encoding: base64 3418 ... base64-encoded image data goes here ... 3420 --unique-boundary-2-- 3422 --unique-boundary-1 3423 Content-type: text/enriched 3425 This is enriched. 3426 as defined in RFC 1563 3428 Isn't it 3429 cool? 3431 --unique-boundary-1 3432 Content-Type: message/rfc822 3434 From: (mailbox in US-ASCII) 3435 To: (address in US-ASCII) 3436 Subject: (subject in US-ASCII) 3437 Content-Type: Text/plain; charset=ISO-8859-1 3438 Content-Transfer-Encoding: Quoted-printable 3440 ... Additional text in ISO-8859-1 goes here ... 3442 --unique-boundary-1-- 3443 Appendix D -- Collected Grammar 3445 This appendix contains the complete BNF grammar for all the 3446 syntax specified by this document. 3448 By itself, however, this grammar is incomplete. It refers to 3449 several entities that are defined by RFC 822. 
Rather than 3450 reproduce those definitions here, and risk unintentional 3451 differences between the two, this document simply refers the 3452 reader to RFC 822 for the remaining definitions. Wherever a 3453 term is undefined, it refers to the RFC 822 definition. 3455 attribute := token 3457 boundary := 0*69<bchars> bcharsnospace 3459 bchars := bcharsnospace / " " 3461 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / 3462 "+" / "_" / "," / "-" / "." / 3463 "/" / ":" / "=" / "?" 3465 body-part := <"message" as defined in RFC 822, with all 3466 header fields optional, not starting with the 3467 specified dash-boundary, and with the 3468 delimiter not occurring anywhere in the 3469 message body. Note that the semantics of a 3470 part differ from the semantics of a message, 3471 as described in the text.> 3473 close-delimiter := CRLF dash-boundary "--" 3475 composite-type := "message" / "multipart" / extension-token 3477 content := "Content-Type" ":" type "/" subtype 3478 *(";" parameter) 3479 ; Matching of type and subtype is 3480 ; ALWAYS case-insensitive 3482 dash-boundary := "--" boundary 3483 ; boundary taken from Content-Type 3484 ; field. 3486 delimiter := CRLF dash-boundary 3488 description := "Content-Description" ":" *text 3490 discard-text := *(*text CRLF) 3491 ; To be ignored upon receipt. 
3493 discrete-type := "text" / "image" / "audio" / "video" / 3494 "application" / extension-token 3496 encapsulation := delimiter [*LWSP-char] 3497 CRLF body-part 3499 encoding := "Content-Transfer-Encoding" ":" mechanism 3501 epilogue := discard-text 3503 extension-token := iana-token / ietf-token / x-token 3505 iana-token := 3509 ietf-token := 3513 id := "Content-ID" ":" msg-id 3515 mechanism := "7bit" / "8bit" / "binary" / 3516 "quoted-printable" / "base64" / 3517 ietf-token / x-token 3519 multipart-body := preamble dash-boundary 3520 [*LWSP-char] CRLF 3521 body-part *encapsulation 3522 close-delimiter [*LWSP-char] 3523 CRLF epilogue 3525 octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") 3526 ; Octet must be used for characters > 127, =, 3527 ; SPACE, or TAB, and is recommended for any 3528 ; characters not listed in Appendix B as 3529 ; "mail-safe". 3531 parameter := attribute "=" value 3533 preamble := discard-text 3535 ptext := octet / safe-char 3537 quoted-printable := ([*(ptext / SPACE / TAB) ptext] ["="] CRLF) 3538 ; Maximum line length of 76 characters 3539 ; excluding CRLF 3541 safe-char := 3551 tspecials := "(" / ")" / "<" / ">" / "@" / 3552 "," / ";" / ":" / "\" / <"> 3553 "/" / "[" / "]" / "?" / "=" 3554 ; Must be in quoted-string, 3555 ; to use within parameter values 3557 type := discrete-type / composite-type 3559 value := token / quoted-string 3561 version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT 3563 x-token := 3566 Appendix E -- Summary of the Seven Content-types 3568 Content type: text 3570 Subtypes defined by this document: plain 3572 Important parameters: charset 3574 Encoding notes: quoted-printable generally preferred if an 3575 encoding is needed and the character set is mostly a US- 3576 ASCII superset. 
3578 Security considerations: Rich text formats such as TeX and 3579 Troff often contain mechanisms for executing arbitrary 3580 commands or file system operations, and should not be used 3581 automatically unless these security problems have been 3582 addressed. Even plain text may contain control characters 3583 that can be used to exploit the capabilities of 3584 "intelligent" terminals and cause security violations. User 3585 interfaces designed to run on such terminals should be aware 3586 of and try to prevent such problems. 3588 Content type: image 3590 Subtypes defined by this document: jpeg, gif 3592 Important parameters: none 3594 Encoding notes: base64 generally preferred 3596 Content type: audio 3598 Subtypes defined by this document: basic 3600 Important parameters: none 3602 Encoding notes: base64 generally preferred 3604 Content type: video 3606 Subtypes defined by this document: mpeg 3608 Important parameters: none 3609 Encoding notes: base64 generally preferred 3611 Content type: application 3613 Subtypes defined by this document: octet-stream, postscript 3615 Important parameters: type, padding 3617 Deprecated parameters: name and conversions were defined in 3618 RFC 1341, and have since been deleted. 3620 Encoding notes: base64 preferred for unreadable subtypes. 3622 Security considerations: This type is intended for the 3623 transmission of data to be interpreted by locally-installed 3624 programs. Severe security problems could result if this 3625 type is used to transmit binary programs or programs in 3626 general-purpose interpreted languages, such as LISP programs 3627 or shell scripts, without taking special precautions. 3628 Authors of mail-reading agents are cautioned against giving 3629 their systems the power to execute mail-based application 3630 data without carefully considering the security 3631 implications. 
While it is certainly possible to define safe 3632 application formats and even safe interpreters for unsafe 3633 formats, each interpreter should be evaluated separately for 3634 possible security problems. 3636 Content type: multipart 3638 Subtypes defined by this document: mixed, alternative, 3639 digest, parallel. 3641 Important parameters: boundary 3643 Encoding notes: No content-transfer-encoding other than 3644 "7bit", "8bit", or "binary" is permitted. 3646 Content type: message 3648 Subtypes defined by this document: rfc822, partial, 3649 external-body 3651 Important parameters: id, number, total, access-type, 3652 expiration, size, permission, name, site, directory, mode, 3653 server, subject 3654 Encoding notes: Only "7bit" is permitted for 3655 "message/partial" or "message/external-body", and only 3656 "7bit", "8bit", or "binary" are permitted for other subtypes 3657 of "message". 3659 Appendix F -- Canonical Encoding Model 3661 There was some confusion, in earlier drafts of this memo, 3662 regarding the model for when email data was to be converted to 3663 canonical form and encoded, and in particular how this process 3664 would affect the treatment of CRLFs, given that the 3665 representation of newlines varies greatly from system to 3666 system. For this reason, a canonical model for encoding is 3667 presented below. 3669 The process of composing a MIME entity can be modeled as being 3670 done in a number of steps. Note that these steps are roughly 3671 similar to those steps used in PEM [RFC-1421] and are performed 3672 for each "innermost level" body: 3674 (1) Creation of local form. 3676 The body to be transmitted is created in the system's 3677 native format. The native character set is used, and 3678 where appropriate local end of line conventions are 3679 used as well. 
The body may be a UNIX-style text file, 3680 or a Sun raster image, or a VMS indexed file, or audio 3681 data in a system-dependent format stored only in 3682 memory, or anything else that corresponds to the local 3683 model for the representation of some form of 3684 information. Fundamentally, the data is created in the 3685 "native" form that corresponds to the type specified by 3686 the content type. 3688 (2) Conversion to canonical form. 3690 The entire body, including "out-of-band" information 3691 such as record lengths and possibly file attribute 3692 information, is converted to a universal canonical 3693 form. The specific content type of the body as well as 3694 its associated attributes dictate the nature of the 3695 canonical form that is used. Conversion to the proper 3696 canonical form may involve character set conversion, 3697 transformation of audio data, compression, or various 3698 other operations specific to the various content types. 3699 If character set conversion is involved, however, care 3700 must be taken to understand the semantics of the 3701 content-type, which may have strong implications for 3702 any character set conversion, e.g. with regard to 3703 syntactically meaningful characters in a text subtype 3704 other than "plain". 3706 For example, in the case of text/plain data, the text 3707 must be converted to a supported character set and 3708 lines must be delimited with CRLF delimiters in 3709 accordance with RFC 822. Note that the restriction on 3710 line lengths implied by RFC 822 is eliminated if the 3711 next step employs either quoted-printable or base64 3712 encoding. 3714 (3) Apply transfer encoding. 3716 A Content-Transfer-Encoding appropriate for this body 3717 is applied. Note that there is no fixed relationship 3718 between the content type and the transfer encoding. 
In 3719 particular, it may be appropriate to base the choice of 3720 base64 or quoted-printable on character frequency 3721 counts which are specific to a given instance of a 3722 body. 3724 (4) Insertion into entity. 3726 The encoded object is inserted into a MIME entity with 3727 appropriate headers. The entity is then inserted into 3728 the body of a higher-level entity (message or 3729 multipart) if needed. 3731 It is vital to note that these steps are only a model; they 3732 are specifically NOT a blueprint for how an actual system 3733 would be built. In particular, the model fails to account for 3734 two common designs: 3736 (1) In many cases the conversion to a canonical form prior 3737 to encoding will be subsumed into the encoder itself, 3738 which understands local formats directly. For example, 3739 the local newline convention for text bodies might be 3740 carried through to the encoder itself along with 3741 knowledge of what that format is. 3743 (2) The output of the encoders may have to pass through one 3744 or more additional steps prior to being transmitted as 3745 a message. As such, the output of the encoder may not 3746 be conformant with the formats specified by RFC 822. 3748 In particular, once again it may be appropriate for the 3749 converter's output to be expressed using local newline 3750 conventions rather than using the standard RFC 822 CRLF 3751 delimiters. 3753 Other implementation variations are conceivable as well. The 3754 vital aspect of this discussion is that, in spite of any 3755 optimizations, collapsings of required steps, or insertion of 3756 additional processing, the resulting messages must be 3757 consistent with those produced by the model described here. 
3758 For example, a message with the following header fields: 3760 Content-type: text/foo; charset=bar 3761 Content-Transfer-Encoding: base64 3763 must be first represented in the text/foo form, then (if 3764 necessary) represented in the "bar" character set, and finally 3765 transformed via the base64 algorithm into a mail-safe form. 3767 Appendix G -- Changes from RFC 1521 3769 This document is a revision of RFC 1521. For the convenience 3770 of those familiar with RFC 1521, the changes from that 3771 document are summarized in this appendix. For further history, 3772 note that Appendix H in RFC 1521 specified how that document 3773 differed from its predecessor, RFC 1341. 3775 (1) This document has been completely reformatted. This was 3776 done to improve the quality of the plain text version 3777 of this document, which is required to be the reference 3778 copy. 3780 (2) BNF describing the overall structure of MIME message 3781 and part headers has been added. This is a 3782 documentation change only -- the underlying syntax has 3783 not changed in any way. 3785 (3) The specific BNF for the seven content types in MIME 3786 has been removed. This BNF was incorrect, incomplete, 3787 and inconsistent with the type-independent BNF. And 3788 since the type-independent BNF already fully specifies 3789 the syntax of the various MIME headers, the type- 3790 specific BNF was, in the final analysis, completely 3791 unnecessary and caused more problems than it solved. 3793 (4) The more specific "US-ASCII" character set name has 3794 replaced the use of the term ASCII in many parts of 3795 this specification. 3797 (5) The informal concept of a primary subtype has been 3798 removed. 3800 (6) The term "object" was being used inconsistently. This 3801 term has been replaced with the more precise terms 3802 "body", "body part", and "entity" where appropriate. 
3804 (7) The BNF for the multipart content-type has been 3805 rearranged to make it clear that the CRLF preceding 3806 the boundary marker is actually part of the marker 3807 itself rather than the preceding body part. 3809 (8) In the rules on reassembling "message/partial" MIME 3810 entities, "Subject" is added to the list of headers to 3811 take from the inner message, and the example is 3812 modified to clarify this point. 3814 (9) In the discussion of the application/postscript type, 3815 an additional paragraph has been added warning about 3816 possible interoperability problems caused by embedding 3817 of binary data inside a PostScript MIME entity. 3819 (10) Added a clarifying note to the basic syntax rules for 3820 Content-Type to make it clear that the following two 3821 forms: 3823 Content-type: text/plain; charset=us-ascii (comment) 3825 Content-type: text/plain; charset="us-ascii" 3827 are completely equivalent. 3829 (11) The following sentence has been removed from the 3830 discussion of the MIME-Version header: "However, 3831 conformant software is encouraged to check the version 3832 number and at least warn the user if an unrecognized 3833 MIME-version is encountered." 3835 (12) A typo was fixed that said "application/external-body" 3836 instead of "message/external-body". 3838 (13) The definition of a character set has been reorganized 3839 to make the requirements clearer. 3841 (14) The definitions of "7bit" and "8bit" have been 3842 tightened so that use of bare CR, LF, and NUL 3843 characters is no longer allowed. 3845 (15) The definition of canonical text in MIME has been 3846 tightened so that line breaks must be represented by a 3847 CRLF sequence. CR and LF characters are not allowed 3848 outside of this usage. The definition of quoted- 3849 printable encoding has been altered accordingly. 
3851 (16) Prose was added to clarify the use of the "7bit", 3852 "8bit", and "binary" transfer-encodings on multipart or 3853 message entities encapsulating "8bit" or "binary" data. 3855 (17) In Appendix A, "multipart/digest" support was added to 3856 the list of requirements for minimal MIME conformance. 3857 Also, the requirements for "message/rfc822" support were 3858 strengthened to clarify the importance of recognizing 3859 recursive structure. 3861 (18) The various restrictions on subtypes of "message" are 3862 now specified entirely on a subtype-by-subtype basis. 3864 (19) The definition of "message/rfc822" was changed to 3865 indicate that at least one of the "From", "Subject", or 3866 "Date" headers must be present. 3868 (20) The required handling of unrecognized subtypes as 3869 "application/octet-stream" has been made more explicit 3870 in both the type definition sections and the 3871 conformance guidelines. 3873 (21) Examples using text/richtext were changed to 3874 text/enriched. 3876 (22) The BNF definition of subtype has been changed to make 3877 it clear that either an IANA registered subtype or a 3878 nonstandard "X-" subtype must be used in a Content-Type 3879 header field. 3881 (23) The use of escape and shift mechanisms in the US-ASCII 3882 and ISO-8859-X character sets this specification 3883 defines has been clarified: Such mechanisms should 3884 never be used in conjunction with these character sets 3885 and their effect if they are used is undefined. 3887 (24) The definition of the AFS access-type for 3888 message/external-body has been removed. 3890 (25) Entities that are simply registered for use and those 3891 that are standardized by the IETF are now distinguished 3892 in the MIME BNF. 3894 (26) The handling of the combination of 3895 multipart/alternative and message/external-body is now 3896 specifically addressed. 
3898 Appendix H -- References 3900 [ATK] 3901 Borenstein, Nathaniel S., Multimedia Applications 3902 Development with the Andrew Toolkit, Prentice-Hall, 1990. 3904 [GIF] 3905 Graphics Interchange Format (Version 89a), Compuserve, 3906 Inc., Columbus, Ohio, 1990. 3908 [ISO-2022] 3909 International Standard -- Information Processing -- ISO 3910 7-bit and 8-bit Coded Character Sets -- Code Extension 3911 Techniques, ISO 2022:1986. 3913 [ISO-8859] 3914 International Standard -- Information Processing -- 8-bit 3915 Single-Byte Coded Graphic Character Sets -- Part 1: Latin 3916 Alphabet No. 1, ISO 8859-1:1987. Part 2: Latin alphabet 3917 No. 2, ISO 8859-2, 1987. Part 3: Latin alphabet No. 3, 3918 ISO 8859-3, 1988. Part 4: Latin alphabet No. 4, ISO 3919 8859-4, 1988. Part 5: Latin/Cyrillic alphabet, ISO 3920 8859-5, 1988. Part 6: Latin/Arabic alphabet, ISO 8859-6, 3921 1987. Part 7: Latin/Greek alphabet, ISO 8859-7, 1987. 3922 Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: 3923 Latin alphabet No. 5, ISO 8859-9, 1990. 3925 [ISO-646] 3926 International Standard -- Information Processing -- ISO 3927 7-bit Coded Character Set For Information Interchange, 3928 ISO 646:1983. 3930 [MPEG] 3931 Video Coding Draft Standard ISO 11172 CD, ISO 3932 IEC/TJC1/SC2/WG11 (Motion Picture Experts Group), May, 3933 1991. 3935 [PCM] 3936 CCITT, Fascicle III.4 - Recommendation G.711, "Pulse Code 3937 Modulation (PCM) of Voice Frequencies", Geneva, 1972. 3939 [POSTSCRIPT] 3940 Adobe Systems, Inc., PostScript Language Reference 3941 Manual, Addison-Wesley, 1985. 3943 [POSTSCRIPT2] 3944 Adobe Systems, Inc., PostScript Language Reference 3945 Manual, Addison-Wesley, Second Edition, 1990. 3947 [RFC-783] 3948 Sollins, K.R., "TFTP Protocol (revision 2)", RFC-783, 3949 MIT, June 1981. 3951 [RFC-821] 3952 Postel, J.B., "Simple Mail Transfer Protocol", STD 10, 3953 RFC 821, USC/Information Sciences Institute, August 1982. 
3955 [RFC-822] 3956 Crocker, D., "Standard for the Format of ARPA Internet 3957 Text Messages", STD 11, RFC 822, UDEL, August 1982. 3959 [RFC-934] 3960 Rose, M., and E. Stefferud, "Proposed Standard for 3961 Message Encapsulation", RFC 934, Delaware and NMA, 3962 January 1985. 3964 [RFC-959] 3965 Postel, J. and J. Reynolds, "File Transfer Protocol", STD 3966 9, RFC 959, USC/Information Sciences Institute, October 3967 1985. 3969 [RFC-1049] 3970 Sirbu, M., "Content-Type Header Field for Internet 3971 Messages", STD 11, RFC 1049, CMU, March 1988. 3973 [RFC-1154] 3974 Robinson, D. and R. Ullmann, "Encoding Header Field for 3975 Internet Messages", RFC 1154, Prime Computer, Inc., April 3976 1990. 3978 [RFC-1341] 3979 Borenstein, N., and N. Freed, "MIME (Multipurpose 3980 Internet Mail Extensions): Mechanisms for Specifying and 3981 Describing the Format of Internet Message Bodies", RFC 3982 1341, Bellcore, Innosoft, June 1992. 3984 [RFC-1342] 3985 Moore, K., "Representation of Non-Ascii Text in Internet 3986 Message Headers", RFC 1342, University of Tennessee, June 3987 1992. 3989 [RFC-1344] 3990 Borenstein, N., "Implications of MIME for Internet Mail 3991 Gateways", RFC 1344, Bellcore, June 1992. 3993 [RFC-1345] 3994 Simonsen, K., "Character Mnemonics & Character Sets", RFC 3995 1345, Rationel Almen Planlaegning, June 1992. 3997 [RFC-1421] 3998 Linn, J., "Privacy Enhancement for Internet Electronic 3999 Mail: Part I -- Message Encryption and Authentication 4000 Procedures", RFC 1421, IAB IRTF PSRG, IETF PEM WG, 4001 February 1993. 4003 [RFC-1422] 4004 Kent, S., "Privacy Enhancement for Internet Electronic 4005 Mail: Part II -- Certificate-Based Key Management", RFC 4006 1422, IAB IRTF PSRG, IETF PEM WG, February 1993. 4008 [RFC-1423] 4009 Balenson, D., "Privacy Enhancement for Internet 4010 Electronic Mail: Part III -- Algorithms, Modes, and 4011 Identifiers", IAB IRTF PSRG, IETF PEM WG, February 1993. 
4013 [RFC-1424] 4014 Kaliski, B., "Privacy Enhancement for Internet Electronic 4015 Mail: Part IV -- Key Certification and Related 4016 Services", IAB IRTF PSRG, IETF PEM WG, February 1993. 4018 [RFC-1521] 4019 Borenstein, N., and N. Freed, "MIME (Multipurpose 4020 Internet Mail Extensions): Mechanisms for Specifying and 4021 Describing the Format of Internet Message Bodies", RFC 4022 1521, Bellcore, Innosoft, September 1993. 4024 [RFC-1522] 4025 Moore, K., "Representation of Non-ASCII Text in Internet 4026 Message Headers", RFC 1522, University of Tennessee, 4027 September 1993. 4029 [RFC-1524] 4030 Borenstein, N., "A User Agent Configuration Mechanism for 4031 Multimedia Mail Format Information", RFC 1524, Bellcore, 4032 September 1993. 4034 [RFC-1563] 4035 Borenstein, N., "The text/enriched MIME Content-type", 4036 RFC 1563, Bellcore, January 1994. 4038 [RFC-1652] 4039 Klensin, J., (WG Chair), Freed, N., (Editor), Rose, M., 4040 Stefferud, E., and Crocker, D., "SMTP Service Extension 4041 for 8bit-MIME transport", RFC 1652, United Nations 4042 University, Innosoft, Dover Beach Consulting, Inc., 4043 Network Management Associates, Inc., The Branch Office, 4044 February 1993. 4046 [RFC-1700] 4047 Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, 4048 RFC 1700, USC/Information Sciences Institute, October 4049 1994. 4051 [RFC-MIME-HEADERS] 4052 Moore, K., "Representation of Non-Ascii Text in Internet 4053 Message Headers", RFC MIME-HEADERS, University of 4054 Tennessee, ?. 4056 [RFC-REG] 4057 Postel, J., "Media Type Registration Procedure", RFC REG, 4058 ?. 4060 [US-ASCII] 4061 Coded Character Set -- 7-Bit American Standard Code for 4062 Information Interchange, ANSI X3.4-1986. 4064 [X400] 4065 Schicker, Pietro, "Message Handling Systems, X.400", 4066 Message Handling Systems and Distributed Applications, E. 4067 Stefferud, O-j. Jacobsen, and P. Schicker, eds., North- 4068 Holland, 1989, pp. 3-41.
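
As an informal illustration of the canonical encoding model in Appendix F, the four steps for a text/plain body might be sketched in Python as follows. This is only a sketch under stated assumptions, not part of the specification: the function names (to_canonical_text, apply_base64, build_entity) are hypothetical, the choice of base64 is arbitrary, and, as Appendix F itself stresses, a real system need not be structured this way so long as its output is consistent with the model.

```python
import base64


def to_canonical_text(local_text: str, charset: str) -> bytes:
    """Step 2: convert local newlines to CRLF, then encode in the charset."""
    # Collapse CRLF and bare CR to LF first so that no line break is doubled.
    unified = local_text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n").encode(charset)


def apply_base64(canonical: bytes) -> str:
    """Step 3: apply the base64 transfer encoding in lines of <= 76 chars."""
    # encodebytes() emits 76-character lines ending in LF; use CRLF instead.
    return base64.encodebytes(canonical).decode("ascii").replace("\n", "\r\n")


def build_entity(local_text: str, charset: str = "iso-8859-1") -> str:
    """Step 4: insert the encoded body into an entity with its headers."""
    body = apply_base64(to_canonical_text(local_text, charset))
    headers = (
        f"Content-Type: text/plain; charset={charset}\r\n"
        "Content-Transfer-Encoding: base64\r\n"
    )
    # A blank line (bare CRLF) separates the header fields from the body.
    return headers + "\r\n" + body


# Step 1 (local form): a string using mixed local newline conventions.
entity = build_entity("caf\u00e9\nline two\r\nline three\r")
```

Decoding the base64 body of the resulting entity yields the canonical form, with every line break represented as CRLF, regardless of which newline convention the local form used.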