idnits 2.17.00 (12 Aug 2021) /tmp/idnits6593/draft-ietf-822ext-mime-imb-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2022-05-20) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand corner of the first page ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. 
** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 100 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. (A line matching the expected section header was found, but with an unexpected indentation: ' RATIONALE: In the absence of any Content-Type' ) ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 1735 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? ** There are 9 instances of too long lines in the document, the longest one being 17 characters in excess of 72. ** The abstract seems to contain references ([RFC-959], [RFC-783], [ISO-8859], [POSTSCRIPT], [RFC-1341], [RFC-1342], [X400], [US-ASCII], [RFC-1049], [ATK], [ISO-646], [RFC-1563], [RFC-1344], [RFC-1345], [RFC-1521], [RFC-821], [RFC-1522], [PCM], [POSTSCRIPT2], [RFC-822], [MPEG], [GIF], [RFC-1426]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. 
** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 345: '...accordance with this document MUST...' RFC 2119 keyword, line 984: '... inclusive, MAY be represented as...' RFC 2119 keyword, line 990: '... MAY be represented...' RFC 2119 keyword, line 991: '...ectively, but MUST NOT be so...' RFC 2119 keyword, line 993: '...rs on an encoded line MUST thus be...' (4 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 16 has weird spacing: '...ment is an I...' == Line 17 has weird spacing: '...working docum...' == Line 18 has weird spacing: '...te that other...' == Line 19 has weird spacing: '... groups may ...' == Line 23 has weird spacing: '... six month...' == (1730 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 1994) is 10232 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'ISO-2022' is defined on line 4152, but no explicit reference was found in the text == Unused Reference: 'RFC-934' is defined on line 4200, but no explicit reference was found in the text == Unused Reference: 'RFC-1421' is defined on line 4211, but no explicit reference was found in the text == Unused Reference: 'RFC-1154' is defined on line 4216, but no explicit reference was found in the text == Unused Reference: 'RFC-1343' is defined on line 4229, but no explicit reference was found in the text == Unused Reference: 'RFC-1340' is defined on line 4252, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'US-ASCII' -- Possible downref: Non-RFC (?) normative reference: ref. 'ATK' -- Possible downref: Non-RFC (?) normative reference: ref. 'GIF' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-2022' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-8859' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-646' -- Possible downref: Non-RFC (?) normative reference: ref. 'MPEG' -- Possible downref: Non-RFC (?) normative reference: ref. 'PCM' -- Possible downref: Non-RFC (?) normative reference: ref. 'POSTSCRIPT' -- Possible downref: Non-RFC (?) normative reference: ref. 'POSTSCRIPT2' -- Possible downref: Non-RFC (?) normative reference: ref. 
'X400' ** Obsolete normative reference: RFC 783 (Obsoleted by RFC 1350) ** Obsolete normative reference: RFC 821 (Obsoleted by RFC 2821) ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822) ** Downref: Normative reference to an Unknown state RFC: RFC 934 ** Downref: Normative reference to an Historic RFC: RFC 1049 ** Downref: Normative reference to an Historic RFC: RFC 1421 ** Obsolete normative reference: RFC 1154 (Obsoleted by RFC 1505) ** Obsolete normative reference: RFC 1341 (Obsoleted by RFC 1521) ** Obsolete normative reference: RFC 1342 (Obsoleted by RFC 1522) ** Downref: Normative reference to an Informational RFC: RFC 1343 ** Downref: Normative reference to an Informational RFC: RFC 1344 ** Downref: Normative reference to an Informational RFC: RFC 1345 ** Obsolete normative reference: RFC 1426 (Obsoleted by RFC 1652) ** Obsolete normative reference: RFC 1522 (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 1340 (Obsoleted by RFC 1700) ** Obsolete normative reference: RFC 1521 (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 1563 (Obsoleted by RFC 1896) Summary: 31 errors (**), 0 flaws (~~), 14 warnings (==), 13 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group N. Borenstein 3 Internet Draft First Virtual Holdings 4 Expires in six months N. Freed, Innosoft 5 May 1994 7 MIME (Multipurpose Internet Mail Extensions) Part One: 9 Mechanisms for Specifying and Describing 10 the Format of Internet Message Bodies 12 14 Status of this Memo 16 This document is an Internet-Draft. Internet-Drafts are 17 working documents of the Internet Engineering Task Force 18 (IETF), its areas, and its working groups. Note that other 19 groups may also distribute working documents as Internet- 20 Drafts. 
22 Internet-Drafts are draft documents valid for a maximum of 23 six months and may be updated, replaced, or obsoleted by 24 other documents at any time. It is inappropriate to use 25 Internet-Drafts as reference material or to cite them other 26 than as ``work in progress.'' 28 To learn the current status of any Internet-Draft, please 29 check the ``1id-abstracts.txt'' listing contained in the 30 Internet-Drafts Shadow Directories on ds.internic.net (US 31 East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West 32 Coast), or munnari.oz.au (Pacific Rim). 34 Abstract 36 STD 11, RFC 822 defines a message representation protocol 37 which specifies considerable detail about message headers, 38 but which leaves the message content, or message body, as 39 flat ASCII text. This document redefines the format of 40 message bodies to allow multi-part textual and non-textual 41 message bodies to be represented and exchanged without loss 42 of information. This is based on earlier work documented 43 in RFC 934, STD 11, and RFC 1049, but extends and revises 44 that work. Because RFC 822 said so little about message 45 bodies, this document is largely orthogonal to (rather than 46 a revision of) RFC 822. 48 In particular, this document is designed to provide 49 facilities to include multiple objects in a single message, 50 to represent body text in character sets other than 51 US-ASCII, to represent formatted multi-font text messages, to 52 represent non-textual material such as images and audio 53 fragments, and generally to facilitate later extensions 54 defining new types of Internet mail for use by cooperating 55 mail agents. 57 This document does NOT extend Internet mail header fields to 58 permit anything other than US-ASCII text data. Such 59 extensions are the subject of a companion document 60 [RFC-1522]. 62 This document is a revision of RFC 1521, which was a 63 revision of RFC 1341. Significant differences from RFC 1521 64 are summarized in Appendix H. 
66 THIS PAGE INTENTIONALLY LEFT BLANK. 68 The table of contents should be inserted after this page. 70 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 72 1 Introduction 74 Since its publication in 1982, RFC 822 [RFC-822] has defined 75 the standard format of textual mail messages on the 76 Internet. Its success has been such that the RFC 822 format 77 has been adopted, wholly or partially, well beyond the 78 confines of the Internet and the Internet SMTP transport 79 defined by RFC 821 [RFC-821]. As the format has seen wider 80 use, a number of limitations have proven increasingly 81 restrictive for the user community. 82 RFC 822 was intended to specify a format for text messages. 83 As such, non-text messages, such as multimedia messages that 84 might include audio or images, are simply not mentioned. 85 Even in the case of text, however, RFC 822 is inadequate for 86 the needs of mail users whose languages require the use of 87 character sets richer than US ASCII [US-ASCII]. Since RFC 88 822 does not specify mechanisms for mail containing audio, 89 video, Asian language text, or even text in most European 90 languages, additional specifications are needed. 92 One of the notable limitations of RFC 821/822 based mail 93 systems is the fact that they limit the contents of 94 electronic mail messages to relatively short lines of 95 seven-bit ASCII. This forces users to convert any non- 96 textual data that they may wish to send into seven-bit bytes 97 representable as printable ASCII characters before invoking 98 a local mail UA (User Agent, a program with which human 99 users send and receive mail). Examples of such encodings 100 currently used in the Internet include pure hexadecimal, 101 uuencode, the 3-in-4 base 64 scheme specified in RFC 1421, 102 the Andrew Toolkit Representation [ATK], and many others. 
104 The limitations of RFC 822 mail become even more apparent as 105 gateways are designed to allow for the exchange of mail 106 messages between RFC 822 hosts and X.400 hosts. X.400 107 [X400] specifies mechanisms for the inclusion of non-textual 108 body parts within electronic mail messages. The current 109 standards for the mapping of X.400 messages to RFC 822 110 messages specify either that X.400 non-textual body parts 111 must be converted to (not encoded in) an ASCII format, or 112 that they must be discarded, notifying the RFC 822 user that 113 discarding has occurred. This is clearly undesirable, as 114 information that a user may wish to receive is lost. Even 115 though a user's UA may not have the capability of dealing 116 with the non-textual body part, the user might have some 117 mechanism external to the UA that can extract useful 119 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 121 information from the body part. Moreover, it does not allow 122 for the fact that the message may eventually be gatewayed 123 back into an X.400 message handling system (i.e., the X.400 124 message is "tunneled" through Internet mail), where the 125 non-textual information would definitely become useful 126 again. 128 This document describes several mechanisms that combine to 129 solve most of these problems without introducing any serious 130 incompatibilities with the existing world of RFC 822 mail. 131 In particular, it describes: 133 1. A MIME-Version header field, which uses a version number 134 to declare a message to be conformant with this 135 specification and allows mail processing agents to 136 distinguish between such messages and those generated 137 by older or non-conformant software, which is presumed 138 to lack such a field. 140 2. 
A Content-Type header field, generalized from RFC 1049 141 [RFC-1049], which can be used to specify the type and 142 subtype of data in the body of a message and to fully 143 specify the native representation (encoding) of such 144 data. 146 2.a. A "text" Content-Type value, which can be used to 147 represent textual information in a number of 148 character sets and formatted text description 149 languages in a standardized manner. 151 2.b. A "multipart" Content-Type value, which can be 152 used to combine several body parts, possibly of 153 differing types of data, into a single message. 155 2.c. An "application" Content-Type value, which can be 156 used to transmit application data or binary data, 157 and hence, among other uses, to implement an 158 electronic mail file transfer service. 160 2.d. A "message" Content-Type value, for encapsulating 161 another mail message. 163 2.e An "image" Content-Type value, for transmitting 164 still image (picture) data. 166 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 168 2.f. An "audio" Content-Type value, for transmitting 169 audio or voice data. 171 2.g. A "video" Content-Type value, for transmitting 172 video or moving image data, possibly with audio as 173 part of the composite video data format. 175 3. A Content-Transfer-Encoding header field, which can be 176 used to specify an auxiliary encoding that was applied 177 to the data in order to allow it to pass through mail 178 transport mechanisms which may have data or character 179 set limitations. 181 4. Two additional header fields that can be used to further 182 describe the data in a message body, the Content-ID and 183 Content-Description header fields. 185 MIME has been carefully designed as an extensible mechanism, 186 and it is expected that the set of content-type/subtype 187 pairs and their associated parameters will grow 188 significantly with time. 
Several other MIME fields, notably 189 including character set names, are likely to have new values 190 defined over time. In order to ensure that the set of such 191 values is developed in an orderly, well-specified, and 192 public manner, MIME defines a registration process which 193 uses the Internet Assigned Numbers Authority (IANA) as a 194 central registry for such values. Appendix E provides 195 details about how IANA registration is accomplished. 197 Finally, to specify and promote interoperability, Appendix A 198 of this document provides a basic applicability statement 199 for a subset of the above mechanisms that defines a minimal 200 level of "conformance" with this document. 202 HISTORICAL NOTE: Several of the mechanisms 203 described in this document may seem somewhat 204 strange or even baroque at first reading. It is 205 important to note that compatibility with existing 206 standards AND robustness across existing practice 207 were two of the highest priorities of the working 208 group that developed this document. In 209 particular, compatibility was always favored over 210 elegance. 212 MIME was first defined and published as RFCs 1341 and 1342 213 [RFC-1341] [RFC-1342], then revised as RFCs 1521 and 1522 217 [RFC-1521] [RFC-1522]. This document is a relatively minor 218 update of RFC 1521, and is intended to supersede it. The 219 differences between this document and RFC 1521 are 220 summarized in Appendix H. Please refer to the current 221 edition of the "IAB Official Protocol Standards" for the 222 standardization state and status of this protocol. 223 Several other RFC documents will be of interest to the MIME 224 implementor, in particular [RFC-1343], [RFC-1344], and 225 [RFC-1345]. 227 2 Notations, Conventions, and Generic BNF Grammar 229 This document is being published in two versions, one as 230 plain ASCII text and one as PostScript(1). 
The latter is 231 recommended, though the textual contents are identical. An 232 Andrew-format copy of this document is also available from 233 the first author (Borenstein). 235 Although the mechanisms specified in this document are all 236 described in prose, most are also described formally in the 237 modified BNF notation of RFC 822. Implementors will need to 238 be familiar with this notation in order to understand this 239 specification, and are referred to RFC 822 for a complete 240 explanation of the modified BNF notation. 242 Some of the modified BNF in this document makes reference to 243 syntactic entities that are defined in RFC 822 and not in 244 this document. A complete formal grammar, then, is obtained 245 by combining the collected grammar appendix of this document 246 with that of RFC 822 plus the modifications to RFC 822 247 defined in RFC 1123, which specifically changes the syntax 248 for `return', `date' and `mailbox'. 250 The term CRLF, in this document, refers to the sequence of 251 the two ASCII characters CR (13) and LF (10) which, taken 252 together, in this order, denote a line break in RFC 822 253 mail. 255 The term "character set" is used in this document to refer 256 to a method used with one or more tables to convert encoded 257 text to a series of octets. This definition is intended to 258 allow various kinds of text encodings, from simple single- 259 table mappings such as ASCII to complex table switching 260 methods such as those that use ISO 2022's techniques. 261 __________ 262 1PostScript is a trademark of Adobe Systems Incorporated. 264 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 266 However, a MIME character set name must fully specify the 267 mapping to be performed. 269 The term "message", when not further qualified, means either 270 the (complete or "top-level") message being transferred on a 271 network, or a message encapsulated in a body of type 272 "message". 
274 The term "body part", in this document, means one of the 275 parts of the body of a multipart entity. A body part has a 276 header and a body, so it makes sense to speak about the body 277 of a body part. 279 The term "entity", in this document, means either a message 280 or a body part. All kinds of entities share the property 281 that they have a header and a body. 283 The term "body", when not further qualified, means the body 284 of an entity, that is the body of either a message or of a 285 body part. 287 NOTE: The previous four definitions are clearly 288 circular. This is unavoidable, since the overall 289 structure of a MIME message is indeed recursive. 291 In this document, all numeric and octet values are given in 292 decimal notation. 294 It must be noted that Content-Type values, subtypes, and 295 parameter names as defined in this document are case- 296 insensitive. However, parameter values are case-sensitive 297 unless otherwise specified for the specific parameter. 299 FORMATTING NOTE: This document has been carefully 300 formatted for ease of reading. The PostScript 301 version of this document, in particular, places 302 notes like this one, which may be skipped by the 303 reader, in a smaller, italicized, font, and 304 indents it as well. In the text version, only the 305 indentation is preserved, so if you are reading 306 the text version of this you might consider using 307 the PostScript version instead. However, all such 308 notes will be indented and preceded by "NOTE:" or 309 some similar introduction, even in the text 310 version. 312 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 314 The primary purpose of these non-essential notes 315 is to convey information about the rationale of 316 this document, or to place this document in the 317 proper historical or evolutionary context. 
Such 318 information may be skipped by those who are 319 focused entirely on building a conformant 320 implementation, but may be of use to those who 321 wish to understand why this document is written as 322 it is. 324 For ease of recognition, all BNF definitions have 325 been placed in a fixed-width font in the 326 PostScript version of this document. 328 3 The MIME-Version Header Field 330 Since RFC 822 was published in 1982, there has really been 331 only one format standard for Internet messages, and there 332 has been little perceived need to declare the format 333 standard in use. This document is an independent document 334 that complements RFC 822. Although the extensions in this 335 document have been defined in such a way as to be compatible 336 with RFC 822, there are still circumstances in which it 337 might be desirable for a mail-processing agent to know 338 whether a message was composed with the new standard in 339 mind. 341 Therefore, this document defines a new header field, "MIME- 342 Version", which is to be used to declare the version of the 343 Internet message body format standard in use. 345 Messages composed in accordance with this document MUST 346 include such a header field, with the following verbatim 347 text: 349 MIME-Version: 1.0 351 The presence of this header field is an assertion that the 352 message has been composed in compliance with this document. 354 Since it is possible that a future document might extend the 355 message format standard again, a formal BNF is given for the 356 content of the MIME-Version field: 358 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 360 version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT 362 Thus, future format specifiers, which might replace or 363 extend "1.0", are constrained to be two integer fields, 364 separated by a period. If a message is received with a 365 MIME-version value other than "1.0", it cannot be assumed to 366 conform with this specification. 
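The MIME-Version grammar above can be checked mechanically. The following Python sketch is illustrative only and is not part of this specification. The names parse_mime_version and is_mime_1_0 are hypothetical, and the comment handling is a simplifying assumption: only non-nested RFC 822 comments are removed before the value is matched.

```python
import re

# Field value per the draft's BNF: 1*DIGIT "." 1*DIGIT.
# Optional white space around the tokens is tolerated (an assumption).
_VERSION_RE = re.compile(r"^\s*([0-9]+)\s*\.\s*([0-9]+)\s*$")

# Simplification: strips only non-nested RFC 822 comments "( ... )".
_COMMENT_RE = re.compile(r"\([^()]*\)")


def parse_mime_version(field_value):
    """Return (major, minor) for a MIME-Version field value, or None."""
    without_comments = _COMMENT_RE.sub("", field_value)
    match = _VERSION_RE.match(without_comments)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_mime_1_0(field_value):
    """True only when the field declares exactly version 1.0."""
    return parse_mime_version(field_value) == (1, 0)
```

A reader following this sketch would treat any result other than (1, 0) as a message that cannot be assumed to conform with this specification, and might warn the user in that case.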
368 Note that the MIME-Version header field is required at the 369 top level of a message. It is not required for each body 370 part of a multipart entity. It is required for the embedded 371 headers of a body of type "message" if and only if the 372 embedded message is itself claimed to be MIME-conformant. 374 It is not possible to fully specify how a mail reader that 375 conforms with MIME as defined in this document should treat 376 a message that might arrive in the future with some value of 377 MIME-Version other than "1.0". However, conformant 378 software is encouraged to check the version number and at 379 least warn the user if an unrecognized MIME-version is 380 encountered. 382 It is also worth noting that version control for specific 383 content-types is not accomplished using the MIME-Version 384 mechanism. In particular, some formats (such as 385 application/postscript) have version numbering conventions 386 that are internal to the document format. Where such 387 conventions exist, MIME does nothing to supersede them. 388 Where no such conventions exist, a MIME type might use a 389 "version" parameter in the content-type field if necessary. 391 NOTE TO IMPLEMENTORS: All header fields defined in this 392 document, including MIME-Version, Content-type, etc., are 393 subject to the general syntactic rules for header fields 394 specified in RFC 822. In particular, all can include 395 comments, which means that the following two MIME-Version 396 fields are equivalent: 398 MIME-Version: 1.0 399 MIME-Version: 1.0 (Generated by GBD-killer 3.7) 401 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 403 4 The Content-Type Header Field 405 The purpose of the Content-Type field is to describe the 406 data contained in the body fully enough that the receiving 407 user agent can pick an appropriate agent or mechanism to 408 present the data to the user, or otherwise deal with the 409 data in an appropriate manner. 
411 HISTORICAL NOTE: The Content-Type header field 412 was first defined in RFC 1049. RFC 1049 Content- 413 types used a simpler and less powerful syntax, but 414 one that is largely compatible with the mechanism 415 given here. 417 The Content-Type header field is used to specify the nature 418 of the data in the body of an entity, by giving type and 419 subtype identifiers, and by providing auxiliary information 420 that may be required for certain types. After the type and 421 subtype names, the remainder of the header field is simply a 422 set of parameters, specified in an attribute/value notation. 423 The set of meaningful parameters differs for the different 424 types. In particular, there are NO globally-meaningful 425 parameters that apply to all content-types. Global 426 mechanisms are best addressed, in the MIME model, by the 427 definition of additional Content-* header fields. The 428 ordering of parameters is not significant. Among the 429 defined parameters is a "charset" parameter by which the 430 character set used in the body may be declared. Comments 431 are allowed in accordance with RFC 822 rules for structured 432 header fields. 434 In general, the top-level Content-Type is used to declare 435 the general type of data, while the subtype specifies a 436 specific format for that type of data. Thus, a Content-Type 437 of "image/xyz" is enough to tell a user agent that the data 438 is an image, even if the user agent has no knowledge of the 439 specific image format "xyz". Such information can be used, 440 for example, to decide whether or not to show a user the raw 441 data from an unrecognized subtype -- such an action might be 442 reasonable for unrecognized subtypes of text, but not for 443 unrecognized subtypes of image or audio. For this reason, 444 registered subtypes of audio, image, text, and video, should 445 not contain embedded information that is really of a 446 different type. 
Such compound types should be represented 447 using the "multipart" or "application" types. 449 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 451 Parameters are modifiers of the content-subtype, and do not 452 fundamentally affect the requirements of the host system. 453 Although most parameters make sense only with certain 454 content-types, others are "global" in the sense that they 455 might apply to any subtype. For example, the "boundary" 456 parameter makes sense only for the "multipart" content-type, 457 but the "charset" parameter might make sense with several 458 content-types. 460 An initial set of seven Content-Types is defined by this 461 document. This set of top-level names is intended to be 462 substantially complete. It is expected that additions to 463 the larger set of supported types can generally be 464 accomplished by the creation of new subtypes of these 465 initial types. In the future, more top-level types may be 466 defined only by an extension to this standard. If another 467 primary type is to be used for any reason, it must be given 468 a name starting with "X-" to indicate its non-standard 469 status and to avoid a potential conflict with a future 470 official name. 
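The role of the top-level type can be illustrated with a short Python sketch. It is a hedged example under stated assumptions, not a required algorithm: the function names, the three status labels, and the policy of showing raw data only for "text" are hypothetical choices made for illustration, based on the seven initial types and the "X-" naming rule described above.

```python
# The seven initial top-level Content-Types named by this document.
STANDARD_TYPES = frozenset(
    ["application", "audio", "image", "message", "multipart", "text", "video"]
)


def top_level_type(content_type):
    """Return the lowercased top-level type of a 'type/subtype' string."""
    return content_type.split("/", 1)[0].strip().lower()


def type_status(content_type):
    """Classify the top-level type as 'standard', 'private', or 'unknown'."""
    top = top_level_type(content_type)
    if top in STANDARD_TYPES:
        return "standard"
    if top.startswith("x-"):
        # Names starting with "X-" mark non-standard primary types.
        return "private"
    return "unknown"


def may_show_raw_data(content_type):
    """Illustrative policy: raw display is reasonable only for text."""
    return top_level_type(content_type) == "text"
```

With this sketch, a body labelled "image/xyz" is recognized as an image even though the subtype is unknown, and the raw data would not be shown to the user.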
472 In the Augmented BNF notation of RFC 822, a Content-Type 473 header field value is defined as follows: 475 content := "Content-Type" ":" type "/" subtype 476 *(";" parameter) 477 ; case-insensitive matching of type and subtype 479 type := "application" / "audio" 480 / "image" / "message" 481 / "multipart" / "text" 482 / "video" / extension-token 483 ; All values case-insensitive 485 extension-token := x-token / iana-token 487 iana-token := <a publicly-defined extension token, registered with IANA, as specified in Appendix E> 491 x-token := <The two characters "X-" or "x-" followed, with no intervening white space, by any token> 495 subtype := token ; case-insensitive 499 parameter := attribute "=" value 501 attribute := token ; case-insensitive 503 value := token / quoted-string 505 token := 1*<any (ASCII) CHAR except SPACE, CTLs, or tspecials> 508 tspecials := "(" / ")" / "<" / ">" / "@" 509 / "," / ";" / ":" / "\" / <"> 510 / "/" / "[" / "]" / "?" / "=" 511 ; Must be in quoted-string, 512 ; to use within parameter values 514 Note that the definition of "tspecials" is the same as the 515 RFC 822 definition of "specials" with the addition of the 516 three characters "/", "?", and "=", and the removal of ".". 518 Note also that a subtype specification is MANDATORY. There 519 are no default subtypes. 521 The type, subtype, and parameter names are not case 522 sensitive. For example, TEXT, Text, and TeXt are all 523 equivalent. Parameter values are normally case sensitive, 524 but certain parameters are interpreted to be 525 case-insensitive, depending on the intended use. (For example, 526 multipart boundaries are case-sensitive, but the 527 "access-type" for message/External-body is not case-sensitive.) 529 Note that the value of a quoted string parameter does not 530 include the quotes. That is, the quotation marks in a 531 quoted-string are not a part of the value of an object, but 532 are merely used to delimit that object. Thus the following 533 two forms: 535 Content-type: text/plain; charset=us-ascii 536 Content-type: text/plain; charset="us-ascii" 538 are completely equivalent. 
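The Content-Type syntax described above can be sketched as a small parser. This Python example is illustrative and makes several assumptions that are not part of this specification: the name parse_content_type is hypothetical, the input is the field value only (without the "Content-Type:" prefix), white space is tolerated around ";" and "=", and RFC 822 comments are not handled.

```python
import re

# token: printable ASCII excluding SPACE, CTLs, and the tspecials.
_TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
# quoted-string: text in double quotes, with backslash quoted-pairs.
_QUOTED = r'"(?:[^"\\\r]|\\.)*"'

_TYPE_RE = re.compile(r"\s*(" + _TOKEN + r")/(" + _TOKEN + r")\s*")
_PARAM_RE = re.compile(
    r";\s*(" + _TOKEN + r")\s*=\s*(" + _TOKEN + r"|" + _QUOTED + r")\s*"
)


def parse_content_type(value):
    """Parse a Content-Type field value into (type, subtype, params)."""
    head = _TYPE_RE.match(value)
    if head is None:
        # A subtype is mandatory; there are no default subtypes.
        raise ValueError("expected type/subtype")
    # Type and subtype are case-insensitive, so normalize to lowercase.
    ctype, subtype = head.group(1).lower(), head.group(2).lower()
    params = {}
    pos = head.end()
    while pos < len(value):
        param = _PARAM_RE.match(value, pos)
        if param is None:
            raise ValueError("malformed parameter at offset %d" % pos)
        name, raw = param.group(1).lower(), param.group(2)
        if raw.startswith('"'):
            # The quotes delimit the value and are not part of it.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw  # parameter values keep their case
        pos = param.end()
    return ctype, subtype, params
```

Under these assumptions the quoted and unquoted "charset" forms parse to the same result, which is the equivalence the text describes. Parameter values are returned unchanged in case, so a caller that knows a given parameter is case-insensitive must fold it itself.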
540 Beyond this syntax, the only constraint on the definition of 541 subtype names is the desire that their uses must not 542 conflict. That is, it would be undesirable to have two 543 different communities using "Content-Type: 544 application/foobar" to mean two different things. The 546 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 548 process of defining new content-subtypes, then, is not 549 intended to be a mechanism for imposing restrictions, but 550 simply a mechanism for publicizing the usages. There are, 551 therefore, two acceptable mechanisms for defining new 552 Content-Type subtypes: 554 1. Private values (starting with "X-") may be 555 defined bilaterally between two cooperating 556 agents without outside registration or 557 standardization. 559 2. New standard values must be documented, 560 registered with, and approved by IANA, as 561 described in Appendix E. Where intended for 562 public use, the formats they refer to must 563 also be defined by a published specification, 564 and possibly offered for standardization. 566 The seven standard initial predefined Content-Types are 567 detailed in the bulk of this document. They are: 569 text -- textual information. The primary subtype, 570 "plain", indicates plain (unformatted) text. No 571 special software is required to get the full 572 meaning of the text, aside from support for the 573 indicated character set. Subtypes are to be used 574 for enriched text in forms where application 575 software may enhance the appearance of the text, 576 but such software must not be required in order to 577 get the general idea of the content. Possible 578 subtypes thus include any readable word processor 579 format. A very simple and portable subtype, 580 richtext, was defined in RFC 1341 [RFC-1341], with 581 a further revision in RFC 1563 [RFC-1563]. 582 multipart -- data consisting of multiple parts of 583 independent data types. 
          Four initial subtypes are defined, including the
          primary "mixed" subtype, "alternative" for
          representing the same data in multiple formats,
          "parallel" for parts intended to be viewed
          simultaneously, and "digest" for multipart entities
          in which each part is of type "message".
     message -- an encapsulated message. A body of Content-Type
          "message" is itself all or part of a fully formatted
          RFC 822 conformant message which may contain its own
          different Content-Type header field. The primary
          subtype is "rfc822". The "partial" subtype is defined
          for partial messages, to permit the fragmented
          transmission of bodies that are thought to be too
          large to be passed through mail transport facilities.
          Another subtype, "External-body", is defined for
          specifying large bodies by reference to an external
          data source.
     image -- image data. Image requires a display device (such
          as a graphical display, a printer, or a FAX machine)
          to view the information. Initial subtypes are defined
          for two widely-used image formats, jpeg and gif.
     audio -- audio data, with initial subtype "basic". Audio
          requires an audio output device (such as a speaker or
          a telephone) to "display" the contents.
     video -- video data. Video requires the capability to
          display moving images, typically including
          specialized hardware and software. The initial
          subtype is "mpeg".
     application -- some other kind of data, typically either
          uninterpreted binary data or information to be
          processed by a mail-based application. The primary
          subtype, "octet-stream", is to be used in the case of
          uninterpreted binary data, in which case the simplest
          recommended action is to offer to write the
          information into a file for the user. An additional
          subtype, "PostScript", is defined for transporting
          PostScript documents in bodies.
          Other expected uses for "application" include
          spreadsheets, data for mail-based scheduling systems,
          and languages for "active" (computational) email.
          (Note that active email and other application data
          may entail several security considerations, which are
          discussed later in this memo, particularly in the
          context of application/PostScript.)

Default RFC 822 messages are typed by this protocol as plain text in
the US-ASCII character set, which can be explicitly specified as
"Content-type: text/plain; charset=us-ascii". If no Content-Type is
specified, this default is assumed. In the presence of a MIME-Version
header field, a receiving User Agent can also assume that plain
US-ASCII text was the sender's intent. In the absence of a
MIME-Version specification, plain US-ASCII text must still be
assumed, but the sender's intent might have been otherwise.

     RATIONALE: In the absence of any Content-Type header field
     or MIME-Version header field, it is impossible to be
     certain that a message is actually text in the US-ASCII
     character set, since it might well be a message that,
     using the conventions that predate this document, includes
     text in another character set or non-textual data in a
     manner that cannot be automatically recognized (e.g., a
     uuencoded compressed UNIX tar file). Although there is no
     fully acceptable alternative to treating such untyped
     messages as "text/plain; charset=us-ascii", implementors
     should remain aware that if a message lacks both the
     MIME-Version and the Content-Type header fields, it may in
     practice contain almost anything.

It should be noted that the list of Content-Type values given here
may be augmented in time, via the mechanisms described above, and
that the set of subtypes is expected to grow substantially.
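The defaulting behaviour described above for messages without a
Content-Type field can be sketched as follows. This is only an
illustration using the Python standard library "email" package (an
assumption of this example); the function name effective_type and
the sample messages are hypothetical.

```python
from email import message_from_string


def effective_type(raw_message):
    """Return (content type, charset, has MIME-Version) for a raw message."""
    msg = message_from_string(raw_message)
    # get_content_type() falls back to "text/plain" when the header is absent.
    ctype = msg.get_content_type()
    charset = None
    if ctype.startswith("text/"):
        # The second argument is returned when no charset parameter exists.
        charset = str(msg.get_param("charset", "us-ascii")).lower()
    # A missing MIME-Version means the sender's intent is less certain.
    return ctype, charset, "MIME-Version" in msg


untyped = effective_type("Subject: hi\n\nhello\n")
typed = effective_type(
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n\nhello\n"
)
print(untyped, typed)
```

The third element of the result lets a caller treat the default with
more caution when no MIME-Version field is present, in the spirit of
the RATIONALE above.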
When a mail reader encounters mail with an unknown Content-type
value, it should generally treat it as equivalent to
"application/octet-stream", as described later in this document.

5 The Content-Transfer-Encoding Header Field

Many Content-Types which could usefully be transported via email are
represented, in their "natural" format, as 8-bit character or binary
data. Such data cannot be transmitted over some transport protocols.
For example, RFC 821 restricts mail messages to 7-bit US-ASCII data
with lines no longer than 1000 characters.

It is necessary, therefore, to define a standard mechanism for
re-encoding such data into a 7-bit short-line format. This document
specifies that such encodings will be indicated by a new
"Content-Transfer-Encoding" header field. The
Content-Transfer-Encoding field is used to indicate the type of
transformation that has been used in order to represent the body in
an acceptable manner for transport.

Unlike Content-Types, a proliferation of Content-Transfer-Encoding
values is undesirable and unnecessary. However, establishing only a
single Content-Transfer-Encoding mechanism does not seem possible.
There is a tradeoff between the desire for a compact and efficient
encoding of largely-binary data and the desire for a readable
encoding of data that is mostly, but not entirely, 7-bit data. For
this reason, at least two encoding mechanisms are necessary: a
"readable" encoding and a "dense" encoding.

The Content-Transfer-Encoding field is designed to specify an
invertible mapping between the "native" representation of a type of
data and a representation that can be readily exchanged using 7 bit
mail transport protocols, such as those defined by RFC 821 (SMTP).
This field has not been defined by any previous standard.
The field's value is a single token specifying the type of encoding,
as enumerated below. Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" ; case-insensitive
                  / "quoted-printable"
                  / "base64"
                  / "8bit"
                  / "binary"
                  / x-token

These values are not case sensitive. That is, Base64 and BASE64 and
bAsE64 are all equivalent. An encoding type of 7BIT requires that the
body is already in a seven-bit mail-ready representation. This is the
default value -- that is, "Content-Transfer-Encoding: 7BIT" is
assumed if the Content-Transfer-Encoding header field is not present.

The values "8bit", "7bit", and "binary" all mean that NO encoding has
been performed. However, they are potentially useful as indications
of the kind of data contained in the object, and therefore of the
kind of encoding that might need to be performed for transmission in
a given transport system. In particular:

     "7bit" means that the data is all represented as short
          lines of US-ASCII data.
     "8bit" means that the lines are short, but there may be
          non-ASCII characters (octets with the high-order bit
          set).
     "Binary" means that not only may non-ASCII characters be
          present, but also that the lines are not necessarily
          short enough for SMTP transport.

The difference between "8bit" (or any other conceivable bit-width
token) and the "binary" token is that "binary" does not require
adherence to any limits on line length or to the SMTP CRLF semantics,
while the bit-width tokens do require such adherence. If the body
contains data in any bit-width other than 7-bit, the appropriate
bit-width Content-Transfer-Encoding token must be used (e.g., "8bit"
for unencoded 8 bit wide data). If the body contains binary data, the
"binary" Content-Transfer-Encoding token must be used.
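The three unencoded labels can be told apart mechanically. The
following Python sketch is one possible reading of the definitions
above, not a normative algorithm. The name classify_body is
hypothetical, and the 998-octet limit is an assumption: it reads the
1000-character RFC 821 line limit as including the trailing CRLF.

```python
MAX_LINE = 998  # assumption: 1000-octet limit minus the trailing CRLF


def classify_body(data: bytes) -> str:
    """Suggest "7bit", "8bit" or "binary" for an unencoded body."""
    lines = data.split(b"\r\n")
    short_lines = all(len(line) <= MAX_LINE for line in lines)
    # A CR or LF outside a CRLF pair breaks the SMTP CRLF semantics.
    bare_breaks = any(b"\r" in line or b"\n" in line for line in lines)
    if not short_lines or bare_breaks:
        return "binary"
    if any(octet > 127 for octet in data):
        return "8bit"  # short lines, but high-order bits are set
    return "7bit"


print(classify_body(b"hello\r\nworld\r\n"))
print(classify_body(b"caf\xe9\r\n"))
print(classify_body(b"x" * 2000))
```

The three calls print 7bit, 8bit and binary respectively. Whether
"8bit" or "binary" data may actually be sent depends on the
transport, as the surrounding text explains.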
     NOTE: The distinction between the
     Content-Transfer-Encoding values of "binary", "8bit", etc.
     may seem unimportant, in that all of them really mean
     "none" -- that is, there has been no encoding of the data
     for transport. However, clear labeling will be of enormous
     value to gateways between future mail transport systems
     with differing capabilities in transporting data that do
     not meet the restrictions of RFC 821 transport.

     Mail transport for unencoded 8-bit data is defined in
     RFC-1426 [RFC-1426]. As of the publication of this
     document, there are no standardized Internet mail
     transports for which it is legitimate to include unencoded
     binary data in mail bodies. Thus there are no
     circumstances in which the "binary"
     Content-Transfer-Encoding is actually legal on the
     Internet. However, in the event that binary mail transport
     becomes a reality in Internet mail, or when this document
     is used in conjunction with any other binary-capable
     transport mechanism, binary bodies should be labeled as
     such using this mechanism.

     NOTE: The five values defined for the
     Content-Transfer-Encoding field imply nothing about the
     Content-Type other than the algorithm by which it was
     encoded or the transport system requirements if unencoded.

Implementors may, if necessary, define new Content-Transfer-Encoding
values, but must use an x-token, which is a name prefixed by "X-" to
indicate its non-standard status, e.g., "Content-Transfer-Encoding:
x-my-new-encoding". However, unlike Content-Types and subtypes, the
creation of new Content-Transfer-Encoding values is explicitly and
strongly discouraged, as it seems likely to hinder interoperability
with little potential benefit. Their use is allowed only as the
result of an agreement between cooperating user agents.
If a Content-Transfer-Encoding header field appears as part of a
message header, it applies to the entire body of that message. If a
Content-Transfer-Encoding header field appears as part of a body
part's headers, it applies only to the body of that body part. If an
entity is of type "multipart" or "message", the
Content-Transfer-Encoding is not permitted to have any value other
than a bit width (e.g., "7bit", "8bit", etc.) or "binary".

It should be noted that email is character-oriented, so that the
mechanisms described here are mechanisms for encoding arbitrary octet
streams, not bit streams. If a bit stream is to be encoded via one of
these mechanisms, it must first be converted to an 8-bit byte stream
using the network standard bit order ("big-endian"), in which the
earlier bits in a stream become the higher-order bits in a byte. A
bit stream not ending at an 8-bit boundary must be padded with
zeroes. This document provides a mechanism for noting the addition of
such padding in the case of the application Content-Type, which has a
"padding" parameter.

The encoding mechanisms defined here explicitly encode all data in
ASCII. Thus, for example, suppose an entity has header fields such
as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64 ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set again after decoding.

The following sections will define the two standard encoding
mechanisms. The definition of new content-transfer-encodings is
explicitly discouraged and should only occur when absolutely
necessary. All content-transfer-encoding namespace except that
beginning with "X-" is explicitly reserved to the IANA for future
use.
Private agreements about content-transfer-encodings are also
explicitly discouraged.

Certain Content-Transfer-Encoding values may only be used on certain
Content-Types. In particular, it is expressly forbidden to use any
encodings other than "7bit", "8bit", or "binary" with any
Content-Type that recursively includes other Content-Type fields,
notably the "multipart" and "message" Content-Types. All encodings
that are desired for bodies of type multipart or message must be done
at the innermost level, by encoding the actual body that needs to be
encoded.

It should also be noted that, by definition, if a "multipart" or
"message" entity has a transfer-encoding value such as "7bit", but
one of the enclosed parts has a less restrictive value such as
"8bit", then either the outer "7bit" labelling is in error, because 8
bit data are included, or the inner "8bit" labelling placed an
unnecessarily high demand on the transport system because the actual
included data were actually 7bit-safe.

     NOTE ON ENCODING RESTRICTIONS: Though the prohibition
     against using content-transfer-encodings on data of type
     multipart or message may seem overly restrictive, it is
     necessary to prevent nested encodings, in which data are
     passed through an encoding algorithm multiple times, and
     must be decoded multiple times in order to be properly
     viewed. Nested encodings add considerable complexity to
     user agents: aside from the obvious efficiency problems
     with such multiple encodings, they can obscure the basic
     structure of a message. In particular, they can imply that
     several decoding operations are necessary simply to find
     out what types of objects a message contains.
     Banning nested encodings may complicate the job of certain
     mail gateways, but this seems less of a problem than the
     effect of nested encodings on user agents.

     NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
     CONTENT-TRANSFER-ENCODING: It may seem that the
     Content-Transfer-Encoding could be inferred from the
     characteristics of the Content-Type that is to be encoded,
     or, at the very least, that certain
     Content-Transfer-Encodings could be mandated for use with
     specific Content-Types. There are several reasons why this
     is not the case. First, given the varying types of
     transports used for mail, some encodings may be
     appropriate for some Content-Type/transport combinations
     and not for others. (For example, in an 8-bit transport,
     no encoding would be required for text in certain
     character sets, while such encodings are clearly required
     for 7-bit SMTP.)

     Second, certain Content-Types may require different types
     of transfer encoding under different circumstances. For
     example, many PostScript bodies might consist entirely of
     short lines of 7-bit data and hence require little or no
     encoding. Other PostScript bodies (especially those using
     Level 2 PostScript's binary encoding mechanism) may only
     be reasonably represented using a binary transport
     encoding. Finally, since Content-Type is intended to be an
     open-ended specification mechanism, strict specification
     of an association between Content-Types and encodings
     effectively couples the specification of an application
     protocol with a specific lower-level transport. This is
     not desirable since the developers of a Content-Type
     should not have to be aware of all the transports in use
     and what their limitations are.
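The restriction stated earlier in this section, that "multipart" and
"message" entities may carry only "7bit", "8bit", or "binary", lends
itself to a mechanical check. The Python sketch below is one way to
do it with the standard library "email" package (an assumption of
this example); the function name encoding_violations and the sample
entity are hypothetical.

```python
from email.message import Message

UNENCODED = {"7bit", "8bit", "binary"}


def encoding_violations(msg):
    """List (content type, encoding) of composite entities that are encoded."""
    bad = []
    for part in msg.walk():  # visits the entity and all enclosed parts
        # "7bit" is the default when the header field is absent.
        cte = part.get("Content-Transfer-Encoding", "7bit").strip().lower()
        composite = part.get_content_maintype() in ("multipart", "message")
        if composite and cte not in UNENCODED:
            bad.append((part.get_content_type(), cte))
    return bad


outer = Message()
outer["Content-Type"] = "multipart/mixed"
outer["Content-Transfer-Encoding"] = "base64"  # forbidden on a multipart
inner = Message()
inner["Content-Type"] = "text/plain"
inner["Content-Transfer-Encoding"] = "quoted-printable"  # fine on a leaf
outer.attach(inner)
print(encoding_violations(outer))
```

Only the outer entity is reported; the innermost text part is where
the encoding is supposed to be applied.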
     NOTE ON TRANSLATING ENCODINGS: The quoted-printable and
     base64 encodings are designed so that conversion between
     them is possible. The only issue that arises in such a
     conversion is the handling of line breaks. When converting
     from quoted-printable to base64 a line break must be
     converted into a CRLF sequence. Similarly, a CRLF sequence
     in base64 data must be converted to a quoted-printable
     line break, but ONLY when converting text data.

     NOTE ON CANONICAL ENCODING MODEL: There was some
     confusion, in earlier drafts of this memo, regarding the
     model for when email data was to be converted to canonical
     form and encoded, and in particular how this process would
     affect the treatment of CRLFs, given that the
     representation of newlines varies greatly from system to
     system, and the relationship between
     content-transfer-encodings and character sets. For this
     reason, a canonical model for encoding is presented as
     Appendix G.

5.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the ASCII character set. It encodes the data in such a way that the
resulting octets are unlikely to be modified by mail transport. If
the data being encoded are mostly ASCII text, the encoded form of the
data remains largely recognizable by humans. A body which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the integrity
of the data should the message pass through a character-translating,
and/or line-wrapping gateway.
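To give a feel for what such a body looks like before the rules are
stated, the following Python sketch encodes and decodes a short
sample. It uses the standard library "quopri" module, which
implements a later revision of this encoding; treating it as
representative of this draft is an assumption. The sample strings
are illustrative, the second one being the sentence this section
uses for soft line breaks.

```python
import quopri

# An octet above 127 and a literal "=" must both be written as "=XX".
raw = b"caf\xe9 = 1 cup\n"
encoded = quopri.encodestring(raw)
decoded = quopri.decodestring(encoded)

# An "=" at the very end of an encoded line is a soft line break and
# disappears on decoding, joining the pieces back into one line.
soft = b"Now's the time =\nfor all folk to come=\n to the aid of their country."
joined = quopri.decodestring(soft)

print(encoded)
print(joined)
```

Most of the sample stays readable after encoding, which is the
property that distinguishes this encoding from base64.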
In this encoding, octets are to be represented as determined by the
following rules:

     Rule #1: (General 8-bit representation) Any octet, except
     those indicating a line break according to the newline
     convention of the canonical (standard) form of the data
     being encoded, may be represented by an "=" followed by a
     two digit hexadecimal representation of the octet's value.
     The digits of the hexadecimal alphabet, for this purpose,
     are "0123456789ABCDEF". Uppercase letters must be used
     when sending hexadecimal data, though a robust
     implementation may choose to recognize lowercase letters
     on receipt. Thus, for example, the value 12 (ASCII form
     feed) can be represented by "=0C", and the value 61 (ASCII
     EQUAL SIGN) can be represented by "=3D". Except when the
     following rules allow an alternative encoding, this rule
     is mandatory.

     Rule #2: (Literal representation) Octets with decimal
     values of 33 through 60 inclusive, and 62 through 126,
     inclusive, MAY be represented as the ASCII characters
     which correspond to those octets (EXCLAMATION POINT
     through LESS THAN, and GREATER THAN through TILDE,
     respectively).

     Rule #3: (White Space): Octets with values of 9 and 32 MAY
     be represented as ASCII TAB (HT) and SPACE characters,
     respectively, but MUST NOT be so represented at the end of
     an encoded line. Any TAB (HT) or SPACE characters on an
     encoded line MUST thus be followed on that line by a
     printable character. In particular, an "=" at the end of
     an encoded line, indicating a soft line break (see rule
     #5) may follow one or more TAB (HT) or SPACE characters.
     It follows that an octet with value 9 or 32 appearing at
     the end of an encoded line must be represented according
     to Rule #1.
     This rule is necessary because some MTAs (Message
     Transport Agents, programs which transport messages from
     one user to another, or perform a part of such transfers)
     are known to pad lines of text with SPACEs, and others are
     known to remove "white space" characters from the end of a
     line. Therefore, when decoding a Quoted-Printable body,
     any trailing white space on a line must be deleted, as it
     will necessarily have been added by intermediate transport
     agents.

     Rule #4 (Line Breaks): A line break in a text body,
     independent of what its representation is following the
     canonical representation of the data being encoded, must
     be represented by a (RFC 822) line break, which is a CRLF
     sequence, in the Quoted-Printable encoding. Since the
     canonical representation of types other than text does not
     generally include the representation of line breaks, no
     hard line breaks (i.e. line breaks that are intended to be
     meaningful and to be displayed to the user) should occur
     in the quoted-printable encoding of such types. Of course,
     occurrences of "=0D", "=0A", "=0A=0D" and "=0D=0A" will
     eventually be encountered. In general, however, base64 is
     preferred over quoted-printable for binary data.

     Note that many implementations may elect to encode the
     local representation of various content types directly, as
     described in Appendix G. In particular, this may apply to
     plain text material on systems that use newline
     conventions other than CRLF delimiters. Such an
     implementation is permissible, but the generation of line
     breaks must be generalized to account for the case where
     alternate representations of newline sequences are used.

     Rule #5 (Soft Line Breaks): The Quoted-Printable encoding
     REQUIRES that encoded lines be no more than 76 characters
     long.
     If longer lines are to be encoded with the
     Quoted-Printable encoding, 'soft' line breaks must be
     used. An equal sign as the last character on an encoded
     line indicates such a non-significant ('soft') line break
     in the encoded text. Thus if the "raw" form of the line is
     a single unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

     This can be represented, in the Quoted-Printable encoding,
     as

          Now's the time =
          for all folk to come=
           to the aid of their country.

     This provides a mechanism with which long lines are
     encoded in such a way as to be restored by the user agent.
     The 76 character limit does not count the trailing CRLF,
     but counts all other characters, including any equal
     signs.

Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body in a multipart entity, to ensure that
the encapsulation boundary does not appear anywhere in the encoded
body. (A good strategy is to choose a boundary that includes a
character sequence such as "=_" which can never appear in a
quoted-printable body. See the definition of multipart messages later
in this document.)

     NOTE: The quoted-printable encoding represents something
     of a compromise between readability and reliability in
     transport. Bodies encoded with the quoted-printable
     encoding will work reliably over most mail gateways, but
     may not work perfectly over a few gateways, notably those
     involving translation into EBCDIC. (In theory, an EBCDIC
     gateway could decode a quoted-printable body and re-encode
     it using base64, but such gateways do not yet exist.) A
     higher level of confidence is offered by the base64
     Content-Transfer-Encoding.
     A way to get reasonably reliable transport through EBCDIC
     gateways is to also quote the ASCII characters

          !"#$@[\]^`{|}~

     according to rule #1. See Appendix B for more information.

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the representation of the
breaks between the lines of quoted printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.

WARNING TO IMPLEMENTORS: If binary data are encoded in
quoted-printable, care must be taken to encode CR and LF characters
as "=0D" and "=0A", respectively. In particular, a CRLF sequence in
binary data should be encoded as "=0D=0A". Otherwise, if CRLF were
represented as a hard line break, it might be incorrectly decoded on
platforms with different line break conventions.

For formalists, the syntax of quoted-printable data is described by
the following grammar:

     quoted-printable := ([*(ptext / SPACE / TAB) ptext] ["="] CRLF)
          ; Maximum line length of 76 characters excluding CRLF

     ptext := octet / <any ASCII character except "=", SPACE, or TAB>
          ; characters not listed as "mail-safe" in Appendix B
          ; are also not recommended.

     octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
          ; octet must be used for characters > 127, =, SPACE, or
          ; TAB, and is recommended for any characters not listed
          ; in Appendix B as "mail-safe".

5.2 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable.
The encoding and decoding algorithms are simple, but the encoded data
are consistently only about 33 percent larger than the unencoded
data. This encoding is virtually identical to the one used in Privacy
Enhanced Mail (PEM) applications, as defined in RFC 1421. The base64
encoding is adapted from RFC 1421, with one change: base64 eliminates
the "*" mechanism for embedded clear text.

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character. (The extra 65th character, "=",
is used to signify a special processing function.)

     NOTE: This subset has the important property that it is
     represented identically in all versions of ISO 646,
     including US ASCII, and all characters in the subset are
     also represented identically in all versions of EBCDIC.
     Other popular encodings, such as the encoding used by the
     uuencode utility and the base85 encoding specified as part
     of Level 2 PostScript, do not share these properties, and
     thus do not fulfill the portability requirements a binary
     transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first byte, and the eighth bit will be the low-order bit in the
first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable
characters.
The character referenced by the index is placed in the output string.
These characters, identified in Table 1, below, are selected so as to
be universally representable, and the set excludes characters with
particular significance to SMTP (e.g., ".", CR, LF) and to the
encapsulation boundaries defined in this document (e.g., "-").

                  Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

The output stream (encoded bytes) must be represented in lines of no
more than 76 characters each. All line breaks or other characters not
found in Table 1 must be ignored by decoding software. In base64
data, characters other than those in Table 1, line breaks, and other
white space probably indicate a transmission error, about which a
warning message or even a message rejection might be appropriate
under some circumstances.

Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body. When fewer than 24 input bits
are available in an input group, zero bits are added (on the right)
to form an integral number of 6-bit groups. Padding at the end of the
data is performed using the '=' character.
Since all base64 input is an integral number of octets, only the
following cases can arise: (1) the final quantum of encoding input is
an integral multiple of 24 bits; here, the final unit of encoded
output will be an integral multiple of 4 characters with no "="
padding, (2) the final quantum of encoding input is exactly 8 bits;
here, the final unit of encoded output will be two characters
followed by two "=" padding characters, or (3) the final quantum of
encoding input is exactly 16 bits; here, the final unit of encoded
output will be three characters followed by one "=" padding
character.

Because it is used only for padding at the end of the data, the
occurrence of any '=' characters may be taken as evidence that the
end of the data has been reached (without truncation in transit). No
such assurance is possible, however, when the number of octets
transmitted was a multiple of three.

Any characters outside of the base64 alphabet are to be ignored in
base64-encoded data. The same applies to any illegal sequence of
characters in the base64 encoding, such as "=====".

Care must be taken to use the proper octets for line breaks if base64
encoding is applied directly to text material that has not been
converted to canonical form. In particular, text line breaks must be
converted into CRLF sequences prior to base64 encoding. The important
thing to note is that this may be done directly by the encoder rather
than in a prior canonicalization step in some implementations.

     NOTE: There is no need to worry about quoting apparent
     encapsulation boundaries within base64-encoded parts of
     multipart entities because no hyphen characters are used
     in the base64 encoding.
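The encoding procedure of this section (24-bit input groups, four
6-bit indexes into the alphabet of Table 1, "=" padding, and output
lines of at most 76 characters) can be sketched in Python as follows.
This is a minimal illustration rather than a reference
implementation; the names ALPHABET and b64_encode are hypothetical,
and the standard library "base64" module is used only to cross-check
the result.

```python
import base64

# The 64 characters of Table 1, in index order.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode octets as base64 text in CRLF-separated lines of <= 76 chars."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 missing octets in the last group
        # Zero bits are added on the right to complete the 24-bit group.
        group = int.from_bytes(chunk + b"\x00" * pad, "big")
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            chars[4 - pad:] = "=" * pad  # one or two "=" padding characters
        out.append("".join(chars))
    text = "".join(out)
    return "\r\n".join(text[i:i + 76] for i in range(0, len(text), 76))


sample = bytes(range(256))
ours = b64_encode(sample)
reference = base64.b64encode(sample).decode("ascii")
print(ours.replace("\r\n", "") == reference)
print(b64_encode(b"f"), b64_encode(b"fo"), b64_encode(b"foo"))
```

The last line shows the three padding cases enumerated above: an
8-bit final quantum gives two "=" characters, a 16-bit one gives one,
and a full 24-bit quantum gives none.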
6 Additional Content- Header Fields

6.1 Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable to allow
one body to make reference to another. Accordingly, bodies may be
labeled using the "Content-ID" header field, which is syntactically
identical to the "Message-ID" header field:

     id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be generated to be
world-unique.

The Content-ID value may be used for uniquely identifying MIME
entities in several contexts, particularly for cacheing data
referenced by the message/external-body mechanism. Although the
Content-ID header is generally optional, its use is mandatory in
implementations which generate data of the optional MIME Content-type
"message/external-body". That is, each message/external-body entity
must have a Content-ID field to permit cacheing of such data.

It is also worth noting that the Content-ID value has special
semantics in the case of the multipart/alternative content-type. This
is explained in the section of this document dealing with
multipart/alternative.

6.2 Optional Content-Description Header Field

The ability to associate some descriptive information with a given
body is often desirable. For example, it may be useful to mark an
"image" body as "a picture of the Space Shuttle Endeavour." Such text
may be placed in the Content-Description header field.

     description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII character
set, although the mechanism specified in [RFC-1522] may be used for
non-US-ASCII Content-Description values.
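As a small illustration of the two optional fields of this section,
the following Python sketch labels an entity with a Content-ID and a
Content-Description. It assumes the Python standard library "email"
package, whose make_msgid() helper produces Message-ID style values;
the domain "example.com" and the description text are placeholders.

```python
from email.message import Message
from email.utils import make_msgid

part = Message()
part["Content-Type"] = "image/gif"
# make_msgid() returns a value of the form <local-part@domain>, the
# same syntax as a Message-ID, with a random component for uniqueness.
part["Content-ID"] = make_msgid(domain="example.com")
part["Content-Description"] = "an illustrative GIF image"

content_id = part["Content-ID"]
print(content_id)
```

A generator of message/external-body entities would attach such a
Content-ID to every entity it produces, since the field is mandatory
there.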

7 The Predefined Content-Type Values

This document defines seven initial Content-Type values and an extension mechanism for private or experimental types. Further standard types must be defined by new published specifications. It is expected that most innovation in new types of mail will take place as subtypes of the seven types defined here. The most essential characteristics of the seven content-types are summarized in Appendix F.

7.1 The Text Content-Type

The text Content-Type is intended for sending material which is principally textual in form. It is the default Content-Type. A "charset" parameter may be used to indicate the character set of the body text for some text subtypes, notably including the primary subtype, "text/plain", which indicates plain (unformatted) text. The default Content-Type for Internet mail is "text/plain; charset=us-ascii".

Beyond plain text, there are many formats for representing what might be known as "extended text" -- text with embedded formatting and presentation information. An interesting characteristic of many such representations is that they are to some extent readable even without the software that interprets them. It is useful, then, to distinguish them, at the highest level, from such unreadable data as images, audio, or text represented in an unreadable form. In the absence of appropriate interpretation software, it is reasonable to show subtypes of text to the user, while it is not reasonable to do so with most nontextual data.

Such formatted textual data should be represented using subtypes of text. Plausible subtypes of text are typically given by the common name of the representation format, e.g., "text/richtext" [RFC-1341].
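
The default described above can be sketched from the receiving side. The example relies on Python's standard "email" parser, whose get_content_type method falls back to "text/plain" when no Content-Type field is present; the charset default is supplied explicitly by the caller. The function name effective_content_type and both sample messages are illustrative only.

```python
from email import message_from_string


def effective_content_type(raw_message: str):
    """Return (content type, charset), applying the defaults for mail."""
    msg = message_from_string(raw_message)
    # get_content_charset returns its argument when no charset is given,
    # and lower-cases any charset value that is present.
    return msg.get_content_type(), msg.get_content_charset("us-ascii")


untyped = "Subject: hello\n\nJust plain text.\n"
typed = "Content-Type: text/plain; charset=ISO-8859-1\n\nbody\n"
```

With these inputs, the untyped message is reported as text/plain in us-ascii, while the typed one keeps its declared character set.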

7.1.1 The charset parameter

A critical parameter that may be specified in the Content-Type field for text/plain data is the character set. This is specified with a "charset" parameter, as in:

     Content-type: text/plain; charset=us-ascii

Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.

The specification for any future subtypes of "text" must specify whether or not they will also utilize a "charset" parameter, and may possibly restrict its values as well. When used with a particular body, the semantics of the "charset" parameter should be identical to those specified here for "text/plain", i.e., the body consists entirely of characters in the given charset. In particular, definers of future text subtypes should pay close attention to the implications of multibyte character sets for their subtype definitions.

This RFC specifies the definition of the charset parameter for the purposes of MIME to be a unique mapping of a byte stream to glyphs, a mapping which does not require external profiling information.

An initial list of predefined character set names can be found at the end of this section. Additional character sets may be registered with IANA, although the standardization of their use requires the usual IAB review and approval. Note that if the specified character set includes 8-bit data, a Content-Transfer-Encoding header field and a corresponding encoding on the data are required in order to transmit the body via some mail transfer protocols, such as SMTP.

The default character set, US-ASCII, has been the subject of some confusion and ambiguity in the past.
Not only were there some ambiguities in the definition, there have been wide variations in practice. In order to eliminate such ambiguity and variations in the future, it is strongly recommended that new user agents explicitly specify a character set via the Content-Type header field. "US-ASCII" does not indicate an arbitrary seven-bit character code, but specifies that the body uses character coding that uses the exact correspondence of codes to characters specified in ASCII. National use variations of ISO 646 [ISO-646] are NOT ASCII and their use in Internet mail is explicitly discouraged. The omission of the ISO 646 character set is deliberate in this regard. The character set name of "US-ASCII" explicitly refers to ANSI X3.4-1986 [US-ASCII] only. The character set name "ASCII" is reserved and must not be used for any purpose.

     NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier version of the American Standard. Insofar as one of the purposes of specifying a Content-Type and character set is to permit the receiver to unambiguously determine how the sender intended the coded message to be interpreted, assuming anything other than "strict ASCII" as the default would risk unintentional and incompatible changes to the semantics of messages now being transmitted. This also implies that messages containing characters coded according to national variations on ISO 646, or using code-switching procedures (e.g., those of ISO 2022), as well as 8-bit or multiple octet character encodings MUST use an appropriate character set specification to be consistent with this specification.

The complete US-ASCII character set is listed in [US-ASCII].
Note that the control characters including DEL (0-31, 127) have no defined meaning apart from the combination CRLF (ASCII values 13 and 10) indicating a new line. Two of the characters have de facto meanings in wide use: FF (12) often means "start subsequent text on the beginning of a new page"; and TAB or HT (9) often (though not always) means "move the cursor to the next available column after the current position where the column number is a multiple of 8 (counting the first column as column 0)." Apart from this, any use of the control characters or DEL in a body must be part of a private agreement between the sender and recipient. Such private agreements are discouraged and should be replaced by the other capabilities of this document.

     NOTE: Beyond US-ASCII, an enormous proliferation of character sets is possible. It is the opinion of the IETF working group that a large number of character sets is NOT a good thing. We would prefer to specify a single character set that can be used universally for representing all of the world's languages in electronic mail. Unfortunately, existing practice in several communities seems to point to the continued use of multiple character sets in the near future. For this reason, we define names for a small number of character sets for which a strong constituent base exists.

The defined charset values are:

     US-ASCII -- as defined in [US-ASCII].

     ISO-8859-X -- where "X" is to be replaced, as necessary, for the parts of ISO-8859 [ISO-8859]. Note that the ISO 646 character sets have deliberately been omitted in favor of their 8859 replacements, which are the designated character sets for Internet mail.
     As of the publication of this document, the legitimate values for "X" are the digits 1 through 9.

The character sets specified above are the ones that were relatively uncontroversial during the drafting of MIME. This document does not endorse the use of any particular character set other than US-ASCII, and recognizes that the future evolution of world character sets remains unclear. It is expected that in the future, additional character sets will be registered for use in MIME.

Note that the character set used, if anything other than US-ASCII, must always be explicitly specified in the Content-Type field.

No other character set name may be used in Internet mail without the publication of a formal specification and its registration with IANA, or by private agreement, in which case the character set name must begin with "X-".

Implementors are discouraged from defining new character sets for mail use unless absolutely necessary.

The "charset" parameter has been defined primarily for the purpose of textual data, and is described in this section for that reason. However, it is conceivable that non-textual data might also wish to specify a charset value for some purpose, in which case the same syntax and values should be used.

In general, mail-sending software must always use the "lowest common denominator" character set possible. For example, if a body contains only US-ASCII characters, it must be marked as being in the US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859 family of character sets, is a superset of US-ASCII. More generally, if a widely-used character set is a subset of another character set, and a body contains only characters in the widely-used subset, it must be labeled as being in that subset.
This will increase the chances that the recipient will be able to view the mail correctly.

7.1.2 The Text/plain subtype

The primary subtype of text is "plain". This indicates plain (unformatted) text. The default Content-Type for Internet mail, "text/plain; charset=us-ascii", describes existing Internet practice. That is, it is the type of body defined by RFC 822.

No other text subtype is defined by this document.

The formal grammar for the content-type header field for text is as follows:

     text-type := "text" "/" text-subtype [";" "charset" "=" charset]

     text-subtype := "plain" / extension-token

     charset := "us-ascii" / "iso-8859-1" / "iso-8859-2" / "iso-8859-3"
              / "iso-8859-4" / "iso-8859-5" / "iso-8859-6" / "iso-8859-7"
              / "iso-8859-8" / "iso-8859-9" / extension-token
              ; case insensitive

7.2 The Multipart Content-Type

In the case of multiple part entities, in which one or more different sets of data are combined in a single body, a "multipart" Content-Type field must appear in the entity's header. The body must then contain one or more "body parts," each preceded by an encapsulation boundary, and the last one followed by a closing boundary. Each part starts with an encapsulation boundary, and then contains a body part consisting of a header area, a blank line, and a body area. Thus a body part is similar to an RFC 822 message in syntax, but different in meaning.

A body part is NOT to be interpreted as actually being an RFC 822 message. To begin with, NO header fields are actually required in body parts. A body part that starts with a blank line, therefore, is allowed and is a body part for which all default values are to be assumed.
In such a case, the absence of a Content-Type header field implies that the corresponding body is plain US-ASCII text. The only header fields that have defined meaning for body parts are those the names of which begin with "Content-". All other header fields are generally to be ignored in body parts. Although they should generally be retained in mail processing, they may be discarded by gateways if necessary. Such other fields are permitted to appear in body parts but must not be depended on. "X-" fields may be created for experimental or private purposes, with the recognition that the information they contain may be lost at some gateways.

     NOTE: The distinction between an RFC 822 message and a body part is subtle, but important. A gateway between Internet and X.400 mail, for example, must be able to tell the difference between a body part that contains an image and a body part that contains an encapsulated message, the body of which is an image. In order to represent the latter, the body part must have "Content-Type: message", and its body (after the blank line) must be the encapsulated message, with its own "Content-Type: image" header field. The use of similar syntax facilitates the conversion of messages to body parts, and vice versa, but the distinction between the two must be understood by implementors. (For the special case in which all parts actually are messages, a "digest" subtype is also defined.)

As stated previously, each body part is preceded by an encapsulation boundary. The encapsulation boundary MUST NOT appear inside any of the encapsulated parts. Thus, it is crucial that the composing agent be able to choose and specify the unique boundary that will separate the parts.
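
One way a composing agent might choose such a boundary is sketched below. This is only an illustration under stated assumptions: the function name choose_boundary is hypothetical, the candidate is 32 random hexadecimal characters (so it needs no quoting and stays well under the length limit), and the already-encoded parts are prescanned to confirm that the delimiter does not occur in them.

```python
import secrets
from typing import Sequence


def choose_boundary(encoded_parts: Sequence[bytes], max_tries: int = 10) -> str:
    """Pick a boundary whose delimiter occurs in none of the parts.

    encoded_parts holds each body part exactly as it will be sent.
    """
    for _ in range(max_tries):
        candidate = secrets.token_hex(16)  # 32 characters from 0-9a-f
        delimiter = b"--" + candidate.encode("ascii")
        if not any(delimiter in part for part in encoded_parts):
            return candidate
    raise RuntimeError("could not find an unused boundary")


boundary = choose_boundary([b"first part\r\n", b"--looks like a delimiter\r\n"])
```

A random candidate makes a collision extremely unlikely, so the prescan is a safety check rather than the main defense; an agent that prefers more readable boundaries would depend on the prescan much more heavily.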

All present and future subtypes of the "multipart" type must use an identical syntax. Subtypes may differ in their semantics, and may impose additional restrictions on syntax, but must conform to the required syntax for the multipart type. This requirement ensures that all conformant user agents will at least be able to recognize and separate the parts of any multipart entity, even of an unrecognized subtype.

As stated in the definition of the Content-Transfer-Encoding field, no encoding other than "7bit", "8bit", or "binary" is permitted for entities of type "multipart". The multipart delimiters and header fields are always represented as 7-bit ASCII in any case (though the header fields may encode non-ASCII header text as per [RFC-1522]), and data within the body parts can be encoded on a part-by-part basis, with Content-Transfer-Encoding fields for each appropriate body part.

Mail gateways, relays, and other mail handling agents are commonly known to alter the top-level header of an RFC 822 message. In particular, they frequently add, remove, or reorder header fields. Such alterations are explicitly forbidden for the body part headers embedded in the bodies of messages of type "multipart."

7.2.1 Multipart: The common syntax

All subtypes of "multipart" share a common syntax, defined in this section. A simple example of a multipart message also appears in this section. An example of a more complex multipart message is given in Appendix C.

The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary.
The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.

     NOTE: The hyphens are for rough compatibility with the earlier RFC 934 method of message encapsulation, and for ease of searching for the boundaries in some implementations. However, it should be noted that multipart messages are NOT completely compatible with RFC 934 encapsulations; in particular, they do not obey RFC 934 quoting conventions for embedded lines that begin with hyphens. This mechanism was chosen over the RFC 934 mechanism because the latter causes lines to grow with each level of quoting. The combination of this growth with the fact that SMTP implementations sometimes wrap long lines made the RFC 934 mechanism unsuitable for use in the event that deeply-nested multipart structuring is ever desired.

WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-type field is such that it is often necessary to enclose the boundaries in quotes on the Content-type line. This is not always necessary, but never hurts. Implementors should be sure to study the grammar carefully in order to avoid producing illegal Content-type fields.
Thus, a typical multipart Content-Type header field might look like this:

     Content-Type: multipart/mixed;
          boundary=gc0p4Jq0M2Yt08jU534c0p

But the following is illegal:

     Content-Type: multipart/mixed;
          boundary=gc0p4Jq0M:2Yt08jU534c0p

(because of the colon) and must instead be represented as

     Content-Type: multipart/mixed;
          boundary="gc0p4Jq0M:2Yt08jU534c0p"

This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line

     --gc0p4Jq0M:2Yt08jU534c0p

Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that the initial CRLF is considered to be attached to the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).

     NOTE: The CRLF preceding the encapsulation line is conceptually attached to the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, must have two CRLFs preceding the encapsulation line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.

Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
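
The quoting rule in the warning above can be sketched as a small helper that validates a boundary and emits the parameter. The helper name boundary_parameter is hypothetical. The set of characters that force quoting is an assumption taken from the "tspecials" rule of the Content-Type parameter grammar, and the set of permitted boundary characters is taken from the boundary grammar given later in this section.

```python
import string

# Assumed from the Content-Type grammar: characters not allowed in an
# unquoted parameter value (a space also forces quoting).
TSPECIALS = set('()<>@,;:\\"/[]?=')

# Characters permitted in a boundary, other than space.
BCHARS_NOSPACE = set(string.digits + string.ascii_letters + "'()+_,-./:=?")


def boundary_parameter(boundary: str) -> str:
    """Return the boundary parameter, quoted only when required."""
    if not 1 <= len(boundary) <= 70:
        raise ValueError("boundary must be 1 to 70 characters long")
    if boundary.endswith(" "):
        raise ValueError("boundary must not end with white space")
    if any(c != " " and c not in BCHARS_NOSPACE for c in boundary):
        raise ValueError("boundary contains a character outside bchars")
    needs_quotes = any(c == " " or c in TSPECIALS for c in boundary)
    return f'boundary="{boundary}"' if needs_quotes else f"boundary={boundary}"
```

Because the permitted boundary characters include neither the double quote nor the backslash, the quoted form never needs escaping. An implementation could equally well quote unconditionally, since quoting "never hurts".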

The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:

     --gc0p4Jq0M:2Yt08jU534c0p--

There appears to be room for additional information prior to the first encapsulation boundary and following the final boundary. These areas should generally be left blank, and implementations must ignore anything that appears before the first boundary or after the last one.

     NOTE: These "preamble" and "epilogue" areas are generally not used because of the lack of proper typing of these parts and the lack of clear semantics for handling these areas at gateways, particularly X.400 gateways. However, rather than leaving the preamble area blank, many MIME implementations have found this to be a convenient place to insert an explanatory note for recipients who read the message with pre-MIME software, since such notes will be ignored by MIME-compliant software.

     NOTE: Because encapsulation boundaries must not appear in the body parts being encapsulated, a user agent must exercise care to choose a unique boundary. The boundary in the example above could have been the result of an algorithm designed to produce boundaries with a very low probability of already existing in the data to be encapsulated without having to prescan the data. Alternate algorithms might result in more 'readable' boundaries for a recipient with an old user agent, but would require more attention to the possibility that the boundary might appear in the encapsulated part. The simplest boundary possible is something like "---", with a closing boundary of "-----".
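
The delimiter rules described so far (the leading CRLF belongs to the delimiter, the closing delimiter carries two extra hyphens, and the preamble and epilogue are ignored) can be sketched as a simple splitter. This is an illustrative sketch, not a complete parser: the name split_multipart is hypothetical, the body is assumed to use CRLF line breaks throughout, and trailing white space after a delimiter is tolerated on the assumption that a gateway added it.

```python
import re
from typing import List


def split_multipart(body: bytes, boundary: str) -> List[bytes]:
    """Return the raw body parts (header area, blank line, body area)."""
    delim = re.escape(b"--" + boundary.encode("ascii"))
    # CRLF + delimiter, an optional "--" for the closing delimiter, then
    # optional white space, and finally end of line or end of data.
    pattern = re.compile(rb"\r\n" + delim + rb"(--)?[ \t]*(?=\r\n|\Z)")
    text = b"\r\n" + body  # lets a delimiter on the very first line match
    parts: List[bytes] = []
    start = None
    for match in pattern.finditer(text):
        if start is not None:
            # The CRLF before the delimiter is not part of the body part.
            parts.append(text[start:match.start()])
        if match.group(1):
            return parts  # closing delimiter: the epilogue is ignored
        start = match.end() + 2  # skip the CRLF that ends the delimiter
    raise ValueError("closing delimiter not found")


sample = (
    b"This is the preamble.\r\n"
    b"--simple boundary\r\n"
    b"\r\n"
    b"Implicitly typed text, no final line break."
    b"\r\n--simple boundary\r\n"
    b"Content-type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"Explicitly typed text, with a final line break.\r\n"
    b"\r\n--simple boundary--\r\n"
    b"This is the epilogue.\r\n"
)
sample_parts = split_multipart(sample, "simple boundary")
```

In the sample, the first part begins with a blank line (an empty header area) and does not end with a line break, while the second part keeps its own trailing CRLF because two CRLFs precede the closing delimiter.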

As a very simple example, the following multipart message has two parts, both of them plain text, one of them explicitly typed and one of them implicitly typed:

     From: Nathaniel Borenstein
     To: Ned Freed
     Subject: Sample message
     MIME-Version: 1.0
     Content-type: multipart/mixed;
          boundary="simple boundary"

     This is the preamble. It is to be ignored, though it
     is a handy place for mail composers to include an
     explanatory note to non-MIME conformant readers.
     --simple boundary

     This is implicitly typed plain ASCII text.
     It does NOT end with a linebreak.
     --simple boundary
     Content-type: text/plain; charset=us-ascii

     This is explicitly typed plain ASCII text.
     It DOES end with a linebreak.

     --simple boundary--
     This is the epilogue. It is also to be ignored.

The use of a Content-Type of multipart in a body part within another multipart entity is explicitly allowed. In such cases, for obvious reasons, care must be taken to ensure that each nested multipart entity uses a different boundary delimiter. See Appendix C for an example of nested multipart entities.

The use of the multipart Content-Type with only a single body part may be useful in certain contexts, and is explicitly permitted.

The only mandatory parameter for the multipart Content-Type is the boundary parameter, which consists of 1 to 70 characters from a set of characters known to be very robust through email gateways, and NOT ending with white space. (If a boundary appears to end with white space, the white space must be presumed to have been added by a gateway, and must be deleted.)
It is formally specified by the following BNF:

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
                    / "," / "-" / "." / "/" / ":" / "=" / "?"

Overall, the body of a multipart entity may be specified as follows:

     multipart-body := preamble 1*encapsulation
                       close-delimiter epilogue

     encapsulation := delimiter body-part CRLF

     delimiter := "--" boundary CRLF ; taken from Content-Type field.
                                     ; There must be no space
                                     ; between "--" and boundary.

     close-delimiter := "--" boundary "--" CRLF
                                     ; Again, no space by "--".

     preamble := discard-text        ; to be ignored upon receipt.

     epilogue := discard-text        ; to be ignored upon receipt.

     discard-text := *(*text CRLF)

     body-part := <"message" as defined in RFC 822,
                   with all header fields optional, and with the
                   specified delimiter not occurring anywhere in
                   the message body, either on a line by itself
                   or as a substring anywhere. Note that the
                   semantics of a part differ from the semantics
                   of a message, as described in the text.>

     NOTE: In certain transport enclaves, RFC 822 restrictions such as the one that limits bodies to printable ASCII characters may not be in force. (That is, the transport domains may resemble standard Internet mail transport as specified in RFC 821 and assumed by RFC 822, but without certain restrictions.) The relaxation of these restrictions should be construed as locally extending the definition of bodies, for example to include octets outside of the ASCII range, as long as these extensions are supported by the transport and adequately documented in the Content-Transfer-Encoding header field.
     However, in no event are headers (either message headers or body-part headers) allowed to contain anything other than ASCII characters.

     NOTE: Conspicuously missing from the multipart type is a notion of structured, related body parts. In general, it seems premature to try to standardize interpart structure yet. It is recommended that those wishing to provide a more structured or integrated multipart messaging facility should define a subtype of multipart that is syntactically identical, but that always expects the inclusion of a distinguished part that can be used to specify the structure and integration of the other parts, probably referring to them by their Content-ID field. If this approach is used, other implementations will not recognize the new subtype, but will treat it as the primary subtype (multipart/mixed) and will thus be able to show the user the parts that are recognized.

7.2.2 The Multipart/mixed (primary) subtype

The primary subtype for multipart, "mixed", is intended for use when the body parts are independent and need to be bundled in a particular order. Any multipart subtypes that an implementation does not recognize must be treated as being of subtype "mixed".

7.2.3 The Multipart/alternative subtype

The multipart/alternative type is syntactically identical to multipart/mixed, but the semantics are different. In particular, each of the parts is an "alternative" version of the same information.

Systems should recognize that the contents of the various parts are interchangeable. Systems should choose the "best" type based on the local environment and preferences, in some cases even through user interaction. As with multipart/mixed, the order of body parts is significant.
In this case, the alternatives appear in an order of increasing faithfulness to the original content. In general, the best choice is the LAST part of a type supported by the recipient system's local environment.

Multipart/alternative may be used, for example, to send mail in a fancy text format in such a way that it can easily be displayed anywhere:

     From: Nathaniel Borenstein
     To: Ned Freed
     Subject: Formatted text mail
     MIME-Version: 1.0
     Content-Type: multipart/alternative; boundary=boundary42

     --boundary42
     Content-Type: text/plain; charset=us-ascii

     ...plain text version of message goes here....

     --boundary42
     Content-Type: text/richtext

     .... RFC 1341 richtext version of same message goes here ...

     --boundary42
     Content-Type: text/x-whatever

     .... fanciest version of same message goes here ...

     --boundary42--

In this example, users whose mail system understood the "text/x-whatever" format would see only the fancy version, while other users would see only the richtext or plain text version, depending on the capabilities of their system.

In general, user agents that compose multipart/alternative entities must place the body parts in increasing order of preference, that is, with the preferred format last. For fancy text, the sending user agent should put the plainest format first and the richest format last. Receiving user agents should pick and display the last format they are capable of displaying. In the case where one of the alternatives is itself of type "multipart" and contains unrecognized sub-parts, the user agent may choose either to show that alternative, an earlier alternative, or both.
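
The "last supported part wins" rule for receiving user agents can be sketched with Python's standard "email" parser. The function name choose_alternative and the set of supported types are assumptions for illustration; the sample message mirrors the example above with a placeholder sender address.

```python
from email import message_from_string


def choose_alternative(msg, supported):
    """Return the last part whose content type the agent can display.

    Returns None when no alternative is supported.
    """
    if msg.get_content_type() != "multipart/alternative":
        raise ValueError("not a multipart/alternative entity")
    chosen = None
    for part in msg.get_payload():  # parts in order of increasing preference
        if part.get_content_type() in supported:
            chosen = part  # a later supported part replaces an earlier one
    return chosen


raw = (
    "From: sender@example.org\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative; boundary=boundary42\n"
    "\n"
    "--boundary42\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "plain text version\n"
    "\n"
    "--boundary42\n"
    "Content-Type: text/richtext\n"
    "\n"
    "richtext version\n"
    "\n"
    "--boundary42\n"
    "Content-Type: text/x-whatever\n"
    "\n"
    "fanciest version\n"
    "\n"
    "--boundary42--\n"
)
message = message_from_string(raw)
best = choose_alternative(message, {"text/plain", "text/richtext"})
```

An agent that understands only plain text and richtext therefore selects the richtext part and never shows the plain text version alongside it. The sketch does not cover the case of an alternative that is itself a multipart with unrecognized sub-parts.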

     NOTE: From an implementor's perspective, it might seem more sensible to reverse this ordering, and have the plainest alternative last. However, placing the plainest alternative first is the friendliest possible option when multipart/alternative entities are viewed using a non-MIME-conformant mail reader. While this approach does impose some burden on conformant mail readers, interoperability with older mail readers was deemed to be more important in this case.

It may be the case that some user agents, if they can recognize more than one of the formats, will prefer to offer the user the choice of which format to view. This makes sense, for example, if mail includes both a nicely-formatted image version and an easily-edited text version. What is most critical, however, is that the user not automatically be shown multiple versions of the same data. Either the user should be shown the last recognized version or should be given the choice.

NOTE ON THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a multipart/alternative entity represents the same data, but the mappings between the two are not necessarily without information loss. For example, information is lost when translating ODA to PostScript or plain text. It is recommended that each part should have a different Content-ID value in the case where the information content of the two parts is not identical. However, where the information content is identical -- for example, where several parts of type "message/external-body" specify alternate ways to access the identical data -- the same Content-ID field value should be used, to optimize any caching mechanisms that might be present on the recipient's end.
However, it is recommended that the Content-ID values used by the parts should not be the same Content-ID value that describes the multipart/alternative as a whole, if there is any such Content-ID field. That is, one Content-ID value will refer to the multipart/alternative entity, while one or more other Content-ID values will refer to the parts inside it.

7.2.4 The Multipart/digest subtype

This document defines a "digest" subtype of the multipart Content-Type. This type is syntactically identical to multipart/mixed, but the semantics are different. In particular, in a digest, the default Content-Type value for a body part is changed from "text/plain" to "message/rfc822". This is done to allow a more readable digest format that is largely compatible (except for the quoting convention) with RFC 934.

A digest in this format might, then, look something like this:

     From: Moderator-Address
     To: Recipient-List
     MIME-Version: 1.0
     Subject: Internet Digest, volume 42
     Content-Type: multipart/digest;
          boundary="---- next message ----"

     ------ next message ----

     From: someone-else
     Subject: my opinion

     ...body goes here ...

     ------ next message ----

     From: someone-else-again
     Subject: my different opinion

     ... another body goes here...

     ------ next message ------

7.2.5 The Multipart/parallel subtype

This document defines a "parallel" subtype of the multipart Content-Type. This type is syntactically identical to multipart/mixed, but the semantics are different. In particular, in a parallel entity, the order of body parts is not significant.

A common presentation of this type is to display all of the parts simultaneously on hardware and software that are capable of doing so.
However, composing agents should be
aware that many mail readers will lack this capability and
will show the parts serially in any event.

7.2.6 Other Multipart subtypes

Other multipart subtypes are expected in the future. MIME
implementations must in general treat unrecognized subtypes
of multipart as being equivalent to "multipart/mixed".

The formal grammar for content-type header fields for
multipart data is given by:

     multipart-type := "multipart" "/" multipart-subtype
                       ";" "boundary" "=" boundary

     multipart-subtype := "mixed" / "parallel" / "digest"
                       / "alternative" / extension-token

7.3 The Message Content-Type

It is frequently desirable, in sending mail, to encapsulate
another mail message. For this common operation, a special
Content-Type, "message", is defined. The primary subtype,
message/rfc822, has no required parameters in the
Content-Type field. Additional subtypes, "partial" and
"External-body", do have required parameters. These subtypes
are explained below.

     NOTE: It has been suggested that subtypes of
     message might be defined for forwarded or rejected
     messages. However, forwarded and rejected
     messages can be handled as multipart messages in
     which the first part contains any control or
     descriptive information, and a second part, of
     type message/rfc822, is the forwarded or rejected
     message. Composing rejection and forwarding
     messages in this manner will preserve the type
     information on the original message and allow it
     to be correctly presented to the recipient, and
     hence is strongly encouraged.

As stated in the definition of the Content-Transfer-Encoding
field, no encoding other than "7bit", "8bit", or "binary" is
permitted for messages or parts of type "message".
Even stronger restrictions apply to the subtypes
"message/partial" and "message/external-body", as specified
below. The message header fields are always US-ASCII in any
case, and data within the body can still be encoded, in
which case the Content-Transfer-Encoding header field in the
encapsulated message will reflect this. Non-ASCII text in
the headers of an encapsulated message can be specified
using the mechanisms described in [RFC-1522].

Mail gateways, relays, and other mail handling agents are
commonly known to alter the top-level header of an RFC 822
message. In particular, they frequently add, remove, or
reorder header fields. Such alterations are explicitly
forbidden for the encapsulated headers embedded in the
bodies of messages of type "message."

7.3.1 The Message/rfc822 (primary) subtype

A Content-Type of "message/rfc822" indicates that the body
contains an encapsulated message, with the syntax of an RFC
822 message. However, unlike top-level RFC 822 messages,
the restriction that each message/rfc822 body must include a
"From", "Date", and at least one destination header is
removed and replaced with the requirement that at least one
of "From", "Subject", or "Date" must be present.

It should be noted that, despite the use of the numbers
"822", a message/rfc822 entity can include enhanced
information as defined in this document. In other words, a
message/rfc822 message may be a MIME message.

7.3.2 The Message/Partial subtype

A subtype of message, "partial", is defined in order to
allow large objects to be delivered as several separate
pieces of mail and automatically reassembled by the
receiving user agent. (The concept is similar to IP
fragmentation/reassembly in the basic Internet Protocols.)
This mechanism can be used when intermediate transport
agents limit the size of individual messages that can be
sent. Content-Type "message/partial" thus indicates that
the body contains a fragment of a larger message.

Three parameters must be specified in the Content-Type field
of type message/partial: The first, "id", is a unique
identifier, as close to a world-unique identifier as
possible, to be used to match the parts together. (In
general, the identifier is essentially a message-id; if
placed in double quotes, it can be any message-id, in
accordance with the BNF for "parameter" given earlier in
this specification.) The second, "number", an integer, is
the part number, which indicates where this part fits into
the sequence of fragments. The third, "total", another
integer, is the total number of parts. This third subfield
is required on the final part, and is optional (though
encouraged) on the earlier parts. Note also that these
parameters may be given in any order.

Thus, part 2 of a 3-part message may have either of the
following header fields:

     Content-Type: Message/Partial;
          number=2; total=3;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

     Content-Type: Message/Partial;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
          number=2

But part 3 MUST specify the total number of parts:

     Content-Type: Message/Partial;
          number=3; total=3;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

Note that part numbering begins with 1, not 0.

When the parts of a message broken up in this manner are put
together, the result is a complete MIME entity, which may
have its own Content-Type header field, and thus may contain
any other data type.
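As an illustration of how a receiving agent might use the "id", "number", and "total" parameters, the following is a minimal sketch in Python, not a definitive implementation. It relies on the standard library "email" package; the function name order_fragments and the choice to return only complete groups are assumptions made for this example. A group counts as complete only when some fragment has supplied "total" and every part from 1 through that total has arrived.

```python
from email.parser import HeaderParser


def order_fragments(raw_messages):
    """Group message/partial fragments by "id" and return complete groups.

    Returns a dict mapping each id to its fragments in part order
    (part numbering starts at 1). Incomplete groups are left out.
    order_fragments is a hypothetical helper, not part of MIME.
    """
    parser = HeaderParser()  # parses headers only; body stays a string
    groups = {}
    for raw in raw_messages:
        msg = parser.parsestr(raw)
        if msg.get_content_type() != "message/partial":
            continue
        frag_id = msg.get_param("id")
        number = msg.get_param("number")
        if frag_id is None or number is None:
            continue  # "id" and "number" are both required
        group = groups.setdefault(frag_id, {"total": None, "parts": {}})
        total = msg.get_param("total")  # optional except on the last part
        if total is not None:
            group["total"] = int(total)
        group["parts"][int(number)] = msg

    complete = {}
    for frag_id, group in groups.items():
        total = group["total"]
        if total is None:
            continue  # final part (which must carry "total") not seen yet
        if set(group["parts"]) == set(range(1, total + 1)):
            complete[frag_id] = [group["parts"][n] for n in range(1, total + 1)]
    return complete
```

Because the parameters may appear in any order and parameter names are matched case-insensitively by get_param, both header forms shown above for part 2 are handled the same way.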
Message fragmentation and reassembly: The semantics of a
reassembled partial message must be those of the "inner"
message, rather than of a message containing the inner
message. This makes it possible, for example, to send a
large audio message as several partial messages, and still
have it appear to the recipient as a simple audio message
rather than as an encapsulated message containing an audio
message. That is, the encapsulation of the message is
considered to be "transparent".

When generating and reassembling the parts of a
message/partial message, the headers of the encapsulated
message must be merged with the headers of the enclosing
entities. In this process the following rules must be
observed:

     (1) All of the header fields from the initial
     enclosing entity (part one), except those that
     start with "Content-" and the specific header
     fields "Subject", "Message-ID", "Encrypted", and
     "MIME-Version", must be copied, in order, to the
     new message.

     (2) Only those header fields in the enclosed
     message which start with "Content-", together with
     the specific header fields "Subject", "Message-ID",
     "Encrypted", and "MIME-Version", must be appended,
     in order, to the header fields of the new message.
     Any header fields in the enclosed message which do
     not start with "Content-" (except for "Subject",
     "Message-ID", "Encrypted", and "MIME-Version")
     will be ignored.

     (3) All of the header fields from the second and
     any subsequent messages will be ignored.
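The three merging rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than a reference implementation: the names INNER_FIELDS, taken_from_inner, merge_headers, and reassemble are hypothetical, the fragments are assumed to be supplied already sorted by part number, and the standard library "email" package is used only to split headers from bodies.

```python
from email.parser import HeaderParser

# Fields that are taken from the enclosed message instead of the enclosing one.
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def taken_from_inner(name):
    """True for "Content-*" fields and the specific fields named in the rules."""
    name = name.lower()
    return name.startswith("content-") or name in INNER_FIELDS


def merge_headers(outer_items, inner_items):
    """Apply rules (1) and (2) to lists of (name, value) pairs."""
    merged = [(n, v) for n, v in outer_items if not taken_from_inner(n)]  # rule 1
    merged += [(n, v) for n, v in inner_items if taken_from_inner(n)]     # rule 2
    return merged


def reassemble(fragments):
    """Rebuild headers and body from raw fragments given in part order."""
    parser = HeaderParser()  # headers only; the body is kept as a string
    first = parser.parsestr(fragments[0])
    inner = parser.parsestr(first.get_payload())  # the enclosed message
    headers = merge_headers(first.items(), inner.items())
    body = inner.get_payload()
    for raw in fragments[1:]:
        # Rule 3: headers of later fragments are ignored, only bodies are used.
        body += parser.parsestr(raw).get_payload()
    return headers, body
```

Note that this sketch appends the inner fields in the order in which they appear in the enclosed message, so the relative order of fields such as "Subject" and "Message-ID" in the result follows the enclosed message.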
For example, if an audio message is broken into two parts,
the first part might look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail (part 1 of 2)
     Message-ID:
     MIME-Version: 1.0
     Content-type: message/partial;
          id="ABC@host.com";
          number=1; total=2

     X-Weird-Header-1: Bar
     X-Weird-Header-2: Hello
     Message-ID:
     Subject: Audio mail
     MIME-Version: 1.0
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here...

and the second half might look something like this:

     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail (part 2 of 2)
     MIME-Version: 1.0
     Message-ID:
     Content-type: message/partial;
          id="ABC@host.com"; number=2; total=2

     ... second half of encoded audio data goes here...

Then, when the fragmented message is reassembled, the
resulting message to be displayed to the user should look
something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     Message-ID:
     MIME-Version: 1.0
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here...
     ... second half of encoded audio data goes here...

Note on encoding of MIME entities encapsulated inside
message/partial entities: Because data of type "message"
may never be encoded in base64 or quoted-printable, a
problem might arise if message/partial entities are
constructed in an environment that supports binary or 8-bit
transport. The problem is that the binary data would be
split into multiple message/partial objects, each of them
requiring binary transport.
If such objects were
encountered at a gateway into a 7-bit transport environment,
there would be no way to properly encode them for the 7-bit
world, aside from waiting for all of the parts, reassembling
the message, and then encoding the reassembled data in
base64 or quoted-printable. Since it is possible that
different parts might go through different gateways, even
this is not an acceptable solution. For this reason, it is
specified that MIME entities of type message/partial must
always have a content-transfer-encoding of "7bit" (the
default). In particular, even in environments that support
binary or 8-bit transport, the use of a
content-transfer-encoding of "8bit" or "binary" is
explicitly prohibited for entities of type message/partial.

It should be noted that, because some message transfer
agents may choose to automatically fragment large messages,
and because such agents may use different fragmentation
thresholds, it is possible that the pieces of a partial
message, upon reassembly, may prove themselves to comprise a
partial message. This is explicitly permitted.

It should also be noted that the inclusion of a "References"
field in the headers of the second and subsequent pieces of
a fragmented message that references the Message-Id on the
previous piece may be of benefit to mail readers that
understand and track references. However, the generation of
such "References" fields is entirely optional.

Finally, it should be noted that the "Encrypted" header
field has been made obsolete by Privacy Enhanced Messaging
(PEM), but the rules above are believed to describe the
correct way to treat it if it is encountered in the context
of conversion to and from message/partial fragments.
7.3.3 The Message/External-Body subtype

The external-body subtype indicates that the actual body
data are not included, but merely referenced. In this case,
the parameters describe a mechanism for accessing the
external data.

When an entity is of type "message/external-body", it
consists of a header, two consecutive CRLFs, and the message
header for the encapsulated message. If another pair of
consecutive CRLFs appears, this of course ends the message
header for the encapsulated message. However, since the
encapsulated message's body is itself external, it does NOT
appear in the area that follows. For example, consider the
following message:

     Content-type: message/external-body;
          access-type=local-file;
          name="/u/nsb/Me.gif"

     Content-type: image/gif
     Content-ID:
     Content-Transfer-Encoding: binary

     THIS IS NOT REALLY THE BODY!

The area at the end, which might be called the "phantom
body", is ignored for most external-body messages. However,
it may be used to contain auxiliary information for some
such messages, as indeed it is when the access-type is
"mail-server". Of the access-types defined by this
document, the phantom body is used only when the access-type
is "mail-server". In all other cases, the phantom body is
ignored.

The only always-mandatory parameter for
message/external-body is "access-type"; all of the other
parameters may be mandatory or optional depending on the
value of access-type.

     ACCESS-TYPE -- A case-insensitive word, indicating
     the supported access mechanism by which the file
     or data may be obtained. Values include, but are
     not limited to, "FTP", "ANON-FTP", "TFTP", "AFS",
     "LOCAL-FILE", and "MAIL-SERVER".
     Future values,
     except for experimental values beginning with
     "X-", must be registered with IANA, as described in
     Appendix E.

In addition, the following three parameters are optional for
ALL access-types:

     EXPIRATION -- The date (in the RFC 822 "date-time"
     syntax, as extended by RFC 1123 to permit 4 digits
     in the year field) after which the existence of
     the external data is not guaranteed.

     SIZE -- The size (in octets) of the data. The
     intent of this parameter is to help the recipient
     decide whether or not to expend the necessary
     resources to retrieve the external data. Note
     that this describes the size of the data in its
     canonical form, that is, before any
     Content-Transfer-Encoding has been applied or after
     the data have been decoded.

     PERMISSION -- A case-insensitive field that
     indicates whether or not it is expected that
     clients might also attempt to overwrite the data.
     By default, or if permission is "read", the
     assumption is that they are not, and that if the
     data is retrieved once, it is never needed again.
     If PERMISSION is "read-write", this assumption is
     invalid, and any local copy must be considered no
     more than a cache. "Read" and "Read-write" are
     the only defined values of permission.

The precise semantics of the access-types defined here are
described in the sections that follow.

The encapsulated headers in ALL message/external-body
entities MUST include a Content-ID header field to give a
unique identifier by which to reference the data. This
identifier may be used for caching mechanisms, and for
recognizing the receipt of the data when the access-type is
"mail-server".
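To make the layout of a message/external-body entity concrete, the following is a minimal Python sketch, not a definitive implementation. It separates the access parameters, the encapsulated header, and the "phantom body", and checks for the mandatory Content-ID field. The function name parse_external_body and the shape of the returned dictionary are assumptions made for this example; only the standard library "email" package is used.

```python
from email.parser import HeaderParser


def parse_external_body(raw_entity):
    """Split a message/external-body entity into its three areas.

    parse_external_body is a hypothetical helper; the returned keys are
    chosen for this sketch only.
    """
    parser = HeaderParser()  # headers only; the rest stays a string
    outer = parser.parsestr(raw_entity)
    if outer.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")

    # get_params() yields (name, value) pairs; the first pair is the
    # content type itself, so it is skipped here.
    params = {name.lower(): value for name, value in outer.get_params()[1:]}
    if "access-type" not in params:
        raise ValueError("access-type is always mandatory")
    access_type = params["access-type"].lower()  # case-insensitive word

    # The outer body holds the encapsulated header, then the phantom body.
    inner = parser.parsestr(outer.get_payload())
    if inner["Content-ID"] is None:
        raise ValueError("encapsulated header must include Content-ID")
    phantom = inner.get_payload()

    return {
        "access_type": access_type,
        "params": params,
        "content_type": inner.get_content_type(),
        "content_id": inner["Content-ID"],
        # Only the mail-server access-type gives the phantom body a meaning.
        "commands": phantom if access_type == "mail-server" else None,
    }
```

For any access-type other than "mail-server" the sketch discards the phantom body, in line with the statement that it is ignored in those cases.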
Note that, as specified here, the tokens that describe
external-body data, such as file names and mail server
commands, are required to be in the US-ASCII character set.
If this proves problematic in practice, a new mechanism may
be required as a future extension to MIME, either as newly
defined access-types for message/external-body or by some
other mechanism.

As with message/partial, it is specified that MIME entities
of type message/external-body must always have a
content-transfer-encoding of "7bit" (the default). In
particular, even in environments that support binary or
8-bit transport, the use of a content-transfer-encoding of
"8bit" or "binary" is explicitly prohibited for entities of
type message/external-body.

7.3.3.1 The "ftp" and "tftp" access-types

An access-type of FTP or TFTP indicates that the message
body is accessible as a file using the FTP [RFC-959] or TFTP
[RFC-783] protocols, respectively. For these access-types,
the following additional parameters are mandatory:

     NAME -- The name of the file that contains the
     actual body data.

     SITE -- A machine from which the file may be
     obtained, using the given protocol. This must be
     a fully qualified domain name, not a nickname.

Before any data are retrieved, using FTP, the user will
generally need to be asked to provide a login id and a
password for the machine named by the site parameter. For
security reasons, such an id and password are not specified
as content-type parameters, but must be obtained from the
user.

In addition, the following parameters are optional:

     DIRECTORY -- A directory from which the data named
     by NAME should be retrieved.

     MODE -- A case-insensitive string indicating the
     mode to be used when retrieving the information.
     The legal values for access-type "TFTP" are
     "NETASCII", "OCTET", and "MAIL", as specified by
     the TFTP protocol [RFC-783]. The legal values for
     access-type "FTP" are "ASCII", "EBCDIC", "IMAGE",
     and "LOCALn" where "n" is a decimal integer,
     typically 8. These correspond to the
     representation types "A" "E" "I" and "L n" as
     specified by the FTP protocol [RFC-959]. Note
     that "BINARY" and "TENEX" are not valid values for
     MODE, but that "OCTET" or "IMAGE" or "LOCAL8"
     should be used instead. If MODE is not specified,
     the default value is "NETASCII" for TFTP and
     "ASCII" otherwise.

7.3.3.2 The "anon-ftp" access-type

The "anon-ftp" access-type is identical to the "ftp" access
type, except that the user need not be asked to provide a
name and password for the specified site. Instead, the ftp
protocol will be used with login "anonymous" and a password
that corresponds to the user's email address.

7.3.3.3 The "local-file" and "afs" access-types

An access-type of "local-file" indicates that the actual
body is accessible as a file on the local machine. An
access-type of "afs" indicates that the file is accessible
via the global AFS file system. In both cases, only a
single parameter is required:

     NAME -- The name of the file that contains the
     actual body data.

The following optional parameter may be used to describe the
locality of reference for the data, that is, the site or
sites at which the file is expected to be visible:

     SITE -- A domain specifier for a machine or set of
     machines that are known to have access to the data
     file.
     Asterisks may be used for wildcard matching
     to a part of a domain name, such as
     "*.bellcore.com", to indicate a set of machines on
     which the data should be directly visible, while a
     single asterisk may be used to indicate a file
     that is expected to be universally available,
     e.g., via a global file system.

7.3.3.4 The "mail-server" access-type

The "mail-server" access-type indicates that the actual body
is available from a mail server. The mandatory parameter
for this access-type is:

     SERVER -- The email address of the mail server
     from which the actual body data can be obtained.

Because mail servers accept a variety of syntaxes, some of
which are multiline, the full command to be sent to a mail
server is not included as a parameter on the content-type
line. Instead, it is provided as the "phantom body" when
the content-type is message/external-body and the
access-type is mail-server.

An optional parameter for this access-type is:

     SUBJECT -- The subject that is to be used in the
     mail that is sent to obtain the data. Note that
     keying mail servers on Subject lines is NOT
     recommended, but such mail servers are known to
     exist.

Note that MIME does not define a mail server syntax.
Rather, it allows the inclusion of arbitrary mail server
commands in the phantom body. Implementations must include
the phantom body in the body of the message they send to the
mail server address to retrieve the relevant data.

It is worth noting that, unlike other access-types,
mail-server access is asynchronous and will happen at an
unpredictable time in the future. For this reason, it is
important that there be a mechanism by which the returned
data can be matched up with the original
message/external-body entity.
MIME mail servers must use the same Content-ID
field on the returned message that was used in the original
message/external-body entity, to facilitate such matching.

7.3.3.5 Examples and Further Explanations

With the emerging possibility of very wide-area file
systems, it becomes very hard to know in advance the set of
machines where a file will and will not be accessible
directly from the file system. Therefore it may make sense
to provide both a file name, to be tried directly, and the
name of one or more sites from which the file is known to be
accessible. An implementation can try to retrieve remote
files using FTP or any other protocol, using anonymous file
retrieval or prompting the user for the necessary name and
password. If an external body is accessible via multiple
mechanisms, the sender may include multiple parts of type
message/external-body within an entity of type
multipart/alternative.

However, the external-body mechanism is not intended to be
limited to file retrieval, as shown by the mail-server
access-type. Beyond this, one can imagine, for example,
using a video server for external references to video clips.

If an entity is of type "message/external-body", then the
body of the entity will contain the header fields of the
encapsulated message. The body itself is to be found in the
external location. This means that if the body of the
"message/external-body" message contains two consecutive
CRLFs, everything after that pair is NOT part of the
message itself. For most message/external-body messages,
this trailing area must simply be ignored. However, it is a
convenient place for additional data that cannot be included
in the content-type header field.
In particular, if the
"access-type" value is "mail-server", then the trailing area
must contain commands to be sent to the mail server at the
address given by the value of the SERVER parameter.

The embedded message header fields which appear in the body
of the message/external-body data must be used to declare
the Content-type of the external body if it is anything
other than plain ASCII text, since the external body does
not have a header section to declare its type. Similarly,
any Content-transfer-encoding other than "7bit" must also be
declared here. Thus a complete message/external-body
message, referring to a document in PostScript format, might
look like this:

     From: Whomever
     To: Someone
     Subject: whatever
     MIME-Version: 1.0
     Message-ID:
     Content-Type: multipart/alternative; boundary=42
     Content-ID:

     --42
     Content-Type: message/external-body;
          name="BodyFormats.ps";
          site="thumper.bellcore.com";
          access-type=ANON-FTP;
          directory="pub";
          mode="image";
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript
     Content-ID:

     --42
     Content-Type: message/external-body;
          name="/u/nsb/writing/rfcs/RFC-MIME.ps";
          site="thumper.bellcore.com";
          access-type=AFS;
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript
     Content-ID:

     --42
     Content-Type: message/external-body;
          access-type=mail-server;
          server="listserv@bogus.bitnet";
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript
     Content-ID:

     get RFC-MIME.DOC

     --42--

Note that in the above examples, the default
Content-transfer-encoding of "7bit" is assumed for the
external PostScript data.
Like the message/partial type, the message/external-body
type is intended to be transparent, that is, to convey the
data type in the external body rather than to convey a
message with a body of that type. Thus the headers on the
outer and inner parts must be merged using the same rules as
for message/partial. In particular, this means that the
Content-type header is overridden, but the From and Subject
headers are preserved.

Note that since the external bodies are not transported as
mail, they need not conform to the 7-bit and line length
requirements, but might in fact be binary files. Thus a
Content-Transfer-Encoding is not generally necessary, though
it is permitted.

Note that the body of a message of type
"message/external-body" is governed by the basic syntax for
an RFC 822 message. In particular, anything before the first
consecutive pair of CRLFs is header information, while
anything after it is body information, which is ignored for
most access-types.
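The mandatory parameters described in sections 7.3.3.1 through 7.3.3.4 can be summarized as a lookup table. The Python sketch below is illustrative only: the names REQUIRED_PARAMS and missing_params are hypothetical, and the treatment of unknown access-types (nothing beyond "access-type" itself is checked) is an assumption of this example, not a rule stated in this document.

```python
# Mandatory parameters per access-type, as listed in 7.3.3.1 to 7.3.3.4.
# "access-type" itself is always mandatory and is checked separately.
REQUIRED_PARAMS = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
    "anon-ftp": {"name", "site"},
    "local-file": {"name"},
    "afs": {"name"},
    "mail-server": {"server"},
}


def missing_params(params):
    """Return the set of mandatory parameter names absent from params.

    params maps parameter names to values, as taken from the Content-Type
    field of a message/external-body entity. Names and the access-type
    value are compared case-insensitively.
    """
    present = {name.lower() for name in params}
    if "access-type" not in present:
        return {"access-type"}
    access_type = next(
        value for name, value in params.items() if name.lower() == "access-type"
    ).lower()
    # Assumption: extension access-types are not validated any further here.
    required = REQUIRED_PARAMS.get(access_type, set())
    return required - present
```

A caller might refuse to attempt retrieval whenever missing_params returns a non-empty set, and otherwise proceed according to the access-type.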
The formal grammar for content-type header fields for data
of type message is given by:

     message-type := "message" "/" message-subtype

     message-subtype := "rfc822"
                     / "partial" 2#3partial-param
                     / "external-body" 1*external-param
                     / extension-token

     partial-param := (";" "id" "=" value)
                   / (";" "number" "=" 1*DIGIT)
                   / (";" "total" "=" 1*DIGIT)
          ; id & number required; total required for last part

     external-param := (";" "access-type" "=" atype)
                    / (";" "expiration" "=" date-time)
                         ; Note that date-time is quoted
                    / (";" "size" "=" 1*DIGIT)
                    / (";" "permission" "=" ("read" / "read-write"))
                         ; Permission is case-insensitive
                    / (";" "name" "=" value)
                    / (";" "site" "=" value)
                    / (";" "directory" "=" value)
                    / (";" "mode" "=" value)
                    / (";" "server" "=" value)
                    / (";" "subject" "=" value)
          ; access-type required; others required based on
          ; access-type

     atype := "ftp" / "anon-ftp" / "tftp" / "local-file"
           / "afs" / "mail-server" / extension-token
          ; Case-insensitive

7.4 The Application Content-Type

The "application" Content-Type is to be used for data which
do not fit in any of the other categories, and particularly
for data to be processed by mail-based uses of application
programs. This is information which must be processed by an
application before it is viewable or usable to a user.
Expected uses for Content-Type application include
mail-based file transfer, spreadsheets, data for mail-based
scheduling systems, and languages for "active"
(computational) email. (The latter, in particular, can pose
security problems which must be understood by implementors,
and are considered in detail in the discussion of the
application/PostScript content-type.)
For example, a meeting scheduler might define a standard
representation for information about proposed meeting dates.
An intelligent user agent would use this information to
conduct a dialog with the user, and might then send further
mail based on that dialog. More generally, there have been
several "active" messaging languages developed in which
programs in a suitably specialized language are sent through
the mail and automatically run in the recipient's
environment.

Such applications may be defined as subtypes of the
"application" Content-Type. This document defines two
subtypes: octet-stream, and PostScript.

In general, the subtype of application will often be the
name of the application for which the data are intended.
This does not mean, however, that any application program
name may be used freely as a subtype of application. Such
usages (other than subtypes beginning with "x-") must be
registered with IANA, as described in Appendix E.

7.4.1 The Application/Octet-Stream (primary) subtype

The primary subtype of application, "octet-stream", may be
used to indicate that a body contains binary data. The set
of possible parameters includes, but is not limited to:

     TYPE -- the general type or category of binary
     data. This is intended as information for the
     human recipient rather than for any automatic
     processing.

     PADDING -- the number of bits of padding that were
     appended to the bit-stream comprising the actual
     contents to produce the enclosed byte-oriented
     data. This is useful for enclosing a bit-stream
     in a body when the total number of bits is not a
     multiple of the byte size.

An additional parameter, "conversions", was defined in
[RFC-1341] but has been removed.
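The arithmetic behind the PADDING parameter can be illustrated with a short Python sketch. It assumes 8-bit bytes and zero-valued padding bits, neither of which is stated above, and the function names pack_bits and unpack_bits are hypothetical. The bit-stream is represented as a string of "0" and "1" characters purely for readability.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    Returns (data, padding), where padding is the number of bits appended
    to reach a multiple of 8. Zero-valued padding bits are an assumption
    of this sketch.
    """
    padding = (-len(bits)) % 8  # bits needed to reach the next byte boundary
    padded = bits + "0" * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def unpack_bits(data, padding):
    """Recover the original bit string by dropping the trailing padding bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[:len(bits) - padding]
```

For a 5-bit stream, three padding bits are appended, so a sender following this sketch would label the body with padding=3 and the recipient would discard the last three bits after decoding.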
RFC 1341 also defined the use of a "NAME" parameter which
gave a suggested file name to be used if the data were to be
written to a file. This has been deprecated in anticipation
of a separate Content-Disposition header field, to be
defined in a subsequent RFC.

The recommended action for an implementation that receives
application/octet-stream mail is to simply offer to put the
data in a file, with any Content-Transfer-Encoding undone,
or perhaps to use it as input to a user-specified process.

To reduce the danger of transmitting rogue programs through
the mail, it is strongly recommended that implementations
NOT implement a path-search mechanism whereby an arbitrary
program named in the Content-Type parameter (e.g., an
"interpreter=" parameter) is found and executed using the
mail body as input.

7.4.2 The Application/PostScript subtype

A Content-Type of "application/postscript" indicates a
PostScript program. Currently two variants of the
PostScript language are allowed; the original level 1
variant is described in [POSTSCRIPT] and the more recent
level 2 variant is described in [POSTSCRIPT2].

PostScript is a registered trademark of Adobe Systems, Inc.
Use of the MIME content-type "application/postscript"
implies recognition of that trademark and all the rights it
entails.

The PostScript language definition provides facilities for
internal labeling of the specific language features a given
program uses. This labeling, called the PostScript document
structuring conventions, is very general and provides
substantially more information than just the language level.
The use of document structuring conventions, while not
required, is strongly recommended as an aid to
interoperability.
Documents which lack proper structuring
conventions cannot be tested to see whether or not they will
work in a given environment. As such, some systems may
assume the worst and refuse to process unstructured
documents.

The execution of general-purpose PostScript interpreters
entails serious security risks, and implementors are
discouraged from simply sending PostScript email bodies to
"off-the-shelf" interpreters. While it is usually safe to
send PostScript to a printer, where the potential for harm
is greatly constrained, implementors should consider all of
the following before they add interactive display of
PostScript bodies to their mail readers.

The remainder of this section outlines some, though probably
not all, of the possible problems with sending PostScript
through the mail.

Dangerous operations in the PostScript language include, but
may not be limited to, the PostScript operators deletefile,
renamefile, filenameforall, and file. File is only
dangerous when applied to something other than standard
input or output. Implementations may also define additional
nonstandard file operators; these may also pose a threat to
security. Filenameforall, the wildcard file search
operator, may appear at first glance to be harmless. Note,
however, that this operator has the potential to reveal
information about what files the recipient has access to,
and this information may itself be sensitive. Message
senders should avoid the use of potentially dangerous file
operators, since these operators are quite likely to be
unavailable in secure PostScript implementations.
Message-receiving and -displaying software should either
completely disable all potentially dangerous file operators
or take special care not to delegate any special authority
to their operation.
These operators should be viewed as being done by 2856 an outside agency when interpreting PostScript documents. 2857 Such disabling and/or checking should be done completely 2858 outside of the reach of the PostScript language itself; care 2859 should be taken to ensure that no method exists for re- 2860 enabling full-function versions of these operators. 2864 The PostScript language provides facilities for exiting the 2865 normal interpreter, or server, loop. Changes made in this 2866 "outer" environment are customarily retained across 2867 documents, and may in some cases be retained semipermanently 2868 in nonvolatile memory. The operators associated with 2869 exiting the interpreter loop have the potential to interfere 2870 with subsequent document processing. As such, their 2871 unrestrained use constitutes a threat of service denial. 2872 PostScript operators that exit the interpreter loop include, 2873 but may not be limited to, the exitserver and startjob 2874 operators. Message-sending software should not generate 2875 PostScript that depends on exiting the interpreter loop to 2876 operate. The ability to exit will probably be unavailable in 2877 secure PostScript implementations. Message-receiving and 2878 -displaying software should, if possible, disable the 2879 ability to make retained changes to the PostScript 2880 environment, and eliminate the startjob and exitserver 2881 commands. If these commands cannot be eliminated, the 2882 password associated with them should at least be set to a 2883 hard-to-guess value.
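As a rough illustration of the screening that message-receiving software might perform before handing a body to an interpreter, the following Python sketch reports which of the operators named above occur in a PostScript program. The function name, the operator set, and the simplified tokenizer are assumptions made for this example only and are not part of this specification. The scan is advisory at best: PostScript can construct operator names at run time, so a textual check of this kind cannot replace disabling the operators outside the reach of the language, as recommended above.

```python
# Operators named in the surrounding text; illustrative, not exhaustive.
RISKY_OPERATORS = frozenset({
    "deletefile", "renamefile", "filenameforall", "file",
    "exitserver", "startjob",
})

# Characters that end a PostScript name token (besides white space).
DELIMITERS = frozenset("()<>[]{}/")


def find_risky_operators(ps_source):
    """Return the set of risky operator names that appear as tokens.

    String literals in parentheses and % comments are skipped, so a
    name that only occurs inside a string or comment is not reported.
    Every use of "file" is reported, including harmless uses on
    standard input or output; the check is deliberately conservative.
    """
    found = set()
    token = ""
    depth = 0                      # nesting depth of (...) strings
    text = ps_source + "\n"        # trailing newline flushes last token
    i = 0
    while i < len(text):
        ch = text[i]
        if depth > 0:              # inside a string literal
            if ch == "\\":
                i += 1             # skip the escaped character
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "%":            # comment runs to the end of the line
            if token in RISKY_OPERATORS:
                found.add(token)
            token = ""
            while text[i] not in "\r\n":
                i += 1
        elif ch.isspace() or ch in DELIMITERS:
            if token in RISKY_OPERATORS:
                found.add(token)
            token = ""
            if ch == "(":
                depth = 1
        else:
            token += ch
        i += 1
    return found
```

A mail reader could use a non-empty result as one reason to refuse interactive display, or to warn the user before processing the body.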
The PostScript 2889 operators that set system and device parameters include, but 2890 may not be limited to, the setsystemparams and setdevparams 2891 operators. Message-sending software should not generate 2892 PostScript that depends on the setting of system or device 2893 parameters to operate correctly. The ability to set these 2894 parameters will probably be unavailable in secure PostScript 2895 implementations. Message-receiving and -displaying software 2896 should, if possible, disable the ability to change system 2897 and device parameters. If these operators cannot be 2898 disabled, the password associated with them should at least 2899 be set to a hard-to-guess value. 2901 Some PostScript implementations provide nonstandard 2902 facilities for the direct loading and execution of machine 2903 code. Such facilities are quite obviously open to 2904 substantial abuse. Message-sending software should not 2905 make use of such features. Besides being totally hardware- 2906 specific, they are also likely to be unavailable in secure 2907 implementations of PostScript. Message-receiving and 2908 -displaying software should not allow such operators to be 2909 used if they exist. 2913 PostScript is an extensible language, and many, if not most, 2914 implementations of it provide a number of their own 2915 extensions. This document does not deal with such extensions 2916 explicitly since they constitute an unknown factor. 2917 Message-sending software should not make use of nonstandard 2918 extensions; they are likely to be missing from some 2919 implementations. Message-receiving and -displaying software 2920 should make sure that any nonstandard PostScript operators 2921 are secure and don't present any kind of threat. 2923 It is possible to write PostScript that consumes huge 2924 amounts of various system resources. It is also possible to 2925 write PostScript programs that loop infinitely.
Both types 2926 of programs have the potential to cause damage if sent to 2927 unsuspecting recipients. Message-sending software should 2928 avoid the construction and dissemination of such programs, 2929 which is antisocial. Message-receiving and -displaying 2930 software should provide appropriate mechanisms to abort 2931 processing of a document after a reasonable amount of time 2932 has elapsed. In addition, PostScript interpreters should be 2933 limited to the consumption of only a reasonable amount of 2934 any given system resource. 2936 It is possible to include raw binary information inside 2937 PostScript in various forms. This is not recommended for 2938 use in email, both because it is not supported by all 2939 PostScript interpreters and because it significantly 2940 complicates the use of a MIME Content-Transfer-Encoding. 2941 (Without such binary data, PostScript may typically be viewed as 2942 line-oriented data. The treatment of CRLF sequences becomes 2943 extremely problematic if binary and line-oriented data are 2944 mixed in a single PostScript data stream.) 2946 Finally, bugs may exist in some PostScript interpreters 2947 which could possibly be exploited to gain unauthorized 2948 access to a recipient's system. Beyond noting this 2949 possibility, no specific preventive action is available 2950 other than the timely correction of such bugs when they 2951 are found. 2953 7.4.3 Other Application subtypes 2955 It is expected that many other subtypes of application will 2956 be defined in the future. MIME implementations must 2957 generally treat any unrecognized subtypes as being 2958 equivalent to application/octet-stream.
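The fallback rule for unrecognized application subtypes can be sketched in a few lines of Python. The function name and the set of recognized subtypes below are assumptions chosen for illustration; a real reader would list whatever subtypes it actually implements.

```python
# Subtypes this hypothetical reader knows how to handle.
KNOWN_APPLICATION_SUBTYPES = frozenset({"octet-stream", "postscript"})


def effective_content_type(header_value, known=KNOWN_APPLICATION_SUBTYPES):
    """Return the type/subtype a reader should act on.

    Parameters are dropped and matching is case-insensitive. Any
    application subtype that is not recognized is mapped to
    application/octet-stream; other types are returned unchanged.
    """
    media = header_value.split(";", 1)[0].strip().lower()
    maintype, _, subtype = media.partition("/")
    if maintype == "application" and subtype not in known:
        return "application/octet-stream"
    return media
```

With this mapping in place, the handling recommended for application/octet-stream (offering to save the decoded data to a file) applies automatically to every application subtype the reader does not understand.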
2962 The formal grammar for content-type header fields for 2963 application data is given by: 2965 application-type := "application" "/" application-subtype 2967 application-subtype := ("octet-stream" *stream-param) 2968 / "postscript" / extension-token 2970 stream-param := (";" "type" "=" value) 2971 / (";" "padding" "=" padding) 2973 padding := "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" 2975 7.5 The Image Content-Type 2977 A Content-Type of "image" indicates that the body contains 2978 an image. The subtype names the specific image format. 2979 These names are case insensitive. Two initial subtypes are 2980 "jpeg" for the JPEG format, JFIF encoding, and "gif" for GIF 2981 format [GIF]. 2983 The list of image subtypes given here is neither exclusive 2984 nor exhaustive, and is expected to grow as more types are 2985 registered with IANA, as described in Appendix E. 2987 The formal grammar for the content-type header field for 2988 data of type image is given by: 2990 image-type := "image" "/" ("gif" / "jpeg" / extension-token) 2992 7.6 The Audio Content-Type 2994 A Content-Type of "audio" indicates that the body contains 2995 audio data. Although there is not yet a consensus on an 2996 "ideal" audio format for use with computers, there is a 2997 pressing need for a format capable of providing 2998 interoperable behavior. 3000 The initial subtype of "basic" is specified to meet this 3001 requirement by providing an absolutely minimal lowest-common- 3002 denominator audio format. It is expected that richer 3003 formats for higher quality and/or lower bandwidth audio will 3004 be defined by a later document. 3008 The content of the "audio/basic" subtype is audio encoded 3009 using 8-bit ISDN mu-law [PCM]. When this subtype is 3010 present, a sample rate of 8000 Hz and a single channel are 3011 assumed.
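Because audio/basic fixes the sample size, sample rate, and channel count, a receiver can work out the playing time of a body from its length alone. The Python sketch below shows that arithmetic, together with the widely used table-free expansion of one mu-law octet to a signed linear sample. The function names are assumptions for this example, and the expansion follows common G.711 practice rather than anything defined in this document.

```python
import base64

SAMPLE_RATE_HZ = 8000   # fixed by audio/basic: 8-bit samples, one channel
BIAS = 0x84             # bias constant used by the usual mu-law expansion


def duration_seconds(base64_body):
    """Playing time of a base64-encoded audio/basic body, in seconds."""
    raw = base64.b64decode(base64_body)
    return len(raw) / SAMPLE_RATE_HZ   # one octet per sample


def mulaw_to_linear(octet):
    """Expand one mu-law octet (0-255) to a signed linear sample."""
    u = ~octet & 0xFF                  # mu-law octets are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude
```

Under these assumptions, 16000 octets of decoded data correspond to two seconds of audio, and the octet 0xFF expands to silence.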
3013 The formal grammar for the content-type header field for 3014 data of type audio is given by: 3016 audio-type := "audio" "/" ("basic" / extension-token) 3018 7.7 The Video Content-Type 3020 A Content-Type of "video" indicates that the body contains a 3021 time-varying-picture image, possibly with color and 3022 coordinated sound. The term "video" is used extremely 3023 generically, rather than with reference to any particular 3024 technology or format, and is not meant to preclude subtypes 3025 such as animated drawings encoded compactly. The subtype 3026 "mpeg" refers to video coded according to the MPEG standard 3027 [MPEG]. 3029 Note that although in general this document strongly 3030 discourages the mixing of multiple media in a single body, 3031 it is recognized that many so-called "video" formats include 3032 a representation for synchronized audio, and this is 3033 explicitly permitted for subtypes of "video". 3035 The formal grammar for the content-type header field for 3036 data of type video is given by: 3038 video-type := "video" "/" ("mpeg" / extension-token) 3040 7.8 Experimental Content-Type Values 3042 A Content-Type value beginning with the characters "X-" is a 3043 private value, to be used by consenting mail systems by 3044 mutual agreement. Any format without a rigorous and public 3045 definition must be named with an "X-" prefix, and publicly 3046 specified values shall never begin with "X-". (Older 3047 versions of the widely-used Andrew system use the "X-BE2" 3048 name, so new systems should probably choose a different 3049 name.) 3051 In general, the use of "X-" top-level types is strongly 3052 discouraged. Implementors should invent subtypes of the 3053 existing types whenever possible. 
The invention of new 3057 types is intended to be restricted primarily to the 3058 development of new media types for email, such as digital 3059 odors or holography, rather than new data formats in 3060 general. In many cases, a subtype of application will be 3061 more appropriate than a new top-level type. 3063 Summary 3065 Using the MIME-Version, Content-Type, and Content-Transfer- 3066 Encoding header fields, it is possible to include, in a 3067 standardized way, arbitrary types of data objects with RFC 3068 822 conformant mail messages. No restrictions imposed by 3069 either RFC 821 or RFC 822 are violated, and care has been 3070 taken to avoid problems caused by additional restrictions 3071 imposed by the characteristics of some Internet mail 3072 transport mechanisms (see Appendix B). The "multipart" and 3073 "message" Content-Types allow mixing and hierarchical 3074 structuring of objects of different types in a single 3075 message. Further Content-Types provide a standardized 3076 mechanism for tagging messages or body parts as audio, 3077 image, or several other kinds of data. A distinguished 3078 parameter syntax allows further specification of data format 3079 details, particularly the specification of alternate 3080 character sets. Additional optional header fields provide 3081 mechanisms for certain extensions deemed desirable by many 3082 implementors. Finally, a number of useful Content-Types are 3083 defined for general use by consenting user agents, notably 3084 message/partial and message/external-body. 3086 Security Considerations 3088 Security issues are discussed in Section 7.4.2 and in 3089 Appendix F. Implementors should pay special attention to 3090 the security implications of any mail content-types that can 3091 cause the remote execution of any actions in the recipient's 3092 environment.
In such cases, the discussion of the 3093 application/postscript content-type in Section 7.4.2 may 3094 serve as a model for considering other content-types with 3095 remote execution capabilities. 3099 Authors' Addresses 3101 For more information, the authors of this document may be 3102 contacted via Internet mail: 3104 Nathaniel S. Borenstein 3105 First Virtual Holdings 3106 25 Washington Avenue 3107 Morristown, NJ 07960 3109 Email: nsb@nsb.fv.com 3110 Phone: +1 201 540 8967 3111 Fax: +1 201 993 3032 3113 Ned Freed 3114 Innosoft International, Inc. 3115 250 West First Street 3116 Suite 240 3117 Claremont, CA 91711 3119 Phone: +1 909 624 7907 3120 Fax: +1 909 621 5319 3121 Email: ned@innosoft.com 3123 MIME is a result of the work of the Internet Engineering 3124 Task Force Working Group on Email Extensions. The chairman 3125 of that group, Greg Vaudreuil, may be reached at: 3127 Gregory M. Vaudreuil 3128 Tigon Corporation 3129 17060 Dallas Parkway 3130 Dallas Texas, 75248 3131 214-733-2722 3132 Email: gvaudre@cnri.reston.va.us 3136 Acknowledgements 3138 This document is the result of the collective effort of a 3139 large number of people, at several IETF meetings, on the 3140 IETF-SMTP and IETF-822 mailing lists, and elsewhere. 3141 Although any enumeration seems doomed to suffer from 3142 egregious omissions, the following are among the many 3143 contributors to this effort: 3145 Harald Tveit Alvestrand Timo Lehtinen 3146 Randall Atkinson John R. MacMillan 3147 Philippe Brandon Rick McGowan 3148 Kevin Carosso Leo Mclaughlin 3149 Uhhyung Choi Goli Montaser-Kohsari 3150 Cristian Constantinof Keith Moore 3151 Mark Crispin Tom Moore 3152 Dave Crocker Erik Naggum 3153 Terry Crowley Mark Needleman 3154 Walt Daniels John Noerenberg 3155 Frank Dawson Mats Ohrman 3156 Hitoshi Doi Julian Onions 3157 Kevin Donnelly Michael Patton 3158 Keith Edwards David J.
Pepper 3159 Chris Eich Blake C. Ramsdell 3160 Johnny Eriksson Luc Rooijakkers 3161 Craig Everhart Marshall T. Rose 3162 Patrik Fältström Jonathan Rosenberg 3163 Erik E. Fair Jan Rynning 3164 Roger Fajman Harri Salminen 3165 Alain Fontaine Michael Sanderson 3166 James M. Galvin Masahiro Sekiguchi 3167 Philip Gladstone Mark Sherman 3168 Thomas Gordon Keld Simonsen 3169 Phill Gross Bob Smart 3170 James Hamilton Peter Speck 3171 Steve Hardcastle-Kille Henry Spencer 3172 David Herron Einar Stefferud 3173 Bruce Howard Michael Stein 3174 Bill Janssen Klaus Steinberger 3175 Olle Järnefors Peter Svanberg 3176 Risto Kankkunen James Thompson 3177 Phil Karn Steve Uhler 3178 Alan Katz Stuart Vance 3179 Tim Kehres Erik van der Poel 3183 Neil Katin Guido van Rossum 3184 Kyuho Kim Peter Vanderbilt 3185 Anders Klemets Greg Vaudreuil 3186 John Klensin Ed Vielmetti 3187 Valdis Kletnieks Ryan Waldron 3188 Jim Knowles Wally Wedel 3189 Stev Knowles Sven-Ove Westberg 3190 Bob Kummerfeld Brian Wideen 3191 Pekka Kytolaakso John Wobus 3192 Stellan Lagerström Glenn Wright 3193 Vincent Lau Rayan Zachariassen 3194 Donald Lindsay David Zimmerman 3196 Marc Andreessen Bob Braden 3197 Brian Capouch Peter Clitherow 3198 Dave Collier-Brown John Coonrod 3199 Stephen Crocker Jim Davis 3200 Axel Deininger Dana S Emery 3201 Martin Forssen Stephen Gildea 3202 Terry Gray Mark Horton 3203 Warner Losh Carlyn Lowery 3204 Laurence Lundblade Charles Lynn 3205 Larry Masinter Michael J. McInerny 3206 Jon Postel Christer Romson 3207 Yutaka Sato Markku Savela 3208 Richard Alan Schafer Larry W. Virden 3209 Rhys Weatherly Jay Weber 3210 Dave Wecker 3212 The authors apologize for any omissions from this list, 3213 which are certainly unintentional. 3217 Appendix A -- Minimal MIME-Conformance 3219 The mechanisms described in this document are open-ended.
3220 It is definitely not expected that all implementations will 3221 support all of the Content-Types described, nor that they 3222 will all share the same extensions. In order to promote 3223 interoperability, however, it is useful to define the 3224 concept of "MIME-conformance" to define a certain level of 3225 implementation that allows the useful interworking of 3226 messages with content that differs from US ASCII text. In 3227 this section, we specify the requirements for such 3228 conformance. 3230 A mail user agent that is MIME-conformant MUST: 3232 1. Always generate a "MIME-Version: 1.0" header 3233 field. 3235 2. Recognize the Content-Transfer-Encoding header 3236 field, and decode all received data encoded with 3237 either the quoted-printable or base64 3238 implementations. Encode any data sent that is 3239 not in seven-bit mail-ready representation using 3240 one of these transformations and include the 3241 appropriate Content-Transfer-Encoding header 3242 field, unless the underlying transport mechanism 3243 supports non-seven-bit data, as SMTP does not. 3245 3. Recognize and interpret the Content-Type 3246 header field, and avoid showing users raw data 3247 with a Content-Type field other than text. Be 3248 able to send at least text/plain messages, with 3249 the character set specified as a parameter if it 3250 is not US-ASCII. 3252 4. Explicitly handle the following Content-Type 3253 values, to at least the following extents: 3255 Text: 3256 -- Recognize and display "text" mail 3257 with the character set "US-ASCII." 3258 -- Recognize other character sets at 3259 least to the extent of being able 3260 to inform the user about what 3261 character set the message uses. 
3265 -- Recognize the "ISO-8859-*" character 3266 sets to the extent of being able to 3267 display those characters that are 3268 common to ISO-8859-* and US-ASCII, 3269 namely all characters represented 3270 by octet values 0-127. 3271 -- For unrecognized subtypes, show or 3272 offer to show the user the "raw" 3273 version of the data after 3274 conversion of the content from 3275 canonical form to local form. 3276 Message: 3277 -- Recognize and display at least the 3278 primary (822) encapsulation in such 3279 a way as to preserve any recursive 3280 structure, that is, displaying or 3281 offering to display the 3282 encapsulated data in accordance 3283 with its Content-type. 3284 Multipart: 3285 -- Recognize the primary (mixed) 3286 subtype. Display all relevant 3287 information on the message level 3288 and the body part header level and 3289 then display or offer to display 3290 each of the body parts 3291 individually. 3292 -- Recognize the "alternative" subtype, 3293 and avoid showing the user 3294 redundant parts of 3295 multipart/alternative mail. 3296 -- Recognize the "multipart/digest" 3297 subtype, specifically using 3298 "message/rfc822" rather than 3299 "text/plain" as the default 3300 content-type for encapsulations 3301 inside "multipart/digest" entities. 3302 -- Treat any unrecognized subtypes as if 3303 they were "mixed". 3304 Application: 3305 -- Offer the ability to remove either of 3306 the two types of Content-Transfer- 3307 Encoding defined in this document 3308 and put the resulting information 3309 in a user file. 3313 5. Upon encountering any unrecognized Content- 3314 Type, an implementation must treat it as if it had 3315 a Content-Type of "application/octet-stream" with 3316 no parameter sub-arguments.
How such data are 3317 handled is up to an implementation, but likely 3318 options for handling such unrecognized data 3319 include offering to let the user write the data to a file 3320 (decoded from its mail transport format) or 3321 offering to let the user name a program to which the 3322 decoded data should be passed as input. 3323 Unrecognized predefined types, which in a MIME- 3324 conformant mailer might still include audio, 3325 image, or video, should also be treated in this 3326 way. 3328 A user agent that meets the above conditions is said to be 3329 MIME-conformant. The meaning of this phrase is that it is 3330 assumed to be "safe" to send virtually any kind of 3331 properly-marked data to users of such mail systems, because 3332 such systems will at least be able to treat the data as 3333 undifferentiated binary, and will not simply splash it onto 3334 the screen of unsuspecting users. There is another sense 3335 in which it is always "safe" to send data in a format that 3336 is MIME-conformant, which is that such data will not break 3337 or be broken by any known systems that are conformant with 3338 RFC 821 and RFC 822. User agents that are MIME-conformant 3339 provide the additional guarantee that the user will not be 3340 shown data that were never intended to be viewed as text. 3344 Appendix B -- General Guidelines For Sending Email Data 3346 Internet email is not a perfect, homogeneous system. Mail 3347 may become corrupted at several stages in its travel to a 3348 final destination. Specifically, email sent throughout the 3349 Internet may travel across many networking technologies. 3350 Many networking and mail technologies do not support the 3351 full functionality possible in the SMTP transport 3352 environment. Mail traversing these systems is likely to be 3353 modified so that it can be transported. 3355 There exist many widely-deployed non-conformant MTAs in the 3356 Internet.
These MTAs, speaking the SMTP protocol, alter 3357 messages on the fly to take advantage of the internal data 3358 structure of the hosts they are implemented on, or are just 3359 plain broken. 3361 The following guidelines may be useful to anyone devising a 3362 data format (Content-Type) that will survive the widest 3363 range of networking technologies and known broken MTAs 3364 unscathed. Note that anything encoded in the base64 3365 encoding will satisfy these rules, but that some well-known 3366 mechanisms, notably the UNIX uuencode facility, will not. 3367 Note also that anything encoded in the Quoted-Printable 3368 encoding will survive most gateways intact, but possibly not 3369 some gateways to systems that use the EBCDIC character set. 3371 (1) Under some circumstances the encoding used for data 3372 may change as part of normal gateway or user agent 3373 operation. In particular, conversion from base64 to 3374 quoted-printable and vice versa may be necessary. This 3375 may result in the confusion of CRLF sequences with line 3376 breaks in text bodies. As such, the persistence of CRLF 3377 as something other than a line break must not be relied 3378 on. 3380 (2) Many systems may elect to represent and store text 3381 data using local newline conventions. Local newline 3382 conventions may not match the RFC 822 CRLF convention -- 3383 systems are known that use plain CR, plain LF, CRLF, or 3384 counted records. The result is that isolated CR and LF 3385 characters are not well tolerated in general; they 3386 may be lost or converted to delimiters on some systems, 3387 and hence must not be relied on. 3391 (3) TAB (HT) characters may be misinterpreted or may be 3392 automatically converted to variable numbers of spaces. 3393 This is unavoidable in some environments, notably those 3394 not based on the ASCII character set.
Such conversion 3395 is STRONGLY DISCOURAGED, but it may occur, and mail 3396 formats must not rely on the persistence of TAB (HT) 3397 characters. 3399 (4) Lines longer than 76 characters may be wrapped or 3400 truncated in some environments. Line wrapping and line 3401 truncation are STRONGLY DISCOURAGED, but unavoidable in 3402 some cases. Applications which require long lines must 3403 somehow differentiate between soft and hard line 3404 breaks. (A simple way to do this is to use the 3405 quoted-printable encoding.) 3407 (5) Trailing "white space" characters (SPACE, TAB 3408 (HT)) on a line may be discarded by some transport 3409 agents, while other transport agents may pad lines with 3410 these characters so that all lines in a mail file are 3411 of equal length. The persistence of trailing white 3412 space, therefore, must not be relied on. 3414 (6) Many mail domains use variations on the ASCII 3415 character set, or use character sets such as EBCDIC 3416 which contain most but not all of the US-ASCII 3417 characters. The correct translation of characters not 3418 in the "invariant" set cannot be depended on across 3419 character converting gateways. For example, this 3420 situation is a problem when sending uuencoded 3421 information across BITNET, an EBCDIC system. Similar 3422 problems can occur without crossing a gateway, since 3423 many Internet hosts use character sets other than ASCII 3424 internally. The definition of Printable Strings in 3425 X.400 adds further restrictions in certain special 3426 cases. 
In particular, the only characters that are 3427 known to be consistent across all gateways are the 73 3428 characters that correspond to the upper and lower case 3429 letters A-Z and a-z, the 10 digits 0-9, and the 3430 following eleven special characters: 3432 "'" (ASCII code 39) 3433 "(" (ASCII code 40) 3434 ")" (ASCII code 41) 3438 "+" (ASCII code 43) 3439 "," (ASCII code 44) 3440 "-" (ASCII code 45) 3441 "." (ASCII code 46) 3442 "/" (ASCII code 47) 3443 ":" (ASCII code 58) 3444 "=" (ASCII code 61) 3445 "?" (ASCII code 63) 3447 A maximally portable mail representation, such as the 3448 base64 encoding, will confine itself to relatively 3449 short lines of text in which the only meaningful 3450 characters are taken from this set of 73 characters. 3452 (7) Some mail transport agents will corrupt data that 3453 includes certain literal strings. In particular, a 3454 period (".") alone on a line is known to be corrupted 3455 by some (incorrect) SMTP implementations, and a line 3456 that starts with the five characters "From " (the fifth 3457 character is a SPACE) is commonly corrupted as well. 3458 A careful composition agent can prevent these 3459 corruptions by encoding the data (e.g., in the quoted- 3460 printable encoding, "=46rom " in place of "From " at 3461 the start of a line, and "=2E" in place of "." alone on 3462 a line). 3464 Please note that the above list is NOT a list of recommended 3465 practices for MTAs. RFC 821 MTAs are prohibited from 3466 altering the character of white space or wrapping long 3467 lines. These BAD and illegal practices are known to occur 3468 on established networks, and implementations should be 3469 robust in dealing with the bad effects they can cause. 3473 Appendix C -- A Complex Multipart Example 3475 What follows is the outline of a complex multipart message.
3476 This message has five parts to be displayed serially: two 3477 introductory plain text parts, an embedded multipart 3478 message, a richtext part, and a closing encapsulated text 3479 message in a non-ASCII character set. The embedded 3480 multipart message has two parts to be displayed in parallel, 3481 a picture and an audio fragment. 3483 MIME-Version: 1.0 3484 From: Nathaniel Borenstein 3485 To: Ned Freed 3486 Subject: A multipart example 3487 Content-Type: multipart/mixed; 3488 boundary=unique-boundary-1 3490 This is the preamble area of a multipart message. 3491 Mail readers that understand multipart format 3492 should ignore this preamble. 3493 If you are reading this text, you might want to 3494 consider changing to a mail reader that understands 3495 how to properly display multipart messages. 3496 --unique-boundary-1 3498 ...Some text appears here... 3499 [Note that the preceding blank line means 3500 no header fields were given and this is text, 3501 with charset US ASCII. It could have been 3502 done with explicit typing as in the next part.] 3504 --unique-boundary-1 3505 Content-type: text/plain; charset=US-ASCII 3507 This could have been part of the previous part, 3508 but illustrates explicit versus implicit 3509 typing of body parts. 3511 --unique-boundary-1 3512 Content-Type: multipart/parallel; 3513 boundary=unique-boundary-2 3515 --unique-boundary-2 3519 Content-Type: audio/basic 3520 Content-Transfer-Encoding: base64 3522 ... base64-encoded 8000 Hz single-channel 3523 mu-law-format audio data goes here.... 3525 --unique-boundary-2 3526 Content-Type: image/gif 3527 Content-Transfer-Encoding: base64 3529 ... base64-encoded image data goes here.... 3531 --unique-boundary-2-- 3533 --unique-boundary-1 3534 Content-type: text/richtext 3536 This is richtext. 3537 as defined in RFC 1341 3538 Isn't it 3539 cool?
3541 --unique-boundary-1 3542 Content-Type: message/rfc822 3544 From: (mailbox in US-ASCII) 3545 To: (address in US-ASCII) 3546 Subject: (subject in US-ASCII) 3547 Content-Type: Text/plain; charset=ISO-8859-1 3548 Content-Transfer-Encoding: Quoted-printable 3550 ... Additional text in ISO-8859-1 goes here ... 3552 --unique-boundary-1-- 3554 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 3556 Appendix D -- Collected Grammar 3558 This appendix contains the complete BNF grammar for all the 3559 syntax specified by this document. 3561 By itself, however, this grammar is incomplete. It refers 3562 to several entities that are defined by RFC 822. Rather 3563 than reproduce those definitions here, and risk 3564 unintentional differences between the two, this document 3565 simply refers the reader to RFC 822 for the remaining 3566 definitions. Wherever a term is undefined, it refers to the 3567 RFC 822 definition. 3569 application-subtype := ("octet-stream" *stream-param) 3570 / "postscript" / extension-token 3572 application-type := "application" "/" application-subtype 3574 attribute := token ; case-insensitive 3576 atype := "ftp" / "anon-ftp" / "tftp" / "local-file" 3577 / "afs" / "mail-server" / extension-token 3578 ; Case-insensitive 3580 audio-type := "audio" "/" ("basic" / extension-token) 3582 body-part := <"message" as defined in RFC 822, 3583 with all header fields optional, and with the 3584 specified delimiter not occurring anywhere in 3585 the message body, either on a line by itself 3586 or as a substring anywhere.> 3588 NOTE: In certain transport enclaves, RFC 822 3589 restrictions such as the one that limits bodies to 3590 printable ASCII characters may not be in force. (That 3591 is, the transport domains may resemble standard 3592 Internet mail transport as specified in RFC821 and 3593 assumed by RFC822, but without certain restrictions.) 
3594 The relaxation of these restrictions should be 3595 construed as locally extending the definition of 3596 bodies, for example to include octets outside of the 3597 ASCII range, as long as these extensions are supported 3598 by the transport and adequately documented in the 3599 Content-Transfer-Encoding header field. However, in 3600 no event are headers (either message headers or body- 3601 part headers) allowed to contain anything other than 3605 ASCII characters. 3607 boundary := 0*69<bchars> bcharsnospace 3609 bchars := bcharsnospace / " " 3611 bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / 3612 "_" 3613 / "," / "-" / "." / "/" / ":" / "=" / "?" 3615 charset := "us-ascii" / "iso-8859-1" / "iso-8859-2" / "iso- 3616 8859-3" 3617 / "iso-8859-4" / "iso-8859-5" / "iso-8859-6" / "iso- 3618 8859-7" 3619 / "iso-8859-8" / "iso-8859-9" / extension-token 3620 ; case insensitive 3622 close-delimiter := "--" boundary "--" CRLF 3623 ; Again, no space by "--". 3625 content := "Content-Type" ":" type "/" subtype 3626 *(";" parameter) 3627 ; case-insensitive matching of type and subtype 3629 delimiter := "--" boundary CRLF ; taken from Content-Type 3630 field. 3631 ; There must be no space 3632 ; between "--" and boundary. 3634 description := "Content-Description" ":" *text 3636 discard-text := *(*text CRLF) 3638 encapsulation := delimiter body-part CRLF 3640 encoding := "Content-Transfer-Encoding" ":" mechanism 3642 epilogue := discard-text ; to be ignored 3643 upon receipt.
3645 extension-token := x-token / iana-token 3647 external-param := (";" "access-type" "=" atype) 3648 / (";" "expiration" "=" date-time) 3650 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 3652 ; Note that date-time is quoted 3653 / (";" "size" "=" 1*DIGIT) 3654 / (";" "permission" "=" ("read" / "read- 3655 write")) 3656 ; Permission is case-insensitive 3657 / (";" "name" "=" value) 3658 / (";" "site" "=" value) 3659 / (";" "dir" "=" value) 3660 / (";" "mode" "=" value) 3661 / (";" "server" "=" value) 3662 / (";" "subject" "=" value) 3663 ; access-type required; others required based on 3664 access-type 3666 iana-token := 3670 id := "Content-ID" ":" msg-id 3672 image-type := "image" "/" ("gif" / "jpeg" / extension-token) 3674 mechanism := "7bit" ; case-insensitive 3675 / "quoted-printable" 3676 / "base64" 3677 / "8bit" 3678 / "binary" 3679 / x-token 3681 message-subtype := "rfc822" 3682 / "partial" 2#3partial-param 3683 / "external-body" 1*external-param 3684 / extension-token 3686 message-type := "message" "/" message-subtype 3688 multipart-body := preamble 1*encapsulation close-delimiter 3689 epilogue 3691 multipart-subtype := "mixed" / "parallel" / "digest" 3692 / "alternative" / extension-token 3694 multipart-type := "multipart" "/" multipart-subtype 3695 ";" "boundary" "=" boundary 3697 Expires 11/20/94 draft-ietf-822-mime-00.txt May 1994 3699 octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") 3700 ; octet must be used for characters > 127, =, SPACE, or 3701 TAB, 3702 ; and is recommended for any characters not listed in 3703 ; Appendix B as "mail-safe". 3705 padding := "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" 3707 parameter := attribute "=" value 3709 partial-param := (";" "id" "=" value) 3710 / (";" "number" "=" 1*DIGIT) 3711 / (";" "total" "=" 1*DIGIT) 3712 ; id & number required; total required for last 3713 part 3715 preamble := discard-text ; to be ignored 3716 upon receipt. 
ptext := octet / " / "@"
       / "," / ";" / ":" / "\" / <">
       / "/" / "[" / "]" / "?" / "="
       ; Must be in quoted-string,
       ; to use within parameter values

type := "application" / "audio"  ; case-insensitive
      / "image" / "message"
      / "multipart" / "text"
      / "video" / extension-token
      ; All values case-insensitive

value := token / quoted-string

version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

video-type := "video" "/" ("mpeg" / extension-token)

x-token :=


Appendix E -- IANA Registration Procedures

MIME has been carefully designed to have extensible mechanisms, and
it is expected that the set of content-type/subtype pairs and their
associated parameters will grow significantly with time.  Several
other MIME fields, notably character set names, access-type
parameters for the message/external-body type, and possibly even
Content-Transfer-Encoding values, are likely to have new values
defined over time.  In order to ensure that the set of such values is
developed in an orderly, well-specified, and public manner, MIME
defines a registration process which uses the Internet Assigned
Numbers Authority (IANA) as a central registry for such values.

In general, parameters in the content-type header field are used to
convey supplemental information for various content types, and their
use is defined when the content-type and subtype are defined.  New
parameters should not be defined as a way to introduce new
functionality.

In order to simplify and standardize the registration process, this
appendix gives templates for the registration of new values with
IANA.  Each of these is given in the form of an email message
template, to be filled in by the registering party.

E.1  Registration of New Content-type/subtype Values

Note that MIME is generally expected to be extended by subtypes.  If
a new fundamental top-level type is needed, its specification must be
published as an RFC or submitted in a form suitable to become an RFC,
and be subject to the Internet standards process.

   To:  IANA@isi.edu
   Subject:  Registration of new MIME content-type/subtype

   MIME type name:

   (If the above is not an existing top-level MIME type,
   please explain why an existing type cannot be used.)

   MIME subtype name:

   Required parameters:

   Optional parameters:

   Encoding considerations:

   Security considerations:

   Published specification:

   (The published specification must be an Internet RFC or
   RFC-to-be if a new top-level type is being defined, and
   must be a publicly available specification in any case.)

   Person & email address to contact for further information:

E.2  Registration of New Access-type Values for Message/external-body

   To:  IANA@isi.edu
   Subject:  Registration of new MIME Access-type for
             Message/external-body content-type

   MIME access-type name:

   Required parameters:

   Optional parameters:

   Published specification:

   (The published specification must be an Internet RFC or
   RFC-to-be.)

   Person & email address to contact for further information:


Appendix F -- Summary of the Seven Content-types

Content-type: text

Subtypes defined by this document: plain

Important Parameters: charset

Encoding notes: quoted-printable generally preferred if an encoding
is needed and the character set is mostly an ASCII superset.

Security considerations: Rich text formats such as TeX and Troff
often contain mechanisms for executing arbitrary commands or file
system operations, and should not be used automatically unless these
security problems have been addressed.  Even plain text may contain
control characters that can be used to exploit the capabilities of
"intelligent" terminals and cause security violations.  User
interfaces designed to run on such terminals should be aware of and
try to prevent such problems.
________________________________________________________________

Content-type: multipart

Subtypes defined by this document: mixed, alternative, digest,
parallel.

Important Parameters: boundary

Encoding notes: No content-transfer-encoding is permitted.

________________________________________________________________

Content-type: message

Subtypes defined by this document: rfc822, partial, external-body

Important Parameters: id, number, total, access-type, expiration,
size, permission, name, site, directory, mode, server, subject

Encoding notes: No content-transfer-encoding is permitted.
Specifically, only "7bit" is permitted for "message/partial" or
"message/external-body", and only "7bit", "8bit", or "binary" are
permitted for other subtypes of "message".

________________________________________________________________

Content-type: application

Subtypes defined by this document: octet-stream, postscript

Important Parameters: type, padding

Deprecated Parameters: name and conversions were defined in RFC 1341.

Encoding notes: base64 preferred for unreadable subtypes.

Security considerations: This type is intended for the transmission
of data to be interpreted by locally-installed programs.
If used, for example, to transmit executable binary programs or
programs in general-purpose interpreted languages, such as LISP
programs or shell scripts, severe security problems could result.
Authors of mail-reading agents are cautioned against giving their
systems the power to execute mail-based application data without
carefully considering the security implications.  While it is
certainly possible to define safe application formats and even safe
interpreters for unsafe formats, each interpreter should be evaluated
separately for possible security problems.
________________________________________________________________

Content-type: image

Subtypes defined by this document: jpeg, gif

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: audio

Subtypes defined by this document: basic

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: video

Subtypes defined by this document: mpeg

Important Parameters: none

Encoding notes: base64 generally preferred


Appendix G -- Canonical Encoding Model

There was some confusion, in earlier drafts of this memo, regarding
the model for when email data was to be converted to canonical form
and encoded, and in particular how this process would affect the
treatment of CRLFs, given that the representation of newlines varies
greatly from system to system.  For this reason, a canonical model
for encoding is presented below.

The process of composing a MIME entity can be modeled as being done
in a number of steps.
Note that these steps are roughly similar to those steps used in
RFC 1421 and are performed for each 'innermost level' body:

Step 1.  Creation of local form.

The body to be transmitted is created in the system's native format.
The native character set is used, and where appropriate local end of
line conventions are used as well.  The body may be a UNIX-style text
file, or a Sun raster image, or a VMS indexed file, or audio data in
a system-dependent format stored only in memory, or anything else
that corresponds to the local model for the representation of some
form of information.  Fundamentally, the data is created in the
"native" form specified by the type/subtype information.

Step 2.  Conversion to canonical form.

The entire body, including "out-of-band" information such as record
lengths and possibly file attribute information, is converted to a
universal canonical form.  The specific content type of the body as
well as its associated attributes dictate the nature of the canonical
form that is used.  Conversion to the proper canonical form may
involve character set conversion, transformation of audio data,
compression, or various other operations specific to the various
content types.  If character set conversion is involved, however,
care must be taken to understand the semantics of the content-type,
which may have strong implications for any character set conversion,
e.g. with regard to syntactically meaningful characters in a text
subtype other than "plain".

For example, in the case of text/plain data, the text must be
converted to a supported character set and lines must be delimited
with CRLF delimiters in accordance with RFC822.
Note that the restriction on line lengths implied by RFC822 is
eliminated if the next step employs either quoted-printable or base64
encoding.

Step 3.  Apply transfer encoding.

A Content-Transfer-Encoding appropriate for this body is applied.
Note that there is no fixed relationship between the content type and
the transfer encoding.  In particular, it may be appropriate to base
the choice of base64 or quoted-printable on character frequency
counts which are specific to a given instance of a body.

Step 4.  Insertion into entity.

The encoded object is inserted into a MIME entity with appropriate
headers.  The entity is then inserted into the body of a higher-level
entity (message or multipart) if needed.

It is vital to note that these steps are only a model; they are
specifically NOT a blueprint for how an actual system would be built.
In particular, the model fails to account for two common designs:

   1.  In many cases the conversion to a canonical form prior to
       encoding will be subsumed into the encoder itself, which
       understands local formats directly.  For example, the local
       newline convention for text bodies might be carried through
       to the encoder itself along with knowledge of what that
       format is.

   2.  The output of the encoders may have to pass through one or
       more additional steps prior to being transmitted as a
       message.  As such, the output of the encoder may not be
       conformant with the formats specified by RFC822.  In
       particular, once again it may be appropriate for the
       converter's output to be expressed using local newline
       conventions rather than using the standard RFC822 CRLF
       delimiters.

Other implementation variations are conceivable as well.
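
As an illustration only, the four steps of the model can be sketched
in Python for a text/plain body.  The function names
(to_canonical_form, apply_base64, compose_entity) are hypothetical
and are not defined by this document; the local form is assumed to be
an in-memory string, and base64 is chosen arbitrarily as the transfer
encoding.  As noted above, a real implementation need not be
structured this way, provided its output is equivalent.

```python
import base64


def to_canonical_form(local_text: str, charset: str) -> bytes:
    """Step 2: convert local newlines to CRLF, then encode in `charset`."""
    # Normalize CRLF and bare CR to LF first, so no newline is doubled.
    unified = local_text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n").encode(charset)


def apply_base64(canonical: bytes) -> str:
    """Step 3: base64-encode; encodebytes wraps output at 76 characters."""
    wrapped = base64.encodebytes(canonical).decode("ascii")
    return wrapped.replace("\n", "\r\n")


def compose_entity(local_text: str, charset: str = "iso-8859-1") -> str:
    """Steps 1-4 for a text/plain body held in a Python string."""
    canonical = to_canonical_form(local_text, charset)   # Step 2
    encoded = apply_base64(canonical)                    # Step 3
    headers = (                                          # Step 4
        f"Content-Type: text/plain; charset={charset}\r\n"
        "Content-Transfer-Encoding: base64\r\n"
    )
    return headers + "\r\n" + encoded


if __name__ == "__main__":
    # Step 1: the local form, here UNIX-style text with LF line endings.
    print(compose_entity("na\u00efve caf\u00e9\nsecond line\n"))
```

Because the newline conversion happens before the transfer encoding,
a receiver that decodes the base64 body recovers CRLF-delimited text
regardless of the sender's local newline convention.  For mostly
ASCII text, quoted-printable might be a better choice in Step 3.
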
The vital aspect of this discussion is that, in spite of any
optimizations, collapsings of required steps, or insertion of
additional processing, the resulting messages must be consistent with
those produced by the model described here.  For example, a message
with the following header fields:

   Content-type: text/foo; charset=bar
   Content-Transfer-Encoding: base64

must be first represented in the text/foo form, then (if necessary)
represented in the "bar" character set, and finally transformed via
the base64 algorithm into a mail-safe form.


Appendix H -- Changes from RFC 1521

This document is a very minor revision of RFC 1521.  For the
convenience of those familiar with RFC 1521, the changes from that
document are summarized in this appendix.  For further history, note
that Appendix H in RFC 1521 specified how that document differed from
its predecessor, RFC 1341.

1.  In the rules on reassembling "message/partial" MIME entities in
section 7.3.2, "Subject" is added to the list of headers to take from
the inner message, and the example is modified to clarify this point.

2.  In the discussion of the application/postscript type in section
7.4.2, an additional paragraph has been added warning against the
embedding of binary data inside a PostScript MIME entity.

3.  Added a clarifying note to the basic syntax rules in section 4 to
make it clear that the following two forms:

   Content-type: text/plain; charset=us-ascii
   Content-type: text/plain; charset="us-ascii"

are completely equivalent.

4.  In section 7.2.3, a typo was fixed that said
"application/external-body" instead of "message/external-body".

5.
In section 5, the following paragraph was added to clarify the use of
the "7bit" transfer-encoding in multipart or message entities
encapsulating "8bit" or "binary" data:

   It should also be noted that, by definition, if a "multipart"
   or "message" entity has a transfer-encoding value such as
   "7bit", but one of the enclosed parts has a less restrictive
   value such as "8bit", then either the outer "7bit" labelling
   is in error, because 8 bit data are included, or the inner
   "8bit" labelling placed an unnecessarily high demand on the
   transport system because the actual included data were
   actually 7bit-safe.

6.  In Appendix A, "multipart/digest" support was added to the list
of requirements for minimal MIME conformance.  Also, the requirements
for "message/rfc822" support were strengthened to clarify the
importance of recognizing recursive structure.

7.  In section 7.3.1, the definition of "message/rfc822" was changed
to indicate that at least one of the "From", "Subject", or "Date"
headers must be present.


References

[US-ASCII] Coded Character Set--7-Bit American Standard Code for
Information Interchange, ANSI X3.4-1986.

[ATK] Borenstein, Nathaniel S., Multimedia Applications Development
with the Andrew Toolkit, Prentice-Hall, 1990.

[GIF] Graphics Interchange Format (Version 89a), Compuserve, Inc.,
Columbus, Ohio, 1990.

[ISO-2022] International Standard--Information Processing--ISO 7-bit
and 8-bit coded character sets--Code extension techniques,
ISO 2022:1986.

[ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic
Character Sets -- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987.
Part 2: Latin alphabet No. 2, ISO 8859-2, 1987.
Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
Part 4: Latin alphabet No. 4, ISO 8859-4, 1988.
Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988.
Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987.
Part 7: Latin/Greek alphabet, ISO 8859-7, 1987.
Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988.
Part 9: Latin alphabet No. 5, ISO 8859-9, 1990.

[ISO-646] International Standard--Information Processing--ISO 7-bit
coded character set for information interchange, ISO 646:1983.

[MPEG] Video Coding Draft Standard ISO 11172 CD, ISO
IEC/TJC1/SC2/WG11 (Motion Picture Experts Group), May 1991.

[PCM] CCITT, Fascicle III.4 - Recommendation G.711, "Pulse Code
Modulation (PCM) of Voice Frequencies", Geneva, 1972.

[POSTSCRIPT] Adobe Systems, Inc., PostScript Language Reference
Manual, Addison-Wesley, 1985.

[POSTSCRIPT2] Adobe Systems, Inc., PostScript Language Reference
Manual, Addison-Wesley, Second Edition, 1990.

[X400] Schicker, Pietro, "Message Handling Systems, X.400", Message
Handling Systems and Distributed Applications, E. Stefferud, O-j.
Jacobsen, and P. Schicker, eds., North-Holland, 1989, pp. 3-41.

[RFC-783] Sollins, K.R., "TFTP Protocol (revision 2)", RFC 783, MIT,
June 1981.

[RFC-821] Postel, J.B., "Simple Mail Transfer Protocol", STD 10,
RFC 821, USC/Information Sciences Institute, August 1982.

[RFC-822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, UDEL, August 1982.

[RFC-934] Rose, M., and E. Stefferud, "Proposed Standard for Message
Encapsulation", RFC 934, Delaware and NMA, January 1985.

[RFC-959] Postel, J. and J. Reynolds, "File Transfer Protocol",
STD 9, RFC 959, USC/Information Sciences Institute, October 1985.

[RFC-1049] Sirbu, M., "Content-Type Header Field for Internet
Messages", STD 11, RFC 1049, CMU, March 1988.

[RFC-1421] Linn, J., "Privacy Enhancement for Internet Electronic
Mail: Part I - Message Encryption and Authentication Procedures",
RFC 1421, IAB IRTF PSRG, IETF PEM WG, February 1993.

[RFC-1154] Robinson, D. and R. Ullmann, "Encoding Header Field for
Internet Messages", RFC 1154, Prime Computer, Inc., April 1990.

[RFC-1341] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
Mail Extensions): Mechanisms for Specifying and Describing the Format
of Internet Message Bodies", RFC 1341, Bellcore, Innosoft, June 1992.

[RFC-1342] Moore, K., "Representation of Non-Ascii Text in Internet
Message Headers", RFC 1342, University of Tennessee, June 1992.

[RFC-1343] Borenstein, N., "A User Agent Configuration Mechanism for
Multimedia Mail Format Information", RFC 1343, Bellcore, June 1992.

[RFC-1344] Borenstein, N., "Implications of MIME for Internet Mail
Gateways", RFC 1344, Bellcore, June 1992.

[RFC-1345] Simonsen, K., "Character Mnemonics & Character Sets",
RFC 1345, Rationel Almen Planlaegning, June 1992.

[RFC-1426] Klensin, J., (WG Chair), Freed, N., (Editor), Rose, M.,
Stefferud, E., and D. Crocker, "SMTP Service Extension for 8bit-MIME
transport", RFC 1426, United Nations University, Innosoft, Dover
Beach Consulting, Inc., Network Management Associates, Inc., The
Branch Office, February 1993.

[RFC-1522] Moore, K., "Representation of Non-Ascii Text in Internet
Message Headers", RFC 1522, University of Tennessee, September 1993.

[RFC-1340] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2,
RFC 1340, USC/Information Sciences Institute, July 1992.

[RFC-1521] Borenstein, N., and N.
Freed, "MIME (Multipurpose Internet Mail Extensions): Mechanisms for
Specifying and Describing the Format of Internet Message Bodies",
RFC 1521, Bellcore, Innosoft, September 1993.

[RFC-1563] Borenstein, N., "The text/enriched MIME Content-type",
RFC 1563, Bellcore, January 1994.


Table of Contents

1      Introduction
2      Notations, Conventions, and Generic BNF Grammar
3      The MIME-Version Header Field
4      The Content-Type Header Field
5      The Content-Transfer-Encoding Header Field
5.1    Quoted-Printable Content-Transfer-Encoding
5.2    Base64 Content-Transfer-Encoding
6      Additional Content- Header Fields
6.1    Optional Content-ID Header Field
6.2    Optional Content-Description Header Field
7      The Predefined Content-Type Values
7.1    The Text Content-Type
7.1.1  The charset parameter
7.1.2  The Text/plain subtype
7.2    The Multipart Content-Type
7.2.1  Multipart: The common syntax
7.2.2  The Multipart/mixed (primary) subtype
7.2.3  The Multipart/alternative subtype
7.2.4  The Multipart/digest subtype
7.2.5  The Multipart/parallel subtype
7.3    The Message Content-Type
7.3.1  The Message/rfc822 (primary) subtype
7.3.2  The Message/Partial subtype
7.3.3  The Message/External-Body subtype
7.4    The Application Content-Type
7.4.1  The Application/Octet-Stream (primary) subtype
7.4.2  The Application/PostScript subtype
7.4.3  Other Application subtypes
7.5    The Image Content-Type
7.6    The Audio Content-Type
7.7    The Video Content-Type
7.8    Experimental Content-Type Values
       Summary
       Security Considerations
       Authors' Addresses
       Acknowledgements
       Appendix A -- Minimal MIME-Conformance
       Appendix B -- General Guidelines For Sending Email Data
       Appendix C -- A Complex Multipart Example
       Appendix D -- Collected Grammar
       Appendix E -- IANA Registration Procedures
       E.1  Registration of New Content-type/subtype Values
       E.2  Registration of New Access-type Values for
            Message/external-body
       Appendix F -- Summary of the Seven Content-types
       Appendix G -- Canonical Encoding Model
       Appendix H -- Changes from RFC 1521
       References