idnits 2.17.00 (12 Aug 2021) /tmp/idnits64896/draft-ietf-822ext-mime-imb-04.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2022-05-20) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? 
** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 424: '... in accordance with this document MUST...' RFC 2119 keyword, line 502: '...agents MUST include proper MIME labell...' RFC 2119 keyword, line 677: '... standard values MUST be documented, r...' RFC 2119 keyword, line 972: '... MAY be represented as the US-AS...' RFC 2119 keyword, line 977: '... Octets with values of 9 and 32 MAY be...' (6 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1480 has weird spacing: '... no inter...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 1995) is 9653 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'RFC-822' on line 121 looks like a reference -- Missing reference section? 'RFC-821' on line 126 looks like a reference -- Missing reference section? 'RFC821' on line 352 looks like a reference -- Missing reference section? 'ATK' on line 150 looks like a reference -- Missing reference section? 'X400' on line 155 looks like a reference -- Missing reference section? 'RFC-1049' on line 186 looks like a reference -- Missing reference section? 'RFC-1123' on line 225 looks like a reference -- Missing reference section? 'RFC-1344' on line 229 looks like a reference -- Missing reference section? 'RFC-1345' on line 230 looks like a reference -- Missing reference section? 'RFC-1524' on line 230 looks like a reference -- Missing reference section? 'RFC-MIME-REG' on line 679 looks like a reference -- Missing reference section? 'RFC-1652' on line 782 looks like a reference -- Missing reference section? 'RFC-1421' on line 1137 looks like a reference -- Missing reference section? 'RFC-1741' on line 1149 looks like a reference -- Missing reference section? 'RFC-MIME-HEADERS' on line 1284 looks like a reference Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 17 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Nathaniel Borenstein 2 Internet Draft Ned Freed 3 5 Multipurpose Internet Mail Extensions 6 (MIME) Part One: 8 Format of Internet Message Bodies 10 December 1995 12 Status of this Memo 14 This document is an Internet-Draft. 
Internet-Drafts are 15 working documents of the Internet Engineering Task Force 16 (IETF), its areas, and its working groups. Note that other 17 groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six 21 months. Internet-Drafts may be updated, replaced, or obsoleted 22 by other documents at any time. It is not appropriate to use 23 Internet-Drafts as reference material or to cite them other 24 than as a "working draft" or "work in progress". 26 To learn the current status of any Internet-Draft, please 27 check the 1id-abstracts.txt listing contained in the 28 Internet-Drafts Shadow Directories on ds.internic.net (US East 29 Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), 30 or munnari.oz.au (Pacific Rim). 32 1. Abstract 34 STD 11, RFC 822, defines a message representation protocol 35 specifying considerable detail about US-ASCII message headers, 36 and leaves the message content, or message body, as flat US- 37 ASCII text. This set of documents, collectively called the 38 Multipurpose Internet Mail Extensions, or MIME, redefines the 39 format of messages to allow for 40 (1) textual message bodies in character sets other than 41 US-ASCII, 43 (2) non-textual message bodies, 45 (3) multi-part message bodies, and 47 (4) textual header information in character sets other than 48 US-ASCII. 50 These documents are based on earlier work documented in RFC 51 934, STD 11, and RFC 1049, but extend and revise them. 52 Because RFC 822 said so little about message bodies, these 53 documents are largely orthogonal to (rather than a revision 54 of) RFC 822. 
56 In particular, these documents are designed to provide 57 facilities to include multiple parts in a single message, to 58 represent body and header text in character sets other than 59 US-ASCII, to represent formatted multi-font text messages, to 60 represent non-textual material such as images and audio clips, 61 and generally to facilitate later extensions defining new 62 types of Internet mail for use by cooperating mail agents. 64 This initial document specifies the various headers used to 65 describe the structure of MIME messages. The second document, 66 RFC MIME-IMT, defines the general structure of the MIME media 67 typing system and defines an initial set of media types. The 68 third document, RFC MIME-HEADERS, describes extensions to RFC 69 822 to allow non-US-ASCII text data in Internet mail header 70 fields. The fourth document, RFC MIME-REG, specifies various 71 IANA registration procedures for MIME-related facilities. The 72 fifth and final document, RFC MIME-CONF, describes MIME 73 conformance criteria as well as providing some illustrative 74 examples of MIME message formats, acknowledgements, and the 75 bibliography. 77 These documents are revisions of RFCs 1521, 1522, and 1590, 78 which themselves were revisions of RFCs 1341 and 1342. An 79 appendix in RFC MIME-CONF describes differences and changes 80 from previous versions. 82 2. Table of Contents 84 1 Abstract .............................................. 1 85 2 Table of Contents ..................................... 3 86 3 Introduction .......................................... 4 87 4 Definitions, Conventions, and Generic BNF Grammar ..... 6 88 4.1 CRLF ................................................ 7 89 4.2 Character Set ....................................... 7 90 4.3 Message ............................................. 8 91 4.4 Entity .............................................. 8 92 4.5 Body Part ........................................... 
8 93 4.6 Body ................................................ 8 94 4.7 7bit Data ........................................... 9 95 4.8 8bit Data ........................................... 9 96 4.9 Binary Data ......................................... 9 97 4.10 Lines .............................................. 9 98 5 MIME Header Fields .................................... 9 99 6 MIME-Version Header Field ............................. 10 100 7 Content-Type Header Field ............................. 13 101 7.1 Syntax of the Content-Type Header Field ............. 14 102 7.2 Content-Type Defaults ............................... 16 103 8 Content-Transfer-Encoding Header Field ................ 17 104 8.1 Content-Transfer-Encoding Syntax .................... 17 105 8.2 Content-Transfer-Encodings Semantics ................ 18 106 8.3 New Content-Transfer-Encodings ...................... 19 107 8.4 Interpretation and Use .............................. 19 108 8.5 Translating Encodings ............................... 22 109 8.6 Canonical Encoding Model ............................ 22 110 8.7 Quoted-Printable Content-Transfer-Encoding .......... 22 111 8.8 Base64 Content-Transfer-Encoding .................... 27 112 9 Content-ID Header Field ............................... 29 113 10 Content-Description Header Field ..................... 30 114 11 Additional MIME Header Fields ........................ 30 115 12 Summary .............................................. 30 116 13 Security Considerations .............................. 31 117 14 Authors' Addresses ................................... 32 118 A Collected Grammar ..................................... 33 119 3. Introduction 121 Since its publication in 1982, RFC 822 [RFC-822] has defined 122 the standard format of textual mail messages on the Internet. 
123 Its success has been such that the RFC 822 format has been 124 adopted, wholly or partially, well beyond the confines of the 125 Internet and the Internet SMTP transport defined by RFC 821 126 [RFC-821]. As the format has seen wider use, a number of 127 limitations have proven increasingly restrictive for the user 128 community. 130 RFC 822 was intended to specify a format for text messages. 131 As such, non-text messages, such as multimedia messages that 132 might include audio or images, are simply not mentioned. Even 133 in the case of text, however, RFC 822 is inadequate for the 134 needs of mail users whose languages require the use of 135 character sets richer than US-ASCII. Since RFC 822 does not 136 specify mechanisms for mail containing audio, video, Asian 137 language text, or even text in most European languages, 138 additional specifications are needed. 140 One of the notable limitations of RFC 821/822 based mail 141 systems is the fact that they limit the contents of electronic 142 mail messages to relatively short lines (e.g. 1000 characters 143 or less [RFC821]) of 7bit US-ASCII. This forces users to 144 convert any non-textual data that they may wish to send into 145 seven-bit bytes representable as printable US-ASCII characters 146 before invoking a local mail UA (User Agent, a program with 147 which human users send and receive mail). Examples of such 148 encodings currently used in the Internet include pure 149 hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in 150 RFC 1421, the Andrew Toolkit Representation [ATK], and many 151 others. 153 The limitations of RFC 822 mail become even more apparent as 154 gateways are designed to allow for the exchange of mail 155 messages between RFC 822 hosts and X.400 hosts. X.400 [X400] 156 specifies mechanisms for the inclusion of non-textual material 157 within electronic mail messages. 
The current standards for 158 the mapping of X.400 messages to RFC 822 messages specify 159 either that X.400 non-textual material must be converted to 160 (not encoded in) IA5Text format, or that they must be 161 discarded, notifying the RFC 822 user that discarding has 162 occurred. This is clearly undesirable, as information that a 163 user may wish to receive is lost. Even though a user agent 164 may not have the capability of dealing with the non-textual 165 material, the user might have some mechanism external to the 166 UA that can extract useful information from the material. 167 Moreover, it does not allow for the fact that the message may 168 eventually be gatewayed back into an X.400 message handling 169 system (i.e., the X.400 message is "tunneled" through Internet 170 mail), where the non-textual information would definitely 171 become useful again. 173 This document describes several mechanisms that combine to 174 solve most of these problems without introducing any serious 175 incompatibilities with the existing world of RFC 822 mail. In 176 particular, it describes: 178 (1) A MIME-Version header field, which uses a version 179 number to declare a message to be conformant with this 180 specification and allows mail processing agents to 181 distinguish between such messages and those generated 182 by older or non-conformant software, which are presumed 183 to lack such a field. 185 (2) A Content-Type header field, generalized from RFC 1049 186 [RFC-1049], which can be used to specify the media type 187 and subtype of data in the body of a message and to 188 fully specify the native representation (canonical 189 form) of such data. 191 (3) A Content-Transfer-Encoding header field, which can be 192 used to specify both the encoding transformation that 193 was applied to the body and the domain of the result. 
194 Encoding transformations other than the identity 195 transformation are usually applied to data in order to 196 allow it to pass through mail transport mechanisms 197 which may have data or character set limitations. 199 (4) Two additional header fields that can be used to 200 further describe the data in a body, the Content-ID and 201 Content-Description header fields. 203 All of the header fields defined in this document are subject 204 to the general syntactic rules for header fields specified in 205 RFC 822. In particular, all of these header fields except for 206 Content-Disposition can include RFC 822 comments, which have 207 no semantic content and should be ignored during MIME 208 processing. 210 Finally, to specify and promote interoperability, RFC MIME- 211 CONF provides a basic applicability statement for a subset of 212 the above mechanisms that defines a minimal level of 213 "conformance" with this document. 215 HISTORICAL NOTE: Several of the mechanisms described in this 216 set of documents may seem somewhat strange or even baroque at 217 first reading. It is important to note that compatibility 218 with existing standards AND robustness across existing 219 practice were two of the highest priorities of the working 220 group that developed this set of documents. In particular, 221 compatibility was always favored over elegance. 223 Please refer to the current edition of the "IAB Official 224 Protocol Standards" for the standardization state and status 225 of this protocol. RFC 822 and RFC 1123 [RFC-1123] also 226 provide essential background for MIME since no conforming 227 implementation of MIME can violate them. In addition, several 228 other informational RFC documents will be of interest to the 229 MIME implementor, in particular RFC 1344 [RFC-1344], RFC 1345 230 [RFC-1345], and RFC 1524 [RFC-1524]. 232 4. 
Definitions, Conventions, and Generic BNF Grammar 234 Although the mechanisms specified in this set of documents are 235 all described in prose, most are also described formally in 236 the augmented BNF notation of RFC 822. Implementors will need 237 to be familiar with this notation in order to understand this 238 specification, and are referred to RFC 822 for a complete 239 explanation of the augmented BNF notation. 241 Some of the augmented BNF in this set of documents makes named 242 references to syntax rules defined in RFC 822. A complete 243 formal grammar, then, is obtained by combining the collected 244 grammar appendices in each document in this set with the BNF 245 of RFC 822 plus the modifications to RFC 822 defined in RFC 246 1123 (which specifically changes the syntax for `return', 247 `date' and `mailbox'). 249 All numeric and octet values are given in decimal notation in 250 this set of documents. All media type values, subtype values, 251 and parameter names as defined are case-insensitive. However, 252 parameter values are case-sensitive unless otherwise specified 253 for the specific parameter. 255 FORMATTING NOTE: Notes, such as this one, provide additional 256 nonessential information which may be skipped by the reader 257 without missing anything essential. The primary purpose of 258 these non-essential notes is to convey information about the 259 rationale of this set of documents, or to place these 260 documents in the proper historical or evolutionary context. 261 Such information may in particular be skipped by those who are 262 focused entirely on building a conformant implementation, but 263 may be of use to those who wish to understand why certain 264 design choices were made. 266 4.1. 
CRLF 268 The term CRLF, in this set of documents, refers to the 269 sequence of octets corresponding to the two US-ASCII 270 characters CR (decimal value 13) and LF (decimal value 10) 271 which, taken together, in this order, denote a line break in 272 RFC 822 mail. 274 4.2. Character Set 276 The term "character set" is used in MIME to refer to a method 277 of converting a sequence of octets into a sequence of 278 characters. Note that unconditional and unambiguous 279 conversion in the other direction is not required, in that not 280 all characters may be representable by a given character set 281 and a character set may provide more than one sequence of 282 octets to represent a particular sequence of characters. 284 This definition is intended to allow various kinds of 285 character encodings, from simple single-table mappings such as 286 US-ASCII to complex table switching methods such as those that 287 use ISO 2022's techniques. However, the definition associated 288 with a MIME character set name must fully specify the mapping 289 to be performed. In particular, use of external profiling 290 information to determine the exact mapping is not permitted. 292 NOTE: The term "character set" was originally used in MIME 293 with specifications such as US-ASCII and other 7bit and 8bit 294 schemes which have a simple mapping from single octets to 295 single characters. Multi-octet coded character sets and 296 switching techniques make the situation more complex. For 297 example, some communities use the term "character encoding" 298 for what MIME calls a "character set", while using the phrase 299 "coded character set" to denote an abstract mapping from 300 integers (not octets) to characters. 302 4.3. Message 304 The term "message", when not further qualified, means either a 305 (complete or "top-level") RFC 822 message being transferred on 306 a network, or a message encapsulated in a body of type 307 "message/rfc822" or "message/partial". 309 4.4. 
Entity 311 The term "entity" refers specifically to the MIME-defined 312 header fields and contents of either a message or one of the 313 parts in the body of a multipart entity. The specification of 314 such entities is the essence of MIME. Since the contents of 315 an entity are often called the "body", it makes sense to speak 316 about the body of an entity. Any sort of field may be present 317 in the header of an entity, but only those fields whose names 318 begin with "content-" actually have any MIME-related meaning. 319 Note that this does NOT imply that they have no meaning at all 320 -- an entity that is also a message has non-MIME header fields 321 whose meanings are defined by RFC 822. 323 4.5. Body Part 325 The term "body part" refers to an entity inside of a multipart 326 entity. 328 4.6. Body 330 The term "body", when not further qualified, means the body of 331 an entity, that is, the body of either a message or of a body 332 part. 334 NOTE: The previous four definitions are clearly circular. 335 This is unavoidable, since the overall structure of a MIME 336 message is indeed recursive. 338 4.7. 7bit Data 340 "7bit data" refers to data that is all represented as 341 relatively short lines with 998 octets or less between CRLF 342 line separation sequences [RFC821]. No octets with decimal 343 values greater than 127 are allowed and neither are NULs 344 (octets with decimal value 0). CR (decimal value 13) and LF 345 (decimal value 10) octets only occur as part of CRLF line 346 separation sequences. 348 4.8. 8bit Data 350 "8bit data" refers to data that is all represented as 351 relatively short lines with 998 octets or less between CRLF 352 line separation sequences [RFC821], but octets with decimal 353 values greater than 127 may be used. As with "7bit data", CR 354 and LF octets only occur as part of CRLF line separation 355 sequences and no NULs are allowed. 357 4.9. 
Binary Data 359 "Binary data" refers to data where any sequence of octets 360 whatsoever is allowed. 362 4.10. Lines 364 "Lines" are defined as sequences of octets separated by CRLF 365 sequences. This is consistent with both RFC 821 and RFC 822. 366 "Lines" only refers to a unit of data in a message, which may 367 or may not correspond to something that is actually displayed 368 by a user agent. 370 5. MIME Header Fields 372 MIME defines a number of new RFC 822 header fields that are 373 used to describe the content of a MIME entity. These header 374 fields occur in at least two contexts: 376 (1) As part of a regular RFC 822 message header. 378 (2) In a MIME body part header within a multipart 379 construct. 381 The formal definition of these header fields is as follows: 383 entity-headers := [ content CRLF ] 384 [ encoding CRLF ] 385 [ id CRLF ] 386 [ description CRLF ] 387 *( MIME-extension-field CRLF ) 389 MIME-message-headers := entity-headers 390 fields 391 version CRLF 392 ; The ordering of the header 393 ; fields implied by this BNF 394 ; definition should be ignored. 396 MIME-part-headers := entity-headers 397 [ fields ] 398 ; Any field not beginning with 399 ; "content-" can have no defined 400 ; meaning and should be ignored. 401 ; The ordering of the header 402 ; fields implied by this BNF 403 ; definition should be ignored. 405 The syntax of the various specific MIME header fields will be 406 described in the following sections. 408 6. MIME-Version Header Field 410 Since RFC 822 was published in 1982, there has really been 411 only one format standard for Internet messages, and there has 412 been little perceived need to declare the format standard in 413 use. This document is an independent document that 414 complements RFC 822. 
Although the extensions in this document 415 have been defined in such a way as to be compatible with RFC 416 822, there are still circumstances in which it might be 417 desirable for a mail-processing agent to know whether a 418 message was composed with the new standard in mind. 420 Therefore, this document defines a new header field, "MIME- 421 Version", which is to be used to declare the version of the 422 Internet message body format standard in use. 424 Messages composed in accordance with this document MUST 425 include such a header field, with the following verbatim text: 427 MIME-Version: 1.0 429 The presence of this header field is an assertion that the 430 message has been composed in compliance with this document. 432 Since it is possible that a future document might extend the 433 message format standard again, a formal BNF is given for the 434 content of the MIME-Version field: 436 version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT 438 Thus, future format specifiers, which might replace or extend 439 "1.0", are constrained to be two integer fields, separated by 440 a period. If a message is received with a MIME-version value 441 other than "1.0", it cannot be assumed to conform with this 442 specification. 444 Note that the MIME-Version header field is required at the top 445 level of a message. It is not required for each body part of 446 a multipart entity. It is required for the embedded headers 447 of a body of type "message/rfc822" or "message/partial" if and 448 only if the embedded message is itself claimed to be MIME- 449 conformant. 451 It is not possible to fully specify how a mail reader that 452 conforms with MIME as defined in this document should treat a 453 message that might arrive in the future with some value of 454 MIME-Version other than "1.0". 456 It is also worth noting that version control for specific 457 media types is not accomplished using the MIME-Version 458 mechanism. 
In particular, some formats (such as 459 application/postscript) have version numbering conventions 460 that are internal to the media format. Where such conventions 461 exist, MIME does nothing to supersede them. Where no such 462 conventions exist, a MIME media type might use a "version" 463 parameter in the content-type field if necessary. 465 NOTE TO IMPLEMENTORS: When checking MIME-Version values any 466 RFC 822 comment strings that are present must be ignored. In 467 particular, the following four MIME-Version fields are 468 equivalent: 470 MIME-Version: 1.0 472 MIME-Version: 1.0 (produced by MetaSend Vx.x) 474 MIME-Version: (produced by MetaSend Vx.x) 1.0 476 MIME-Version: 1.(produced by MetaSend Vx.x)0 478 In the absence of a MIME-Version field, a receiving user agent 479 (whether MIME compliant or not) may optionally choose to 480 interpret the body of the message according to local 481 conventions. Many such conventions are currently in use and 482 it should be noted that in practice non-MIME messages can 483 contain just about anything. 485 It is impossible to be certain that a non-MIME message is 486 actually plain text in the US-ASCII character set since it 487 might well be a message that, using some set of nonstandard 488 local conventions that predate this document, includes text in 489 another character set or non-textual data presented in a 490 manner that cannot be automatically recognized (e.g., a 491 uuencoded compressed UNIX tar file). 493 MIME-compliant user agents are required, if they support any 494 such nonstandard conventions at all, to do so on received 495 messages only -- they must not send non-MIME messages 496 containing anything other than US-ASCII text. 498 In particular, the use of non-US-ASCII text in messages 499 without a MIME-Version field is strongly discouraged as it 500 impedes interoperability when sending messages between regions 501 with different localization conventions. 
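The NOTE TO IMPLEMENTORS in Section 6 requires that RFC 822 comments be ignored when checking MIME-Version values. The following is a minimal Python sketch of one way a receiver might do that. It is an illustration only, not part of the specification: the function names strip_rfc822_comments and parse_mime_version are hypothetical, and the sketch assumes that the field value contains no quoted strings (the MIME-Version grammar has none).

```python
import re


def strip_rfc822_comments(value: str) -> str:
    """Remove RFC 822 comments, including nested ones, from a header value.

    Assumption: the value contains no quoted strings, which holds for
    the MIME-Version grammar (1*DIGIT "." 1*DIGIT).
    """
    out = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it outside comments, drop it inside them
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_mime_version(value: str):
    """Return (major, minor) as integers, or None if the value is malformed."""
    cleaned = strip_rfc822_comments(value)
    match = re.fullmatch(r"\s*([0-9]+)\s*\.\s*([0-9]+)\s*", cleaned)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for field_value in (
        "1.0",
        "1.0 (produced by MetaSend Vx.x)",
        "(produced by MetaSend Vx.x) 1.0",
        "1.(produced by MetaSend Vx.x)0",
    ):
        print(repr(field_value), parse_mime_version(field_value))
```

Under these assumptions, all four field values listed in the note yield the same (major, minor) pair. A receiver would then compare that pair against (1, 0) before assuming conformance with this document.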
MIME-compliant user 502 agents MUST include proper MIME labelling when sending 503 anything other than plain text in the US-ASCII character set. 505 In addition, non-MIME user agents should be upgraded if at all 506 possible to include appropriate MIME header information in the 507 messages they send even if nothing else in MIME is supported. 508 This upgrade will have little, if any, effect on non-MIME 509 recipients and will aid MIME in correctly displaying such 510 messages. It also provides a smooth transition path to 511 eventual adoption of other MIME capabilities. 513 7. Content-Type Header Field 515 The purpose of the Content-Type field is to describe the data 516 contained in the body fully enough that the receiving user 517 agent can pick an appropriate agent or mechanism to present 518 the data to the user, or otherwise deal with the data in an 519 appropriate manner. The value in this field is called a media 520 type. 522 HISTORICAL NOTE: The Content-Type header field was first 523 defined in RFC 1049. RFC 1049 used a simpler and less 524 powerful syntax, but one that is largely compatible with the 525 mechanism given here. 527 The Content-Type header field specifies the nature of the data 528 in the body of an entity by giving media type and subtype 529 identifiers, and by providing auxiliary information that may 530 be required for certain media types. After the media type and 531 subtype names, the remainder of the header field is simply a 532 set of parameters, specified in an attribute=value notation. 533 The ordering of parameters is not significant. 535 In general, the top-level media type is used to declare the 536 general type of data, while the subtype specifies a specific 537 format for that type of data. Thus, a media type of 538 "image/xyz" is enough to tell a user agent that the data is an 539 image, even if the user agent has no knowledge of the specific 540 image format "xyz". 
Such information can be used, for 541 example, to decide whether or not to show a user the raw data 542 from an unrecognized subtype -- such an action might be 543 reasonable for unrecognized subtypes of text, but not for 544 unrecognized subtypes of image or audio. For this reason, 545 registered subtypes of text, image, audio, and video should 546 not contain embedded information that is really of a different 547 type. Such compound formats should be represented using the 548 "multipart" or "application" types. 550 Parameters are modifiers of the media subtype, and as such do 551 not fundamentally affect the nature of the content. The set 552 of meaningful parameters depends on the media type and 553 subtype. Most parameters are associated with a single 554 specific subtype. However, a given top-level media type may 555 define parameters which are applicable to any subtype of that 556 type. Parameters may be required by their defining content 557 type or subtype or they may be optional. MIME implementations 558 must ignore any parameters whose names they do not recognize. 560 For example, the "charset" parameter is applicable to any 561 subtype of "text", while the "boundary" parameter is required 562 for any subtype of the "multipart" media type. 564 There are NO globally-meaningful parameters that apply to all 565 media types. Truly global mechanisms are best addressed, in 566 the MIME model, by the definition of additional Content-* 567 header fields. 569 An initial set of seven top-level media types is defined in 570 MIME-IMT. Five of these are discrete types whose content is 571 essentially opaque as far as MIME processing is concerned. 572 The remaining two are composite types whose contents require 573 additional handling by MIME processors. 575 This set of top-level media types is intended to be 576 substantially complete. 
It is expected that additions to the 577 larger set of supported types can generally be accomplished by 578 the creation of new subtypes of these initial types. In the 579 future, more top-level types may be defined only by a 580 standards-track extension to this standard. If another top- 581 level type is to be used for any reason, it must be given a 582 name starting with "X-" to indicate its non-standard status 583 and to avoid a potential conflict with a future official name. 585 7.1. Syntax of the Content-Type Header Field 587 In the Augmented BNF notation of RFC 822, a Content-Type 588 header field value is defined as follows: 590 content := "Content-Type" ":" type "/" subtype 591 *(";" parameter) 592 ; Matching of media type and subtype 593 ; is ALWAYS case-insensitive. 595 type := discrete-type / composite-type 597 discrete-type := "text" / "image" / "audio" / "video" / 598 "application" / extension-token 600 composite-type := "message" / "multipart" / extension-token 601 extension-token := ietf-token / x-token 603 ietf-token := <An extension token defined by a 604 standards-track RFC and registered 605 with IANA.> 607 x-token := <The two characters "X-" or "x-" followed, with 608 no intervening white space, by any token> 610 subtype := extension-token / iana-token 612 iana-token := <A publicly-defined extension token. Tokens 613 of this form must be registered with IANA 614 as specified in RFC MIME-REG.> 616 parameter := attribute "=" value 618 attribute := token 619 ; Matching of attributes 620 ; is ALWAYS case-insensitive. 622 value := token / quoted-string 624 token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, 625 or tspecials> 627 tspecials := "(" / ")" / "<" / ">" / "@" / 628 "," / ";" / ":" / "\" / <"> 629 "/" / "[" / "]" / "?" / "=" 630 ; Must be in quoted-string, 631 ; to use within parameter values 633 Note that the definition of "tspecials" is the same as the RFC 634 822 definition of "specials" with the addition of the three 635 characters "/", "?", and "=", and the removal of ".". 637 Note also that a subtype specification is MANDATORY -- it may 638 not be omitted from a Content-Type header field. As such, 639 there are no default subtypes. 641 The type, subtype, and parameter names are not case sensitive. 642 For example, TEXT, Text, and TeXt are all equivalent top-level 643 media types. 
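The case-insensitive matching of type, subtype, and attribute names described above can be tried out with a short Python sketch. This is an illustration under stated assumptions, not part of the specification: it relies on the email.message.Message class from the Python standard library, which is a later, independent implementation, and the helper names media_type_of and parameter_of are hypothetical.

```python
from email.message import Message


def media_type_of(content_type_value: str) -> str:
    """Return the "type/subtype" pair, folded to lower case."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_type()


def parameter_of(content_type_value: str, attribute: str):
    """Return a parameter value with delimiting quotes removed, or None.

    The attribute name is matched without regard to case; the value is
    returned exactly as written, since values may be case-sensitive.
    """
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_param(attribute)


if __name__ == "__main__":
    # TEXT, Text, and TeXt all name the same top-level media type.
    for value in ("TEXT/plain", "Text/Plain", "TeXt/PLAIN"):
        print(value, "->", media_type_of(value))

    # Attribute names fold; the case of the value is preserved.
    print(parameter_of('multipart/mixed; BOUNDARY="AbC123"', "boundary"))
```

The sketch folds only the names. Parameter values are passed through untouched so that a caller can apply whatever case rule the specific parameter defines.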
Parameter values are normally case sensitive,
but sometimes are interpreted in a case-insensitive fashion,
depending on the intended use. (For example, multipart
boundaries are case-sensitive, but the "access-type" parameter
for message/External-body is not case-sensitive.)

Note that the value of a quoted string parameter does not
include the quotes. That is, the quotation marks in a
quoted-string are not a part of the value of the parameter,
but are merely used to delimit that parameter value. In
addition, comments are allowed in accordance with RFC 822
rules for structured header fields. Thus the following two
forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

are completely equivalent.

Beyond this syntax, the only syntactic constraint on the
definition of subtype names is the desire that their uses must
not conflict. That is, it would be undesirable to have two
different communities using "Content-Type: application/foobar"
to mean two different things. The process of defining new
media subtypes, then, is not intended to be a mechanism for
imposing restrictions, but simply a mechanism for publicizing
their definition and usage. There are, therefore, two
acceptable mechanisms for defining new media subtypes:

(1)  Private values (starting with "X-") may be defined
     bilaterally between two cooperating agents without
     outside registration or standardization.

(2)  New standard values MUST be documented, registered
     with, and approved by IANA, as described in RFC
     MIME-REG [RFC-MIME-REG].

The second document in this set, RFC MIME-IMT, defines the
initial set of media types for MIME.

7.2. Content-Type Defaults

Default RFC 822 messages without a MIME Content-Type header
are taken by this protocol to be plain text in the US-ASCII
character set, which can be explicitly specified as:

     Content-type: text/plain; charset=us-ascii

This default is assumed if no Content-Type header field is
specified. It is also recommended that this default be assumed
when a syntactically invalid Content-Type header field is
encountered. In the presence of a MIME-Version header field
and the absence of any Content-Type header field, a receiving
User Agent can also assume that plain US-ASCII text was the
sender's intent. Plain US-ASCII text may still be assumed in
the absence of a MIME-Version or the presence of a
syntactically invalid Content-Type header field, but the
sender's intent might have been otherwise.

8. Content-Transfer-Encoding Header Field

Many media types which could be usefully transported via email
are represented, in their "natural" format, as 8bit character
or binary data. Such data cannot be transmitted over some
transfer protocols. For example, RFC 821 (SMTP) restricts
mail messages to 7bit US-ASCII data with lines no longer than
1000 characters including any trailing CRLF line separator.

It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7bit short line format. Proper
labelling of unencoded material in less restrictive formats
for direct use over less restrictive transports is also
desirable. This document specifies that such encodings will
be indicated by a new "Content-Transfer-Encoding" header
field. This field has not been defined by any previous
standard.

8.1. Content-Transfer-Encoding Syntax

The Content-Transfer-Encoding field's value is a single token
specifying the type of encoding, as enumerated below.
Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

These values are not case sensitive -- Base64 and BASE64 and
bAsE64 are all equivalent. An encoding type of 7BIT requires
that the body is already in a 7bit mail-ready representation.
This is the default value -- that is,
"Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.

8.2. Content-Transfer-Encodings Semantics

This single Content-Transfer-Encoding token actually provides
two pieces of information. It specifies what sort of encoding
transformation the body was subjected to, and it specifies
what the domain of the result is.

Three transformations are currently defined: identity, the
"quoted-printable" encoding, and the "base64" encoding. The
domains are "binary", "8bit" and "7bit".

The Content-Transfer-Encoding values "7bit", "8bit", and
"binary" all mean that the identity (i.e. NO) encoding
transformation has been performed. As such, they serve simply
as indicators of the domain of the body data, and provide
useful information about the sort of encoding that might be
needed for transmission in a given transport system. The
terms "7bit data", "8bit data", and "binary data" are all
defined in Section 4.

The quoted-printable and base64 encodings transform their
input from an arbitrary domain into material in the "7bit"
range, thus making it safe to carry over restricted
transports. The specific definitions of the transformations
are given below.

The proper Content-Transfer-Encoding label must always be
used. Labelling unencoded data containing 8bit characters as
"7bit" is not allowed, nor is labelling unencoded
non-line-oriented data as anything other than "binary" allowed.
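
The labelling rule above can be illustrated with a short Python
sketch that picks the identity label for unencoded data. It is
only an illustration, not part of this specification. The
function name transfer_domain is hypothetical, and the limits it
checks (998 octets between CRLF pairs, no NUL octets, CR and LF
only as part of CRLF) are assumptions taken from the Section 4
definitions as published in RFC 2045, which are not reproduced
in this part of the document.

```python
def transfer_domain(body: bytes, max_line: int = 998) -> str:
    """Return "7bit", "8bit" or "binary" for unencoded body data.

    Assumption: the line-length limit and the NUL/CR/LF rules follow
    the Section 4 definitions as published in RFC 2045.
    """
    lines = body.split(b"\r\n")
    line_oriented = (
        b"\x00" not in body                                  # no NULs
        and all(len(line) <= max_line for line in lines)     # short lines
        # After splitting on CRLF, any CR or LF left over is a bare one.
        and not any(b"\r" in line or b"\n" in line for line in lines)
    )
    if not line_oriented:
        return "binary"
    # Line-oriented data: the label depends only on the octet range.
    return "8bit" if any(octet > 127 for octet in body) else "7bit"
```

A composer could call such a helper before choosing between an
identity label and a quoted-printable or base64 transformation.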

Unlike media subtypes, a proliferation of
Content-Transfer-Encoding values is both undesirable and
unnecessary. However,
establishing only a single transformation into the "7bit"
domain does not seem possible. There is a tradeoff between
the desire for a compact and efficient encoding of
largely-binary data and the desire for a readable encoding of
data that is mostly, but not entirely, 7bit. For this reason, at
least two encoding mechanisms are necessary: a "readable"
encoding (quoted-printable) and a "dense" encoding (base64).

Mail transport for unencoded 8bit data is defined in RFC 1652
[RFC-1652]. As of the initial publication of this document,
there are no standardized Internet mail transports for which
it is legitimate to include unencoded binary data in mail
bodies. Thus there are no circumstances in which the "binary"
Content-Transfer-Encoding is actually valid in Internet mail.
However, in the event that binary mail transport becomes a
reality in Internet mail, or when this document is used in
conjunction with any other binary-capable transport mechanism,
binary bodies should be labelled as such using this mechanism.

NOTE: The five values defined for the
Content-Transfer-Encoding field imply nothing about the media
type other than the algorithm by which it was encoded or the
transport system requirements if unencoded.

8.3. New Content-Transfer-Encodings

Implementors may, if necessary, define private
Content-Transfer-Encoding values, but must use an x-token, which
is a name prefixed by "X-", to indicate its non-standard status,
e.g., "Content-Transfer-Encoding: x-my-new-encoding".
Additional standardized Content-Transfer-Encoding values must
be specified by a standards-track RFC. Additional
requirements such specifications must meet are given in RFC
REG.
As such, the entire content-transfer-encoding namespace,
except for names beginning with "X-", is explicitly reserved to
the IETF for future use.

Unlike media types and subtypes, the creation of new
Content-Transfer-Encoding values is STRONGLY discouraged, as it
seems likely to hinder interoperability with little potential
benefit.

8.4. Interpretation and Use

If a Content-Transfer-Encoding header field appears as part of
a message header, it applies to the entire body of that
message. If a Content-Transfer-Encoding header field appears
as part of an entity's headers, it applies only to the body of
that entity. If an entity is of type "multipart" the
Content-Transfer-Encoding is not permitted to have any value
other than "7bit", "8bit" or "binary". Even more severe
restrictions apply to some subtypes of the "message" type.

It should be noted that most media types are defined in terms
of octets rather than bits, so that the mechanisms described
here are mechanisms for encoding arbitrary octet streams, not
bit streams. If a bit stream is to be encoded via one of
these mechanisms, it must first be converted to an 8bit byte
stream using the network standard bit order ("big-endian"), in
which the earlier bits in a stream become the higher-order
bits in an 8bit byte. A bit stream not ending at an 8bit
boundary must be padded with zeroes. RFC MIME-IMT provides a
mechanism for noting the addition of such padding in the case
of the application/octet-stream media type, which has a
"padding" parameter.

The encoding mechanisms defined here explicitly encode all
data in US-ASCII.
Thus, for example, suppose an entity has
header fields such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64
US-ASCII encoding of data that was originally in ISO-8859-1,
and will be in that character set again after decoding.

Certain Content-Transfer-Encoding values may only be used on
certain media types. In particular, it is EXPRESSLY FORBIDDEN
to use any encodings other than "7bit", "8bit", or "binary"
with any composite media type, i.e. one that recursively
includes other Content-Type fields. Currently the only
composite media types are "multipart" and "message". All
encodings that are desired for bodies of type multipart or
message must be done at the innermost level, by encoding the
actual body that needs to be encoded.

It should also be noted that, by definition, if a composite
entity has a transfer-encoding value such as "7bit", but one
of the enclosed entities has a less restrictive value such as
"8bit", then either the outer "7bit" labelling is in error,
because 8bit data are included, or the inner "8bit" labelling
placed an unnecessarily high demand on the transport system
because the included data were actually 7bit-safe.

NOTE ON ENCODING RESTRICTIONS: Though the prohibition against
using content-transfer-encodings on composite body data may
seem overly restrictive, it is necessary to prevent nested
encodings, in which data are passed through an encoding
algorithm multiple times, and must be decoded multiple times
in order to be properly viewed. Nested encodings add
considerable complexity to user agents: Aside from the
obvious efficiency problems with such multiple encodings, they
can obscure the basic structure of a message.
In particular,
they can imply that several decoding operations are necessary
simply to find out what types of bodies a message contains.
Banning nested encodings may complicate the job of certain
mail gateways, but this seems less of a problem than the
effect of nested encodings on user agents.

Any entity with an unrecognized Content-Transfer-Encoding must
be treated as if it has a Content-Type of
"application/octet-stream", regardless of what the Content-Type
header field actually says.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the
characteristics of the media that is to be encoded, or, at the
very least, that certain Content-Transfer-Encodings could be
mandated for use with specific media types. There are several
reasons why this is not the case. First, given the varying
types of transports used for mail, some encodings may be
appropriate for some combinations of media types and transports
but not for others.
(For example, in an 8bit transport, no encoding would be
required for text in certain character sets, while such
encodings are clearly required for 7bit SMTP.)

Second, certain media types may require different types of
transfer encoding under different circumstances. For example,
many PostScript bodies might consist entirely of short lines
of 7bit data and hence require no encoding at all. Other
PostScript bodies (especially those using Level 2 PostScript's
binary encoding mechanism) may only be reasonably represented
using a binary transport encoding. Finally, since the
Content-Type field is intended to be an open-ended
specification mechanism, strict specification of an
association between media types and encodings effectively
couples the specification of an application protocol with a
specific lower-level transport.
This is not desirable since
the developers of a media type should not have to be aware of
all the transports in use and what their limitations are.

8.5. Translating Encodings

The quoted-printable and base64 encodings are designed so that
conversion between them is possible. The only issue that
arises in such a conversion is the handling of hard line
breaks. When converting from quoted-printable to base64 a
hard line break must be converted into a CRLF sequence.
Similarly, a CRLF sequence in base64 data must be converted to
a quoted-printable hard line break, but ONLY when converting
text data.

8.6. Canonical Encoding Model

There was some confusion, in the predecessors of this RFC,
regarding the model for when email data was to be converted to
canonical form and encoded, and in particular how this process
would affect the treatment of CRLFs, given that the
representation of newlines varies greatly from system to
system, and the relationship between
content-transfer-encodings and character sets. A canonical
model for encoding is presented in RFC MIME-CONF for this reason.

8.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data
that largely consists of octets that correspond to printable
characters in the US-ASCII character set. It encodes the data
in such a way that the resulting octets are unlikely to be
modified by mail transport. If the data being encoded are
mostly US-ASCII text, the encoded form of the data remains
largely recognizable by humans. A body which is entirely
US-ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a
character-translating, and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined
by the following rules:

(1)  (General 8bit representation) Any octet, except a CR or
     LF that is part of a CRLF line break of the canonical
     (standard) form of the data being encoded, may be
     represented by an "=" followed by a two digit
     hexadecimal representation of the octet's value. The
     digits of the hexadecimal alphabet, for this purpose,
     are "0123456789ABCDEF". Uppercase letters must be used
     when sending hexadecimal data, though a robust
     implementation may choose to recognize lowercase
     letters on receipt. Thus, for example, the decimal
     value 12 (US-ASCII form feed) can be represented by
     "=0C", and the decimal value 61 (US-ASCII EQUAL SIGN)
     can be represented by "=3D". This rule must be
     followed except when the following rules allow an
     alternative encoding.

(2)  (Literal representation) Octets with decimal values of
     33 through 60 inclusive, and 62 through 126, inclusive,
     MAY be represented as the US-ASCII characters which
     correspond to those octets (EXCLAMATION POINT through
     LESS THAN, and GREATER THAN through TILDE,
     respectively).

(3)  (White Space) Octets with values of 9 and 32 MAY be
     represented as US-ASCII TAB (HT) and SPACE characters,
     respectively, but MUST NOT be so represented at the end
     of an encoded line. Any TAB (HT) or SPACE characters
     on an encoded line MUST thus be followed on that line
     by a printable character. In particular, an "=" at the
     end of an encoded line, indicating a soft line break
     (see rule #5) may follow one or more TAB (HT) or SPACE
     characters. It follows that an octet with decimal
     value 9 or 32 appearing at the end of an encoded line
     must be represented according to Rule #1.
     This rule is
     necessary because some MTAs (Message Transport Agents,
     programs which transport messages from one user to
     another, or perform a portion of such transfers) are
     known to pad lines of text with SPACEs, and others are
     known to remove "white space" characters from the end
     of a line. Therefore, when decoding a Quoted-Printable
     body, any trailing white space on a line must be
     deleted, as it will necessarily have been added by
     intermediate transport agents.

(4)  (Line Breaks) A line break in a text body, represented
     as a CRLF sequence in the text canonical form, must be
     represented by a (RFC 822) line break, which is also a
     CRLF sequence, in the Quoted-Printable encoding. Since
     the canonical representation of media types other than
     text does not generally include the representation of
     line breaks as CRLF sequences, no hard line breaks
     (i.e. line breaks that are intended to be meaningful
     and to be displayed to the user) should occur in the
     quoted-printable encoding of such types. Sequences
     like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
     appear in non-text data represented in
     quoted-printable, of course.

     Note that many implementations may elect to encode the
     local representation of various content types directly
     rather than converting to canonical form first,
     encoding, and then converting back to local
     representation. In particular, this may apply to plain
     text material on systems that use newline conventions
     other than a CRLF terminator sequence. Such an
     implementation optimization is permissible, but only
     when the combined canonicalization-encoding step is
     equivalent to performing the three steps separately.

(5)  (Soft Line Breaks) The Quoted-Printable encoding
     REQUIRES that encoded lines be no more than 76
     characters long.
     If longer lines are to be encoded
     with the Quoted-Printable encoding, "soft" line breaks
     must be used. An equal sign as the last character on an
     encoded line indicates such a non-significant ("soft")
     line break in the encoded text.

Thus if the "raw" form of the line is a single unencoded line
that says:

     Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable encoding, as:

     Now's the time =
     for all folk to come=
      to the aid of their country.

This provides a mechanism with which long lines are encoded in
such a way as to be restored by the user agent. The 76
character limit does not count the trailing CRLF, but counts
all other characters, including any equal signs.

Since the hyphen character ("-") may be represented as itself
in the Quoted-Printable encoding, care must be taken, when
encapsulating a quoted-printable encoded body inside one or
more multipart entities, to ensure that the boundary delimiter
does not appear anywhere in the encoded body. (A good
strategy is to choose a boundary that includes a character
sequence such as "=_" which can never appear in a
quoted-printable body. See the definition of multipart messages
in MIME-IMT.)

NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport.
Bodies encoded with the quoted-printable encoding will work
reliably over most mail gateways, but may not work perfectly
over a few gateways, notably those involving translation into
EBCDIC. A higher level of confidence is offered by the base64
Content-Transfer-Encoding. A way to get reasonably reliable
transport through EBCDIC gateways is to also quote the
US-ASCII characters

     !"#$@[\]^`{|}~

according to rule #1.
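
Rules #1 through #5 can be illustrated with a minimal Python
sketch of an encoder for text that is already in canonical form
(CRLF line breaks). It is an illustration under stated
assumptions rather than a reference implementation: the name
qp_encode is hypothetical, the input is assumed to be canonical
text, and soft line breaks are placed a little conservatively so
that no encoded line, including its trailing "=", exceeds 76
characters.

```python
def qp_encode(data: bytes, max_len: int = 76) -> bytes:
    """Quoted-printable encode canonical-form text (hypothetical helper)."""
    out_lines = []
    for line in data.split(b"\r\n"):          # rule 4: CRLF stays a hard break
        tokens = []
        for i, octet in enumerate(line):
            literal = 33 <= octet <= 60 or 62 <= octet <= 126        # rule 2
            inner_ws = octet in (9, 32) and i != len(line) - 1       # rule 3
            if literal or inner_ws:
                tokens.append(bytes([octet]))
            else:
                # Rule 1: "=" plus two uppercase hex digits.  This also
                # covers "=", bare CR or LF, octets above 126, and a TAB
                # or SPACE that ends the line.
                tokens.append(b"=%02X" % octet)
        current = b""
        for token in tokens:
            # Rule 5: leave room for the "=" that marks a soft break and
            # never split an "=XX" triplet across lines.
            if len(current) + len(token) > max_len - 1:
                out_lines.append(current + b"=")
                current = b""
            current += token
        out_lines.append(current)
    return b"\r\n".join(out_lines)
```

For binary data, the CR and LF octets would all have to go
through rule #1 instead of being treated as line breaks. This
sketch does not attempt that case.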

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the representation of
the breaks between the lines of quoted-printable data may be
altered in transport, in the same manner that plain text mail
has always been altered in Internet mail when passing between
systems with differing newline conventions. If such
alterations are likely to constitute a corruption of the data,
it is probably more sensible to use the base64 encoding rather
than the quoted-printable encoding.

WARNING TO IMPLEMENTORS: If binary data are encoded in
quoted-printable, care must be taken to encode CR and LF
characters as "=0D" and "=0A", respectively. In particular, a
CRLF sequence in binary data should be encoded as "=0D=0A".
Otherwise, if CRLF were represented as a hard line break, it
might be incorrectly decoded on platforms with different line
break conventions.

For formalists, the syntax of quoted-printable data is
described by the following grammar:

     quoted-printable := qp-line *(CRLF qp-line)

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     ptext := hex-octet / safe-char

     safe-char :=
                  ; Characters not listed as "mail-safe" in
                  ; RFC MIME-CONF are also not recommended.

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC MIME-CONF as "mail-safe".

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

IMPORTANT: The addition of LWSP between the elements shown in
this BNF is NOT allowed since this BNF does not specify a
structured header field.

8.8. Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be
humanly readable. The encoding and decoding algorithms are
simple, but the encoded data are consistently only about 33
percent larger than the unencoded data. This encoding is
virtually identical to the one used in Privacy Enhanced Mail
(PEM) applications, as defined in RFC 1421 [RFC-1421].

A 65-character subset of US-ASCII is used, enabling 6 bits to
be represented per printable character. (The extra 65th
character, "=", is used to signify a special processing
function.)

NOTE: This subset has the important property that it is
represented identically in all versions of ISO 646, including
US-ASCII, and all characters in the subset are also
represented identically in all versions of EBCDIC. Other
popular encodings, such as the encoding used by the uuencode
utility, Macintosh binhex 4.0 [RFC-1741], and the base85
encoding specified as part of Level 2 PostScript, do not share
these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as
output strings of 4 encoded characters. Proceeding from left
to right, a 24-bit input group is formed by concatenating 3
8bit input groups.
These 24 bits are then treated as 4
concatenated 6-bit groups, each of which is translated into a
single digit in the base64 alphabet. When encoding a bit
stream via the base64 encoding, the bit stream must be
presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order
bit in the first 8bit byte, and the eighth bit will be the
low-order bit in the first 8bit byte, and so on.

Each 6-bit group is used as an index into an array of 64
printable characters. The character referenced by the index
is placed in the output string. These characters, identified
in Table 1, below, are selected so as to be universally
representable, and the set excludes characters with particular
significance to SMTP (e.g., ".", CR, LF) and to the multipart
boundary delimiters defined in MIME-IMT (e.g., "-").

                 Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

The encoded output stream must be represented in lines of no
more than 76 characters each. All line breaks or other
characters not found in Table 1 must be ignored by decoding
software. In base64 data, characters other than those in
Table 1, line breaks, and other white space probably indicate
a transmission error, about which a warning message or even a
message rejection might be appropriate under some
circumstances.

Special processing is performed if fewer than 24 bits are
available at the end of the data being encoded.
A full
encoding quantum is always completed at the end of a body.
When fewer than 24 input bits are available in an input group,
zero bits are added (on the right) to form an integral number
of 6-bit groups. Padding at the end of the data is performed
using the "=" character. Since all base64 input is an
integral number of octets, only the following cases can arise:
(1) the final quantum of encoding input is an integral
multiple of 24 bits; here, the final unit of encoded output
will be an integral multiple of 4 characters with no "="
padding, (2) the final quantum of encoding input is exactly 8
bits; here, the final unit of encoded output will be two
characters followed by two "=" padding characters, or (3) the
final quantum of encoding input is exactly 16 bits; here, the
final unit of encoded output will be three characters followed
by one "=" padding character.

Because it is used only for padding at the end of the data,
the occurrence of any "=" characters may be taken as evidence
that the end of the data has been reached (without truncation
in transit). No such assurance is possible, however, when the
number of octets transmitted was a multiple of three and no
"=" characters are present.

Any characters outside of the base64 alphabet are to be
ignored in base64-encoded data.

Care must be taken to use the proper octets for line breaks if
base64 encoding is applied directly to text material that has
not been converted to canonical form. In particular, text
line breaks must be converted into CRLF sequences prior to
base64 encoding. The important thing to note is that this may
be done directly by the encoder rather than in a prior
canonicalization step in some implementations.
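
The encoding procedure described in this section (24-bit input
groups, four 6-bit indexes into Table 1, "=" padding, and output
lines of at most 76 characters) can be sketched in Python as
follows. The names ALPHABET and b64_encode are hypothetical, and
the sketch assumes the input has already been converted to
canonical form where that matters. Python's standard base64
module provides an equivalent encoder and is used here only as a
cross-check.

```python
import base64

# Table 1, indexed by 6-bit value.
ALPHABET = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            b"abcdefghijklmnopqrstuvwxyz"
            b"0123456789+/")


def b64_encode(data: bytes, line_len: int = 76) -> bytes:
    """Base64-encode octets into CRLF-separated lines (hypothetical helper)."""
    chars = bytearray()
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Form a 24-bit group, adding zero bits on the right if the
        # final group holds only 8 or 16 bits.
        n = int.from_bytes(group + b"\x00" * (3 - len(group)), "big")
        quad = bytes(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
        keep = len(group) + 1          # 8 bits -> 2 chars, 16 -> 3, 24 -> 4
        chars += quad[:keep] + b"=" * (4 - keep)
    lines = [bytes(chars[j:j + line_len]) for j in range(0, len(chars), line_len)]
    return b"\r\n".join(lines)


# Cross-check against the standard library on a short input.
assert b64_encode(b"MIME") == base64.b64encode(b"MIME")
```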

NOTE: There is no need to worry about quoting potential
boundary delimiters within base64-encoded bodies within
multipart entities because no hyphen characters are used in
the base64 encoding.

9. Content-ID Header Field

In constructing a high-level user agent, it may be desirable
to allow one body to make reference to another. Accordingly,
bodies may be labelled using the "Content-ID" header field,
which is syntactically identical to the "Message-ID" header
field:

     id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be
generated to be world-unique.

The Content-ID value may be used for uniquely identifying MIME
entities in several contexts, particularly for caching data
referenced by the message/external-body mechanism. Although
the Content-ID header is generally optional, its use is
MANDATORY in implementations which generate data of the
optional MIME media type "message/external-body". That is,
each message/external-body entity must have a Content-ID field
to permit caching of such data.

It is also worth noting that the Content-ID value has special
semantics in the case of the multipart/alternative media type.
This is explained in the section of MIME-IMT dealing with
multipart/alternative.

10. Content-Description Header Field

The ability to associate some descriptive information with a
given body is often desirable. For example, it may be useful
to mark an "image" body as "a picture of the Space Shuttle
Endeavor." Such text may be placed in the Content-Description
header field. This header field is always optional.

     description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII
character set, although the mechanism specified in RFC
MIME-HEADERS [RFC-MIME-HEADERS] may be used for non-US-ASCII
Content-Description values.
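
As an illustration of Sections 9 and 10, the following Python
sketch builds the two labelling header fields for an entity. It
is a sketch only: the function name entity_labels and the domain
"example.com" are hypothetical, and uniqueness is delegated to
the standard library's email.utils.make_msgid, which produces a
msg-id of the same syntactic form used by Message-ID.

```python
from email.utils import make_msgid


def entity_labels(description: str, domain: str = "example.com") -> list:
    """Return Content-ID and Content-Description header lines.

    Assumption: the description is plain US-ASCII text; non-US-ASCII
    text would need the RFC MIME-HEADERS mechanism, not shown here.
    """
    content_id = make_msgid(domain=domain)   # e.g. "<...@example.com>"
    return [
        "Content-ID: " + content_id,
        "Content-Description: " + description,
    ]
```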

11. Additional MIME Header Fields

Future documents may elect to define additional MIME header
fields for various purposes. Any new header field that
further describes the content of a message should begin with
the string "Content-" to allow such fields which appear in a
message header to be distinguished from ordinary RFC 822
message header fields.

     MIME-extension-field :=

12. Summary

Using the MIME-Version, Content-Type, and
Content-Transfer-Encoding header fields, it is possible to
include, in a standardized way, arbitrary types of data with
RFC 822 conformant mail messages. No restrictions imposed by
either RFC 821 or RFC 822 are violated, and care has been taken
to avoid problems caused by additional restrictions imposed by
the characteristics of some Internet mail transport mechanisms
(see RFC MIME-CONF).

The next document in this set, RFC MIME-IMT, specifies the
initial set of media types that can be labelled and
transported using these headers.

13. Security Considerations

Security issues are discussed in the second document in this
set, RFC MIME-IMT.

14. Authors' Addresses

For more information, the authors of this document are best
contacted via Internet mail:

     Nathaniel S. Borenstein
     First Virtual Holdings
     25 Washington Avenue
     Morristown, NJ 07960
     USA

     Email: nsb@nsb.fv.com
     Phone: +1 201 540 8967
     Fax:   +1 201 993 3032

     Ned Freed
     Innosoft International, Inc.
     1050 East Garvey Avenue South
     West Covina, CA 91790
     USA

     Email: ned@innosoft.com
     Phone: +1 818 919 3600
     Fax:   +1 818 919 3614

MIME is a result of the work of the Internet Engineering Task
Force Working Group on Email Extensions. The chairman of that
group, Greg Vaudreuil, may be reached at:

     Gregory M. Vaudreuil
     Tigon Corporation
     17060 Dallas Parkway
     Dallas Texas, 75248

     Email: greg.vaudreuil@ons.octel.com
     Phone: +1 214 733 2722

Appendix A -- Collected Grammar

This appendix contains the complete BNF grammar for all the
syntax specified by this document.

By itself, however, this grammar is incomplete. It refers by
name to several syntax rules that are defined by RFC 822.
Rather than reproduce those definitions here, and risk
unintentional differences between the two, this document
simply refers the reader to RFC 822 for the remaining
definitions. Wherever a term is undefined, it refers to the
RFC 822 definition.

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     composite-type := "message" / "multipart" / extension-token

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     extension-token := ietf-token / x-token

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC MIME-CONF as "mail-safe".
1400       iana-token :=

1404       ietf-token :=

1408       id := "Content-ID" ":" msg-id

1410       mechanism := "7bit" / "8bit" / "binary" /
1411                    "quoted-printable" / "base64" /
1412                    ietf-token / x-token

1414       MIME-extension-field := <Any RFC 822 header field which
1415                                begins with the string
1416                                "Content-">

1418       MIME-message-headers := entity-headers
1419                               fields
1420                               version CRLF
1421                               ; The ordering of the header
1422                               ; fields implied by this BNF
1423                               ; definition should be ignored.

1425       MIME-part-headers := entity-headers
1426                            [fields]
1427                            ; Any field not beginning with
1428                            ; "content-" can have no defined
1429                            ; meaning and should be ignored.
1430                            ; The ordering of the header
1431                            ; fields implied by this BNF
1432                            ; definition should be ignored.

1434       parameter := attribute "=" value

1436       ptext := hex-octet / safe-char
1437       qp-line := *(qp-segment transport-padding CRLF)
1438                  qp-part transport-padding

1440       qp-part := qp-section
1441                  ; Maximum length of 76 characters

1443       qp-section := [*(ptext / SPACE / TAB) ptext]

1445       qp-segment := qp-section *(SPACE / TAB) "="
1446                     ; Maximum length of 76 characters

1448       quoted-printable := qp-line *(CRLF qp-line)

1450       safe-char :=
1452                    ; Characters not listed as "mail-safe" in
1453                    ; RFC MIME-CONF are also not recommended.

1455       subtype := extension-token / iana-token

1457       token := 1*

1460       transport-padding := *LWSP-char
1461                            ; Composers MUST NOT generate
1462                            ; non-zero length transport
1463                            ; padding, but receivers MUST
1464                            ; be able to handle padding
1465                            ; added by message transports.

1467       tspecials := "(" / ")" / "<" / ">" / "@" /
1468                    "," / ";" / ":" / "\" / <">
1469                    "/" / "[" / "]" / "?" / "="
1470                    ; Must be in quoted-string,
1471                    ; to use within parameter values

1473       type := discrete-type / composite-type

1475       value := token / quoted-string

1477       version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

1479       x-token :=