Network Working Group                          Nathaniel Borenstein
Internet Draft                                            Ned Freed

              Multipurpose Internet Mail Extensions
                        (MIME) Part One:

                Format of Internet Message Bodies

                         April 11, 1995

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other than as
a "working draft" or "work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).

1. Abstract

STD 11, RFC 822, defines a message representation protocol
specifying considerable detail about US-ASCII message headers, and
leaves the message content, or message body, as flat US-ASCII text.
This set of documents, collectively called the Multipurpose Internet
Mail Extensions, or MIME, redefines the format of messages to allow
for

 (1)  textual message bodies in character sets other than
      US-ASCII,

 (2)  non-textual message bodies,

 (3)  multi-part message bodies, and

 (4)  textual header information in character sets other than
      US-ASCII.

These documents are based on earlier work documented in RFC 934,
STD 11, and RFC 1049, but extend and revise them. Because RFC 822
said so little about message bodies, these documents are largely
orthogonal to (rather than a revision of) RFC 822.

In particular, these documents are designed to provide facilities
to include multiple parts in a single message, to represent body
and header text in character sets other than US-ASCII, to represent
formatted multi-font text messages, to represent non-textual
material such as images and audio fragments, and generally to
facilitate later extensions defining new types of Internet mail for
use by cooperating mail agents.

This initial document specifies the various headers used to
describe the structure of MIME messages. The second document, RFC
MIME-IMT, defines the general structure of the MIME media typing
system and defines an initial set of media types. The third
document, RFC MIME-HEADERS, describes extensions to RFC 822 to allow
non-US-ASCII text data in Internet mail header fields. The fourth
document, RFC MIME-REG, specifies various IANA registration
procedures for MIME-related entities. The fifth and final document,
RFC MIME-CONF, describes MIME conformance criteria and provides
some illustrative examples of MIME message formats,
acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521, 1522, and 1590, which
themselves were revisions of RFCs 1341 and 1342. An appendix in RFC
MIME-CONF describes differences and changes from previous versions.

2. Table of Contents

1    Abstract
2    Table of Contents
3    Introduction
4    Notations, Conventions, and Generic BNF Grammar
4.1  CRLF
4.2  Character Set
4.3  Message
4.4  Body Part
4.5  Entity
4.6  Body
4.7  7bit Data
4.8  8bit Data
4.9  Binary Data
4.10 Lines
5    MIME Header Fields
6    MIME-Version Header Field
7    Content-Type Header Field
7.1  Syntax of the Content-Type Header Field
7.2  Content-Type Defaults
8    Content-Transfer-Encoding Header Field
8.1  Content-Transfer-Encoding Syntax
8.2  Content-Transfer-Encodings Semantics
8.3  New Content-Transfer-Encodings
8.4  Interpretation and Use
8.5  Translating Encodings
8.6  Canonical Encoding Model
8.7  Quoted-Printable Content-Transfer-Encoding
8.8  Base64 Content-Transfer-Encoding
9    Content-ID Header Field
10   Content-Description Header Field
11   Additional MIME Header Fields
12   Summary
13   Security Considerations
14   Authors' Addresses
A    Collected Grammar

3. Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined the
standard format of textual mail messages on the Internet. Its
success has been such that the RFC 822 format has been adopted,
wholly or partially, well beyond the confines of the Internet and
the Internet SMTP transport defined by RFC 821 [RFC-821]. As the
format has seen wider use, a number of limitations have proven
increasingly restrictive for the user community.

RFC 822 was intended to specify a format for text messages. As
such, non-text messages, such as multimedia messages that might
include audio or images, are simply not mentioned.
Even in the case of text, however, RFC 822 is inadequate for the
needs of mail users whose languages require the use of character
sets richer than US-ASCII. Since RFC 822 does not specify
mechanisms for mail containing audio, video, Asian language text,
or even text in most European languages, additional specifications
are needed.

One of the notable limitations of RFC 821/822 based mail systems is
the fact that they limit the contents of electronic mail messages
to relatively short lines (e.g. 1000 characters or less [RFC821])
of 7-bit US-ASCII. This forces users to convert any non-textual
data that they may wish to send into seven-bit bytes representable
as printable US-ASCII characters before invoking a local mail UA
(User Agent, a program with which human users send and receive
mail). Examples of such encodings currently used in the Internet
include pure hexadecimal, uuencode, the 3-in-4 base 64 scheme
specified in RFC 1421, the Andrew Toolkit Representation [ATK], and
many others.

The limitations of RFC 822 mail become even more apparent as
gateways are designed to allow for the exchange of mail messages
between RFC 822 hosts and X.400 hosts. X.400 [X400] specifies
mechanisms for the inclusion of non-textual body parts within
electronic mail messages. The current standards for the mapping of
X.400 messages to RFC 822 messages specify either that X.400
non-textual body parts must be converted to (not encoded in)
IA5Text format, or that they must be discarded, notifying the RFC
822 user that discarding has occurred. This is clearly undesirable,
as information that a user may wish to receive is lost. Even though
a user agent may not have the capability of dealing with the
non-textual body part, the user might have some mechanism external
to the UA that can extract useful information from the body part.
Moreover, it does not allow for the fact that the message may
eventually be gatewayed back into an X.400 message handling system
(i.e., the X.400 message is "tunneled" through Internet mail),
where the non-textual information would definitely become useful
again.

This document describes several mechanisms that combine to solve
most of these problems without introducing any serious
incompatibilities with the existing world of RFC 822 mail. In
particular, it describes:

 (1)  A MIME-Version header field, which uses a version number to
      declare a message to be conformant with this specification
      and allows mail processing agents to distinguish between such
      messages and those generated by older or non-conformant
      software, which are presumed to lack such a field.

 (2)  A Content-Type header field, generalized from RFC 1049
      [RFC-1049], which can be used to specify the media type and
      subtype of data in the body of a message and to fully specify
      the native representation (canonical form) of such data.

 (3)  A Content-Transfer-Encoding header field, which can be used
      to specify an auxiliary encoding that was applied to the data
      in order to allow it to pass through mail transport
      mechanisms which may have data or character set limitations.

 (4)  Two additional header fields that can be used to further
      describe the data in a body, the Content-ID and
      Content-Description header fields.

All of the header fields defined in this document are subject to
the general syntactic rules for header fields specified in RFC 822.
In particular, all of these header fields can include RFC 822
comments, which have no semantic content and should be ignored
during MIME processing.
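
ILLUSTRATIVE NOTE: The rule that RFC 822 comments carry no semantic
content can be sketched in code. The following Python fragment is a
minimal sketch, not a definitive implementation. The function name
strip_comments is hypothetical. The sketch assumes the field value
has already been unfolded onto one line, and it handles nested
comments, quoted-pairs, and parentheses inside quoted strings.

```python
def strip_comments(value: str) -> str:
    """Remove RFC 822 comments from an unfolded header field value.

    Hypothetical helper for illustration only. Comments are
    parenthesized, may nest, and may contain backslash quoted-pairs.
    Parentheses inside a quoted-string are ordinary characters.
    """
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # True while inside a quoted-string
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: keep it outside comments, drop it inside
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_comments("text/plain; charset=us-ascii (Plain text)").strip())
```

A caller would normally trim surrounding white space from the
result before comparing it with an expected value, since removing a
comment can leave stray spaces behind.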

Finally, to specify and promote interoperability, RFC MIME-CONF
provides a basic applicability statement for a subset of the above
mechanisms that defines a minimal level of "conformance" with this
document.

HISTORICAL NOTE: Several of the mechanisms described in this
document may seem somewhat strange or even baroque at first
reading. It is important to note that compatibility with existing
standards AND robustness across existing practice were two of the
highest priorities of the working group that developed this
document. In particular, compatibility was always favored over
elegance.

Please refer to the current edition of the "IAB Official Protocol
Standards" for the standardization state and status of this
protocol. RFC 822 and RFC 1123 [RFC-1123] also provide essential
background for MIME since no conforming implementation of MIME can
violate them. In addition, several other informational RFC
documents will be of interest to the MIME implementor, in
particular RFC 1344 [RFC-1344], RFC 1345 [RFC-1345], and RFC 1524
[RFC-1524].

4. Notations, Conventions, and Generic BNF Grammar

Although the mechanisms specified in this document are all
described in prose, most are also described formally in the
augmented BNF notation of RFC 822. Implementors will need to be
familiar with this notation in order to understand this
specification, and are referred to RFC 822 for a complete
explanation of the augmented BNF notation.

Some of the augmented BNF in this document makes reference to
syntactic entities that are defined in RFC 822 and not in this
document. A complete formal grammar, then, is obtained by combining
Appendix A of this document, the collected grammar, with the BNF of
RFC 822 plus the modifications to RFC 822 defined in RFC 1123
(which specifically changes the syntax for `return', `date' and
`mailbox').

In this document, all numeric and octet values are given in decimal
notation. All media type values, subtype values, and parameter
names as defined in this document are case-insensitive. However,
parameter values are case-sensitive unless otherwise specified for
the specific parameter.

FORMATTING NOTE: Notes, such as this one, provide additional
nonessential information which may be skipped by the reader without
missing anything essential. The primary purpose of these
non-essential notes is to convey information about the rationale of
this document, or to place this document in the proper historical
or evolutionary context. Such information may in particular be
skipped by those who are focused entirely on building a conformant
implementation, but may be of use to those who wish to understand
why certain design choices were made.

4.1. CRLF

The term CRLF, in this document, refers to the sequence of octets
corresponding to the two US-ASCII characters CR (decimal value 13)
and LF (decimal value 10) which, taken together, in this order,
denote a line break in RFC 822 mail.

4.2. Character Set

The term "character set" is used in this document to refer to a
table-based method of converting a sequence of octets into a
sequence of characters. Note that unconditional and unambiguous
conversion in the other direction is not required, in that not all
characters may be available in a given character set and a
character set may provide more than one sequence of octets to
represent a particular character. This definition is intended to
allow various kinds of character encodings, from simple
single-table mappings such as US-ASCII to complex table switching
methods such as those that use ISO 2022's techniques. However, the
definition associated with a MIME character set name must fully
specify the mapping to be performed from octets to characters.
In particular, use of external profiling information to determine
the exact mapping is not permitted.

HISTORICAL NOTE: The term "character set" originated in the
definition of US-ASCII and similar 7-bit and 8-bit specifications.
These define true sets. However, the advent of multi-octet
character encodings and switching techniques has transformed
character sets into entities that properly speaking are no longer
strictly sets. Some other communities have adopted the term
"character encoding" for what MIME calls a "character set" as a
result.

4.3. Message

The term "message", when not further qualified, means either the
(complete or "top-level") message being transferred on a network,
or a message encapsulated in a body part of type "message".

4.4. Body Part

The term "body part", in this document, refers to either a single
part message or one of the parts in the body of a multipart entity.
A body part has a header and a body, so it makes sense to speak
about the body of a body part.

4.5. Entity

The term "entity", in this document, means either a message or a
body part. All kinds of entities share the property that they have
a header and a body.

4.6. Body

The term "body", when not further qualified, means the body of an
entity, that is the body of either a message or of a body part.

NOTE: The previous four definitions are clearly circular. This is
unavoidable, since the overall structure of a MIME message is
indeed recursive.

4.7. 7bit Data

"7bit data" refers to data that is all represented as relatively
short lines (e.g. 1000 octets or less between CRLF line separation
sequences [RFC821]). No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences.

4.8.
8bit Data

"8bit data" refers to data that is all represented as relatively
short lines (e.g. 1000 octets or less between CRLF line separation
sequences [RFC821]), but characters with decimal values greater
than 127 may be used. As with "7bit data" CR and LF octets only
occur as part of CRLF line separation sequences and no NULs are
allowed.

4.9. Binary Data

"Binary data" refers to data where any sequence of octets
whatsoever is allowed.

4.10. Lines

"Lines" are defined as sequences of octets separated by CRLF
sequences. This is consistent with both RFC 821 and RFC 822.

5. MIME Header Fields

MIME defines a number of new RFC 822 header fields that are used to
describe the content of messages. These header fields occur in two
contexts:

 (1)  As part of a regular RFC 822 message header.

 (2)  In a MIME body part header within a multipart construct.

The formal definition of these header fields is as follows:

     MIME-message-headers := fields
                             version CRLF
                             [ content CRLF ]
                             [ encoding CRLF ]
                             [ id CRLF ]
                             [ description CRLF ]
                             *( mime-extension-field CRLF )
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored

     MIME-part-headers := [ content CRLF ]
                          [ encoding CRLF ]
                          [ id CRLF ]
                          [ description CRLF ]
                          *( mime-extension-field CRLF )
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored

The syntax of the various specific MIME header fields will be
described in the following sections.

6. MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been only one
format standard for Internet messages, and there has been little
perceived need to declare the format standard in use. This document
is an independent document that complements RFC 822.
Although the extensions in this document have been defined in such
a way as to be compatible with RFC 822, there are still
circumstances in which it might be desirable for a mail-processing
agent to know whether a message was composed with the new standard
in mind.

Therefore, this document defines a new header field,
"MIME-Version", which is to be used to declare the version of the
Internet message body format standard in use.

Messages composed in accordance with this document MUST include
such a header field, with the following verbatim text:

     MIME-Version: 1.0

The presence of this header field is an assertion that the message
has been composed in compliance with this document.

Since it is possible that a future document might extend the
message format standard again, a formal BNF is given for the
content of the MIME-Version field:

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Thus, future format specifiers, which might replace or extend
"1.0", are constrained to be two integer fields, separated by a
period. If a message is received with a MIME-version value other
than "1.0", it cannot be assumed to conform with this
specification.

Note that the MIME-Version header field is required at the top
level of a message. It is not required for each body part of a
multipart entity. It is required for the embedded headers of a body
of type "message" if and only if the embedded message is itself
claimed to be MIME-conformant.

It is not possible to fully specify how a mail reader that conforms
with MIME as defined in this document should treat a message that
might arrive in the future with some value of MIME-Version other
than "1.0".

It is also worth noting that version control for specific media
types is not accomplished using the MIME-Version mechanism.
In particular, some formats (such as application/postscript) have
version numbering conventions that are internal to the document
format. Where such conventions exist, MIME does nothing to
supersede them. Where no such conventions exist, a MIME media type
might use a "version" parameter in the content-type field if
necessary.

NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822
comment strings that are present must be ignored. In particular,
the following four MIME-Version fields are equivalent:

     MIME-Version: 1.0

     MIME-Version: 1.0 (produced by MetaSend Vx.x)

     MIME-Version: (produced by MetaSend Vx.x) 1.0

     MIME-Version: 1.(produced by MetaSend Vx.x)0

In the absence of a MIME-Version field, a receiving user agent
(whether MIME compliant or not) may optionally choose to interpret
the body of the message according to local conventions. Many such
conventions are currently in use and it should be noted that in
practice non-MIME messages can contain just about anything.

It is impossible to be certain that a non-MIME message is actually
plain text in the US-ASCII character set since it might well be a
message that, using some set of nonstandard local conventions that
predate this document, includes text in another character set or
non-textual data presented in a manner that cannot be automatically
recognized (e.g., a uuencoded compressed UNIX tar file).

MIME-compliant user agents are required, if they support any such
nonstandard conventions at all, to do so on received messages only
-- they must not send non-MIME messages containing anything other
than US-ASCII text.

In particular, the use of non-US-ASCII text in messages without a
MIME-Version field is strongly discouraged as it impedes
interoperability when sending messages between regions with
different localization conventions.
MIME-compliant user agents MUST include proper MIME labelling when
sending anything other than plain text in the US-ASCII character
set.

In addition, non-MIME user agents should be upgraded if at all
possible to include appropriate MIME header information in the
messages they send even if nothing else in MIME is supported. This
upgrade will have little, if any, effect on non-MIME recipients and
will aid MIME in correctly displaying such messages. It also
provides a smooth transition path to eventual adoption of other
MIME capabilities.

7. Content-Type Header Field

The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user agent
can pick an appropriate agent or mechanism to present the data to
the user, or otherwise deal with the data in an appropriate manner.
The value in this field is called a media type.

HISTORICAL NOTE: The Content-Type header field was first defined in
RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one
that is largely compatible with the mechanism given here.

The Content-Type header field specifies the nature of the data in
the body of an entity by giving media type and subtype identifiers,
and by providing auxiliary information that may be required for
certain media types. After the media type and subtype names, the
remainder of the header field is simply a set of parameters,
specified in an attribute=value notation. The ordering of
parameters is not significant.

In general, the top-level media type is used to declare the general
type of data, while the subtype specifies a specific format for
that type of data. Thus, a media type of "image/xyz" is enough to
tell a user agent that the data is an image, even if the user agent
has no knowledge of the specific image format "xyz".
Such information can be used, for example, to decide whether or not
to show a user the raw data from an unrecognized subtype -- such an
action might be reasonable for unrecognized subtypes of text, but
not for unrecognized subtypes of image or audio. For this reason,
registered subtypes of text, image, audio, and video should not
contain embedded information that is really of a different type.
Such compound formats should be represented using the "multipart"
or "application" types.

Parameters are modifiers of the media subtype, and as such do not
fundamentally affect the nature of the content. The set of
meaningful parameters depends on the media type and subtype. Most
parameters are associated with a single specific subtype. However,
a given top-level media type may define parameters which are
applicable to any subtype of that type. Parameters may be required
by their defining content type or subtype or they may be optional.
MIME implementations must ignore any parameters whose names they do
not recognize.

For example, the "charset" parameter is applicable to any subtype
of "text", while the "boundary" parameter is required for any
subtype of the "multipart" media type.

There are NO globally-meaningful parameters that apply to all media
types. Truly global mechanisms are best addressed, in the MIME
model, by the definition of additional Content-* header fields.

An initial set of seven top-level media types is defined by this
document. Five of these are discrete types whose content is
essentially opaque as far as MIME processing is concerned. The
remaining two are composite types whose contents require additional
handling by MIME processors.

This set of top-level media types is intended to be substantially
complete.
It is expected that additions to the larger set of supported types
can generally be accomplished by the creation of new subtypes of
these initial types. In the future, more top-level types may be
defined only by a standards-track extension to this standard. If
another top-level type is to be used for any reason, it must be
given a name starting with "X-" to indicate its non-standard status
and to avoid a potential conflict with a future official name.

7.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header
field value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := iana-token / ietf-token / x-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC MIME-REG.>

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token

     parameter := attribute "=" value

     attribute := token

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <"> /
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

Note that the definition of "tspecials" is the same as the RFC 822
definition of "specials" with the addition of the three characters
"/", "?", and "=", and the removal of ".".

Note also that a subtype specification is MANDATORY -- it may not
be omitted from a Content-Type header field. As such, there are no
default subtypes.

The type, subtype, and parameter names are not case sensitive. For
example, TEXT, Text, and TeXt are all equivalent top-level media
types.
Parameter values are normally case sensitive, but sometimes are
interpreted in a case-insensitive fashion, depending on the intended
use. (For example, multipart boundaries are case-sensitive, but the
"access-type" parameter for message/External-body is not
case-sensitive.)

Note that the value of a quoted string parameter does not include the
quotes. That is, the quotation marks in a quoted-string are not a
part of the value of the parameter, but are merely used to delimit
that parameter value. In addition, comments are allowed in
accordance with RFC 822 rules for structured header fields. Thus the
following two forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

are completely equivalent.

Beyond this syntax, the only syntactic constraint on the definition
of subtype names is the desire that their uses must not conflict.
That is, it would be undesirable to have two different communities
using "Content-Type: application/foobar" to mean two different
things. The process of defining new media subtypes, then, is not
intended to be a mechanism for imposing restrictions, but simply a
mechanism for publicizing their definition and usage. There are,
therefore, two acceptable mechanisms for defining new media subtypes:

(1)  Private values (starting with "X-") may be defined
     bilaterally between two cooperating agents without
     outside registration or standardization.

(2)  New standard values MUST be documented, registered
     with, and approved by IANA, as described in RFC
     MIME-REG [RFC-MIME-REG].

The second document in this set, RFC MIME-IMT, defines the initial
set of media types for MIME.

7.2.  Content-Type Defaults

Default RFC 822 messages without a MIME Content-Type header are taken
by this protocol to be plain text in the US-ASCII character set,
which can be explicitly specified as:

     Content-type: text/plain; charset=us-ascii

This default is assumed if no Content-Type header field is specified.
In the presence of a MIME-Version header field, a receiving User
Agent can also assume that plain US-ASCII text was the sender's
intent. Plain US-ASCII text may still be assumed in the absence of a
MIME-Version specification, but the sender's intent might have been
otherwise.

8.  Content-Transfer-Encoding Header Field

Many media types which could be usefully transported via email are
represented, in their "natural" format, as 8-bit character or binary
data. Such data cannot be transmitted over some transfer protocols.
For example, RFC 821 (SMTP) restricts mail messages to 7-bit US-ASCII
data with lines no longer than 1000 characters.

It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7-bit short line format. Proper labelling
of unencoded material in less restrictive formats for direct use over
less restrictive transports is also desirable. This document
specifies that such encodings will be indicated by a new
"Content-Transfer-Encoding" header field. This field has not been
defined by any previous standard.

8.1.  Content-Transfer-Encoding Syntax

The Content-Transfer-Encoding field's value is a single token
specifying the type of encoding, as enumerated below. Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

These values are not case sensitive -- Base64 and BASE64 and bAsE64
are all equivalent.
An encoding type of 7BIT requires that the body is already in a 7-bit
mail-ready representation. This is the default value -- that is,
"Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.

8.2.  Content-Transfer-Encodings Semantics

This single Content-Transfer-Encoding token actually provides two
pieces of information. It specifies what sort of encoding
transformation the body was subjected to, and it specifies what the
domain of the result is.

Three transformations are currently defined: identity, the
"quoted-printable" encoding, and the "base64" encoding. The domains
are "binary", "8bit" and "7bit".

The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
mean that the identity (i.e. NO) encoding transformation has been
performed. As such, they serve simply as indicators of the domain of
the body part data, and provide useful information about the sort of
encoding that might be needed for transmission in a given transport
system. The terms "7bit data", "8bit data", and "binary data" are
all defined in Section 4.

The quoted-printable and base64 encodings transform their input from
an arbitrary domain into material in the "7bit" range, thus making it
safe to carry over restricted transports. The specific definitions
of the transformations are given below.

The proper Content-Transfer-Encoding label must always be used.
Labelling unencoded data containing 8-bit characters as "7bit" is not
allowed, nor is labelling unencoded non-line-oriented data as
anything other than "binary" allowed.

Unlike media subtypes, a proliferation of Content-Transfer-Encoding
values is both undesirable and unnecessary. However, establishing
only a single transformation into the "7bit" domain does not seem
possible.
There is a tradeoff between the desire for a compact and efficient
encoding of largely-binary data and the desire for a readable
encoding of data that is mostly, but not entirely, 7-bit. For this
reason, at least two encoding mechanisms are necessary: a "readable"
encoding (quoted-printable) and a "dense" encoding (base64).

Mail transport for unencoded 8-bit data is defined in RFC 1652
[RFC-1652]. As of the publication of this document, there are no
standardized Internet mail transports for which it is legitimate to
include unencoded binary data in mail bodies. Thus there are no
circumstances in which the "binary" Content-Transfer-Encoding is
actually valid in Internet mail. However, in the event that binary
mail transport becomes a reality in Internet mail, or when this
document is used in conjunction with any other binary-capable
transport mechanism, binary bodies should be labelled as such using
this mechanism.

NOTE: The five values defined for the Content-Transfer-Encoding field
imply nothing about the media type other than the algorithm by which
it was encoded or the transport system requirements if unencoded.

8.3.  New Content-Transfer-Encodings

Implementors may, if necessary, define private
Content-Transfer-Encoding values, but must use an x-token, which is a
name prefixed by "X-", to indicate its non-standard status, e.g.,
"Content-Transfer-Encoding: x-my-new-encoding". Additional
standardized Content-Transfer-Encoding values must be specified by a
standards-track RFC. Additional requirements such specifications
must meet are given in RFC MIME-REG. As such, the entire
content-transfer-encoding namespace, except for names beginning with
"X-", is explicitly reserved to the IANA for future use.
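
The two pieces of information carried by a mechanism token (the
transformation applied and the domain of the result) can be pictured
as a small lookup. The following Python sketch is illustrative only:
the names STANDARD_MECHANISMS and classify_mechanism are hypothetical
and do not come from this document, and ietf-tokens defined by later
standards-track RFCs are not modelled.

```python
# Illustrative sketch; the names below are assumptions, not part of the specification.

# mechanism -> (transformation, domain of the result)
STANDARD_MECHANISMS = {
    "7bit": ("identity", "7bit"),
    "8bit": ("identity", "8bit"),
    "binary": ("identity", "binary"),
    "quoted-printable": ("quoted-printable", "7bit"),
    "base64": ("base64", "7bit"),
}


def classify_mechanism(value=None):
    """Return (transformation, domain, is_private) for a mechanism token.

    A missing header (value is None) is treated as "7bit", the default.
    Matching is case-insensitive. Private x-tokens are accepted, but
    their result domain is unknown, so None is returned for it.
    """
    name = "7bit" if value is None else value.strip().lower()
    if name in STANDARD_MECHANISMS:
        transformation, domain = STANDARD_MECHANISMS[name]
        return (transformation, domain, False)
    if name.startswith("x-") and len(name) > 2:
        return (name, None, True)
    # Anything else would have to be an ietf-token, which this sketch does not know.
    raise ValueError("unrecognized Content-Transfer-Encoding: %r" % value)
```

For instance, classify_mechanism("BASE64") and
classify_mechanism("base64") give the same answer, and a value such
as "x-my-new-encoding" is reported as private.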
Unlike media types and subtypes, the creation of new
Content-Transfer-Encoding values is STRONGLY discouraged, as it seems
likely to hinder interoperability with little potential benefit.

8.4.  Interpretation and Use

If a Content-Transfer-Encoding header field appears as part of a
message header, it applies to the entire body of that message. If a
Content-Transfer-Encoding header field appears as part of a body
part's headers, it applies only to the body of that body part. If an
entity is of type "multipart" the Content-Transfer-Encoding is not
permitted to have any value other than "7bit", "8bit" or "binary".
Even more severe restrictions apply to some subtypes of the "message"
type.

It should be noted that most media types are defined in terms of
octets rather than bits, so that the mechanisms described here are
mechanisms for encoding arbitrary octet streams, not bit streams. If
a bit stream is to be encoded via one of these mechanisms, it must
first be converted to an 8-bit byte stream using the network standard
bit order ("big-endian"), in which the earlier bits in a stream
become the higher-order bits in an 8-bit byte. A bit stream not
ending at an 8-bit boundary must be padded with zeroes. This
document provides a mechanism for noting the addition of such padding
in the case of the application/octet-stream media type, which has a
"padding" parameter.

The encoding mechanisms defined here explicitly encode all data in
US-ASCII. Thus, for example, suppose an entity has header fields
such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64 US-ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set again after decoding.

Certain Content-Transfer-Encoding values may only be used on certain
media types.
In particular, it is EXPRESSLY FORBIDDEN to use any encodings other
than "7bit", "8bit", or "binary" with any composite media type, i.e.
one that recursively includes other Content-Type fields. Currently
the only composite media types are "multipart" and "message". All
encodings that are desired for bodies of type multipart or message
must be done at the innermost level, by encoding the actual body that
needs to be encoded.

It should also be noted that, by definition, if a composite entity
has a transfer-encoding value such as "7bit", but one of the enclosed
parts has a less restrictive value such as "8bit", then either the
outer "7bit" labelling is in error, because 8-bit data are included,
or the inner "8bit" labelling placed an unnecessarily high demand on
the transport system because the included data were actually
7-bit-safe.

NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
content-transfer-encodings on composite body data may seem overly
restrictive, it is necessary to prevent nested encodings, in which
data are passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly viewed.
Nested encodings add considerable complexity to user agents: Aside
from the obvious efficiency problems with such multiple encodings,
they can obscure the basic structure of a message. In particular,
they can imply that several decoding operations are necessary simply
to find out what types of bodies a message contains.

Banning nested encodings may complicate the job of certain mail
gateways, but this seems less of a problem than the effect of nested
encodings on user agents.
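
The restriction on composite types lends itself to a simple check.
The Python sketch below is a minimal illustration under stated
assumptions: the function name encoding_allowed is hypothetical, only
the top-level type is examined, and the "even more severe"
restrictions on some subtypes of "message" are not modelled.

```python
# Illustrative sketch; encoding_allowed is a hypothetical helper name.

IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}


def encoding_allowed(content_type, encoding="7bit"):
    """Return True if `encoding` may label an entity of `content_type`.

    `content_type` is a "type/subtype" string without parameters.
    Composite types may only carry identity encodings; discrete types
    are not restricted by this particular rule.
    """
    top_level = content_type.split("/", 1)[0].strip().lower()
    if top_level in COMPOSITE_TYPES:
        return encoding.strip().lower() in IDENTITY_ENCODINGS
    return True
```

A composer could call such a check before labelling an entity, and
push any base64 or quoted-printable encoding down to the innermost
body parts when it fails.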
NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the characteristics
of the media that is to be encoded, or, at the very least, that
certain Content-Transfer-Encodings could be mandated for use with
specific media types. There are several reasons why this is not the
case. First, given the varying types of transports used for mail,
some encodings may be appropriate for some combinations of media
types and transports but not for others. (For example, in an 8-bit
transport, no encoding would be required for text in certain
character sets, while such encodings are clearly required for 7-bit
SMTP.)

Second, certain media types may require different types of transfer
encoding under different circumstances. For example, many PostScript
bodies might consist entirely of short lines of 7-bit data and hence
require no encoding at all. Other PostScript bodies (especially
those using Level 2 PostScript's binary encoding mechanism) may only
be reasonably represented using a binary transport encoding.
Finally, since the Content-Type field is intended to be an open-ended
specification mechanism, strict specification of an association
between media types and encodings effectively couples the
specification of an application protocol with a specific lower-level
transport. This is not desirable since the developers of a media
type should not have to be aware of all the transports in use and
what their limitations are.

8.5.  Translating Encodings

The quoted-printable and base64 encodings are designed so that
conversion between them is possible. The only issue that arises in
such a conversion is the handling of line breaks. When converting
from quoted-printable to base64 a line break must be converted into a
CRLF sequence.
Similarly, a CRLF sequence in base64 data must be converted to a
quoted-printable line break, but ONLY when converting text data.

8.6.  Canonical Encoding Model

There was some confusion, in the predecessors of this RFC, regarding
the model for when email data was to be converted to canonical form
and encoded, and in particular how this process would affect the
treatment of CRLFs, given that the representation of newlines varies
greatly from system to system, and the relationship between
content-transfer-encodings and character sets. A canonical model for
encoding is presented in RFC MIME-CONF for this reason.

8.7.  Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the US-ASCII character set. It encodes the data in such a way that
the resulting octets are unlikely to be modified by mail transport.
If the data being encoded are mostly US-ASCII text, the encoded form
of the data remains largely recognizable by humans. A body which is
entirely US-ASCII may also be encoded in Quoted-Printable to ensure
the integrity of the data should the message pass through a
character-translating, and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined by the
following rules:

(1)  (General 8-bit representation) Any octet, except a CR
     or LF that is part of a CRLF line break of the
     canonical (standard) form of the data being encoded,
     may be represented by an "=" followed by a two digit
     hexadecimal representation of the octet's value. The
     digits of the hexadecimal alphabet, for this purpose,
     are "0123456789ABCDEF". Uppercase letters must be used
     when sending hexadecimal data, though a robust
     implementation may choose to recognize lowercase
     letters on receipt.
     Thus, for example, the decimal value 12 (US-ASCII form
     feed) can be represented by "=0C", and the decimal
     value 61 (US-ASCII EQUAL SIGN) can be represented by
     "=3D". This rule must be followed except when the
     following rules allow an alternative encoding.

(2)  (Literal representation) Octets with decimal values of
     33 through 60 inclusive, and 62 through 126, inclusive,
     MAY be represented as the US-ASCII characters which
     correspond to those octets (EXCLAMATION POINT through
     LESS THAN, and GREATER THAN through TILDE,
     respectively).

(3)  (White Space) Octets with values of 9 and 32 MAY be
     represented as US-ASCII TAB (HT) and SPACE characters,
     respectively, but MUST NOT be so represented at the end
     of an encoded line. Any TAB (HT) or SPACE characters
     on an encoded line MUST thus be followed on that line
     by a printable character. In particular, an "=" at the
     end of an encoded line, indicating a soft line break
     (see rule #5) may follow one or more TAB (HT) or SPACE
     characters. It follows that an octet with decimal
     value 9 or 32 appearing at the end of an encoded line
     must be represented according to Rule #1. This rule is
     necessary because some MTAs (Message Transport Agents,
     programs which transport messages from one user to
     another, or perform a part of such transfers) are known
     to pad lines of text with SPACEs, and others are known
     to remove "white space" characters from the end of a
     line. Therefore, when decoding a Quoted-Printable
     body, any trailing white space on a line must be
     deleted, as it will necessarily have been added by
     intermediate transport agents.

(4)  (Line Breaks) A line break in a text body, represented
     as a CRLF sequence in the text canonical form, must be
     represented by a (RFC 822) line break, which is also a
     CRLF sequence, in the Quoted-Printable encoding.
     Since the canonical representation of media types other
     than text does not generally include the representation
     of line breaks as CRLF sequences, no hard line breaks
     (i.e. line breaks that are intended to be meaningful
     and to be displayed to the user) should occur in the
     quoted-printable encoding of such types. Sequences
     like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
     appear in non-text data represented in
     quoted-printable, of course.

     Note that many implementations may elect to encode the
     local representation of various content types directly
     rather than converting to canonical form first,
     encoding, and then converting back to local
     representation. In particular, this may apply to plain
     text material on systems that use newline conventions
     other than a CRLF terminator sequence. Such an
     implementation optimization is permissible, but only
     when the combined canonicalization-encoding step is
     equivalent to performing the three steps separately.

(5)  (Soft Line Breaks) The Quoted-Printable encoding
     REQUIRES that encoded lines be no more than 76
     characters long. If longer lines are to be encoded
     with the Quoted-Printable encoding, "soft" line breaks
     must be used. An equal sign as the last character on
     an encoded line indicates such a non-significant
     ("soft") line break in the encoded text.

Thus if the "raw" form of the line is a single unencoded line that
says:

     Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable encoding, as:

     Now's the time =
     for all folk to come=
      to the aid of their country.

This provides a mechanism with which long lines are encoded in such a
way as to be restored by the user agent. The 76 character limit does
not count the trailing CRLF, but counts all other characters,
including any equal signs.
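
Rules (1) through (5) can be combined into a short encoder. The
Python sketch below is one possible reading of those rules, not a
reference implementation: the function name qp_encode is
hypothetical, the input is assumed to be already in canonical form
(CRLF marks a hard line break), and every octet that is allowed a
literal representation is given one.

```python
# Illustrative sketch of the quoted-printable rules; qp_encode is a hypothetical name.


def qp_encode(data: bytes, max_len: int = 76) -> bytes:
    """Encode canonical-form octets (CRLF = hard line break) as quoted-printable."""
    out_lines = []
    for line in data.split(b"\r\n"):          # rule 4: CRLF stays a line break
        tokens = []
        for i, octet in enumerate(line):
            at_end = i == len(line) - 1
            if 33 <= octet <= 60 or 62 <= octet <= 126:
                tokens.append(chr(octet))     # rule 2: literal representation
            elif octet in (9, 32) and not at_end:
                tokens.append(chr(octet))     # rule 3: TAB/SPACE, not at end of line
            else:
                tokens.append("=%02X" % octet)  # rule 1: "=" plus uppercase hex
        current = ""
        for token in tokens:
            # rule 5: keep one column free for the "=" of a soft line break,
            # and never split a "=XX" triplet across lines
            if len(current) + len(token) > max_len - 1:
                out_lines.append(current + "=")
                current = ""
            current += token
        out_lines.append(current)
    return "\r\n".join(out_lines).encode("ascii")
```

Because a bare CR or LF that is not part of a CRLF pair falls through
to rule (1), binary data come out with "=0D" and "=0A" rather than
hard line breaks. Python's standard quopri module offers a maintained
encoder and decoder for real use.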
Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body inside one or more multipart entities,
to ensure that the boundary delimiter does not appear anywhere in the
encoded body. (A good strategy is to choose a boundary that includes
a character sequence such as "=_" which can never appear in a
quoted-printable body. See the definition of multipart messages
later in this document.)

NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport. Bodies
encoded with the quoted-printable encoding will work reliably over
most mail gateways, but may not work perfectly over a few gateways,
notably those involving translation into EBCDIC. A higher level of
confidence is offered by the base64 Content-Transfer-Encoding. A way
to get reasonably reliable transport through EBCDIC gateways is to
also quote the US-ASCII characters

     !"#$@[\]^`{|}~

according to rule #1.

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the representation of the
breaks between the lines of quoted printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.

WARNING TO IMPLEMENTORS: If binary data are encoded in
quoted-printable, care must be taken to encode CR and LF characters
as "=0D" and "=0A", respectively. In particular, a CRLF sequence in
binary data should be encoded as "=0D=0A".
Otherwise, if CRLF were represented as a hard line break, it might be
incorrectly decoded on platforms with different line break
conventions.

For formalists, the syntax of quoted-printable data is described by
the following grammar:

     quoted-printable := qp-line *(CRLF qp-line)

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     ptext := octet / safe-char

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC MIME-CONF are also not recommended.

     octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
              ; Octet must be used for characters > 127, =,
              ; SPACEs or TABs at the ends of lines, and is
              ; recommended for any character not listed in
              ; RFC MIME-CONF as "mail-safe".

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

IMPORTANT NOTE: The addition of LWSP between the elements shown in
this BNF is NOT allowed since this BNF does not specify a structured
header field.

8.8.  Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable. The encoding and decoding algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
unencoded data. This encoding is virtually identical to the one used
in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421
[RFC-1421].

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character.
(The extra 65th character, "=", is used to signify a special
processing function.)

NOTE: This subset has the important property that it is represented
identically in all versions of ISO 646, including US-ASCII, and all
characters in the subset are also represented identically in all
versions of EBCDIC. Other popular encodings, such as the encoding
used by the uuencode utility and the base85 encoding specified as
part of Level 2 PostScript, do not share these properties, and thus
do not fulfill the portability requirements a binary transport
encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first 8-bit byte, and the eighth bit will be the low-order bit in
the first 8-bit byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable
characters. The character referenced by the index is placed in the
output string. These characters, identified in Table 1, below, are
selected so as to be universally representable, and the set excludes
characters with particular significance to SMTP (e.g., ".", CR, LF)
and to the multipart boundary delimiters defined in this document
(e.g., "-").
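
The grouping of three octets into four 6-bit indices can be sketched
in a few lines of Python. This is an illustration under stated
assumptions rather than a reference implementation: the names
BASE64_ALPHABET and b64_encode are hypothetical, the alphabet string
is the 64 characters of Table 1 in index order, and a final group of
8 or 16 bits is zero-filled on the right and completed with "="
padding as this section specifies for the end of the data.

```python
# Illustrative sketch of the base64 grouping; names are assumptions.

BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes, line_len: int = 76) -> str:
    """Encode octets as base64 text, in lines of at most `line_len` characters."""
    chars = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Form a 24-bit group, most significant bit first, zero-filled on the right.
        bits = int.from_bytes(group.ljust(3, b"\0"), "big")
        quad = [BASE64_ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        missing = 3 - len(group)          # 0, 1 or 2 octets short of a full group
        if missing:
            quad[4 - missing:] = "=" * missing
        chars.extend(quad)
    text = "".join(chars)
    return "\r\n".join(text[i:i + line_len] for i in range(0, len(text), line_len))
```

For example, the three octets of "foo" yield the four characters
"Zm9v", while the single octet "f" yields "Zg==". Python's standard
base64 module provides a maintained implementation for real use.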
                    Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

The encoded output stream must be represented in lines of no more
than 76 characters each. All line breaks or other characters not
found in Table 1 must be ignored by decoding software. In base64
data, characters other than those in Table 1, line breaks, and other
white space probably indicate a transmission error, about which a
warning message or even a message rejection might be appropriate
under some circumstances.

Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body. When fewer than 24 input bits
are available in an input group, zero bits are added (on the right)
to form an integral number of 6-bit groups. Padding at the end of
the data is performed using the "=" character.
Since all base64 input is an integral number of octets, only the
following cases can arise:

(1)  the final quantum of encoding input is an integral
     multiple of 24 bits; here, the final unit of encoded
     output will be an integral multiple of 4 characters
     with no "=" padding,

(2)  the final quantum of encoding input is exactly 8 bits;
     here, the final unit of encoded output will be two
     characters followed by two "=" padding characters, or

(3)  the final quantum of encoding input is exactly 16 bits;
     here, the final unit of encoded output will be three
     characters followed by one "=" padding character.

Because it is used only for padding at the end of the data, the
occurrence of any "=" characters may be taken as evidence that the
end of the data has been reached (without truncation in transit). No
such assurance is possible, however, when the number of octets
transmitted was a multiple of three and no "=" characters are
present.

Any characters outside of the base64 alphabet are to be ignored in
base64-encoded data.

Care must be taken to use the proper octets for line breaks if base64
encoding is applied directly to text material that has not been
converted to canonical form. In particular, text line breaks must be
converted into CRLF sequences prior to base64 encoding. The
important thing to note is that this may be done directly by the
encoder rather than in a prior canonicalization step in some
implementations.

NOTE: There is no need to worry about quoting potential boundary
delimiters within base64-encoded parts of multipart entities because
no hyphen characters are used in the base64 encoding.

9.  Content-ID Header Field

In constructing a high-level user agent, it may be desirable to allow
one body to make reference to another.
Accordingly, bodies may be labelled using the "Content-ID" header
field, which is syntactically identical to the "Message-ID" header
field:

     id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be generated to be
world-unique.

The Content-ID value may be used for uniquely identifying MIME
entities in several contexts, particularly for caching data
referenced by the message/external-body mechanism. Although the
Content-ID header is generally optional, its use is MANDATORY in
implementations which generate data of the optional MIME media type
"message/external-body". That is, each message/external-body entity
must have a Content-ID field to permit caching of such data.

It is also worth noting that the Content-ID value has special
semantics in the case of the multipart/alternative media type. This
is explained in the section of this document dealing with
multipart/alternative.

10.  Content-Description Header Field

The ability to associate some descriptive information with a given
body is often desirable. For example, it may be useful to mark an
"image" body as "a picture of the Space Shuttle Endeavor." Such text
may be placed in the Content-Description header field. This header
field is always optional.

     description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII character
set, although the mechanism specified in RFC MIME-HEADERS
[RFC-MIME-HEADERS] may be used for non-US-ASCII Content-Description
values.

11.  Additional MIME Header Fields

Future documents may elect to define additional MIME header fields
for various purposes.
Any new header field that further describes the content of a message
should begin with the string "Content-" to allow such fields which
appear in a message header to be distinguished from ordinary RFC 822
message header fields.

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">

12.  Summary

Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
header fields, it is possible to include, in a standardized way,
arbitrary types of data objects with RFC 822 conformant mail
messages. No restrictions imposed by either RFC 821 or RFC 822 are
violated, and care has been taken to avoid problems caused by
additional restrictions imposed by the characteristics of some
Internet mail transport mechanisms (see RFC MIME-CONF).

The next document in this set, RFC MIME-IMT, specifies the media
types that can be labelled and transported using these headers.

13.  Security Considerations

Security issues are discussed in the second document in this set, RFC
MIME-IMT.

14.  Authors' Addresses

For more information, the authors of this document are best contacted
via Internet mail:

     Nathaniel S. Borenstein
     First Virtual Holdings
     25 Washington Avenue
     Morristown, NJ 07960
     USA

     Email: nsb@nsb.fv.com
     Phone: +1 201 540 8967
     Fax:   +1 201 993 3032

     Ned Freed
     Innosoft International, Inc.
     1050 East Garvey Avenue South
     West Covina, CA 91790
     USA

     Email: ned@innosoft.com
     Phone: +1 818 919 3600
     Fax:   +1 818 919 3614

MIME is a result of the work of the Internet Engineering Task Force
Working Group on Email Extensions. The chairman of that group, Greg
Vaudreuil, may be reached at:

     Gregory M. Vaudreuil
     Tigon Corporation
     17060 Dallas Parkway
     Dallas, Texas 75248

     Email: greg.vaudreuil@ons.octel.com
     Phone: +1 214 733 2722

Appendix A -- Collected Grammar

This appendix contains the complete BNF grammar for all the syntax
specified by this document.

By itself, however, this grammar is incomplete. It refers to several
entities that are defined by RFC 822. Rather than reproduce those
definitions here, and risk unintentional differences between the two,
this document simply refers the reader to RFC 822 for the remaining
definitions. Wherever a term is undefined, it refers to the RFC 822
definition.

     attribute := token

     composite-type := "message" / "multipart" / extension-token

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     extension-token := iana-token / ietf-token / x-token

     iana-token := <A publicly-defined extension token,
                    registered with IANA as described in
                    RFC MIME-REG>

     ietf-token := <An extension token defined by a
                    standards-track RFC>

     id := "Content-ID" ":" msg-id

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

     octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
              ; Octet must be used for characters > 127, =,
              ; SPACEs or TABs at the ends of lines, and is
              ; recommended for any character not listed in
              ; RFC MIME-CONF as "mail-safe".

     parameter := attribute "=" value

     ptext := octet / safe-char

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     quoted-printable := qp-line *(CRLF qp-line)

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC MIME-CONF are also not recommended.

     subtype := extension-token

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     tspecials := "(" / ")" / "<" / ">" / "@" /
                  "," / ";" / ":" / "\" / <"> /
                  "/" / "[" / "]" / "?" / "="
                  ; Must be in quoted-string,
                  ; to use within parameter values

     type := discrete-type / composite-type

     value := token / quoted-string

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>
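
As an illustration of how the "content", "parameter", "token" and
"tspecials" productions fit together, the following Python sketch
parses the value of a Content-Type field (the text after the colon).
It is a simplified reading under stated assumptions: the function
names are hypothetical, header unfolding is assumed to have been done
already, RFC 822 comments are simply discarded, and stray semicolons
are tolerated rather than rejected.

```python
# Illustrative sketch of a Content-Type value parser; all names are assumptions.

TSPECIALS = '()<>@,;:\\"/[]?='


def is_token(text):
    """True if text is 1 or more US-ASCII chars, excluding SPACE, CTLs and tspecials."""
    return bool(text) and all(32 < ord(c) < 127 and c not in TSPECIALS for c in text)


def strip_comments(text):
    """Drop RFC 822 parenthesized comments that appear outside quoted-strings."""
    out, depth, quoted, i = [], 0, False, 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text):   # quoted-pair: keep or drop as a unit
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if quoted:
            quoted = c != '"'
            out.append(c)
        elif c == "(":
            depth += 1
        elif c == ")" and depth:
            depth -= 1
        elif depth == 0:
            quoted = c == '"'
            out.append(c)
        i += 1
    return "".join(out)


def parse_content_type(value):
    """Return (type, subtype, params); names are lowercased, values are not."""
    main, _, rest = strip_comments(value).partition(";")
    type_, slash, subtype = main.strip().partition("/")
    if not slash or not is_token(type_) or not is_token(subtype):
        raise ValueError("both a type and a subtype are mandatory")
    params, i, n = {}, 0, len(rest)
    while True:
        while i < n and rest[i] in " \t;":    # skip separators between parameters
            i += 1
        if i >= n:
            break
        eq = rest.find("=", i)
        name = rest[i:eq].strip().lower() if eq >= 0 else ""
        if not is_token(name):
            raise ValueError("malformed parameter near %r" % rest[i:])
        i = eq + 1
        while i < n and rest[i] in " \t":
            i += 1
        if i < n and rest[i] == '"':          # quoted-string: quotes are not part of the value
            i += 1
            buf = []
            while i < n and rest[i] != '"':
                if rest[i] == "\\" and i + 1 < n:
                    i += 1
                buf.append(rest[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted-string")
            i += 1
            params[name] = "".join(buf)
        else:
            j = i
            while j < n and rest[j] not in " \t;":
                j += 1
            if not is_token(rest[i:j]):
                raise ValueError("parameter value must be a token or quoted-string")
            params[name] = rest[i:j]
            i = j
    return type_.lower(), subtype.lower(), params
```

With this sketch, "text/plain; charset=us-ascii (Plain text)" and
'text/plain; charset="us-ascii"' parse to the same result, and a value
with no subtype is rejected.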