Network Working Group                          Nathaniel Borenstein
Internet Draft                                            Ned Freed

              Multipurpose Internet Mail Extensions
                        (MIME) Part One:

                Format of Internet Message Bodies

                           May 5, 1995

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are
working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups.
Note that other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted
by other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other
than as a "working draft" or "work in progress".

To learn the current status of any Internet-Draft, please
check the 1id-abstracts.txt listing contained in the
Internet-Drafts Shadow Directories on ds.internic.net (US East
Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast),
or munnari.oz.au (Pacific Rim).

1. Abstract

STD 11, RFC 822, defines a message representation protocol
specifying considerable detail about US-ASCII message headers,
and leaves the message content, or message body, as flat
US-ASCII text. This set of documents, collectively called the
Multipurpose Internet Mail Extensions, or MIME, redefines the
format of messages to allow for

(1) textual message bodies in character sets other than
    US-ASCII,

(2) non-textual message bodies,

(3) multi-part message bodies, and

(4) textual header information in character sets other than
    US-ASCII.

These documents are based on earlier work documented in RFC
934, STD 11, and RFC 1049, but extend and revise that work.
Because RFC 822 said so little about message bodies, these
documents are largely orthogonal to (rather than a revision
of) RFC 822.

In particular, these documents are designed to provide
facilities to include multiple parts in a single message, to
represent body and header text in character sets other than
US-ASCII, to represent formatted multi-font text messages, to
represent non-textual material such as images and audio
fragments, and generally to facilitate later extensions
defining new types of Internet mail for use by cooperating
mail agents.

This initial document specifies the various headers used to
describe the structure of MIME messages. The second document,
RFC MIME-IMT, defines the general structure of the MIME media
typing system and defines an initial set of media types. The
third document, RFC MIME-HEADERS, describes extensions to RFC
822 to allow non-US-ASCII text data in Internet mail header
fields. The fourth document, RFC MIME-REG, specifies various
IANA registration procedures for MIME-related entities. The
fifth and final document, RFC MIME-CONF, describes MIME
conformance criteria and provides some illustrative examples
of MIME message formats, acknowledgements, and the
bibliography.

These documents are revisions of RFCs 1521, 1522, and 1590,
which themselves were revisions of RFCs 1341 and 1342. An
appendix in RFC MIME-CONF describes differences and changes
from previous versions.

2. Table of Contents

1     Abstract
2     Table of Contents
3     Introduction
4     Definitions, Conventions, and Generic BNF Grammar
4.1   CRLF
4.2   Character Set
4.3   Message
4.4   Body Part
4.5   Entity
4.6   Body
4.7   7bit Data
4.8   8bit Data
4.9   Binary Data
4.10  Lines
5     MIME Header Fields
6     MIME-Version Header Field
7     Content-Type Header Field
7.1   Syntax of the Content-Type Header Field
7.2   Content-Type Defaults
8     Content-Transfer-Encoding Header Field
8.1   Content-Transfer-Encoding Syntax
8.2   Content-Transfer-Encodings Semantics
8.3   New Content-Transfer-Encodings
8.4   Interpretation and Use
8.5   Translating Encodings
8.6   Canonical Encoding Model
8.7   Quoted-Printable Content-Transfer-Encoding
8.8   Base64 Content-Transfer-Encoding
9     Content-ID Header Field
10    Content-Description Header Field
11    Additional MIME Header Fields
12    Summary
13    Security Considerations
14    Authors' Addresses
A     Collected Grammar

3. Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined
the standard format of textual mail messages on the Internet.
Its success has been such that the RFC 822 format has been
adopted, wholly or partially, well beyond the confines of the
Internet and the Internet SMTP transport defined by RFC 821
[RFC-821]. As the format has seen wider use, a number of
limitations have proven increasingly restrictive for the user
community.

RFC 822 was intended to specify a format for text messages.
As such, non-text messages, such as multimedia messages that
might include audio or images, are simply not mentioned.
Even in the case of text, however, RFC 822 is inadequate for the
needs of mail users whose languages require the use of
character sets richer than US-ASCII. Since RFC 822 does not
specify mechanisms for mail containing audio, video, Asian
language text, or even text in most European languages,
additional specifications are needed.

One of the notable limitations of RFC 821/822 based mail
systems is the fact that they limit the contents of electronic
mail messages to relatively short lines (e.g. 1000 characters
or less [RFC821]) of 7-bit US-ASCII. This forces users to
convert any non-textual data that they may wish to send into
seven-bit bytes representable as printable US-ASCII characters
before invoking a local mail UA (User Agent, a program with
which human users send and receive mail). Examples of such
encodings currently used in the Internet include pure
hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in
RFC 1421, the Andrew Toolkit Representation [ATK], and many
others.

The limitations of RFC 822 mail become even more apparent as
gateways are designed to allow for the exchange of mail
messages between RFC 822 hosts and X.400 hosts. X.400 [X400]
specifies mechanisms for the inclusion of non-textual body
parts within electronic mail messages. The current standards
for the mapping of X.400 messages to RFC 822 messages specify
either that X.400 non-textual body parts must be converted to
(not encoded in) IA5Text format, or that they must be
discarded, notifying the RFC 822 user that discarding has
occurred. This is clearly undesirable, as information that a
user may wish to receive is lost. Even though a user agent
may not have the capability of dealing with the non-textual
body part, the user might have some mechanism external to the
UA that can extract useful information from the body part.
Moreover, it does not allow for the fact that the message may
eventually be gatewayed back into an X.400 message handling
system (i.e., the X.400 message is "tunneled" through Internet
mail), where the non-textual information would definitely
become useful again.

This document describes several mechanisms that combine to
solve most of these problems without introducing any serious
incompatibilities with the existing world of RFC 822 mail. In
particular, it describes:

(1) A MIME-Version header field, which uses a version
    number to declare a message to be conformant with this
    specification and allows mail processing agents to
    distinguish between such messages and those generated
    by older or non-conformant software, which are presumed
    to lack such a field.

(2) A Content-Type header field, generalized from RFC 1049
    [RFC-1049], which can be used to specify the media type
    and subtype of data in the body of a message and to
    fully specify the native representation (canonical
    form) of such data.

(3) A Content-Transfer-Encoding header field, which can be
    used to specify an auxiliary encoding that was applied
    to the data in order to allow it to pass through mail
    transport mechanisms which may have data or character
    set limitations.

(4) Two additional header fields that can be used to
    further describe the data in a body, the Content-ID and
    Content-Description header fields.

All of the header fields defined in this document are subject
to the general syntactic rules for header fields specified in
RFC 822. In particular, all of these header fields can
include RFC 822 comments, which have no semantic content and
should be ignored during MIME processing.

Finally, to specify and promote interoperability, RFC
MIME-CONF provides a basic applicability statement for a subset
of the above mechanisms that defines a minimal level of
"conformance" with this document.

HISTORICAL NOTE: Several of the mechanisms described in this
document may seem somewhat strange or even baroque at first
reading. It is important to note that compatibility with
existing standards AND robustness across existing practice
were two of the highest priorities of the working group that
developed this document. In particular, compatibility was
always favored over elegance.

Please refer to the current edition of the "IAB Official
Protocol Standards" for the standardization state and status
of this protocol. RFC 822 and RFC 1123 [RFC-1123] also
provide essential background for MIME since no conforming
implementation of MIME can violate them. In addition, several
other informational RFC documents will be of interest to the
MIME implementor, in particular RFC 1344 [RFC-1344], RFC 1345
[RFC-1345], and RFC 1524 [RFC-1524].

4. Definitions, Conventions, and Generic BNF Grammar

Although the mechanisms specified in this document are all
described in prose, most are also described formally in the
augmented BNF notation of RFC 822. Implementors will need to
be familiar with this notation in order to understand this
specification, and are referred to RFC 822 for a complete
explanation of the augmented BNF notation.

Some of the augmented BNF in this document makes reference to
syntactic entities that are defined in RFC 822 and not in this
document. A complete formal grammar, then, is obtained by
combining Appendix A of this document, the collected grammar,
with the BNF of RFC 822 plus the modifications to RFC 822
defined in RFC 1123 (which specifically changes the syntax for
`return', `date' and `mailbox').

In this document, all numeric and octet values are given in
decimal notation. All media type values, subtype values, and
parameter names as defined in this document are
case-insensitive. However, parameter values are case-sensitive
unless otherwise specified for the specific parameter.

FORMATTING NOTE: Notes, such as this one, provide additional
nonessential information which may be skipped by the reader
without missing anything essential. The primary purpose of
these non-essential notes is to convey information about the
rationale of this document, or to place this document in the
proper historical or evolutionary context. Such information
may in particular be skipped by those who are focused entirely
on building a conformant implementation, but may be of use to
those who wish to understand why certain design choices were
made.

4.1. CRLF

The term CRLF, in this document, refers to the sequence of
octets corresponding to the two US-ASCII characters CR
(decimal value 13) and LF (decimal value 10) which, taken
together, in this order, denote a line break in RFC 822 mail.

4.2. Character Set

The term "character set" is used in this document to refer to
a table-based method of converting a sequence of octets into a
sequence of characters. Note that unconditional and
unambiguous conversion in the other direction is not required,
in that not all characters may be available in a given
character set and a character set may provide more than one
sequence of octets to represent a particular character. This
definition is intended to allow various kinds of character
encodings, from simple single-table mappings such as US-ASCII
to complex table switching methods such as those that use ISO
2022's techniques. However, the definition associated with a
MIME character set name must fully specify the mapping to be
performed from octets to characters.
In particular, use of external profiling information to
determine the exact mapping is not permitted.

HISTORICAL NOTE: The term "character set" originated in the
definition of US-ASCII and similar 7-bit and 8-bit
specifications. These define true sets. However, the advent
of multi-octet character encodings and switching techniques
has transformed character sets into entities that properly
speaking are no longer strictly sets. Some other communities
have adopted the term "character encoding" for what MIME calls
a "character set" as a result.

4.3. Message

The term "message", when not further qualified, means either
the (complete or "top-level") message being transferred on a
network, or a message encapsulated in a body part of type
"message".

4.4. Body Part

The term "body part", in this document, refers to content
headers and contents of either a message or one of the parts
in the body of a multipart entity. A body part has a header
and a body, so it makes sense to speak about the body of a
body part.

4.5. Entity

The term "entity", in this document, means either a message or
a body part. All kinds of entities share the property that
they have a header and a body.

4.6. Body

The term "body", when not further qualified, means the body of
an entity, that is the body of either a message or of a body
part.

NOTE: The previous four definitions are clearly circular.
This is unavoidable, since the overall structure of a MIME
message is indeed recursive.

4.7. 7bit Data

"7bit data" refers to data that is all represented as
relatively short lines with 998 octets or less between CRLF
line separation sequences [RFC821]. No octets with decimal
values greater than 127 are allowed and neither are NULs
(octets with decimal value 0).
CR (decimal value 13) and LF (decimal value 10) octets only
occur as part of CRLF line separation sequences.

4.8. 8bit Data

"8bit data" refers to data that is all represented as
relatively short lines with 998 octets or less between CRLF
line separation sequences [RFC821], but octets with decimal
values greater than 127 may be used. As with "7bit data", CR
and LF octets only occur as part of CRLF line separation
sequences and no NULs are allowed.

4.9. Binary Data

"Binary data" refers to data where any sequence of octets
whatsoever is allowed.

4.10. Lines

"Lines" are defined as sequences of octets separated by CRLF
sequences. This is consistent with both RFC 821 and RFC 822.
"Lines" only refers to a unit of text in a message, which may
or may not correspond to something that is actually displayed
by a user agent.

5. MIME Header Fields

MIME defines a number of new RFC 822 header fields that are
used to describe the content of messages. These header fields
occur in two contexts:

(1) As part of a regular RFC 822 message header.

(2) In a MIME body part header within a multipart
    construct.

The formal definition of these header fields is as follows:

    MIME-message-headers := fields
                            version CRLF
                            [ content CRLF ]
                            [ encoding CRLF ]
                            [ id CRLF ]
                            [ description CRLF ]
                            *( mime-extension-field CRLF )
                            ; The ordering of the header
                            ; fields implied by this BNF
                            ; definition should be ignored

    MIME-part-headers := [ content CRLF ]
                         [ encoding CRLF ]
                         [ id CRLF ]
                         [ description CRLF ]
                         *( mime-extension-field CRLF )
                         ; The ordering of the header
                         ; fields implied by this BNF
                         ; definition should be ignored

The syntax of the various specific MIME header fields will be
described in the following sections.

6. MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been
only one format standard for Internet messages, and there has
been little perceived need to declare the format standard in
use. This document is an independent document that
complements RFC 822. Although the extensions in this document
have been defined in such a way as to be compatible with RFC
822, there are still circumstances in which it might be
desirable for a mail-processing agent to know whether a
message was composed with the new standard in mind.

Therefore, this document defines a new header field,
"MIME-Version", which is to be used to declare the version of
the Internet message body format standard in use.

Messages composed in accordance with this document MUST
include such a header field, with the following verbatim text:

    MIME-Version: 1.0

The presence of this header field is an assertion that the
message has been composed in compliance with this document.

Since it is possible that a future document might extend the
message format standard again, a formal BNF is given for the
content of the MIME-Version field:

    version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Thus, future format specifiers, which might replace or extend
"1.0", are constrained to be two integer fields, separated by
a period. If a message is received with a MIME-Version value
other than "1.0", it cannot be assumed to conform with this
specification.

Note that the MIME-Version header field is required at the top
level of a message. It is not required for each body part of
a multipart entity. It is required for the embedded headers
of a body of type "message" if and only if the embedded
message is itself claimed to be MIME-conformant.

It is not possible to fully specify how a mail reader that
conforms with MIME as defined in this document should treat a
message that might arrive in the future with some value of
MIME-Version other than "1.0".

It is also worth noting that version control for specific
media types is not accomplished using the MIME-Version
mechanism. In particular, some formats (such as
application/postscript) have version numbering conventions
that are internal to the document format. Where such
conventions exist, MIME does nothing to supersede them. Where
no such conventions exist, a MIME media type might use a
"version" parameter in the content-type field if necessary.

NOTE TO IMPLEMENTORS: When checking MIME-Version values any
RFC 822 comment strings that are present must be ignored. In
particular, the following four MIME-Version fields are
equivalent:

    MIME-Version: 1.0

    MIME-Version: 1.0 (produced by MetaSend Vx.x)

    MIME-Version: (produced by MetaSend Vx.x) 1.0

    MIME-Version: 1.(produced by MetaSend Vx.x)0

In the absence of a MIME-Version field, a receiving user agent
(whether MIME compliant or not) may optionally choose to
interpret the body of the message according to local
conventions. Many such conventions are currently in use and
it should be noted that in practice non-MIME messages can
contain just about anything.

It is impossible to be certain that a non-MIME message is
actually plain text in the US-ASCII character set since it
might well be a message that, using some set of nonstandard
local conventions that predate this document, includes text in
another character set or non-textual data presented in a
manner that cannot be automatically recognized (e.g., a
uuencoded compressed UNIX tar file).
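
ILLUSTRATIVE SKETCH: The comment rule in the NOTE TO
IMPLEMENTORS above can be sketched in Python roughly as
follows. This is not part of the specification. The function
names strip_comments and parse_mime_version are hypothetical,
and the sketch assumes it is handed only the field body (the
text after the colon). It treats white space between tokens as
insignificant, which is more lenient than a strict reading of
the BNF, and it does not handle quoted-strings because the
MIME-Version grammar does not use them.

```python
import re
from typing import Optional, Tuple


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments, which may nest and may contain quoted-pairs."""
    out = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Quoted-pair: keep it outside comments, drop it inside them.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_mime_version(field_body: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor), or None if the body is not 1*DIGIT "." 1*DIGIT."""
    # Assumption: white space left between tokens carries no meaning.
    cleaned = re.sub(r"\s+", "", strip_comments(field_body))
    match = re.fullmatch(r"(\d+)\.(\d+)", cleaned)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Under this sketch, the four equivalent field bodies listed in
the note all reduce to the pair (1, 0), and a body that does
not reduce to two integers separated by a period yields None.
A receiver would then compare the result against (1, 0) before
assuming conformance with this document.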

MIME-compliant user agents are required, if they support any
such nonstandard conventions at all, to do so on received
messages only -- they must not send non-MIME messages
containing anything other than US-ASCII text.

In particular, the use of non-US-ASCII text in messages
without a MIME-Version field is strongly discouraged as it
impedes interoperability when sending messages between regions
with different localization conventions. MIME-compliant user
agents MUST include proper MIME labelling when sending
anything other than plain text in the US-ASCII character set.

In addition, non-MIME user agents should be upgraded if at all
possible to include appropriate MIME header information in the
messages they send even if nothing else in MIME is supported.
This upgrade will have little, if any, effect on non-MIME
recipients and will aid MIME in correctly displaying such
messages. It also provides a smooth transition path to
eventual adoption of other MIME capabilities.

7. Content-Type Header Field

The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user
agent can pick an appropriate agent or mechanism to present
the data to the user, or otherwise deal with the data in an
appropriate manner. The value in this field is called a media
type.

HISTORICAL NOTE: The Content-Type header field was first
defined in RFC 1049. RFC 1049 used a simpler and less
powerful syntax, but one that is largely compatible with the
mechanism given here.

The Content-Type header field specifies the nature of the data
in the body of an entity by giving media type and subtype
identifiers, and by providing auxiliary information that may
be required for certain media types. After the media type and
subtype names, the remainder of the header field is simply a
set of parameters, specified in an attribute=value notation.
The ordering of parameters is not significant.

In general, the top-level media type is used to declare the
general type of data, while the subtype specifies a specific
format for that type of data. Thus, a media type of
"image/xyz" is enough to tell a user agent that the data is an
image, even if the user agent has no knowledge of the specific
image format "xyz". Such information can be used, for
example, to decide whether or not to show a user the raw data
from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of text, but not for
unrecognized subtypes of image or audio. For this reason,
registered subtypes of text, image, audio, and video should
not contain embedded information that is really of a different
type. Such compound formats should be represented using the
"multipart" or "application" types.

Parameters are modifiers of the media subtype, and as such do
not fundamentally affect the nature of the content. The set
of meaningful parameters depends on the media type and
subtype. Most parameters are associated with a single
specific subtype. However, a given top-level media type may
define parameters which are applicable to any subtype of that
type. Parameters may be required by their defining content
type or subtype or they may be optional. MIME implementations
must ignore any parameters whose names they do not recognize.

For example, the "charset" parameter is applicable to any
subtype of "text", while the "boundary" parameter is required
for any subtype of the "multipart" media type.

There are NO globally-meaningful parameters that apply to all
media types.
Truly global mechanisms are best addressed, in
the MIME model, by the definition of additional Content-*
header fields.

An initial set of seven top-level media types is defined in
RFC MIME-IMT. Five of these are discrete types whose content is
essentially opaque as far as MIME processing is concerned.
The remaining two are composite types whose contents require
additional handling by MIME processors.

This set of top-level media types is intended to be
substantially complete. It is expected that additions to the
larger set of supported types can generally be accomplished by
the creation of new subtypes of these initial types. In the
future, more top-level types may be defined only by a
standards-track extension to this standard. If another
top-level type is to be used for any reason, it must be given a
name starting with "X-" to indicate its non-standard status
and to avoid a potential conflict with a future official name.

7.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type
header field value is defined as follows:

    content := "Content-Type" ":" type "/" subtype
               *(";" parameter)
               ; Matching of media type and subtype
               ; is ALWAYS case-insensitive

    type := discrete-type / composite-type

    discrete-type := "text" / "image" / "audio" / "video" /
                     "application" / extension-token

    composite-type := "message" / "multipart" / extension-token

    extension-token := iana-token / ietf-token / x-token

    iana-token :=

    ietf-token :=

    x-token :=

    subtype := extension-token

    parameter := attribute "=" value

    attribute := token

    value := token / quoted-string

    token := 1*

    tspecials := "(" / ")" / "<" / ">" / "@" /
                 "," / ";" / ":" / "\" / <"> /
                 "/" / "[" / "]" / "?" / "="
                 ; Must be in quoted-string,
                 ; to use within parameter values

Note that the definition of "tspecials" is the same as the RFC
822 definition of "specials" with the addition of the three
characters "/", "?", and "=", and the removal of ".".

Note also that a subtype specification is MANDATORY -- it may
not be omitted from a Content-Type header field. As such,
there are no default subtypes.

The type, subtype, and parameter names are not case sensitive.
For example, TEXT, Text, and TeXt are all equivalent top-level
media types. Parameter values are normally case sensitive,
but sometimes are interpreted in a case-insensitive fashion,
depending on the intended use. (For example, multipart
boundaries are case-sensitive, but the "access-type" parameter
for message/External-body is not case-sensitive.)

Note that the value of a quoted string parameter does not
include the quotes. That is, the quotation marks in a
quoted-string are not a part of the value of the parameter,
but are merely used to delimit that parameter value. In
addition, comments are allowed in accordance with RFC 822
rules for structured header fields. Thus the following two
forms

    Content-type: text/plain; charset=us-ascii (Plain text)

    Content-type: text/plain; charset="us-ascii"

are completely equivalent.

Beyond this syntax, the only syntactic constraint on the
definition of subtype names is the desire that their uses must
not conflict. That is, it would be undesirable to have two
different communities using "Content-Type: application/foobar"
to mean two different things. The process of defining new
media subtypes, then, is not intended to be a mechanism for
imposing restrictions, but simply a mechanism for publicizing
their definition and usage.
There are, therefore, two 658 acceptable mechanisms for defining new media subtypes: 660 (1) Private values (starting with "X-") may be defined 661 bilaterally between two cooperating agents without 662 outside registration or standardization. 664 (2) New standard values MUST be documented, registered 665 with, and approved by IANA, as described in RFC MIME- 666 REG [RFC-MIME-REG]. 668 The second document in this set, RFC MIME-IMT, defines the 669 initial set of media types for MIME. 671 7.2. Content-Type Defaults 673 Default RFC 822 messages without a MIME Content-Type header 674 are taken by this protocol to be plain text in the US-ASCII 675 character set, which can be explicitly specified as: 677 Content-type: text/plain; charset=us-ascii 679 This default is assumed if no Content-Type header field is 680 specified. In the presence of a MIME-Version header field, a 681 receiving User Agent can also assume that plain US-ASCII text 682 was the sender's intent. Plain US-ASCII text may still be 683 assumed in the absence of a MIME-Version specification, but 684 the sender's intent might have been otherwise. 686 8. Content-Transfer-Encoding Header Field 688 Many media types which could be usefully transported via email 689 are represented, in their "natural" format, as 8-bit character 690 or binary data. Such data cannot be transmitted over some 691 transfer protocols. For example, RFC 821 (SMTP) restricts 692 mail messages to 7-bit US-ASCII data with lines no longer than 693 1000 characters including any trailing CRLF line separator. 695 It is necessary, therefore, to define a standard mechanism for 696 encoding such data into a 7-bit short line format. Proper 697 labelling of unencoded material in less restrictive formats 698 for direct use over less restrictive transports is also 699 desirable. This document specifies that such encodings will 700 be indicated by a new "Content-Transfer-Encoding" header 701 field.
This field has not been defined by any previous 702 standard. 704 8.1. Content-Transfer-Encoding Syntax 706 The Content-Transfer-Encoding field's value is a single token 707 specifying the type of encoding, as enumerated below. 708 Formally: 710 encoding := "Content-Transfer-Encoding" ":" mechanism 712 mechanism := "7bit" / "8bit" / "binary" / 713 "quoted-printable" / "base64" / 714 ietf-token / x-token 716 These values are not case sensitive -- Base64 and BASE64 and 717 bAsE64 are all equivalent. An encoding type of 7BIT requires 718 that the body is already in a 7-bit mail-ready representation. 719 This is the default value -- that is, "Content-Transfer- 720 Encoding: 7BIT" is assumed if the Content-Transfer-Encoding 721 header field is not present. 723 8.2. Content-Transfer-Encodings Semantics 725 This single Content-Transfer-Encoding token actually provides 726 two pieces of information. It specifies what sort of encoding 727 transformation the body was subjected to, and it specifies 728 what the domain of the result is. 730 Three transformations are currently defined: identity, the 731 "quoted-printable" encoding, and the "base64" encoding. The 732 domains are "binary", "8bit" and "7bit". 734 The Content-Transfer-Encoding values "7bit", "8bit", and 735 "binary" all mean that the identity (i.e. NO) encoding 736 transformation has been performed. As such, they serve simply 737 as indicators of the domain of the body data, and provide 738 useful information about the sort of encoding that might be 739 needed for transmission in a given transport system. The 740 terms "7bit data", "8bit data", and "binary data" are all 741 defined in Section 4. 743 The quoted-printable and base64 encodings transform their 744 input from an arbitrary domain into material in the "7bit" 745 range, thus making it safe to carry over restricted 746 transports. The specific definitions of the transformations 747 are given below.
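As an illustration of the two pieces of information carried by the single token, the following Python sketch maps each of the five values defined above to a (transformation, domain) pair. The names CTE_SEMANTICS and describe_encoding are hypothetical and are not defined by this document; extension tokens (ietf-token, x-token) are deliberately not modelled.

```python
# Illustrative only: the table and function names are assumptions of this
# sketch, not identifiers defined by MIME.
CTE_SEMANTICS = {
    "7bit": ("identity", "7bit"),
    "8bit": ("identity", "8bit"),
    "binary": ("identity", "binary"),
    "quoted-printable": ("quoted-printable", "7bit"),
    "base64": ("base64", "7bit"),
}


def describe_encoding(value=None):
    """Return (transformation, domain) for a Content-Transfer-Encoding value.

    A missing header field is treated as "7bit", the stated default.
    Matching is case-insensitive, as the syntax section requires.
    """
    token = "7bit" if value is None else value.strip().lower()
    if token not in CTE_SEMANTICS:
        raise ValueError("unrecognized Content-Transfer-Encoding: %r" % value)
    return CTE_SEMANTICS[token]


# Base64, BASE64 and bAsE64 are all equivalent.
print(describe_encoding("bAsE64"))
# No header field present: the identity transformation in the 7bit domain.
print(describe_encoding())
```

Note that only the two non-identity transformations change the domain of the data; the three identity values merely label it.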
749 The proper Content-Transfer-Encoding label must always be 750 used. Labelling unencoded data containing 8-bit characters as 751 "7bit" is not allowed, nor is labelling unencoded non-line- 752 oriented data as anything other than "binary" allowed. 754 Unlike media subtypes, a proliferation of Content-Transfer- 755 Encoding values is both undesirable and unnecessary. However, 756 establishing only a single transformation into the "7bit" 757 domain does not seem possible. There is a tradeoff between 758 the desire for a compact and efficient encoding of largely- 759 binary data and the desire for a readable encoding of data 760 that is mostly, but not entirely, 7-bit. For this reason, at 761 least two encoding mechanisms are necessary: a "readable" 762 encoding (quoted-printable) and a "dense" encoding (base64). 764 Mail transport for unencoded 8-bit data is defined in RFC 1652 765 [RFC-1652]. As of the publication of this document, there are 766 no standardized Internet mail transports for which it is 767 legitimate to include unencoded binary data in mail bodies. 768 Thus there are no circumstances in which the "binary" 769 Content-Transfer-Encoding is actually valid in Internet mail. 770 However, in the event that binary mail transport becomes a 771 reality in Internet mail, or when this document is used in 772 conjunction with any other binary-capable transport mechanism, 773 binary bodies should be labelled as such using this mechanism. 775 NOTE: The five values defined for the Content-Transfer- 776 Encoding field imply nothing about the media type other than 777 the algorithm by which it was encoded or the transport system 778 requirements if unencoded. 780 8.3. New Content-Transfer-Encodings 782 Implementors may, if necessary, define private Content- 783 Transfer-Encoding values, but must use an x-token, which is a 784 name prefixed by "X-", to indicate its non-standard status, 785 e.g., "Content-Transfer-Encoding: x-my-new-encoding". 
786 Additional standardized Content-Transfer-Encoding values must 787 be specified by a standards-track RFC. Additional 788 requirements such specifications must meet are given in RFC 789 MIME-REG. As such, all of the content-transfer-encoding namespace except 790 that beginning with "X-" is explicitly reserved to the IANA 791 for future use. 793 Unlike media types and subtypes, the creation of new Content- 794 Transfer-Encoding values is STRONGLY discouraged, as it seems 795 likely to hinder interoperability with little potential 796 benefit. 798 8.4. Interpretation and Use 800 If a Content-Transfer-Encoding header field appears as part of 801 a message header, it applies to the entire body of that 802 message. If a Content-Transfer-Encoding header field appears 803 as part of a body part's headers, it applies only to the body 804 of that body part. If an entity is of type "multipart" the 805 Content-Transfer-Encoding is not permitted to have any value 806 other than "7bit", "8bit" or "binary". Even more severe 807 restrictions apply to some subtypes of the "message" type. 809 It should be noted that most media types are defined in terms 810 of octets rather than bits, so that the mechanisms described 811 here are mechanisms for encoding arbitrary octet streams, not 812 bit streams. If a bit stream is to be encoded via one of 813 these mechanisms, it must first be converted to an 8-bit byte 814 stream using the network standard bit order ("big-endian"), in 815 which the earlier bits in a stream become the higher-order 816 bits in an 8-bit byte. A bit stream not ending at an 8-bit 817 boundary must be padded with zeroes. This document provides a 818 mechanism for noting the addition of such padding in the case 819 of the application/octet-stream media type, which has a 820 "padding" parameter. 822 The encoding mechanisms defined here explicitly encode all 823 data in US-ASCII.
Thus, for example, suppose an entity has 824 header fields such as: 826 Content-Type: text/plain; charset=ISO-8859-1 827 Content-transfer-encoding: base64 829 This must be interpreted to mean that the body is a base64 830 US-ASCII encoding of data that was originally in ISO-8859-1, 831 and will be in that character set again after decoding. 833 Certain Content-Transfer-Encoding values may only be used on 834 certain media types. In particular, it is EXPRESSLY FORBIDDEN 835 to use any encodings other than "7bit", "8bit", or "binary" 836 with any composite media type, i.e. one that recursively 837 includes other Content-Type fields. Currently the only 838 composite media types are "multipart" and "message". All 839 encodings that are desired for bodies of type multipart or 840 message must be done at the innermost level, by encoding the 841 actual body that needs to be encoded. 843 It should also be noted that, by definition, if a composite 844 entity has a transfer-encoding value such as "7bit", but one 845 of the enclosed parts has a less restrictive value such as 846 "8bit", then either the outer "7bit" labelling is in error, 847 because 8-bit data are included, or the inner "8bit" labelling 848 placed an unnecessarily high demand on the transport system 849 because the included data were actually 7-bit-safe. 851 NOTE ON ENCODING RESTRICTIONS: Though the prohibition against 852 using content-transfer-encodings on composite body data may 853 seem overly restrictive, it is necessary to prevent nested 854 encodings, in which data are passed through an encoding 855 algorithm multiple times, and must be decoded multiple times 856 in order to be properly viewed. Nested encodings add 857 considerable complexity to user agents: aside from the 858 obvious efficiency problems with such multiple encodings, they 859 can obscure the basic structure of a message.
In particular, 860 they can imply that several decoding operations are necessary 861 simply to find out what types of bodies a message contains. 863 Banning nested encodings may complicate the job of certain 864 mail gateways, but this seems less of a problem than the 865 effect of nested encodings on user agents. 867 NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT- 868 TRANSFER-ENCODING: It may seem that the Content-Transfer- 869 Encoding could be inferred from the characteristics of the 870 media that is to be encoded, or, at the very least, that 871 certain Content-Transfer-Encodings could be mandated for use 872 with specific media types. There are several reasons why this 873 is not the case. First, given the varying types of transports 874 used for mail, some encodings may be appropriate for some 875 combinations of media types and transports but not for others. 876 (For example, in an 8-bit transport, no encoding would be 877 required for text in certain character sets, while such 878 encodings are clearly required for 7-bit SMTP.) 880 Second, certain media types may require different types of 881 transfer encoding under different circumstances. For example, 882 many PostScript bodies might consist entirely of short lines 883 of 7-bit data and hence require no encoding at all. Other 884 PostScript bodies (especially those using Level 2 PostScript's 885 binary encoding mechanism) may only be reasonably represented 886 using a binary transport encoding. Finally, since the 887 Content-Type field is intended to be an open-ended 888 specification mechanism, strict specification of an 889 association between media types and encodings effectively 890 couples the specification of an application protocol with a 891 specific lower-level transport. This is not desirable since 892 the developers of a media type should not have to be aware of 893 all the transports in use and what their limitations are. 895 8.5. 
Translating Encodings 897 The quoted-printable and base64 encodings are designed so that 898 conversion between them is possible. The only issue that 899 arises in such a conversion is the handling of line breaks. 900 When converting from quoted-printable to base64 a line break 901 must be converted into a CRLF sequence. Similarly, a CRLF 902 sequence in base64 data must be converted to a quoted- 903 printable line break, but ONLY when converting text data. 905 8.6. Canonical Encoding Model 907 There was some confusion, in the predecessors of this RFC, 908 regarding the model for when email data was to be converted to 909 canonical form and encoded, and in particular how this process 910 would affect the treatment of CRLFs, given that the 911 representation of newlines varies greatly from system to 912 system, and the relationship between content-transfer- 913 encodings and character sets. A canonical model for encoding 914 is presented in RFC MIME-CONF for this reason. 916 8.7. Quoted-Printable Content-Transfer-Encoding 918 The Quoted-Printable encoding is intended to represent data 919 that largely consists of octets that correspond to printable 920 characters in the US-ASCII character set. It encodes the data 921 in such a way that the resulting octets are unlikely to be 922 modified by mail transport. If the data being encoded are 923 mostly US-ASCII text, the encoded form of the data remains 924 largely recognizable by humans. A body which is entirely US- 925 ASCII may also be encoded in Quoted-Printable to ensure the 926 integrity of the data should the message pass through a 927 character-translating, and/or line-wrapping gateway. 
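The claim that mostly US-ASCII data remains recognizable after encoding can be checked with a short Python sketch. It relies on the standard library's quopri module, which is one existing implementation and is used here only for illustration; the sample text is an assumption of this example.

```python
import quopri

# ISO-8859-1 text containing a single non-ASCII octet (0xE9, e-acute).
original = b"Caf\xe9 au lait"

# Encode to Quoted-Printable; the US-ASCII octets pass through unchanged.
encoded = quopri.encodestring(original)

# Decoding restores the original octets exactly.
decoded = quopri.decodestring(encoded)

print(encoded)               # the 0xE9 octet appears as the three characters =E9
print(decoded == original)   # round trip preserves the data
```

Only the one octet outside the printable US-ASCII range is rewritten, so a human reader can still make out the text in its encoded form.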
929 In this encoding, octets are to be represented as determined 930 by the following rules: 932 (1) (General 8-bit representation) Any octet, except a CR 933 or LF that is part of a CRLF line break of the 934 canonical (standard) form of the data being encoded, 935 may be represented by an "=" followed by a two digit 936 hexadecimal representation of the octet's value. The 937 digits of the hexadecimal alphabet, for this purpose, 938 are "0123456789ABCDEF". Uppercase letters must be used 939 when sending hexadecimal data, though a robust 940 implementation may choose to recognize lowercase 941 letters on receipt. Thus, for example, the decimal 942 value 12 (US-ASCII form feed) can be represented by 943 "=0C", and the decimal value 61 (US-ASCII EQUAL SIGN) 944 can be represented by "=3D". This rule must be 945 followed except when the following rules allow an 946 alternative encoding. 948 (2) (Literal representation) Octets with decimal values of 949 33 through 60 inclusive, and 62 through 126, inclusive, 950 MAY be represented as the US-ASCII characters which 951 correspond to those octets (EXCLAMATION POINT through 952 LESS THAN, and GREATER THAN through TILDE, 953 respectively). 955 (3) (White Space) Octets with values of 9 and 32 MAY be 956 represented as US-ASCII TAB (HT) and SPACE characters, 957 respectively, but MUST NOT be so represented at the end 958 of an encoded line. Any TAB (HT) or SPACE characters 959 on an encoded line MUST thus be followed on that line 960 by a printable character. In particular, an "=" at the 961 end of an encoded line, indicating a soft line break 962 (see rule #5) may follow one or more TAB (HT) or SPACE 963 characters. It follows that an octet with decimal 964 value 9 or 32 appearing at the end of an encoded line 965 must be represented according to Rule #1. 
This rule is 966 necessary because some MTAs (Message Transport Agents, 967 programs which transport messages from one user to 968 another, or perform a part of such transfers) are known 969 to pad lines of text with SPACEs, and others are known 970 to remove "white space" characters from the end of a 971 line. Therefore, when decoding a Quoted-Printable 972 body, any trailing white space on a line must be 973 deleted, as it will necessarily have been added by 974 intermediate transport agents. 976 (4) (Line Breaks) A line break in a text body, represented 977 as a CRLF sequence in the text canonical form, must be 978 represented by an (RFC 822) line break, which is also a 979 CRLF sequence, in the Quoted-Printable encoding. Since 980 the canonical representation of media types other than 981 text does not generally include the representation of 982 line breaks as CRLF sequences, no hard line breaks 983 (i.e. line breaks that are intended to be meaningful 984 and to be displayed to the user) should occur in the 985 quoted-printable encoding of such types. Sequences 986 like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely 987 appear in non-text data represented in quoted- 988 printable, of course. 990 Note that many implementations may elect to encode the 991 local representation of various content types directly 992 rather than converting to canonical form first, 993 encoding, and then converting back to local 994 representation. In particular, this may apply to plain 995 text material on systems that use newline conventions 996 other than a CRLF terminator sequence. Such an 997 implementation optimization is permissible, but only 998 when the combined canonicalization-encoding step is 999 equivalent to performing the three steps separately. 1001 (5) (Soft Line Breaks) The Quoted-Printable encoding 1002 REQUIRES that encoded lines be no more than 76 1003 characters long.
If longer lines are to be encoded 1004 with the Quoted-Printable encoding, "soft" line breaks 1005 must be used. An equal sign as the last character on an 1006 encoded line indicates such a non-significant ("soft") 1007 line break in the encoded text. 1009 Thus, suppose the "raw" form of the line is a single unencoded line 1010 that says: 1012 Now's the time for all folk to come to the aid of their country. 1014 This can be represented, in the Quoted-Printable encoding, as: 1016 Now's the time = 1017 for all folk to come= 1018 to the aid of their country. 1020 This provides a mechanism with which long lines are encoded in 1021 such a way as to be restored by the user agent. The 76 1022 character limit does not count the trailing CRLF, but counts 1023 all other characters, including any equal signs. 1025 Since the hyphen character ("-") is represented as itself in 1026 the Quoted-Printable encoding, care must be taken, when 1027 encapsulating a quoted-printable encoded body inside one or 1028 more multipart entities, to ensure that the boundary delimiter 1029 does not appear anywhere in the encoded body. (A good 1030 strategy is to choose a boundary that includes a character 1031 sequence such as "=_" which can never appear in a quoted- 1032 printable body. See the definition of multipart messages in 1033 MIME-IMT.) 1035 NOTE: The quoted-printable encoding represents something of a 1036 compromise between readability and reliability in transport. 1037 Bodies encoded with the quoted-printable encoding will work 1038 reliably over most mail gateways, but may not work perfectly 1039 over a few gateways, notably those involving translation into 1040 EBCDIC. A higher level of confidence is offered by the base64 1041 Content-Transfer-Encoding. A way to get reasonably reliable 1042 transport through EBCDIC gateways is to also quote the US- 1043 ASCII characters 1045 !"#$@[\]^`{|}~ 1047 according to rule #1.
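The decoding side of rules #1, #3, #4 and #5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete decoder: the function name is hypothetical, malformed "=" sequences are passed through untouched, and the sample input assumes that the third encoded line of the example above begins with a SPACE.

```python
import re

# Rule #1: "=" followed by two hexadecimal digits. Lowercase digits are
# accepted on receipt, as a robust implementation may choose to do.
_HEX_OCTET = re.compile(rb"=([0-9A-Fa-f]{2})")


def decode_quoted_printable(encoded: bytes) -> bytes:
    """Decode Quoted-Printable data whose lines are separated by CRLF."""
    out = bytearray()
    lines = encoded.split(b"\r\n")
    for index, line in enumerate(lines):
        # Rule #3: trailing white space was added in transit; delete it.
        line = line.rstrip(b" \t")
        # Rule #5: a trailing "=" marks a soft (non-significant) line break.
        soft_break = line.endswith(b"=")
        if soft_break:
            line = line[:-1]
        out += _HEX_OCTET.sub(lambda m: bytes([int(m.group(1), 16)]), line)
        # Rule #4: a hard line break is carried through as CRLF.
        if not soft_break and index < len(lines) - 1:
            out += b"\r\n"
    return bytes(out)


sample = (b"Now's the time =\r\n"
          b"for all folk to come=\r\n"
          b" to the aid of their country.")
print(decode_quoted_printable(sample).decode("ascii"))
```

The white space before each "=" is preserved because it is followed on its line by a printable character, which is exactly what rule #3 requires of the encoder.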
1049 Because quoted-printable data is generally assumed to be 1050 line-oriented, it is to be expected that the representation of 1051 the breaks between the lines of quoted printable data may be 1052 altered in transport, in the same manner that plain text mail 1053 has always been altered in Internet mail when passing between 1054 systems with differing newline conventions. If such 1055 alterations are likely to constitute a corruption of the data, 1056 it is probably more sensible to use the base64 encoding rather 1057 than the quoted-printable encoding. 1059 WARNING TO IMPLEMENTORS: If binary data are encoded in 1060 quoted-printable, care must be taken to encode CR and LF 1061 characters as "=0D" and "=0A", respectively. In particular, a 1062 CRLF sequence in binary data should be encoded as "=0D=0A". 1063 Otherwise, if CRLF were represented as a hard line break, it 1064 might be incorrectly decoded on platforms with different line 1065 break conventions. 1067 For formalists, the syntax of quoted-printable data is 1068 described by the following grammar: 1070 quoted-printable := qp-line *(CRLF qp-line) 1072 qp-line := *(qp-segment transport-padding CRLF) 1073 qp-part transport-padding 1075 qp-part := qp-section 1076 ; Maximum length of 76 characters 1078 qp-segment := qp-section *(SPACE / TAB) "=" 1079 ; Maximum length of 76 characters 1081 qp-section := [*(ptext / SPACE / TAB) ptext] 1082 ptext := octet / safe-char 1084 safe-char := 1086 ; Characters not listed as "mail-safe" in 1087 ; RFC MIME-CONF are also not recommended. 1089 octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") 1090 ; Octet must be used for characters > 127, =, 1091 ; SPACEs or TABs at the ends of lines, and is 1092 ; recommended for any character not listed in 1093 ; RFC MIME-CONF as "mail-safe". 
1095 transport-padding := *LWSP-char 1096 ; Composers MUST NOT generate 1097 ; non-zero length transport 1098 ; padding, but receivers MUST 1099 ; be able to handle padding 1100 ; added by message transports. 1102 IMPORTANT NOTE: The addition of LWSP between the elements 1103 shown in this BNF is NOT allowed since this BNF does not 1104 specify a structured header field. 1106 8.8. Base64 Content-Transfer-Encoding 1108 The Base64 Content-Transfer-Encoding is designed to represent 1109 arbitrary sequences of octets in a form that need not be 1110 humanly readable. The encoding and decoding algorithms are 1111 simple, but the encoded data are consistently only about 33 1112 percent larger than the unencoded data. This encoding is 1113 virtually identical to the one used in Privacy Enhanced Mail 1114 (PEM) applications, as defined in RFC 1421 [RFC-1421]. 1116 A 65-character subset of US-ASCII is used, enabling 6 bits to 1117 be represented per printable character. (The extra 65th 1118 character, "=", is used to signify a special processing 1119 function.) 1121 NOTE: This subset has the important property that it is 1122 represented identically in all versions of ISO 646, including 1123 US-ASCII, and all characters in the subset are also 1124 represented identically in all versions of EBCDIC. Other 1125 popular encodings, such as the encoding used by the uuencode 1126 utility and the base85 encoding specified as part of Level 2 1127 PostScript, do not share these properties, and thus do not 1128 fulfill the portability requirements a binary transport 1129 encoding for mail must meet. 1131 The encoding process represents 24-bit groups of input bits as 1132 output strings of 4 encoded characters. Proceeding from left 1133 to right, a 24-bit input group is formed by concatenating 3 1134 8-bit input groups. These 24 bits are then treated as 4 1135 concatenated 6-bit groups, each of which is translated into a 1136 single digit in the base64 alphabet. 
When encoding a bit 1137 stream via the base64 encoding, the bit stream must be 1138 presumed to be ordered with the most-significant-bit first. 1139 That is, the first bit in the stream will be the high-order 1140 bit in the first 8-bit byte, and the eighth bit will be the 1141 low-order bit in the first 8-bit byte, and so on. 1143 Each 6-bit group is used as an index into an array of 64 1144 printable characters. The character referenced by the index 1145 is placed in the output string. These characters, identified 1146 in Table 1, below, are selected so as to be universally 1147 representable, and the set excludes characters with particular 1148 significance to SMTP (e.g., ".", CR, LF) and to the multipart 1149 boundary delimiters defined in MIME-IMT (e.g., "-"). 1151 Table 1: The Base64 Alphabet 1153 Value Encoding Value Encoding Value Encoding Value Encoding 1154 0 A 17 R 34 i 51 z 1155 1 B 18 S 35 j 52 0 1156 2 C 19 T 36 k 53 1 1157 3 D 20 U 37 l 54 2 1158 4 E 21 V 38 m 55 3 1159 5 F 22 W 39 n 56 4 1160 6 G 23 X 40 o 57 5 1161 7 H 24 Y 41 p 58 6 1162 8 I 25 Z 42 q 59 7 1163 9 J 26 a 43 r 60 8 1164 10 K 27 b 44 s 61 9 1165 11 L 28 c 45 t 62 + 1166 12 M 29 d 46 u 63 / 1167 13 N 30 e 47 v 1168 14 O 31 f 48 w (pad) = 1169 15 P 32 g 49 x 1170 16 Q 33 h 50 y 1172 The encoded output stream must be represented in lines of no 1173 more than 76 characters each. All line breaks or other 1174 characters not found in Table 1 must be ignored by decoding 1175 software. In base64 data, characters other than those in 1176 Table 1, line breaks, and other white space probably indicate 1177 a transmission error, about which a warning message or even a 1178 message rejection might be appropriate under some 1179 circumstances. 1181 Special processing is performed if fewer than 24 bits are 1182 available at the end of the data being encoded. A full 1183 encoding quantum is always completed at the end of a body. 
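The grouping and table lookup described above can be sketched in Python as follows. This is an illustrative sketch rather than a normative implementation: the names ALPHABET and encode_base64 are hypothetical, and the treatment of a short final group anticipates the padding rules stated in the remainder of this section. The standard library's base64 module is used only as a cross-check.

```python
import base64

# Table 1, values 0 through 63, in order.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes, line_length: int = 76) -> str:
    """Encode octets as base64 text in CRLF-separated lines."""
    chars = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Form a 24-bit group, adding zero bits on the right if it is short.
        bits = int.from_bytes(group.ljust(3, b"\0"), "big")
        # Split into four 6-bit indexes, most significant first.
        quantum = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1, 2 or 3 input octets yield 2, 3 or 4 characters; "=" completes
        # the four-character quantum.
        keep = len(group) + 1
        chars.extend(quantum[:keep] + ["="] * (4 - keep))
    text = "".join(chars)
    # Output lines must be no more than 76 characters each.
    return "\r\n".join(
        text[i:i + line_length] for i in range(0, len(text), line_length)
    )


print(encode_base64(b"Man"))
print(encode_base64(b"Ma") == base64.b64encode(b"Ma").decode("ascii"))
```

Working on whole 24-bit groups keeps the code close to the prose description; a production encoder would more likely call an existing library routine.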
1184 When fewer than 24 input bits are available in an input group, 1185 zero bits are added (on the right) to form an integral number 1186 of 6-bit groups. Padding at the end of the data is performed 1187 using the "=" character. Since all base64 input is an 1188 integral number of octets, only the following cases can arise: 1189 (1) the final quantum of encoding input is an integral 1190 multiple of 24 bits; here, the final unit of encoded output 1191 will be an integral multiple of 4 characters with no "=" 1192 padding, (2) the final quantum of encoding input is exactly 8 1193 bits; here, the final unit of encoded output will be two 1194 characters followed by two "=" padding characters, or (3) the 1195 final quantum of encoding input is exactly 16 bits; here, the 1196 final unit of encoded output will be three characters followed 1197 by one "=" padding character. 1199 Because it is used only for padding at the end of the data, 1200 the occurrence of any "=" characters may be taken as evidence 1201 that the end of the data has been reached (without truncation 1202 in transit). No such assurance is possible, however, when the 1203 number of octets transmitted was a multiple of three and no 1204 "=" characters are present. 1206 Any characters outside of the base64 alphabet are to be 1207 ignored in base64-encoded data. 1209 Care must be taken to use the proper octets for line breaks if 1210 base64 encoding is applied directly to text material that has 1211 not been converted to canonical form. In particular, text 1212 line breaks must be converted into CRLF sequences prior to 1213 base64 encoding. The important thing to note is that this may 1214 be done directly by the encoder rather than in a prior 1215 canonicalization step in some implementations. 1217 NOTE: There is no need to worry about quoting potential 1218 boundary delimiters within base64-encoded parts of multipart 1219 entities because no hyphen characters are used in the base64 1220 encoding. 1222 9. 
Content-ID Header Field 1224 In constructing a high-level user agent, it may be desirable 1225 to allow one body to make reference to another. Accordingly, 1226 bodies may be labelled using the "Content-ID" header field, 1227 which is syntactically identical to the "Message-ID" header 1228 field: 1230 id := "Content-ID" ":" msg-id 1232 Like the Message-ID values, Content-ID values must be 1233 generated to be world-unique. 1235 The Content-ID value may be used for uniquely identifying MIME 1236 entities in several contexts, particularly for caching data 1237 referenced by the message/external-body mechanism. Although 1238 the Content-ID header is generally optional, its use is 1239 MANDATORY in implementations which generate data of the 1240 optional MIME media type "message/external-body". That is, 1241 each message/external-body entity must have a Content-ID field 1242 to permit caching of such data. 1244 It is also worth noting that the Content-ID value has special 1245 semantics in the case of the multipart/alternative media type. 1246 This is explained in the section of MIME-IMT dealing with 1247 multipart/alternative. 1249 10. Content-Description Header Field 1251 The ability to associate some descriptive information with a 1252 given body is often desirable. For example, it may be useful 1253 to mark an "image" body as "a picture of the Space Shuttle 1254 Endeavor." Such text may be placed in the Content-Description 1255 header field. This header field is always optional. 1257 description := "Content-Description" ":" *text 1259 The description is presumed to be given in the US-ASCII 1260 character set, although the mechanism specified in RFC MIME- 1261 HEADERS [RFC-MIME-HEADERS] may be used for non-US-ASCII 1262 Content-Description values. 1264 11. Additional MIME Header Fields 1266 Future documents may elect to define additional MIME header 1267 fields for various purposes. 
Any new header field that
further describes the content of a message should begin with
the string "Content-" to allow such fields which appear in a
message header to be distinguished from ordinary RFC 822
message header fields.

     MIME-extension-field :=

12.  Summary

Using the MIME-Version, Content-Type, and
Content-Transfer-Encoding header fields, it is possible to
include, in a standardized way, arbitrary types of data objects
with RFC 822 conformant mail messages.  No restrictions imposed
by either RFC 821 or RFC 822 are violated, and care has been
taken to avoid problems caused by additional restrictions
imposed by the characteristics of some Internet mail transport
mechanisms (see RFC MIME-CONF).

The next document in this set, RFC MIME-IMT, specifies the
media types that can be labelled and transported using these
headers.

13.  Security Considerations

Security issues are discussed in the second document in this
set, RFC MIME-IMT.

14.  Authors' Addresses

For more information, the authors of this document are best
contacted via Internet mail:

     Nathaniel S. Borenstein
     First Virtual Holdings
     25 Washington Avenue
     Morristown, NJ 07960
     USA

     Email: nsb@nsb.fv.com
     Phone: +1 201 540 8967
     Fax:   +1 201 993 3032

     Ned Freed
     Innosoft International, Inc.
     1050 East Garvey Avenue South
     West Covina, CA 91790
     USA

     Email: ned@innosoft.com
     Phone: +1 818 919 3600
     Fax:   +1 818 919 3614

MIME is a result of the work of the Internet Engineering Task
Force Working Group on Email Extensions.  The chairman of that
group, Greg Vaudreuil, may be reached at:

     Gregory M. Vaudreuil
     Tigon Corporation
     17060 Dallas Parkway
     Dallas Texas, 75248

     Email: greg.vaudreuil@ons.octel.com
     Phone: +1 214 733 2722

Appendix A -- Collected Grammar

This appendix contains the complete BNF grammar for all the
syntax specified by this document.

By itself, however, this grammar is incomplete.  It refers to
several entities that are defined by RFC 822.  Rather than
reproduce those definitions here, and risk unintentional
differences between the two, this document simply refers the
reader to RFC 822 for the remaining definitions.  Wherever a
term is undefined, it refers to the RFC 822 definition.

     attribute := token

     composite-type := "message" / "multipart" / extension-token

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     extension-token := iana-token / ietf-token / x-token

     iana-token :=

     ietf-token :=

     id := "Content-ID" ":" msg-id

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

     octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
              ; Octet must be used for characters > 127, =,
              ; SPACEs or TABs at the ends of lines, and is
              ; recommended for any character not listed in
              ; RFC MIME-CONF as "mail-safe".
     parameter := attribute "=" value

     ptext := octet / safe-char

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     quoted-printable := qp-line *(CRLF qp-line)

     safe-char :=
                  ; Characters not listed as "mail-safe" in
                  ; RFC MIME-CONF are also not recommended.

     subtype := extension-token

     token := 1*

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     tspecials := "(" / ")" / "<" / ">" / "@" /
                  "," / ";" / ":" / "\" / <">
                  "/" / "[" / "]" / "?" / "="
                  ; Must be in quoted-string,
                  ; to use within parameter values

     type := discrete-type / composite-type

     value := token / quoted-string

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

     x-token :=
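
As a non-normative illustration of the "content", "parameter",
"value", and "tspecials" productions above, the following Python
sketch splits a Content-Type header field into its type, subtype,
and parameters.  The name parse_content_type is hypothetical and
not part of this document.  The sketch assumes that a token is one
or more printable US-ASCII characters other than SPACE and the
tspecials, tolerates linear white space around the separators,
uses a simplified quoted-string, and does not handle RFC 822
comments.

```python
import re

# tspecials, as listed in the collected grammar.
TSPECIALS = '()<>@,;:\\"/[]?='

# Assumption: token characters are printable US-ASCII (33..126),
# which excludes SPACE and control characters, minus the tspecials.
TOKEN_CHARS = "".join(
    chr(c) for c in range(33, 127) if chr(c) not in TSPECIALS
)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# Simplified quoted-string: a backslash quotes the next character.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# content := "Content-Type" ":" type "/" subtype *(";" parameter)
CONTENT_RE = re.compile(
    r"\s*Content-Type\s*:\s*(?P<type>" + TOKEN + r")/(?P<subtype>" + TOKEN + r")",
    re.IGNORECASE,
)

# parameter := attribute "=" value ; value := token / quoted-string
PARAM_RE = re.compile(
    r"\s*;\s*(?P<attr>" + TOKEN + r")\s*=\s*(?P<value>" + TOKEN + "|" + QUOTED + ")"
)


def parse_content_type(header):
    """Return (type, subtype, params) for one unfolded header line."""
    m = CONTENT_RE.match(header)
    if m is None:
        raise ValueError("not a Content-Type header field")
    params = {}
    pos = m.end()
    while pos < len(header):
        p = PARAM_RE.match(header, pos)
        if p is None:
            if header[pos:].strip() == "":
                break  # only trailing white space remains
            raise ValueError("malformed parameter at offset %d" % pos)
        value = p.group("value")
        if value.startswith('"'):
            # Drop the surrounding quotes and undo backslash quoting.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        # Attribute names are lowercased here so lookups are uniform.
        params[p.group("attr").lower()] = value
        pos = p.end()
    # Type and subtype matching is case-insensitive, so normalize.
    return m.group("type").lower(), m.group("subtype").lower(), params
```

A value that contains any of the tspecials is rejected by this
sketch unless it is enclosed in a quoted-string, which mirrors the
comment attached to the "tspecials" production.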