Network Working Group                            Nathaniel Borenstein
Internet Draft                                              Ned Freed
<draft-ietf-822ext-mime-imb-05.txt>

               Multipurpose Internet Mail Extensions
                         (MIME) Part One:

                 Format of Internet Message Bodies

                           January 1996

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other than as
a "working draft" or "work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).

1. Abstract

STD 11, RFC 822, defines a message representation protocol
specifying considerable detail about US-ASCII message headers, and
leaves the message content, or message body, as flat US-ASCII text.
This set of documents, collectively called the Multipurpose Internet
Mail Extensions, or MIME, redefines the format of messages to allow
for

   (1) textual message bodies in character sets other than
       US-ASCII,

   (2) non-textual message bodies,

   (3) multi-part message bodies, and

   (4) textual header information in character sets other than
       US-ASCII.

These documents are based on earlier work documented in RFC 934,
STD 11, and RFC 1049, but extend and revise them. Because RFC 822
said so little about message bodies, these documents are largely
orthogonal to (rather than a revision of) RFC 822.

In particular, these documents are designed to provide facilities to
include multiple parts in a single message, to represent body and
header text in character sets other than US-ASCII, to represent
formatted multi-font text messages, to represent non-textual
material such as images and audio clips, and generally to facilitate
later extensions defining new types of Internet mail for use by
cooperating mail agents.

This initial document specifies the various headers used to describe
the structure of MIME messages. The second document, RFC MIME-IMT,
defines the general structure of the MIME media typing system and
defines an initial set of media types. The third document, RFC
MIME-HEADERS, describes extensions to RFC 822 to allow non-US-ASCII
text data in Internet mail header fields. The fourth document, RFC
MIME-REG, specifies various IANA registration procedures for
MIME-related facilities.
The fifth and final document, RFC MIME-CONF, describes MIME
conformance criteria as well as providing some illustrative examples
of MIME message formats, acknowledgements, and the bibliography.

These documents are revisions of RFCs 1521, 1522, and 1590, which
themselves were revisions of RFCs 1341 and 1342. An appendix in RFC
MIME-CONF describes differences and changes from previous versions.

2. Table of Contents

   1    Abstract
   2    Table of Contents
   3    Introduction
   4    Definitions, Conventions, and Generic BNF Grammar
   4.1  CRLF
   4.2  Character Set
   4.3  Message
   4.4  Entity
   4.5  Body Part
   4.6  Body
   4.7  7bit Data
   4.8  8bit Data
   4.9  Binary Data
   4.10 Lines
   5    MIME Header Fields
   6    MIME-Version Header Field
   7    Content-Type Header Field
   7.1  Syntax of the Content-Type Header Field
   7.2  Content-Type Defaults
   8    Content-Transfer-Encoding Header Field
   8.1  Content-Transfer-Encoding Syntax
   8.2  Content-Transfer-Encodings Semantics
   8.3  New Content-Transfer-Encodings
   8.4  Interpretation and Use
   8.5  Translating Encodings
   8.6  Canonical Encoding Model
   8.7  Quoted-Printable Content-Transfer-Encoding
   8.8  Base64 Content-Transfer-Encoding
   9    Content-ID Header Field
   10   Content-Description Header Field
   11   Additional MIME Header Fields
   12   Summary
   13   Security Considerations
   14   Authors' Addresses
   A    Collected Grammar

3. Introduction

Since its publication in 1982, RFC 822 has defined the standard
format of textual mail messages on the Internet. Its success has
been such that the RFC 822 format has been adopted, wholly or
partially, well beyond the confines of the Internet and the Internet
SMTP transport defined by RFC 821. As the format has seen wider use,
a number of limitations have proven increasingly restrictive for the
user community.

RFC 822 was intended to specify a format for text messages. As such,
non-text messages, such as multimedia messages that might include
audio or images, are simply not mentioned. Even in the case of text,
however, RFC 822 is inadequate for the needs of mail users whose
languages require the use of character sets richer than US-ASCII.
Since RFC 822 does not specify mechanisms for mail containing audio,
video, Asian language text, or even text in most European languages,
additional specifications are needed.

One of the notable limitations of RFC 821/822 based mail systems is
the fact that they limit the contents of electronic mail messages to
relatively short lines (e.g. 1000 characters or less [RFC821]) of
7bit US-ASCII.
This forces users to convert any non-textual data that they may wish
to send into seven-bit bytes representable as printable US-ASCII
characters before invoking a local mail UA (User Agent, a program
with which human users send and receive mail). Examples of such
encodings currently used in the Internet include pure hexadecimal,
uuencode, the 3-in-4 base 64 scheme specified in RFC 1421, the
Andrew Toolkit Representation [ATK], and many others.

The limitations of RFC 822 mail become even more apparent as
gateways are designed to allow for the exchange of mail messages
between RFC 822 hosts and X.400 hosts. X.400 [X400] specifies
mechanisms for the inclusion of non-textual material within
electronic mail messages. The current standards for the mapping of
X.400 messages to RFC 822 messages specify either that X.400
non-textual material must be converted to (not encoded in) IA5Text
format, or that they must be discarded, notifying the RFC 822 user
that discarding has occurred. This is clearly undesirable, as
information that a user may wish to receive is lost. Even though a
user agent may not have the capability of dealing with the
non-textual material, the user might have some mechanism external to
the UA that can extract useful information from the material.
Moreover, it does not allow for the fact that the message may
eventually be gatewayed back into an X.400 message handling system
(i.e., the X.400 message is "tunneled" through Internet mail), where
the non-textual information would definitely become useful again.

This document describes several mechanisms that combine to solve
most of these problems without introducing any serious
incompatibilities with the existing world of RFC 822 mail.
In particular, it describes:

   (1) A MIME-Version header field, which uses a version number to
       declare a message to be conformant with this specification
       and allows mail processing agents to distinguish between such
       messages and those generated by older or non-conformant
       software, which are presumed to lack such a field.

   (2) A Content-Type header field, generalized from RFC 1049, which
       can be used to specify the media type and subtype of data in
       the body of a message and to fully specify the native
       representation (canonical form) of such data.

   (3) A Content-Transfer-Encoding header field, which can be used
       to specify both the encoding transformation that was applied
       to the body and the domain of the result. Encoding
       transformations other than the identity transformation are
       usually applied to data in order to allow it to pass through
       mail transport mechanisms which may have data or character
       set limitations.

   (4) Two additional header fields that can be used to further
       describe the data in a body, the Content-ID and
       Content-Description header fields.

All of the header fields defined in this document are subject to the
general syntactic rules for header fields specified in RFC 822. In
particular, all of these header fields except for
Content-Disposition can include RFC 822 comments, which have no
semantic content and should be ignored during MIME processing.

Finally, to specify and promote interoperability, RFC MIME-CONF
provides a basic applicability statement for a subset of the above
mechanisms that defines a minimal level of "conformance" with this
document.

HISTORICAL NOTE: Several of the mechanisms described in this set of
documents may seem somewhat strange or even baroque at first
reading.
It is important to note that compatibility with existing standards
AND robustness across existing practice were two of the highest
priorities of the working group that developed this set of
documents. In particular, compatibility was always favored over
elegance.

Please refer to the current edition of the "IAB Official Protocol
Standards" for the standardization state and status of this
protocol. RFC 822 and RFC 1123 also provide essential background for
MIME since no conforming implementation of MIME can violate them. In
addition, several other informational RFC documents will be of
interest to the MIME implementor, in particular RFC 1344, RFC 1345,
and RFC 1524.

4. Definitions, Conventions, and Generic BNF Grammar

Although the mechanisms specified in this set of documents are all
described in prose, most are also described formally in the
augmented BNF notation of RFC 822. Implementors will need to be
familiar with this notation in order to understand this
specification, and are referred to RFC 822 for a complete
explanation of the augmented BNF notation.

Some of the augmented BNF in this set of documents makes named
references to syntax rules defined in RFC 822. A complete formal
grammar, then, is obtained by combining the collected grammar
appendices in each document in this set with the BNF of RFC 822 plus
the modifications to RFC 822 defined in RFC 1123 (which specifically
changes the syntax for `return', `date' and `mailbox').

All numeric and octet values are given in decimal notation in this
set of documents. All media type values, subtype values, and
parameter names as defined are case-insensitive. However, parameter
values are case-sensitive unless otherwise specified for the
specific parameter.
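
The case rules in the preceding paragraph can be illustrated with a
minimal sketch. It assumes Python's standard "email" package as the
parser, which is not part of this specification, and the parameter
name "x-example" is hypothetical, chosen only for illustration.

```python
from email import message_from_string

# A header written with unusual capitalisation. "X-Example" is a
# hypothetical parameter used only for this illustration.
raw = (
    'Content-Type: TEXT/Plain; CHARSET="us-ascii"; X-Example=MixedCase\n'
    "\n"
    "Hello\n"
)
msg = message_from_string(raw)

# Media type and subtype are case-insensitive, so the parser
# normalises them to lower case.
media_type = msg.get_content_type()

# Parameter names are matched without regard to case.
charset_value = msg.get_param("charset")
example_value = msg.get_param("X-EXAMPLE")

print(media_type)     # normalised type/subtype
print(charset_value)  # value with the delimiting quotes removed
print(example_value)  # value returned with its case preserved
```

The parameter value comes back exactly as written, which matches the
rule that values are case-sensitive unless a specific parameter says
otherwise.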

FORMATTING NOTE: Notes, such as this one, provide additional
nonessential information which may be skipped by the reader without
missing anything essential. The primary purpose of these
non-essential notes is to convey information about the rationale of
this set of documents, or to place these documents in the proper
historical or evolutionary context. Such information may in
particular be skipped by those who are focused entirely on building
a conformant implementation, but may be of use to those who wish to
understand why certain design choices were made.

4.1. CRLF

The term CRLF, in this set of documents, refers to the sequence of
octets corresponding to the two US-ASCII characters CR (decimal
value 13) and LF (decimal value 10) which, taken together, in this
order, denote a line break in RFC 822 mail.

4.2. Character Set

The term "character set" is used in MIME to refer to a method of
converting a sequence of octets into a sequence of characters. Note
that unconditional and unambiguous conversion in the other direction
is not required, in that not all characters may be representable by
a given character set and a character set may provide more than one
sequence of octets to represent a particular sequence of characters.

This definition is intended to allow various kinds of character
encodings, from simple single-table mappings such as US-ASCII to
complex table switching methods such as those that use ISO 2022's
techniques. However, the definition associated with a MIME character
set name must fully specify the mapping to be performed. In
particular, use of external profiling information to determine the
exact mapping is not permitted.
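
As a rough illustration of this definition, the sketch below treats
a character set as a one-way mapping from octets to characters. It
assumes Python's built-in codecs, and it uses "iso-2022-jp" as one
example of an ISO 2022 based scheme; neither choice comes from this
document.

```python
def decode_octets(octets: bytes, charset: str) -> str:
    """Apply the octets-to-characters mapping named by charset."""
    return octets.decode(charset)


def is_representable(text: str, charset: str) -> bool:
    """Report whether every character in text has an octet form."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Two different octet sequences for the same character: the second
# carries a redundant "switch to ASCII" escape sequence (ESC ( B).
plain = decode_octets(b"A", "iso-2022-jp")
redundant = decode_octets(b"\x1b(BA", "iso-2022-jp")
print(plain == redundant)

# Conversion in the other direction is not always possible:
# US-ASCII has no octet sequence for LATIN SMALL LETTER E WITH ACUTE.
print(is_representable("\u00e9", "us-ascii"))
```

Both observations follow from the definition: the mapping is only
required to be well defined from octets to characters.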

NOTE: The term "character set" was originally used in MIME with
specifications such as US-ASCII and other 7bit and 8bit schemes
which have a simple mapping from single octets to single characters.
Multi-octet coded character sets and switching techniques make the
situation more complex. For example, some communities use the term
"character encoding" for what MIME calls a "character set", while
using the phrase "coded character set" to denote an abstract mapping
from integers (not octets) to characters.

4.3. Message

The term "message", when not further qualified, means either a
(complete or "top-level") RFC 822 message being transferred on a
network, or a message encapsulated in a body of type
"message/rfc822" or "message/partial".

4.4. Entity

The term "entity" refers specifically to the MIME-defined header
fields and contents of either a message or one of the parts in the
body of a multipart entity. The specification of such entities is
the essence of MIME. Since the contents of an entity are often
called the "body", it makes sense to speak about the body of an
entity. Any sort of field may be present in the header of an entity,
but only those fields whose names begin with "content-" actually
have any MIME-related meaning. Note that this does NOT imply that
they have no meaning at all -- an entity that is also a message has
non-MIME header fields whose meanings are defined by RFC 822.

4.5. Body Part

The term "body part" refers to an entity inside of a multipart
entity.

4.6. Body

The term "body", when not further qualified, means the body of an
entity, that is, the body of either a message or of a body part.

NOTE: The previous four definitions are clearly circular. This is
unavoidable, since the overall structure of a MIME message is indeed
recursive.

4.7. 7bit Data

"7bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC821]. No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences.

4.8. 8bit Data

"8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC821], but octets with decimal values greater than 127
may be used. As with "7bit data", CR and LF octets only occur as
part of CRLF line separation sequences and no NULs are allowed.

4.9. Binary Data

"Binary data" refers to data where any sequence of octets whatsoever
is allowed.

4.10. Lines

"Lines" are defined as sequences of octets separated by CRLF
sequences. This is consistent with both RFC 821 and RFC 822. "Lines"
only refers to a unit of data in a message, which may or may not
correspond to something that is actually displayed by a user agent.

5. MIME Header Fields

MIME defines a number of new RFC 822 header fields that are used to
describe the content of a MIME entity. These header fields occur in
at least two contexts:

   (1) As part of a regular RFC 822 message header.

   (2) In a MIME body part header within a multipart construct.

The formal definition of these header fields is as follows:

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [ fields ]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

The syntax of the various specific MIME header fields will be
described in the following sections.

6. MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been only one
format standard for Internet messages, and there has been little
perceived need to declare the format standard in use. This document
is an independent document that complements RFC 822. Although the
extensions in this document have been defined in such a way as to be
compatible with RFC 822, there are still circumstances in which it
might be desirable for a mail-processing agent to know whether a
message was composed with the new standard in mind.

Therefore, this document defines a new header field, "MIME-Version",
which is to be used to declare the version of the Internet message
body format standard in use.

Messages composed in accordance with this document MUST include such
a header field, with the following verbatim text:

     MIME-Version: 1.0

The presence of this header field is an assertion that the message
has been composed in compliance with this document.

Since it is possible that a future document might extend the message
format standard again, a formal BNF is given for the content of the
MIME-Version field:

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

Thus, future format specifiers, which might replace or extend "1.0",
are constrained to be two integer fields, separated by a period. If
a message is received with a MIME-version value other than "1.0", it
cannot be assumed to conform with this specification.

Note that the MIME-Version header field is required at the top level
of a message. It is not required for each body part of a multipart
entity. It is required for the embedded headers of a body of type
"message/rfc822" or "message/partial" if and only if the embedded
message is itself claimed to be MIME-conformant.

It is not possible to fully specify how a mail reader that conforms
with MIME as defined in this document should treat a message that
might arrive in the future with some value of MIME-Version other
than "1.0".

It is also worth noting that version control for specific media
types is not accomplished using the MIME-Version mechanism. In
particular, some formats (such as application/postscript) have
version numbering conventions that are internal to the media format.
Where such conventions exist, MIME does nothing to supersede them.
Where no such conventions exist, a MIME media type might use a
"version" parameter in the content-type field if necessary.

NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822
comment strings that are present must be ignored. In particular, the
following four MIME-Version fields are equivalent:

     MIME-Version: 1.0

     MIME-Version: 1.0 (produced by MetaSend Vx.x)

     MIME-Version: (produced by MetaSend Vx.x) 1.0

     MIME-Version: 1.(produced by MetaSend Vx.x)0

In the absence of a MIME-Version field, a receiving mail user agent
(whether conforming to MIME requirements or not) may optionally
choose to interpret the body of the message according to local
conventions. Many such conventions are currently in use and it
should be noted that in practice non-MIME messages can contain just
about anything.
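
The comment-handling rule in the note to implementors can be
sketched as follows. This is an illustrative reading of the rule,
not a reference implementation: the function names are hypothetical,
and the sketch assumes that discarding all white space once comments
are removed is acceptable for a field this simple.

```python
import re


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments, which may nest, from a field value."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # A quoted-pair: skip it inside a comment, keep it outside.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_mime_version(value: str):
    """Return (major, minor), or None if the value does not match
    1*DIGIT "." 1*DIGIT once comments and white space are removed."""
    bare = "".join(strip_comments(value).split())
    match = re.fullmatch(r"(\d+)\.(\d+)", bare)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


equivalent_forms = [
    "1.0",
    "1.0 (produced by MetaSend Vx.x)",
    "(produced by MetaSend Vx.x) 1.0",
    "1.(produced by MetaSend Vx.x)0",
]
parsed = [parse_mime_version(form) for form in equivalent_forms]
print(parsed)
```

A receiver built this way treats all four forms identically and can
fall back to local conventions when the function returns None or a
version other than (1, 0).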

It is impossible to be certain that a non-MIME mail message is
actually plain text in the US-ASCII character set since it might
well be a message that, using some set of nonstandard local
conventions that predate this document, includes text in another
character set or non-textual data presented in a manner that cannot
be automatically recognized (e.g., a uuencoded compressed UNIX tar
file).

7. Content-Type Header Field

The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user agent can
pick an appropriate agent or mechanism to present the data to the
user, or otherwise deal with the data in an appropriate manner. The
value in this field is called a media type.

HISTORICAL NOTE: The Content-Type header field was first defined in
RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one
that is largely compatible with the mechanism given here.

The Content-Type header field specifies the nature of the data in
the body of an entity by giving media type and subtype identifiers,
and by providing auxiliary information that may be required for
certain media types. After the media type and subtype names, the
remainder of the header field is simply a set of parameters,
specified in an attribute=value notation. The ordering of parameters
is not significant.

In general, the top-level media type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data. Thus, a media type of "image/xyz" is enough to tell a
user agent that the data is an image, even if the user agent has no
knowledge of the specific image format "xyz".
Such information can be used, for example, to decide whether or not
to show a user the raw data from an unrecognized subtype -- such an
action might be reasonable for unrecognized subtypes of text, but
not for unrecognized subtypes of image or audio. For this reason,
registered subtypes of text, image, audio, and video should not
contain embedded information that is really of a different type.
Such compound formats should be represented using the "multipart" or
"application" types.

Parameters are modifiers of the media subtype, and as such do not
fundamentally affect the nature of the content. The set of
meaningful parameters depends on the media type and subtype. Most
parameters are associated with a single specific subtype. However, a
given top-level media type may define parameters which are
applicable to any subtype of that type. Parameters may be required
by their defining content type or subtype or they may be optional.
MIME implementations must ignore any parameters whose names they do
not recognize.

For example, the "charset" parameter is applicable to any subtype of
"text", while the "boundary" parameter is required for any subtype
of the "multipart" media type.

There are NO globally-meaningful parameters that apply to all media
types. Truly global mechanisms are best addressed, in the MIME
model, by the definition of additional Content-* header fields.

An initial set of seven top-level media types is defined in
MIME-IMT. Five of these are discrete types whose content is
essentially opaque as far as MIME processing is concerned. The
remaining two are composite types whose contents require additional
handling by MIME processors.

This set of top-level media types is intended to be substantially
complete.
It is expected that additions to the larger set of supported types
can generally be accomplished by the creation of new subtypes of
these initial types. In the future, more top-level types may be
defined only by a standards-track extension to this standard. If
another top-level type is to be used for any reason, it must be
given a name starting with "X-" to indicate its non-standard status
and to avoid a potential conflict with a future official name.

7.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header
field value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC MIME-REG.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <"> /
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

Note that the definition of "tspecials" is the same as the RFC 822
definition of "specials" with the addition of the three characters
"/", "?", and "=", and the removal of ".".

Note also that a subtype specification is MANDATORY -- it may not be
omitted from a Content-Type header field. As such, there are no
default subtypes.

The type, subtype, and parameter names are not case sensitive. For
example, TEXT, Text, and TeXt are all equivalent top-level media
types.
Parameter values are normally case sensitive, but sometimes are
interpreted in a case-insensitive fashion, depending on the intended
use. (For example, multipart boundaries are case-sensitive, but the
"access-type" parameter for message/External-body is not
case-sensitive.)

Note that the value of a quoted string parameter does not include
the quotes. That is, the quotation marks in a quoted-string are not
a part of the value of the parameter, but are merely used to delimit
that parameter value. In addition, comments are allowed in
accordance with RFC 822 rules for structured header fields. Thus the
following two forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

are completely equivalent.

Beyond this syntax, the only syntactic constraint on the definition
of subtype names is the desire that their uses must not conflict.
That is, it would be undesirable to have two different communities
using "Content-Type: application/foobar" to mean two different
things. The process of defining new media subtypes, then, is not
intended to be a mechanism for imposing restrictions, but simply a
mechanism for publicizing their definition and usage. There are,
therefore, two acceptable mechanisms for defining new media
subtypes:

   (1) Private values (starting with "X-") may be defined
       bilaterally between two cooperating agents without outside
       registration or standardization. Such values cannot be
       registered or standardized.

   (2) New standard values should be registered with IANA as
       described in RFC MIME-REG.

The second document in this set, RFC MIME-IMT, defines the initial
set of media types for MIME.

7.2. Content-Type Defaults

Default RFC 822 messages without a MIME Content-Type header are
taken by this protocol to be plain text in the US-ASCII character
set, which can be explicitly specified as:

     Content-type: text/plain; charset=us-ascii

This default is assumed if no Content-Type header field is
specified. It is also recommended that this default be assumed when
a syntactically invalid Content-Type header field is encountered. In
the presence of a MIME-Version header field and the absence of any
Content-Type header field, a receiving User Agent can also assume
that plain US-ASCII text was the sender's intent. Plain US-ASCII
text may still be assumed in the absence of a MIME-Version or the
presence of a syntactically invalid Content-Type header field, but
the sender's intent might have been otherwise.

8. Content-Transfer-Encoding Header Field

Many media types which could be usefully transported via email are
represented, in their "natural" format, as 8bit character or binary
data. Such data cannot be transmitted over some transfer protocols.
For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII
data with lines no longer than 1000 characters including any
trailing CRLF line separator.

It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7bit short line format. Proper labelling
of unencoded material in less restrictive formats for direct use
over less restrictive transports is also desirable. This document
specifies that such encodings will be indicated by a new
"Content-Transfer-Encoding" header field. This field has not been
defined by any previous standard.

8.1. Content-Transfer-Encoding Syntax

The Content-Transfer-Encoding field's value is a single token
specifying the type of encoding, as enumerated below.
Formally:

     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

These values are not case sensitive -- Base64 and BASE64 and bAsE64
are all equivalent. An encoding type of 7BIT requires that the body
is already in a 7bit mail-ready representation. This is the default
value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if
the Content-Transfer-Encoding header field is not present.

8.2. Content-Transfer-Encodings Semantics

This single Content-Transfer-Encoding token actually provides two
pieces of information. It specifies what sort of encoding
transformation the body was subjected to, and it specifies what the
domain of the result is.

Three transformations are currently defined: identity, the
"quoted-printable" encoding, and the "base64" encoding. The domains
are "binary", "8bit" and "7bit".

The Content-Transfer-Encoding values "7bit", "8bit", and "binary"
all mean that the identity (i.e. NO) encoding transformation has
been performed. As such, they serve simply as indicators of the
domain of the body data, and provide useful information about the
sort of encoding that might be needed for transmission in a given
transport system. The terms "7bit data", "8bit data", and "binary
data" are all defined in Section 4.

The quoted-printable and base64 encodings transform their input from
an arbitrary domain into material in the "7bit" range, thus making
it safe to carry over restricted transports. The specific
definitions of the transformations are given below.

The proper Content-Transfer-Encoding label must always be used.
Labelling unencoded data containing 8bit characters as "7bit" is not
allowed, nor is labelling unencoded non-line-oriented data as
anything other than "binary" allowed.
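
As an illustration of the labelling requirement above, the following
Python sketch picks the most restrictive identity label that an
unencoded body can honestly carry. It is not part of this
specification. The function name body_domain and the 998-octet line
limit are assumptions made for the example: the limit is taken from
the 1000-character SMTP restriction quoted in Section 8 (998 octets
plus CRLF), and the exact definitions of the three domains are those
of Section 4, which this sketch only approximates.

```python
def body_domain(body: bytes, max_line: int = 998) -> str:
    """Return "7bit", "8bit", or "binary" for an unencoded body.

    Assumed definitions (approximating Section 4): line-oriented data
    has no NUL octets, uses CR and LF only as CRLF pairs, and keeps
    every line within max_line octets, not counting the CRLF.
    """
    if b"\x00" in body:
        return "binary"
    for line in body.split(b"\r\n"):
        # A bare CR or LF, or an over-long line, means the data is
        # not line-oriented and may only be labelled "binary".
        if b"\r" in line or b"\n" in line or len(line) > max_line:
            return "binary"
    # Line-oriented data: the label depends on whether any octet
    # has its high-order bit set.
    if any(octet > 127 for octet in body):
        return "8bit"
    return "7bit"
```

A composer could call body_domain() on a body before deciding whether
a quoted-printable or base64 transformation is needed for the
transport in use.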

Unlike media subtypes, a proliferation of Content-Transfer-Encoding
values is both undesirable and unnecessary. However, establishing
only a single transformation into the "7bit" domain does not seem
possible. There is a tradeoff between the desire for a compact and
efficient encoding of largely-binary data and the desire for a
readable encoding of data that is mostly, but not entirely, 7bit.
For this reason, at least two encoding mechanisms are necessary: a
"readable" encoding (quoted-printable) and a "dense" encoding
(base64).

Mail transport for unencoded 8bit data is defined in RFC 1652. As of
the initial publication of this document, there are no standardized
Internet mail transports for which it is legitimate to include
unencoded binary data in mail bodies. Thus there are no
circumstances in which the "binary" Content-Transfer-Encoding is
actually valid in Internet mail. However, in the event that binary
mail transport becomes a reality in Internet mail, or when this
document is used in conjunction with any other binary-capable
transport mechanism, binary bodies should be labelled as such using
this mechanism.

NOTE: The five values defined for the Content-Transfer-Encoding
field imply nothing about the media type other than the algorithm by
which it was encoded or the transport system requirements if
unencoded.

8.3. New Content-Transfer-Encodings

Implementors may, if necessary, define private
Content-Transfer-Encoding values, but must use an x-token, which is
a name prefixed by "X-", to indicate its non-standard status, e.g.,
"Content-Transfer-Encoding: x-my-new-encoding". Additional
standardized Content-Transfer-Encoding values must be specified by a
standards-track RFC. Additional requirements such specifications
must meet are given in RFC REG.
As such, all content-transfer-encoding namespace except that
beginning with "X-" is explicitly reserved to the IETF for future
use.

Unlike media types and subtypes, the creation of new
Content-Transfer-Encoding values is STRONGLY discouraged, as it
seems likely to hinder interoperability with little potential
benefit.

8.4. Interpretation and Use

If a Content-Transfer-Encoding header field appears as part of a
message header, it applies to the entire body of that message. If a
Content-Transfer-Encoding header field appears as part of an
entity's headers, it applies only to the body of that entity. If an
entity is of type "multipart" the Content-Transfer-Encoding is not
permitted to have any value other than "7bit", "8bit" or "binary".
Even more severe restrictions apply to some subtypes of the
"message" type.

It should be noted that most media types are defined in terms of
octets rather than bits, so that the mechanisms described here are
mechanisms for encoding arbitrary octet streams, not bit streams. If
a bit stream is to be encoded via one of these mechanisms, it must
first be converted to an 8bit byte stream using the network standard
bit order ("big-endian"), in which the earlier bits in a stream
become the higher-order bits in an 8bit byte. A bit stream not
ending at an 8bit boundary must be padded with zeroes. RFC MIME-IMT
provides a mechanism for noting the addition of such padding in the
case of the application/octet-stream media type, which has a
"padding" parameter.

The encoding mechanisms defined here explicitly encode all data in
US-ASCII.
Thus, for example, suppose an entity has header fields such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This must be interpreted to mean that the body is a base64 US-ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set again after decoding.

Certain Content-Transfer-Encoding values may only be used on certain
media types. In particular, it is EXPRESSLY FORBIDDEN to use any
encodings other than "7bit", "8bit", or "binary" with any composite
media type, i.e. one that recursively includes other Content-Type
fields. Currently the only composite media types are "multipart" and
"message". All encodings that are desired for bodies of type
multipart or message must be done at the innermost level, by
encoding the actual body that needs to be encoded.

It should also be noted that, by definition, if a composite entity
has a transfer-encoding value such as "7bit", but one of the
enclosed entities has a less restrictive value such as "8bit", then
either the outer "7bit" labelling is in error, because 8bit data are
included, or the inner "8bit" labelling placed an unnecessarily high
demand on the transport system because the included data were
actually 7bit-safe.

NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
content-transfer-encodings on composite body data may seem overly
restrictive, it is necessary to prevent nested encodings, in which
data are passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly viewed.
Nested encodings add considerable complexity to user agents: Aside
from the obvious efficiency problems with such multiple encodings,
they can obscure the basic structure of a message.
In particular, they can imply that several decoding operations are
necessary simply to find out what types of bodies a message
contains.

Banning nested encodings may complicate the job of certain mail
gateways, but this seems less of a problem than the effect of nested
encodings on user agents.

Any entity with an unrecognized Content-Transfer-Encoding must be
treated as if it has a Content-Type of "application/octet-stream",
regardless of what the Content-Type header field actually says.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND
CONTENT-TRANSFER-ENCODING: It may seem that the
Content-Transfer-Encoding could be inferred from the characteristics
of the media that is to be encoded, or, at the very least, that
certain Content-Transfer-Encodings could be mandated for use with
specific media types. There are several reasons why this is not the
case. First, given the varying types of transports used for mail,
some encodings may be appropriate for some combinations of media
types and transports but not for others. (For example, in an 8bit
transport, no encoding would be required for text in certain
character sets, while such encodings are clearly required for 7bit
SMTP.)

Second, certain media types may require different types of transfer
encoding under different circumstances. For example, many PostScript
bodies might consist entirely of short lines of 7bit data and hence
require no encoding at all. Other PostScript bodies (especially
those using Level 2 PostScript's binary encoding mechanism) may only
be reasonably represented using a binary transport encoding.
Finally, since the Content-Type field is intended to be an
open-ended specification mechanism, strict specification of an
association between media types and encodings effectively couples
the specification of an application protocol with a specific
lower-level transport.
This is not desirable since the developers of a media type should
not have to be aware of all the transports in use and what their
limitations are.

8.5. Translating Encodings

The quoted-printable and base64 encodings are designed so that
conversion between them is possible. The only issue that arises in
such a conversion is the handling of hard line breaks. When
converting from quoted-printable to base64 a hard line break must be
converted into a CRLF sequence.

Similarly, a CRLF sequence in base64 data must be converted to a
quoted-printable hard line break, but ONLY when converting text
data.

8.6. Canonical Encoding Model

There was some confusion, in the previous versions of this RFC,
regarding the model for when email data was to be converted to
canonical form and encoded, and in particular how this process would
affect the treatment of CRLFs, given that the representation of
newlines varies greatly from system to system, and the relationship
between content-transfer-encodings and character sets. A canonical
model for encoding is presented in RFC MIME-CONF for this reason.

8.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters
in the US-ASCII character set. It encodes the data in such a way
that the resulting octets are unlikely to be modified by mail
transport. If the data being encoded are mostly US-ASCII text, the
encoded form of the data remains largely recognizable by humans. A
body which is entirely US-ASCII may also be encoded in
Quoted-Printable to ensure the integrity of the data should the
message pass through a character-translating, and/or line-wrapping
gateway.

In this encoding, octets are to be represented as determined by the
following rules:

(1) (General 8bit representation) Any octet, except a CR or LF that
    is part of a CRLF line break of the canonical (standard) form of
    the data being encoded, may be represented by an "=" followed by
    a two digit hexadecimal representation of the octet's value. The
    digits of the hexadecimal alphabet, for this purpose, are
    "0123456789ABCDEF". Uppercase letters must be used when sending
    hexadecimal data, though a robust implementation may choose to
    recognize lowercase letters on receipt. Thus, for example, the
    decimal value 12 (US-ASCII form feed) can be represented by
    "=0C", and the decimal value 61 (US-ASCII EQUAL SIGN) can be
    represented by "=3D". This rule must be followed except when the
    following rules allow an alternative encoding.

(2) (Literal representation) Octets with decimal values of 33
    through 60 inclusive, and 62 through 126, inclusive, MAY be
    represented as the US-ASCII characters which correspond to those
    octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN
    through TILDE, respectively).

(3) (White Space) Octets with values of 9 and 32 MAY be represented
    as US-ASCII TAB (HT) and SPACE characters, respectively, but
    MUST NOT be so represented at the end of an encoded line. Any
    TAB (HT) or SPACE characters on an encoded line MUST thus be
    followed on that line by a printable character. In particular,
    an "=" at the end of an encoded line, indicating a soft line
    break (see rule #5) may follow one or more TAB (HT) or SPACE
    characters. It follows that an octet with decimal value 9 or 32
    appearing at the end of an encoded line must be represented
    according to Rule #1.
    This rule is necessary because some MTAs (Message Transport
    Agents, programs which transport messages from one user to
    another, or perform a portion of such transfers) are known to
    pad lines of text with SPACEs, and others are known to remove
    "white space" characters from the end of a line. Therefore, when
    decoding a Quoted-Printable body, any trailing white space on a
    line must be deleted, as it will necessarily have been added by
    intermediate transport agents.

(4) (Line Breaks) A line break in a text body, represented as a CRLF
    sequence in the text canonical form, must be represented by a
    (RFC 822) line break, which is also a CRLF sequence, in the
    Quoted-Printable encoding. Since the canonical representations
    of media types other than text do not generally include the
    representation of line breaks as CRLF sequences, no hard line
    breaks (i.e. line breaks that are intended to be meaningful and
    to be displayed to the user) should occur in the
    quoted-printable encoding of such types. Sequences like "=0D",
    "=0A", "=0A=0D" and "=0D=0A" will routinely appear in non-text
    data represented in quoted-printable, of course.

    Note that many implementations may elect to encode the local
    representation of various content types directly rather than
    converting to canonical form first, encoding, and then
    converting back to local representation. In particular, this may
    apply to plain text material on systems that use newline
    conventions other than a CRLF terminator sequence. Such an
    implementation optimization is permissible, but only when the
    combined canonicalization-encoding step is equivalent to
    performing the three steps separately.

(5) (Soft Line Breaks) The Quoted-Printable encoding REQUIRES that
    encoded lines be no more than 76 characters long.
    If longer lines are to be encoded with the Quoted-Printable
    encoding, "soft" line breaks must be used. An equal sign as the
    last character on an encoded line indicates such a
    non-significant ("soft") line break in the encoded text.

Thus if the "raw" form of the line is a single unencoded line that
says:

     Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable encoding, as:

     Now's the time =
     for all folk to come=
      to the aid of their country.

This provides a mechanism with which long lines are encoded in such
a way as to be restored by the user agent. The 76 character limit
does not count the trailing CRLF, but counts all other characters,
including any equal signs.

Since the hyphen character ("-") may be represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body inside one or more multipart entities,
to ensure that the boundary delimiter does not appear anywhere in
the encoded body. (A good strategy is to choose a boundary that
includes a character sequence such as "=_" which can never appear in
a quoted-printable body. See the definition of multipart messages in
MIME-IMT.)

NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport. Bodies
encoded with the quoted-printable encoding will work reliably over
most mail gateways, but may not work perfectly over a few gateways,
notably those involving translation into EBCDIC. A higher level of
confidence is offered by the base64 Content-Transfer-Encoding. A way
to get reasonably reliable transport through EBCDIC gateways is to
also quote the US-ASCII characters

     !"#$@[\]^`{|}~

according to rule #1.
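
Rules #1 through #5 above can be sketched in Python as follows. This
is an illustrative encoder only, not a reference implementation: the
name qp_encode and its "text" flag are assumptions of the example,
the input is assumed to be in canonical form already (CRLF line
breaks for text), and the optional extra quoting for EBCDIC gateways
is not applied.

```python
def qp_encode(data: bytes, text: bool = True) -> bytes:
    """Quoted-Printable encode canonical-form data (sketch).

    With text=True, CRLF sequences are kept as hard line breaks
    (rule #4). With text=False, every CR and LF is hex-encoded.
    """
    lines = data.split(b"\r\n") if text else [data]
    out = []
    for line in lines:
        tokens = []
        for i, octet in enumerate(line):
            at_end = i == len(line) - 1
            if 33 <= octet <= 126 and octet != 61:
                tokens.append(chr(octet))        # rule #2: literal
            elif octet in (9, 32) and not at_end:
                tokens.append(chr(octet))        # rule #3: inner TAB/SPACE
            else:
                tokens.append("=%02X" % octet)   # rule #1: uppercase hex
        # Rule #5: keep each encoded line within 76 characters,
        # counting the "=" that marks a soft line break.
        current = ""
        for token in tokens:
            if len(current) + len(token) > 75:
                out.append(current + "=")
                current = ""
            current += token
        out.append(current)
    return "\r\n".join(out).encode("ascii")
```

Tokens are kept whole when wrapping, so a soft line break never
splits an "=XX" sequence. The break positions chosen here differ from
those in the example above; any placement that respects the 76
character limit is equally valid.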

Because quoted-printable data is generally assumed to be
line-oriented, it is to be expected that the representation of the
breaks between the lines of quoted printable data may be altered in
transport, in the same manner that plain text mail has always been
altered in Internet mail when passing between systems with differing
newline conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.

WARNING TO IMPLEMENTORS: If binary data are encoded in
quoted-printable, care must be taken to encode CR and LF characters
as "=0D" and "=0A", respectively. In particular, a CRLF sequence in
binary data should be encoded as "=0D=0A". Otherwise, if CRLF were
represented as a hard line break, it might be incorrectly decoded on
platforms with different line break conventions.

For formalists, the syntax of quoted-printable data is described by
the following grammar:

     quoted-printable := qp-line *(CRLF qp-line)

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     ptext := hex-octet / safe-char

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC MIME-CONF are also not recommended.

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC MIME-CONF as "mail-safe".

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

IMPORTANT: The addition of LWSP between the elements shown in this
BNF is NOT allowed since this BNF does not specify a structured
header field.

8.8. Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable. The encoding and decoding algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
unencoded data. This encoding is virtually identical to the one used
in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.

A 65-character subset of US-ASCII is used, enabling 6 bits to be
represented per printable character. (The extra 65th character, "=",
is used to signify a special processing function.)

NOTE: This subset has the important property that it is represented
identically in all versions of ISO 646, including US-ASCII, and all
characters in the subset are also represented identically in all
versions of EBCDIC. Other popular encodings, such as the encoding
used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and
the base85 encoding specified as part of Level 2 PostScript, do not
share these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as
output strings of 4 encoded characters. Proceeding from left to
right, a 24-bit input group is formed by concatenating 3 8bit input
groups. These 24 bits are then treated as 4 concatenated 6-bit
groups, each of which is translated into a single digit in the
base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first 8bit byte, and the eighth bit will be the low-order bit in
the first 8bit byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable
characters. The character referenced by the index is placed in the
output string. These characters, identified in Table 1, below, are
selected so as to be universally representable, and the set excludes
characters with particular significance to SMTP (e.g., ".", CR, LF)
and to the multipart boundary delimiters defined in MIME-IMT (e.g.,
"-").

                  Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

The encoded output stream must be represented in lines of no more
than 76 characters each. All line breaks or other characters not
found in Table 1 must be ignored by decoding software. In base64
data, characters other than those in Table 1, line breaks, and other
white space probably indicate a transmission error, about which a
warning message or even a message rejection might be appropriate
under some circumstances.

Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body.
When fewer than 24 input bits are available in an input group, zero
bits are added (on the right) to form an integral number of 6-bit
groups. Padding at the end of the data is performed using the "="
character. Since all base64 input is an integral number of octets,
only the following cases can arise:

(1) the final quantum of encoding input is an integral multiple of
    24 bits; here, the final unit of encoded output will be an
    integral multiple of 4 characters with no "=" padding,

(2) the final quantum of encoding input is exactly 8 bits; here, the
    final unit of encoded output will be two characters followed by
    two "=" padding characters, or

(3) the final quantum of encoding input is exactly 16 bits; here,
    the final unit of encoded output will be three characters
    followed by one "=" padding character.

Because it is used only for padding at the end of the data, the
occurrence of any "=" characters may be taken as evidence that the
end of the data has been reached (without truncation in transit). No
such assurance is possible, however, when the number of octets
transmitted was a multiple of three and no "=" characters are
present.

Any characters outside of the base64 alphabet are to be ignored in
base64-encoded data.

Care must be taken to use the proper octets for line breaks if
base64 encoding is applied directly to text material that has not
been converted to canonical form. In particular, text line breaks
must be converted into CRLF sequences prior to base64 encoding. The
important thing to note is that this may be done directly by the
encoder rather than in a prior canonicalization step in some
implementations.

NOTE: There is no need to worry about quoting potential boundary
delimiters within base64-encoded bodies within multipart entities
because no hyphen characters are used in the base64 encoding.
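
The encoding procedure of this section, including the three padding
cases and the 76 character line limit, can be sketched in Python as
follows. The names ALPHABET and b64_encode are assumptions of the
example, and the code is a sketch rather than a reference
implementation. Python's standard base64 module is imported only so
that the result can be checked against an independent encoder.

```python
import base64

# Table 1: index 0..63 maps to the base64 alphabet.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64_encode(data: bytes, line_len: int = 76) -> str:
    """Base64 encode data into CRLF-separated lines (sketch)."""
    chars = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Form a 24-bit group, most significant bit first; a short
        # final group is zero-filled on the right.
        bits = int.from_bytes(group.ljust(3, b"\x00"), "big")
        quantum = [ALPHABET[(bits >> shift) & 0x3F]
                   for shift in (18, 12, 6, 0)]
        # 1 octet -> 2 characters + "==", 2 octets -> 3 + "=",
        # 3 octets -> 4 characters and no padding.
        keep = len(group) + 1
        chars.append("".join(quantum[:keep]) + "=" * (4 - keep))
    encoded = "".join(chars)
    return "\r\n".join(encoded[i:i + line_len]
                       for i in range(0, len(encoded), line_len))


# Cross-check against the standard library encoder.
sample = bytes(range(256))
assert (b64_encode(sample).replace("\r\n", "")
        == base64.b64encode(sample).decode("ascii"))
```

For text material, the input passed to such an encoder would already
have its line breaks converted to CRLF, as required above.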

9. Content-ID Header Field

In constructing a high-level user agent, it may be desirable to
allow one body to make reference to another. Accordingly, bodies may
be labelled using the "Content-ID" header field, which is
syntactically identical to the "Message-ID" header field:

     id := "Content-ID" ":" msg-id

Like the Message-ID values, Content-ID values must be generated to
be world-unique.

The Content-ID value may be used for uniquely identifying MIME
entities in several contexts, particularly for caching data
referenced by the message/external-body mechanism. Although the
Content-ID header is generally optional, its use is MANDATORY in
implementations which generate data of the optional MIME media type
"message/external-body". That is, each message/external-body entity
must have a Content-ID field to permit caching of such data.

It is also worth noting that the Content-ID value has special
semantics in the case of the multipart/alternative media type. This
is explained in the section of MIME-IMT dealing with
multipart/alternative.

10. Content-Description Header Field

The ability to associate some descriptive information with a given
body is often desirable. For example, it may be useful to mark an
"image" body as "a picture of the Space Shuttle Endeavour." Such
text may be placed in the Content-Description header field. This
header field is always optional.

     description := "Content-Description" ":" *text

The description is presumed to be given in the US-ASCII character
set, although the mechanism specified in RFC MIME-HEADERS may be
used for non-US-ASCII Content-Description values.

11. Additional MIME Header Fields

Future documents may elect to define additional MIME header fields
for various purposes.
Any new header field that further describes the content of a message
should begin with the string "Content-" to allow such fields which
appear in a message header to be distinguished from ordinary RFC 822
message header fields.

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">

12. Summary

Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
header fields, it is possible to include, in a standardized way,
arbitrary types of data with RFC 822 conformant mail messages. No
restrictions imposed by either RFC 821 or RFC 822 are violated, and
care has been taken to avoid problems caused by additional
restrictions imposed by the characteristics of some Internet mail
transport mechanisms (see RFC MIME-CONF).

The next document in this set, RFC MIME-IMT, specifies the initial
set of media types that can be labelled and transported using these
headers.

13. Security Considerations

Security issues are discussed in the second document in this set,
RFC MIME-IMT.

14. Authors' Addresses

For more information, the authors of this document are best
contacted via Internet mail:

     Nathaniel S. Borenstein
     First Virtual Holdings
     25 Washington Avenue
     Morristown, NJ 07960
     USA

     Email: nsb@nsb.fv.com
     Phone: +1 201 540 8967
     Fax:   +1 201 993 3032

     Ned Freed
     Innosoft International, Inc.
     1050 East Garvey Avenue South
     West Covina, CA 91790
     USA

     Email: ned@innosoft.com
     Phone: +1 818 919 3600
     Fax:   +1 818 919 3614

MIME is a result of the work of the Internet Engineering Task Force
Working Group on Email Extensions. The chairman of that group, Greg
Vaudreuil, may be reached at:

     Gregory M. Vaudreuil
     Tigon Corporation
     17060 Dallas Parkway
     Dallas Texas, 75248

     Email: greg.vaudreuil@ons.octel.com
     Phone: +1 214 733 2722

Appendix A -- Collected Grammar

This appendix contains the complete BNF grammar for all the syntax
specified by this document.

By itself, however, this grammar is incomplete. It refers by name to
several syntax rules that are defined by RFC 822. Rather than
reproduce those definitions here, and risk unintentional differences
between the two, this document simply refers the reader to RFC 822
for the remaining definitions. Wherever a term is undefined, it
refers to the RFC 822 definition.

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     composite-type := "message" / "multipart" / extension-token

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     extension-token := ietf-token / x-token

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC MIME-CONF as "mail-safe".

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC MIME-REG.>

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     id := "Content-ID" ":" msg-id

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [fields]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.

     parameter := attribute "=" value

     ptext := hex-octet / safe-char

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     quoted-printable := qp-line *(CRLF qp-line)

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC MIME-CONF are also not recommended.

     subtype := extension-token / iana-token

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     tspecials := "(" / ")" / "<" / ">" / "@" /
                  "," / ";" / ":" / "\" / <"> /
                  "/" / "[" / "]" / "?" / "="
                  ; Must be in quoted-string,
                  ; to use within parameter values

     type := discrete-type / composite-type

     value := token / quoted-string

     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>
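
As an illustration of how the "content", "parameter", "token" and
"value" rules collected above fit together, the following Python
sketch parses the value of a Content-Type header field. It is not
part of this specification. The names TOKEN and parse_content_type
are assumptions of the example. Full RFC 822 comment handling is
omitted: anything that does not match the grammar after the last
well-formed parameter, such as a trailing comment, is simply ignored.
Following the recommendation of Section 7.2, a field that cannot be
parsed at all yields the text/plain default.

```python
import re

# token: US-ASCII characters 33..126 other than the tspecials.
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"

_HEAD = re.compile(r"\s*(" + TOKEN + r")/(" + TOKEN + r")")
_PARAM = re.compile(
    r"\s*;\s*(" + TOKEN + r")\s*=\s*"
    r'(?:(' + TOKEN + r')|"((?:[^"\\\r]|\\.)*)")'
)

DEFAULT = ("text", "plain", {"charset": "us-ascii"})


def parse_content_type(value: str):
    """Return (type, subtype, params) for a Content-Type value."""
    head = _HEAD.match(value)
    if head is None:
        return DEFAULT
    # Type and subtype are matched case-insensitively.
    mtype, subtype = head.group(1).lower(), head.group(2).lower()
    params = {}
    pos = head.end()
    while True:
        match = _PARAM.match(value, pos)
        if match is None:
            break
        name = match.group(1).lower()  # attribute names: case-insensitive
        if match.group(2) is not None:
            params[name] = match.group(2)
        else:
            # The quotation marks are not part of the value; undo any
            # backslash quoting inside the quoted-string.
            params[name] = re.sub(r"\\(.)", r"\1", match.group(3))
        pos = match.end()
    return mtype, subtype, params
```

With this sketch, the two forms shown as equivalent in Section 7.1
(an unquoted charset followed by a comment, and a quoted charset)
produce the same result.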