idnits 2.17.00 (12 Aug 2021)

/tmp/idnits35440/draft-josefsson-base-encoding-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing revision: the document name given in the document,
     'draft-josefsson-base-encoding', does not give the document revision
     number

  ~~ Missing draftname component: the document name given in the document,
     'draft-josefsson-base-encoding', does not seem to contain all the
     document name components required ('draft' prefix, document source,
     document name, and revision) -- see
     https://www.ietf.org/id-info/guidelines#naming for more information.

  == Mismatching filename: the document gives the document name as
     'draft-josefsson-base-encoding', but the file name used is
     'draft-josefsson-base-encoding-03'

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 6 characters in excess of 72.

  ** The abstract seems to contain references ([3]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 13, 2001) is 7493 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1113 (ref. '1') (Obsoleted by RFC 1421)

  ** Obsolete normative reference: RFC 2440 (ref. '4') (Obsoleted by RFC 4880)

  ** Obsolete normative reference: RFC 2535 (ref. '5') (Obsoleted by RFC 4033,
     RFC 4034, RFC 4035)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-cat-sasl-gssapi-01

  -- Possible downref: Normative reference to a draft: ref. '7'

  -- Possible downref: Non-RFC (?) normative reference: ref. '8'

     Summary: 10 errors (**), 1 flaw (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                              S. Josefsson (editor)
Internet-Draft                                         November 13, 2001
Expires: May 14, 2002

                             Base Encodings
                      draft-josefsson-base-encoding

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 14, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   This draft contains descriptions of the commonly used base 64, base
   32, and base 16 encoding schemes. It also discusses the use of line
   feeds in encoded data, use of padding in encoded data, use of
   non-alphabet characters in encoded data, and use of different
   encoding alphabets.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

Table of Contents

   1.  Implementation variances
   1.1 Line feeds in encoded data
   1.2 Padding of encoded data
   1.3 Interpretation of non-alphabet characters in encoded data
   1.4 Choosing the alphabet
   2.  Base 64 Encoding
   3.  Base 32 Encoding
   4.  Base 16 Encoding
   5.  Examples
   5.1 Examples of Base 64
   6.  Security Considerations
   7.  Acknowledgement
       References
       Author's Address
       Full Copyright Statement

1. Implementation variances

   Base encodings have historically been implemented with some minor
   variances. This section describes these differences, and mandates a
   default behaviour, to reduce the possibility for ambiguity in other
   documents using base encodings. Optimizations, such as those used in
   PDF's Base 85 encoding, are not discussed.

1.1 Line feeds in encoded data

   RFC 2045 [2] is often used as a reference for base 64 encoding.
   However, RFC 2045 does not define "base 64" per se, but rather a
   "base 64 Content-Transfer-Encoding" for use within MIME. As such,
   RFC 2045 enforces a limit on line length of base 64 encoded data to
   76 characters.

   Implementations of specifications using this document as reference
   for base encodings MUST NOT add line feeds to the encoded data,
   unless explicitly stated and handled otherwise in said
   specifications.

1.2 Padding of encoded data

   In some circumstances, the use of padding ("=") in base encoded data
   is neither required nor used.

   Implementations of specifications using this document as reference
   for base encodings MUST properly pad the encoded data, unless
   explicitly stated and handled otherwise in said specifications.
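
   As a non-normative illustration of the two defaults above, the
   following Python sketch contrasts an encoder that adds no line feeds
   with a MIME-style encoder that wraps its output. It assumes Python's
   standard base64 module; the 100-octet sample input and the variable
   names are made up for this illustration and are not part of this
   document.

```python
import base64

# Hypothetical sample input: 100 octets (100 mod 3 == 1, so the encoded
# form ends with two pad characters).
data = bytes(range(100))

# b64encode() emits one unbroken string with "=" padding, which matches
# the defaults described above (no line feeds, proper padding).
plain = base64.b64encode(data)

# encodebytes() follows the MIME convention of RFC 2045 instead: it adds
# a newline after every 76 output characters and after the last line.
mime = base64.encodebytes(data)

print(len(plain), plain.endswith(b"=="), b"\n" in plain)
print(mime.count(b"\n"), mime.replace(b"\n", b"") == plain)
```

   Removing the newlines from the MIME-style output gives back the
   unbroken form, so the two differ only in line wrapping.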

1.3 Interpretation of non-alphabet characters in encoded data

   Base encodings use a specific, reduced alphabet to encode binary
   data. Non-alphabet characters may exist within base encoded data,
   caused by data corruption or by design.

   Implementations of specifications using this document as reference
   for base encodings MUST ignore characters outside the base encoding
   alphabet when interpreting base encoded data ("be liberal in what
   you accept"), unless explicitly stated and handled otherwise in
   said specifications.

   Note that this means that, e.g., CRLF padding after 76 characters
   constitutes "non-alphabet characters", and should simply be ignored.
   Also, the pad character, "=", should not be regarded as part of the
   base alphabet until the end of the string. If more than the allowed
   number of pad characters are found at the end of the string, e.g., a
   base 64 string terminated with "===", the excess pad characters
   should preferably be ignored in a robust implementation.

1.4 Choosing the alphabet

   Different applications have different requirements on the characters
   in the alphabet. Here are a few requirements that determine which
   alphabet should be used:

   o  Handled by humans. Characters "0" and "O" are easily
      interchanged, as are "1", "l" and "I".

   o  Encoded into structures that impose other requirements. This
      determines the use of upper- or lowercase alphabets (for
      case-insensitive alphabets such as base 32). For base 64, the
      non-alphanumeric characters (especially "/") may be problematic
      in filenames and URLs.

   o  Used as identifiers. Certain characters, notably "+" and "/" in
      the base 64 alphabet, are treated as word-breaks by legacy text
      search/index tools.

   There is no universally accepted alphabet that fulfills all the
   requirements. In this document, we document and name some currently
   used alphabet variances.

2. Base 64 Encoding

   The following description of base 64 is due to [1], [2], [4] and [5].
   The URL and filename safe base 64 alphabet is due to [8]. (An
   alternative alphabet has been suggested as a URL safe alphabet, which
   used "~" as the 63rd character. However, since this character has
   special meaning in some file system environments, the "URL and
   Filename safe" alphabet below is recommended instead.)

   The Base 64 encoding is designed to represent arbitrary sequences of
   octets in a form that requires case sensitivity but need not be
   humanly readable.

   A 65-character subset of US-ASCII is used, enabling 6 bits to be
   represented per printable character. (The extra 65th character, "=",
   is used to signify a special processing function.)

   The encoding process represents 24-bit groups of input bits as output
   strings of 4 encoded characters. Proceeding from left to right, a
   24-bit input group is formed by concatenating 3 8-bit input groups.
   These 24 bits are then treated as 4 concatenated 6-bit groups, each
   of which is translated into a single digit in the base 64 alphabet.

   Each 6-bit group is used as an index into an array of 64 printable
   characters. The character referenced by the index is placed in the
   output string.
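
   The grouping step described above can be sketched as follows. This
   is an illustrative Python sketch, not a reference implementation:
   the names ALPHABET and encode_group are hypothetical, the alphabet
   string is assumed to be the canonical base 64 alphabet, and only
   complete 24-bit groups are handled (padding of a short final group
   is left out).

```python
import string

# Assumed canonical base 64 alphabet: values 0-25 map to A-Z, 26-51 to
# a-z, 52-61 to 0-9, and 62 and 63 to "+" and "/".
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/")


def encode_group(octets: bytes) -> str:
    """Encode exactly three octets (24 bits) as four base 64 characters."""
    if len(octets) != 3:
        raise ValueError("a full 24-bit input group is three octets")
    # Concatenate the three octets, most significant octet first.
    group = int.from_bytes(octets, "big")
    # Split the 24 bits into four 6-bit indices, left to right.
    indices = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Each index selects one character from the 64-entry alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(bytes.fromhex("14fb9c")))
```

   The three octets 0x14, 0xfb and 0x9c give the indices 5, 15, 46 and
   28, which select the characters "F", "P", "u" and "c".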

             Table 1: The "Canonical" Base 64 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A            17 R            34 i            51 z
       1 B            18 S            35 j            52 0
       2 C            19 T            36 k            53 1
       3 D            20 U            37 l            54 2
       4 E            21 V            38 m            55 3
       5 F            22 W            39 n            56 4
       6 G            23 X            40 o            57 5
       7 H            24 Y            41 p            58 6
       8 I            25 Z            42 q            59 7
       9 J            26 a            43 r            60 8
      10 K            27 b            44 s            61 9
      11 L            28 c            45 t            62 +
      12 M            29 d            46 u            63 /
      13 N            30 e            47 v
      14 O            31 f            48 w         (pad) =
      15 P            32 g            49 x
      16 Q            33 h            50 y

        Table 2: The "URL and Filename safe" Base 64 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A            17 R            34 i            51 z
       1 B            18 S            35 j            52 0
       2 C            19 T            36 k            53 1
       3 D            20 U            37 l            54 2
       4 E            21 V            38 m            55 3
       5 F            22 W            39 n            56 4
       6 G            23 X            40 o            57 5
       7 H            24 Y            41 p            58 6
       8 I            25 Z            42 q            59 7
       9 J            26 a            43 r            60 8
      10 K            27 b            44 s            61 9
      11 L            28 c            45 t            62 - (minus)
      12 M            29 d            46 u            63 _ (understrike)
      13 N            30 e            47 v
      14 O            31 f            48 w         (pad) =
      15 P            32 g            49 x
      16 Q            33 h            50 y

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded. A full encoding quantum is
   always completed at the end of a quantity. When fewer than 24 input
   bits are available in an input group, zero bits are added (on the
   right) to form an integral number of 6-bit groups. Padding at the
   end of the data is performed using the "=" character.
   Since all base 64 input is an integral number of octets, only the
   following cases can arise:

   (1) the final quantum of encoding input is an integral multiple of 24
   bits; here, the final unit of encoded output will be an integral
   multiple of 4 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by two
   "=" padding characters, or

   (3) the final quantum of encoding input is exactly 16 bits; here, the
   final unit of encoded output will be three characters followed by one
   "=" padding character.

3. Base 32 Encoding

   The following description of base 32 is due to [7] (with corrections)
   and [6] (the "extended hex" alphabet).

   The Base 32 encoding is designed to represent arbitrary sequences of
   octets in a form that needs to be case insensitive but need not be
   humanly readable.

   A 33-character subset of US-ASCII is used, enabling 5 bits to be
   represented per printable character. (The extra 33rd character, "=",
   is used to signify a special processing function.)

   The encoding process represents 40-bit groups of input bits as output
   strings of 8 encoded characters. Proceeding from left to right, a
   40-bit input group is formed by concatenating 5 8-bit input groups.
   These 40 bits are then treated as 8 concatenated 5-bit groups, each
   of which is translated into a single digit in the base 32 alphabet.
   When encoding a bit stream via the base 32 encoding, the bit stream
   must be presumed to be ordered with the most-significant-bit first.
   That is, the first bit in the stream will be the high-order bit in
   the first 8-bit byte, and the eighth bit will be the low-order bit in
   the first 8-bit byte, and so on.

   Each 5-bit group is used as an index into an array of 32 printable
   characters. The character referenced by the index is placed in the
   output string.
   These characters, identified in Table 3, below, are selected from
   US-ASCII digits and uppercase letters.

             Table 3: The "Canonical" Base 32 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A             9 J            18 S            27 3
       1 B            10 K            19 T            28 4
       2 C            11 L            20 U            29 5
       3 D            12 M            21 V            30 6
       4 E            13 N            22 W            31 7
       5 F            14 O            23 X
       6 G            15 P            24 Y         (pad) =
       7 H            16 Q            25 Z
       8 I            17 R            26 2

            Table 4: The "Extended Hex" Base 32 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 0             9 9            18 I            27 R
       1 1            10 A            19 J            28 S
       2 2            11 B            20 K            29 T
       3 3            12 C            21 L            30 U
       4 4            13 D            22 M            31 V
       5 5            14 E            23 N
       6 6            15 F            24 O         (pad) =
       7 7            16 G            25 P
       8 8            17 H            26 Q

   Special processing is performed if fewer than 40 bits are available
   at the end of the data being encoded. A full encoding quantum is
   always completed at the end of a body. When fewer than 40 input bits
   are available in an input group, zero bits are added (on the right)
   to form an integral number of 5-bit groups. Padding at the end of
   the data is performed using the "=" character.
   Since all base 32 input is an integral number of octets, only the
   following cases can arise:

   (1) the final quantum of encoding input is an integral multiple of 40
   bits; here, the final unit of encoded output will be an integral
   multiple of 8 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by six
   "=" padding characters,

   (3) the final quantum of encoding input is exactly 16 bits; here, the
   final unit of encoded output will be four characters followed by four
   "=" padding characters,

   (4) the final quantum of encoding input is exactly 24 bits; here, the
   final unit of encoded output will be five characters followed by
   three "=" padding characters, or

   (5) the final quantum of encoding input is exactly 32 bits; here, the
   final unit of encoded output will be seven characters followed by one
   "=" padding character.

4. Base 16 Encoding

   The following description is original but analogous to previous
   descriptions.

   A 16-character subset of US-ASCII is used, enabling 4 bits to be
   represented per printable character.

   The encoding process represents 8-bit groups (octets) of input bits
   as output strings of 2 encoded characters. Proceeding from left to
   right, an 8-bit input is taken from the input data. These 8 bits are
   then treated as 2 concatenated 4-bit groups, each of which is
   translated into a single digit in the base 16 alphabet.

   Each 4-bit group is used as an index into an array of 16 printable
   characters. The character referenced by the index is placed in the
   output string.

   This draft describes two alphabets for Base 16 encoding. While the
   Hex alphabet is arguably more natural, there may be situations with
   special constraints, such as forbidden leading digits in strings, in
   which the other may be useful.
   Both alphabets are to be handled case-insensitively.

                Table 5: The "Hex" Base 16 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 0             4 4             8 8            12 C
       1 1             5 5             9 9            13 D
       2 2             6 6            10 A            14 E
       3 3             7 7            11 B            15 F

              Table 6: The Canonical Base 16 Alphabet

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A             4 E             8 I            12 M
       1 B             5 F             9 J            13 N
       2 C             6 G            10 K            14 O
       3 D             7 H            11 L            15 P

   Unlike base 32 and base 64, no special padding is necessary since a
   full code word is always available.

5. Examples

   To translate between binary and a base encoding, the input is stored
   in a structure and the output is extracted. The case for base 64 is
   displayed in the following figure, borrowed from [4].

            +--first octet--+-second octet--+--third octet--+
            |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
            +-----------+---+-------+-------+---+-----------+
            |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
            +--1.index--+--2.index--+--3.index--+--4.index--+

5.1 Examples of Base 64

   This example is from [4].

   Input data:  0x14fb9c03d97e
   Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
   8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
   6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
   Decimal: 5      15     46     28       0      61     37     62
   Output:  F      P      u      c        A      9      l      +

   Input data:  0x14fb9c03d9
   Hex:     1   4    f   b    9   c     | 0   3    d   9
   8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                   pad with 00
   6-bit:   000101 001111 101110 011100 | 000000 111101 100100
   Decimal: 5      15     46     28       0      61     36
                                                      pad with =
   Output:  F      P      u      c        A      9      k      =

   Input data:  0x14fb9c03
   Hex:     1   4    f   b    9   c     | 0   3
   8-bit:   00010100 11111011 10011100  | 00000011
                                          pad with 0000
   6-bit:   000101 001111 101110 011100 | 000000 110000
   Decimal: 5      15     46     28       0      48
                                               pad with =      =
   Output:  F      P      u      c        A      w      =      =

6. Security Considerations

   When implementing Base 64 encoding and decoding, care should be
   taken not to introduce buffer overflow vulnerabilities.

7. Acknowledgement

   I'd like to thank Tony Hansen and Gordon Mohr for comments and
   suggestions.

References

   [1]  Linn, J., "Privacy enhancement for Internet electronic mail:
        Part I - message encipherment and authentication procedures",
        RFC 1113, August 1989.

   [2]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part One: Format of Internet Message Bodies",
        RFC 2045, November 1996.

   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [4]  Callas, J., Donnerhacke, L., Finney, H. and R. Thayer, "OpenPGP
        Message Format", RFC 2440, November 1998.

   [5]  Eastlake, D., "Domain Name System Security Extensions", RFC
        2535, March 1999.

   [6]  Klyne, G. and L. Masinter, "Identifying Composite Media
        Features", RFC 2938, September 2000.

   [7]  Myers, J., "SASL GSSAPI mechanisms", Internet-Draft
        draft-ietf-cat-sasl-gssapi-01, May 2000.

   [8]  Zooko, O., "Post to P2P-hackers mailing list", World Wide Web
        http://zgp.org/pipermail/p2p-hackers/2001-September/000315.html,
        September 2001.

Author's Address

   Simon Josefsson
   Drottningholmsv. 70
   Stockholm 112 42
   Sweden

   EMail: simon@josefsson.org

Full Copyright Statement

   Copyright (C) The Internet Society (2001). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.