Network Working Group                                       S. Josefsson
Internet-Draft                                            March 22, 2006
Obsoletes: 3548 (if approved)
Expires: September 23, 2006

             The Base16, Base32, and Base64 Data Encodings
                     draft-josefsson-rfc3548bis-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 23, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Keywords

   Base Encoding, Base64, Base32, Base16, Hex.

Abstract

   This document describes the commonly used base 64, base 32, and base
   16 encoding schemes.  It also discusses the use of line-feeds in
   encoded data, use of padding in encoded data, use of non-alphabet
   characters in encoded data, and use of different encoding alphabets.

Table of Contents

   1.  Introduction
   2.  Conventions Used in this Document
   3.  Implementation discrepancies
     3.1.  Line feeds in encoded data
     3.2.  Padding of encoded data
     3.3.  Interpretation of non-alphabet characters in encoded data
     3.4.  Choosing the alphabet
   4.  Base 64 Encoding
   5.  Base 64 Encoding with URL and Filename Safe Alphabet
   6.  Base 32 Encoding
   7.  Base 32 Encoding with Extended Hex Alphabet
   8.  Base 16 Encoding
   9.  Illustrations and examples
   10. Test vectors
   11. ISO C99 Implementation of Base64
     11.1.  Prototypes: base64.h
     11.2.  Implementation: base64.c
   12. Security Considerations
   13. Changes since RFC 3548
   14. Acknowledgements
   15. Copying conditions
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   Base encoding of data is used in many situations to store or transfer
   data in environments that, perhaps for legacy reasons, are restricted
   to only US-ASCII [2] data.
   Base encoding can also be used in new applications that do not have
   legacy restrictions, simply because it makes it possible to
   manipulate objects with text editors.

   In the past, different applications have had different requirements
   and thus sometimes implemented base encodings in slightly different
   ways.  Today, protocol specifications sometimes use base encodings in
   general, and "base64" in particular, without a precise description or
   reference.  MIME [4] is often used as a reference for base64 without
   considering the consequences for line-wrapping or non-alphabet
   characters.  The purpose of this specification is to establish common
   alphabet and encoding considerations.  This will hopefully reduce
   ambiguity in other documents, leading to better interoperability.

2.  Conventions Used in this Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [1].

3.  Implementation discrepancies

   Here we discuss the discrepancies between base encoding
   implementations in the past, and where appropriate, mandate a
   specific recommended behavior for the future.

3.1.  Line feeds in encoded data

   MIME [4] is often used as a reference for base 64 encoding.  However,
   MIME does not define "base 64" per se, but rather a "base 64
   Content-Transfer-Encoding" for use within MIME.  As such, MIME limits
   the line length of base 64 encoded data to 76 characters.  MIME
   inherits the encoding from PEM [3], stating that it is "virtually
   identical"; however, PEM uses a line length of 64 characters.  The
   MIME and PEM limits are both due to limits within SMTP.

   Implementations MUST NOT add line feeds to base encoded data unless
   the specification referring to this document explicitly directs base
   encoders to add line feeds after a specific number of characters.

3.2.  Padding of encoded data

   In some circumstances, the use of padding ("=") in base encoded data
   is neither required nor used.  In the general case, when assumptions
   about the size of transported data cannot be made, padding is
   required to yield correct decoded data.

   Implementations MUST include appropriate pad characters at the end of
   encoded data unless the specification referring to this document
   explicitly states otherwise.

3.3.  Interpretation of non-alphabet characters in encoded data

   Base encodings use a specific, reduced alphabet to encode binary
   data.  Non-alphabet characters could exist within base encoded data,
   caused by data corruption or by design.  Non-alphabet characters may
   be exploited as a "covert channel", where non-protocol data can be
   sent for nefarious purposes.  Non-alphabet characters might also be
   sent in order to exploit implementation errors leading to, e.g.,
   buffer overflow attacks.

   Implementations MUST reject the encoding if it contains characters
   outside the base alphabet when interpreting base encoded data, unless
   the specification referring to this document explicitly states
   otherwise.  Such specifications may, as MIME does, instead state that
   characters outside the base encoding alphabet should simply be
   ignored when interpreting data ("be liberal in what you accept").
   Note that this means that any CR and LF characters constitute
   "non-alphabet characters" and are ignored.  Furthermore, such
   specifications may treat the pad character, "=", as not part of the
   base alphabet until the end of the string.  If more than the allowed
   number of pad characters is found at the end of the string, e.g., a
   base 64 string terminated with "===", the excess pad characters could
   be ignored.

3.4.  Choosing the alphabet

   Different applications have different requirements on the characters
   in the alphabet.
   Here are a few requirements that determine which alphabet should be
   used:

   o  Handled by humans.  Characters "0" and "O" are easily confused, as
      are "1", "l", and "I".  In the base32 alphabet below, where 0
      (zero) and 1 (one) are not present, a decoder may interpret 0 as
      O, and 1 as I or L depending on case.  (However, by default it
      should not; see the previous section.)

   o  Encoded into structures that impose other requirements.  For base
      16 and base 32, this determines the use of upper- or lowercase
      alphabets.  For base 64, the non-alphanumeric characters (in
      particular "/") may be problematic in file names and URLs.

   o  Used as identifiers.  Certain characters, notably "+" and "/" in
      the base 64 alphabet, are treated as word-breaks by legacy text
      search/index tools.

   There is no universally accepted alphabet that fulfills all the
   requirements.  For an example of a highly specialized variant, see
   IMAP [8].  In this document, we document and name some currently used
   alphabets.

4.  Base 64 Encoding

   The following description of base 64 is due to [3], [4], [5], and
   [6].

   The Base 64 encoding is designed to represent arbitrary sequences of
   octets in a form that requires case sensitivity but need not be human
   readable.

   A 65-character subset of US-ASCII is used, enabling 6 bits to be
   represented per printable character.  (The extra 65th character, "=",
   is used to signify a special processing function.)

   The encoding process represents 24-bit groups of input bits as output
   strings of 4 encoded characters.  Proceeding from left to right, a
   24-bit input group is formed by concatenating three 8-bit input
   groups.  These 24 bits are then treated as four concatenated 6-bit
   groups, each of which is translated into a single digit in the base
   64 alphabet.

   Each 6-bit group is used as an index into an array of 64 printable
   characters.
   The character referenced by the index is placed in the output
   string.

                      Table 1: The Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a quantity.  When fewer than 24 input
   bits are available in an input group, bits with value zero are added
   (on the right) to form an integral number of 6-bit groups.  Padding
   at the end of the data is performed using the "=" character.  Since
   all base 64 input is an integral number of octets, only the following
   cases can arise:

   (1) the final quantum of encoding input is an integral multiple of 24
   bits; here, the final unit of encoded output will be an integral
   multiple of 4 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by two
   "=" padding characters, or

   (3) the final quantum of encoding input is exactly 16 bits; here, the
   final unit of encoded output will be three characters followed by one
   "=" padding character.

5.  Base 64 Encoding with URL and Filename Safe Alphabet

   The Base 64 encoding with a URL and filename safe alphabet has been
   used in [10].

   An alternative alphabet has been suggested that would use "~" as the
   63rd character.
   Since the "~" character has special meaning in some file system
   environments, the encoding described in this section is recommended
   instead.

   This encoding should not be regarded as the same as the "base64"
   encoding, and should not be referred to as only "base64".  Unless
   clarified otherwise, "base64" refers to the base 64 encoding in the
   previous section.

   This encoding is technically identical to the previous one, except
   for the characters with values 62 and 63, as indicated in Table 2.

          Table 2: The "URL and Filename safe" Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 - (minus)
        12 M            29 d            46 u            63 _
        13 N            30 e            47 v           (underscore)
        14 O            31 f            48 w
        15 P            32 g            49 x
        16 Q            33 h            50 y         (pad) =

6.  Base 32 Encoding

   The following description of base 32 is due to [9] (with
   corrections).

   The Base 32 encoding is designed to represent arbitrary sequences of
   octets in a form that needs to be case insensitive but need not be
   human readable.

   A 33-character subset of US-ASCII is used, enabling 5 bits to be
   represented per printable character.  (The extra 33rd character, "=",
   is used to signify a special processing function.)

   The encoding process represents 40-bit groups of input bits as output
   strings of 8 encoded characters.  Proceeding from left to right, a
   40-bit input group is formed by concatenating five 8-bit input
   groups.  These 40 bits are then treated as eight concatenated 5-bit
   groups, each of which is translated into a single digit in the base
   32 alphabet.
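   The 40-bit grouping described above can be illustrated with a short
   C sketch.  It is not part of this specification or of the base64.c
   code in Section 11; the function name base32_encode_sketch is
   hypothetical, the alphabet is the one listed in Table 3, and the
   caller is assumed to supply an output buffer of at least
   8 * ((inlen + 4) / 5) + 1 bytes.  Each group of up to five octets is
   packed most significant bit first into an integer, missing octets
   are treated as zero bits added on the right, and character positions
   that carry no input bits are filled with "=" padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of base64.c): base 32 encode INLEN
   octets from IN into OUT.  OUT must hold at least
   8 * ((INLEN + 4) / 5) + 1 bytes; the result is padded with '='
   and zero terminated.  */
static void
base32_encode_sketch (const unsigned char *in, size_t inlen, char *out)
{
  /* Table 3: values 0..25 map to 'A'..'Z', 26..31 map to '2'..'7'.  */
  static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

  assert (in != NULL || inlen == 0);
  assert (out != NULL);

  while (inlen > 0)
    {
      /* Take up to 5 octets (one 40-bit group); missing octets stay
         zero, i.e., zero bits are added on the right.  */
      size_t n = inlen < 5 ? inlen : 5;
      unsigned char block[5] = { 0, 0, 0, 0, 0 };
      memcpy (block, in, n);

      /* Characters that carry input bits: ceil (8 * n / 5), which is
         2, 4, 5, 7 or 8 for n = 1..5.  */
      size_t nchars = (8 * n + 4) / 5;

      /* Pack the 40 bits most significant bit first.  */
      unsigned long long bits = 0;
      for (size_t i = 0; i < 5; i++)
        bits = (bits << 8) | block[i];

      /* Emit eight 5-bit groups, replacing unused ones with '='.  */
      for (size_t i = 0; i < 8; i++)
        out[i] = i < nchars
                 ? alphabet[(bits >> (35 - 5 * i)) & 0x1f]
                 : '=';

      out += 8;
      in += n;
      inlen -= n;
    }
  *out = '\0';
}
```

   Under these assumptions, encoding the six octets of "foobar" yields
   "MZXW6YTBOI======", matching the BASE32 test vectors in Section 10.
   Packing each group into an unsigned long long (at least 64 bits in
   C99) keeps the 5-bit extraction a simple shift and mask; decoding
   and the non-alphabet checks of Section 3.3 are left out.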
   When encoding a bit stream via the base 32 encoding, the bit stream
   must be presumed to be ordered with the most-significant-bit first.
   That is, the first bit in the stream will be the high-order bit in
   the first 8-bit byte, the eighth bit will be the low-order bit in the
   first 8-bit byte, and so on.

   Each 5-bit group is used as an index into an array of 32 printable
   characters.  The character referenced by the index is placed in the
   output string.  These characters, identified in Table 3, below, are
   selected from US-ASCII digits and uppercase letters.

                      Table 3: The Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A             9 J            18 S            27 3
         1 B            10 K            19 T            28 4
         2 C            11 L            20 U            29 5
         3 D            12 M            21 V            30 6
         4 E            13 N            22 W            31 7
         5 F            14 O            23 X
         6 G            15 P            24 Y         (pad) =
         7 H            16 Q            25 Z
         8 I            17 R            26 2

   Special processing is performed if fewer than 40 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.  When fewer than 40 input bits
   are available in an input group, bits with value zero are added (on
   the right) to form an integral number of 5-bit groups.  Padding at
   the end of the data is performed using the "=" character.
   Since all base 32 input is an integral number of octets, only the
   following cases can arise:

   (1) the final quantum of encoding input is an integral multiple of 40
   bits; here, the final unit of encoded output will be an integral
   multiple of 8 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
   final unit of encoded output will be two characters followed by six
   "=" padding characters,

   (3) the final quantum of encoding input is exactly 16 bits; here, the
   final unit of encoded output will be four characters followed by four
   "=" padding characters,

   (4) the final quantum of encoding input is exactly 24 bits; here, the
   final unit of encoded output will be five characters followed by
   three "=" padding characters, or

   (5) the final quantum of encoding input is exactly 32 bits; here, the
   final unit of encoded output will be seven characters followed by one
   "=" padding character.

7.  Base 32 Encoding with Extended Hex Alphabet

   The following description of base 32 is due to [7].  This encoding
   should not be regarded as the same as the "base32" encoding, and
   should not be referred to as only "base32".

   One property of this alphabet, which the base64 and base32 alphabets
   lack, is that encoded data maintains its sort order when the encoded
   data is compared bit-wise.

   This encoding is identical to the previous one, except for the
   alphabet.  The new alphabet is found in Table 4.

              Table 4: The "Extended Hex" Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             9 9            18 I            27 R
         1 1            10 A            19 J            28 S
         2 2            11 B            20 K            29 T
         3 3            12 C            21 L            30 U
         4 4            13 D            22 M            31 V
         5 5            14 E            23 N
         6 6            15 F            24 O         (pad) =
         7 7            16 G            25 P
         8 8            17 H            26 Q

8.  Base 16 Encoding

   The following description is original but analogous to previous
   descriptions.
   Essentially, Base 16 encoding is the standard case-insensitive hex
   encoding, and may be referred to as "base16" or "hex".

   A 16-character subset of US-ASCII is used, enabling 4 bits to be
   represented per printable character.

   The encoding process represents 8-bit groups (octets) of input bits
   as output strings of 2 encoded characters.  Proceeding from left to
   right, an 8-bit input is taken from the input data.  These 8 bits are
   then treated as two concatenated 4-bit groups, each of which is
   translated into a single digit in the base 16 alphabet.

   Each 4-bit group is used as an index into an array of 16 printable
   characters.  The character referenced by the index is placed in the
   output string.

                      Table 5: The Base 16 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             4 4             8 8            12 C
         1 1             5 5             9 9            13 D
         2 2             6 6            10 A            14 E
         3 3             7 7            11 B            15 F

   Unlike base 32 and base 64, no special padding is necessary since a
   full code word is always available.

9.  Illustrations and examples

   To translate between binary and a base encoding, the input is stored
   in a structure and the output is extracted.  The case for base 64 is
   displayed in the following figure, borrowed from [5].

            +--first octet--+-second octet--+--third octet--+
            |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
            +-----------+---+-------+-------+---+-----------+
            |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
            +--1.index--+--2.index--+--3.index--+--4.index--+

   The case for base 32 is shown in the following figure, borrowed from
   [7].  Each successive character in a base-32 value represents 5
   successive bits of the underlying octet sequence.  Thus, each group
   of 8 characters represents a sequence of 5 octets (40 bits).

                        1          2          3
             01234567 89012345 67890123 45678901 23456789
            +--------+--------+--------+--------+--------+
            |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
            +--------+--------+--------+--------+--------+
                                                    <===> 8th character
                                              <====> 7th character
                                         <===> 6th character
                                   <====> 5th character
                             <====> 4th character
                        <===> 3rd character
                  <====> 2nd character
             <===> 1st character

   The following example of Base64 data is from [5], with corrections.

   Input data:  0x14fb9c03d97e
   Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
   8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
   6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
   Decimal: 5      15     46     28       0      61     37     62
   Output:  F      P      u      c        A      9      l      +

   Input data:  0x14fb9c03d9
   Hex:     1   4    f   b    9   c     | 0   3    d   9
   8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                   pad with 00
   6-bit:   000101 001111 101110 011100 | 000000 111101 100100
   Decimal: 5      15     46     28       0      61     36
                                                      pad with =
   Output:  F      P      u      c        A      9      k      =

   Input data:  0x14fb9c03
   Hex:     1   4    f   b    9   c     | 0   3
   8-bit:   00010100 11111011 10011100  | 00000011
                                          pad with 0000
   6-bit:   000101 001111 101110 011100 | 000000 110000
   Decimal: 5      15     46     28       0      48
                                               pad with =      =
   Output:  F      P      u      c        A      w      =      =

10. Test vectors

   BASE64("") = ""

   BASE64("f") = "Zg=="

   BASE64("fo") = "Zm8="

   BASE64("foo") = "Zm9v"

   BASE64("foob") = "Zm9vYg=="

   BASE64("fooba") = "Zm9vYmE="

   BASE64("foobar") = "Zm9vYmFy"

   BASE32("") = ""

   BASE32("f") = "MY======"

   BASE32("fo") = "MZXQ===="

   BASE32("foo") = "MZXW6==="

   BASE32("foob") = "MZXW6YQ="

   BASE32("fooba") = "MZXW6YTB"

   BASE32("foobar") = "MZXW6YTBOI======"

   BASE32-HEX("") = ""

   BASE32-HEX("f") = "CO======"

   BASE32-HEX("fo") = "CPNG===="

   BASE32-HEX("foo") = "CPNMU==="

   BASE32-HEX("foob") = "CPNMUOG="

   BASE32-HEX("fooba") = "CPNMUOJ1"

   BASE32-HEX("foobar") = "CPNMUOJ1E8======"

   BASE16("") = ""

   BASE16("f") = "66"

   BASE16("fo") = "666F"

   BASE16("foo") = "666F6F"

   BASE16("foob") = "666F6F62"

   BASE16("fooba") = "666F6F6261"

   BASE16("foobar") = "666F6F626172"

11. ISO C99 Implementation of Base64

   Below is an ISO C99 implementation of Base64 encoding and decoding.
   The code assumes that the US-ASCII characters are encoded inside
   'char' with values in the range 0..255, which holds for all POSIX
   platforms, but should otherwise be portable.  This code is not
   intended as a normative specification of base64.

11.1.  Prototypes: base64.h

   /* base64.h -- Encode binary data using printable characters.

      This program is free software; you can redistribute it
      and/or modify it under the terms of the GNU Lesser
      General Public License as published by the Free Software
      Foundation; either version 2.1, or (at your option) any
      later version.

      This program is distributed in the hope that it will be
      useful, but WITHOUT ANY WARRANTY; without even the
      implied warranty of MERCHANTABILITY or FITNESS FOR A
      PARTICULAR PURPOSE.  See the GNU Lesser General Public
      License for more details.

      You should have received a copy of the GNU Lesser General
      Public License along with this program; if not, write to
      the Free Software Foundation, Inc., 51 Franklin Street,
      Fifth Floor, Boston, MA 02110-1301, USA.  */

   /* Written by Simon Josefsson.  */

   #ifndef BASE64_H
   # define BASE64_H

   /* Get size_t. */
   # include <stddef.h>

   /* Get bool. */
   # include <stdbool.h>

   /* This uses the fact that, for n >= 0 and k > 0, the expression
      (n+(k-1))/k is the smallest integer >= n/k, i.e., the
      ceiling of n/k (here k is 3).  */
   # define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

   extern bool isbase64 (char ch);

   extern void base64_encode (const char *restrict in,
                              size_t inlen,
                              char *restrict out,
                              size_t outlen);

   extern size_t base64_encode_alloc (const char *in,
                                      size_t inlen,
                                      char **out);

   extern bool base64_decode (const char *restrict in,
                              size_t inlen,
                              char *restrict out,
                              size_t *outlen);

   extern bool base64_decode_alloc (const char *in,
                                    size_t inlen,
                                    char **out,
                                    size_t *outlen);

   #endif /* BASE64_H */

11.2.  Implementation: base64.c

   /* base64.c -- Encode binary data using printable characters.

      This program is free software; you can redistribute it
      and/or modify it under the terms of the GNU Lesser
      General Public License as published by the Free Software
      Foundation; either version 2.1, or (at your option) any
      later version.

      This program is distributed in the hope that it will be
      useful, but WITHOUT ANY WARRANTY; without even the
      implied warranty of MERCHANTABILITY or FITNESS FOR A
      PARTICULAR PURPOSE.  See the GNU Lesser General Public
      License for more details.

      You should have received a copy of the GNU Lesser General
      Public License along with this program; if not, write to
      the Free Software Foundation, Inc., 51 Franklin Street,
      Fifth Floor, Boston, MA 02110-1301, USA.  */

   /* Written by Simon Josefsson.
    * Partially adapted from GNU MailUtils
    * (mailbox/filter_trans.c, as of 2004-11-28).
    * Improved by review from Paul Eggert, Bruno Haible, and
    * Stepan Kasal.
    *
    * Be careful with error checking.  Here is how you would
    * typically use these functions:
    *
    * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
    * if (!ok)
    *   FAIL: input was not valid base64
    * if (out == NULL)
    *   FAIL: memory allocation error
    * OK: data in OUT/OUTLEN
    *
    * size_t outlen = base64_encode_alloc (in, inlen, &out);
    * if (out == NULL && outlen == 0 && inlen != 0)
    *   FAIL: input too long
    * if (out == NULL)
    *   FAIL: memory allocation error
    * OK: data in OUT/OUTLEN.
    *
    */

   /* Get prototype. */
   #include "base64.h"

   /* Get malloc. */
   #include <stdlib.h>

   /* Get UCHAR_MAX. */
   #include <limits.h>

   /* C89 compliant way to cast 'char' to 'unsigned char'. */
   static inline unsigned char
   to_uchar (char ch)
   {
     return ch;
   }

   /* Base64 encode IN array of size INLEN into OUT array of
      size OUTLEN.  If OUTLEN is less than
      BASE64_LENGTH(INLEN), write as many bytes as possible.
      If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
      terminate the output buffer. */
   void
   base64_encode (const char *restrict in, size_t inlen,
                  char *restrict out, size_t outlen)
   {
     static const char b64str[64] =
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       "abcdefghijklmnopqrstuvwxyz0123456789+/";

     while (inlen && outlen)
       {
         *out++ = b64str[to_uchar (in[0]) >> 2];
         if (!--outlen)
           break;
         *out++ = b64str[((to_uchar (in[0]) << 4)
                          + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                         & 0x3f];
         if (!--outlen)
           break;
         *out++ =
           (inlen
            ? b64str[((to_uchar (in[1]) << 2)
                      + (--inlen ? to_uchar (in[2]) >> 6 : 0))
                     & 0x3f]
            : '=');
         if (!--outlen)
           break;
         *out++ = inlen ? b64str[to_uchar (in[2]) & 0x3f] : '=';
         if (!--outlen)
           break;
         if (inlen)
           inlen--;
         if (inlen)
           in += 3;
       }

     if (outlen)
       *out = '\0';
   }

   /* Allocate a buffer and store zero terminated base64
      encoded data from array IN of size INLEN, returning
      BASE64_LENGTH(INLEN), i.e., the length of the encoded
      data, excluding the terminating zero.  On return, the OUT
      variable will hold a pointer to newly allocated memory
      that must be deallocated by the caller.  If output string
      length would overflow, 0 is returned and OUT is set to
      NULL.  If memory allocation fails, OUT is set to NULL, and
      the return value is still BASE64_LENGTH(INLEN), i.e., one
      less than the size of the requested memory block. */
   size_t
   base64_encode_alloc (const char *in, size_t inlen, char **out)
   {
     size_t outlen = 1 + BASE64_LENGTH (inlen);

     /* Check for overflow in outlen computation.
      *
      * If there is no overflow, outlen >= inlen.
      *
      * If the operation (inlen + 2) overflows then it yields
      * at most +1, so BASE64_LENGTH (inlen) is 0 and outlen
      * is 1.
      *
      * If the multiplication overflows, we lose at least half
      * of the correct value, so the result is < ((inlen +
      * 2) / 3) * 2, which is less than (inlen + 2) * 0.66667,
      * which is less than inlen as soon as (inlen > 4).
      */
     if (inlen > outlen)
       {
         *out = NULL;
         return 0;
       }

     *out = malloc (outlen);
     if (*out)
       base64_encode (in, inlen, *out, outlen);

     return outlen - 1;
   }

   /* With this approach, this file works independently of the
      character set used (think EBCDIC).  However, it does
      assume that the characters in the Base64 alphabet
      (A-Za-z0-9+/) are encoded in 0..255.  POSIX 1003.1-2001
      requires that char and unsigned char are 8-bit
      quantities, which takes care of that problem, but it may
      still be a problem on non-POSIX C99 platforms. */
   #define B64(x) \
     ((x) == 'A' ? 0 \
      : (x) == 'B' ? 1 \
      : (x) == 'C' ? 2 \
      : (x) == 'D' ? 3 \
      : (x) == 'E' ? 4 \
      : (x) == 'F' ? 5 \
      : (x) == 'G' ? 6 \
      : (x) == 'H' ? 7 \
      : (x) == 'I' ? 8 \
      : (x) == 'J' ? 9 \
      : (x) == 'K' ? 10 \
      : (x) == 'L' ? 11 \
      : (x) == 'M' ? 12 \
      : (x) == 'N' ? 13 \
      : (x) == 'O' ? 14 \
      : (x) == 'P' ? 15 \
      : (x) == 'Q' ? 16 \
      : (x) == 'R' ? 17 \
      : (x) == 'S' ? 18 \
      : (x) == 'T' ? 19 \
      : (x) == 'U' ? 20 \
      : (x) == 'V' ? 21 \
      : (x) == 'W' ? 22 \
      : (x) == 'X' ? 23 \
      : (x) == 'Y' ? 24 \
      : (x) == 'Z' ? 25 \
      : (x) == 'a' ? 26 \
      : (x) == 'b' ? 27 \
      : (x) == 'c' ? 28 \
      : (x) == 'd' ? 29 \
      : (x) == 'e' ? 30 \
      : (x) == 'f' ? 31 \
      : (x) == 'g' ? 32 \
      : (x) == 'h' ? 33 \
      : (x) == 'i' ? 34 \
      : (x) == 'j' ? 35 \
      : (x) == 'k' ? 36 \
      : (x) == 'l' ? 37 \
      : (x) == 'm' ? 38 \
      : (x) == 'n' ? 39 \
      : (x) == 'o' ? 40 \
      : (x) == 'p' ? 41 \
      : (x) == 'q' ? 42 \
      : (x) == 'r' ? 43 \
      : (x) == 's' ? 44 \
      : (x) == 't' ? 45 \
      : (x) == 'u' ? 46 \
      : (x) == 'v' ? 47 \
      : (x) == 'w' ? 48 \
      : (x) == 'x' ? 49 \
      : (x) == 'y' ? 50 \
      : (x) == 'z' ? 51 \
      : (x) == '0' ? 52 \
      : (x) == '1' ? 53 \
      : (x) == '2' ? 54 \
      : (x) == '3' ? 55 \
      : (x) == '4' ? 56 \
      : (x) == '5' ? 57 \
      : (x) == '6' ? 58 \
      : (x) == '7' ? 59 \
      : (x) == '8' ? 60 \
      : (x) == '9' ? 61 \
      : (x) == '+' ? 62 \
      : (x) == '/' ? 63 \
      : -1)

   static const signed char b64[0x100] = {
     B64 (0), B64 (1), B64 (2), B64 (3),
     B64 (4), B64 (5), B64 (6), B64 (7),
     B64 (8), B64 (9), B64 (10), B64 (11),
     B64 (12), B64 (13), B64 (14), B64 (15),
     B64 (16), B64 (17), B64 (18), B64 (19),
     B64 (20), B64 (21), B64 (22), B64 (23),
     B64 (24), B64 (25), B64 (26), B64 (27),
     B64 (28), B64 (29), B64 (30), B64 (31),
     B64 (32), B64 (33), B64 (34), B64 (35),
     B64 (36), B64 (37), B64 (38), B64 (39),
     B64 (40), B64 (41), B64 (42), B64 (43),
     B64 (44), B64 (45), B64 (46), B64 (47),
     B64 (48), B64 (49), B64 (50), B64 (51),
     B64 (52), B64 (53), B64 (54), B64 (55),
     B64 (56), B64 (57), B64 (58), B64 (59),
     B64 (60), B64 (61), B64 (62), B64 (63),
     B64 (64), B64 (65), B64 (66), B64 (67),
     B64 (68), B64 (69), B64 (70), B64 (71),
     B64 (72), B64 (73), B64 (74), B64 (75),
     B64 (76), B64 (77), B64 (78), B64 (79),
     B64 (80), B64 (81), B64 (82), B64 (83),
     B64 (84), B64 (85), B64 (86), B64 (87),
     B64 (88), B64 (89), B64 (90), B64 (91),
     B64 (92), B64 (93), B64 (94), B64 (95),
     B64 (96), B64 (97), B64 (98), B64 (99),
     B64 (100), B64 (101), B64 (102), B64 (103),
     B64 (104), B64 (105), B64 (106), B64 (107),
     B64 (108), B64 (109), B64 (110), B64 (111),
     B64 (112), B64 (113), B64 (114), B64 (115),
     B64 (116), B64 (117), B64 (118), B64 (119),
     B64 (120), B64 (121), B64 (122), B64 (123),
     B64 (124), B64 (125), B64 (126), B64 (127),
     B64 (128), B64 (129), B64 (130), B64 (131),
     B64 (132), B64 (133), B64 (134), B64 (135),
     B64 (136), B64 (137), B64 (138), B64 (139),
     B64 (140), B64 (141), B64 (142), B64 (143),
     B64 (144), B64 (145), B64 (146), B64 (147),
     B64 (148), B64 (149), B64 (150), B64 (151),
     B64 (152), B64 (153), B64 (154), B64 (155),
     B64 (156), B64 (157), B64 (158), B64 (159),
     B64 (160), B64 (161), B64 (162), B64 (163),
     B64 (164), B64 (165), B64 (166), B64 (167),
     B64 (168), B64 (169), B64 (170), B64 (171),
     B64 (172), B64 (173), B64 (174), B64 (175),
     B64 (176), B64 (177), B64 (178), B64 (179),
     B64 (180), B64 (181), B64 (182), B64 (183),
     B64 (184), B64 (185), B64 (186), B64 (187),
     B64 (188), B64 (189), B64 (190), B64 (191),
     B64 (192), B64 (193), B64 (194), B64 (195),
     B64 (196), B64 (197), B64 (198), B64 (199),
     B64 (200), B64 (201), B64 (202), B64 (203),
     B64 (204), B64 (205), B64 (206), B64 (207),
     B64 (208), B64 (209), B64 (210), B64 (211),
     B64 (212), B64 (213), B64 (214), B64 (215),
     B64 (216), B64 (217), B64 (218), B64 (219),
     B64 (220), B64 (221), B64 (222), B64 (223),
     B64 (224), B64 (225), B64 (226), B64 (227),
     B64 (228), B64 (229), B64 (230), B64 (231),
     B64 (232), B64 (233), B64 (234), B64 (235),
     B64 (236), B64 (237), B64 (238), B64 (239),
     B64 (240), B64 (241), B64 (242), B64 (243),
     B64 (244), B64 (245), B64 (246), B64 (247),
     B64 (248), B64 (249), B64 (250), B64 (251),
     B64 (252), B64 (253), B64 (254), B64 (255)
   };

   #if UCHAR_MAX == 255
   # define uchar_in_range(c) true
   #else
   # define uchar_in_range(c) ((c) <= 255)
   #endif

   bool
   isbase64 (char ch)
   {
     return uchar_in_range (to_uchar (ch)) && 0 <= b64[to_uchar (ch)];
   }

   /* Decode base64 encoded input array IN of length INLEN to
      output array OUT that can hold *OUTLEN bytes.  Return
      true if decoding was successful, i.e., if the input was
      valid base64 data, false otherwise.  If *OUTLEN is too
      small, as many bytes as possible will be written to OUT.
      On return, *OUTLEN holds the length of decoded bytes in
      OUT.  Note that as soon as any non-alphabet characters
      are encountered, decoding is stopped and false is
      returned.
*/ 904 bool 905 base64_decode (const char *restrict in, size_t inlen, 906 char *restrict out, size_t *outlen) 907 { 908 size_t outleft = *outlen; 910 while (inlen >= 2) 911 { 912 if (!isbase64 (in[0]) || !isbase64 (in[1])) 913 break; 915 if (outleft) 916 { 917 *out++ = ((b64[to_uchar (in[0])] << 2) 918 | (b64[to_uchar (in[1])] >> 4)); 919 outleft--; 920 } 922 if (inlen == 2) 923 break; 925 if (in[2] == '=') 926 { 927 if (inlen != 4) 928 break; 930 if (in[3] != '=') 931 break; 933 } 934 else 935 { 936 if (!isbase64 (in[2])) 937 break; 939 if (outleft) 940 { 941 *out++ = (((b64[to_uchar (in[1])] << 4) & 0xf0) 942 | (b64[to_uchar (in[2])] >> 2)); 943 outleft--; 944 } 946 if (inlen == 3) 947 break; 949 if (in[3] == '=') 950 { 951 if (inlen != 4) 952 break; 953 } 954 else 955 { 956 if (!isbase64 (in[3])) 957 break; 959 if (outleft) 960 { 961 *out++ = (((b64[to_uchar (in[2])] << 6) & 0xc0) 962 | b64[to_uchar (in[3])]); 963 outleft--; 964 } 965 } 966 } 968 in += 4; 969 inlen -= 4; 970 } 972 *outlen -= outleft; 974 if (inlen != 0) 975 return false; 977 return true; 978 } 980 /* Allocate an output buffer in *OUT, and decode the base64 981 encoded data stored in IN of size INLEN to the *OUT 982 buffer. On return, the size of the decoded data is 983 stored in *OUTLEN. OUTLEN may be NULL, if the caller is 984 not interested in the decoded length. *OUT may be NULL 985 to indicate an out of memory error, in which case *OUTLEN 986 contain the size of the memory block needed. The 987 function return true on successful decoding and memory 988 allocation errors. (Use the *OUT and *OUTLEN parameters 989 to differentiate between successful decoding and memory 990 error.) The function return false if the input was 991 invalid, in which case *OUT is NULL and *OUTLEN is 992 undefined. 
*/ 993 bool 994 base64_decode_alloc (const char *in, size_t inlen, char **out, 995 size_t *outlen) 996 { 997 /* This may allocate a few bytes too much, depending on 998 input, but it's not worth the extra CPU time to compute 999 the exact amount. The exact amount is 3 * inlen / 4, 1000 minus 1 if the input ends with "=" and minus another 1 1001 if the input ends with "==". Dividing before 1002 multiplying avoids the possibility of overflow. */ 1003 size_t needlen = 3 * (inlen / 4) + 2; 1005 *out = malloc (needlen); 1006 if (!*out) 1007 return true; 1009 if (!base64_decode (in, inlen, *out, &needlen)) 1010 { 1011 free (*out); 1012 *out = NULL; 1013 return false; 1014 } 1016 if (outlen) 1017 *outlen = needlen; 1019 return true; 1020 } 1022 12. Security Considerations 1024 When implementing Base encoding and decoding, care should be taken 1025 not to introduce vulnerabilities to buffer overflow attacks, or other 1026 attacks on the implementation. A decoder should not break on invalid 1027 input including, e.g., embedded NUL characters (ASCII 0). 1029 If non-alphabet characters are ignored, instead of causing rejection 1030 of the entire encoding (as recommended), a covert channel that can be 1031 used to "leak" information is made possible. The implications of 1032 this should be understood in applications that do not follow the 1033 recommended practice. Similarly, when the base 16 and base 32 1034 alphabets are handled case insensitively, alteration of case can be 1035 used to leak information. 1037 Base encoding visually hides otherwise easily recognized information, 1038 such as passwords, but does not provide any computational 1039 confidentiality. This has been known to cause security incidents 1040 when, e.g., a user reports details of a network protocol exchange 1041 (perhaps to illustrate some other problem) and accidentally reveals 1042 the password because she is unaware that the base encoding does not 1043 protect the password. 
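
   The decoder recommendations above can be illustrated with a small,
   self-contained C99 sketch.  It is a minimal sketch under stated
   assumptions, not the implementation given earlier in this document:
   the names sketch_b64_value and sketch_strict_decode are hypothetical;
   the input length is passed explicitly, so an embedded NUL is treated
   as an invalid character rather than as the end of the input; any
   non-alphabet character (including a line break) rejects the whole
   input; and padding is accepted only in the last two positions of the
   final 4-character quantum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of this document's implementation):
   return the 6-bit value of C in the base64 alphabet, or -1 if C is
   not an alphabet character.  Looking C up in a string keeps the
   sketch independent of the execution character set.  NUL is
   rejected explicitly because strchr would otherwise match the
   string's terminating zero.  */
static int
sketch_b64_value (char c)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const char *p;

  if (c == '\0')
    return -1;
  p = strchr (alphabet, c);
  return p ? (int) (p - alphabet) : -1;
}

/* Hypothetical strict decoder: decode INLEN bytes at IN into OUT,
   which can hold OUTCAP bytes.  Return true and store the decoded
   length in *OUTLEN only if the whole input is valid and fits;
   otherwise return false and leave *OUTLEN unchanged.  */
bool
sketch_strict_decode (const char *in, size_t inlen,
                      unsigned char *out, size_t outcap,
                      size_t *outlen)
{
  size_t i, n = 0;

  /* With no skipped characters, valid padded input is always a
     whole number of 4-character quanta.  */
  if (inlen % 4 != 0)
    return false;

  for (i = 0; i < inlen; i += 4)
    {
      bool last = (i + 4 == inlen);
      int v[4] = { 0 };
      int k, pad = 0;

      for (k = 0; k < 4; k++)
        {
          if (in[i + k] == '=' && last && k >= 2)
            pad++;                 /* padding: final quantum only */
          else if (pad > 0)
            return false;          /* data after '=' */
          else if ((v[k] = sketch_b64_value (in[i + k])) < 0)
            return false;          /* non-alphabet character or NUL */
        }

      if (n + (size_t) (3 - pad) > outcap)
        return false;              /* output buffer too small */

      out[n++] = (unsigned char) ((v[0] << 2) | (v[1] >> 4));
      if (pad < 2)
        out[n++] = (unsigned char) (((v[1] & 0x0f) << 4)
                                    | (v[2] >> 2));
      if (pad < 1)
        out[n++] = (unsigned char) (((v[2] & 0x03) << 6) | v[3]);
    }

  *outlen = n;
  return true;
}
```

   Decoding "dXNlcjpwYXNzd29yZA==" with this sketch recovers the 13
   bytes "user:password" without any key, which is the confidentiality
   point made above.  A lenient decoder that skipped line breaks would
   map both "Zm9vYmFy" and "Zm9v\nYmFy" to "foobar", so the placement
   of line breaks could carry hidden data; the sketch rejects the second
   form.  The sketch does not verify that the unused trailing bits of a
   padded quantum are zero (it decodes both "Zm9vYg==" and "Zm9vYh=="
   to "foob"), so non-canonical encodings remain a small residual
   channel that a stricter decoder could also close.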

   Base encoding adds no entropy to the plaintext, but it does increase
   the amount of plaintext available and provides a signature for
   cryptanalysis in the form of a characteristic probability
   distribution.

13.  Changes since RFC 3548

   Add the "base32 extended hex alphabet", needed to preserve the sort
   order of encoded data.

   Reference IMAP for the special Base64 encoding used there.

   Fix the example copied from RFC 2440.

   Add a security consideration about providing a signature for
   cryptanalysis.

   Add test vectors and a C99 implementation.

   Fix typos.

14.  Acknowledgements

   Several people offered comments and/or suggestions, including John E.
   Hadstate, Tony Hansen, Gordon Mohr, John Myers, Chris Newman, and
   Andrew Sieber.  Text used in this document is based on earlier RFCs
   describing specific uses of various base encodings.  The author
   acknowledges RSA Laboratories for supporting the work that led to
   this document.

   This revised version is based in part on comments and/or suggestions
   made by Roy Arends, Ted Hardie, Per Hygum, Jelte Jansen, Clement
   Kent, Paul Kwiatkowski, and Ben Laurie.

15.  Copying conditions

   Regarding the abstract and sections 1, 3, 8, 10, 12, 13, and 14 of
   this document, which were written by Simon Josefsson ("the author",
   for the remainder of this section), the author makes no guarantees
   and is not responsible for any damage resulting from its use.  The
   author grants irrevocable permission to anyone to use, modify, and
   distribute it in any way that does not diminish the rights of anyone
   else to use, modify, and distribute it, provided that redistributed
   derivative works do not contain misleading author or version
   information.  Derivative works need not be licensed under similar
   terms.

16.  References

16.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

16.2.  Informative References

   [2]   Cerf, V., "ASCII format for network interchange", RFC 20,
         October 1969.

   [3]   Linn, J., "Privacy Enhancement for Internet Electronic Mail:
         Part I: Message Encryption and Authentication Procedures",
         RFC 1421, February 1993.

   [4]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [5]   Callas, J., Donnerhacke, L., Finney, H., and R. Thayer,
         "OpenPGP Message Format", RFC 2440, November 1998.

   [6]   Eastlake, D., "Domain Name System Security Extensions",
         RFC 2535, March 1999.

   [7]   Klyne, G. and L. Masinter, "Identifying Composite Media
         Features", RFC 2938, September 2000.

   [8]   Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
         4rev1", RFC 3501, March 2003.

   [9]   Myers, J., "SASL GSSAPI mechanisms", Work in Progress,
         draft-ietf-cat-sasl-gssapi-01, May 2000.

   [10]  Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World
         Wide Web http://zgp.org/pipermail/p2p-hackers/2001-September/000315.html,
         September 2001.

Author's Address

   Simon Josefsson

   Email: simon@josefsson.org

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.