Network Working Group                                       S. Josefsson
Internet-Draft                                            March 24, 2006
Obsoletes: 3548 (if approved)
Expires: September 25, 2006


             The Base16, Base32, and Base64 Data Encodings
                     draft-josefsson-rfc3548bis-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 25, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Keywords

   Base Encoding, Base64, Base32, Base16, Hex.

Abstract

   This document describes the commonly used base 64, base 32, and base
   16 encoding schemes.  It also discusses the use of line-feeds in
   encoded data, use of padding in encoded data, use of non-alphabet
   characters in encoded data, and use of different encoding alphabets.

Table of Contents

   1.  Introduction
   2.  Conventions Used in this Document
   3.  Implementation discrepancies
     3.1.  Line feeds in encoded data
     3.2.  Padding of encoded data
     3.3.  Interpretation of non-alphabet characters in encoded data
     3.4.  Choosing the alphabet
   4.  Base 64 Encoding
   5.  Base 64 Encoding with URL and Filename Safe Alphabet
   6.  Base 32 Encoding
   7.  Base 32 Encoding with Extended Hex Alphabet
   8.  Base 16 Encoding
   9.  Illustrations and examples
   10. Test vectors
   11. ISO C99 Implementation of Base64
     11.1. Prototypes: base64.h
     11.2. Implementation: base64.c
   12. Security Considerations
   13. Changes since RFC 3548
   14. Acknowledgements
   15. Copying conditions
   16. References
     16.1. Normative References
     16.2. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   Base encoding of data is used in many situations to store or transfer
   data in environments that, perhaps for legacy reasons, are restricted
   to only US-ASCII [2] data.
   Base encoding can also be used in new applications that do not have
   legacy restrictions, simply because it makes it possible to
   manipulate objects with text editors.

   In the past, different applications have had different requirements
   and thus sometimes implemented base encodings in slightly different
   ways.  Today, protocol specifications sometimes use base encodings in
   general, and "base64" in particular, without a precise description or
   reference.  MIME [4] is often used as a reference for base64 without
   considering the consequences for line-wrapping or non-alphabet
   characters.  The purpose of this specification is to establish a
   common alphabet and common encoding considerations.  This will
   hopefully reduce ambiguity in other documents, leading to better
   interoperability.

2.  Conventions Used in this Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [1].

3.  Implementation discrepancies

   Here we discuss the discrepancies between base encoding
   implementations in the past, and where appropriate, mandate a
   specific recommended behavior for the future.

3.1.  Line feeds in encoded data

   MIME [4] is often used as a reference for base 64 encoding.  However,
   MIME does not define "base 64" per se, but rather a "base 64
   Content-Transfer-Encoding" for use within MIME.  As such, MIME limits
   the line length of base 64 encoded data to 76 characters.  MIME
   inherits the encoding from PEM [3], stating that it is "virtually
   identical"; however, PEM uses a line length of 64 characters.  The
   MIME and PEM limits are both due to limits within SMTP.

   Implementations MUST NOT add line feeds to base encoded data unless
   the specification referring to this document explicitly directs base
   encoders to add line feeds after a specific number of characters.

3.2.  Padding of encoded data

   In some circumstances, the use of padding ("=") in base encoded data
   is neither required nor used.  In the general case, when assumptions
   about the size of transported data cannot be made, padding is
   required to yield correct decoded data.

   Implementations MUST include appropriate pad characters at the end of
   encoded data unless the specification referring to this document
   explicitly states otherwise.

3.3.  Interpretation of non-alphabet characters in encoded data

   Base encodings use a specific, reduced alphabet to encode binary
   data.  Non-alphabet characters could exist within base encoded data,
   caused by data corruption or by design.  Non-alphabet characters may
   be exploited as a "covert channel", where non-protocol data can be
   sent for nefarious purposes.  Non-alphabet characters might also be
   sent in order to exploit implementation errors leading to, e.g.,
   buffer overflow attacks.

   Implementations MUST reject the encoding if it contains characters
   outside the base alphabet when interpreting base encoded data, unless
   the specification referring to this document explicitly states
   otherwise.  Such specifications may, as MIME does, instead state that
   characters outside the base encoding alphabet should simply be
   ignored when interpreting data ("be liberal in what you accept").
   Note that this means that any adjacent carriage return/line feed
   (CRLF) characters constitute "non-alphabet characters" and are
   ignored.  Furthermore, such specifications may consider the pad
   character, "=", as not part of the base alphabet until the end of the
   string.  If more than the allowed number of pad characters is found
   at the end of the string, e.g., a base 64 string terminated with
   "===", the excess pad characters could be ignored.

3.4.  Choosing the alphabet

   Different applications have different requirements on the characters
   in the alphabet.
   Here are a few requirements that determine which alphabet should be
   used:

   o  Handled by humans.  The characters "0" and "O" are easily
      interchanged, as are "1", "l", and "I".  In the base32 alphabet
      below, where 0 (zero) and 1 (one) are not present, a decoder may
      interpret 0 as O, and 1 as I or L depending on case.  (However, by
      default it should not; see the previous section.)

   o  Encoded into structures that place other requirements.  For base
      16 and base 32, this determines the use of upper- or lowercase
      alphabets.  For base 64, the non-alphanumeric characters (in
      particular, "/") may be problematic in file names and URLs.

   o  Used as identifiers.  Certain characters, notably "+" and "/" in
      the base 64 alphabet, are treated as word-breaks by legacy text
      search/index tools.

   There is no universally accepted alphabet that fulfills all the
   requirements.  For an example of a highly specialized variant, see
   IMAP [8].  In this document, we document and name some currently used
   alphabets.

4.  Base 64 Encoding

   The following description of base 64 is derived from [3], [4], [5],
   and [6].

   The Base 64 encoding is designed to represent arbitrary sequences of
   octets in a form that requires case sensitivity but need not be
   humanly readable.

   A 65-character subset of US-ASCII is used, enabling 6 bits to be
   represented per printable character.  (The extra 65th character, "=",
   is used to signify a special processing function.)

   The encoding process represents 24-bit groups of input bits as output
   strings of 4 encoded characters.  Proceeding from left to right, a
   24-bit input group is formed by concatenating 3 8-bit input groups.
   These 24 bits are then treated as 4 concatenated 6-bit groups, each
   of which is translated into a single digit in the base 64 alphabet.

   Each 6-bit group is used as an index into an array of 64 printable
   characters.
   The character referenced by the index is placed in the output
   string.

                      Table 1: The Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a quantity.  When fewer than 24 input
   bits are available in an input group, bits with value zero are added
   (on the right) to form an integral number of 6-bit groups.  Padding
   at the end of the data is performed using the "=" character.  Since
   all base 64 input is an integral number of octets, only the following
   cases can arise:

   (1) the final quantum of encoding input is an integral multiple of 24
       bits; here, the final unit of encoded output will be an integral
       multiple of 4 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
       final unit of encoded output will be two characters followed by
       two "=" padding characters, or

   (3) the final quantum of encoding input is exactly 16 bits; here, the
       final unit of encoded output will be three characters followed by
       one "=" padding character.

5.  Base 64 Encoding with URL and Filename Safe Alphabet

   The Base 64 encoding with a URL and filename safe alphabet has been
   used in [10].

   An alternative alphabet has been suggested that used "~" as the 63rd
   character.
   Since the "~" character has special meaning in some file system
   environments, the encoding described in this section is recommended
   instead.

   This encoding should not be regarded as the same as the "base64"
   encoding and should not be referred to as only "base64".  Unless
   clarified otherwise, "base64" refers to the base 64 in the previous
   section.

   This encoding is technically identical to the previous one, except
   for the characters with values 62 and 63, as indicated in Table 2.

          Table 2: The "URL and Filename safe" Base 64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 - (minus)
        12 M            29 d            46 u            63 _
        13 N            30 e            47 v           (underline)
        14 O            31 f            48 w
        15 P            32 g            49 x
        16 Q            33 h            50 y         (pad) =

6.  Base 32 Encoding

   The following description of base 32 is derived from [9] (with
   corrections).

   The Base 32 encoding is designed to represent arbitrary sequences of
   octets in a form that needs to be case insensitive but need not be
   humanly readable.

   A 33-character subset of US-ASCII is used, enabling 5 bits to be
   represented per printable character.  (The extra 33rd character, "=",
   is used to signify a special processing function.)

   The encoding process represents 40-bit groups of input bits as output
   strings of 8 encoded characters.  Proceeding from left to right, a
   40-bit input group is formed by concatenating 5 8-bit input groups.
   These 40 bits are then treated as 8 concatenated 5-bit groups, each
   of which is translated into a single digit in the base 32 alphabet.
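
   The 40-bit grouping described above can be illustrated with the
   following minimal C99 sketch of a base 32 encoder.  It is not part of
   this specification or of the reference code in Section 11; the names
   base32_encode, b32_alphabet, and BASE32_LENGTH are hypothetical.  It
   assumes the alphabet of Table 3 and the most significant bit first
   ordering and padding rules given in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; not this document's reference
   code.  Alphabet taken from Table 3. */
static const char b32_alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/* Encoded length (excluding NUL): every started 5-octet (40-bit)
   group produces 8 output characters. */
#define BASE32_LENGTH(inlen) ((((inlen) + 4) / 5) * 8)

/* Encode IN[0..INLEN) as padded base 32 into OUT, which must have
   room for BASE32_LENGTH (INLEN) + 1 bytes.  Returns the number of
   characters written, excluding the terminating NUL. */
size_t
base32_encode (const unsigned char *in, size_t inlen, char *out)
{
  size_t o = 0;

  assert (in != NULL || inlen == 0);

  while (inlen > 0)
    {
      size_t n = inlen < 5 ? inlen : 5;  /* octets in this group */
      unsigned char block[5] = { 0 };   /* missing octets are zero */
      unsigned long long group = 0;     /* 40 significant bits */

      memcpy (block, in, n);

      /* Concatenate the octets, most significant bit first. */
      for (int i = 0; i < 5; i++)
        group = (group << 8) | block[i];

      /* ceil (8n / 5) characters carry data; the rest are "=". */
      size_t nchars = (8 * n + 4) / 5;

      for (size_t i = 0; i < 8; i++)
        out[o++] = i < nchars
                   ? b32_alphabet[(group >> (35 - 5 * i)) & 0x1f]
                   : '=';

      in += n;
      inlen -= n;
    }

  out[o] = '\0';
  return o;
}
```

   Packing each group of up to 5 octets into one 64-bit integer lets the
   eight 5-bit indices be extracted with plain shifts; for a final group
   of n octets, only the first ceil(8n/5) positions carry data and the
   remaining positions are filled with "=".  Encoding "foobar" this way
   yields "MZXW6YTBOI======", matching the test vectors in Section 10.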

   When encoding a bit stream via the base 32 encoding, the bit stream
   must be presumed to be ordered with the most significant bit first.
   That is, the first bit in the stream will be the high-order bit in
   the first 8-bit byte, the eighth bit will be the low-order bit in the
   first 8-bit byte, and so on.

   Each 5-bit group is used as an index into an array of 32 printable
   characters.  The character referenced by the index is placed in the
   output string.  These characters, identified in Table 3 below, are
   selected from US-ASCII digits and uppercase letters.

                      Table 3: The Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A             9 J            18 S            27 3
         1 B            10 K            19 T            28 4
         2 C            11 L            20 U            29 5
         3 D            12 M            21 V            30 6
         4 E            13 N            22 W            31 7
         5 F            14 O            23 X
         6 G            15 P            24 Y         (pad) =
         7 H            16 Q            25 Z
         8 I            17 R            26 2

   Special processing is performed if fewer than 40 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.  When fewer than 40 input bits
   are available in an input group, bits with value zero are added (on
   the right) to form an integral number of 5-bit groups.  Padding at
   the end of the data is performed using the "=" character.
   Since all base 32 input is an integral number of octets, only the
   following cases can arise:

   (1) the final quantum of encoding input is an integral multiple of 40
       bits; here, the final unit of encoded output will be an integral
       multiple of 8 characters with no "=" padding,

   (2) the final quantum of encoding input is exactly 8 bits; here, the
       final unit of encoded output will be two characters followed by
       six "=" padding characters,

   (3) the final quantum of encoding input is exactly 16 bits; here, the
       final unit of encoded output will be four characters followed by
       four "=" padding characters,

   (4) the final quantum of encoding input is exactly 24 bits; here, the
       final unit of encoded output will be five characters followed by
       three "=" padding characters, or

   (5) the final quantum of encoding input is exactly 32 bits; here, the
       final unit of encoded output will be seven characters followed by
       one "=" padding character.

7.  Base 32 Encoding with Extended Hex Alphabet

   The following description of base 32 is derived from [7].  This
   encoding should not be regarded as the same as the "base32" encoding
   and should not be referred to as only "base32".

   One property of this alphabet, which the base64 and base32 alphabets
   lack, is that encoded data maintains its sort order when the encoded
   data is compared bit-wise.

   This encoding is identical to the previous one, except for the
   alphabet.  The new alphabet is found in Table 4.

               Table 4: The "Extended Hex" Base 32 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             9 9            18 I            27 R
         1 1            10 A            19 J            28 S
         2 2            11 B            20 K            29 T
         3 3            12 C            21 L            30 U
         4 4            13 D            22 M            31 V
         5 5            14 E            23 N
         6 6            15 F            24 O         (pad) =
         7 7            16 G            25 P
         8 8            17 H            26 Q

8.  Base 16 Encoding

   The following description is original but analogous to previous
   descriptions.
   Essentially, Base 16 encoding is the standard case-insensitive hex
   encoding and may be referred to as "base16" or "hex".

   A 16-character subset of US-ASCII is used, enabling 4 bits to be
   represented per printable character.

   The encoding process represents 8-bit groups (octets) of input bits
   as output strings of 2 encoded characters.  Proceeding from left to
   right, an 8-bit input is taken from the input data.  These 8 bits are
   then treated as 2 concatenated 4-bit groups, each of which is
   translated into a single digit in the base 16 alphabet.

   Each 4-bit group is used as an index into an array of 16 printable
   characters.  The character referenced by the index is placed in the
   output string.

                      Table 5: The Base 16 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 0             4 4             8 8            12 C
         1 1             5 5             9 9            13 D
         2 2             6 6            10 A            14 E
         3 3             7 7            11 B            15 F

   Unlike base 32 and base 64, no special padding is necessary since a
   full code word is always available.

9.  Illustrations and examples

   To translate between binary and a base encoding, the input is stored
   in a structure and the output is extracted.  The case for base 64 is
   displayed in the following figure, borrowed from [5].

      +--first octet--+-second octet--+--third octet--+
      |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
      +-----------+---+-------+-------+---+-----------+
      |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
      +--1.index--+--2.index--+--3.index--+--4.index--+

   The case for base 32 is shown in the following figure, borrowed from
   [7].  Each successive character in a base-32 value represents 5
   successive bits of the underlying octet sequence.  Thus, each group
   of 8 characters represents a sequence of 5 octets (40 bits).

                 1          2          3
      01234567 89012345 67890123 45678901 23456789
     +--------+--------+--------+--------+--------+
     |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
     +--------+--------+--------+--------+--------+
                                             <===> 8th character
                                       <====> 7th character
                                  <===> 6th character
                            <====> 5th character
                      <====> 4th character
                 <===> 3rd character
           <====> 2nd character
      <===> 1st character

   The following example of Base64 data is from [5], with corrections.

   Input data:  0x14fb9c03d97e
   Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
   8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
   6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
   Decimal: 5      15     46     28       0      61     37     62
   Output:  F      P      u      c        A      9      l      +

   Input data:  0x14fb9c03d9
   Hex:     1   4    f   b    9   c     | 0   3    d   9
   8-bit:   00010100 11111011 10011100  | 00000011 11011001
                                                   pad with 00
   6-bit:   000101 001111 101110 011100 | 000000 111101 100100
   Decimal: 5      15     46     28       0      61     36
                                                      pad with =
   Output:  F      P      u      c        A      9      k      =

   Input data:  0x14fb9c03
   Hex:     1   4    f   b    9   c     | 0   3
   8-bit:   00010100 11111011 10011100  | 00000011
                                          pad with 0000
   6-bit:   000101 001111 101110 011100 | 000000 110000
   Decimal: 5      15     46     28       0      48
                                               pad with =      =
   Output:  F      P      u      c        A      w      =      =

10.  Test vectors

   BASE64("") = ""

   BASE64("f") = "Zg=="

   BASE64("fo") = "Zm8="

   BASE64("foo") = "Zm9v"

   BASE64("foob") = "Zm9vYg=="

   BASE64("fooba") = "Zm9vYmE="

   BASE64("foobar") = "Zm9vYmFy"

   BASE32("") = ""

   BASE32("f") = "MY======"

   BASE32("fo") = "MZXQ===="

   BASE32("foo") = "MZXW6==="

   BASE32("foob") = "MZXW6YQ="

   BASE32("fooba") = "MZXW6YTB"

   BASE32("foobar") = "MZXW6YTBOI======"

   BASE32-HEX("") = ""

   BASE32-HEX("f") = "CO======"

   BASE32-HEX("fo") = "CPNG===="

   BASE32-HEX("foo") = "CPNMU==="

   BASE32-HEX("foob") = "CPNMUOG="

   BASE32-HEX("fooba") = "CPNMUOJ1"

   BASE32-HEX("foobar") = "CPNMUOJ1E8======"

   BASE16("") = ""

   BASE16("f") = "66"

   BASE16("fo") = "666F"

   BASE16("foo") = "666F6F"

   BASE16("foob") = "666F6F62"

   BASE16("fooba") = "666F6F6261"

   BASE16("foobar") = "666F6F626172"

11.  ISO C99 Implementation of Base64

   Below is an ISO C99 implementation of Base64 encoding and decoding.
   The code assumes that US-ASCII characters are encoded in 'char' with
   values below 256, which holds on all POSIX platforms; apart from that
   assumption, it should be portable.  This code is not intended as a
   normative specification of base64.

11.1.  Prototypes: base64.h

   /* base64.h -- Encode binary data using printable characters.

      Copyright (C) 2004, 2005, 2006 Free Software Foundation, Inc.
      Written by Simon Josefsson.

      This program is free software; you can redistribute it
      and/or modify it under the terms of the GNU Lesser
      General Public License as published by the Free Software
      Foundation; either version 2.1, or (at your option) any
      later version.

      This program is distributed in the hope that it will be
      useful, but WITHOUT ANY WARRANTY; without even the
      implied warranty of MERCHANTABILITY or FITNESS FOR A
      PARTICULAR PURPOSE.  See the GNU Lesser General Public
      License for more details.

      You should have received a copy of the GNU Lesser General
      Public License along with this program; if not, write to
      the Free Software Foundation, Inc., 51 Franklin Street,
      Fifth Floor, Boston, MA 02110-1301, USA.  */

   #ifndef BASE64_H
   # define BASE64_H

   /* Get size_t. */
   # include <stddef.h>

   /* Get bool. */
   # include <stdbool.h>

   /* This relies on the fact that the expression (n+(k-1))/k is
      the smallest integer >= n/k, i.e., the ceiling of n/k.  */
   # define BASE64_LENGTH(inlen) ((((inlen) + 2) / 3) * 4)

   extern bool isbase64 (char ch);

   extern void base64_encode (const char *restrict in,
                              size_t inlen,
                              char *restrict out,
                              size_t outlen);

   extern size_t base64_encode_alloc (const char *in,
                                      size_t inlen,
                                      char **out);

   extern bool base64_decode (const char *restrict in,
                              size_t inlen,
                              char *restrict out,
                              size_t *outlen);

   extern bool base64_decode_alloc (const char *in,
                                    size_t inlen,
                                    char **out,
                                    size_t *outlen);

   #endif /* BASE64_H */

11.2.  Implementation: base64.c

   /* base64.c -- Encode binary data using printable characters.
      Copyright (C) 1999, 2000, 2001, 2004, 2005, 2006 Free Software
      Foundation, Inc.

      This program is free software; you can redistribute it
      and/or modify it under the terms of the GNU Lesser
      General Public License as published by the Free Software
      Foundation; either version 2.1, or (at your option) any
      later version.

      This program is distributed in the hope that it will be
      useful, but WITHOUT ANY WARRANTY; without even the
      implied warranty of MERCHANTABILITY or FITNESS FOR A
      PARTICULAR PURPOSE.  See the GNU Lesser General Public
      License for more details.

      You should have received a copy of the GNU Lesser General
      Public License along with this program; if not, write to
      the Free Software Foundation, Inc., 51 Franklin Street,
      Fifth Floor, Boston, MA 02110-1301, USA.  */

   /* Written by Simon Josefsson.
    * Partially adapted from GNU MailUtils (mailbox/filter_trans.c,
    * as of 2004-11-28).  Improved by review from Paul Eggert, Bruno
    * Haible, and Stepan Kasal.
    *
    * Be careful with error checking.  Here is how you would
    * typically use these functions:
    *
    * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
    * if (!ok)
    *   FAIL: input was not valid base64
    * if (out == NULL)
    *   FAIL: memory allocation error
    * OK: data in OUT/OUTLEN
    *
    * size_t outlen = base64_encode_alloc (in, inlen, &out);
    * if (out == NULL && outlen == 0 && inlen != 0)
    *   FAIL: input too long
    * if (out == NULL)
    *   FAIL: memory allocation error
    * OK: data in OUT/OUTLEN.
    *
    */

   /* Get prototype. */
   #include "base64.h"

   /* Get malloc. */
   #include <stdlib.h>

   /* Get UCHAR_MAX. */
   #include <limits.h>

   /* C89 compliant way to cast 'char' to 'unsigned char'. */
   static inline unsigned char
   to_uchar (char ch)
   {
     return ch;
   }

   /* Base64 encode IN array of size INLEN into OUT array of
      size OUTLEN.  If OUTLEN is less than
      BASE64_LENGTH(INLEN), write as many bytes as possible.
      If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero
      terminate the output buffer. */
   void
   base64_encode (const char *restrict in, size_t inlen,
                  char *restrict out, size_t outlen)
   {
     static const char b64str[64] =
       "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       "abcdefghijklmnopqrstuvwxyz0123456789+/";

     while (inlen && outlen)
       {
         *out++ = b64str[(to_uchar (in[0]) >> 2) & 0x3f];
         if (!--outlen)
           break;
         *out++ = b64str[((to_uchar (in[0]) << 4)
                          + (--inlen ? to_uchar (in[1]) >> 4 : 0))
                         & 0x3f];
         if (!--outlen)
           break;
         *out++ =
           (inlen
            ? b64str[((to_uchar (in[1]) << 2)
                      + (--inlen ? to_uchar (in[2]) >> 6 : 0))
                     & 0x3f]
            : '=');
         if (!--outlen)
           break;
         *out++ = inlen ? b64str[to_uchar (in[2]) & 0x3f] : '=';
         if (!--outlen)
           break;
         if (inlen)
           inlen--;
         if (inlen)
           in += 3;
       }

     if (outlen)
       *out = '\0';
   }

   /* Allocate a buffer and store zero terminated base64
      encoded data from array IN of size INLEN, returning
      BASE64_LENGTH(INLEN), i.e., the length of the encoded
      data, excluding the terminating zero.  On return, the OUT
      variable will hold a pointer to newly allocated memory
      that must be deallocated by the caller.  If output string
      length would overflow, 0 is returned and OUT is set to
      NULL.  If memory allocation fails, OUT is set to NULL, and
      the return value indicates the length of the requested
      memory block, i.e., BASE64_LENGTH(inlen) + 1. */
   size_t
   base64_encode_alloc (const char *in, size_t inlen, char **out)
   {
     size_t outlen = 1 + BASE64_LENGTH (inlen);

     /* Check for overflow in outlen computation.
      *
      * If there is no overflow, outlen >= inlen.
      *
      * If the operation (inlen + 2) overflows then it yields
      * at most +1, so outlen is 0.
      *
      * If the multiplication overflows, we lose at least half
      * of the correct value, so the result is
      * < ((inlen + 2) / 3) * 2, which is less than
      * (inlen + 2) * 0.66667, which is less than inlen as soon
      * as (inlen > 4).
      */
     if (inlen > outlen)
       {
         *out = NULL;
         return 0;
       }

     *out = malloc (outlen);
     if (!*out)
       return outlen;

     base64_encode (in, inlen, *out, outlen);

     return outlen - 1;
   }

   /* With this approach this file works independently of the
      charset used (think EBCDIC).  However, it does assume
      that the characters in the Base64 alphabet (A-Za-z0-9+/)
      are encoded in 0..255.  POSIX 1003.1-2001 requires that
      char and unsigned char are 8-bit quantities, though,
      taking care of that problem.  But this may be a potential
      problem on non-POSIX C99 platforms. */
   #define B64(x) \
     ((x) == 'A' ? 0 \
      : (x) == 'B' ? 1 \
      : (x) == 'C' ? 2 \
      : (x) == 'D' ? 3 \
      : (x) == 'E' ? 4 \
      : (x) == 'F' ? 5 \
      : (x) == 'G' ? 6 \
      : (x) == 'H' ? 7 \
      : (x) == 'I' ? 8 \
      : (x) == 'J' ? 9 \
      : (x) == 'K' ? 10 \
      : (x) == 'L' ? 11 \
      : (x) == 'M' ? 12 \
      : (x) == 'N' ? 13 \
      : (x) == 'O' ? 14 \
      : (x) == 'P' ? 15 \
      : (x) == 'Q' ? 16 \
      : (x) == 'R' ? 17 \
      : (x) == 'S' ? 18 \
      : (x) == 'T' ? 19 \
      : (x) == 'U' ? 20 \
      : (x) == 'V' ? 21 \
      : (x) == 'W' ? 22 \
      : (x) == 'X' ? 23 \
      : (x) == 'Y' ? 24 \
      : (x) == 'Z' ? 25 \
      : (x) == 'a' ? 26 \
      : (x) == 'b' ? 27 \
      : (x) == 'c' ? 28 \
      : (x) == 'd' ? 29 \
      : (x) == 'e' ? 30 \
      : (x) == 'f' ? 31 \
      : (x) == 'g' ? 32 \
      : (x) == 'h' ? 33 \
      : (x) == 'i' ? 34 \
      : (x) == 'j' ? 35 \
      : (x) == 'k' ? 36 \
      : (x) == 'l' ? 37 \
      : (x) == 'm' ? 38 \
      : (x) == 'n' ? 39 \
      : (x) == 'o' ? 40 \
      : (x) == 'p' ? 41 \
      : (x) == 'q' ? 42 \
      : (x) == 'r' ? 43 \
      : (x) == 's' ? 44 \
      : (x) == 't' ? 45 \
      : (x) == 'u' ? 46 \
      : (x) == 'v' ? 47 \
      : (x) == 'w' ? 48 \
      : (x) == 'x' ? 49 \
      : (x) == 'y' ? 50 \
      : (x) == 'z' ? 51 \
      : (x) == '0' ? 52 \
      : (x) == '1' ? 53 \
      : (x) == '2' ? 54 \
      : (x) == '3' ? 55 \
      : (x) == '4' ? 56 \
      : (x) == '5' ? 57 \
      : (x) == '6' ? 58 \
      : (x) == '7' ? 59 \
      : (x) == '8' ? 60 \
      : (x) == '9' ? 61 \
      : (x) == '+' ? 62 \
      : (x) == '/' ? 63 \
      : -1)

   static const signed char b64[0x100] = {
     B64 (0), B64 (1), B64 (2), B64 (3),
     B64 (4), B64 (5), B64 (6), B64 (7),
     B64 (8), B64 (9), B64 (10), B64 (11),
     B64 (12), B64 (13), B64 (14), B64 (15),
     B64 (16), B64 (17), B64 (18), B64 (19),
     B64 (20), B64 (21), B64 (22), B64 (23),
     B64 (24), B64 (25), B64 (26), B64 (27),
     B64 (28), B64 (29), B64 (30), B64 (31),
     B64 (32), B64 (33), B64 (34), B64 (35),
     B64 (36), B64 (37), B64 (38), B64 (39),
     B64 (40), B64 (41), B64 (42), B64 (43),
     B64 (44), B64 (45), B64 (46), B64 (47),
     B64 (48), B64 (49), B64 (50), B64 (51),
     B64 (52), B64 (53), B64 (54), B64 (55),
     B64 (56), B64 (57), B64 (58), B64 (59),
     B64 (60), B64 (61), B64 (62), B64 (63),
     B64 (64), B64 (65), B64 (66), B64 (67),
     B64 (68), B64 (69), B64 (70), B64 (71),
     B64 (72), B64 (73), B64 (74), B64 (75),
     B64 (76), B64 (77), B64 (78), B64 (79),
     B64 (80), B64 (81), B64 (82), B64 (83),
     B64 (84), B64 (85), B64 (86), B64 (87),
     B64 (88), B64 (89), B64 (90), B64 (91),
     B64 (92), B64 (93), B64 (94), B64 (95),
     B64 (96), B64 (97), B64 (98), B64 (99),
     B64 (100), B64 (101), B64 (102), B64 (103),
     B64 (104), B64 (105), B64 (106), B64 (107),
     B64 (108), B64 (109), B64 (110), B64 (111),
     B64 (112), B64 (113), B64 (114), B64 (115),
     B64 (116), B64 (117), B64 (118), B64 (119),
     B64 (120), B64 (121), B64 (122), B64 (123),
     B64 (124), B64 (125), B64 (126), B64 (127),
     B64 (128), B64 (129), B64 (130), B64 (131),
     B64 (132), B64 (133), B64 (134), B64 (135),
     B64 (136), B64 (137), B64 (138), B64 (139),
     B64 (140), B64 (141), B64 (142), B64 (143),
     B64 (144), B64 (145), B64 (146), B64 (147),
     B64 (148), B64 (149), B64 (150), B64 (151),
     B64 (152), B64 (153), B64 (154), B64 (155),
     B64 (156), B64 (157), B64 (158), B64 (159),
     B64 (160), B64 (161), B64 (162), B64 (163),
     B64 (164), B64 (165), B64 (166), B64 (167),
     B64 (168), B64 (169), B64 (170), B64 (171),
     B64 (172), B64 (173), B64 (174), B64 (175),
     B64 (176), B64 (177), B64 (178), B64 (179),
     B64 (180), B64 (181), B64 (182), B64 (183),
     B64 (184), B64 (185), B64 (186), B64 (187),
     B64 (188), B64 (189), B64 (190), B64 (191),
     B64 (192), B64 (193), B64 (194), B64 (195),
     B64 (196), B64 (197), B64 (198), B64 (199),
     B64 (200), B64 (201), B64 (202), B64 (203),
     B64 (204), B64 (205), B64 (206), B64 (207),
     B64 (208), B64 (209), B64 (210), B64 (211),
     B64 (212), B64 (213), B64 (214), B64 (215),
     B64 (216), B64 (217), B64 (218), B64 (219),
     B64 (220), B64 (221), B64 (222), B64 (223),
     B64 (224), B64 (225), B64 (226), B64 (227),
     B64 (228), B64 (229), B64 (230), B64 (231),
     B64 (232), B64 (233), B64 (234), B64 (235),
     B64 (236), B64 (237), B64 (238), B64 (239),
     B64 (240), B64 (241), B64 (242), B64 (243),
     B64 (244), B64 (245), B64 (246), B64 (247),
     B64 (248), B64 (249), B64 (250), B64 (251),
     B64 (252), B64 (253), B64 (254), B64 (255)
   };

   #if UCHAR_MAX == 255
   # define uchar_in_range(c) true
   #else
   # define uchar_in_range(c) ((c) <= 255)
   #endif

   bool
   isbase64 (char ch)
   {
     return uchar_in_range (to_uchar (ch)) && 0 <= b64[to_uchar (ch)];
   }

   /* Decode base64 encoded input array IN of length INLEN to
      output array OUT that can hold *OUTLEN bytes.  Return
      true if decoding was successful, i.e., if the input was
      valid base64 data, false otherwise.  If *OUTLEN is too
      small, as many bytes as possible will be written to OUT.
      On return, *OUTLEN holds the length of decoded bytes in
      OUT.  Note that as soon as any non-alphabet characters
      are encountered, decoding is stopped and false is
      returned.
*/ 906 bool 907 base64_decode (const char *restrict in, size_t inlen, 908 char *restrict out, size_t *outlen) 909 { 910 size_t outleft = *outlen; 912 while (inlen >= 2) 913 { 914 if (!isbase64 (in[0]) || !isbase64 (in[1])) 915 break; 917 if (outleft) 918 { 919 *out++ = ((b64[to_uchar (in[0])] << 2) 920 | (b64[to_uchar (in[1])] >> 4)); 921 outleft--; 922 } 924 if (inlen == 2) 925 break; 927 if (in[2] == '=') 928 { 929 if (inlen != 4) 930 break; 932 if (in[3] != '=') 933 break; 935 } 936 else 937 { 938 if (!isbase64 (in[2])) 939 break; 941 if (outleft) 942 { 943 *out++ = (((b64[to_uchar (in[1])] << 4) & 0xf0) 944 | (b64[to_uchar (in[2])] >> 2)); 945 outleft--; 946 } 948 if (inlen == 3) 949 break; 951 if (in[3] == '=') 952 { 953 if (inlen != 4) 954 break; 955 } 956 else 957 { 958 if (!isbase64 (in[3])) 959 break; 961 if (outleft) 962 { 963 *out++ = (((b64[to_uchar (in[2])] << 6) & 0xc0) 964 | b64[to_uchar (in[3])]); 965 outleft--; 966 } 967 } 968 } 970 in += 4; 971 inlen -= 4; 972 } 974 *outlen -= outleft; 975 if (inlen != 0) 976 return false; 978 return true; 979 } 981 /* Allocate an output buffer in *OUT, and decode the base64 982 encoded data stored in IN of size INLEN to the *OUT 983 buffer. On return, the size of the decoded data is 984 stored in *OUTLEN. OUTLEN may be NULL, if the caller is 985 not interested in the decoded length. *OUT may be NULL 986 to indicate an out of memory error, in which case *OUTLEN 987 contain the size of the memory block needed. The 988 function return true on successful decoding and memory 989 allocation errors. (Use the *OUT and *OUTLEN parameters 990 to differentiate between successful decoding and memory 991 error.) The function return false if the input was 992 invalid, in which case *OUT is NULL and *OUTLEN is 993 undefined. 
*/ 994 bool 995 base64_decode_alloc (const char *in, size_t inlen, char **out, 996 size_t *outlen) 997 { 998 /* This may allocate a few bytes too much, depending on 999 input, but it's not worth the extra CPU time to compute 1000 the exact amount. The exact amount is 3 * inlen / 4, 1001 minus 1 if the input ends with "=" and minus another 1 1002 if the input ends with "==". Dividing before 1003 multiplying avoids the possibility of overflow. */ 1004 size_t needlen = 3 * (inlen / 4) + 2; 1006 *out = malloc (needlen); 1007 if (!*out) 1008 return true; 1010 if (!base64_decode (in, inlen, *out, &needlen)) 1011 { 1012 free (*out); 1013 *out = NULL; 1014 return false; 1015 } 1017 if (outlen) 1018 *outlen = needlen; 1020 return true; 1021 } 1023 12. Security Considerations 1025 When implementing Base encoding and decoding, care should be taken 1026 not to introduce vulnerabilities to buffer overflow attacks, or other 1027 attacks on the implementation. A decoder should not break on invalid 1028 input including, e.g., embedded NUL characters (ASCII 0). 1030 If non-alphabet characters are ignored, instead of causing rejection 1031 of the entire encoding (as recommended), a covert channel that can be 1032 used to "leak" information is made possible. The implications of 1033 this should be understood in applications that do not follow the 1034 recommended practice. Similarly, when the base 16 and base 32 1035 alphabets are handled case insensitively, alteration of case can be 1036 used to leak information. 1038 Base encoding visually hides otherwise easily recognized information, 1039 such as passwords, but does not provide any computational 1040 confidentiality. This has been known to cause security incidents 1041 when, e.g., a user reports details of a network protocol exchange 1042 (perhaps to illustrate some other problem) and accidentally reveals 1043 the password because she is unaware that the base encoding does not 1044 protect the password. 
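   The decoding recommendations above can be illustrated with a small,
   self-contained sketch in C.  It is not the implementation given
   earlier in this document, and the names "strict_b64_decode" and
   "sextet" are hypothetical, chosen only for this example.  The
   decoder takes an explicit length, so an embedded NUL is treated as
   an ordinary invalid byte rather than as the end of the input, and it
   rejects any character outside the alphabet (line breaks included)
   instead of skipping it, avoiding the covert channel described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The base64 alphabet in value order ('A' is 0, '/' is 63). */
static const char alphabet[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: return the 6-bit value of C, or -1 if C is not
   an alphabet character.  The explicit NUL test matters because strchr
   would otherwise "find" '\0' at the end of ALPHABET.  A string lookup,
   unlike 'A' <= c && c <= 'Z', does not assume contiguous letters, so
   it also works with EBCDIC. */
static int
sextet (char c)
{
  const char *p;

  if (c == '\0')
    return -1;
  p = strchr (alphabet, c);
  return p != NULL ? (int) (p - alphabet) : -1;
}

/* Hypothetical strict decoder.  IN holds INLEN bytes (not a C string,
   so embedded NULs are seen and rejected); OUT must have room for
   3 * (INLEN / 4) bytes.  Returns the number of bytes written, or -1
   if INLEN is not a multiple of 4, if '=' appears anywhere but the
   end, or if any other non-alphabet character is present. */
static long
strict_b64_decode (const char *in, size_t inlen, unsigned char *out)
{
  long n = 0;
  size_t i;

  assert (in != NULL && out != NULL);
  if (inlen % 4 != 0)
    return -1;

  for (i = 0; i < inlen; i += 4)
    {
      bool last = (i + 4 == inlen);
      int a = sextet (in[i]);
      int b = sextet (in[i + 1]);
      int c, d;

      if (a < 0 || b < 0)
        return -1;
      out[n++] = (unsigned char) ((a << 2) | (b >> 4));

      /* "xx==" is only allowed as the final quantum. */
      if (last && in[i + 2] == '=')
        return in[i + 3] == '=' ? n : -1;
      if ((c = sextet (in[i + 2])) < 0)
        return -1;
      out[n++] = (unsigned char) (((b << 4) & 0xf0) | (c >> 2));

      /* "xxx=" is likewise only allowed at the end. */
      if (last && in[i + 3] == '=')
        return n;
      if ((d = sextet (in[i + 3])) < 0)
        return -1;
      out[n++] = (unsigned char) (((c << 6) & 0xc0) | d);
    }
  return n;
}
```

   With this sketch, decoding "cGFzc3dvcmQ=" (12 characters) returns 8
   and leaves "password" in the output buffer, which shows how little
   protection base encoding gives a password; the same data with a NUL
   or a line feed spliced in returns -1.  The sketch does not require
   the unused low-order bits of the final character to be zero ("Zm8="
   and "Zm9=" both decode to "fo"), so those bits could still carry a
   covert channel; a stricter decoder would reject such input as well.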

   Base encoding adds no entropy to the plaintext, but it does increase
   the amount of plaintext available and provides a signature for
   cryptanalysis in the form of a characteristic probability
   distribution.

13.  Changes since RFC 3548

   Add the "base32 extended hex alphabet", needed to preserve sort
   order of encoded data.

   Reference IMAP for the special Base64 encoding used there.

   Fix the example copied from RFC 2440.

   Add security consideration about providing a signature for
   cryptanalysis.

   Add test vectors and C99 implementation.

   Typo fixes.

14.  Acknowledgements

   Several people offered comments and/or suggestions, including John E.
   Hadstate, Tony Hansen, Gordon Mohr, John Myers, Chris Newman and
   Andrew Sieber.  Text used in this document is based on earlier RFCs
   describing specific uses of various base encodings.  The author
   acknowledges RSA Laboratories for supporting the work that led to
   this document.

   This revised version is based in part on comments and/or suggestions
   made by Roy Arends, Ted Hardie, Per Hygum, Jelte Jansen, Clement
   Kent, Paul Kwiatkowski, and Ben Laurie.

15.  Copying conditions

   Regarding the abstract and sections 1, 3, 8, 10, 12, 13, and 14 of
   this document, which were written by Simon Josefsson ("the author",
   for the remainder of this section), the author makes no guarantees
   and is not responsible for any damage resulting from its use.  The
   author grants irrevocable permission to anyone to use, modify, and
   distribute it in any way that does not diminish the rights of anyone
   else to use, modify, and distribute it, provided that redistributed
   derivative works do not contain misleading author or version
   information.  Derivative works need not be licensed under similar
   terms.

16.  References

16.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

16.2.  Informative References

   [2]   Cerf, V., "ASCII format for network interchange", RFC 20,
         October 1969.

   [3]   Linn, J., "Privacy Enhancement for Internet Electronic Mail:
         Part I: Message Encryption and Authentication Procedures",
         RFC 1421, February 1993.

   [4]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [5]   Callas, J., Donnerhacke, L., Finney, H., and R. Thayer,
         "OpenPGP Message Format", RFC 2440, November 1998.

   [6]   Eastlake, D., "Domain Name System Security Extensions",
         RFC 2535, March 1999.

   [7]   Klyne, G. and L. Masinter, "Identifying Composite Media
         Features", RFC 2938, September 2000.

   [8]   Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
         4rev1", RFC 3501, March 2003.

   [9]   Myers, J., "SASL GSSAPI mechanisms", Work in Progress,
         draft-ietf-cat-sasl-gssapi-01, May 2000.

   [10]  Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World
         Wide Web
         http://zgp.org/pipermail/p2p-hackers/2001-September/000315.html,
         September 2001.

Author's Address

   Simon Josefsson

   Email: simon@josefsson.org

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.