CBOR Working Group                                         M. Richardson
Internet-Draft                                  Sandelman Software Works
Intended status: Standards Track                              C. Bormann
Expires: 6 November 2022                          Universität Bremen TZI
                                                              5 May 2022


            On storing CBOR encoded items on stable storage
                     draft-ietf-cbor-file-magic-12

Abstract

   This document defines a stored ("file") format for CBOR data items that is friendly to common file type recognition systems such as the Unix file(1) command.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-cbor-file-magic/.

   Discussion of this document takes place on the cbor Working Group mailing list (mailto:cbor@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/cbor/.

   Source for this draft and an issue tracker can be found at https://github.com/cbor-wg/cbor-magic-number.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 6 November 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Requirements for a Magic Number
   2.  Protocol
     2.1.  The CBOR Protocol Specific Tag
     2.2.  Enveloping Method: CBOR Tag Wrapped
       2.2.1.  Example
     2.3.  Enveloping Method: Labeled CBOR Sequence
       2.3.1.  Example
   3.  Security Considerations
   4.  IANA Considerations
     4.1.  Labeled CBOR Sequence Tag
     4.2.  CBOR-Labeled Non-CBOR Data Tag
     4.3.  CBOR Tags for CoAP Content-Format Numbers
   5.  References
     5.1.  Normative References
     5.2.  Informative References
   Appendix A.  Advice to Protocol Designer
     A.1.  Is the on-wire format new?
     A.2.  Can many items be trivially concatenated?
     A.3.  Are there tags at the start?
   Appendix B.  CBOR Tags for CoAP Content Formats
     B.1.  Content-Format Tag Examples
   Appendix C.  Example from Openswan
   Appendix D.  Using CBOR Labels for non-CBOR data
     D.1.  Content-Format Tag Examples
   Appendix E.  Changelog
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   Since very early in computing, operating systems have sought ways to mark which files could be processed by which programs. In Unix, everything is a stream of bytes; identifying the contents of a stream of bytes became a heuristic activity.

   For instance, the Unix file(1) command, which has existed since 1973 [file], has for decades been able to identify many file formats based upon the contents of the file.

   Many systems (Linux, macOS, Windows) will select the correct application based upon the file contents, if the system cannot determine it by other means. For instance, in classic Mac OS, a resource fork that included file type information was maintained separately from the file data; this way, the OS ideally never needed to know anything about the file data contents to determine the media type.

   Many other systems do this by file extensions. Many common web servers derive the media-type information from file extensions.

   Having a media type associated with the file contents can avoid some of the brittleness of this approach. When files become disconnected from their type information, such as when attempting to do forensics on a damaged system, being able to identify the type of information that is stored in a file can become very important.

   A common way to identify the type of a file from its contents is to place a "magic number" at the start of the file contents [MAGIC].
   The media type registration template [RFC6838] likewise asks for a magic number, if available, as well as a file extension.

   A frequent challenge for the file(1) command is that it can confuse the encoding with the content. For instance, an Android "apk" (as used to transfer and store an application) may be identified as a ZIP file. Additionally, both OpenOffice and MSOffice files are ZIP files of XML files, and may also be identified as ZIP files.

   As CBOR becomes a more and more common encoding for a wide variety of artifacts, identifying them as just "CBOR" is probably not sufficient. This document provides a way to encode a magic number into the beginning of a CBOR format file. As a CBOR format may use a single CBOR data item or a CBOR sequence of data items [RFC8742], two possible methods of enveloping data are presented; a CBOR Protocol designer will specify one. (A CBOR Protocol is a specification that uses CBOR as its encoding.) This document also gives advice to designers of CBOR Protocols on choosing one of these mechanisms for identifying their contents. This advice is informative.

   A third method is also proposed, in which the same kind of prepended CBOR tag is used to identify non-CBOR files. This third method has been placed in Appendix D because it is not about identifying media types containing CBOR-encoded data items. Appendix D also includes a simple way to derive a magic number for Content-Formats as defined by [RFC7252], even for representations that are not in CBOR form.

   Examples of CBOR Protocols currently under development include Concise Software Identification Tags (CoSWID, [I-D.ietf-sacm-coswid]) and Entity Attestation Tokens (EAT, [I-D.ietf-rats-eat]). COSE itself [RFC8152] is considered infrastructure. The encoding of public keys in CBOR as described in [I-D.ietf-cose-cbor-encoded-cert] as _C509_ would benefit from being an identified CBOR Protocol.

   A major inspiration for this document is observing the disarray in certain ASN.1 based systems where most files are PEM encoded; these are then all identified by the extension "pem", confusing public keys, private keys, certificate requests, and S/MIME content.

   While the envelopes defined in this specification add information to how data conforming to CBOR Protocols are stored in files, there is no requirement that either type of envelope be transferred on the wire. However, there are some protocols which may benefit from having such a magic number on the wire if they are presently using a different (legacy) encoding scheme. The presence of the identifiable magic sequence can be used to signal that a CBOR Protocol is being used as opposed to a legacy scheme.

1.1.  Terminology

   Byte is a synonym for octet. The term "byte string" refers to the data item defined in [STD94].

   The term "file" is understood to stand in a general way for a stored representation that is somewhat detached from the original context of usage of that representation; its usage in this document encompasses similar units of storage that may have different identification schemes such as partitions or media blocks.

   The term "diagnostic notation" refers to the human-readable notation for CBOR data items defined in Section 8 of [STD94] and Appendix G of [RFC8610].

   The term CDDL (Concise Data Definition Language) refers to the language defined in [RFC8610].

   The function TN(ct) is defined in Appendix B.

1.2.  Requirements for a Magic Number

   A magic number is ideally a fingerprint that is unique to a specific CBOR protocol, present in the first few (small multiple of 4) bytes of the file, which does not change when the contents change, and does not depend upon the length of the file.

   Less ideal solutions have a pattern that needs to be matched, but in which some bytes need to be ignored. While the Unix file(1) command can be told to ignore certain bytes, this can lead to ambiguities.

2.  Protocol

   This Section presents two enveloping methods. Both use CBOR Tags in a way that results in a deterministic first 8 to 12 bytes. Which one is to be used is up to the CBOR Protocol designer to determine; see Appendix A for some guidance.

2.1.  The CBOR Protocol Specific Tag

   In both enveloping methods, CBOR Protocol designers need to obtain a CBOR tag for each kind of object that they might store in files. As there are more than 4 billion available 4-byte tags, there should be little issue in allocating a few to each available CBOR Protocol.

   The IANA policy for 4-byte CBOR Tags is First Come First Served, so all that is required is a simple interaction (e.g., via web or email) with IANA, having filled in the small template provided in Section 9.2 of [STD94]. In the template, it is suggested to include a reference to this specification (RFC XXXX) alongside the Description of semantics.

   // (Note to RFC Editor: Please replace all occurrences of "RFC XXXX"
   // with the RFC number of the present specification and remove this
   // note.)

   Allocation of the CBOR tag needs to be initiated by the designer of the CBOR Protocol, who can provide a proposed tag number. In order to be in the four-byte range, and so that there are no leading zero bytes in the four-byte encoding of the tag number, the value needs to be in the range 0x01000000 (decimal 16777216) to 0xFFFFFFFF (decimal 4294967295) inclusive. It is further suggested to avoid values that have an embedded zero byte in the four bytes of their binary representation (such as 0x12003456), as these may confuse implementations that treat the magic number as a C string.
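
   The range and zero-byte suggestions above, and the mnemonic spelling of a tag number from four US-ASCII characters, can be checked mechanically. The following Python sketch is illustrative only: the helper names mnemonic_tag and is_suggested_tag are hypothetical and not defined by this specification, and passing the check says nothing about whether IANA has already registered a given number.

```python
def mnemonic_tag(name: str) -> int:
    """Read four US-ASCII characters as a big-endian 32-bit tag number."""
    raw = name.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    if len(raw) != 4:
        raise ValueError("a mnemonic must be exactly four ASCII characters")
    return int.from_bytes(raw, "big")


def is_suggested_tag(n: int) -> bool:
    """Check a proposed tag number against the suggestions of Section 2.1.

    The value must lie in 0x01000000..0xFFFFFFFF (four-byte encoding with
    no leading zero byte) and should contain no zero byte at all.
    """
    if not 0x01000000 <= n <= 0xFFFFFFFF:
        return False
    return 0 not in n.to_bytes(4, "big")


print(mnemonic_tag("OPSN"))                    # 1330664270
print(is_suggested_tag(0x12003456))            # False: embedded zero byte
print(is_suggested_tag(mnemonic_tag("OPSN")))  # True
```

   Note that the lower bound 0x01000000 itself fails the stricter check, since its last three bytes are zero.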

   The use of a sequence of four US-ASCII [RFC20] codes that are mnemonic to the protocol is encouraged, but not required (there may be reasons to encode other information into the tag; see Appendix B for an example). For instance, Appendix C uses "OPSN", which translates to the tag number 1330664270 registered for it.

   For CBOR data items that form a representation that is described by a CoAP Content-Format Number (Section 12.3 of [RFC7252], Registry CoAP Content-Formats of [IANA.core-parameters]), a tag number has proactively been allocated in Section 4.3 (see Appendix B for details and examples).

2.2.  Enveloping Method: CBOR Tag Wrapped

   The CBOR Tag Wrapped method is appropriate for use with CBOR protocols that encode a single CBOR data item. This data item is enveloped into two nested tags:

   The outer tag is a Self-described CBOR tag, 55799, as described in Section 3.4.6 of [STD94].

   The tag content of the outer tag is a second CBOR tag whose tag number has been allocated to describe the specific Protocol involved, as discussed in Section 2.1. The tag content of this inner tag is the single CBOR data item.

   This method wraps the CBOR data item as CBOR tags usually do. Applications that need to send the stored CBOR data item across a constrained network may wish to remove the two tags if the type is understood from the protocol context, e.g., from a CoAP Content-Format Option (Section 5.10.3 of [RFC7252]). A CBOR Protocol specification may therefore pick the specific cases where the CBOR Tag Wrapped enveloping method is to be used. For instance, it might specify its use for storing the representation in a local file or for Web access, but not within protocol messages that already provide the necessary context.

2.2.1.  Example

   To construct an example without registering a new tag, we use the Content-Format number registered in [RFC8428] for application/senml+cbor (as per Registry Content-Formats of [IANA.core-parameters]), the number 112.

   Using the technique described in Appendix B, this translates into the tag TN(112) = 1668546929.

   With this tag, the SenML-CBOR pack [{0: "current", 6: 3, 2: 1.5}] would be enveloped as (in diagnostic notation):

      55799(1668546929([{0: "current", 6: 3, 2: 1.5}]))

   Or in hex:

      d9 d9f7                      # tag(55799)
         da 63740171               # tag(1668546929)
            81                     # array(1)
               a3                  # map(3)
                  00               # unsigned(0)
                  67               # text(7)
                     63757272656e74 # "current"
                  06               # unsigned(6)
                  03               # unsigned(3)
                  02               # unsigned(2)
                  f9 3e00          # float(1.5), half precision

   At the representation level, the unique fingerprint for application/senml+cbor is composed of the 8 bytes d9d9f7da63740171 hex, after which the unadorned CBOR data (81... for the SenML data) is appended.

2.3.  Enveloping Method: Labeled CBOR Sequence

   The Labeled CBOR Sequence method is appropriate for use with CBOR Sequences as described in [RFC8742].

   This method prepends a newly constructed, separate data item to the CBOR Sequence: the _label_.

   The label is a nesting of two tags, similar to but distinct from the CBOR Tag Wrapped method, with an inner tag content of a constant byte string. The total length of the label is 12 bytes.

   1.  The outer tag is the self-described CBOR Sequence tag, 55800.

   2.  The inner tag is a CBOR tag, from the First Come First Served space, that uniquely identifies the CBOR Protocol. As with CBOR Tag Wrapped, the use of a four-byte tag that encodes without zero bytes is encouraged.

   3.  The tag content is a three-byte CBOR byte string containing 0x42_4f_52 ('BOR' in diagnostic notation).
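
   The 8-byte prefix of the CBOR Tag Wrapped example in Section 2.2.1 and the 12-byte label described in the list above can be reproduced with a short Python sketch. It is illustrative only: the helper names tn, tag_head, tag_wrapped_prefix, and sequence_label are hypothetical, and tag_head is a minimal encoder for CBOR tag heads, not a general CBOR library. The sketch assumes the TN(ct) formula of Appendix B with the base value 0x63740101, the first tag number of the range allocated in Section 4.3.

```python
def tn(ct: int) -> int:
    """Tag number for CoAP Content-Format ct, per the TN(ct) formula."""
    if not 0 <= ct < 65025:
        raise ValueError("no tag number is assigned for this Content-Format")
    # Python's // and % match C's / and % for non-negative operands.
    return 0x63740101 + (ct // 255) * 256 + ct % 255


def tag_head(n: int) -> bytes:
    """Shortest encoding of a CBOR tag head (major type 6) for tag number n."""
    if n < 24:
        return bytes([0xC0 | n])
    if n < 0x100:
        return bytes([0xD8, n])
    if n < 0x10000:
        return b"\xd9" + n.to_bytes(2, "big")
    if n < 0x100000000:
        return b"\xda" + n.to_bytes(4, "big")
    return b"\xdb" + n.to_bytes(8, "big")


def tag_wrapped_prefix(protocol_tag: int) -> bytes:
    """Bytes that precede the data item in the CBOR Tag Wrapped method."""
    return tag_head(55799) + tag_head(protocol_tag)


def sequence_label(protocol_tag: int) -> bytes:
    """The complete label data item of the Labeled CBOR Sequence method."""
    # 0x43 is the head of a three-byte byte string; b"BOR" is its content.
    return tag_head(55800) + tag_head(protocol_tag) + b"\x43BOR"


print(tag_wrapped_prefix(tn(112)).hex())  # d9d9f7da63740171
print(sequence_label(tn(272)).hex())      # d9d9f8da6374021243424f52
```

   The prefix is eight bytes, and the label twelve, only when the protocol tag lies in the four-byte range recommended in Section 2.1; a shorter tag number would produce a shorter head.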

   The outer tag in the label identifies the file as being a CBOR Sequence, and does so with all the desirable properties explained in Section 3.4.6 of [STD94]. Specifically, it does not appear to conflict with any known file types, and it is not valid Unicode in any Unicode encoding.

   The inner tag in the label identifies which CBOR Protocol is used, as described above.

   The inner tag content is a constant byte string which is represented as 0x43_42_4f_52, the ASCII characters "CBOR", which is the CBOR encoded data item for the three-byte string 0x42_4f_52 ('BOR' in diagnostic notation).

   The actual CBOR Protocol data then follow as the next data item(s) in the CBOR Sequence, without a need for any further specific tag. The use of a CBOR Sequence allows the application to trivially remove the first item with the two tags.

   Should this file be reviewed by a human (directly in an editor, or in a hexdump display), it will include the ASCII characters "CBOR" prominently. This value is also included simply because the inner nested tag needs to tag something.

2.3.1.  Example

   To construct an example without registering a new tag, we use the Content-Format number registered in [RFC9177] for application/missing-blocks+cbor-seq (as per Registry Content-Formats of [IANA.core-parameters]), the number 272.

   Using the technique described in Appendix B, this translates into the tag TN(272) = 1668547090.

   This is a somewhat contrived example, as this is not a media type that is likely to be committed to storage.
   Nonetheless, with this tag, the missing blocks list 0, 8, 15 would be enveloped as (in diagnostic notation):

      55800(1668547090('BOR')),
      0,
      8,
      15

   Or in hex:

      # CBOR sequence with 4 elements
      d9 d9f8                 # tag(55800)
         da 63740212          # tag(1668547090)
            43                # bytes(3)
               424f52         # "BOR"
      00                      # unsigned(0)
      08                      # unsigned(8)
      0f                      # unsigned(15)

   At the representation level, the unique fingerprint for application/missing-blocks+cbor-seq is composed of the 8 bytes d9d9f8da63740212 hex; together with the byte string 43424f52 hex, these form the 12-byte label, after which the unadorned CBOR sequence (00... for the missing block list given) is appended.

3.  Security Considerations

   This document provides a way to identify CBOR Protocol objects. Clearly identifying CBOR contents in files may have a variety of impacts.

   The most obvious is that it may allow malware to identify interesting stored objects, and then exfiltrate or corrupt them.

   Protective applications (that check data) cannot rely on the applications they try to protect (that use the data) to make exactly the same decisions in recognizing file formats. (This is an instance of a check vs. use issue.) For example, end-point assessment technologies should not solely rely on the labeling approaches described in this document to decide whether to inspect a given file. Similarly, depending on operating system configurations and related properties of the execution environment, the labeling might influence the default application used to process a file in a way that may not be predicted by a protective application.

4.  IANA Considerations

   These IANA considerations are entirely about CBOR Tags, in the registry CBOR Tags of [IANA.cbor-tags].

   Section 4.1 documents the allocation that was done for a CBOR tag to be used in a CBOR sequence to identify the sequence (an example for using this tag is found in Appendix C).
   Section 4.2 requests a tag for labeling non-CBOR data (Appendix D). Section 4.3 allocates a CBOR tag for each actual or potential CoAP Content-Format number (examples are in Appendix B).

4.1.  Labeled CBOR Sequence Tag

   IANA has allocated tag 55800 as the tag for the Labeled CBOR Sequence Enveloping Method from the CBOR Tags Registry. IANA is asked to update this tag registration to point to this document.

   This tag is from the First Come/First Served area.

   The value has been picked to have properties similar to the 55799 tag (Section 3.4.6 of [STD94]).

   The hexadecimal representation of the encoded tag head is: 0xd9_d9_f8.

   This is not valid UTF-8: the first 0xd9 introduces a two-byte sequence in UTF-8, but the 0xd9 as the second value is not a valid second byte for UTF-8.

   This is not valid UTF-16: the byte sequence 0xd9d9 (in either endian order) puts this value into the UTF-16 high-half zone, which would signal the first half of a surrogate pair (i.e., a 32-bit Unicode value). However, the following 16-bit big-endian value 0xf8.. is not a valid second half according to [RFC2781]. When the bytes are read as little-endian UTF-16, it would be necessary to examine the fourth byte to determine if they are valid. That next byte is determined by the subsequent encoding, and Section 3.4.6 of [STD94] has already determined that no valid CBOR encodings result in valid UTF-16.

   Data Item:
      tagged byte string

   Semantics:
      indicates that the file contains CBOR Sequences

4.2.  CBOR-Labeled Non-CBOR Data Tag

   IANA is requested to allocate tag 55801 as the tag for the CBOR-Labeled Non-CBOR Data Enveloping Method (Appendix D) from the CBOR Tags Registry. IANA is asked to update this tag registration to point to this document.

   This tag is from the First Come/First Served area.

   The value has been picked to have properties similar to the 55799 tag (Section 3.4.6 of [STD94]).

   The hexadecimal representation of the encoded tag head is: 0xd9_d9_f9.
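
   The UTF-8 and UTF-16 claims made for 0xd9_d9_f8 in Section 4.1 carry over to 0xd9_d9_f9, and both can be checked with Python's built-in strict codecs. In this illustrative sketch, decodes_as and text_check are hypothetical helper names; for UTF-16, every possible fourth byte is tried, since a code unit needs two bytes.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data is valid text in the given codec (strict mode)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def text_check(head: bytes):
    """Check a three-byte tag head against UTF-8 and both UTF-16 byte orders.

    Returns (utf8_ok, utf16be_ok_for_some_fourth_byte,
             fourth_bytes_that_make_utf16le_valid).
    """
    candidates = [head + bytes([b]) for b in range(256)]
    return (
        decodes_as(head, "utf-8"),
        any(decodes_as(c, "utf-16-be") for c in candidates),
        [c[3] for c in candidates if decodes_as(c, "utf-16-le")],
    )


for head in (bytes.fromhex("d9d9f8"), bytes.fromhex("d9d9f9")):
    utf8_ok, be_ok, le_bytes = text_check(head)
    print(head.hex(), utf8_ok, be_ok, [hex(b) for b in le_bytes])
```

   For both heads this reports False for UTF-8 and for big-endian UTF-16. In little-endian UTF-16, only a fourth byte of 0xdc to 0xdf completes a surrogate pair; such a byte would start a major type 6 item with additional information 28 to 31, which is not well-formed CBOR.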

   This is not valid UTF-8: the first 0xd9 introduces a two-byte sequence in UTF-8, but the 0xd9 as the second value is not a valid second byte for UTF-8.

   This is not valid UTF-16: the byte sequence 0xd9d9 (in either endian order) puts this value into the UTF-16 high-half zone, which would signal the first half of a surrogate pair (i.e., a 32-bit Unicode value). However, the following 16-bit big-endian value 0xf9.. is not a valid second half according to [RFC2781]. When the bytes are read as little-endian UTF-16, it would be necessary to examine the fourth byte to determine if they are valid. That next byte is determined by the subsequent encoding, and Section 3.4.6 of [STD94] has already determined that no valid CBOR encodings result in valid UTF-16.

   Data Item:
      tagged byte string

   Semantics:
      indicates that the file starts with a CBOR-Labeled Non-CBOR Data label.

4.3.  CBOR Tags for CoAP Content-Format Numbers

   IANA is requested to allocate the tag numbers 1668546817 (0x63740101) to 1668612095 (0x6374ffff) as follows:

   Data Item:
      byte string or any CBOR data item (see Appendix B of RFC XXXX)

   Semantics:
      the representation of content-format ct < 65025 is indicated by tag number
      TN(ct) = 0x63740101 + (ct / 255) * 256 + ct % 255

   Reference:
      RFC XXXX

   The Registry for Content-Formats of [IANA.core-parameters] has been defined in Section 12.3 of [RFC7252].

5.  References

5.1.  Normative References

   [C]        International Organization for Standardization, "Information technology — Programming languages — C", ISO/IEC 9899:2018, Fourth Edition, June 2018.

   [RFC8742]  Bormann, C., "Concise Binary Object Representation (CBOR) Sequences", RFC 8742, DOI 10.17487/RFC8742, February 2020.

   [STD94]    Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, December 2020.

5.2.  Informative References

   [file]     Wikipedia, "file (command)", 20 January 2021.

   [I-D.ietf-cose-cbor-encoded-cert]
              Mattsson, J. P., Selander, G., Raza, S., Höglund, J., and M. Furuhed, "CBOR Encoded X.509 Certificates (C509 Certificates)", Work in Progress, Internet-Draft, draft-ietf-cose-cbor-encoded-cert-03, 10 January 2022.

   [I-D.ietf-rats-eat]
              Lundblade, L., Mandyam, G., and J. O'Donoghue, "The Entity Attestation Token (EAT)", Work in Progress, Internet-Draft, draft-ietf-rats-eat-12, 24 February 2022.

   [I-D.ietf-sacm-coswid]
              Birkholz, H., Fitzgerald-McKay, J., Schmidt, C., and D. Waltermire, "Concise Software Identification Tags", Work in Progress, Internet-Draft, draft-ietf-sacm-coswid-21, 7 March 2022.

   [IANA.cbor-tags]
              IANA, "Concise Binary Object Representation (CBOR) Tags".

   [IANA.core-parameters]
              IANA, "Constrained RESTful Environments (CoRE) Parameters".

   [MAGIC]    Ritchie, D., "archive (library) file format", in Bell Labs, Unix Programmer's Manual, First Edition: File Formats, 3 November 1971.

   [RFC20]    Cerf, V., "ASCII format for network interchange", STD 80, RFC 20, DOI 10.17487/RFC0020, October 1969.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646", RFC 2781, DOI 10.17487/RFC2781, February 2000.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, DOI 10.17487/RFC6838, January 2013.

   [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained Application Protocol (CoAP)", RFC 7252, DOI 10.17487/RFC7252, June 2014.

   [RFC8017]  Moriarty, K., Ed., Kaliski, B., Jonsson, J., and A. Rusch, "PKCS #1: RSA Cryptography Specifications Version 2.2", RFC 8017, DOI 10.17487/RFC8017, November 2016.

   [RFC8152]  Schaad, J., "CBOR Object Signing and Encryption (COSE)", RFC 8152, DOI 10.17487/RFC8152, July 2017.

   [RFC8428]  Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C. Bormann, "Sensor Measurement Lists (SenML)", RFC 8428, DOI 10.17487/RFC8428, August 2018.

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, June 2019.

   [RFC9177]  Boucadair, M. and J. Shallow, "Constrained Application Protocol (CoAP) Block-Wise Transfer Options Supporting Robust Transmission", RFC 9177, DOI 10.17487/RFC9177, March 2022.

   [X.690]    ITU-T, "Information technology - ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER)", ITU-T Recommendation X.690, ISO/IEC 8825-1, February 2021.

Appendix A.  Advice to Protocol Designer

   This document introduces a choice between wrapping a single CBOR data item into a pair of identifying CBOR tags, or prepending an identifying encoded CBOR data item (which in turn contains a pair of identifying CBOR tags) to a CBOR Sequence (which might be a single data item).

   Which should a protocol designer use?

   In this discussion, one assumes that there is an object stored in a file, perhaps specified by a system operator in a configuration file.

   For example: a private key used in COSE operations, a public key/certificate in C509 ([I-D.ietf-cose-cbor-encoded-cert]) or CBOR format, a recorded sensor reading stored for later transmission, or a COVID-19 vaccination certificate that needs to be displayed in QR code form.

   Both the Labeled CBOR Sequence and the wrapped tag can be trivially removed by an application before sending the CBOR content out on the wire.

   The Labeled CBOR Sequence can be slightly easier to remove as, in most cases, CBOR parsers will return it as a unit, and then return the actual CBOR item, which could be anything at all, and could include CBOR tags that _do_ need to be sent on the wire.

   On the other hand, having the Labeled CBOR Sequence in the file requires that all programs that expect to examine that file are able to skip what appears to be a CBOR item with two tags nested around a three-byte byte string. The three-byte entry is not of the format the program would normally have processed, so it may be a surprise. However, CBOR parsers are generally tolerant of tags that appear: many of them will process extra tags, making unknown tags available as meta information. A program that is not expecting those tags may just ignore those extra tags.

   As an example of where there was a problem with previous security systems, "PEM" format certificate files grew to be able to contain multiple certificates by simple concatenation. The PKCS1 format [RFC8017] could also contain a private key object followed by one or more certificate objects, but only when in PEM format. Annoyingly, when in binary DER format ([X.690], which like CBOR is self-delimiting), concatenation of certificates was not compatible with most programs, as they did not expect to read more than one item in the file.

   The use of CBOR Tag Wrapped format is easier to retrofit to an existing format with existing and unchangeable stored format for a single CBOR data item.
   This new sequence of tags is expected to be trivially ignored by many existing programs when reading CBOR from files or similar units of storage, even if the program only supports decoding a single data item (and not a CBOR sequence). But a naive program might also then transmit the additional tags across the network. Removing the CBOR Tag Wrapped format requires knowledge of the two tags involved; any other tags that are present might still be needed.

   For a representation matching a specific media type that is carried in a CBOR byte string, the byte string head will already have to be removed for use as such a representation, so it should be easy to remove the enclosing tag heads as well. This is of particular interest with the pre-defined tags provided by Appendix B for media types with CoAP Content-Format numbers.

   Here are some considerations in the form of survey questions:

A.1.  Is the on-wire format new?

   If the on-wire format is new, then it could be specified with the CBOR Tag Wrapped format if the extra eight bytes are not a problem. The stored format is then identical to the on-wire format.

   If the eight bytes are a problem on the wire (and they often are if CBOR is being considered), then the Labeled CBOR Sequence format should be adopted for the stored format.

A.2.  Can many items be trivially concatenated?

   If the programs that read the contents of the file already expect to process all of the CBOR data items in the file (not just the first), then the Labeled CBOR Sequence format may be easily retrofitted.

   The programs involved may throw errors or warnings on the Labeled CBOR Sequence if they have not yet been updated, but this may not be a problem.

   There are situations where multiple objects may be concatenated into a single file. If each object is preceded by a Labeled CBOR Sequence label, then there may be multiple such labels in the file.
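
   A reader that wants to recognize either envelope from the first bytes of a file, strip it before sending the content on the wire, or join several labeled files might proceed as in the following Python sketch. It is illustrative only: identify and concatenate are hypothetical helpers that inspect raw bytes rather than using a CBOR decoder, and they only recognize the four-byte protocol tags recommended in Section 2.1. concatenate keeps the label of the first part only, as would suit a CBOR Sequence based protocol that does not tolerate embedded labels.

```python
from typing import List, Optional, Tuple

TAG_55799 = bytes.fromhex("d9d9f7")  # self-described CBOR (CBOR Tag Wrapped)
TAG_55800 = bytes.fromhex("d9d9f8")  # self-described CBOR Sequence (label)
BOR = bytes.fromhex("43424f52")      # byte string 'BOR', the label's tag content


def identify(data: bytes) -> Optional[Tuple[str, int, bytes]]:
    """Return (method, protocol tag, rest), or None if no envelope is found.

    For the Tag Wrapped method, rest is the enveloped data item; for a
    Labeled CBOR Sequence, rest is the remainder of the sequence.
    """
    if len(data) < 8 or data[3] != 0xDA:  # require a four-byte protocol tag
        return None
    tag = int.from_bytes(data[4:8], "big")
    if data.startswith(TAG_55799):
        return ("tag-wrapped", tag, data[8:])
    if data.startswith(TAG_55800) and data[8:12] == BOR:
        return ("labeled-sequence", tag, data[12:])
    return None


def concatenate(parts: List[bytes]) -> bytes:
    """Join Labeled CBOR Sequence files, keeping only the first part's label."""
    out = bytearray()
    for i, part in enumerate(parts):
        found = identify(part)
        if i > 0 and found is not None and found[0] == "labeled-sequence":
            out += found[2]  # drop the repeated 12-byte label
        else:
            out += part
    return bytes(out)


# The examples of Sections 2.2.1 and 2.3.1.
wrapped = bytes.fromhex("d9d9f7da63740171" "81a3006763757272656e74060302f93e00")
labeled = bytes.fromhex("d9d9f8da6374021243424f52" "00080f")
print(identify(wrapped)[:2])                   # ('tag-wrapped', 1668546929)
print(concatenate([labeled, labeled]).hex())   # d9d9f8da6374021243424f5200080f00080f
```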
   A protocol based on CBOR Sequences may specify that Labeled CBOR
   Sequence labels can occur within a CBOR Sequence, possibly even to
   switch to data items of a different type later in the sequence.

   If the CBOR-Sequence-based protocol does not define semantics for,
   or at least tolerate, embedded labels, care must be taken when
   concatenating Labeled CBOR Sequences to remove the label from all
   but the first part.

   |  As an example from legacy PEM-encoded PKIX certificates, many
   |  programs accept a series of PKIX certificates in a single file
   |  in order to set up a certificate chain.  The file would contain
   |  not just the End-Entity (EE) certificate, but also the
   |  certificates of any subordinate certification authorities (CAs)
   |  needed to validate the EE.  This mechanism actually only works
   |  for PEM-encoded certificates, and not for DER-encoded
   |  certificates.  One of the reasons for this specification is to
   |  make sure that CBOR-encoded certificates do not suffer from this
   |  problem.
   |
   |  As an example of mixing of types, some TLS server programs can
   |  also accept both their PEM-encoded private key and their
   |  PEM-encoded certificate in the same file.

   If only one item is ever expected in the file, the use of the
   Labeled CBOR Sequence format may present an implementation hurdle to
   programs that previously just read a single data item and used it.

A.3.  Are there tags at the start?

   If the protocol expects to use other tags at its top level, then the
   use of the CBOR Tag Wrapped format may be easy to explain at the same
   place in the protocol description.

Appendix B.  CBOR Tags for CoAP Content Formats

   Section 5.10.3 of [RFC7252] defines the concept of a Content-Format,
   which is a short 16-bit unsigned integer that identifies a specific
   content type (media type plus optionally parameters), optionally
   together with a content encoding.
   Outside of a transfer protocol that indicates the Content-Format for
   a representation, it may be necessary to identify the Content-Format
   of the representation when it is stored in a file, in firmware, or
   when debugging.

   This specification allocates CBOR tag numbers 1668546817 (0x63740101)
   to 1668612095 (0x6374FFFF) for the tagging of representations of
   specific content formats.

   Using tags from this range, a byte string that is to be interpreted
   as a representation of Content-Format number ct, with ct < 65025
   (255*255), can be identified by enclosing it in a tag with tag number
   TN(ct), where:

      TN(ct) = 0x63740101 + (ct / 255) * 256 + ct % 255

   (where +, *, / and % stand for integer addition, multiplication,
   division and remainder as in the programming language C [C].)

   |  This formula avoids the use of zero bytes in the representation
   |  of the tag number: the two high-order bytes are 0x63 0x74, the
   |  next byte is 0x01 + ct / 255, and the low-order byte is
   |  0x01 + ct % 255, so each byte lies in the range 0x01 to 0xFF.
   |
   |  Note that no tag numbers are assigned for Content-Format
   |  numbers in the range 65025 ≤ ct ≤ 65535.  (This range is in the
   |  range reserved by Section 12.3 of [RFC7252] for experimental
   |  use.  The overlap of 25 code points between this experimental
   |  range and the range this appendix defines tag numbers for can
   |  be used for experiments that want to employ a tag number.)

   Exceptionally, when a tag from this range is used immediately as the
   tag content of one of the tags 55799, 55800, or 55801, its own tag
   content is as follows:

   Tag 55799 (Section 2.2):  One of:

      1.  The CBOR data item within the representation (without byte
          string wrapping).  This only works for Content Formats that
          are represented by a single CBOR data item in identity
          content-coding.

      2.  The data items in the CBOR sequence within the
          representation, without byte string wrapping, but wrapped in
          a CBOR array.  This works for Content Formats that are
          represented by a CBOR sequence in identity content-coding.
   Tags 55800 (Section 2.3) or 55801 (Appendix D):  The byte string
      'BOR', signifying that the representation of the given
      Content-Format follows in the file, in the way defined for these
      tags.

B.1.  Content-Format Tag Examples

   The Content-Formats registry of [IANA.core-parameters] defines
   content formats that can be used as examples:

   *  As discussed in Section 2.2.1, Content-Format 112 stands for media
      type application/senml+cbor (no parameters).  The corresponding
      tag number is TN(112) = 1668546929.

      So the following CDDL snippet can be used to identify
      application/senml+cbor representations:

         senml-cbor = #6.1668546929(bstr)

      Note that a byte string is used as the type of the tag content,
      because a media type representation in general can be any byte
      string.

   *  Content-Format 272 stands for media type
      application/missing-blocks+cbor-seq, a CBOR sequence [RFC9177].

      The corresponding tag number is TN(272) = 1668547090.

      So the following CDDL snippet can be used to identify
      application/missing-blocks+cbor-seq representations as embedded
      in a CBOR byte string:

         missing-blocks = #6.1668547090(bstr)

Appendix C.  Example from Openswan

   The Openswan IPsec project has a daemon ("pluto") and two control
   programs ("addconn" and "whack").  They communicate via a
   Unix-domain socket, over which a C structure containing pointers to
   strings is serialized using a bespoke mechanism.  This is normally
   not a problem, as the structure is compiled by the same compiler;
   but when there are upgrades, it is possible for the daemon and the
   control programs to get out of sync in the bespoke serialization.
   As a result, there are extra compensations to deal with shutting the
   daemon down.  During testing, upgrades are sometimes backed out.
   In addition, when doing unit testing, the easiest way to load policy
   is to use the normal policy reading process, but that is not
   normally loaded in the daemon.  Instead, the IPC that is normally
   sent across the wire is compiled/serialized and placed in a file.
   The magic number shown below is included in the file, and also in
   the IPC, in order to distinguish the "shutdown" command CBOR
   operation.

   In order to reduce the problems due to serialization, the
   serialization is being changed to CBOR.  Additionally, this change
   allows the IPC to be described by CDDL, and allows any language that
   can encode CBOR to be used.

   IANA has allocated the tag 1330664270, or 0x4f_50_53_4e, for this
   purpose.  As a result, each file and each IPC is prefixed with a
   Labeled CBOR Sequence label.

   In diagnostic notation:

   55800(1330664270(h'424F52'))

   Or in hex:

   d9 d9f8           # tag(55800)
      da 4f50534e    # tag(1330664270)
         43          # bytes(3)
            424f52   # "BOR"

Appendix D.  Using CBOR Labels for non-CBOR data

   The CBOR-Labeled non-CBOR data method is appropriate for adding a
   magic number to a non-CBOR data format, particularly one that can be
   described by a Content-Format tag (Appendix B).

   This method prepends a CBOR data item to the non-CBOR data; this data
   item is called the "header" and, similarly to the Labeled CBOR
   Sequence label, consists of two nested tags around a constant byte
   string, for a total of 12 bytes.

   1.  The outer tag is the CBOR-Labeled Non-CBOR Data tag, 55801.

   2.  The inner tag is a CBOR tag, from the First Come First Served
       space, that uniquely identifies the non-CBOR protocol.  As with
       CBOR Tag Wrapped, the use of a tag number that encodes in four
       bytes without zero bytes is encouraged.

   3.  The tag content is a three-byte CBOR byte string containing
       0x42_4F_52 ('BOR' in diagnostic notation).
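   The header has a fixed layout, so it can be built and removed
   without a general CBOR decoder.  The following Python sketch assumes
   an inner tag number that encodes in four bytes; for concreteness it
   uses a Content-Format tag number computed with TN(ct) from
   Appendix B, starting from the first allocated tag number 1668546817
   (0x63740101), with C's integer / and % written as Python's // and %
   (which agree for non-negative operands).  The names tn,
   non_cbor_header and split_non_cbor are hypothetical.

```python
import json

# First tag number of the range allocated in Appendix B (1668546817).
TN_BASE = 0x63740101


def tn(ct: int) -> int:
    """Tag number for Content-Format ct, following TN(ct) in Appendix B.

    C's integer / and % on non-negative operands are Python's // and %.
    """
    if not 0 <= ct < 255 * 255:
        raise ValueError("no tag number is assigned for this Content-Format")
    return TN_BASE + (ct // 255) * 256 + ct % 255


def non_cbor_header(inner_tag: int) -> bytes:
    """Build the 12-byte header 55801(inner_tag('BOR')).

    Sketch only: handles inner tag numbers that encode in four bytes.
    """
    if not 0x10000 <= inner_tag <= 0xFFFFFFFF:
        raise ValueError("inner tag number does not use a four-byte encoding")
    return (
        b"\xd9\xd9\xf9"                              # tag(55801)
        + b"\xda" + inner_tag.to_bytes(4, "big")     # tag(inner_tag)
        + b"\x43BOR"                                 # bytes(3) "BOR"
    )


def split_non_cbor(data: bytes) -> tuple[int, bytes]:
    """Return (inner tag number, non-CBOR payload) from labeled data."""
    if len(data) < 12 or data[:4] != b"\xd9\xd9\xf9\xda" or data[8:12] != b"CBOR":
        raise ValueError("no CBOR-Labeled non-CBOR data header")
    return int.from_bytes(data[4:8], "big"), data[12:]


if __name__ == "__main__":
    # Label a JSON document as application/td+json (Content-Format 432).
    td = json.dumps({"title": "lamp"}).encode()
    stored = non_cbor_header(tn(432)) + td
    tag, payload = split_non_cbor(stored)
    print(tag, json.loads(payload))
```

   The JSON text is left untouched after the header, so an application
   passing it on only needs to drop the first 12 bytes.  Content-Format
   432 is the application/td+json example of Appendix D.1.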
   The outer tag in the label identifies the file as being prefixed by
   a non-CBOR data label, and does so with all the desirable properties
   explained in Section 3.4.6 of [STD94].  Specifically, it does not
   appear to conflict with any known file types, and it is not valid
   Unicode in any Unicode encoding.

   The inner tag in the label identifies which non-CBOR protocol is
   used.

   The inner tag content is a constant byte string, which is
   represented as 0x43_42_4f_52, the ASCII characters "CBOR": this is
   the CBOR-encoded data item for the three-byte string 0x42_4f_52
   ('BOR' in diagnostic notation).

   The actual non-CBOR protocol data then follow, directly appended to
   the CBOR representation of the header.  This allows the application
   to trivially remove the header item with the two nested tags and the
   byte string.

   As with the Labeled CBOR Sequence (Section 2.3), this choice of the
   tag content places the ASCII characters "CBOR" prominently into the
   header.

D.1.  Content-Format Tag Examples

   The Content-Formats registry of [IANA.core-parameters] defines
   content formats that can be used as examples:

   *  Content-Format 432 stands for media type application/td+json (no
      parameters).  The corresponding tag number is
      TN(432) = 1668547250.

      So the following CDDL snippet can be used to identify
      CBOR-Labeled non-CBOR data headers for application/td+json
      representations:

         td-json-header = #6.55801(#6.1668547250('BOR'))

   *  Content-Format 11050 stands for media type application/json in
      deflate content-coding.

      The corresponding tag number is TN(11050) = 1668557910.

      So the following CDDL snippet can be used to identify
      CBOR-Labeled non-CBOR data headers for application/json
      representations compressed in deflate content-coding:

         json-deflate-header = #6.55801(#6.1668557910('BOR'))

Appendix E.  Changelog

Acknowledgements

   The CBOR WG brainstormed this protocol on January 20, 2021 via a
   number of productive email exchanges on the mailing list.

Contributors

   Josef 'Jeff' Sipek
   Email: jeffpc@josefsipek.net

Authors' Addresses

   Michael Richardson
   Sandelman Software Works
   Email: mcr+ietf@sandelman.ca

   Carsten Bormann
   Universität Bremen TZI
   Postfach 330440
   D-28359 Bremen
   Germany
   Phone: +49-421-218-63921
   Email: cabo@tzi.org