Network Working Group                                       S. McQuistin
Internet-Draft                                                   V. Band
Intended status: Experimental                                   D. Jacob
Expires: 28 April 2022                                     C. S. Perkins
                                                   University of Glasgow
                                                         25 October 2021


  Describing Protocol Data Units with Augmented Packet Header Diagrams
              draft-mcquistin-augmented-ascii-diagrams-09

Abstract

   This document describes a machine-readable format for specifying the
   syntax of protocol data units within a protocol specification. This
   format is comprised of a consistently formatted packet header
   diagram, followed by structured explanatory text. It is designed to
   maintain human readability while enabling support for automated
   parser generation from the specification document. This document is
   itself an example of how the format can be used.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 28 April 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
     2.1.  Limitations of Current Packet Format Diagrams
     2.2.  Formal languages in standards documents
   3.  Design Principles
   4.  Augmented Packet Header Diagrams
     4.1.  PDUs with Fixed and Variable-Width Fields
     4.2.  PDUs That Cross-Reference Previously Defined Fields
     4.3.  PDUs with Non-Contiguous Fields
     4.4.  PDUs with Constraints on Field Values
     4.5.  PDUs with Constraints on Field Sizes
     4.6.  PDUs That Extend Sub-Structures
     4.7.  Storing Data for Parsing
     4.8.  Connecting Structures with Functions
     4.9.  Specifying Enumerated Types
     4.10. Specifying Protocol Data Units
     4.11. Importing PDU Definitions from Other Documents
   5.  Open Issues
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  Informative References
   Appendix A.  ABNF specification
     A.1.  Constraint Expressions
     A.2.  Augmented packet diagrams
   Appendix B.  Tooling & source code
   Authors' Addresses

1.  Introduction

   Packet header diagrams have become a widely used format for
   describing the syntax of binary protocols. In otherwise largely
   textual documents, they allow for the visualisation of packet
   formats, reducing human error, and aiding in the implementation of
   parsers for the protocols that they specify.

   Figure 1 gives an example of how packet header diagrams are used to
   define binary protocol formats. The format has an obvious structure:
   the diagram clearly delineates each field, showing its width and its
   position within the header. This type of diagram is designed for
   human readers, but is consistent enough that it should be possible to
   develop a tool that generates a parser for the packet format from the
   diagram.

   :  0                   1                   2                   3
   :  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |          Source Port          |       Destination Port        |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                        Sequence Number                        |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                    Acknowledgment Number                      |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |  Data |           |U|A|P|R|S|F|                               |
   : | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   : |       |           |G|K|H|T|N|N|                               |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |           Checksum            |         Urgent Pointer        |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                    Options                    |    Padding    |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                             data                              |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 1: TCP's header format (from [RFC793])

   Unfortunately, the format of such packet diagrams varies both within
   and between documents.
   This variation makes it difficult to build tools to generate parsers
   from the specifications. Better tooling could be developed if
   protocol specifications adopted a consistent format for their packet
   descriptions. Indeed, this underpins the format described by this
   draft: we want to retain the benefits that packet header diagrams
   provide, while identifying the benefits of adopting a consistent
   format.

   This document describes a consistent packet header diagram format and
   accompanying structured text constructs that allow for the parsing
   process of protocol headers to be fully specified. This provides
   support for the automatic generation of parser code. Broad design
   principles, which seek to maintain the primacy of human readability
   and flexibility in writing, are described before the format itself
   is given.

   This document is itself an example of the approach that it describes,
   with the packet header diagrams and structured text format described
   by example. Examples that do not form part of the protocol
   description language are marked by a colon at the beginning of each
   line; this prevents them from being parsed by the accompanying
   tooling.

   This draft describes early work. As consensus builds around the
   particular syntax of the format described, a formal ABNF
   specification (Appendix A) will be provided.

   Example specifications of a number of IETF protocols described using
   the Augmented Packet Header Diagram format are available. These
   documents describe UDP [draft-mcquistin-augmented-udp-example], TCP
   [draft-mcquistin-augmented-tcp-example], and QUIC
   [draft-mcquistin-quic-augmented-diagrams]. Code that parses those
   documents and automatically generates parser code for the described
   protocols is described in Appendix B.

2.  Background

   This section begins by considering how packet header diagrams are
   used in existing documents.
   This exposes the limitations that the current usage has in terms of
   machine-readability, guiding the design of the format that this
   document proposes.

   While this document focuses on the machine-readability of packet
   format diagrams, this section also discusses the use of other
   structured or formal languages within IETF documents. Considering
   how and why these languages are used provides an instructive contrast
   to the relatively incremental approach proposed here.

2.1.  Limitations of Current Packet Format Diagrams

   : The RESET_STREAM frame is as follows:
   :
   :  0                   1                   2                   3
   :  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                        Stream ID (i)                        ...
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |  Application Error Code (16)  |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |                        Final Size (i)                       ...
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   : RESET_STREAM frames contain the following fields:
   :
   : Stream ID: A variable-length integer encoding of the Stream ID
   :    of the stream being terminated.
   :
   : Application Protocol Error Code: A 16-bit application protocol
   :    error code (see Section 20.1) which indicates why the stream
   :    is being closed.
   :
   : Final Size: A variable-length integer indicating the final size
   :    of the stream by the RESET_STREAM sender, in unit of bytes.

    Figure 2: QUIC's RESET_STREAM frame format (from [QUIC-TRANSPORT])

   Packet header diagrams are frequently used in IETF standards to
   describe the format of binary protocols.
   While there is no standard for how these diagrams should be
   formatted, they have a broadly similar structure, where the layout of
   a protocol data unit (PDU) or structure is shown in diagrammatic
   form, followed by a description list of the fields that it contains.
   An example of this format, taken from the QUIC specification, is
   given in Figure 2.

   These packet header diagrams, and the accompanying descriptions, are
   formatted for human readers rather than for automated processing. As
   a result, while there is rough consistency in how packet header
   diagrams are formatted, there are a number of limitations that make
   them difficult to work with programmatically:

   Inconsistent syntax: There are two classes of consistency that are
      needed to support automated processing of specifications: internal
      consistency within a diagram or document, and external consistency
      across all documents.

      Figure 2 gives an example of internal inconsistency. Here, the
      packet diagram shows a field labelled "Application Error Code",
      while the accompanying description lists the field as "Application
      Protocol Error Code". The use of an abbreviated name is suitable
      for human readers, but makes parsing the structure difficult for
      machines. Figure 3 gives a further example, where the description
      includes an "Option-Code" field that does not appear in the packet
      diagram; and where the description states that each field is 16
      bits in length, but the diagram shows the OPTION_RELAY_PORT as 13
      bits, and Option-Len as 19 bits. Another example is [RFC6958],
      where the packet format diagram showing the structure of the
      Burst/Gap Loss Metrics Report Block shows the Number of Bursts
      field as being 12 bits wide but the corresponding text describes
      it as 16 bits.

      Comparing Figure 2 with Figure 3 exposes external inconsistency
      across documents.
      While the packet format diagrams are broadly similar, the
      surrounding text is formatted differently. If machine parsing is
      to be made possible, then this text must be structured
      consistently.

   Ambiguous constraints: The constraints that are enforced on a
      particular field are often described ambiguously, or in a way that
      cannot be parsed easily. In Figure 3, each of the three fields in
      the structure is constrained. The first two fields ("Option-Code"
      and "Option-Len") are to be set to constant values (note the
      inconsistency in how these constraints are expressed in the
      description). However, the third field ("Downstream Source Port")
      can take a value from a constrained set. This constraint is
      expressed in prose that cannot readily be understood by machine.

   Poor linking between sub-structures: Protocol data units and other
      structures are often comprised of sub-structures that are defined
      elsewhere, either in the same document, or within another
      document. Chaining these structures together is essential for
      machine parsing: the parsing process for a protocol data unit is
      only fully expressed if all elements can be parsed.

      Figure 2 highlights the difficulty that machine parsers have in
      chaining structures together. Two fields ("Stream ID" and "Final
      Size") are described as being encoded as variable-length integers;
      this is a structure described elsewhere in the same document.
      Structured text is required both alongside the definition of the
      containing structure and with the definition of the sub-structure,
      to allow a parser to link the two together.

   Lack of extension and evolution syntax: Protocols are often
      specified across multiple documents, either because the protocol
      explicitly includes extension points (e.g., profiles and payload
      format specifications in RTP [RFC3550]) or because definition of a
      protocol data unit has changed and evolved over time.
      As a result, it is essential that syntax be provided to allow for
      a complete definition of a protocol's parsing process to be
      constructed across multiple documents.

   : The format of the "Relay Source Port Option" is shown below:
   :
   :  0                   1                   2                   3
   :  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |    OPTION_RELAY_PORT    |         Option-Len                  |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   : |    Downstream Source Port     |
   : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :
   : Where:
   :
   : Option-Code: OPTION_RELAY_PORT. 16-bit value, 135.
   :
   : Option-Len: 16-bit value to be set to 2.
   :
   : Downstream Source Port: 16-bit value. To be set by the IPv6
   :    relay either to the downstream relay agent's UDP source port
   :    used for the UDP packet, or to zero if only the local relay
   :    agent uses the non-DHCP UDP port (not 547).

       Figure 3: DHCPv6's Relay Source Port Option (from [RFC8357])

2.2.  Formal languages in standards documents

   A small proportion of IETF standards documents contain structured and
   formal languages, including ABNF [RFC5234], ASN.1 [ASN1], C, CBOR
   [RFC7049], JSON, the TLS presentation language [RFC8446], YANG models
   [RFC7950], and XML. While this broad range of languages may be
   problematic for the development of tooling to parse specifications,
   these, and other, languages serve a range of different use cases.
   ABNF, for example, is typically used to specify text protocols, while
   ASN.1 is used to specify data structure serialisation. This document
   specifies a structured language for specifying the parsing of binary
   protocol data units.

3.  Design Principles

   The use of structures that are designed to support machine
   readability might interfere with the existing ways in which protocol
   specifications are used and authored.
   To the extent that these existing uses are more important than
   machine readability, such interference must be minimised.

   In this section, the broad design principles that underpin the format
   described by this document are given. However, these principles
   apply more generally to any approach that introduces structured and
   formal languages into standards documents.

   It should be noted that these are design principles: they expose the
   trade-offs that are inherent within any given approach. Violating
   these principles is sometimes necessary and beneficial, and this
   document sets out the potential consequences of doing so.

   The central tenet that underpins these design principles is a
   recognition that the standardisation process is not broken, and so
   does not need to be fixed. Failure to recognise this will likely
   lead to approaches that are incompatible with the standards process,
   or that will see limited adoption. However, the standards process
   can be improved with appropriate approaches, as guided by the
   following broad design principles:

   Most readers are human: Primarily, standards documents should be
      written for people, who require text and diagrams that they can
      understand. Structures that cannot be easily parsed by people
      should be avoided, and if included, should be clearly delineated
      from human-readable content.

      Any approach that shifts this balance -- that is, that primarily
      targets machine readers -- is likely to be disruptive to the
      standardisation process, which relies upon discussion centered
      around documents written in prose.

   Writing tools are diverse: Standards document writing is a
      distributed process that involves a diverse set of tools and
      workflows.
      The introduction of machine-readable structures into
      specifications should not require that specific tools are used to
      produce standards documents, to ensure that disruption to existing
      workflows is minimised. This does not preclude the development of
      optional, supplementary tools that aid in the authoring of
      machine-readable structures.

      The immediate impact of requiring specific tooling is that
      adoption is likely to be limited. A long-term impact might be
      that authors whose workflows are incompatible might be alienated
      from the process.

   Canonical specifications: As far as possible, machine-readable
      structures should not replicate the human readable specification
      of the protocol within the same document. Machine-readable
      structures should form part of a canonical specification of the
      protocol. Adding supplementary machine-readable structures, in
      parallel to the existing human readable text, is undesirable
      because it creates the potential for inconsistency.

      As an example, program code that describes how a protocol data
      unit can be parsed might be provided as an appendix within a
      standards document. This code would provide a specification of
      the protocol that is separate to the prose description in the main
      body of the document. This has the undesirable effect of
      introducing the potential for the program code to specify
      behaviour that the prose-based specification does not, and
      vice-versa.

   Expressiveness: Any approach should be expressive enough to capture
      the syntax and parsing process for the majority of binary
      protocols. If a given language is not sufficiently expressive,
      then adoption is likely to be limited. At the limits of what can
      be expressed by the language, authors are likely to revert to
      defining the protocol in prose: this undermines the broad goal of
      using structured and formal languages.
      Equally, though, understandable specifications and ease of use are
      critical for adoption. A tool that is simple to use and addresses
      the most common use cases might be preferred to a complex tool
      that addresses all use cases.

      It may be desirable to restrict expressiveness, however, to
      guarantee intrinsic safety, security, and computability properties
      of both the generated parser code for the protocol, and the parser
      of the description language itself. In much the same way as the
      language-theoretic security ([LANGSEC]) community advocates for
      programming language design to be informed by the desired
      properties of the parsers for those languages, protocol designers
      should be aware of the implications of their design choices. The
      expressiveness of the protocol description languages that they use
      to define their protocols can force such awareness.

      Broadly, those languages that have grammars which are more
      expressive tend to have parsers that are more complex and less
      safe. As a result, while considering the other goals described in
      this document, protocol description languages should attempt to be
      minimally expressive, and either restrict protocol designs to
      those for which safe and secure parsers can be generated, or as a
      minimum, ensure that protocol designers are aware of the
      boundaries their designs cross, in terms of computability and
      decidability [SASSAMAN].

   Minimise required change: Any approach should require as few changes
      as possible to the way that documents are formatted, authored, and
      published. Forcing adoption of a particular structured or formal
      language is incompatible with the IETF's standardisation process:
      there are very few components of standards documents that are
      non-optional.

4.  Augmented Packet Header Diagrams

   The design principles described in Section 3 can largely be met by
   the existing uses of packet header diagrams. These diagrams aid
   human readability, do not require new or specialised tools to write,
   do not split the specification into multiple parts, can express most
   binary protocol features, and require no changes to existing
   publication processes.

   However, as discussed in Section 2.1, there are limitations to how
   packet header diagrams are used that must be addressed if they are to
   be parsed by machine. In this section, an augmented packet header
   diagram format is described.

   The concept is first illustrated by example. This is appropriate,
   given the visual nature of the language. In future drafts, these
   examples will be parsable using provided tools, and a formal
   specification of the augmented packet diagrams will be given in
   Appendix A.

4.1.  PDUs with Fixed and Variable-Width Fields

   The simplest PDU is one that contains only a set of fixed-width
   fields in a known order, with no optional fields or variation in the
   packet format.

   Some packet formats include variable-width fields, where the size of
   a field is either derived from the value of some previous field, or
   is unspecified and inferred from the total size of the packet and the
   size of the other fields.

   To ensure that there is no ambiguity, a PDU description can contain
   only one field whose length is unspecified. The length of a single
   field, where all other fields are of known (but perhaps variable)
   length, can be inferred from the total size of the containing PDU.

   A PDU description is introduced by the exact phrase "A/An _______ is
   formatted as follows" within a paragraph. This is followed by the
   PDU description itself, as a packet diagram within an <artwork>
   element (itself optionally within a <figure> element) in the XML
   representation, starting with a header line to show the bit width of
   the diagram. The description of the fields follows the diagram, as
   an XML list (either <dl> or hanging <list>), after a paragraph that
   begins with the text "where:".

   PDU names must be unique, both within a document, and across all
   documents that are linked together (i.e., using the structured
   language defined in Section 4.11).

   Each field is defined by a structured text definition and a prose
   description. The structured text definition comprises the field name
   and an optional short name in parentheses. These are followed by a
   colon, the field length, an optional presence expression (described
   in Section 4.2), and a terminating period. Field names cannot be the
   same as a previously defined PDU name, and must be unique within a
   given structure definition. The structured text definition is given
   either in a <dt> tag (if using a <dl>) or as the "hangText" (if using
   a hanging <list>) of a <t> element. The field's prose description is
   given in the following <dd> element or within the same <t> element.
   Prose descriptions may include structured text (e.g., as defined in
   Section 4.7).

   For example, this can be illustrated using the IPv4 Header Format
   [RFC791]. An IPv4 Header is formatted as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Version|  IHL  |    DSCP   |ECN|         Total Length          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Identification        |Flags|     Fragment Offset     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Time to Live  |    Protocol   |        Header Checksum        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Source Address                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Destination Address                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                            Options                          ...
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               :
      :                            Payload                            :
      :                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 4 bits. This is a fixed-width field, whose full label
      is shown in the diagram. The field's width -- 4 bits -- is given
      in the label of the description list, separated from the field's
      label by a colon.

   Internet Header Length (IHL): 4 bits. This is a shorter field, whose
      full label is too large to be shown in the diagram. A short label
      (IHL) is used in the diagram, and this short label is provided, in
      brackets, after the full label in the description list.

   Differentiated Services Code Point (DSCP): 6 bits. This is a
      fixed-width field, as previously discussed.

   Explicit Congestion Notification (ECN): 2 bits. This is a
      fixed-width field, as previously discussed.

   Total Length (TL): 2 bytes. This is a fixed-width field, as
      previously discussed.
      Where fields are an integral number of bytes in size, the field
      length can be given in bytes rather than in bits.

   Identification: 2 bytes. This is a fixed-width field, as previously
      discussed.

   Flags: 3 bits. This is a fixed-width field, as previously discussed.

   Fragment Offset: 13 bits. This is a fixed-width field, as previously
      discussed.

   Time to Live (TTL): 1 byte. This is a fixed-width field, as
      previously discussed.

   Protocol: 1 byte. This is a fixed-width field, as previously
      discussed.

   Header Checksum: 2 bytes. This is a fixed-width field, as previously
      discussed.

   Source Address: 32 bits. This is a fixed-width field, as previously
      discussed.

   Destination Address: 32 bits. This is a fixed-width field, as
      previously discussed.

   Options: (IHL-5)*32 bits. This is a variable-length field, whose
      length is defined by the value of the field with short label IHL
      (Internet Header Length). Constraint expressions can be used in
      place of constant values: the grammar for the expression language
      is defined in Appendix A.1. Constraints can include a previously
      defined field's short or full label, where one has been defined.
      Short variable-length fields are indicated by "..." instead of a
      pipe at the end of the row.

   Payload: TL - ((IHL*32)/8) bytes. This is a multi-row
      variable-length field, constrained by the values of fields TL and
      IHL. Instead of the "..." notation, ":" is used to indicate that
      the field is variable-length. The use of ":" instead of "..."
      indicates the field is likely to be a longer, multi-row field.
      However, semantically, there is no difference: these different
      notations are for the benefit of human readers.

4.2.  PDUs That Cross-Reference Previously Defined Fields

   Binary formats often reference sub-structures that have been defined
   earlier in the specification.
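
   As a rough illustration of what such a reference means for a parser,
   the parser for a containing structure can call the parser of the
   structure it references. The sketch below is not part of the format
   described by this document: the names parse_item and
   parse_container, and the layout of a one-byte count followed by
   32-bit items, are assumptions made purely for the example.

```python
import struct
from typing import List, Tuple


def parse_item(buf: bytes, offset: int) -> Tuple[int, int]:
    """Parse one 32-bit sub-structure (hypothetical layout).

    Returns the parsed value and the offset of the next unread byte.
    """
    (value,) = struct.unpack_from("!I", buf, offset)  # network byte order
    return value, offset + 4


def parse_container(buf: bytes) -> Tuple[int, List[int]]:
    """Parse a hypothetical container: a 1-byte count, then `count` items.

    The container does not re-describe the item layout; it only
    references the previously defined item parser.
    """
    count = buf[0]
    offset = 1
    items: List[int] = []
    for _ in range(count):
        value, offset = parse_item(buf, offset)
        items.append(value)
    return count, items
```

   Keeping the sub-structure parser separate mirrors the way a
   sub-structure is defined once and then referenced by name.
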
   For example, in RTP [RFC3550], the Contributing Source Identifiers in
   an RTP Data Packet are defined as comprising a list of Source
   Identifier elements. A Source Identifier is formatted as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             SSRC                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   SSRC: 32 bits. This is a fixed-width field, as described previously.

   The following example shows how a Source Identifier can be referenced
   in the description of an RTP Data Packet. It also shows how the
   presence of some fields in a format may be dependent on the value of
   an earlier field.

   An RTP Data Packet is formatted as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | V |P|X|  CC   |M|     PT      |       Sequence Number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           Timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               Synchronization Source identifier               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               [Contributing Source identifiers]               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Header Extension                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                            Payload                            :
      :                                                               :
      :                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                    Padding                    | Padding Count |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Version (V): 2 bits. This is a fixed-width field, as described
      previously.

   Padding (P): 1 bit. This is a fixed-width field, as described
      previously.

   Extension (X): 1 bit. This is a fixed-width field, as described
      previously.

   CSRC count (CC): 4 bits.
      This is a fixed-width field, as described previously.

   Marker (M): 1 bit. This is a fixed-width field, as described
      previously.

   Payload Type (PT): 7 bits. This is a fixed-width field, as described
      previously.

   Sequence Number: 16 bits. This is a fixed-width field, as described
      previously.

   Timestamp: 32 bits. This is a fixed-width field, as described
      previously.

   Synchronization Source identifier: 1 Source Identifier. This is a
      field whose structure is a previously defined PDU format (Source
      Identifier). To indicate this, the width of the field is
      expressed in terms of the cross-referenced structure. When used
      in constraint expressions, PDU names refer to the length of that
      PDU structure.

   Contributing Source identifiers: CC Source Identifier. Where a field
      is comprised of a sequence of previously defined structures,
      square brackets can be used to indicate this in the diagram. The
      length of the sequence can be defined using the constraint
      expression grammar as described earlier. Where the length is
      unknown, the type of each element of the sequence must be given in
      square brackets.

      In this example, both a PDU name (Source Identifier) and a field
      name (CC) are used in the constraint expression. The PDU name
      refers to the length of the PDU, while the field name refers to
      the value of the field. This is possible because field names
      cannot be the same as previously defined PDU names.

   Header Extension: 32 bits; present only when X == 1. This is a field
      whose presence is predicated on an expression given using the
      constraint expression grammar described earlier. Optional fields
      can be of any previously defined format (e.g., fixed- or
      variable-width). Optional fields are indicated by the presence of
      "; present only when [expr]." at the end of the definition term
      (i.e., the text contained within the <dt> tag or "hangText"
      attribute).

      [Note that this example deviates from the format as described in
      [RFC3550]. As specified in that document, the Header Extension
      would be a cross-referenced structure. This is not shown here for
      brevity.]

   Payload. The length of the Payload is not specified, and hence needs
      to be inferred from the total length of the packet and the lengths
      of the known fields. There can only be one field of unspecified
      size in a PDU. Fields where the length is not specified may also
      denote this with the phrase "variable length" in place of the
      length definition.

   Padding: PC bytes; present only when (P == 1) && (PC > 0). This is a
      variable size field, with size dependent on a later field in the
      packet. Fields can only depend on the value of a later field if
      they follow a field with unspecified size.

   Padding Count (PC): 1 byte; present only when P == 1. This is a
      fixed-width field, as previously discussed.

4.3.  PDUs with Non-Contiguous Fields

   In some binary formats, fields are striped across multiple
   non-contiguous bits. This is often to allow for backwards
   compatibility with previous definitions of the same fields in earlier
   documents: striping in this way allows for careful use of the
   possible range of values.

   This format is illustrated using the STUN Message Type
   [draft-ietf-tram-stunbis-21]. A STUN Message Type is formatted as
   follows:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |M|M|M|M|M|C|M|M|M|C|M|M|M|M|
      |B|A|9|8|7|1|6|5|4|0|3|2|1|0|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where:

   Method (M): 12 bits (split field). This field is comprised of
      multiple sub-fields (M0 through MB) as shown in the diagram.
That 707 these sub-fields should be concatenated, after parsing, into a 708 single field is indicated by their being labelled using the 'M' 709 short field name followed by a single hexadecimal digit, with the 710 least significant bit labelled with 0, and subsequent bits 711 labelled in sequence. 713 Class (C): 2 bits (split field). This field follows the same format 714 as M described above. 716 4.4. PDUs with Constraints on Field Values 718 A PDU may be defined not only by the layout and type of its fields, 719 but also by the value of those fields. For example, field values may 720 be constrained to be of a known exact value or to be within a range. 721 More generally, our format enables a boolean expression to be 722 attached to a field, which must be true for the PDU to be parsed 723 successfully. 725 This format is illustrated using the QUIC Long Header Packet format 726 [QUIC-TRANSPORT]. A Long Header is formatted as follows: 728 0 1 2 3 729 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 730 +-+-+-+-+-+-+-+-+ 731 |1|1| T | R | P | 732 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 733 | Version | 734 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 735 | DCID Len | 736 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 737 | Destination Connection ID (DCID) ... 738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 739 | SCID Len | 740 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 741 | Source Connection ID (SCID) ... 742 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 744 where: 746 Header Form (HF): 1 bit; HF == 1. This is a fixed-width field, 747 constrained to be of a known, exact value. At most one field 748 value constraint may be given, and if provided, it must be given 749 as a boolean expression, separated by a semi-colon in the field 750 definition name (i.e., the text contained within the <dt>
tag or 751 "hangText" attribute). If present, a value constraint must follow 752 the name, short name, and length of the field, but appear before 753 any presence constraint, if applicable. The order of the fields 754 must be the same in both the diagram and description list. 756 Fixed Bit (FB): 1 bit; FB == 1. This is a fixed-width field, with a 757 value constraint, as previously described. 759 Long Packet Type (T): 2 bits. This is a fixed-width field as 760 previously described. 762 Reserved Bits (R): 2 bits. This is a fixed-width field as previously 763 described. 765 Packet Number Length (P): 2 bits. This is a fixed-width field as 766 previously described. 768 Version: 32 bits. This is a fixed-width field as previously 769 described. 771 DCID Len (DLen): 1 byte; DLen <= 20. This is a fixed-width field, 772 with a value constraint, as previously described. Note that the 773 constraint language is not limited to equality; it is defined 774 fully in Appendix A.1. 776 Destination Connection ID: DLen bytes. This is a variable-width 777 field as previously described. 779 SCID Len (SLen): 1 byte; SLen <= 20. This is a fixed-width field, 780 with a value constraint, as previously described. 782 Source Connection ID: SLen bytes. This is a variable-width field as 783 previously described. 785 4.5. PDUs with Constraints on Field Sizes 787 A PDU may contain fields that have a size that is specified in terms 788 of the value of another field. So far, our constraint syntax can be 789 used to specify the length of fields in known units (of bits, bytes, 790 or other structures). If the units are of variable width, then it 791 may not be possible to specify the length of the sequence. However, 792 it is still necessary to be able to constrain the overall width of 793 the field. To support this, our constraint syntax includes a "size" 794 function that evaluates to the width, in bits, of the given named 795 field.
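As a rough illustration of what a "size" constraint asks a parser to check, the following Python sketch computes the width, in bits, of a field made up of variable-width elements and compares it with an expression over an earlier field. The helper names (size_bits, satisfies_size_constraint), the modelling of elements as byte strings, and the default of 5 fixed 32-bit words are assumptions made for this sketch; they are not part of the format or of its tooling.

```python
def size_bits(elements):
    """Width, in bits, of a field built from already-parsed elements.

    Each element is modelled as a bytes object (an assumption of this sketch).
    """
    return sum(len(element) * 8 for element in elements)


def satisfies_size_constraint(elements, header_words, fixed_words=5, word_bits=32):
    """Check size(field) == (header_words - fixed_words) * word_bits."""
    return size_bits(elements) == (header_words - fixed_words) * word_bits


# Two elements of 1 and 3 bytes together occupy 32 bits, that is, one
# 32-bit word beyond the fixed part of the header.
options = [b"\x00", b"\x03\x03\x07"]
```

The number of elements is not needed for the check: only the total width is constrained, which is the case the "size" function is meant to cover.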
797 This syntax is illustrated using the TCP Header format 798 [draft-ietf-tcpm-rfc793bis]. A TCP Header is formatted as follows: 800 0 1 2 3 801 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 802 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 803 | Source Port | Destination Port | 804 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 805 | Sequence Number | 806 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 807 | Acknowledgment Number | 808 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 809 | Data | |C|E|U|A|P|R|S|F| | 810 | Offset| Rsrvd |W|C|R|C|S|S|Y|I| Window Size | 811 | | |R|E|G|K|H|T|N|N| | 812 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 813 | Checksum | Urgent Pointer | 814 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 815 | [Options] | 816 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 817 | : 818 : Payload : 819 : | 820 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 822 where: 824 Source Port: 16 bits. This is a fixed-width field as previously 825 described. 827 Destination Port: 16 bits. This is a fixed-width field as previously 828 described. 830 Sequence Number: 32 bits. This is a fixed-width field as previously 831 described. 833 Acknowledgment Number: 32 bits. This is a fixed-width field as 834 previously described. 836 Data Offset (DOffset): 4 bits; DOffset >= 5. This is a fixed-width 837 field, with a value constraint, as previously described. 839 Reserved (Rsrvd): 4 bits; Rsrvd == 0. This is a fixed-width field, 840 with a value constraint, as previously described. 842 CWR: 1 bit. This is a fixed-width field as previously described. 844 ECE: 1 bit. This is a fixed-width field as previously described. 846 URG: 1 bit. This is a fixed-width field as previously described. 848 ACK: 1 bit. This is a fixed-width field as previously described. 850 PSH: 1 bit. 
This is a fixed-width field as previously described. 852 RST: 1 bit. This is a fixed-width field as previously described. 854 SYN: 1 bit. This is a fixed-width field as previously described. 857 FIN: 1 bit; (FIN == 0) || (SYN == 0). This is a fixed-width field, 858 with a value constraint, as previously described. 860 Window Size: 16 bits. This is a fixed-width field as previously 861 described. 863 Checksum: 16 bits. This is a fixed-width field as previously 864 described. 866 Urgent Pointer: 16 bits. This is a fixed-width field as previously 867 described. 869 Options: [TCP Option]; size(Options) == (DOffset-5)*32; present 870 only when DOffset > 5. This is a variable-width field that is 871 comprised of a sequence of TCP Option sub-structures. TCP Option 872 is an enumerated type, to be defined in Section 4.9. As defined, 873 the TCP Option type can be either 1 or 3 bytes, depending on the 874 option type. As a result, it is not possible to specify the 875 number of TCP Option structures that the Options field will 876 contain. However, the overall size of the field can be 877 constrained. The constraint "size(Options) == (DOffset-5)*32" makes use of 878 the "size" function. This evaluates to the size, in bits, of the 879 named field. The argument passed to the "size" function must be the 880 name of the field being defined, or of a previously defined field. 882 The "DOffset" field contains the number of 32-bit words that are 883 present in the TCP Header. By default, with no TCP options, this 884 is 5. As a result, the size of the Options field is constrained 885 to the value of DOffset, less 5, multiplied by 32 to get the value 886 in bits. 888 Payload. This is a variable-width field as previously described. 890 4.6. PDUs That Extend Sub-Structures 892 A PDU may not only use or reference existing sub-structures, but 893 may also extend them, adding new fields, or enforcing different or 894 additional constraints.
896 Where a sub-structure is extended, the diagram may show the sub- 897 structure as a block, labelled with the sub-structure name. It may 898 also be desirable to show the sub-structure diagram in full; in this 899 case, the fields must be given in the same order and be of the same 900 length. New field constraints can be shown. Similarly, in the 901 description list, those fields inherited without change (i.e., with 902 no change to their constraints) do not need to be repeated. Those 903 with different or additional constraints must be described, and the 904 order of the fields in the description list must match that of the 905 sub-structure and the containing structure. 907 This format is illustrated using the QUIC Retry Packet format 908 [QUIC-TRANSPORT]. A Retry Packet is formatted as follows: 910 0 1 2 3 911 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 912 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 913 | : 914 : Long Header : 915 : | 916 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 917 | Retry Token ... 918 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 919 | | 920 + + 921 | | 922 + Retry Integrity Tag + 923 | | 924 + + 925 | | 926 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 928 where: 930 Long Header (LH): 1 Long Header; LH.T == 3. This field is a 931 previously defined sub-structure. Its constraints can access 932 fields in that sub-structure. In this example, the T field of the 933 Long Header must be equal to 3. 935 Retry Token. This is a variable-length field as previously defined. 937 Retry Integrity Tag: 128 bits. This is a fixed-width field as 938 previously defined. 940 As shown, the Long Header packet sub-structure is included. The 941 Retry Packet enforces a new value constraint on the Long Packet Type 942 (T) field. 944 4.7. Storing Data for Parsing 946 The parsing process may require data from previously parsed 947 structures. 
This means that data needs to be stored persistently 948 throughout the process, and that the stored data needs to be identified. 950 That the value of a particular field is to be stored upon parsing is 951 indicated by the exact phrase "On receipt, the value of <field name> 952 is stored as <stored name>." being present at the end of the 953 description of a field (i.e., at the end of the
<dd> or <t> element.) 955 An Initial Packet is formatted as follows: 957 0 1 2 3 958 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 959 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 960 | : 961 : Long Header : 962 : | 963 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 965 where: 967 Long Header (LH): 1 Long Header; LH.T == 0. This field is a sub- 968 structure, with a constraint, as previously defined. On receipt, 969 the value of LH.DCID is stored as Initial DCID. 971 In this example, the value of the DCID field of the Long Header sub- 972 structure is stored as Initial DCID. 974 4.8. Connecting Structures with Functions 976 The parsing or serialisation of some binary formats cannot be fully 977 described without the use of functions. These functions take 978 arguments (values from another structure), perform some computation, 979 and generate a new structure. 981 Given the goal of fully capturing the parsing or serialisation of 982 binary protocols, it is necessary to include the signature of these 983 helper functions. 985 Function signatures are described in <artwork> elements. They are 986 constructed as the word "func", followed by a space, then the name of 987 the function. This is immediately followed by a set of brackets 988 containing a comma separated list of the function's parameters, each 989 formatted as "<parameter name>: <parameter type>". This is followed 990 by "->" and the return type of the function, followed by a colon. 992 The body of the function is not captured, owing to the complexity of 993 both capturing and translating arbitrary code. As a result, it can 994 be described in whichever format is most suitable for the document 995 and its readership. 997 Those values that are stored persistently, as defined in Section 4.7, 998 are accessible by functions.
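The signature layout described above is regular enough to be recognised with a short pattern. The following Python sketch is one possible reading of it, applied to the "apply_protection" signature that this section uses as its example. The function name parse_signature and the regular expression are assumptions of this sketch, not the behaviour of the tooling in Appendix B; in particular, allowing the signature to wrap across lines is a choice made here.

```python
import re

# "func", a space, the function name, a bracketed parameter list,
# "->", the return type, and a closing colon.
SIGNATURE = re.compile(
    r"^func (?P<name>\w+)\((?P<params>[^)]*)\)\s*->\s*(?P<ret>[^:]+):\s*$"
)


def parse_signature(text):
    """Return (name, [(parameter name, parameter type), ...], return type)."""
    flat = " ".join(text.split())  # tolerate a signature wrapped over lines
    match = SIGNATURE.match(flat)
    if match is None:
        raise ValueError("not a function signature: " + flat)
    params = []
    for part in match.group("params").split(","):
        part = part.strip()
        if not part:
            continue  # empty parameter list
        param_name, param_type = part.split(":", 1)
        params.append((param_name.strip(), param_type.strip()))
    return match.group("name"), params, match.group("ret").strip()


signature = parse_signature(
    "func apply_protection(to: Unprotected Packet)\n    -> Protected Packet:"
)
```

Only the signature line is parsed; the body is deliberately ignored, in keeping with the statement that function bodies are not captured.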
1000 As an example, the "apply_protection" function is defined as: 1002 func apply_protection(to: Unprotected Packet) 1003 -> Protected Packet: 1004 apply packet protection to payload 1005 apply header protection to first_byte and packet_number 1006 construct appropriate Protected Packet based on first_byte 1007 return Protected Packet 1009 In this example, 'Unprotected Packet' and 'Protected Packet' are 1010 existing types. 1012 To indicate that a PDU is created from another by way of a function, 1013 the sentence "A/An <PDU name A> is parsed from a <PDU name B> using 1014 the <function name> function" is used. This indicates that a PDU A 1015 is generated by passing PDU B into the named function. The function 1016 must take a single parameter, of the same type as PDU B, and return a 1017 PDU A. 1019 To indicate that a PDU can be serialised to another by way of a 1020 function, the sentence "A/An <PDU name A> is serialised to a <PDU name B> 1021 using the <function name> function" is used. This indicates 1022 that a PDU B is generated by passing PDU A into the named function. 1023 The function must take a single parameter, of the same type as PDU A, 1024 and return a PDU B. 1026 4.9. Specifying Enumerated Types 1028 In addition to the use of sub-structures, it is desirable to be 1029 able to define a type that may take the value of one of a set of 1030 alternative structures. 1032 The alternative structures that comprise an enumerated type are 1033 identified using the exact phrase "The <enumerated type name> is one 1034 of: <list of structure names>" where the list of structure names is a 1035 comma separated list (with the last element, if there is more than 1036 one element, preceded by 'or'), each optionally preceded by "a" or 1037 "an". The structure names must be defined within the document or a 1038 linked document. 1040 Where an enumerated type has only two variants, an alternative phrase 1041 can be used: "The <enumerated type name> is either a <variant name> 1042 or <variant name>". The names of the variants must be defined 1043 within the document or a linked document.
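A minimal Python sketch of how the "is one of" phrase might be recognised is given below. It splits the list on commas and on the final 'or', and drops an optional leading "a" or "an" from each element. The function name parse_one_of and the enumerated type name "Example Packet" are hypothetical; the variant names are structures defined elsewhere in this document. The sketch assumes that no structure name itself contains the word "or".

```python
import re


def parse_one_of(sentence):
    """Return (enumerated type name, [variant names]) for an "is one of" phrase."""
    match = re.match(
        r"^The (?P<name>.+?) is one of: (?P<items>.+?)\.?$", sentence.strip()
    )
    if match is None:
        raise ValueError("not an enumerated type definition")
    # Elements are separated by ", ", ", or " or " or ".
    items = re.split(r",\s*(?:or\s+)?|\s+or\s+", match.group("items"))
    # Each element may be preceded by "a" or "an".
    variants = [re.sub(r"^an?\s+", "", item.strip()) for item in items]
    return match.group("name"), variants


result = parse_one_of(
    "The Example Packet is one of: a Retry Packet, an Initial Packet, "
    "or a Long Header."
)
```

A checker built on this would then confirm that every returned variant name is defined in the document or in a linked document, as the text above requires.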
1045 An EOL Option is formatted as follows: 1047 0 1048 0 1 2 3 4 5 6 7 1049 +-+-+-+-+-+-+-+-+ 1050 | 0 | 1051 +-+-+-+-+-+-+-+-+ 1053 where: 1055 Option Kind (Kind): 1 byte; Kind == 0. This is a fixed-width field, 1056 with a value constraint, as previously described. 1058 A Window Scale Factor Option is formatted as follows: 1060 0 1 2 3 1061 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1062 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1063 | 3 | Length | Window Scale | 1064 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1066 where: 1068 Option Kind (Kind): 1 byte; Kind == 3. This is a fixed-width field, 1069 with a value constraint, as previously described. 1071 Option Length (Length): 1 byte; Length == 3. This is a fixed-width 1072 field, with a value constraint, as previously described. 1074 Window Scale: 1 byte. This is a fixed-width field, as previously 1075 described. 1077 A TCP Option is either an EOL Option or a Window Scale Factor Option. 1079 4.10. Specifying Protocol Data Units 1081 A document will set out different structures that are not, on their 1082 own, protocol data units. To capture the parsing or serialisation of 1083 a protocol, it is necessary to be able to identify or construct those 1084 packets that are valid PDUs. As a result, it is necessary for the 1085 document to identify those structures that are PDUs. 1087 The PDUs that comprise a protocol are identified using the exact 1088 phrase "This document describes the <protocol name> protocol. The 1089 <protocol name> protocol uses <list of PDU names>" where the list of 1090 PDU names is a comma separated list (with the last element, if there 1091 is more than one element, preceded by 'and'), each optionally 1092 preceded by "a" or "an". The PDU names must be structure names 1093 defined in the document or a linked document. The PDU names are 1094 pluralised in the list. A document must contain exactly one instance 1095 of this phrase.
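The following Python sketch shows one way the PDU declaration phrase could be recognised, applied to the declaration that this document itself contains. The function name parse_pdu_declaration is hypothetical, and recovering the singular structure name by dropping a trailing "s" is a simplifying assumption of this sketch; the text above does not say how pluralised names are to be mapped back to structure names.

```python
import re

PHRASE = re.compile(
    r"This document describes the (?P<protocol>.+?) protocol\. "
    r"The (?P=protocol) protocol uses (?P<pdus>.+?)\.?$"
)


def parse_pdu_declaration(text):
    """Return (protocol name, [singular PDU names]) from the declaration phrase."""
    flat = " ".join(text.split())  # the phrase may be wrapped across lines
    match = PHRASE.search(flat)
    if match is None:
        raise ValueError("PDU declaration phrase not found")
    # Elements are separated by ", ", ", and " or " and ".
    items = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group("pdus"))
    names = []
    for item in items:
        item = re.sub(r"^an?\s+", "", item.strip())  # optional "a" / "an"
        if item.endswith("s"):
            item = item[:-1]  # naive singularisation (assumption)
        names.append(item)
    return match.group("protocol"), names


declaration = parse_pdu_declaration(
    "This document describes the Example protocol. The Example protocol\n"
    "uses Long Headers, STUN Message Types, IPv4 Headers, RTP Data\n"
    "Packets, and TCP Headers."
)
```

The backreference in the pattern requires the protocol name to be the same in both sentences, which is one way of treating the phrase as exact.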
1097 This document describes the Example protocol. The Example protocol 1098 uses Long Headers, STUN Message Types, IPv4 Headers, RTP Data 1099 Packets, and TCP Headers. 1101 4.11. Importing PDU Definitions from Other Documents 1103 Protocols are often specified across multiple documents, either 1104 because the specification of a protocol's data units has changed over 1105 time, or because of explicit extension points contained in the 1106 protocol's original specification. To allow a document to make use 1107 of a previous PDU definition, it is possible to import PDU 1108 definitions (written in the format described in this document) from 1109 other documents. 1111 A PDU definition is imported using the exact phrase "A/An ________ is 1112 formatted as described in <document identifier>". The document 1113 identifier must refer, unambiguously, to an existing document. An 1114 Internet-Draft is identified by its name. RFCs are identified by 1115 "RFC" followed by their number. 1117 5. Open Issues 1119 * Need a simple syntax for defining a list of identical objects, and 1120 a way of referring to the size of the enclosing packet. The 1121 format cannot currently represent RFC 6716 section 3.2.3, and 1122 should be able to (the underlying type system can do so). 1124 * Need some discussion about the checks that the tooling might 1125 perform, and the implications of those checks. For example, the 1126 tooling checks for consistency between the diagram and the 1127 description list of fields, ensuring that fields match by name and 1128 width. -01 of this draft had a field that mismatched because of 1129 case: is this something that the tooling should identify? More 1130 broadly, what is the trade-off between the rigour that the tooling 1131 can enforce, and the flexibility desired/needed by authors? 1133 * Need to describe the rules governing the import of PDU definitions 1134 from other documents. 1136 6. IANA Considerations 1138 This document contains no actions for IANA. 1140 7.
Security Considerations 1142 Poorly implemented parsers are a frequent source of security 1143 vulnerabilities in protocol implementations. Structuring the 1144 description of a protocol data unit so that a parser can be 1145 automatically derived from the specification can reduce the 1146 likelihood of vulnerable implementations. 1148 As described in Section 3, the expressiveness of a protocol 1149 description language has implications for the safety, security, and 1150 computability properties of the parser for the protocol description 1151 language itself, and on the generated parser code for the protocols 1152 described using it. The language-theoretic security ([LANGSEC]) 1153 community explores the security implications of programming language 1154 design; the principles developed in that community should guide the 1155 development of protocol description languages. 1157 8. Acknowledgements 1159 The authors would like to thank Marc Petit-Huguenin for extensive 1160 feedback on the draft, including work on formalising the constraint 1161 syntax as given in Appendix A.1. 1163 Wesley Eddy provided valuable feedback on the description format 1164 through adopting it in [draft-ietf-tcpm-rfc793bis]. 1166 The authors would like to thank David Southgate for preparing a 1167 prototype implementation of some of the ideas described here. 1169 This work has received funding from the UK Engineering and Physical 1170 Sciences Research Council under grant EP/R04144X/1. 1172 9. Informative References 1174 [RFC8357] Deering, S. and R. Hinden, "Generalized UDP Source Port 1175 for DHCP Relay", RFC 8357, March 2018, 1176 . 1178 [QUIC-TRANSPORT] 1179 Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed 1180 and Secure Transport", Work in Progress, Internet-Draft, 1181 draft-ietf-quic-transport-27, 21 February 2020, 1182 . 1185 [RFC6958] Clark, A., Zhang, S., Zhao, J., and Q. 
Wu, "RTP Control 1186 Protocol (RTCP) Extended Report (XR) Block for Burst/Gap 1187 Loss Metric Reporting", RFC 6958, May 2013, 1188 . 1190 [RFC7950] Bjorklund, M., "The YANG 1.1 Data Modeling Language", 1191 RFC 7950, August 2016, 1192 . 1194 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 1195 Version 1.3", RFC 8446, August 2018, 1196 . 1198 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 1199 Specifications: ABNF", RFC 5234, January 2008, 1200 . 1202 [RFC7405] Kyzivat, P., "Case-Sensitive String Support in ABNF", 1203 RFC 7405, December 2014, 1204 . 1206 [ASN1] ITU-T, "ITU-T Recommendation X.680, X.681, X.682, and 1207 X.683", ITU-T Recommendation X.680, X.681, X.682, and 1208 X.683. 1210 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 1211 Representation (CBOR)", RFC 7049, October 2013, 1212 . 1214 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1215 Jacobson, "RTP: A Transport Protocol for Real-Time 1216 Applications", RFC 3550, July 2003, 1217 . 1219 [draft-ietf-tram-stunbis-21] 1220 Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing, 1221 D., Mahy, R., and P. Matthews, "Session Traversal 1222 Utilities for NAT (STUN)", Work in Progress, Internet- 1223 Draft, draft-ietf-tram-stunbis-21, 21 March 2019, 1224 . 1227 [RFC791] Postel, J., "Internet Protocol", RFC 791, September 1981, 1228 . 1230 [RFC793] Postel, J., "Transmission Control Protocol", RFC 793, 1231 September 1981, . 1233 [LANGSEC] LANGSEC, "LANGSEC: Language-theoretic Security", 1234 . 1236 [SASSAMAN] Sassaman, L., Patterson, M. L., Bratus, S., and A. 1237 Shubina, "The Halting Problems of Network Stack 1238 Insecurity", ;login: -- December 2011, Volume 36, Number 1239 6, . 1243 [draft-mcquistin-augmented-udp-example] 1244 McQuistin, S., Band, V., Jacob, D., and C. S. 
Perkins, 1245 "Describing UDP with Augmented Packet Header Diagrams", 1246 Work in Progress, Internet-Draft, draft-mcquistin- 1247 augmented-udp-example-02, 25 October 2021, 1248 . 1251 [draft-mcquistin-augmented-tcp-example] 1252 McQuistin, S., Band, V., Jacob, D., and C. S. Perkins, 1253 "Describing TCP with Augmented Packet Header Diagrams", 1254 Work in Progress, Internet-Draft, draft-mcquistin- 1255 augmented-tcp-example-02, 25 October 2021, 1256 . 1259 [draft-mcquistin-quic-augmented-diagrams] 1260 McQuistin, S., Band, V., Jacob, D., and C. S. Perkins, 1261 "Describing QUIC's Protocol Data Units with Augmented 1262 Packet Header Diagrams", Work in Progress, Internet-Draft, 1263 draft-mcquistin-quic-augmented-diagrams-05, 25 October 1264 2021, . 1267 [draft-ietf-tcpm-rfc793bis] 1268 Eddy, W., "Transmission Control Protocol (TCP) 1269 Specification", Work in Progress, Internet-Draft, draft- 1270 ietf-tcpm-rfc793bis-25, 7 September 2021, 1271 . 1274 Appendix A. ABNF specification 1276 A.1. Constraint Expressions 1277 constant = %x31-39 *(%x30-39) ; natural numbers without leading 0s 1278 short-name = ALPHA *(ALPHA / DIGIT / "-" / "_") 1279 name = short-name *(" " short-name) 1280 sp = [" "] ; optional space in expression 1281 bool-expr = "(" sp bool-expr sp ")" / 1282 "!" sp bool-expr / 1283 bool-expr sp bool-op sp bool-expr / 1284 bool-expr sp "?" sp expr sp ":" sp expr / 1285 expr sp cmp-op sp expr 1286 bool-op = "&&" / "||" 1287 cmp-op = "==" / "!=" / "<" / "<=" / ">" / ">=" 1288 expr = "(" sp expr sp ")" / 1289 expr sp op sp expr / 1290 bool-expr "?" expr ":" expr / 1291 name / short-name "." short-name / 1292 "size(" short-name ")" / 1293 constant 1294 op = "+" / "-" / "*" / "/" / "%" / "^" 1295 length = expr sp unit / "[" sp name sp "]" / "variable length" 1296 unit = %s"bit" / %s"bits" / %s"byte" / %s"bytes" / name 1298 A.2. 
Augmented packet diagrams 1300 Future revisions of this draft will include an ABNF specification for 1301 the augmented packet diagram format described in Section 4. Such a 1302 specification is omitted from this draft given that the format is 1303 likely to change as its syntax is developed. Given the visual nature 1304 of the format, it is more appropriate for discussion to focus on the 1305 examples given in Section 4. 1307 Appendix B. Tooling & source code 1309 The source for this draft is available from https://github.com/ 1310 glasgow-ipl/draft-mcquistin-augmented-ascii-diagrams. 1312 The source code for tooling that can be used to parse this document 1313 is available from https://github.com/glasgow-ipl/ips-protodesc-code. 1314 This tooling supports the automatic generation of Rust parser code 1315 from protocol descriptions written in the Augmented Packet Header 1316 Diagram format. It also provides test harnesses that demonstrate 1317 that example descriptions of UDP 1318 [draft-mcquistin-augmented-udp-example] and TCP 1319 [draft-mcquistin-augmented-tcp-example] function as expected. 1321 Authors' Addresses 1322 Stephen McQuistin 1323 University of Glasgow 1324 School of Computing Science 1325 Glasgow 1326 G12 8QQ 1327 United Kingdom 1329 Email: sm@smcquistin.uk 1331 Vivian Band 1332 University of Glasgow 1333 School of Computing Science 1334 Glasgow 1335 G12 8QQ 1336 United Kingdom 1338 Email: vivianband0@gmail.com 1340 Dejice Jacob 1341 University of Glasgow 1342 School of Computing Science 1343 Glasgow 1344 G12 8QQ 1345 United Kingdom 1347 Email: d.jacob.1@research.gla.ac.uk 1349 Colin Perkins 1350 University of Glasgow 1351 School of Computing Science 1352 Glasgow 1353 G12 8QQ 1354 United Kingdom 1356 Email: csp@csperkins.org