idnits 2.17.00 (12 Aug 2021)

/tmp/idnits29851/draft-mcquistin-augmented-ascii-diagrams-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There is 1 instance of too long lines in the document, the longest one
     being 418 characters in excess of 72.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 725 has weird spacing: '...r eq-op bool...'

  -- The document date (3 November 2019) is 929 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: draft-ietf-quic-transport has been published as RFC
     9000

  -- Obsolete informational reference (is this intentional?): RFC 7049
     (Obsoleted by RFC 8949)

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  Network Working Group                                       S. McQuistin
3  Internet-Draft                                                   V. Band
4  Intended status: Experimental C. S.
Perkins 5 Expires: 6 May 2020 University of Glasgow 6 3 November 2019 8 Describing Protocol Data Units with Augmented Packet Header Diagrams 9 draft-mcquistin-augmented-ascii-diagrams-01 11 Abstract 13 This document describes a machine-readable format for specifying the 14 syntax of protocol data units within a protocol specification. This 15 format is comprised of a consistently formatted packet header 16 diagram, followed by structured explanatory text. It is designed to 17 maintain human readability while enabling support for automated 18 parser generation from the specification document. This document is 19 itself an example of how the format can be used. 21 Status of This Memo 23 This Internet-Draft is submitted in full conformance with the 24 provisions of BCP 78 and BCP 79. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF). Note that other groups may also distribute 28 working documents as Internet-Drafts. The list of current Internet- 29 Drafts is at https://datatracker.ietf.org/drafts/current/. 31 Internet-Drafts are draft documents valid for a maximum of six months 32 and may be updated, replaced, or obsoleted by other documents at any 33 time. It is inappropriate to use Internet-Drafts as reference 34 material or to cite them other than as "work in progress." 36 This Internet-Draft will expire on 6 May 2020. 38 Copyright Notice 40 Copyright (c) 2019 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 45 license-info) in effect on the date of publication of this document. 46 Please review these documents carefully, as they describe your rights 47 and restrictions with respect to this document. 
Code Components 48 extracted from this document must include Simplified BSD License text 49 as described in Section 4.e of the Trust Legal Provisions and are 50 provided without warranty as described in the Simplified BSD License. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 55 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 56 2.1. Limitations of Current Packet Format Diagrams . . . . . . 4 57 2.2. Formal languages in standards documents . . . . . . . . . 6 58 3. Design Principles . . . . . . . . . . . . . . . . . . . . . . 7 59 4. Augmented Packet Header Diagrams . . . . . . . . . . . . . . 8 60 4.1. PDUs with Fixed and Variable-Width Fields . . . . . . . . 9 61 4.2. PDUs That Cross-Reference Previously Defined 62 Fields . . . . . . . . . . . . . . . . . . . . . . . . . 11 63 4.3. PDUs with Non-Contiguous Fields . . . . . . . . . . . . . 14 64 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 65 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 66 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 14 67 8. Informative References . . . . . . . . . . . . . . . . . . . 15 68 Appendix A. ABNF specification . . . . . . . . . . . . . . . . . 16 69 A.1. Constraint Expressions . . . . . . . . . . . . . . . . . 16 70 A.2. Augmented packet diagrams . . . . . . . . . . . . . . . . 16 71 Appendix B. Source code repository . . . . . . . . . . . . . . . 16 72 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 74 1. Introduction 76 Packet header diagrams have become a widely used format for 77 describing the syntax of binary protocols. In otherwise largely 78 textual documents, they allow for the visualisation of packet 79 formats, reducing human error, and aiding in the implementation of 80 parsers for the protocols that they specify. 82 Figure 1 gives an example of how packet header diagrams are used to 83 define binary protocol formats. 
The format has an obvious structure: 84 the diagram clearly delineates each field, showing its width and its 85 position within the header. This type of diagram is designed for 86 human readers, but is consistent enough that it should be possible to 87 develop a tool that generates a parser for the packet format from the 88 diagram. 90 : 0 1 2 3 91 : 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 92 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 93 : | Source Port | Destination Port | 94 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 95 : | Sequence Number | 96 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 97 : | Acknowledgment Number | 98 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 99 : | Data | |U|A|P|R|S|F| | 100 : | Offset| Reserved |R|C|S|S|Y|I| Window | 101 : | | |G|K|H|T|N|N| | 102 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 103 : | Checksum | Urgent Pointer | 104 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 105 : | Options | Padding | 106 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 107 : | data | 108 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 110 Figure 1: TCP's header format (from [RFC793]) 112 Unfortunately, the format of such packet diagrams varies both within 113 and between documents. This variation makes it difficult to build 114 tools to generate parsers from the specifications. Better tooling 115 could be developed if protocol specifications adopted a consistent 116 format for their packet descriptions. Indeed, this underpins the 117 format described by this draft: we want to retain the benefits that 118 packet header diagrams provide, while identifying the benefits of 119 adopting a consistent format. 
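As an illustration of why a consistent layout matters for tooling, the following sketch recovers field labels and widths from a single diagram row. It is not part of the format described in this document: the function name fields_in_row is hypothetical, and the code assumes the conventional layout used in Figure 1, in which each bit occupies two character columns and fields are delimited by "|".

```python
def fields_in_row(row: str) -> list[tuple[str, int]]:
    """Return (label, width_in_bits) for each field in one diagram row.

    Assumption: every bit occupies two character columns and fields are
    delimited by '|', as in conventional packet header diagrams.
    """
    row = row.strip()
    if not (row.startswith("|") and row.endswith("|")):
        raise ValueError("row must start and end with '|'")
    fields = []
    for cell in row[1:-1].split("|"):
        # A cell of n characters plus its closing '|' spans (n + 1) / 2 bits.
        if (len(cell) + 1) % 2 != 0:
            raise ValueError(f"field is not aligned to a bit boundary: {cell!r}")
        fields.append((cell.strip(), (len(cell) + 1) // 2))
    return fields


# Build the first row of the TCP header with exact column widths
# (31 characters per 16-bit field) rather than relying on hand spacing.
tcp_row = "|" + "Source Port".center(31) + "|" + "Destination Port".center(31) + "|"
print(fields_in_row(tcp_row))
```

A row whose delimiters do not fall on bit boundaries is rejected, which is the kind of inconsistency that a tool of this sort could report to the author.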
121 This document describes a consistent packet header diagram format and 122 accompanying structured text constructs that allow for the parsing 123 process of protocol headers to be fully specified. This provides 124 support for the automatic generation of parser code. Broad design 125 principles, that seek to maintain the primacy of human readability 126 and flexibility in authorship, are described, before the format 127 itself is given. 129 This document is itself an example of the approach that it describes, 130 with the packet header diagrams and structured text format described 131 by example. 133 This draft describes early work. As consensus builds around the 134 particular syntax of the format described, both a formal ABNF 135 specification and code that parses it (and, as described above, this 136 document) will be provided. 138 : The RESET_STREAM frame is as follows: 139 : 140 : 0 1 2 3 141 : 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 142 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 143 : | Stream ID (i) ... 144 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 145 : | Application Error Code (16) | 146 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 147 : | Final Size (i) ... 148 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 149 : 150 : RESET_STREAM frames contain the following fields: 151 : 152 : Stream ID: A variable-length integer encoding of the Stream ID 153 : of the stream being terminated. 154 : 155 : Application Protocol Error Code: A 16-bit application protocol 156 : error code (see Section 20.1) which indicates why the stream 157 : is being closed. 158 : 159 : Final Size: A variable-length integer indicating the final size 160 : of the stream by the RESET_STREAM sender, in unit of bytes. 162 Figure 2: QUIC's RESET_STREAM frame format (from [QUIC-TRANSPORT]) 164 2. 
Background 166 This section begins by considering how packet header diagrams are 167 used in existing documents. This exposes the limitations that the 168 current usage has in terms of machine-readability, guiding the design 169 of the format that this document proposes. 171 While this document focuses on the machine-readability of packet 172 format diagrams, this section also discusses the use of other 173 structured or formal languages within IETF documents. Considering 174 how and why these languages are used provides an instructive contrast 175 to the relatively incremental approach proposed here. 177 2.1. Limitations of Current Packet Format Diagrams 179 Packet header diagrams are frequently used in IETF standards to 180 describe the format of binary protocols. While there is no standard 181 for how these diagrams should be formatted, they have a broadly 182 similar structure, where the layout of a protocol data unit (PDU) or 183 structure is shown in diagrammatic form, followed by a description 184 list of the fields that it contains. An example of this format, 185 taken from the QUIC specification, is given in Figure 2. 187 These packet header diagrams, and the accompanying descriptions, are 188 formatted for human readers rather than for automated processing. As 189 a result, while there is rough consistency in how packet header 190 diagrams are formatted, there are a number of limitations that make 191 them difficult to work with programmatically: 193 Inconsistent syntax: There are two classes of consistency that are 194 needed to support automated processing of specifications: internal 195 consistency within a diagram or document, and external consistency 196 across all documents. 198 Figure 2 gives an example of internal inconsistency. Here, the 199 packet diagram shows a field labelled "Application Error Code", 200 while the accompanying description lists the field as "Application 201 Protocol Error Code". 
The use of an abbreviated name is suitable
202        for human readers, but makes parsing the structure difficult for
203        machines. Figure 3 gives a further example, where the description
204        includes an "Option-Code" field that does not appear in the packet
205        diagram; and where the description states that each field is 16
206        bits in length, but the diagram shows the OPTION_RELAY_PORT as 13
207        bits, and Option-Len as 19 bits. Another example is [RFC6958],
208        where the packet format diagram showing the structure of the
209        Burst/Gap Loss Metrics Report Block shows the Number of Bursts
210        field as being 12 bits wide but the corresponding text describes
211        it as 16 bits.

213        Comparing Figure 2 with Figure 3 exposes external inconsistency
214        across documents. While the packet format diagrams are broadly
215        similar, the surrounding text is formatted differently. If
216        machine parsing is to be made possible, then this text must be
217        structured consistently.

219     Ambiguous constraints: The constraints that are enforced on a
220        particular field are often described ambiguously, or in a way that
221        cannot be parsed easily. In Figure 3, each of the three fields in
222        the structure is constrained. The first two fields ("Option-Code"
223        and "Option-Len") are to be set to constant values (note the
224        inconsistency in how these constraints are expressed in the
225        description). However, the third field ("Downstream Source Port")
226        can take a value from a constrained set. This constraint is
227        expressed in prose that cannot readily be understood by machine.

229     Poor linking between sub-structures: Protocol data units and other
230        structures are often comprised of sub-structures that are defined
231        elsewhere, either in the same document, or within another
232        document. Chaining these structures together is essential for
233        machine parsing: the parsing process for a protocol data unit is
234        only fully expressed if all elements can be parsed.
236 Figure 2 highlights the difficulty that machine parsers have in 237 chaining structures together. Two fields ("Stream ID" and "Final 238 Size") are described as being encoded as variable-length integers; 239 this is a structure described elsewhere in the same document. 240 Structured text is required both alongside the definition of the 241 containing structure and with the definition of the sub-structure, 242 to allow a parser to link the two together. 244 : The format of the "Relay Source Port Option" is shown below: 245 : 246 : 0 1 2 3 247 : 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 248 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 249 : | OPTION_RELAY_PORT | Option-Len | 250 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 251 : | Downstream Source Port | 252 : +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 253 : 254 : Where: 255 : 256 : Option-Code: OPTION_RELAY_PORT. 16-bit value, 135. 257 : 258 : Option-Len: 16-bit value to be set to 2. 259 : 260 : Downstream Source Port: 16-bit value. To be set by the IPv6 261 : relay either to the downstream relay agent's UDP source port 262 : used for the UDP packet, or to zero if only the local relay 263 : agent uses the non-DHCP UDP port (not 547). 265 Figure 3: DHCPv6's Relay Source Port Option (from [RFC8357]) 267 2.2. Formal languages in standards documents 269 A small proportion of IETF standards documents contain structured and 270 formal languages, including ABNF [RFC5234], ASN.1 [ASN1], C, CBOR 271 [RFC7049], JSON, the TLS presentation language [RFC8446], YANG models 272 [RFC7950], and XML. While this broad range of languages may be 273 problematic for the development of tooling to parse specifications, 274 these, and other, languages serve a range of different use cases. 275 ABNF, for example, is typically used to specify text protocols, while 276 ASN.1 is used to specify data structure serialisation. 
This document 277 specifies a structured language for specifying the parsing of binary 278 protocol data units. 280 3. Design Principles 282 The use of structures that are designed to support machine 283 readability may potentially interfere with the existing ways in which 284 protocol specifications are used and authored. To the extent that 285 these existing uses are more important than machine readability, such 286 interference must be minimised. 288 In this section, the broad design principles that underpin the format 289 described by this document are given. However, these principles 290 apply more generally to any approach that introduces structured and 291 formal languages into standards documents. 293 It should be noted that these are design principles: they expose the 294 trade-offs that are inherent within any given approach. Violating 295 these principles is sometimes necessary and beneficial, and this 296 document sets out the potential consequences of doing so. 298 The central tenet that underpins these design principles is a 299 recognition that the standardisation process is not broken, and so 300 does not need to be fixed. Failure to recognise this will likely 301 lead to approaches that are incompatible with the standards process, 302 or that will see limited adoption. However, the standards process 303 can be improved with appropriate approaches, as guided by the 304 following broad design principles: 306 Most readers are human: Primarily, standards documents should be 307 written for people, who require text and diagrams that they can 308 understand. Structures that cannot be easily parsed by people 309 should be avoided, and if included, should be clearly delineated 310 from human-readable content. 312 Any approach that shifts this balance -- that is, that primarily 313 targets machine readers -- is likely to be disruptive to the 314 standardisation process, which relies upon discussion centered 315 around documents written in prose. 
317     Authorship tools are diverse: Authorship is a distributed process
318        that involves a diverse set of tools and workflows. The
319        introduction of machine-readable structures into specifications
320        should not require that specific tools are used to produce
321        standards documents, to ensure that disruption to existing
322        workflows is minimised. This does not preclude the development of
323        optional, supplementary tools that aid in authoring
324        machine-readable structures.

326        The immediate impact of requiring specific tooling is that
327        adoption is likely to be limited. A long-term impact might be
328        that authors whose workflows are incompatible might be alienated
329        from the process.

331     Canonical specifications: As far as possible, machine-readable
332        structures should not replicate the human-readable specification
333        of the protocol within the same document. Such structures should
334        form part of a canonical specification of the protocol. Adding
335        supplementary machine-readable structures, in parallel to the
336        existing human-readable text, is undesirable because it creates
337        the potential for inconsistency.

339        As an example, program code that describes how a protocol data
340        unit can be parsed might be provided as an appendix within a
341        standards document. This code would provide a specification of
342        the protocol that is separate to the prose description in the main
343        body of the document. This has the undesirable effect of
344        introducing the potential for the program code to specify
345        behaviour that the prose-based specification does not, and
346        vice versa.

348     Expressiveness: Any approach should be expressive enough to capture
349        the syntax and parsing process for the majority of binary
350        protocols. If a given language is not sufficiently expressive,
351        then adoption is likely to be limited.
At the limits of what can 352 be expressed by the language, authors are likely to revert to 353 defining the protocol in prose: this undermines the broad goal of 354 using structured and formal languages. Equally, though, 355 understandable specifications and ease of use are critical for 356 adoption. A tool that is simple to use and addresses the most 357 common use cases might be preferred to a complex tool that 358 addresses all use cases. 360 Minimise required change: Any approach should require as few changes 361 as possible to the way that documents are formatted, authored, and 362 published. Forcing adoption of a particular structured or formal 363 language is incompatible with the IETF's standardisation process: 364 there are very few components of standards documents that are non- 365 optional. 367 4. Augmented Packet Header Diagrams 369 The design principles described in Section 3 can largely be met by 370 the existing uses of packet header diagrams. These diagrams aid 371 human readability, do not require new or specialised authorship 372 tools, do not split the specification into multiple parts, can 373 express most binary protocol features, and require no changes to 374 existing publication processes. 376 However, as discussed in Section 2.1 there are limitations to how 377 packet header diagrams are used that must be addressed if they are to 378 be parsed by machine. In this section, an augmented packet header 379 diagram format is described. 381 The concept is first illustrated by example. This is appropriate, 382 given the visual nature of the language. In future drafts, these 383 examples will be parsable using provided tools, and a formal 384 specification of the augmented packet diagrams will be given in 385 Appendix A. 387 4.1. PDUs with Fixed and Variable-Width Fields 389 The simplest PDU is one that contains only a set of fixed-width 390 fields in a known order, with no optional fields or variation in the 391 packet format. 
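A PDU of this simplest kind can be parsed by walking an ordered list of (name, width) pairs. The sketch below is illustrative only: the helper name parse_fixed is hypothetical, network (big-endian) bit order is assumed, and the field list corresponds to the first 32-bit word of the IPv4 header used as the example later in this section.

```python
def parse_fixed(fields, data: bytes) -> dict[str, int]:
    """Split `data` into consecutive fixed-width bit fields.

    `fields` is an ordered list of (name, width_in_bits) pairs.
    Assumption: fields are packed most-significant bit first.
    """
    total_bits = sum(width for _, width in fields)
    if len(data) * 8 < total_bits:
        raise ValueError("PDU is shorter than the sum of its field widths")
    value = int.from_bytes(data, "big")
    remaining = len(data) * 8          # bits not yet consumed
    parsed = {}
    for name, width in fields:
        remaining -= width
        # Shift the field down to bit 0, then mask off higher bits.
        parsed[name] = (value >> remaining) & ((1 << width) - 1)
    return parsed


IPV4_FIRST_WORD = [
    ("Version", 4),
    ("IHL", 4),
    ("DSCP", 6),
    ("ECN", 2),
    ("Total Length", 16),
]

header = parse_fixed(IPV4_FIRST_WORD, bytes([0x45, 0x00, 0x00, 0x54]))
print(header)
```

Because the field list is plain data, it could in principle be produced directly from a field description list of the kind described in this section rather than written by hand.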
393     Some packet formats include variable-width fields, where the size of
394     a field is either derived from the value of some previous field, or
395     is unspecified and inferred from the total size of the packet and the
396     size of the other fields. A packet can contain only one field of
397     unspecified length, to ensure there is no ambiguity.

399     A PDU description is introduced by the exact phrase "A/An _______ is
400     formatted as follows:" at the end of a paragraph. This is followed
401     by the PDU description itself, as a packet diagram within an
402     <artwork> element in the XML representation, starting with a header
403     line to show the bit width of the diagram. The description of the
404     fields follows the diagram, as an XML <dl> list, after a paragraph
405     containing the text "where:".

407     Each field of the description starts with a <dt> tag comprising the
408     field name and an optional short name in parentheses. These are
409     followed by a colon, the field length, and a terminating period. The
410     following <dd>
tag contains a prose description of the field. 412 For example, this can be illustrated using the IPv4 Header Format 413 [RFC791]. An IPv4 Header is formatted as follows: 415 0 1 2 3 416 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 417 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 418 |Version| IHL | DSCP |ECN| Total Length | 419 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 420 | Identification |Flags| Fragment Offset | 421 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 422 | Time to Live | Protocol | Header Checksum | 423 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 424 | Source Address | 425 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 426 | Destination Address | 427 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 428 | Options ... 429 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 430 | : 431 : Payload : 432 : | 433 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 435 where: 437 Version (V): 4 bits. This is a fixed-width field, whose full label 438 is shown in the diagram. The field's width -- 4 bits -- is given 439 in the label of the description list, separated from the field's 440 label by a colon. 442 Internet Header Length (IHL): 4 bits. This is a shorter field, whose 443 full label is too large to be shown in the diagram. A short label 444 (IHL) is used in the diagram, and this short label is provided, in 445 brackets, after the full label in the description list. 447 Differentiated Services Code Point (DSCP): 6 bits. This is a fixed- 448 width field, as previously defined. 450 Explicit Congestion Notification (ECN): 2 bits. This is a fixed- 451 width field, as previously defined. 453 Total Length (TL): 2 bytes. This is a fixed-width field, as 454 previously defined. 
Where fields are an integral number of bytes 455 in size, the field length can be given in bytes rather than in 456 bits. 458 Identification: 2 bytes. This is a fixed-width field, as previously 459 defined. 461 Flags: 3 bits. This is a fixed-width field, as previously defined. 463 Fragment Offset: 13 bits. This is a fixed-width field, as previously 464 defined. 466 Time To Live (TTL): 1 byte. This is a fixed-width field, as 467 previously defined. 469 Protocol: 1 byte. This is a fixed-width field, as previously 470 defined. 472 Header Checksum: 2 bytes. This is a fixed-width field, as previously 473 defined. 475 Source Address: 32 bits. This is a fixed-width field, as previously 476 defined. 478 Destination Address: 32 bits. This is a fixed-width field, as 479 previously defined. 481 Options: (IHL-5)*32 bits. This is a variable-length field, whose 482 length is defined by the value of the field with short label IHL 483 (Internet Header Length). Constraint expressions can be used in 484 place of constant values: the grammar for the expression language 485 is defined in Appendix A.1. Constraints can include a previously 486 defined field's short or full label, where one has been defined. 487 Short variable-length fields are indicated by "..." instead of a 488 pipe at the end of the row. 490 Payload: TL - ((IHL*32)/8) bytes. This is a multi-row variable- 491 length field, constrained by the values of fields TL and IHL. 492 Instead of the "..." notation, ":" is used to indicate that the 493 field is variable-length. The use of ":" instead of "..." 494 indicates the field is likely to be a longer, multi-row field. 495 However, semantically, there is no difference: these different 496 notations are for the benefit of human readers. 498 4.2. PDUs That Cross-Reference Previously Defined Fields 500 Binary formats often reference sub-structures that have been defined 501 earlier in the specification. 
For example, in RTP [RFC3550], the 502 Contributing Source Identifiers in an RTP Data Packet are defined as 503 comprising a list of Source Identifier elements. A Source Identifier 504 is formatted as follows: 506 0 1 2 3 507 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 508 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 509 | Source Identifier | 510 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 512 where: 514 Source Identifier: 32 bits. This is a fixed-width field, as 515 described previously. 517 The following example shows how a Source Identifier can be referenced 518 in the description of an RTP Data Packet. It also shows how the 519 presence of some fields in a format may be dependent on the values of 520 an earlier field. 522 An RTP Data Packet is formatted as follows: 524 0 1 2 3 525 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 526 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 527 | V |P|X| CC |M| PT | Sequence Number | 528 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 529 | Timestamp | 530 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 531 | Synchronization Source identifier | 532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 533 | [Contributing Source identifiers] | 534 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 535 | Header Extension | 536 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 537 | Payload : 538 : : 539 : | 540 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 541 | Padding | Padding Count | 542 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 544 where: 546 Version (V): 2 bits. This is a fixed-width field, as described 547 previously. 549 Padding (P): 1 bit. This is a fixed-width field, as described 550 previously. 552 Extension (X): 1 bit. This is a fixed-width field, as described 553 previously. 
555     CSRC count (CC): 4 bits. This is a fixed-width field, as described
556        previously.

558     Marker (M): 1 bit. This is a fixed-width field, as described
559        previously.

561     Payload Type (PT): 7 bits. This is a fixed-width field, as described
562        previously.

564     Sequence Number: 16 bits. This is a fixed-width field, as
565        described previously.

567     Timestamp: 32 bits. This is a fixed-width field, as described
568        previously.

570     Synchronization Source identifier: 1 * Source Identifier. This is a
571        field whose structure is a previously defined PDU format. To
572        indicate this, the width of the field is expressed in terms of
573        the cross-referenced structure (here, Source Identifier). When
574        used in constraint expressions, PDU names refer to the length of
575        that PDU structure.

577     Contributing Source identifiers: CC * Source Identifier. Where a
578        field is comprised of a sequence of previously defined structures,
579        square brackets can be used to indicate this in the diagram. The
580        length of the sequence can be defined using the constraint
581        expression grammar as described earlier.

583     Header Extension: 32 bits; present only when X == 1. This is a field
584        whose presence is predicated on an expression given using the
585        constraint expression grammar described earlier. Optional fields
586        can be of any previously defined format (e.g., fixed- or
587        variable-width). Optional fields are indicated by a "present only
588        when [expr]" clause that follows the field's length, separated
589        from it by a semicolon.

591        [Note that this example deviates from the format as described in
592        [RFC3550]. As specified in that document, the Header Extension
593        would be a cross-referenced structure. This is not shown here for
594        brevity.]

596     Payload. The length of the Payload is not specified, and hence needs
597        to be inferred from the total length of the packet and the lengths
598        of the known fields. There can only be one field of unspecified
599        size in a PDU.
601 Padding: Padding Count bytes; present only when (P == 1) and 602 (Padding Count > 0). 603 This is a variable size field, with size dependent on a later 604 field in the packet. Fields can only depend on the value of a 605 later field if they follow a field with unspecified size. 607 Padding Count: 1 byte; present only when P == 1. This is a fixed- 608 width field, as previously defined. 610 4.3. PDUs with Non-Contiguous Fields 612 In some binary formats, fields are striped across multiple non- 613 contiguous bits. This is often to allow for backwards compatibility 614 with previous definitions of the same fields in earlier documents: 615 striping in this way allows for careful use of the possible range of 616 values. 618 This format is illustrated using the STUN Message Type 619 [draft-ietf-tram-stunbis-21]. A STUN Message Type is formatted as 620 follows: 622 0 1 623 0 1 2 3 4 5 6 7 8 9 0 1 2 3 624 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 625 |M|M|M|M|M|C|M|M|M|C|M|M|M|M| 626 |B|A|9|8|7|1|6|5|4|0|3|2|1|0| 627 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 629 where: 631 Method (M): 12 bits. This field is comprised of multiple sub-fields 632 (M0 through MB) as shown in the diagram. That these sub-fields 633 should be concatenated, after parsing, into a single field is 634 indicated by their being labelled using the 'M' short field name 635 followed by a single hexadecimal digit, with the least significant 636 bit labelled with 0, and subsequent bits labelled in sequence. 638 Class (C): 2 bits. This field follows the same format as M described 639 above. 641 5. IANA Considerations 643 This document contains no actions for IANA. 645 6. Security Considerations 647 Poorly implemented parsers are a frequent source of security 648 vulnerabilities in protocol implementations. Structuring the 649 description of a protocol data unit so that a parser can be 650 automatically derived from the specification can reduce the 651 likelihood of vulnerable implementations. 653 7. 
Acknowledgements

655     The authors would like to thank David Southgate for preparing a
656     prototype implementation of some of the ideas described here.

658     The authors would like to thank Marc Petit-Huguenin for feedback on
659     the draft.

661     This work has received funding from the UK Engineering and Physical
662     Sciences Research Council under grant EP/R04144X/1.

664  8. Informative References

666     [RFC8357]  Shen, N. and E. Chen, "Generalized UDP Source Port
667                for DHCP Relay", RFC 8357, March 2018.

670     [QUIC-TRANSPORT]
671                Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
672                and Secure Transport", Work in Progress, Internet-Draft,
673                draft-ietf-quic-transport-20, 23 April 2019.

677     [RFC6958]  Clark, A., Zhang, S., Zhao, J., and Q. Wu, "RTP Control
678                Protocol (RTCP) Extended Report (XR) Block for Burst/Gap
679                Loss Metric Reporting", RFC 6958, May 2013.

682     [RFC7950]  Bjorklund, M., "The YANG 1.1 Data Modeling Language",
683                RFC 7950, August 2016.

686     [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
687                Version 1.3", RFC 8446, August 2018.

690     [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
691                Specifications: ABNF", RFC 5234, January 2008.

694     [ASN1]     ITU-T, "ITU-T Recommendation X.680, X.681, X.682, and
695                X.683", ITU-T Recommendation X.680, X.681, X.682, and
696                X.683.

698     [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
699                Representation (CBOR)", RFC 7049, October 2013.

702     [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
703                Jacobson, "RTP: A Transport Protocol for Real-Time
704                Applications", RFC 3550, July 2003.

707     [draft-ietf-tram-stunbis-21]
708                Petit-Huguenin, M., Salgueiro, G., Rosenberg, J., Wing,
709                D., Mahy, R., and P. Matthews, "Session Traversal
710                Utilities for NAT (STUN)", Work in Progress, Internet-
711                Draft, draft-ietf-tram-stunbis-21, 21 March 2019.
715     [RFC791]   Postel, J., "Internet Protocol", RFC 791, September 1981.

718     [RFC793]   Postel, J., "Transmission Control Protocol", RFC 793,
719                September 1981.

721  Appendix A. ABNF specification

723  A.1. Constraint Expressions

725     cond-expr = eq-expr "?" cond-expr ":" eq-expr
        eq-expr   = bool-expr eq-op   bool-expr
        bool-expr = ord-expr  bool-op ord-expr
        ord-expr  = add-expr  ord-op  add-expr
        add-expr  = mul-expr  add-op  mul-expr
        mul-expr  = expr      mul-op  expr
        expr      = *DIGIT / field-name / field-name-ws / "(" expr ")"

        field-name    = *ALPHA
        field-name-ws = *(field-name " ")

        mul-op  = "*" / "/" / "%"
        add-op  = "+" / "-"
        ord-op  = "<=" / "<" / ">=" / ">"
        bool-op = "&&" / "||" / "!"
        eq-op   = "==" / "!="

727  A.2. Augmented packet diagrams

729     Future revisions of this draft will include an ABNF specification for
730     the augmented packet diagram format described in Section 4. Such a
731     specification is omitted from this draft given that the format is
732     likely to change as its syntax is developed. Given the visual nature
733     of the format, it is more appropriate for discussion to focus on the
734     examples given in Section 4.

736  Appendix B. Source code repository

738     The source code for tooling that can be used to parse this document
739     is available from
740     https://github.com/lumisota/improving-protocol-standards.

742  Authors' Addresses

744     Stephen McQuistin
745     University of Glasgow
746     School of Computing Science
747     Glasgow
748     G12 8QQ
749     United Kingdom

751     Email: sm@smcquistin.uk

752     Vivian Band
753     University of Glasgow
754     School of Computing Science
755     Glasgow
756     G12 8QQ
757     United Kingdom

759     Email: vivianband0@gmail.com

761     Colin Perkins
762     University of Glasgow
763     School of Computing Science
764     Glasgow
765     G12 8QQ
766     United Kingdom

768     Email: csp@csperkins.org