INTERNET-DRAFT                                               Greg Hudson
Expires: March 1, 2000                                   ghudson@mit.edu
                                                                     MIT

               Simple Protocol Application Data Encoding
                       draft-hudson-spade-03.txt

1. Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Please send comments to ghudson@mit.edu.

2. Abstract

This document describes a simple scheme for encoding network protocol
data and a simple notation for describing protocol data elements.
All encodings are self-terminating (you know when you've reached the
end) and assume that the decoder knows what type of protocol element
it is expecting.

3. Encoding

This encoding scheme uses the ASCII translation of characters into
bytes except when otherwise noted. Protocol elements are encoded as
follows:

        A byte: a byte encodes itself. So "a" encodes as "a" and so
        forth.

        An integer: a sequence of decimal digits followed by a colon.
        For instance, the number 27 encodes as "27:". Negative
        integers are preceded by a minus sign, so -27 encodes as
        "-27:". Excess leading zeroes are not allowed, and zero must
        be encoded as "0:" (not as "-0:"). Thus each integer has a
        single encoding.

        A symbol: a sequence of letters, digits, and dashes,
        beginning with a letter, followed by a colon. Case is
        significant. For instance, the symbol "foo" encodes as
        "foo:".

        A list of <type>: an integer giving the number of elements in
        the list, followed by the elements of the list. For instance,
        the list of numbers "1", "2", and "3" encodes as "3:1:2:3:".

        A structure: a collection of dissimilar elements can simply be
        concatenated together. For instance, a structure containing
        the number 3 and the list of bytes "ab" encodes as "3:2:ab".

        A union value: a symbol giving the type of element, an integer
        giving the length of the encoding of the element's data, and
        the data itself. For instance, an element of type "foo" with
        the same data as in the structure example above would be
        encoded as "foo:6:3:2:ab". If there is no data to be encoded,
        a data length of 0 should be given, e.g. "bar:0:".

Of these types, the union value is the most complicated. It should be
used where a protocol needs extensibility, such as for a command set,
and avoided elsewhere if possible.

Following is an ABNF (RFC 2234) for the wire encoding.
The grammar is highly ambiguous, since the encoding assumes that the
decoder has auxiliary type information:

        element         = byte / integer / symbol / list / struct
                          / union-val
        byte            = OCTET
        unsigned        = ("0" / PDIGIT *DIGIT) ":"
        integer         = unsigned / ("-" PDIGIT *DIGIT ":")
        symbol          = ALPHA *(ALPHA / DIGIT / "-") ":"
        list            = unsigned *element
                          ; unsigned determines the number of elements
                          ; all elements must be of same type
        struct          = 1*element
                          ; number, type, and order depend on type
        union-val       = symbol unsigned [element]
                          ; unsigned is the length of the element
                          ; unsigned is 0 if element is omitted
        PDIGIT          = %x31-39
                          ; 1-9

4. Notation

This notation gives a scheme for describing protocol element types and
giving them names for the purpose of semantic descriptions.

A variable declaration associates a name to be used in semantic
descriptions with a type. Variable names are valid symbols beginning
with a lowercase letter. Variable declarations end with a line break,
and are written as follows:

        A byte:                         "Byte <name>"
        An integer:                     "Integer <name>"
        A symbol:                       "Symbol <name>"
        A list of <type>:               "List[<type>] <name>"
        A structure named <type>:       "<type> <name>"
        A union named <type>:           "<type> <name>"

As a notational short-hand, "String" may be used as a synonym for
"List[Byte]".

Structure and union names are valid symbols beginning with a capital
letter. A structure definition is written as:

        structure <name> {
                <variable declaration>
                .
                .
                .
        }

Unions are defined as:

        union <name> {
                <symbol>: <variable declaration>
                .
                .
                .
        }

As a special case, if there is no data for a particular union tag,
"Null" can be written in place of a variable declaration.
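
As an informal illustration (not part of this specification), the
encoding rules of Section 3 can be sketched in a few lines of Python.
The function names below are hypothetical and chosen only for this
sketch. As in the wire encoding itself, the caller supplies the type
information by choosing which function to call.

```python
import re
from typing import Sequence

# Hypothetical helper names; the draft defines only the wire format.
# A symbol is a letter followed by letters, digits, or dashes.
_SYMBOL_RE = re.compile(rb"[A-Za-z][A-Za-z0-9-]*")


def encode_integer(n: int) -> bytes:
    """Decimal digits followed by a colon, e.g. 27 -> b"27:"."""
    # str() never produces leading zeroes or "-0", so each integer
    # has exactly one encoding.
    return str(n).encode("ascii") + b":"


def encode_symbol(name: str) -> bytes:
    """A symbol followed by a colon, e.g. "foo" -> b"foo:"."""
    raw = name.encode("ascii")
    if not _SYMBOL_RE.fullmatch(raw):
        raise ValueError(f"not a valid symbol: {name!r}")
    return raw + b":"


def encode_list(encoded_elements: Sequence[bytes]) -> bytes:
    """Element count, then the already-encoded elements."""
    return encode_integer(len(encoded_elements)) + b"".join(encoded_elements)


def encode_string(data: bytes) -> bytes:
    """String is List[Byte]: a byte count followed by the raw bytes."""
    return encode_integer(len(data)) + data


def encode_struct(*encoded_fields: bytes) -> bytes:
    """A structure is the concatenation of its encoded fields."""
    return b"".join(encoded_fields)


def encode_union(tag: str, encoded_data: bytes = b"") -> bytes:
    """Tag symbol, length of the encoded data, then the data."""
    return encode_symbol(tag) + encode_integer(len(encoded_data)) + encoded_data
```

With these helpers, encode_struct(encode_integer(3),
encode_string(b"ab")) gives the structure example from Section 3, and
wrapping that result with encode_union("foo", ...) gives the union
example.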

Here is an example of two structure definitions which might be used to
describe a mail message:

        structure Header {
                String name
                String value
        }

        structure Message {
                List[Header] headers
                String body
        }

Here is an example of a union definition which might be used together
with the above structure definitions to describe a command set:

        union Command {
                send: Message m
                help: Null
                quit: Null
        }

A quit command would be encoded as "quit:0:". If I have a message
with two headers, one with name "From" and value "Greg" and another
with name "To" and value "Bob", and the message body is "Test", then I
would encode a command to send this message as:

        send:29:2:4:From4:Greg2:To3:Bob4:Test

5. Rationale

The primary goal of this encoding scheme is simplicity. Protocol
implementors should not have to read a book to understand how data is
encoded. For want of a simple encoding scheme, IETF protocols have
been turning to ASN.1's basic encoding rules, which are highly
complicated and which have presented a barrier to implementation in
practice.

Two secondary goals of this encoding scheme are human readability and
space efficiency. These goals are of course at odds; integers could
be encoded more compactly by using more than ten values per byte, for
instance, at the expense of making it more difficult to examine ASCII
translations of protocol data.

The tagged union encoding provides easy extensibility in most
protocols. A protocol can find the end of the encoding of a tagged
union element even if it doesn't know the data types for the
individual tags.

This encoding does not include a length field for structures or an
overall length field for lists. Thus, it is impossible to skip to the
end of a structure or list without decoding it.
This decision was a tradeoff; it simplifies encoding and uses space
more efficiently in return for making certain decoding situations
more complicated.

6. Comparisons

This section compares SPADE to several previous wire encodings,
including XDR (RFC 1832), CDR (CORBA/IIOP 2.3 section 15.3), ABNF (RFC
2234), ASN.1 (ITU-T X.680-X.691), and XML (W3 REC-XML).

XDR is similar in philosophy to SPADE: data can only be decoded with
auxiliary type information, and simple schemes are used to encode
structures and lists. However, XDR made tradeoffs which make the
encoding more complex and less general than SPADE. Here are the major
differences:

        * XDR uses a fixed-length binary encoding for integers
          (usually 32-bit) and enforces four-byte alignment on the
          encoding of all data types, so that a C implementation on
          most platforms can byte-swap an integer encoding and cast it
          to the appropriate type. As a result, there are two
          different sizes of integers, integers are constrained by any
          XDR-using protocol to a certain range, integer encodings are
          not human-readable in an ASCII protocol trace, and padding
          is required for the encoding and decoding of strings.

        * XDR needlessly distinguishes between "opaque data" (bytes)
          and "strings" (ASCII characters).

        * XDR's "discriminated union" does not provide the length
          field needed to extend the union in later revisions of a
          protocol. As a result, unions have less overhead, but it is
          harder to make a protocol extensible.

        * XDR does not provide symbols; instead, it provides
          enumerations which map symbols to numbers. In most cases,
          such mappings are unnecessary and a debugging hindrance.
          The discriminant of an XDR union must be a number.

        * XDR has more types: it provides floating point numbers,
          fixed-length lists, and optional data, and it distinguishes
          between signed and unsigned integers.
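
As an informal illustration of the first difference above, the
following Python sketch encodes the same integer and string both ways.
The XDR side assumes the RFC 1832 layout (a four-byte big-endian
integer; a string as a four-byte length followed by the data,
zero-padded to a multiple of four bytes). All function names are
hypothetical and exist only for this comparison.

```python
import struct


def xdr_int(n: int) -> bytes:
    """XDR signed integer: four bytes, big-endian."""
    return struct.pack(">i", n)


def xdr_string(data: bytes) -> bytes:
    """XDR string: four-byte length, data, zero padding to 4 bytes."""
    pad = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def spade_int(n: int) -> bytes:
    """SPADE integer: decimal digits followed by a colon."""
    return str(n).encode("ascii") + b":"


def spade_string(data: bytes) -> bytes:
    """SPADE String (List[Byte]): byte count, then the bytes."""
    return spade_int(len(data)) + data


if __name__ == "__main__":
    # Print both encodings side by side for a quick visual comparison.
    print(xdr_int(27), spade_int(27))
    print(xdr_string(b"ab"), spade_string(b"ab"))
```

The XDR encodings contain non-printing bytes and padding, while the
SPADE encodings remain readable in an ASCII trace.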

CDR is similar to XDR in all of the respects noted above. It is
slightly more complicated than XDR: it provides a wider variety of
integral types, has different alignment constraints for different
types, and allows integral types to be encoded in either little-endian
or big-endian form (XDR always uses big-endian).

ABNF is not really a wire encoding scheme at all; it is a scheme for
describing syntaxes. It can be used to describe any wire encoding,
but does not in itself nail down how integers, strings, lists,
etc. should be encoded. As a notational device, it is not geared
towards describing data types; the description of a data type would
have to be intrinsically linked to its encoding, which is undesirable,
and ABNF has no notion of counting, so it cannot link the length field
of a list with the number of list elements provided.

ASN.1 is a very large and complicated specification. Even third-party
attempts to describe the ASN.1 basic and distinguished encoding rules
tend to be long and difficult to understand. Some differences between
ASN.1 with the Distinguished Encoding Rules (DER) and SPADE are:

        * The DER provide explicit type information in the encoding.
          As a result, a DER decoder can turn a wire encoding into
          a data structure with no auxiliary type information, but the
          encoding is space-inefficient and the encoding scheme
          becomes much more complex. Someone writing an encoder by
          hand for an ASN.1-using protocol will be constantly adding
          in magic numbers which are always the same.

        * The DER provide an overall length field for lists ("SEQUENCE
          OF"), but not an element count. So a decoder can easily
          skip over a list without parsing the elements, but it cannot
          allocate memory for the list elements until it has decoded
          the list.

        * The DER provide an overall length field for structures
          ("SEQUENCE"), making it easy to skip over structures without
          decoding their contents. The cost is a less space-efficient
          protocol.

        * The DER rely heavily on bit-packing to encode type tags and
          lengths, resulting in a very complex wire encoding which is
          nonetheless not space-efficient.

        * ASN.1 needlessly distinguishes between "OCTET STRING",
          "IA5String" (ASCII), "PrintableString" (restricted ASCII),
          and "T61String" (extended ASCII). A wire encoding does not
          need to get into the interpretation of octets as characters.

XML takes a different view of encoding protocol data than SPADE,
ASN.1, or XDR. The only "primitive data type" is character data,
which is arranged into a tree of elements according to a Document Type
Definition (DTD). Protocol data is encoded in a markup language
instead of being packed precisely according to rigid data types. Some
resulting differences between XML and SPADE are:

        * An XML encoding will be much more verbose than a SPADE
          encoding (or even an ASN.1 DER encoding in most cases),
          due to the addition of element names. Of course, this
          makes it easier for a human reader of a protocol dump to
          guess the meaning of each data element.

        * Because XML uses markup rather than advance-length encoding,
          an encoder or decoder has more quoting issues to deal with.
          This and other facets of XML make XML a much more
          complicated specification than SPADE.

        * An XML DTD is generally not understandable to a reader not
          skilled in XML, whereas a SPADE or XDR or ASN.1 data type
          generally is.

        * Hand-coding an encoder or decoder for an XML-using protocol
          is probably a lost cause.

7. Security Considerations

For maximum generality, this encoding scheme places no limits on the
length of any data type.
This could lead to denial of service attacks against implementations
of protocols using this encoding ("here follows a string of length two
gazillion"). It does not seem appropriate to choose limits in the wire
encoding to prevent this sort of attack, so guarding against these
attacks will have to be the responsibility of particular protocols or
their implementations.

8. Acknowledgements

Thanks to Elliot Schwartz for suggesting the name.

Thanks to Chris Newman for providing the basis for the ABNF for the
wire encoding, and other useful suggestions.