Network Working Group                                         M. Bagnulo
Internet-Draft                                                      UC3M
Intended status: Experimental                                 Y. Nishida
Expires: April 2, 2018                                GE Global Research
                                                      September 29, 2017


               TCP ESN: Extended Sequence Numbers for TCP
                     draft-bagnulo-tcpm-esn-00.txt

Abstract

   This note defines the Extended Sequence Number (ESN) experimental
   modification to TCP, which extends TCP's sequence number space using
   the TimeStamp (TS) option.
   It also modifies the Window Scale (WS) option to support the larger
   receive windows enabled by the extended sequence number space.  At
   this stage, the purpose of this document is to present different
   design choices in order to generate discussion about the approach to
   follow.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 2, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Overview
   2.  Design rationale
     2.1.  Reduced option space consumption in the SYN and graceful
           fallback
     2.2.  Deployability
   3.  RTTM With Extended Sequence Number Prefix
   4.  Middleboxes Implications
   5.  SACK for Extended Sequence Number
   6.  Impacts On Other TCP Extensions
     6.1.  PAWS
     6.2.  Eifel Detection Algorithm
   7.  Acknowledgments
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Overview

   The proposed Extended Sequence Number (ESN) mechanism re-purposes the
   TS option [RFC7323] to carry a prefix for the sequence number and a
   prefix for the Acknowledgment number, enlarging the sequence number
   space used in TCP connections.

   As currently defined, the TS option contains two 32-bit fields, TSval
   and TSecr.  The current ESN proposal re-defines TSval to carry a
   prefix for the sequence number and TSecr to carry a prefix for the
   Acknowledgment number.  In this way, the actual sequence number
   corresponding to the first data byte contained in the segment would
   be the concatenation of the value contained in the TSval and the
   value of the Sequence Number field of the TCP header.  The
   Acknowledgment sequence number would be the concatenation of the
   value contained in the TSecr and the value of the Acknowledgment
   Number field of the TCP header.

   The proposed ESN mechanism also modifies the WS option as follows:
   First, values up to 46 are allowed (enabling a RCV window up to
   2^62).
   These are encoded in the 6 least significant bits of the
   shift.count.  Second, the remaining two (most significant) bits are
   turned into flags.  In particular, the most significant bit is used
   as the ESN flag to indicate ESN support in the connection.
   Specifically, when the ESN bit is set to 1 in the WS option carried
   in a SYN or a SYN/ACK, it means that: i) the TS option is being used
   for extended sequence numbers, as defined above, and ii) the sender
   of the WS option with the ESN bit set supports a receive window of
   up to 2^62 in this connection.  The ESN flag defined this way allows
   endpoints to express and negotiate ESN support during the TCP 3-way
   handshake.

   The sequence number of a TCP segment using ESN is the result of
   prepending the prefix carried in the TS Value to the sequence number
   contained in the Sequence Number field of the TCP header.  Similarly,
   the ACK number is the result of prepending the value in the TS Echo
   Reply to the value in the ACK field of the TCP header.

   When a client wants to use the extended sequence number for a new
   connection, it sends a SYN with both the TS and the WS options.  In
   the WS option, it sets the ESN flag to signal that it wants to use
   ESN for this connection.  It encodes the most significant bits of the
   sequence number in the TS Value and the remaining bits of the
   extended sequence number in the Sequence Number field of the TCP
   header.  Since the ACK flag is not set in the TCP header of the SYN
   packet, the TS Echo Reply is set to zero (as defined in [RFC7323]).

   If the server also supports the extended sequence number mechanism,
   it replies with a SYN/ACK carrying both the TS and WS options.  In
   the WS option, it sets the ESN flag to confirm ESN support.  It
   encodes the prefix of its own extended sequence number in the TS
   Value and the prefix of the ACK in the TS Echo Reply.
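
   As a minimal sketch of the encoding described above, the following
   Python fragment splits a WS shift.count octet into the ESN flag and
   the shift value, and concatenates a prefix with a 32-bit header
   field.  All function and constant names are hypothetical and not part
   of the proposal.  The sketch assumes a full 32-bit prefix, as in this
   overview (Section 3 discusses layouts with shorter prefixes), and it
   simply ignores the second flag bit, whose meaning the text does not
   define.

```python
# Illustrative only: names below are assumptions, not defined by the draft.
ESN_FLAG = 0x80       # most significant bit of the WS shift.count octet
SHIFT_MASK = 0x3F     # six least significant bits carry the shift count
MAX_ESN_SHIFT = 46    # 16-bit window << 46 stays below 2^62


def parse_ws_octet(octet):
    """Split a WS shift.count octet into (esn_flag, shift)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("shift.count is a single octet")
    return bool(octet & ESN_FLAG), octet & SHIFT_MASK


def extended_number(prefix, low32):
    """Prepend a 32-bit prefix (TSval or TSecr) to a 32-bit header field."""
    if not (0 <= prefix < 2**32 and 0 <= low32 < 2**32):
        raise ValueError("prefix and header field must each fit in 32 bits")
    return (prefix << 32) | low32


def split_extended(number):
    """Inverse of extended_number: return (prefix, low32)."""
    return number >> 32, number & 0xFFFFFFFF
```

   For example, extended_number(1, 0) yields 2^32, the first sequence
   number beyond the classic 32-bit space, and a legacy shift.count of
   14 parses as (False, 14), which corresponds to the fallback case.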

   If the server does not support ESN, it will respond with a SYN/ACK
   containing a WS option carrying a value no greater than 14, i.e.,
   with the most significant bit set to 0.  It may also include the TS
   option, indicating its willingness to use timestamps as defined in
   [RFC7323] in this connection.  Upon reception of the SYN/ACK, the
   client can gracefully fall back to using TS as defined in [RFC7323];
   in particular, PAWS can be used.

2.  Design rationale

   Our proposal is to reuse the TCP TS option to carry a sequence
   number offset in addition to the existing 32-bit sequence number.
   This approach is similar to [I-D.looney-tcpm-64-bit-seqnos], with one
   distinct difference: while [I-D.looney-tcpm-64-bit-seqnos] proposes
   to allocate a new TCP option, we propose to use the existing TS
   option instead.  We believe this approach has the following
   advantages.

2.1.  Reduced option space consumption in the SYN and graceful fallback

   The maximum size of the TCP header (including options) is 60 bytes
   (this is because the Data Offset field of the TCP header is 4 bits
   long and expresses the offset in 32-bit words).  Since the basic TCP
   header is 20 bytes, a segment can carry 40 bytes of options at most.
   This is particularly pressing for the TCP SYN and TCP SYN/ACK
   packets.  Currently, there is a fair number of options that are
   frequently carried in SYN packets, especially in high performance
   communications.  These include the MSS option (4 bytes) [RFC0793],
   the SACK permitted option (2 bytes) [RFC2018], the Window Scale
   option (3 bytes) and the TimeStamp option (used for PAWS) (10 bytes)
   [RFC7323].  All these options account for 19 bytes.  There are other
   options that are becoming increasingly popular.  For instance, the
   option length of TCP Fast Open (TFO) [RFC7413] is between 6 and 18
   bytes, depending on the length of the cookie used.
   There are other options that require SYN and SYN/ACK option space,
   such as MP_CAPABLE in [RFC6824] or TCP-AO [RFC5925].

   This means that, for instance, a TCP client that would like to
   initiate a connection including the MSS option, the SACK permitted
   option, the WS and TS options and also a TFO option would not have
   room to carry an additional 10-byte option for the extended sequence
   number.  Since our approach uses the TS option, no additional option
   space is needed for the extended sequence number.

   The proposed ESN approach allows the extended sequence number to be
   used if both endpoints support it, while enabling graceful fallback.
   A client supporting ESN would include the TS option and set the flag
   in the WS option indicating ESN support.  If the server does not
   support ESN, the connection can still be established using 32-bit
   sequence numbers and the TS and WS options as defined in [RFC7323]
   (in particular, PAWS can be used in the connection).

2.2.  Deployability

   [HONDA11] reported that unknown options in the SYN are more likely to
   be removed than known options.  Hence, we believe that using existing
   options has a better chance of avoiding unwanted middlebox
   interference.  It would nevertheless be useful to perform further
   measurements specifically on how frequently the TS option is removed.

3.  RTTM With Extended Sequence Number Prefix

   [RFC7323] defines two uses for the TS option: PAWS and RTTM.  When
   re-purposing the TS option for ESN, we argue that using TS to carry
   the extended sequence number subsumes the use of PAWS.

   However, this is not the case for RTTM.  We identify the following
   alternatives in order to achieve RTTM when re-purposing the TS option
   for ESN.

   Option 1:
      This approach uses the most significant bit (MSB) of both TSval
      and TSecr as a flag, as depicted in Figure 1.
      If the MSB is set to 1, it means the field contains a sequence
      number prefix.  If it is reset, it means that it contains a
      timestamp.  This means that we use 31 bits for the extended
      sequence number prefix, resulting in 63-bit sequence numbers.
      The main problem here is that the segments containing the
      timestamp lack the sequence number prefix information.  So, for
      instance, it is not possible to have more than 2^32 bytes in
      flight if any of the segments in flight is carrying an actual
      timestamp, since there is the possibility of confusion (in
      particular, if the receive window is large enough to accommodate
      two packets with the same 32-bit sequence number, then the
      receiver would not be able to figure out the right place for the
      packet that carries the timestamp and does not carry the sequence
      number prefix).  So, if we want to use this option, the receiver
      window cannot be larger than 2^32.  However, this restriction
      does not address all the problems.  If a duplicated packet
      carrying a timestamp in the TS option is delayed by one RTT or
      more and the 32-bit sequence number wraps around, then the
      receiver can potentially mistake this old duplicated packet for a
      new packet with the same sequence number suffix.  It would be
      possible to rely on PAWS for detecting and eliminating these
      packets.  However, in order for PAWS to be used, it is necessary
      to keep the timestamp information stored in TS.recent updated.
      This requires that at least a few actual timestamps are exchanged
      every 2^31 sequence numbers.  Summarizing, the constraints to use
      this option are that the flight size is less than 2^32 and that
      at least n (n=4?) timestamps are exchanged every 2^32 bytes of
      data.  We believe this is a poor alternative, especially due to
      the flight-size constraint.

   +-------+-------+-+---------------------+-+----------------------+
   |Kind=8 |  10   |F|   TSval or Prefix   |F|   TSecr or Prefix    |
   +-------+-------+-+---------------------+-+----------------------+
       8       8    1           31          1           31

             Figure 1: Time Stamp Option format for Option 1

   Option 2:
      This approach uses the TSecr in some packets to exchange
      timestamps.  The idea here is that all data segments carry the
      extended sequence number prefix in the TSval but that some
      packets do not carry ACK information, which is acceptable because
      we use cumulative ACKs, as long as this only affects a few
      packets (e.g., one packet per RTT does not carry ACK
      information).  In order to enable both uses of the TSecr
      (timestamp or sequence number prefix), we need to use 2 bits to
      encode whether the TSecr carries an extended sequence number
      prefix for the ACK, a timestamp or a timestamp echo.  This
      implies that there are 30 bits left in TSecr for the actual
      value, resulting in 30-bit timestamps and 62-bit sequence
      numbers.  The receiver of a packet whose TS option carries an
      actual timestamp or timestamp echo should discard the ACK
      information, since it cannot know the prefix of the sequence
      number carried in the ACK field.  This option seems a reasonable
      trade-off.  If this option is adopted, RTTM could only be used
      sporadically.  However, this may not be a concern, since it is
      likely that it would be possible to measure the RTT at least once
      every RTT, which is likely to be enough for estimating the RTT
      for the RTO calculation (see [RFC7323] for further details).
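
      The TSecr handling described for Option 2 could be sketched as
      follows in Python.  This is only an illustration under stated
      assumptions: the text says that 2 bits are needed but does not
      assign code points, so the three values of TsecrKind below are
      hypothetical, as are all function names.  Only the TSecr field is
      covered.

```python
from enum import IntEnum


class TsecrKind(IntEnum):
    # Hypothetical code points; the draft does not assign values.
    ACK_PREFIX = 0
    TIMESTAMP = 1
    TIMESTAMP_ECHO = 2


VALUE_BITS = 30
VALUE_MASK = (1 << VALUE_BITS) - 1


def pack_tsecr(kind, value):
    """Build the 32-bit TSecr field from a 2-bit kind and a 30-bit value."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value must fit in 30 bits")
    return (int(kind) << VALUE_BITS) | value


def unpack_tsecr(field):
    """Return (kind, value); raises ValueError for the unassigned kind 3."""
    return TsecrKind(field >> VALUE_BITS), field & VALUE_MASK


def ack_number(field, ack32):
    """Return the 62-bit ACK number, or None when the ACK must be ignored."""
    kind, value = unpack_tsecr(field)
    if kind is not TsecrKind.ACK_PREFIX:
        # A timestamp or timestamp echo carries no prefix, so the
        # 32-bit ACK field cannot be placed and is discarded.
        return None
    return (value << 32) | ack32
```

      Returning None for timestamp-bearing segments mirrors the rule
      that the receiver discards the ACK information in that case and
      relies on a later cumulative ACK instead.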

   +-------+-------+--+--------------------+--+---------------------+
   |Kind=8 |  10   |F |  TSval or Prefix   |F |  TSecr or Prefix    |
   +-------+-------+--+--------------------+--+---------------------+
       8       8    2           30          2           30

             Figure 2: Time Stamp Option format for Option 2

   Option 3:
      This approach splits the TSval and the TSecr into two 16-bit
      fields each, resulting in 16-bit timestamps and 48-bit sequence
      numbers.  48-bit sequence numbers are a significant improvement
      over the current 32-bit sequence numbers, so this is probably
      enough.  It is possible to encode the timestamp information using
      16 bits.  For example, [I-D.trammell-tcpm-timestamp-interval]
      proposes to encode timestamp information using 16 bits, which
      could be used in this option.

   +-------+-------+-----------+-----------+------------+-----------+
   |Kind=8 |  10   |  Prefix   |   TSval   |   Prefix   |   TSecr   |
   +-------+-------+-----------+-----------+------------+-----------+
       8       8        16          16           16           16

             Figure 3: Time Stamp Option format for Option 3

   Option 4:
      This approach uses the TS option for one single purpose per
      connection: either the original purpose or ESN.  This is less
      attractive because RTTM cannot be used together with ESN in the
      same connection.

   +-------+-------+-----------------------+------------------------+
   |Kind=8 |  10   |        Prefix         |         Prefix         |
   +-------+-------+-----------------------+------------------------+
       8       8              32                       32

             Figure 4: Time Stamp Option format for Option 4

   Based on the observations above, we believe options 2 and 3 are
   worth further discussion, while options 1 and 4 can be discarded due
   to their major drawbacks.

4.  Middleboxes Implications

   It has been observed in [HONDA11] that some middleboxes insert the TS
   Option.  Also, there may be boxes out there that modify the sequence
   number while not terminating the connection.
   In order to detect these cases, which would break the proposed
   mechanism, it would be beneficial to add an extra safety measure
   requiring that the prefix encoded in the TS Option replicates the
   most significant bits of the value included in the Sequence Number
   field.  In this way, a server supporting the extended sequence number
   mechanism can not only verify the flag in the WS option, but also
   check whether the TS value matches the 31 most significant bits in
   the Sequence Number field of the TCP header.  If they do not match,
   the server should not negotiate the use of the extended sequence
   number mechanism (i.e., it replies with the WS option resetting the
   flag for the extended sequence number mechanism).  This is adopted
   from [I-D.looney-tcpm-64-bit-seqnos].

   If the server is a legacy server, it will reply without the WS option
   or with the WS option with a shift.count value lower than 15.  In
   this case, the client falls back to regular TCP without the extended
   sequence number and with regular timestamps.

5.  SACK for Extended Sequence Number

   In the case of SACK blocks, there are two possible complementary
   approaches:

   1.  Use the currently defined SACK options, identifying blocks using
       32-bit sequence numbers.  When these are used in a connection
       that has successfully negotiated ESN, the prefix carried in the
       TSecr of the message also applies to the sequence numbers
       identifying the SACK blocks.  The limitation of this approach is
       that all SACK blocks in a single SACK option must use the same
       prefix, which prevents SACKing older blocks.  However, it is not
       certain that we really need to report a wide range of SACK
       blocks in a single SACK option.  Another issue would be the case
       where a SACK option is detached from the original packet and
       attached to a different one.
       One possible mitigation for this would be to discard the SACK
       information when it looks suspicious, since SACK information is
       optional and is usually carried in multiple ACKs.

   2.  Define a new SACK block option for extended sequence numbers, as
       proposed in [I-D.looney-tcpm-64-bit-seqnos].

   There are a couple of observations regarding the last approach, which
   uses the new SACK block option.  First, note that the current SACK
   permitted option could still be used.  Hence, if a connection
   negotiated both SACK and ESN, we may presume that it supports the new
   SACK block option.  If the ESN negotiation fails, it means that
   32-bit SACK blocks are to be used for that connection, providing
   graceful fallback.

6.  Impacts On Other TCP Extensions

   Since this proposal repurposes the existing timestamp option, some
   other proposals that use the option will be affected.  We
   investigated the impact on the following TCP extensions and propose
   modifications to make them work with the proposal.

6.1.  PAWS

   In order to perform PAWS, receivers need to check whether the
   timestamp option in an arriving packet contains a sequence number
   prefix or timestamp information by checking the most significant bit.
   If it contains timestamp information, the receiver processes it as
   described in Section 5.3 of [RFC7323].  If it contains a sequence
   number prefix, the receiver can derive the extended sequence number
   of the packet from that information.  If the extended sequence number
   is outside of the window, the packet will be discarded, as in PAWS.

6.2.  Eifel Detection Algorithm

   If the Eifel detection algorithm [RFC3522] is activated, senders
   perform the logic described in Section 3.2 of [RFC3522] with the
   following two modifications.  First, a TCP sender MUST set timestamp
   information when it retransmits packets.
   Second, if a TCP sender receives an ACK with a sequence number prefix
   for the retransmitted packet, it should treat it as if the timestamp
   is smaller than the value of RetransmitTS.

7.  Acknowledgments

8.  Security Considerations

9.  IANA Considerations

10.  References

10.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014.

10.2.  Informative References

   [HONDA11]  Honda, M., Nishida, Y., Raiciu, C., Greenhalgh, A.,
              Handley, M., and H. Tokuda, "Is it still possible to
              extend TCP?", ACM IMC 2011, 2011.

   [I-D.looney-tcpm-64-bit-seqnos]
              Looney, J., "64-bit Sequence Numbers for TCP",
              draft-looney-tcpm-64-bit-seqnos-00 (work in progress),
              March 2017.

   [I-D.trammell-tcpm-timestamp-interval]
              Scheffenegger, R., Kuehlewind, M., and B. Trammell,
              "Encoding of Time Intervals for the TCP Timestamp Option",
              draft-trammell-tcpm-timestamp-interval-01 (work in
              progress), July 2013.

   [RFC2018]  Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
              Selective Acknowledgment Options", RFC 2018,
              DOI 10.17487/RFC2018, October 1996.

   [RFC3522]  Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
              for TCP", RFC 3522, DOI 10.17487/RFC3522, April 2003.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010.

   [RFC6824]  Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
              "TCP Extensions for Multipath Operation with Multiple
              Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013.

   [RFC7413]  Cheng, Y., Chu, J., Radhakrishnan, S., and A.
              Jain, "TCP Fast Open", RFC 7413, DOI 10.17487/RFC7413,
              December 2014.

Authors' Addresses

   Marcelo Bagnulo
   UC3M

   Email: marcelo@it.uc3m.es


   Yoshifumi Nishida
   GE Global Research
   2623 Camino Ramon
   San Ramon, CA 94583
   USA

   Email: nishida@wide.ad.jp