TSVWG Working Group                                               L. Han
Internet-Draft                                                     Y. Qu
Intended status: Experimental                                     Huawei
Expires: September 4, 2018                                     T. Nadeau
                                                            Lucid Vision
                                                           March 3, 2018


        A New Congestion Control in Bandwidth Guaranteed Network
                         draft-han-tsvwg-cc-00

Abstract

   In bandwidth guaranteed networks, network resources are reserved
   before a TCP session starts transmitting data. This draft proposes a
   new TCP congestion control algorithm used in bandwidth guaranteed
   networks. It is an extension to the current TCP standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 4, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Notation
   3.  Bandwidth Guaranteed Network
   4.  New Congestion Control
     4.1.  Receiver Advertised Window Size
     4.2.  MinBandwidthWND and MaxBandwidthWND
     4.3.  Congestion Avoidance
     4.4.  Fast Retransmit and Fast Recovery
     4.5.  Timeout
     4.6.  Idle Recovery
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   The original IP protocol suite was designed to support best-effort
   data transmission. With the development of the Internet, congestion
   became a real problem. To avoid congestion in the Internet, TCP uses
   congestion-avoidance algorithms to keep hosts from pumping too much
   traffic into the network. Over the past 40 years, various algorithms
   and optimizations have been proposed to solve this problem,
   including TCP-Reno [RFC5681], TCP-NewReno [RFC6582] [RFC6675],
   TCP-Cubic [RFC8312], and BBR
   [I-D.cardwell-iccrg-bbr-congestion-control].

   In bandwidth guaranteed networks, network resources are reserved
   before transmitting data. This draft proposes a new congestion
   control algorithm that should be used in bandwidth guaranteed
   networks to improve TCP throughput.
   The following is a list of key differences between this new
   algorithm and classic TCP congestion control [RFC5681]:

   o  There is no slow start. After a TCP session is successfully
      initiated, its congestion window (cwnd) jumps to the window size
      corresponding to CIR (Committed Information Rate), and the host
      is allowed to transmit data. This is based on the assumption that
      network resources have been reserved in bandwidth guaranteed
      networks.

   o  During congestion avoidance, cwnd stays between the window sizes
      corresponding to CIR and PIR (Peak Information Rate). If there is
      no packet loss due to congestion, cwnd levels off at the window
      size corresponding to PIR.

   o  OAM is used together with duplicate ACKs to determine whether a
      packet loss is due to congestion or random failure.

   This draft is organized as follows. Section 2 defines the
   terminology used in this draft. Section 3 provides background
   information on bandwidth guaranteed networks. Section 4 explains
   the details of the new congestion control algorithm.

2.  Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Some of the following terms are defined as in [RFC5681] and are
   copied here for readability.

   FULL-SIZED SEGMENT: A segment that contains the maximum number of
      data bytes permitted (i.e., a segment containing SMSS bytes of
      data).

   RECEIVER WINDOW (rwnd): The most recently advertised receiver
      window.

   CONGESTION WINDOW (cwnd): A TCP state variable that limits the
      amount of data a TCP can send. At any given time, a TCP MUST NOT
      send data with a sequence number higher than the sum of the
      highest acknowledged sequence number and the minimum of cwnd and
      rwnd.

   SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
      largest segment that the sender can transmit. This value can be
      based on the maximum transmission unit of the network, the path
      MTU discovery [RFC1191, RFC4821] algorithm, RMSS (see next item),
      or other factors. The size does not include the TCP/IP headers
      and options.

   RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the
      largest segment the receiver is willing to accept. This is the
      value specified in the MSS option sent by the receiver during
      connection startup. Or, if the MSS option is not used, it is 536
      bytes [RFC1122]. The size does not include the TCP/IP headers and
      options.

   INITIAL WINDOW (IW): The initial window is the size of the
      sender's congestion window after the three-way handshake is
      completed.

   RESTART WINDOW (RW): The restart window is the size of the
      congestion window after a TCP restarts transmission after an idle
      period.

   ssthresh: Slow Start Threshold.

   OAM: Operations, Administration, and Maintenance.

   RTT: Round-Trip Time.

   CIR: Committed Information Rate.

   PIR: Peak Information Rate.

3.  Bandwidth Guaranteed Network

   With the development of new applications, such as AR/VR, the network
   is required to provide bandwidth guaranteed services. Various
   solutions exist, including out-of-band signaling protocols such as
   RSVP [RFC2205] and NSIS [RFC4080], and in-band signaling as proposed
   in [I-D.han-6man-in-band-signaling-for-transport-qos]. The common
   objective of all these solutions is to have network resources and
   bandwidth reserved before data is transmitted. The details of how
   the resources are reserved are out of the scope of this draft;
   however, it is assumed that in bandwidth guaranteed networks,
   network resources (bandwidth, queues, etc.) have been dedicated to
   the TCP flows, and data delivery is guaranteed at the CIR. When the
   data rate is between CIR and PIR, shared resources are used, and
   traffic above the CIR is not guaranteed.
   No traffic above the PIR is allowed to enter the network.

   The proposed congestion control also requires that OAM (Operations,
   Administration, and Maintenance) be used to continuously report
   network condition parameters. Before a TCP session is started,
   important network parameters, such as the number of hops and the
   round-trip time (RTT), need to be detected by OAM. This might be
   done by setting up a measuring TCP connection, which carries no user
   data and is used only to measure the key network parameters. Because
   the network status is constantly changing, these parameters need to
   be updated after a TCP session is established. This requires the
   sender to periodically or continuously embed OAM information in TCP
   data packets [I-D.han-6man-in-band-signaling-for-transport-qos]
   [I-D.ietf-ippm-ioam-data] to detect the current buffer depth, RTT,
   etc.

   It is important that OAM be able to detect whether any device's
   buffer depth has exceeded the pre-configured threshold, as this is an
   indication of potential congestion and packet drop. When this
   happens, OAM should send a possible-congestion alarm to the TCP
   sender. If the retransmit timer expires on this TCP sender and a
   possible-congestion alarm has been received, a packet has been
   dropped due to congestion. Otherwise, the packet drop might be due
   to some physical failure. The OAM details are out of the scope of
   this draft; please refer to other related drafts.

   In summary, in bandwidth guaranteed networks, resources are reserved
   before transmitting data, and OAM is used to obtain network
   statistics. The new congestion control proposed in this draft is to
   be used in this kind of bandwidth guaranteed network.

4.  New Congestion Control

   [RFC5681] defines a set of TCP congestion algorithms: slow start,
   congestion avoidance, fast retransmit, and fast recovery. The
   congestion control proposed in this draft is an extension to RFC
   5681, and it differs only in the congestion control algorithm on the
   sender side.

4.1.  Receiver Advertised Window Size

   The receiver's advertised window (rwnd) is a receiver-side limit on
   the amount of outstanding data, so a sender should not send more
   data than this window size. It is calculated as follows:

      rwnd = AdvertisedWND = MaxRcvBuffer - (LastByteRcvd - LastByteRead)

4.2.  MinBandwidthWND and MaxBandwidthWND

   As in [RFC5681], the congestion window (cwnd) is the sender-side
   limit on the amount of data that the sender can transmit before
   receiving an acknowledgement (ACK). Considering both the sender and
   the receiver side, the effective sending window is always the
   minimum of cwnd and rwnd:

      EffectiveWND = min(cwnd, rwnd)

   A TCP sender MUST NOT send more data than the minimum of cwnd and
   rwnd.

   Slow start is commonly used in TCP at the beginning of a transfer or
   after loss repair, when the network conditions are unknown. This
   slow probing is necessary to determine the available network
   capacity and to avoid sending an inappropriately large burst of data
   into the network and causing congestion. A detailed discussion of
   the initial window setting is provided in [RFC3390].

   RTT is the time taken to send a packet to the destination and
   receive the response packet (ACK). Since the network status is
   constantly changing, RTT also varies. [RFC6298] specifies how RTT
   should be sampled and updated. In this new algorithm, RTT is updated
   using the following formula:

      RTT = a * old RTT + (1 - a) * new RTT     (0 < a < 1)         (1)

   The initial RTT can be obtained using a measuring TCP connection, or
   configured based on historical data.
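
   Equation (1) can be sketched in a few lines of Python. This is an
   illustrative sketch rather than part of the proposal: the function
   names are hypothetical, and the default weight a = 0.875 is an
   assumption borrowed from the smoothing gain of 1/8 that [RFC6298]
   suggests for new samples; the draft itself only requires 0 < a < 1.

```python
from typing import Iterable


def smooth_rtt(old_rtt: float, new_rtt: float, a: float = 0.875) -> float:
    """Equation (1): RTT = a * old RTT + (1 - a) * new RTT, with 0 < a < 1.

    The default a = 0.875 is an assumption of this sketch, not a value
    taken from the draft.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    return a * old_rtt + (1.0 - a) * new_rtt


def smooth_series(initial_rtt: float, samples: Iterable[float],
                  a: float = 0.875) -> float:
    """Fold a sequence of RTT samples into one smoothed estimate.

    initial_rtt stands for the value obtained from the measuring TCP
    connection or from historical data.
    """
    rtt = initial_rtt
    for sample in samples:
        rtt = smooth_rtt(rtt, sample, a)
    return rtt
```

   With a = 0.875, an old RTT of 100 ms and a new sample of 200 ms give
   a smoothed RTT of 112.5 ms, so a single outlier moves the estimate
   only slightly.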

   In bandwidth guaranteed networks, since resources are already
   allocated and the network status is known through OAM
   [I-D.han-6man-in-band-signaling-for-transport-qos], it is safe to
   remove slow start and allow a host to start sending traffic at the
   CIR after the TCP session is established.

   Two window sizes are important: MinBandwidthWND and MaxBandwidthWND,
   calculated as follows:

      MinBandwidthWND = CIR * RTT/MSS                               (2)
      MaxBandwidthWND = PIR * RTT/MSS                               (3)

   In bandwidth guaranteed networks, after a TCP session is established,
   the sender can start transmitting data at an initial window size
   equal to MinBandwidthWND:

      cwnd = MinBandwidthWND
      IW = min (cwnd, rwnd)

   If the receiver window (rwnd) is not a limiting factor, the sender
   will start sending data at the CIR. This is a key difference from
   classic TCP slow start, which starts with an initial window of only
   a few segments [RFC5681].

4.3.  Congestion Avoidance

   In TCP-Reno, a TCP enters congestion avoidance mode after slow start.
   In bandwidth guaranteed networks, there is no slow start, so a TCP
   enters congestion avoidance mode right after the initial start.

   During congestion avoidance, cwnd is increased by one approximately
   once per round-trip time as valid ACK packets are received, until it
   reaches MaxBandwidthWND.

      if (cwnd < MaxBandwidthWND) {
          cwnd += 1;
      } else {
          cwnd = MaxBandwidthWND;
      }

   Once the cwnd reaches MaxBandwidthWND, it stays constant at
   MaxBandwidthWND until packet loss is detected. This is another major
   difference from [RFC5681], where the cwnd keeps increasing during
   congestion avoidance until the TCP sender detects segment loss.
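
   The window calculation in equations (2) and (3) and the growth rule
   above can be sketched as follows. This is a minimal Python sketch
   under stated assumptions, not a definitive implementation: the names
   bandwidth_window and Sender are hypothetical, CIR and PIR are
   assumed to be given in bytes per second, RTT in milliseconds, and
   windows are counted in segments rounded down (with a floor of one
   segment), none of which the draft specifies.

```python
def bandwidth_window(rate_bytes_per_s: int, rtt_ms: int, mss_bytes: int) -> int:
    """Equations (2) and (3): window = rate * RTT / MSS, in segments."""
    if rate_bytes_per_s <= 0 or rtt_ms <= 0 or mss_bytes <= 0:
        raise ValueError("rate, RTT and MSS must be positive")
    # Integer arithmetic avoids floating-point rounding. Rounding down and
    # the one-segment floor are assumptions of this sketch.
    return max(1, (rate_bytes_per_s * rtt_ms) // (1000 * mss_bytes))


class Sender:
    """Hypothetical sender state for Sections 4.2 and 4.3 (in segments)."""

    def __init__(self, cir: int, pir: int, rtt_ms: int, mss_bytes: int) -> None:
        self.min_wnd = bandwidth_window(cir, rtt_ms, mss_bytes)  # MinBandwidthWND
        self.max_wnd = bandwidth_window(pir, rtt_ms, mss_bytes)  # MaxBandwidthWND
        # No slow start: cwnd begins at MinBandwidthWND.
        self.cwnd = self.min_wnd

    def effective_window(self, rwnd: int) -> int:
        """EffectiveWND = min(cwnd, rwnd); rwnd is assumed to be in segments."""
        return min(self.cwnd, rwnd)

    def on_round_trip_acked(self) -> None:
        """Call about once per RTT of valid ACKs: grow by one segment and
        never exceed MaxBandwidthWND."""
        self.cwnd = min(self.cwnd + 1, self.max_wnd)
```

   For example, with CIR = 1,250,000 bytes/s, PIR = 2,500,000 bytes/s,
   RTT = 20 ms, and MSS = 1250 bytes, MinBandwidthWND is 20 segments
   and MaxBandwidthWND is 40 segments; the sender starts at 20 and
   reaches the 40-segment ceiling after 20 round trips without loss.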

   This means that a TCP sender is never allowed to send data at a rate
   higher than PIR, unlike in TCP-Reno.

4.4.  Fast Retransmit and Fast Recovery

   As defined in [RFC5681], a TCP receiver SHOULD send an immediate
   duplicate ACK when an out-of-order segment arrives. The TCP sender
   detects and repairs loss based on incoming duplicate ACKs. If 3
   duplicate ACKs are received, the sender uses this as an indication
   that a segment has been lost, and will perform a retransmission of
   the lost segment.

   In TCP-Reno [RFC5681], after the fast retransmit of what appears to
   be the lost segment, fast recovery is used to continue transmitting
   new segments at a reduced rate determined by ssthresh.

   In the new congestion control algorithm, upon receiving duplicate
   ACKs, fast retransmit and fast recovery follow the rules below:

   o  When a sender receives the first and second duplicate ACKs, as in
      [RFC5681], the cwnd is not changed, and the sender continues to
      send traffic.

   o  When a sender receives the third duplicate ACK, if the
      retransmission timer has not expired and a previous OAM congestion
      alarm has been received, it is likely that a segment was lost due
      to congestion. The sender will perform a retransmission of the
      lost segment, and the cwnd is set to MinBandwidthWND.

   o  When a sender receives the third duplicate ACK but no previous
      OAM congestion alarm has been received, the segment is considered
      lost due to random failure rather than congestion. In this case
      the cwnd is not changed.

   In [RFC5681], in case of network congestion, the new cwnd is set to
   ssthresh, which is usually half of the old cwnd. By contrast, in
   this new congestion control, when a segment loss due to congestion
   is detected as described above, the new cwnd is set to
   MinBandwidthWND as in equation (2).

4.5.  Timeout

   If a retransmission timer [RFC6298] in a TCP sender expires in a
   bandwidth guaranteed network, this most likely indicates a physical
   failure, regardless of whether duplicate ACKs have been received.

   In this case, the cwnd is set to one, and the TCP sender will
   retransmit the lost segment. This packet also serves to probe the
   network status. If there really is a network failure, no ACK will be
   received and the retransmission timer will expire again. If the
   expected ACK is received after the retransmission, the network has
   recovered, and the cwnd will be set to MinBandwidthWND as in
   equation (2).

4.6.  Idle Recovery

   [RFC5681] specifies that a TCP session should use slow start to
   restart transmission after an idle period longer than one
   retransmission timeout, with the restart window (RW) set to the
   minimum of IW and cwnd.

   In this proposal, the same rule is still followed. However, because
   no slow start is needed in bandwidth guaranteed networks and the IW
   in this new congestion control is set to MinBandwidthWND, a TCP
   sender can resume transmitting data at the CIR after a long idle
   period.

5.  IANA Considerations

   This document has no IANA actions.

6.  Security Considerations

   This proposal makes no change to the underlying security of TCP.
   More information about TCP security concerns can be found in
   [RFC5681].

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

7.2.  Informative References

   [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
              September 1997.

   [RFC3390]  Allman, M., Floyd, S., and C.
              Partridge, "Increasing TCP's Initial Window", RFC 3390,
              DOI 10.17487/RFC3390, October 2002.

   [RFC4080]  Hancock, R., Karagiannis, G., Loughney, J., and S. Van den
              Bosch, "Next Steps in Signaling (NSIS): Framework",
              RFC 4080, DOI 10.17487/RFC4080, June 2005.

   [RFC4960]  Stewart, R., Ed., "Stream Control Transmission Protocol",
              RFC 4960, DOI 10.17487/RFC4960, September 2007.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              DOI 10.17487/RFC6298, June 2011.

   [RFC6582]  Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
              NewReno Modification to TCP's Fast Recovery Algorithm",
              RFC 6582, DOI 10.17487/RFC6582, April 2012.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP",
              RFC 6675, DOI 10.17487/RFC6675, August 2012.

   [RFC8312]  Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
              RFC 8312, DOI 10.17487/RFC8312, February 2018.

   [I-D.cardwell-iccrg-bbr-congestion-control]
              Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson,
              "BBR Congestion Control",
              draft-cardwell-iccrg-bbr-congestion-control-00 (work in
              progress), July 2017.

   [I-D.han-6man-in-band-signaling-for-transport-qos]
              Han, L., Li, G., Tu, B., Xuefei, T., Li, F., Li, R.,
              Tantsura, J., and K. Smith, "IPv6 in-band signaling for
              the support of transport with QoS",
              draft-han-6man-in-band-signaling-for-transport-qos-00
              (work in progress), October 2017.

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
              Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
              P., Chang, R., and d. daniel.bernier@bell.ca, "Data Fields
              for In-situ OAM", draft-ietf-ippm-ioam-data-01 (work in
              progress), October 2017.

Acknowledgments

   The authors wish to thank xxxx for their helpful comments and
   suggestions.

Authors' Addresses

   Lin Han
   Huawei
   2330 Central Expressway
   Santa Clara CA 95050
   USA

   EMail: lin.han@huawei.com


   Yingzhen Qu
   Huawei
   2330 Central Expressway
   Santa Clara CA 95050
   USA

   EMail: yingzhen.qu@huawei.com


   Thomas Nadeau
   Lucid Vision
   Hampton NH 03842
   USA

   EMail: tnadeau@lucidvision.com